There are a couple of ways to do this. With either one, you may lose some performance and functionality relative to regular NumPy arrays.
record array
You can use pd.DataFrame.to_records with index=False. Technically, this is a record array, but for many purposes this will be sufficient.
res1 = df.to_records(index=False)
print(repr(res1))
rec.array([(1, 2), (10, 20)],
dtype=[('a', '<i8'), ('b', '<i8')])
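For a self-contained version of the above, here is a minimal sketch. The sample df is an assumption, chosen to match the output shown, since the original frame is not defined in this answer.

```python
import pandas as pd

# Hypothetical sample frame (assumed; chosen to match the output above)
df = pd.DataFrame({"a": [1, 10], "b": [2, 20]})

# index=False leaves the DataFrame index out of the resulting fields
res1 = df.to_records(index=False)

print(res1.dtype.names)  # field names, taken from the column labels
print(res1.a)            # attribute access to a field
print(res1["b"])         # key access to a field also works
print(res1[0])           # first row as a single record
```

Each column becomes a named field, so a single row is one record rather than a 1-D slice of a homogeneous array.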
structured array
Manually, you can construct a structured array by converting each row to a tuple, then passing a list of (name, dtype) tuples as the dtype parameter.
s = df.dtypes
res2 = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))
print(repr(res2))
array([(1, 2), (10, 20)],
dtype=[('a', '<i8'), ('b', '<i8')])
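The manual construction can be sketched end to end as follows. The sample df is again an assumption, and the intermediate name field_spec is introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame (assumed; chosen to match the output above)
df = pd.DataFrame({"a": [1, 10], "b": [2, 20]})

# Pair each column name with its dtype: one (name, dtype) tuple per field
s = df.dtypes
field_spec = list(zip(s.index, s))

# Each row must be a tuple so NumPy treats it as one record
res2 = np.array([tuple(row) for row in df.values], dtype=field_spec)

print(res2.dtype.names)  # field names
print(res2["a"])         # fields are accessed by key, not by attribute
```

Note that df.values produces a single homogeneous array first, so with mixed column types the values pass through a common dtype before being cast into each field.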
What's the difference?
Very little. recarray is a subclass of ndarray, the regular NumPy array type. On the other hand, the structured array in the second example is of type ndarray.
type(res1) # numpy.recarray
isinstance(res1, np.ndarray) # True
type(res2) # numpy.ndarray
The main difference is that record arrays support attribute lookup of fields, while structured arrays raise AttributeError:
print(repr(res1.a))
array([ 1, 10], dtype=int64)
print(res2.a)
AttributeError: 'numpy.ndarray' object has no attribute 'a'
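If attribute access is wanted on a structured array, one option is to view it as a record array. The following is a minimal sketch; the literal array stands in for res2 and is an assumption made so the snippet runs on its own.

```python
import numpy as np

# Hypothetical stand-in for res2: a plain structured ndarray
res2 = np.array([(1, 2), (10, 20)], dtype=[("a", "<i8"), ("b", "<i8")])

# Key-based field access works on structured and record arrays alike
col_a = res2["a"]

# Viewing as np.recarray adds attribute access without copying the data
rec = res2.view(np.recarray)
print(rec.a)

# The view shares memory with the original array
print(np.shares_memory(rec, res2))
```

Key-based access (res2["a"]) is the portable form, since it behaves the same on both array types.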
Related: NumPy “record array” or “structured array” or “recarray”