It would help if you gave a concrete example, but I'll demonstrate with @jpp's list:
In [509]: L = [[0.5, True, 'hello'], [1.25, False, 'test']]
In [510]: df = pd.DataFrame(L)
In [511]: df
      0      1      2
0  0.50   True  hello
1  1.25  False   test
In [512]: df.dtypes
0    float64
1       bool
2     object
dtype: object
pandas doesn't like to use string dtypes, so the last column is object.
In [513]: arr = df.values
In [514]: arr
array([[0.5, True, 'hello'],
       [1.25, False, 'test']], dtype=object)
So because of the mix in column dtypes, pandas is making the whole thing object. I don't know pandas well enough to know if you can control the dtype better.
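For what it's worth, one pandas route that seems to keep the per-column dtypes is DataFrame.to_records. The following is only a sketch, assuming a reasonably recent pandas; the variable name rec is mine, and the string column still comes out as object rather than a numpy string dtype.

```python
import numpy as np
import pandas as pd

L = [[0.5, True, 'hello'], [1.25, False, 'test']]
df = pd.DataFrame(L)

# index=False keeps the row index out of the result.
# The integer column labels 0, 1, 2 become the field names '0', '1', '2'.
rec = df.to_records(index=False)

print(rec.dtype)   # one dtype per field instead of a single object dtype
print(rec['0'])    # the float column, selected by field name
```

The result is a 1d record array, so the float and bool columns keep their own dtypes instead of being promoted to object.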
To make a numpy structured array from L, the obvious thing to do is:
In [515]: np.array([tuple(row) for row in L], dtype='f,bool,U10')
array([(0.5 , True, 'hello'), (1.25, False, 'test')],
      dtype=[('f0', '<f4'), ('f1', '?'), ('f2', '<U10')])
That answers the question of how to specify a different dtype per 'column'. But keep in mind that this array is 1d, and has fields, not columns.
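To make the fields-versus-columns distinction concrete, here is a small sketch that rebuilds the same structured array and inspects it. The name arr is just for illustration.

```python
import numpy as np

L = [[0.5, True, 'hello'], [1.25, False, 'test']]

# One tuple per record, one dtype per field, as in the session above.
arr = np.array([tuple(row) for row in L], dtype='f,bool,U10')

print(arr.shape)        # (2,)
print(arr.dtype.names)  # ('f0', 'f1', 'f2')

# A field is selected by name; there is no second axis to index.
print(arr['f2'])        # ['hello' 'test']

# Indexing with an integer returns a whole record.
print(arr[0])
```

Something like arr[:, 2] raises an IndexError here, because the array has only one dimension.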
But whether it's possible to deduce or set the dtype automatically, that's trickier. It might be possible to build a recarray from the columns, or use one of the functions in np.lib.recfunctions.
If I use a list 'transpose' I can format each column as a separate numpy array.
In [537]: [np.array(col) for col in zip(*L)]
[array([0.5 , 1.25]),
 array([ True, False]),
 array(['hello', 'test'], dtype='<U5')]
Then join them into one array with np.rec.fromarrays:
In [538]: np.rec.fromarrays([np.array(col) for col in zip(*L)])
rec.array([(0.5 , True, 'hello'), (1.25, False, 'test')],
          dtype=[('f0', '<f8'), ('f1', '?'), ('f2', '<U5')])
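If the default f0, f1, f2 names are not wanted, fromarrays also takes a names argument. A minimal sketch follows; the field names x, flag and label are made up for this example.

```python
import numpy as np

L = [[0.5, True, 'hello'], [1.25, False, 'test']]

# Transpose the list so each column becomes its own array;
# numpy picks a dtype for each column independently.
cols = [np.array(col) for col in zip(*L)]

# 'x', 'flag', 'label' are illustrative names, not anything from the question.
rec = np.rec.fromarrays(cols, names='x,flag,label')

print(rec.dtype)
print(rec.x)       # a recarray also allows attribute access to fields
print(rec['label'])
```

The per-field dtypes are still the ones numpy deduced for each column, so nothing has to be spelled out by hand.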
Or I could use genfromtxt to deduce fields from a csv:
In [526]: np.savetxt('test.txt', np.array(L,object),delimiter=',',fmt='%s')
In [527]: cat test.txt
In [529]: data = np.genfromtxt('test.txt',dtype=None,delimiter=',',encoding=None)
In [530]: data
array([(0.5 , True, 'hello'), (1.25, False, 'test')],
      dtype=[('f0', '<f8'), ('f1', '?'), ('f2', '<U5')])
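The round trip through a file on disk is not essential. As a sketch, the same deduction should work on an in-memory buffer; the text below is my assumption of what the comma-separated data looks like, not a copy of the actual file.

```python
import io
import numpy as np

# Assumed csv content: one record per line, comma separated.
text = "0.5,True,hello\n1.25,False,test\n"

# dtype=None asks genfromtxt to deduce a dtype for each column;
# encoding=None makes it read the strings as str rather than bytes.
data = np.genfromtxt(io.StringIO(text), dtype=None, delimiter=',', encoding=None)

print(data.dtype)
print(data['f1'])
```

This avoids the temporary file, at the cost of formatting the rows as text first.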
is the ability to have mixed data types in a single data structure. This functionality is not provided by NumPy. If you do df.values, the dtype you will get is one that can hold any of the values, which in this case would be np.object, for all the values. – Mindoro