I want to store the following pandas data frame in a parquet file using PyArrow:
import pandas as pd
df = pd.DataFrame({'field': [[{}, {}]]})
The type of the field column is a list of dicts:
      field
0  [{}, {}]
I first define the corresponding PyArrow schema:
import pyarrow as pa
schema = pa.schema([pa.field('field', pa.list_(pa.struct([])))])
Then I use from_pandas():
table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
This throws the following exception:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "table.pxi", line 930, in pyarrow.lib.Table.from_pandas
  File "/anaconda3/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 371, in dataframe_to_arrays
    convert_types)]
  File "/anaconda3/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 370, in <listcomp>
    for c, t in zip(columns_to_convert,
  File "/anaconda3/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 366, in convert_column
    return pa.array(col, from_pandas=True, type=ty)
  File "array.pxi", line 177, in pyarrow.lib.array
  File "error.pxi", line 77, in pyarrow.lib.check_status
  File "error.pxi", line 87, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Unknown list item type: struct<>
Am I doing something wrong or is this not supported by PyArrow?
I am using pyarrow 0.9.0, pandas 0.23.4, and Python 3.6.