I have a <class 'numpy.ndarray'> that I would like to save to a Parquet file so I can pass it to an ML model I'm building. The array is 2-dimensional, with shape (159573, 1395): 159573 rows of 1395 values each.
Here is a sample of my data:
[[0. 0. 0. ... 0.24093714 0.75547471 0.74532781]
[0. 0. 0. ... 0.24093714 0.75547471 0.74532781]
[0. 0. 0. ... 0.24093714 0.75547471 0.74532781]
...
[0. 0. 0. ... 0.89473684 0.29282009 0.29277004]
[0. 0. 0. ... 0.89473684 0.29282009 0.29277004]
[0. 0. 0. ... 0.89473684 0.29282009 0.29277004]]
I tried to convert it with this code:
import pyarrow as pa
import pyarrow.parquet as pq
pa_table = pa.table({"data": Main_x})
pq.write_table(pa_table, "full_data.parquet")
I get this traceback:
/usr/local/lib/python3.7/dist-packages/pyarrow/table.pxi in pyarrow.lib.table()
/usr/local/lib/python3.7/dist-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pydict()
/usr/local/lib/python3.7/dist-packages/pyarrow/array.pxi in pyarrow.lib.asarray()
/usr/local/lib/python3.7/dist-packages/pyarrow/array.pxi in pyarrow.lib.array()
/usr/local/lib/python3.7/dist-packages/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()
/usr/local/lib/python3.7/dist-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: only handle 1-dimensional arrays
Is there a way to save a multi-dimensional array to a Parquet file?
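For context, one workaround I've been considering is to go through pandas, since a DataFrame maps each of the 1395 columns to its own Parquet column. A minimal sketch of the idea (the random array below is just a stand-in for my real data, and this assumes the array fits in memory):

import numpy as np
import pandas as pd

# Stand-in for my real (159573, 1395) array.
Main_x = np.random.rand(1000, 1395)

# One DataFrame column per feature; Parquet requires string column names.
df = pd.DataFrame(Main_x)
df.columns = df.columns.astype(str)

# df.to_parquet needs pyarrow (or fastparquet) installed.
df.to_parquet("full_data.parquet")

I'm not sure whether 1395 separate columns is a sensible Parquet layout, though, or whether there is a more direct pyarrow route (e.g. a list column per row).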