I have a DataFrame with a structure like this:
Column1 Column2
0 (0.00030271668219938874, 0.0002655923890415579... (0.0016430083196610212, 0.0014970217598602176,...
1 (0.00015607803652528673, 0.0001314736582571640... (0.0022136708721518517, 0.0014974646037444472,...
2 (0.011317798867821693, 0.011339936405420303, 0... (0.004868391435593367, 0.004406007472425699, 0...
3 (3.94578673876822e-05, 3.075833956245333e-05, ... (0.0075020878575742245, 0.0096737677231431, 0....
4 (0.0004926157998852432, 0.0003811710048466921,... (0.010351942852139473, 0.008231297135353088, 0...
.. ... ...
130 (0.011190211400389671, 0.011337820440530777, 0... (0.010182800702750683, 0.011351295746862888, 0...
131 (0.006286659277975559, 0.007315031252801418, 0... (0.02104150503873825, 0.02531484328210354, 0.0...
132 (0.0022791570518165827, 0.0025983047671616077,... (0.008847278542816639, 0.009222050197422504, 0...
133 (0.0007059817435219884, 0.0009831463685259223,... (0.0028264704160392284, 0.0029402063228189945,...
134 (0.0018992726691067219, 0.002058899961411953, ... (0.0019639385864138603, 0.002009353833273053, ...
[135 rows x 2 columns]
where each cell holds a tuple of float values:
type(psd_res.data_frame['Column1'][0])
<class 'tuple'>
type(psd_res.data_frame['Column1'][0][0])
<class 'numpy.float64'>
(Every tuple contains the same number of entries.)
When I try to save the DataFrame as Parquet, I get an error with the fastparquet engine:
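For anyone who wants to reproduce the structure, here is a minimal sketch with stand-in data. The random values, the tuple length of 8, and the names df and random_tuple are assumptions for illustration only; my real data comes from psd_res.data_frame and the tuples are longer.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): 135 rows, each cell a tuple of 8 numpy.float64 values.
rng = np.random.default_rng(0)
n_rows, n_values = 135, 8


def random_tuple():
    # Iterating a NumPy float array yields numpy.float64 scalars.
    return tuple(rng.random(n_values))


df = pd.DataFrame({
    "Column1": [random_tuple() for _ in range(n_rows)],
    "Column2": [random_tuple() for _ in range(n_rows)],
})

print(df.dtypes)                  # both columns are stored with dtype object
print(type(df["Column1"][0]))     # the cell itself is a tuple
print(type(df["Column1"][0][0]))  # each element is a numpy.float64
```

Calling df.to_parquet(...) on a frame built like this should hit the same code path as my real data, although I have only seen the errors on the real frame.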
Can't infer object conversion type: 0 (0.00030271668219938874, 0.0002655923890415579...
1 (0.00015607803652528673, 0.0001314736582571640...
...
Name: Column1, dtype: object
Full stack trace: https://pastebin.com/8Myu8hNV
I also tried the other engine, pyarrow:
pyarrow.lib.ArrowInvalid: ('Could not convert (0.00030271668219938874, ..., 0.0002464042045176029)
with type tuple: did not recognize Python value type when inferring an Arrow data type',
'Conversion failed for column UO-Pumpe with type object')
I then found this thread: https://github.com/dask/fastparquet/issues/458. It seems to be a bug in fastparquet, but saving should work with pyarrow, which also fails for me.
I then tried a few things I found, such as infer_objects() and astype(float), but nothing has worked so far.
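Roughly what those two attempts look like, sketched on a tiny stand-in frame (the two-row df, inferred, and astype_error are hypothetical names for illustration, not my real data):

```python
import pandas as pd

# Tiny stand-in frame (assumption): one object column whose cells are tuples.
df = pd.DataFrame({"Column1": [(0.1, 0.2), (0.3, 0.4)]})

# Attempt 1: infer_objects() does not unpack tuples, so the dtype stays object.
inferred = df.infer_objects()
print(inferred.dtypes)

# Attempt 2: astype(float) cannot turn a whole tuple into a single float.
try:
    df.astype(float)
    astype_error = None
except (ValueError, TypeError) as exc:
    astype_error = exc
print(repr(astype_error))
```

Neither call changes what is stored in the cells, so the column is still an object column of tuples when it reaches the Parquet writer.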
Does anyone know how I can save my DataFrame to Parquet?