Can not save pandas dataframe to parquet with lists of floats as cell value

I have an dataframe with a structure like this:

                                                Coumn1                                             Coumn2
0    (0.00030271668219938874, 0.0002655923890415579...  (0.0016430083196610212, 0.0014970217598602176,...
1    (0.00015607803652528673, 0.0001314736582571640...  (0.0022136708721518517, 0.0014974646037444472,...
2    (0.011317798867821693, 0.011339936405420303, 0...  (0.004868391435593367, 0.004406007472425699, 0...
3    (3.94578673876822e-05, 3.075833956245333e-05, ...  (0.0075020878575742245, 0.0096737677231431, 0....
4    (0.0004926157998852432, 0.0003811710048466921,...  (0.010351942852139473, 0.008231297135353088, 0...
..                                                 ...                                                ...
130  (0.011190211400389671, 0.011337820440530777, 0...  (0.010182800702750683, 0.011351295746862888, 0...
131  (0.006286659277975559, 0.007315031252801418, 0...  (0.02104150503873825, 0.02531484328210354, 0.0...
132  (0.0022791570518165827, 0.0025983047671616077,...  (0.008847278542816639, 0.009222050197422504, 0...
133  (0.0007059817435219884, 0.0009831463685259223,...  (0.0028264704160392284, 0.0029402063228189945,...
134  (0.0018992726691067219, 0.002058899961411953, ...  (0.0019639385864138603, 0.002009353833273053, ...

[135 rows x 2 columns]

where each cell holds a list/tuple of some float values:

type(psd_res.data_frame['Column1'][0])
<class 'tuple'>
type(psd_res.data_frame['Column1'][0][0])
<class 'numpy.float64'>

(each cell entry contains the same amount of entries in the tuple)

when i try to save the dataframe now as parquet i get an error (fastparquet):

Can't infer object conversion type: 0    (0.00030271668219938874, 0.0002655923890415579...
1    (0.00015607803652528673, 0.0001314736582571640...
...

Name: Column1, dtype: object

Full stack trace: https://pastebin.com/8Myu8hNV

and i also tried it with the other engine pyarrow:

pyarrow.lib.ArrowInvalid: ('Could not convert (0.00030271668219938874, ..., 0.0002464042045176029)
  with type tuple: did not recognize Python value type when inferring an Arrow data type', 
  'Conversion failed for column UO-Pumpe with type object')

So i found this thread https://github.com/dask/fastparquet/issues/458. It seems to be a bug in fastparquet - but it should work in pyarrow which fails for me.

I then tried some things i found like infer_objects() and astype(float) ... nothing worked so far.

Does anyone have a solution how i can save my dataframe to parquet?

Recommended topics

Hot tags