How to append data to existing Parquet from Polars

I have multiple Polars dataframes and I want to append them to an existing Parquet file.

df.write_parquet("path.parquet") overwrites the existing parquet file. How can I append?

Issi asked 25/12, 2022 at 20:10

Polars does not support appending to Parquet files, and neither do most other tools; see for example this SO post.

Your best bet would be to convert the dataframe to an Arrow table using .to_arrow() and write it with pyarrow.dataset.write_dataset. In particular, see the documentation for the existing_data_behavior parameter. Still, that requires organizing your data in partitions, which effectively means you have a separate Parquet file per partition, all stored in the same directory. So each df you have becomes its own Parquet file, and you abstract away from that on the read.

As far as I'm aware, Polars does not support writing partitions. There is support for reading them, though: see the source argument of pl.read_parquet.

Georgia answered 26/12, 2022 at 10:30 Comment(1)
Thank you! I will think about it. So, simply put, if Polars does not support it, I can convert the df to pandas and use HDF5 as usual. - Issi
