I am getting the error below when calling sink_parquet on a LazyFrame. Previously I called .collect() on the output of scan_parquet() to convert the result into a DataFrame, but that does not work with larger-than-RAM datasets. Here is the error I received:
PanicException: sink_parquet not yet supported in standard engine. Use 'collect().write_parquet()'
I am trying to write the LazyFrame (the output of scan_parquet) to a local file after applying some filter and join operations to it. The error seems to come from the following locations:
https://github.com/pola-rs/polars/blob/master/py-polars/polars/internals/lazyframe/frame.py#L1235 (In Python)
https://github.com/pola-rs/polars/blob/master/polars/polars-lazy/src/physical_plan/planner/lp.rs#L154 (In Rust)
I have tried updating to the latest versions (0.15.16 and 0.16.1), but the issue still exists.
Sample code:
    (
        pl.scan_parquet("path/to/file1.parquet")
        .select([
            pl.col("col2"),
            pl.col("col2").apply(lambda x: ...).alias("splited_levels"),
            # ... followed by more columns and .alias()
        ])
        .join(<another lazyframe>, on="some key", how="inner")
        .filter(...)
        .filter(...)
        # ... followed by some more filters
        .sink_parquet("path/to/result2.parquet")
    )
The Parquet file should be written to the local file system. Instead, I get the error below:
PanicException: sink_parquet not yet supported in standard engine. Use 'collect().write_parquet()'
Here is the output of polars.show_versions():
--- Version info----
Polars : 0.15.16
Index type : UInt32
Platform : Linux-4.15.0-191-generic-x86_64-with-glibc2.28
Python: 3.9.16
[GCC 8.3.0]
--- Optional dependencies---
pyarrow : 11.0.0
pandas : not installed
numpy : 1.24.1
fsspec : 2023.1.0
connectorx : not installed
xlsx2csv : not installed
deltalake: not installed
matplotlib : not installed
Update: I have raised a GitHub issue for this, and it seems that not all types of queries are supported for streaming at the moment. So I am looking for a workaround or an alternative way of doing this with Polars: https://github.com/pola-rs/polars/issues/6603