I usually use zstandard as the compression algorithm for my dataframes.
This is the code I use (a bit simplified) to write those parquet files:
import pandas as pd
import pyarrow.parquet as pq
import pyarrow as pa
parquetFilename = "test.parquet"
df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    },
    index=["falcon", "dog", "spider", "fish"],
)

# Convert the DataFrame to an Arrow table and write it with zstandard compression
df = pa.Table.from_pandas(df)
pq.write_table(df, parquetFilename, compression="zstd")
And to read these parquet files:
import pandas as pd
import pyarrow.parquet as pq
import pyarrow as pa
parquetFilename = "test.parquet"
df = pq.read_table(parquetFilename)  # read back into a pyarrow Table
df = df.to_pandas()  # convert to a pandas DataFrame
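If you don't need pyarrow's extra options, note that pandas can also round-trip zstd-compressed parquet files directly. This shorthand is my own, not part of the snippets above, and it assumes pyarrow is installed and used as the engine:

import pandas as pd

df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    },
    index=["falcon", "dog", "spider", "fish"],
)

# pandas forwards the compression argument to pyarrow, so "zstd" works here too
df.to_parquet("test.parquet", engine="pyarrow", compression="zstd")
df_roundtrip = pd.read_parquet("test.parquet", engine="pyarrow")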
For more details, see these sites:
Finally, a shameless plug for a blog post I wrote. It is about the speed vs. space trade-off of zstandard and snappy compression in parquet files using pyarrow. It is relevant to your question and includes some more "real world" code examples of reading and writing parquet files with zstandard. I will actually be writing a follow-up soon too, so if you're interested let me know.
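If you want a rough feel for the space side of that trade-off before reading the post, one quick (and admittedly crude) check is to write the same table with both codecs and compare the resulting file sizes. The table contents and file names below are just placeholders of mine, not taken from the post:

import os
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Build a small throwaway table (any of your own dataframes would do)
df = pd.DataFrame({"value": list(range(100_000))})
table = pa.Table.from_pandas(df)

# Write the same data once per codec, then compare file sizes on disk
pq.write_table(table, "test_zstd.parquet", compression="zstd")
pq.write_table(table, "test_snappy.parquet", compression="snappy")

print("zstd:  ", os.path.getsize("test_zstd.parquet"), "bytes")
print("snappy:", os.path.getsize("test_snappy.parquet"), "bytes")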