Using Python, Parquet, and Spark, I am running into ArrowNotImplementedError: Support for codec 'snappy' not built after upgrading to pyarrow=3.0.0. My previous version without this error was pyarrow=0.17. The error does not appear with pyarrow=1.0.1 but does appear with pyarrow=2.0.0. The idea is to write a pandas DataFrame as a Parquet dataset (on Windows) using Snappy compression, and later to process the Parquet dataset using Spark.
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Small example frame with 'x' as the partition column
df = pd.DataFrame({
    'x': [0, 0, 0, 1, 1, 1],
    'a': np.random.random(6),
    'b': np.random.random(6)})

# Convert to an Arrow table and write a partitioned Parquet dataset
# (Snappy is the default compression codec)
table = pa.Table.from_pandas(df, preserve_index=False)
pq.write_to_dataset(table, root_path=r'c:/data', partition_cols=['x'], flavor='spark')
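The downstream Spark step is not shown in the question; a minimal sketch of what reading the dataset back with PySpark might look like (the local session setup and the reuse of the c:/data path are assumptions, not code from the original post):

from pyspark.sql import SparkSession

# Minimal local Spark session for reading the partitioned dataset
spark = SparkSession.builder.appName("read-parquet-dataset").getOrCreate()

# Spark discovers the x=0 / x=1 partition directories written above
sdf = spark.read.parquet(r'c:/data')
sdf.show()

Spark's partition discovery picks up the directories created by write_to_dataset and exposes x as a regular column.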
How did you install pyarrow? – Atony

pyarrow was installed via conda install pyarrow – Squaw

Can you add the output of conda list --export, print(pa.cpp_build_info), and pa.show_versions()? – Spearing

pyarrow is not from conda-forge. It shows up in conda list as pyarrow=3.0.0=pypi_0, which I thought meant it came from PyPI. However, your cpp_build_info does not match what comes from the PyPI distribution either (both conda-forge and PyPI use MSVC version 19.16.27045.0). Uninstall pyarrow and reinstall, ensuring you are installing from conda-forge: conda install -c conda-forge pyarrow – Spearing

The error goes away with conda install -c conda-forge pyarrow instead of conda install pyarrow. If you provide this as an answer I can accept. But why is it like this? In both cases it shows up as pyarrow=3.0.0, so this would not be the expected behavior. – Squaw

If pypi_0 just means "not conda", then it really could have come from anywhere. – Spearing