Decompression 'SNAPPY' not available with fastparquet

I am trying to use fastparquet to open a file, but I get the error:

RuntimeError: Decompression 'SNAPPY' not available.  Options: ['GZIP', 'UNCOMPRESSED']

I have the following installed and have restarted my interpreter:

python                    3.6.5                hc3d631a_2  
python-snappy             0.5.2                    py36_0    conda-forge
snappy                    1.1.7                hbae5bb6_3  
fastparquet               0.1.5                    py36_0    conda-forge

Everything installed without errors. I didn't know whether I needed snappy or python-snappy, so I installed one, and when that didn't fix it I installed the other, still with no success. Every related issue I have found was fixed by installing snappy, but I still get this error with both packages installed! Any help would be appreciated.

Brooke answered 11/6, 2018 at 15:1 Comment(5)
Any update on this? - Florella
I ended up using pyspark to read my files because I never got a response. I am unsure how to fix this, but my project has since moved forward. - Brooke
Didn't work for me either, even with pyspark installed as suggested by @Catbuilts. I worked around the issue by saving the Parquet file with GZIP compression, then switching to the pyarrow engine, which was far faster. - Severally
conda install -c conda-forge python-snappy fastparquet snappy worked for me. Installing those from the conda base channel did not work for some reason. - Hallowell
Hi, how did you set up pyspark to get around this problem? I got the same error when using pandas. - Cohn

Run:

pip install python-snappy
pip install pyarrow 

That should do the trick.

I think you are missing the pyarrow package.

If pip fails, use conda instead: conda install python-snappy, or, if that still fails, conda install -c conda-forge python-snappy.
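
Before reinstalling anything, it can help to confirm which of these packages the running interpreter can actually see, since pip and conda may install into different environments. The following is a minimal sketch using only the standard library. The helper name locate_modules is made up for illustration, and the list of module names is an assumption based on the packages mentioned in this thread.

```python
import importlib.util
import sys


def locate_modules(names):
    """Map each top-level module name to the file it would load from, or None if not found."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        found[name] = spec.origin if spec is not None else None
    return found


if __name__ == "__main__":
    # Shows which Python is running, so it can be compared with the one pip/conda targets.
    print("interpreter:", sys.executable)
    for name, origin in locate_modules(["snappy", "fastparquet", "pyarrow"]).items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

If snappy is reported as not found even though it was installed, the install most likely went into a different environment than the one running the code.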

Felicidadfelicie answered 25/3, 2019 at 9:12 Comment(2)
Installing pyarrow is irrelevant. conda install -c conda-forge python-snappy fastparquet snappy worked for me. Installing those from the base channel did not work for some reason. - Hallowell
^ This is the solution here; you need both python-snappy (the wrapper) and snappy (the C lib) from the same channel. - Polymath

You need to install python-snappy, as stated in Catbuilts's answer. However, python-snappy is only a wrapper around the C implementation of snappy, which must also be installed on your machine. This issue has been addressed in this answer about installing snappy-c.

On a Debian-based system such as Ubuntu, you can install both with:

sudo apt-get install libsnappy-dev
python3 -m pip install --user python-snappy

To test it, you can try the following script:

import pandas as pd
import snappy  # not used directly; confirms the python-snappy module is importable
from fastparquet import write, ParquetFile

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})
# print(df.head())  # inspect the initial values
write("/tmp/deleteme", df, compression="SNAPPY")  # fails here if SNAPPY is unavailable
df_parquet = ParquetFile("/tmp/deleteme").to_pandas()
print(df_parquet.head())  # should match df
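
To isolate whether python-snappy itself works, independently of fastparquet and pandas, a direct round trip through the module can be tried first. This is a sketch and not part of the original answer; the helper name snappy_roundtrip_ok is hypothetical. It relies on python-snappy's compress and decompress functions.

```python
def snappy_roundtrip_ok(payload=b"parquet" * 100):
    """Return True if snappy round-trips payload, False if it fails, None if not importable."""
    try:
        import snappy
    except ImportError:
        # The module cannot be imported at all in this interpreter.
        return None
    try:
        return snappy.decompress(snappy.compress(payload)) == payload
    except Exception:
        # The module was imported but is unusable, e.g. missing functions or a broken C library.
        return False


if __name__ == "__main__":
    print("snappy round trip:", snappy_roundtrip_ok())
```

A result of None points at the Python wrapper not being installed for this interpreter, while False suggests the module is present but not working, for example because the underlying C library is missing or mismatched.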
Tade answered 17/1, 2020 at 14:45 Comment(0)

Installing the following packages helped:

pip install fastparquet

pip install python-snappy

pip install pyarrow
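
If SNAPPY is still unavailable after installing, one workaround mentioned in the comments on the question is to write files with GZIP instead, which the error message lists as an available option. The sketch below picks a codec name based on whether the snappy module can be found. The helper pick_compression is hypothetical, and finding the module is only an approximation of what fastparquet will accept. Note that this only helps when writing new files; a file already compressed with SNAPPY still needs snappy to be read.

```python
import importlib.util


def pick_compression(is_available=None):
    """Return "SNAPPY" if the snappy module can be found, otherwise "GZIP"."""
    if is_available is None:
        # Default check: can the current interpreter locate a module named "snappy"?
        is_available = lambda name: importlib.util.find_spec(name) is not None
    return "SNAPPY" if is_available("snappy") else "GZIP"


if __name__ == "__main__":
    codec = pick_compression()
    print("writing with compression =", codec)
    # Hypothetical usage with fastparquet, as in the test script earlier in this thread:
    # write("/tmp/deleteme", df, compression=codec)
```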
Longish answered 21/4, 2021 at 20:1 Comment(0)
