I am writing a parquet file from a Spark DataFrame the following way:
df.write.parquet("path/myfile.parquet", mode="overwrite", compression="gzip")
This creates a folder with multiple part files in it.
When I try to read this into pandas, I get the following errors, depending on which engine I use:
import pandas as pd
df = pd.read_parquet("path/myfile.parquet", engine="pyarrow")
PyArrow:
File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
ArrowIOError: Invalid parquet file. Corrupt footer.
fastparquet:
File "C:\Program Files\Anaconda3\lib\site-packages\fastparquet\util.py", line 38, in default_open
    return open(f, mode)
PermissionError: [Errno 13] Permission denied: 'path/myfile.parquet'
I am using the following versions:
- Spark 2.4.0
- Pandas 0.23.4
- pyarrow 0.10.0
- fastparquet 0.2.1
I tried both gzip and snappy compression; neither works. I have of course made sure that the file is in a location where Python has read/write permissions.
It would already help if somebody were able to reproduce this error.