How does one read a zipped CSV file into a Python Polars DataFrame?
The only current solution is reading the entire file into memory and then passing it into pl.read_csv.
From the documentation:
Path to a file or a file-like object. By file-like object, we refer to objects with a read() method, such as a file handler (e.g. via builtin open function) or StringIO or BytesIO. If fsspec is installed, it will be used to open remote files.
So, to read "my_file.csv" that is inside a "something.zip":
/something.zip
/my_file.csv
import polars as pl
from zipfile import ZipFile

zip_file = "something.zip"
# ZipFile.read returns the member's contents as bytes, which pl.read_csv accepts.
pl.read_csv(
    ZipFile(zip_file).read("my_file.csv")
)
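If the name of the CSV inside the archive is not known in advance, it can be looked up with the standard library before anything is handed to Polars. The following is only a sketch: read_csv_member is a hypothetical helper (not part of zipfile or Polars), and the archive is built in memory so that the example runs without files on disk.

```python
import io
import zipfile


def read_csv_member(archive, member=None):
    """Return the raw bytes of one CSV member of a zip archive.

    `archive` may be a path or a file-like object. If `member` is None,
    the first name ending in ".csv" is used (an assumption for this sketch).
    """
    with zipfile.ZipFile(archive) as zf:
        if member is None:
            csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise FileNotFoundError("no .csv member in archive")
            member = csv_names[0]
        return zf.read(member)


# Build a small in-memory archive so the sketch needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("my_file.csv", "a,b\n1,2\n3,4\n")
buffer.seek(0)

csv_bytes = read_csv_member(buffer)
print(csv_bytes)
```

The returned bytes are the same kind of object that ZipFile(...).read("my_file.csv") produces, so they can be passed to pl.read_csv in the same way.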
Here, the use of .open instead of .read throws a FileNotFound error.
However, it is still possible to use open; we just need to call .read() on the result, as follows:
pl.read_csv(
    ZipFile("something.zip").open("my_file.csv", mode='r').read()
)
The difference lies in what read and open return. read returns the "file bytes for name", that is, the member's contents already read into a bytes object. open returns a "file-like object for 'name'", an instance of the class ZipExtFile, which has a .read() method that has not yet been called. To get the bytes, we have to call it ourselves, as I do above.
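The distinction can be checked with the standard library alone. This is a minimal sketch that uses an in-memory archive (an assumption made so it runs anywhere) and compares what the two calls return:

```python
import io
import zipfile

# Create a tiny archive in memory containing one CSV member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("my_file.csv", "a,b\n1,2\n")

with zipfile.ZipFile(buffer) as zf:
    # .read() returns the member's contents directly as bytes.
    raw = zf.read("my_file.csv")
    # .open() returns a file-like object; nothing is read until .read() is called.
    with zf.open("my_file.csv") as member:
        member_type = type(member).__name__
        streamed = member.read()

print(type(raw).__name__)  # bytes
print(member_type)         # ZipExtFile
print(raw == streamed)     # True
```

Both routes end with the same bytes; open only defers the reading step.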
As of October 2024, I had no issues reading a GZIP file directly.
import polars as pl
pl.read_csv('data.csv.gz')
Note that I didn't try zip, just gzip.
with file.open(csv_path) as fpz:
    with gzip.open(fpz) as fp:
        df = pl.read_csv(fp)
– Mamba