Python Polars Read Zipped CSV
S

2

6

How does one read a zipped csv file into a python polars DataFrame?

The only solution I have found so far is to read the entire file into memory and then pass it to pl.read_csv.

Solano answered 26/3, 2022 at 4:3 Comment(2)
I think you could use a reader, i.e. something like this? with file.open(csv_path) as fpz: with gzip.open(fpz) as fp: df = pl.read_csv(fp) Mamba
Apologies, it's zipped, not gzipped. Solano
V
6

Read a zipped CSV file into a Polars DataFrame without extracting the file

From the documentation:

Path to a file or a file-like object. By file-like object, we refer to objects with a read() method, such as a file handler (e.g. via builtin open function) or StringIO or BytesIO. If fsspec is installed, it will be used to open remote files.

So, to read "my_file.csv" that is inside a "something.zip":

    /something.zip

        /my_file.csv
import polars as pl
from zipfile import ZipFile


zip_file = "something.zip"

# ZipFile.read returns the member's contents as bytes, which pl.read_csv accepts
df = pl.read_csv(
    ZipFile(zip_file).read("my_file.csv")
)

Here, using .open instead of .read throws a FileNotFound error. However, open can still be used: we just need to call .read() on the object it returns, as follows:

# ZipFile.open returns a file-like ZipExtFile; calling .read() on it yields the bytes
df = pl.read_csv(
    ZipFile("something.zip").open("my_file.csv", mode="r").read()
)

The difference lies in what read and open return. read returns the "file bytes for name", that is, the member's contents as a bytes object with the reading already done. open returns a "file-like object for 'name'", an instance of the class ZipExtFile. That object does have a .read() method, but nothing has been read yet when open returns, so to get the bytes we have to call .read() ourselves, as I do above.
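To make the read vs open distinction concrete, here is a minimal sketch that uses only the standard library. The archive is built in memory and its contents are made up for illustration; it does not call Polars at all, it only shows what each method hands back.

```python
import io
import zipfile

# Build a small in-memory archive so the example is self-contained.
# The member name and CSV contents are invented for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("my_file.csv", "a,b\n1,2\n3,4\n")

with zipfile.ZipFile(buffer) as zf:
    # read() returns the member's contents directly as bytes.
    raw = zf.read("my_file.csv")

    # open() returns a file-like ZipExtFile; nothing is read until .read() is called.
    with zf.open("my_file.csv") as member:
        member_type = type(member).__name__
        streamed = member.read()

print(type(raw).__name__, member_type, raw == streamed)
# prints: bytes ZipExtFile True
```

Either raw or streamed is the kind of bytes object passed to pl.read_csv in the snippets above. Using the archive as a context manager, as in this sketch, also makes sure it is closed afterwards.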

Vocalist answered 16/11, 2022 at 11:56 Comment(0)
G
2

As of October 2024, I had no issues reading a GZIP file directly.

import polars as pl

pl.read_csv('data.csv.gz')

Note that I didn't try zip, just gzip.
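If passing the .gz path directly does not work in your setup, one possible fallback is to decompress with the standard-library gzip module and hand the result to Polars as a file-like object. This is only a sketch under assumptions: the data is made up and compressed in memory, and the Polars import is guarded so the snippet still runs where Polars is not installed.

```python
import gzip
import io

# Made-up CSV data, compressed in memory to stand in for a real data.csv.gz.
csv_text = "a,b\n1,2\n"
compressed = gzip.compress(csv_text.encode("utf-8"))

# Decompress back to the original CSV bytes.
decompressed = gzip.decompress(compressed)

try:
    import polars as pl
except ImportError:
    pl = None  # Polars is optional for this sketch

if pl is not None:
    # BytesIO gives read_csv a file-like object with a read() method.
    df = pl.read_csv(io.BytesIO(decompressed))
    print(df)
```

For a file on disk, gzip.open("data.csv.gz", "rb").read() would produce the same kind of bytes object as decompressed here.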

Grishilda answered 1/10, 2024 at 2:49 Comment(0)
