Polars for Python: How to get rid of "Ensure you pass a path to the file instead of a python file object" warning when reading into a DataFrame?

The statement

  • I'm reading datasets with the polars.read_csv() method through a Python file handle:
import gzip, os
import polars as pl
with gzip.open(os.path.join(getParameters()['rdir'], dataset)) as compressed_file:
    df = pl.read_csv(compressed_file, separator='\t', ignore_errors=True)  # older Polars: sep='\t'
  • A performance warning keeps popping up:
Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance.

Possible solutions

  • I already tried suppressing it with Python's warnings module, but it seems Polars just prints the message instead of raising an actual warning.
  • Could I instead read the file some way that doesn't involve a file handle?

Any ideas on how to get rid of this annoying message would be highly appreciated.
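The second idea, handing read_csv a path instead of a file handle, could look roughly like this: stream-decompress the .gz file into a temporary file, then pass that path, as the message itself recommends. This is a minimal sketch, not a tested recipe: `decompress_to_temp` is a hypothetical helper, it trades memory for extra disk space and I/O, and the commented Polars call assumes a recent version where the keyword is `separator` (older releases used `sep`).

```python
import gzip
import os
import shutil
import tempfile


def decompress_to_temp(gz_path, suffix=".tsv"):
    """Decompress gz_path into a new temporary file and return that file's path.

    shutil.copyfileobj copies in fixed-size chunks, so the decompressed data is
    never held in memory all at once. The caller must delete the returned file.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as dst, gzip.open(gz_path, "rb") as src:
        shutil.copyfileobj(src, dst)
    return tmp_path


# Hypothetical usage with the question's own names (getParameters, dataset):
# import polars as pl
# tmp_path = decompress_to_temp(os.path.join(getParameters()['rdir'], dataset))
# try:
#     df = pl.read_csv(tmp_path, separator='\t', ignore_errors=True)
# finally:
#     os.remove(tmp_path)
```

Wrapping the read in try/finally makes sure the temporary file is removed even if parsing fails.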

Tullusus asked 9/3, 2023 at 22:52 (7 comments)
Did you try passing os.path.join(getParameters()['rdir'], dataset) directly to read_csv? – Shiv
Warnings and errors are different classes in Python and are suppressed differently. How did you "[try] Python warning suppression"? – Shiv
@PaulH Yes, I already tried. The problem is that I need to use gzip.open(), and I believe that is what triggers the warning, since read_csv then receives a file object rather than a path. To answer your second question, I used warnings.filterwarnings('ignore'). – Tullusus
Did you try doing what the warning said? – Mikemikel
When you passed the file name directly, what happened? – Shiv
@Mikemikel I can't with the current approach, since I need to use gzip.open(); otherwise I would have to decompress the file first. – Tullusus
@PaulH I get the same warning, because I still pass the path through gzip.open() before handing it to read_csv. – Tullusus
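The distinction Shiv raises can be seen without Polars at all: a `warnings.filterwarnings('ignore')` filter only affects messages emitted through `warnings.warn`, not text a library writes straight to stderr. In this sketch, `printing_reader` and `warning_reader` are hypothetical stand-ins for the two behaviours; which one a given Polars version uses is an assumption to check against that version.

```python
import contextlib
import io
import sys
import warnings

MESSAGE = ("Polars found a filename. Ensure you pass a path to the file instead of "
           "a python file object when possible for best performance.")


def printing_reader():
    """Hypothetical stand-in for a library that simply prints the message."""
    print(MESSAGE, file=sys.stderr)


def warning_reader():
    """Hypothetical stand-in for a library that emits a real UserWarning."""
    warnings.warn(MESSAGE, UserWarning)


def stderr_with_warnings_ignored(func):
    """Call func with all warnings ignored and return whatever it wrote to stderr."""
    buffer = io.StringIO()
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore")
        with contextlib.redirect_stderr(buffer):
            func()
    return buffer.getvalue()


print(repr(stderr_with_warnings_ignored(warning_reader)))  # '' (the filter swallowed it)
print(MESSAGE in stderr_with_warnings_ignored(printing_reader))  # True (the filter has no effect)
```

In other words, a warnings filter can only silence the message in Polars versions that route it through the warnings module.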

I had a similar issue when opening a file from a ZipFile object. The solution was to call .read() on the file object and pass the resulting bytes to read_csv. Maybe the same would work in your case?

import gzip, os
import polars as pl
with gzip.open(os.path.join(getParameters()['rdir'], dataset)) as compressed_file:
    df = pl.read_csv(compressed_file.read(), separator='\t', ignore_errors=True)  # .read() returns the decompressed bytes
Symposium answered 10/3, 2023 at 0:51 (3 comments)
Fantastic, @RustyPython, this worked like a charm. I'm marking your answer as the accepted solution. – Tullusus
Note for future readers: this makes Python read the whole file into memory before handing it to Polars. I think this could run out of memory in situations where passing the IO object would not. – Seabrooke
This requires loading the entire file buffer into memory. Not only does this negate the advantage of lazy frames, it also ends up needing more memory than the loaded DataFrame alone, because the buffer ALSO exists in memory during the load. – Urina

I encountered the same issue. The warning is certainly annoying, but loading the entire file into memory just to hide a warning seemed like overkill.

My workaround was to delete the name attribute of the compressed file object. It worked like a charm and doesn't come with any performance penalty, provided, of course, that you don't need the name attribute later in your code.

In your example:

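import gzip, os
import polars as pl
with gzip.open(os.path.join(getParameters()['rdir'], dataset)) as compressed_file:
    del compressed_file.name  # the workaround: remove the file name attribute
    df = pl.read_csv(compressed_file, separator='\t', ignore_errors=True)  # older Polars: sep='\t'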
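A stdlib-only sketch of what the workaround does to the file object: after `del`, `hasattr(..., 'name')` is False while the stream still decompresses normally. It assumes, as this answer implies, that Polars only prints the message when the object it receives exposes a `name` attribute; it does not call Polars, and `read_without_name` and `example.tsv.gz` are hypothetical.

```python
import gzip
import os
import tempfile


def read_without_name(path):
    """Open a gzip file, delete its `name` attribute, then read it.

    Returns (had_name_before, has_name_after, decompressed_bytes).
    """
    with gzip.open(path) as compressed_file:
        had_name = hasattr(compressed_file, "name")  # GzipFile keeps the path here
        del compressed_file.name                     # the workaround from the answer
        has_name = hasattr(compressed_file, "name")
        data = compressed_file.read()                # decompression is unaffected
    return had_name, has_name, data


with tempfile.TemporaryDirectory() as tmp_dir:
    # A tiny gzip-compressed TSV so the sketch runs on its own (file name is arbitrary).
    path = os.path.join(tmp_dir, "example.tsv.gz")
    with gzip.open(path, "wb") as out:
        out.write(b"a\tb\n1\t2\n")
    result = read_without_name(path)

print(result)  # (True, False, b'a\tb\n1\t2\n')
```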
Herrington answered 15/4, 2024 at 20:54

In polars==1.2.1 you can simply suppress the warning with Python's warnings module:

import warnings
warnings.filterwarnings("ignore", message="Polars found a filename")
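If you'd rather not silence the message for the whole program, the same filter can be scoped with `warnings.catch_warnings()`, which restores the previous filter state when the block exits. A minimal sketch, assuming (as this answer reports for polars 1.2.1) that the text arrives as a regular warning; `noisy_read` is a hypothetical stand-in for the `pl.read_csv` call so the example runs without Polars.

```python
import warnings


def noisy_read():
    """Hypothetical stand-in for pl.read_csv(file_object, ...) emitting the message."""
    warnings.warn(
        "Polars found a filename. Ensure you pass a path to the file instead of "
        "a python file object when possible for best performance.",
        UserWarning,
    )
    return "dataframe"


def quiet_read():
    """Run noisy_read with only this specific message suppressed."""
    with warnings.catch_warnings():
        # `message` is a regular expression matched against the start of the text.
        warnings.filterwarnings("ignore", message="Polars found a filename")
        return noisy_read()
```

The Python documentation notes that `catch_warnings` modifies global state and is not thread-safe, so this scoping is best kept to single-threaded code.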
Goalkeeper answered 19/7, 2024 at 5:04

© 2022 - 2025 — McMap. All rights reserved.