Does reading an entire file leave the file handle open?
401

If you read an entire file with content = open('Path/to/file', 'r').read(), is the file handle left open until the script exits? Is there a more concise way to read a whole file?

Newcastle answered 13/9, 2011 at 23:44 Comment(0)
624

The answer to that question depends somewhat on the particular Python implementation.

To understand what this is all about, pay particular attention to the actual file object. In your code, that object is mentioned only once, in an expression, and becomes inaccessible immediately after the read() call returns.

This means that the file object is garbage. The only remaining question is: when will the garbage collector collect the file object?

In CPython, which uses a reference counter, this kind of garbage is noticed immediately, and so it will be collected immediately. This is not generally true of other Python implementations.
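As a rough way to observe this, the sketch below tracks a file object with a weak reference and checks whether it is gone as soon as the last strong reference is dropped. The temporary file and the variable names are illustrative assumptions, and the result is expected to be True only on CPython; other implementations may report False.

```python
import gc
import os
import sys
import tempfile
import weakref

# Create a throwaway file to read (hypothetical contents).
fd, demo_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as tmp:
    tmp.write('some text')

f = open(demo_path, 'r')
tracker = weakref.ref(f)  # weak reference: does not keep the file object alive
content = f.read()
del f                     # drop the only strong reference, as open(...).read() does implicitly

# On CPython the reference count hits zero here, so the object is already gone.
collected_immediately = tracker() is None
print(sys.implementation.name, collected_immediately)

gc.collect()              # give other implementations a chance to finalize before cleanup
os.remove(demo_path)
```

This only shows when the object is reclaimed; the file is closed at that point because the file object's finalizer calls close().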

A better solution, to make sure that the file is closed, is this pattern:

with open('Path/to/file', 'r') as content_file:
    content = content_file.read()

which always closes the file as soon as the block exits, even if an exception is raised.
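For comparison, here is roughly what the with block saves you from writing by hand. This is a minimal sketch: the function names and the temporary file are made up for illustration and are not part of the original answer.

```python
import os
import tempfile


def read_with_finally(path):
    """Read a whole file, closing it explicitly even if read() raises."""
    f = open(path, 'r')
    try:
        return f.read()
    finally:
        f.close()  # runs on normal return and on exceptions


def read_with_with(path):
    """Same behaviour, with the context manager doing the cleanup."""
    with open(path, 'r') as f:
        return f.read()


# Small demonstration using a throwaway file (hypothetical contents).
fd, demo_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as tmp:
    tmp.write('hello\n')

via_finally = read_with_finally(demo_path)
via_with = read_with_with(demo_path)

with open(demo_path, 'r') as content_file:
    content = content_file.read()

print(content_file.closed)  # the handle is closed once the block exits
os.remove(demo_path)
```

Both helper functions return the same string; the with version is shorter and harder to get wrong.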

Edit: To put a finer point on it:

Apart from file.__exit__(), which is called automatically when a with block exits, the only other way file.close() gets called automatically (that is, without you calling it explicitly) is via file.__del__(). That raises the question: when does __del__() get called?

A correctly-written program cannot assume that finalizers will ever run at any point prior to program termination.

-- https://devblogs.microsoft.com/oldnewthing/20100809-00/?p=13203

In particular:

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

[...]

CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.

-- https://docs.python.org/3.5/reference/datamodel.html#objects-values-and-types

(Emphasis mine)

But as that passage suggests, other implementations may behave differently. As an example, PyPy has 6 different garbage collection implementations!

Oxytocic answered 13/9, 2011 at 23:49 Comment(8)
For a while, there weren't really other Python implementations; but relying on implementation details is not really Pythonic. - Weatherford
Is it still implementation-specific, or has it been standardized already? Not calling __exit__() in such cases sounds like a design flaw. - Unveil
@rr: see my edits; but on the off chance I misunderstood you, __exit__() is guaranteed to be called before the next statement after a with block executes; it's object.__del__() (which, for file, also calls close()) that is not guaranteed. - Oxytocic
Wouldn't that use of "with" just be the same as open().read()? Basically losing the ref from the open, allowing it to be GCed? Is there a difference with the with (i.e., does only a scope change initiate GC)? You have file.close; is this really about not wanting to keep around a useless ref to a closed handle? If you want it portable, GC behaviour is not predictable (even using with), so surely you should explicitly close and not rely on the GC to do that? - Thorley
@Thorley It's precisely those 3 issues (GC being unpredictable, try/finally being fiddly, and the highly common usefulness of cleanup handlers) that with solves. The difference between "explicitly closing" and "managing with with" is that the exit handler is called even if an exception is thrown. You could put the close() in a finally clause, but that is not much different from using with: it's a bit messier (3 extra lines instead of 1) and a little harder to get just right. - Oxytocic
What I don't get about that is why 'with' would be any more reliable, since it's not explicit either. Is it because the spec says it has to do that, so it's always implemented like that? - Thorley
@Thorley It's more reliable because with foo() as f: [...] is basically the same as f = foo(), f.__enter__(), [...] and f.__exit__() with exceptions handled, so that __exit__ is always called. So the file always gets closed. - Leonardoleoncavallo
@Leonardoleoncavallo Do you have a citation from the docs to confirm this, or is it implementation-dependent? - Dwt
119

You can use pathlib.

For Python 3.5 and above:

from pathlib import Path
contents = Path(file_path).read_text()

For older versions of Python use pathlib2:

$ pip install pathlib2

Then:

from pathlib2 import Path
contents = Path(file_path).read_text()

This is the read_text implementation at the time of writing:

def read_text(self, encoding=None, errors=None):
    """
    Open the file in text mode, read it, and close the file.
    """
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
        return f.read()
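A short usage sketch follows, assuming Python 3.5 or later. The temporary directory and the file name example.txt are placeholders for illustration. Passing encoding explicitly avoids depending on the platform default.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / 'example.txt'  # hypothetical file name
    file_path.write_text('caf\u00e9\n', encoding='utf-8')

    # read_text opens the file, reads it, and closes it before returning.
    contents = file_path.read_text(encoding='utf-8')

print(repr(contents))
```

No file handle is ever exposed to the caller here, so there is nothing left to close.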
Paternity answered 20/9, 2016 at 10:19 Comment(1)
I encountered issues with this solution; maybe someone has an answer to my question? Thanks in advance. - Precast
4

Well, if you have to read a file line by line to work with each line, you can use:

with open('Path/to/file', 'r') as f:
    s = f.readline()
    while s:
        # do whatever you want to
        s = f.readline()

Or, even better:

with open('Path/to/file') as f:
    for line in f:
        # do whatever you want to with each line
        pass  # placeholder so the loop body is valid Python
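As a concrete illustration of the second form, this sketch collects the non-empty lines of a file together with their line numbers. The temporary file and its contents are made up for the example.

```python
import os
import tempfile

# Create a throwaway file to iterate over (hypothetical contents).
fd, demo_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as tmp:
    tmp.write('alpha\n\nbeta\ngamma\n')

non_empty = []
with open(demo_path) as f:
    # Iterating over the file object yields one line at a time,
    # so the whole file never has to fit in memory at once.
    for line_number, line in enumerate(f, start=1):
        stripped = line.rstrip('\n')
        if stripped:
            non_empty.append((line_number, stripped))

os.remove(demo_path)
print(non_empty)
```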
Inseminate answered 3/12, 2019 at 5:21 Comment(0)
2

Instead of retrieving the file content as a single string, it can be handy to store the content as a list of all lines the file comprises:

with open('Path/to/file', 'r') as content_file:
    content_list = content_file.read().strip().split("\n")

As can be seen, one only needs to chain the methods .strip().split("\n") onto the read() call from the main answer in this thread.

Here, .strip() removes leading and trailing whitespace (including newlines) from the whole file string, and .split("\n") produces the actual list by splitting that string at every newline character \n.

Moreover, this way the entire file content can be stored in a variable, which might be desired in some cases, instead of looping over the file line by line as pointed out in this previous answer.
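One thing to be aware of is that .strip() also removes leading whitespace from the first line. A related option, not mentioned in the answer above, is str.splitlines(), which leaves each line's content untouched. The sketch below compares the two on a made-up temporary file.

```python
import os
import tempfile

# Throwaway file with leading spaces and a trailing blank line (hypothetical contents).
fd, demo_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as tmp:
    tmp.write('  first\nsecond\n\n')

with open(demo_path, 'r') as content_file:
    text = content_file.read()
os.remove(demo_path)

via_strip_split = text.strip().split("\n")  # ['first', 'second']
via_splitlines = text.splitlines()          # ['  first', 'second', '']

print(via_strip_split)
print(via_splitlines)
```

Which one is preferable depends on whether leading whitespace and trailing blank lines matter for your data.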

Guildsman answered 15/1, 2020 at 17:19 Comment(0)
