with and closing of files in Python

I have read that a file opened like this is closed automatically when leaving the with block:

with open("x.txt") as f:
    data = f.read()
    # do something with data

Yet when opening a URL from the web, I need this:

from contextlib import closing
from urllib.request import urlopen

with closing(urlopen('http://www.python.org')) as page:
    for line in page:
        print(line)

Why, and what is the difference? (I am using Python 3.)

Breen answered 28/10, 2014 at 22:11 Comment(1)
I was about to suggest that someone should file a docs bug on contextlib for this, and that if it's not you whoever it is should credit you… but before I could finish, Martijn already filed the bug, with the link back here. :) - Kid

The details get a little technical, so let's start with the simple version:

Some types know how to be used in a with statement. File objects, like the ones you get back from open, are an example of such a type. As it turns out, the objects that you get back from urllib.request.urlopen are also an example of such a type, so your second example could be written the same way as the first.
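As a rough sketch of that point, the response from urlopen can go straight into a with statement. The data: URL here is my own stand-in (not from the question) so the snippet runs without network access; an http URL is used the same way.

```python
from urllib.request import urlopen

# Assumption: a data: URL keeps this self-contained (no network needed).
# The object returned by urlopen is used directly, without closing().
with urlopen("data:text/plain,hello") as page:
    body = page.read()

print(body)
```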

But some types don't know how to be used in a with statement. The closing function is designed to wrap such types: as long as they have a close method, it will call that method when you exit the with statement.
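A minimal sketch of that case, assuming a made-up Connection class (hypothetical, purely for illustration) that has a close method but no __enter__ or __exit__:

```python
from contextlib import closing


class Connection:
    """Hypothetical resource: has close() but is not a context manager."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


conn = Connection()
with closing(conn) as c:
    # closing() hands back the wrapped object itself
    still_open = c.is_open

# conn.close() has been called by the time we get here
print(still_open, conn.is_open)
```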

Of course, some types don't know how to be used in a with statement and also can't be used with closing, because their cleanup method isn't named close (or because cleaning them up is more complicated than just closing them). In that case, you need to write a custom context manager. Even that usually isn't hard.
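One way such a custom context manager might look, assuming a hypothetical Worker class whose cleanup method is called shutdown (both names are invented for this sketch):

```python
from contextlib import contextmanager


class Worker:
    """Hypothetical resource whose cleanup method is not named close()."""

    def __init__(self):
        self.running = True

    def shutdown(self):
        self.running = False


@contextmanager
def managed_worker():
    worker = Worker()
    try:
        yield worker  # this value is bound by the as clause
    finally:
        worker.shutdown()  # runs even if the with body raises


with managed_worker() as w:
    was_running = w.running

print(was_running, w.running)
```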


In technical terms:

A with statement requires a context manager: an object with __enter__ and __exit__ methods. It calls the __enter__ method and binds the value returned by that method to the name in the as clause, then calls the __exit__ method when the block is left, whether normally or through an exception.
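To make the order of calls visible, here is a small sketch with a Tracer class (a name I made up) that just records what happens:

```python
class Tracer:
    """Illustrative context manager that logs the protocol calls."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return "value from __enter__"  # this is what the as clause receives

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # do not suppress exceptions from the body


tracer = Tracer()
with tracer as bound:
    tracer.events.append("body")

print(bound, tracer.events)
```

Note that the as clause gets whatever __enter__ returns, which need not be the context manager itself.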

File objects inherit from io.IOBase, which is a context manager whose __enter__ method returns itself, and whose __exit__ calls self.close().
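You can check this yourself with something like the following sketch. The temporary file is only there so the snippet doesn't depend on an existing x.txt.

```python
import io
import os
import tempfile

# Assumption: a throwaway temp file stands in for "x.txt".
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

f = open(path, "w")
with f as g:
    same_object = g is f  # __enter__ returned the file object itself
    g.write("hello")

closed_after = f.closed  # __exit__ called close()
is_iobase = isinstance(f, io.IOBase)
os.remove(path)

print(same_object, closed_after, is_iobase)
```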

The object returned by urlopen is (assuming an http or https URL) an HTTPResponse, which, as the docs say, "can be used with a with statement".

The closing function:

Return a context manager that closes thing upon completion of the block. This is basically equivalent to:

from contextlib import contextmanager

@contextmanager
def closing(thing):
    try:
        yield thing
    finally:
        thing.close()

It's not always 100% clear in the docs which types are context managers and which aren't, especially since there has been a major drive since 3.1 to make everything that could be a context manager into one (and, for that matter, to make everything that's mostly file-like into an actual IOBase where that makes sense). That work is still not 100% complete as of 3.4.

You can always just try it and see. If you get an AttributeError: __exit__ (or, on Python 3.11 and later, a TypeError saying the object does not support the context manager protocol), then the object isn't usable as a context manager. If you think it should be, file a bug suggesting the change. If you don't get that error, but the docs don't mention that it's legal, file a bug suggesting the docs be updated.
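If you'd rather check without actually entering the with block, a helper along these lines should do. supports_with is a name I made up, not a standard library function; it looks on the type because that is where the with statement looks up the special methods.

```python
import io


def supports_with(obj):
    """Hypothetical helper: True if obj's type defines both protocol methods."""
    cls = type(obj)
    return hasattr(cls, "__enter__") and hasattr(cls, "__exit__")


file_like_ok = supports_with(io.StringIO())  # IOBase subclasses qualify
plain_object_ok = supports_with(object())    # a bare object does not

print(file_like_ok, plain_object_ok)
```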

Kid answered 28/10, 2014 at 22:19 Comment(1)
Wonderful explanation, 8 years later. - Dela

You don't. urlopen('http://www.python.org') returns a context manager too:

with urlopen('http://www.python.org') as page:
    ...  # read from page here; it is closed when the block exits

This is documented on the urllib.request.urlopen() page:

For ftp, file, and data urls and requests explicitly handled by legacy URLopener and FancyURLopener classes, this function returns a urllib.response.addinfourl object which can work as context manager [...].

Emphasis mine. For HTTP responses, an http.client.HTTPResponse object is returned, which is also a context manager:

The response is an iterable object and can be used in a with statement.

The Examples section also uses the object as a context manager:

As the python.org website uses utf-8 encoding as specified in its meta tag, we will use the same for decoding the bytes object.

>>> with urllib.request.urlopen('http://www.python.org/') as f:
...     print(f.read(100).decode('utf-8'))
...
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm

Objects returned by open() are context managers too; they implement the special methods object.__enter__() and object.__exit__().

The contextlib.closing() documentation uses an example with urlopen() that is out of date. In Python 2, the predecessor of urllib.request.urlopen() did not produce a context manager, so you needed closing() to close the connection automatically in a with statement. This was fixed with issues 5418 and 12365, but the example was not updated. I created issue 22755 asking for a different example.

Hilaire answered 28/10, 2014 at 22:16 Comment(10)
I was just about to write this :-). the key is that the docs say that it returns a "file-like" object. If it can't be used as a context manager, it's not actually file-like. - Esotropia
but why is this example in the python docs, then? docs.python.org/3/library/contextlib.html - Breen
@nekomimi: a holdover from Python 2 probably, where the object was not a context manager. - Hilaire
The quoted docs aren't the relevant part. He's got an http url, which doesn't return an addinfourl, it returns an HTTPResponse. Of course those can also be used as context managers. - Kid
@abarnert: as always, it is a little more complex. Thanks for correcting me, updated. - Hilaire
I don't think HTTPResponse is documented to be an IOBase, although that's pretty strongly implied by having methods like readinto. But it is definitely documented to be a context manager. - Kid
@abarnert: argh, I searched for 'context manager' and 'contextmanager' and '__enter__'. I did not search for 'with statement'.. - Hilaire
@MartijnPieters: I think most things turned into context managers post-3.1 use the "can be used in a with statement" language, while most older ones use the "is a context manager" language. - Kid
@abarnert: Interesting, is the context manager terminology being deprecated? - Hilaire
@MartijnPieters: I don't think so. My guess is that whoever did most of the work in the big push post-3.1 thought it would be more helpful to just say "can be used in a with statement" than to say something that some Python users don't understand and have to link to the glossary, and nobody really argued about it? - Kid
