Are urllib2 and httplib thread safe?

I'm looking for information on thread safety of urllib2 and httplib. The official documentation (http://docs.python.org/library/urllib2.html and http://docs.python.org/library/httplib.html) lacks any information on this subject; the word thread is not even mentioned there...

UPDATE

Ok, they are not thread-safe out of the box. What is required to make them thread-safe, or is there a scenario in which they can be used safely from several threads? I'm asking because it seems that

  • using separate OpenerDirector in each thread
  • not sharing HTTP connection among threads

would suffice to safely use these libs in threads. A similar usage scenario was proposed in the question "urllib2 and cookielib thread safety".
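The first bullet could look roughly like the sketch below. It assumes Python 3, where urllib2 lives on as urllib.request; get_opener and _local are hypothetical names introduced for illustration, not part of either library.

```python
import threading
import urllib.request  # Python 3 name for urllib2 (assumption: Python 3)

# Thread-local storage: each thread sees its own attributes on this object.
_local = threading.local()


def get_opener():
    """Return an OpenerDirector owned by the calling thread (hypothetical helper)."""
    opener = getattr(_local, "opener", None)
    if opener is None:
        # First call in this thread: build a private opener and remember it.
        opener = urllib.request.build_opener()
        _local.opener = opener
    return opener
```

Each worker thread would then call get_opener().open(url) instead of the module-level urlopen(), so no OpenerDirector is ever shared between threads.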

Jaime answered 28/4, 2011 at 21:17 Comment(0)

httplib and urllib2 are not thread-safe.

urllib2 does not provide serialized access to a global (shared) OpenerDirector object, which is used by urllib2.urlopen().

Similarly, httplib does not provide serialized access to HTTPConnection objects (e.g. through a thread-safe connection pool), so sharing HTTPConnection objects between threads is not safe.
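To illustrate what such a pool might look like, here is a minimal sketch, not something httplib provides. It assumes Python 3 names (http.client for httplib, queue for Queue); ConnectionPool and its connection() method are hypothetical.

```python
import http.client  # Python 3 name for httplib (assumption: Python 3)
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool: a connection is held by one thread at a time."""

    def __init__(self, host, size=2):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # HTTPConnection connects lazily, on the first request.
            self._idle.put(http.client.HTTPConnection(host))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for other threads
```

A thread would wrap each request in "with pool.connection() as conn:" and read the response fully before leaving the block. The sketch does not replace connections that end up in a broken state, which a real pool would have to handle.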

I suggest using httplib2 or urllib3 as an alternative if thread-safety is required.

Generally, if a module's documentation does not mention thread-safety, I would assume it is not thread-safe. You can look at the module's source code for verification.

When browsing the source code to determine whether a module is thread-safe, you can start by looking for uses of thread synchronization primitives from the threading or multiprocessing modules, or use of Queue.Queue (queue.Queue in Python 3).
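As a rough aid for that kind of browsing, the sketch below lists the lines of a source text that mention common synchronization names. The function find_sync_hints and the pattern it uses are hypothetical, and a match (or the lack of one) is only a hint, not proof either way.

```python
import re

# Names that suggest explicit synchronization (a heuristic list, not exhaustive).
_HINTS = re.compile(
    r"\b(threading|multiprocessing|Lock|RLock|Semaphore|Condition|Queue)\b"
)


def find_sync_hints(source):
    """Return (line_number, stripped_line) pairs that mention a sync primitive."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if _HINTS.search(line)
    ]
```

For an importable pure-Python module, the source text can usually be obtained with inspect.getsource(module) and passed to this function.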

UPDATE

Here is a relevant source code snippet from urllib2.py (Python 2.7.2):

_opener = None
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    global _opener
    if _opener is None:
        _opener = build_opener()
    return _opener.open(url, data, timeout)

def install_opener(opener):
    global _opener
    _opener = opener

There is an obvious race condition when concurrent threads call install_opener() and urlopen().

Also, note that calling urlopen() with a Request object as the url parameter may mutate the Request object (see the source for OpenerDirector.open()), so it is not safe to concurrently call urlopen() with a shared Request object.
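The mutation can be observed without any network traffic. The sketch below assumes Python 3 (urllib.request in place of urllib2) and uses a bare OpenerDirector with no handlers installed; the URL is a placeholder.

```python
import urllib.request  # Python 3 name for urllib2 (assumption: Python 3)

# A bare OpenerDirector has no handlers, so nothing is sent over the network,
# but open() still performs the bookkeeping that writes to the Request.
opener = urllib.request.OpenerDirector()
request = urllib.request.Request("http://example.invalid/")

data_before = request.data  # no body was given when the Request was created
opener.open(request, data=b"payload", timeout=5)

# The caller's Request object now carries the data and timeout of this call.
data_after = request.data
timeout_after = request.timeout
```

If two threads passed the same Request to open() with different data or timeout values, each call would overwrite what the other had set, which is why a Request should not be shared.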

All told, urlopen() is thread-safe if the following conditions are met:

  • install_opener() is not called from another thread.
  • A non-shared Request object, or a string, is used as the url parameter.
Julie answered 28/4, 2011 at 21:56 Comment(11)
@Julie - Could you say something about how you would determine thread-safety by inspecting a particular module's code? I often have this exact question...Reverie
@ire_and_curses: I've expanded my answer accordingly.Julie
The idea of forcing users to inspect a library's source code to find out whether it is thread-safe looks strange to me. There are libraries that use synchronization code and are still not thread-safe (cookielib), and there are libraries that use no synchronization code yet are thread-safe because they rely on lock-free structures and algorithms.Jaime
@Piotr Dobrogost: I agree that users should not be forced to inspect a library's source code to find out if it is thread-safe. If a library is developed with thread-safety in mind, then I assume the docs will indicate this. If the docs do not talk about thread-safety, then I assume the library is not thread safe. To verify my assumption, a peek at the library's code is often necessary. Regarding lock-free data structures and cookielib, thread-safety is a complicated topic and I only provided a baseline of things to look for within a module that may indicate it is thread-safe.Julie
Is this actually true? From the code, there is a shared OpenerDirector object, but HTTP requests will be handled by the HTTPHandler, which doesn't have any meaningful state. So each open() call will ultimately result in a new HTTPConnection object (line 1116, urllib2.py). At that point, it doesn't matter whether the HTTPConnection object is thread-safe, since there will be a different instance of it per call to urllib2.urlopen. This seems to support that: mail.python.org/pipermail/python-list/2005-January/916884.html. From that thread, it also looks like they document when things aren't thread-safe.Cardiovascular
@PeteAykroyd Referencing source code by line number without giving version is useless. For instance line 1116 of urllib2.py in Python 2.7.2 is a blank line...Jaime
@PiotrDobrogost Good point. Looking now at Python 2.7.2, urllib2.py, I've quickly re-traced the code. It seems that if you are opening an HTTP connection, you end up calling http_open (line 1199), which calls do_open with the request object and the class httplib.HTTPConnection. The do_open function then creates a new HTTPConnection and uses that for the request. This answer seems to assume that each HTTP request shares the same HTTPConnection, and it seems to me that this is not the case.Cardiovascular
Could you provide a single example when urllib2.urlopen(url) is not safe to call from several threads?Vaenfila
@Gregg: Can't upvote you twice. I've added the link to the OpenerDirector.open() source code.Vaenfila
If you do not use urllib2.urlopen, but instead use OpenerDirector.open() AND you do not share request objects, then this should be thread safe.Dyanne
urllib3 documents that it is thread-safe, but httplib2 does not, as far as I can tell.Genseric
