urllib.urlopen works but urllib2.urlopen doesn't

4

11

I have a simple website I'm testing. It's running on localhost and I can access it in my web browser. The index page is simply the word "running". urllib.urlopen will successfully read the page but urllib2.urlopen will not. Here's a script which demonstrates the problem (this is the actual script and not a simplification of a different test script):

import urllib, urllib2
print urllib.urlopen("http://127.0.0.1").read()  # prints "running"
print urllib2.urlopen("http://127.0.0.1").read() # throws an exception

Here's the stack trace:

Traceback (most recent call last):
  File "urltest.py", line 5, in <module>
    print urllib2.urlopen("http://127.0.0.1").read()
  File "C:\Python25\lib\urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "C:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 412, in error
    result = self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 575, in http_error_302
    return self.parent.open(new)
  File "C:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 504: Gateway Timeout

Any ideas? I might end up needing some of the more advanced features of urllib2, so I don't want to just resort to using urllib, plus I want to understand this problem.

Misalliance answered 14/10, 2008 at 14:57 Comment(0)
16

Sounds like you have proxy settings defined that urllib2 is picking up on. When it tries to proxy "127.0.0.1", the proxy gives up and returns a 504 error.

From Obscure python urllib2 proxy gotcha:

# An empty dict tells ProxyHandler to use no proxies, overriding any detected settings.
proxy_support = urllib2.ProxyHandler({})
opener = urllib2.build_opener(proxy_support)
print opener.open("http://127.0.0.1").read()

# Optional - makes this opener default for urlopen etc.
urllib2.install_opener(opener)
print urllib2.urlopen("http://127.0.0.1").read()
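
To see which proxy settings are being picked up, it may help to print the result of getproxies(), the function ProxyHandler falls back on when it is not given an explicit mapping. This is only a minimal diagnostic sketch. The try/except import is an addition so the snippet also runs on Python 3, where the function lives in urllib.request, and the name `detected` is just illustrative. On Windows the values can come from the Internet Options settings in the registry rather than from environment variables.

```python
# Look up the proxies Python would use by default.
try:
    from urllib import getproxies  # Python 2
except ImportError:
    from urllib.request import getproxies  # Python 3

# getproxies() returns a dict mapping a URL scheme to a proxy URL,
# taken from environment variables or, failing that, platform settings.
detected = getproxies()

if detected:
    for scheme, proxy_url in sorted(detected.items()):
        print("%s -> %s" % (scheme, proxy_url))
else:
    print("no proxies detected")
```

If this prints an http entry, requests to "http://127.0.0.1" made through the default opener are likely being routed to that proxy.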
Pyrite answered 14/10, 2008 at 15:49 Comment(2)
This fixed the problem, though I have no idea how or why it thought to use a proxy, since my script was only three lines long and I have no environment variables which indicate anything about any proxy. Still, it's good to have this resolved, so thanks for the help. - Misalliance
OpenerDirector instance has no attribute 'urlopen' - you need to change the above fragment to use opener.open(...). - Crabtree
1

Does calling urllib2.urlopen first, followed by urllib.urlopen, give the same results? I'm wondering whether the first call is keeping the HTTP server busy and causing the timeout.

Comitia answered 14/10, 2008 at 15:6 Comment(1)
Nope, urllib2 gets the error regardless of whether it's called first, and urllib never gets the error even when it's called multiple times. Good thoughts though. - Misalliance
1

I don't know what's going on, but you may find this helpful in figuring it out:

>>> import urllib2
>>> urllib2.urlopen('http://mit.edu').read()[:10]
'<!DOCTYPE '
>>> urllib2._opener.handlers[1].set_http_debuglevel(100)
>>> urllib2.urlopen('http://mit.edu').read()[:10]
connect: (mit.edu, 80)
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: mit.edu\r\nConnection: close\r\nUser-Agent: Python-urllib/2.5\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 14 Oct 2008 15:52:03 GMT
header: Server: MIT Web Server Apache/1.3.26 Mark/1.5 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.7c
header: Last-Modified: Tue, 14 Oct 2008 04:02:15 GMT
header: ETag: "71d3f96-2895-48f419c7"
header: Accept-Ranges: bytes
header: Content-Length: 10389
header: Connection: close
header: Content-Type: text/html
'<!DOCTYPE '
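
Reaching into urllib2._opener.handlers[1] depends on a private attribute and on handler order, so a somewhat sturdier variant might build an opener with the debug level set explicitly. This is a sketch, not the only way to do it. The alias `urlreq` and the name `debug_opener` are made up for illustration, and the import fallback is there so the snippet also runs on Python 3.

```python
try:
    import urllib2 as urlreq  # Python 2
except ImportError:
    import urllib.request as urlreq  # Python 3

# Passing our own HTTPHandler replaces the default one; debuglevel=1
# makes the underlying HTTP connection print what it sends and receives.
debug_opener = urlreq.build_opener(urlreq.HTTPHandler(debuglevel=1))

# Against a running server, the request and reply headers are printed:
# debug_opener.open("http://127.0.0.1").read()
```

The open() call is left commented out because it needs a server listening on that address.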
Moonshot answered 14/10, 2008 at 15:53 Comment(0)
1

urllib.urlopen() sends the following request to the server:

GET / HTTP/1.0
Host: 127.0.0.1
User-Agent: Python-urllib/1.17

while urllib2.urlopen() sends this:

GET / HTTP/1.1
Accept-Encoding: identity
Host: 127.0.0.1
Connection: close
User-Agent: Python-urllib/2.5

So your server either doesn't understand HTTP/1.1 or is tripping over the extra header fields.
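
If you want to see the raw request yourself, one possible approach is to point the client at a throwaway listening socket and record whatever arrives. The sketch below assumes Python 2.6+ or Python 3 (it uses bytes literals and the timeout argument). The helper `capture_one_request` and all variable names are hypothetical, and the empty ProxyHandler is included so a configured proxy cannot intercept the request.

```python
import socket
import threading

try:
    import urllib2 as urlreq  # Python 2
except ImportError:
    import urllib.request as urlreq  # Python 3


def capture_one_request(listener, captured):
    """Accept one connection, record the raw request, send a tiny reply."""
    conn, _ = listener.accept()
    try:
        data = b""
        # Read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        captured.append(data)
        conn.sendall(
            b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: 7\r\n"
            b"\r\n"
            b"running"
        )
    finally:
        conn.close()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

captured = []
worker = threading.Thread(target=capture_one_request, args=(listener, captured))
worker.daemon = True  # do not keep the process alive if the request fails
worker.start()

# Bypass any configured proxy so the request reaches the local listener.
opener = urlreq.build_opener(urlreq.ProxyHandler({}))
reply = opener.open("http://127.0.0.1:%d/" % port, timeout=5).read()

worker.join()
listener.close()

# Show the request line and headers exactly as the client sent them.
print(captured[0].decode("ascii"))
```

Swapping the opener for the other module's urlopen would let you compare the two requests side by side.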

Preeminence answered 14/10, 2008 at 15:54 Comment(0)
