Why does urllib.urlopen(url) fail while urllib2.urlopen(url) works? What specifically about the server response is causing this?
I just want a better idea of what's going on here; I can of course work around the problem by using urllib2.

import urllib
import urllib2

url = "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"

# urllib2 works fine (foo.headers / foo.read() also behave)
foo = urllib2.urlopen(url)

# urllib throws errors though, what specifically is causing this?
bar = urllib.urlopen(url)

http://pae.st/AxDW/ shows this code in action, including the exception and stack trace. foo.headers and foo.read() work fine.

$ curl -I "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html"

HTTP/1.1 302 Object Moved
Cache-Control: private
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Location: /S-FSTWJcduy5w/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html
Server: Microsoft-IIS/7.5
Set-Cookie: SESSIONID=FSTWJcduy5w; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
Set-Cookie: SYSTEMID=0; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
Set-Cookie: SESSIONDATE=02/23/2012 17:07:00; domain=.crutchfield.com; expires=Fri, 22-Feb-2013 22:06:43 GMT; path=/
X-AspNet-Version: 4.0.30319
HostName: cws105
Date: Thu, 23 Feb 2012 22:06:43 GMT

Thanks.

Traditor answered 23/2, 2012 at 22:15

This server is both non-deterministic and sensitive to the HTTP version. urllib2 sends HTTP/1.1 requests, while urllib sends HTTP/1.0. You can reproduce this by running curl --http1.0 -I "http://www.crutchfield.com/S-pqvJFyfA8KG/p_15410415/Dynamat-10415-Xtreme-Speaker-Kit.html" a few times in a row. Occasionally you should see the output curl: (52) Empty reply from server; that is the same failure urllib is reporting. (If you re-issue the request a number of times with urllib, it should succeed sometimes.)
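To see how often the request gets through, the retrying can be wrapped in a small helper. This is only a rough sketch: open_with_retries, its parameters, and the default values are made up for illustration, and it assumes the failure surfaces as an IOError (which, as far as I know, is what urllib.urlopen raises for HTTP-level failures in Python 2).

```python
import time


def open_with_retries(opener, url, attempts=20, delay=0.5):
    """Call opener(url) up to `attempts` times.

    Returns (response, failed_count), where failed_count is the number of
    attempts that raised before one succeeded. Hypothetical helper, not part
    of urllib or urllib2.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for failed in range(attempts):
        try:
            return opener(url), failed
        except IOError as exc:
            # Assumption: the empty reply from the server shows up as IOError.
            last_error = exc
            time.sleep(delay)
    # Every attempt failed; re-raise the most recent error.
    raise last_error
```

In Python 2 you would call it as open_with_retries(urllib.urlopen, url); the second element of the returned tuple tells you how many attempts failed before the server answered.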

Matrimony answered 23/2, 2012 at 22:39
Comment (Traditor): Seems that urllib.urlopen(url) works just under 10% of the time. Hooray for non-deterministic servers!

I solved the problem. I am now simply using urllib instead of urllib2, and everything works fine. Thank you all :)

Photoconductivity answered 24/2, 2015 at 10:51
