urllib3 download a file using specified user agent

What is the correct way to update the user agent information in urllib3?

How can I check that the user agent information was indeed changed and is being used?

For example:

import urllib3

user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'}
http = urllib3.PoolManager(10, headers=user_agent)

r1 = http.request('GET', 'http://example.com/')
if r1.status == 200:  # compare by value; `is` tests object identity
    # r1.data is bytes, so open the file in binary mode
    with open('somefile', 'wb') as f:
        f.write(r1.data)

When I create a PoolManager as http and inspect it with dir(http), I can see that http.headers is empty by default and is updated to the user agent info I specified. But is it actually being used? Is there any way to check without having to look at the Apache logs?

And actually checking /var/log/apache2/access.log after trying to update the user agent:

>>> import urllib3
>>> user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'}
>>> http = urllib3.PoolManager(2, headers=user_agent)
>>> r = http.request('GET','localhost')
>>> with open('/var/log/apache2/access.log','r') as f:
...     last_line = f.readlines()[-1]
... 
>>> last_line
'127.0.0.1 - - [08/Dec/2014:20:42:04 -0500] "GET / HTTP/1.1" 200 461 "-" "-"\n'
Merce answered 9/12, 2014 at 1:26 Comment(1)
There are a number of websites that will show you your user agent when you hit them. You could try downloading one of those. - Alleviate

The keyword argument is headers, not header:

http = urllib3.PoolManager(10, headers=user_agent)

You can confirm that headers were set correctly using sites like httpbin.org:

>>> import urllib3
>>> user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) ..'}
>>> http = urllib3.PoolManager(10, headers=user_agent)
>>> r1 = http.urlopen('GET', 'http://httpbin.org/headers')
>>> print(r1.data)
{
  "headers": {
    "Accept-Encoding": "identity",
    "Connect-Time": "1",
    "Connection": "close",
    "Host": "httpbin.org",
    "Total-Route-Time": "0",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0",
    "Via": "1.1 vegur",
    "X-Request-Id": "5ef53f21-6caf-4e45-8123-98e417cd05ba"
  }
}
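If you would rather check the echoed value in code than read the output by eye, here is a minimal sketch. It assumes the body has the httpbin-style shape shown above (a JSON object with a "headers" key). The helper name echoed_user_agent is made up for illustration and is not part of urllib3.

```python
import json


def echoed_user_agent(body):
    """Return the User-Agent found in an httpbin-style /headers body, or None.

    Hypothetical helper: `body` may be bytes (like r1.data) or str.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    headers = json.loads(body)["headers"]
    # Header names are case-insensitive, so compare them lowercased.
    for name, value in headers.items():
        if name.lower() == "user-agent":
            return value
    return None
```

With the session above, echoed_user_agent(r1.data) can then be compared against user_agent['user-agent'].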

Alternatively, you can use a packet analyzer (e.g. Wireshark).
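If you want neither a third-party site nor the Apache logs, another option is a throwaway local server that echoes back the User-Agent it received. This is only a sketch under a few assumptions: it uses the standard library's http.server, it binds to 127.0.0.1 on a port chosen by the OS, and the names EchoUserAgentHandler and user_agent_seen_by_server are invented for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class EchoUserAgentHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the User-Agent header it received."""

    def do_GET(self):
        body = (self.headers.get("User-Agent") or "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging to stderr


def user_agent_seen_by_server(headers):
    """Send one GET with `headers` to a local echo server; return what it saw."""
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgentHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        http = urllib3.PoolManager(headers=headers)
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        response = http.request("GET", url)
        return response.data.decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

Calling user_agent_seen_by_server(user_agent) should return the same string you put in the dict if the header is really being sent.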

Sectionalize answered 9/12, 2014 at 1:38 Comment(2)
How can I change the value of a specific header in an HTTPResponse object? - Thunderpeal
@SajadHTLO, in HTTPResponse? No, it's not possible. - Sectionalize
