To do this with urllib, you need to:
- Construct a Cookie object. The constructor isn't documented, but if you run help(http.cookiejar.Cookie) in the interactive interpreter, you can see that it demands values for all 16 attributes. Notice that the docs say, "It is not expected that users of http.cookiejar construct their own Cookie instances."
- Add it to the cookiejar with cj.set_cookie(cookie).
- Tell the cookiejar to add the correct headers to the request with cj.add_cookie_header(req).
Assuming you've configured the policy correctly, you're set.
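The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name make_cookie, the cookie value, and the attribute choices (version 0, no expiry, not secure) are placeholders I picked, and the default policy is used as-is. No network request is made; the code only builds the request and lets the jar attach the header.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain, path="/"):
    """Hypothetical helper: fill in all 16 required Cookie attributes."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Step 1 and 2: construct the cookie and add it to the jar.
cj = CookieJar()
cj.set_cookie(make_cookie("required_cookie", "required_value", "www.thewebsite.com"))

# Step 3: build the request and let the jar add the Cookie header.
req = urllib.request.Request(
    "https://www.thewebsite.com/",
    headers={"User-Agent": "Mozilla/5.0"},
)
cj.add_cookie_header(req)

# The header the jar attached (stored as an unredirected header).
cookie_header = req.get_header("Cookie")
```

From here you would pass req to urllib.request.urlopen (or to an opener) to actually fetch the page.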
But this is a huge pain. As the docs for urllib.request say:
See also: The Requests package is recommended for a higher-level HTTP client interface.
And, unless you have some good reason you can't install requests, you really should go that way. urllib is tolerable for really simple cases, and it can be handy when you need to get deep under the covers, but for everything else, requests is much better.
With requests, your whole program becomes a one-liner:
webpage = requests.get('https://www.thewebsite.com/', cookies={'required_cookie': required_value}, headers={'User-Agent': 'Mozilla/5.0'}).text
… although it's probably more readable as a few lines:
cookies = {'required_cookie': required_value}
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.thewebsite.com/', cookies=cookies, headers=headers)
webpage = response.text
urllib and http packages instead of requests? Because this entire script could be replaced by the one-liner webpage = requests.get('https://www.thewebsite.com/', cookies={'required_cookie': required_value}, headers={'User-Agent': 'Mozilla/5.0'}).text. – Turbulence