How to send cookies with urllib
I'm attempting to connect to a website that requires you to have a specific cookie to access it. For the sake of this question, we'll call the cookie 'required_cookie' and the value 'required_value'.

This is my code:

import urllib.request
import http.cookiejar

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

opener.addheaders = [('required_cookie', 'required_value'), ('User-Agent', 'Mozilla/5.0')]

urllib.request.install_opener(opener)

req = urllib.request.Request('https://www.thewebsite.com/')
webpage = urllib.request.urlopen(req).read()
print(webpage)

I'm new to urllib, so please answer as you would for a beginner.

Meetly answered 4/8, 2018 at 4:10 Comment(4)
You need to add the cookie to the cookie jar, and then tell the cookie jar to add its cookies to the request. - Turbulence
As I said, I'm a complete beginner. How do I do this? - Meetly
If you're a complete beginner, is there a reason you need to use the complicated urllib and http packages instead of requests? This entire script could be replaced with the one-liner webpage = requests.get('https://www.thewebsite.com/', cookies={'required_cookie': 'required_value'}, headers={'User-Agent': 'Mozilla/5.0'}).text. - Turbulence
@Turbulence Yeah, this works perfectly, thanks. Any chance you could put that into an answer to finish the topic? - Meetly

To do this with urllib, you need to:

  • Construct a Cookie object. The constructor isn't documented in the docs, but if you run help(http.cookiejar.Cookie) in the interactive interpreter, you can see that it demands values for all 16 attributes. Notice that the docs say, "It is not expected that users of http.cookiejar construct their own Cookie instances."
  • Add it to the cookiejar with cj.set_cookie(cookie).
  • Tell the cookiejar to add the correct headers to the request with cj.add_cookie_header(req).

Assuming you've configured the policy correctly, you're set.
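The three steps above can be sketched roughly as follows. This is a minimal sketch, not the only way to do it: the helper name make_cookie is made up for illustration, and every field value other than the name, value and domain is an assumed default for a plain session cookie under the default policy. The URL is the placeholder from the question, and no network request is made unless you uncomment the last line.

```python
import http.cookiejar
import urllib.request

URL = "https://www.thewebsite.com/"  # placeholder URL from the question


def make_cookie(name, value, domain):
    """Hypothetical helper: build a session cookie with assumed defaults."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Step 1 and 2: construct the cookie and put it in the jar.
cj = http.cookiejar.CookieJar()
cj.set_cookie(make_cookie("required_cookie", "required_value", "www.thewebsite.com"))

# Step 3: let the jar copy matching cookies into the request's headers.
req = urllib.request.Request(URL, headers={"User-Agent": "Mozilla/5.0"})
cj.add_cookie_header(req)
cookie_header = req.get_header("Cookie")
print(cookie_header)

# To actually send the request (needs network access):
# webpage = urllib.request.urlopen(req).read()
```

If you build an opener with HTTPCookieProcessor(cj), as in the question, the processor calls add_cookie_header for you on every request, so only the first two steps are needed.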

But this is a huge pain. As the docs for urllib.request say:

See also: The Requests package is recommended for a higher-level HTTP client interface.

And, unless you have some good reason you can't install requests, you really should go that way. urllib is tolerable for really simple cases, and it can be handy when you need to get deep under the covers—but for everything else, requests is much better.

With requests, your whole program becomes a one-liner:

webpage = requests.get('https://www.thewebsite.com/', cookies={'required_cookie': 'required_value'}, headers={'User-Agent': 'Mozilla/5.0'}).text

… although it's probably more readable as a few lines:

import requests

# Cookie name and value are the placeholders from the question
cookies = {'required_cookie': 'required_value'}
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.thewebsite.com/', cookies=cookies, headers=headers)
webpage = response.text
Turbulence answered 4/8, 2018 at 5:45 Comment(1)
If one was stuck with just urllib, one could use a MozillaCookieJar and load a cookies.txt file; this might be simpler than constructing the Cookie objects directly. - Criss
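A rough sketch of the MozillaCookieJar approach from the comment above, in case it helps. The file contents here are hypothetical and written to a temporary directory only so the example is self-contained; a real cookies.txt would normally be exported from a browser. The expiry value is an arbitrary far-future timestamp, chosen because load() skips expired and session-only cookies by default.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

# Hypothetical cookies.txt in the Netscape format. Fields are tab-separated:
# domain, subdomain flag, path, secure flag, expiry (Unix time), name, value.
COOKIES_TXT = (
    "# Netscape HTTP Cookie File\n"
    "www.thewebsite.com\tFALSE\t/\tFALSE\t4102444800\trequired_cookie\trequired_value\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cookies.txt")
    with open(path, "w") as f:
        f.write(COOKIES_TXT)
    cj = http.cookiejar.MozillaCookieJar(path)
    cj.load()  # parses the file into Cookie objects

# The opener sends matching cookies from the jar automatically.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
loaded = {c.name: c.value for c in cj}
print(loaded)

# To actually fetch the page (needs network access):
# webpage = opener.open("https://www.thewebsite.com/").read()
```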

With the help of the Kite documentation (https://www.kite.com/python/answers/how-to-add-a-cookie-to-an-http-request-using-urllib-in-python), you can add a cookie this way:

import urllib.request

a_request = urllib.request.Request("http://www.kite.com/")
# Set the Cookie header by hand as "name=value"
a_request.add_header("Cookie", "cookiename=cookievalue")
page = urllib.request.urlopen(a_request).read()

or in a different way:

from urllib.request import Request, urlopen

url = "https://www.kite.com/"
# Pass the Cookie header together with the other headers
req = Request(url, headers={'User-Agent': 'Mozilla/5.0', 'Cookie': 'myCookie=lovely'})
page = urlopen(req).read()
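If you need more than one cookie, they all go into the same Cookie header, separated by "; ". A minimal sketch, assuming simple values that need no quoting or escaping; the helper name build_cookie_header and the second cookie are made up for illustration:

```python
import urllib.request


def build_cookie_header(cookies):
    """Hypothetical helper: join name/value pairs into one Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


req = urllib.request.Request(
    "https://www.kite.com/",
    headers={
        "User-Agent": "Mozilla/5.0",
        "Cookie": build_cookie_header({"myCookie": "lovely", "other": "value"}),
    },
)
print(req.get_header("Cookie"))

# To actually send the request (needs network access):
# page = urllib.request.urlopen(req).read()
```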
Assumption answered 4/8, 2021 at 15:28 Comment(0)