Using cookies.txt file with Python Requests

I'm trying to access an authenticated site using a cookies.txt file (generated with a Chrome extension) with Python Requests:

import requests, cookielib

cj = cookielib.MozillaCookieJar('cookies.txt')
cj.load()
r = requests.get(url, cookies=cj)

It doesn't throw any error or exception, but yields the login screen, incorrectly. However, I know that my cookie file is valid, because I can successfully retrieve my content using it with wget. Any idea what I'm doing wrong?

Edit:

I'm tracing cookielib.MozillaCookieJar._really_load and can verify that the cookies are correctly parsed (i.e. they have the correct values for the domain, path, secure, etc. tokens). But as the transaction is still resulting in the login form, it seems that wget must be doing something additional (as the exact same cookies.txt file works for it).

Schoonover answered 7/2, 2013 at 3:14 Comment(2)
Related: Using Chrome's cookies in Python-Requests - Firecrest
I use this extension: chrome.google.com/webstore/detail/cookietxt-export/… - Schoonover
F
18

MozillaCookieJar inherits from FileCookieJar which has the following docstring in its constructor:

Cookies are NOT loaded from the named file until either the .load() or
.revert() method is called.

So you need to call the .load() method.

Also, as Jermaine Xu noted, the first line of the file needs to contain either the string # Netscape HTTP Cookie File or the string # HTTP Cookie File. Files generated by the plugin you use do not contain such a string, so you have to insert it yourself. I filed a bug about this at http://code.google.com/p/cookie-txt-export/issues/detail?id=5
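
If you would rather not edit the file by hand, the header can be added with a few lines of Python. This is only a sketch: the helper name ensure_cookie_header is made up for illustration, and the regular expression is my assumption of a check that accepts the two header strings mentioned above.

```python
import re
from pathlib import Path

# Assumption: a pattern that accepts both header strings named above.
HEADER_RE = re.compile(r"#( Netscape)? HTTP Cookie File")
DEFAULT_HEADER = "# Netscape HTTP Cookie File"


def ensure_cookie_header(path):
    """Prepend the Netscape header to a cookies.txt file if it is missing.

    Returns True if the file was modified, False if the header was already there.
    """
    cookie_file = Path(path)
    text = cookie_file.read_text()
    first_line = text.split("\n", 1)[0]
    if HEADER_RE.match(first_line):
        return False
    cookie_file.write_text(DEFAULT_HEADER + "\n" + text)
    return True
```

Call ensure_cookie_header('cookies.txt') once before loading the file; calling it again leaves the file unchanged.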

EDIT

Session cookies are saved with 0 in the 5th column (the expiry time). If you don't pass ignore_expires=True to the load() method, all such cookies are treated as expired and discarded when loading from a file.

File session_cookie.txt:

# Netscape HTTP Cookie File
.domain.com TRUE    /   FALSE   0   name    value

Python script:

import cookielib

cj = cookielib.MozillaCookieJar('session_cookie.txt')
cj.load()
print len(cj)

Output: 0
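
For comparison, here is a sketch of the same load with the flags passed, written for Python 3, where the module is called http.cookiejar. The helper name count_loaded and the temporary file are only there to keep the example self-contained. Note that the fields in the cookie file must be separated by tabs.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar

# Same content as session_cookie.txt above; the fields are tab-separated.
COOKIE_FILE_TEXT = (
    "# Netscape HTTP Cookie File\n"
    ".domain.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n"
)


def count_loaded(**load_kwargs):
    """Write the sample file to a temp dir, load it, and return the cookie count."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "session_cookie.txt")
        with open(path, "w") as fp:
            fp.write(COOKIE_FILE_TEXT)
        jar = MozillaCookieJar(path)
        jar.load(**load_kwargs)
        return len(jar)


# With both flags set, no cookie is skipped while loading.
print(count_loaded(ignore_discard=True, ignore_expires=True))
```

With both flags the cookie stays in the jar, so this prints 1.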

EDIT 2

Even when load(ignore_expires=True) gets the cookies into the jar, cookielib subsequently discards them when the request is built, because their expires attribute is still 0. To prevent this, we have to set the expiry time to some point in the future, like so:

import time

for cookie in cj:
    # set cookie expire date to 14 days from now
    cookie.expires = time.time() + 14 * 24 * 3600
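
Putting the pieces together, here is a rough Python 3 sketch of the whole workaround. The function names load_with_future_expiry and cookie_header_for are made up for illustration. The second helper uses only the standard library to show which Cookie header the jar would attach to a request, which is handy for checking the result without hitting the real site.

```python
import time
import urllib.request
from http.cookiejar import MozillaCookieJar


def load_with_future_expiry(path, days=14):
    """Load a cookies.txt file and give session cookies a future expiry."""
    jar = MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    for cookie in jar:
        # An expiry of 0 (or None) is how a session cookie shows up here.
        if not cookie.expires:
            cookie.expires = int(time.time()) + days * 24 * 3600
    return jar


def cookie_header_for(jar, url):
    """Return the Cookie header the jar would attach to a request for url."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)
    return request.get_header("Cookie")
```

After jar = load_with_future_expiry('cookies.txt'), you can print cookie_header_for(jar, url) to see what would be sent, and then pass the jar to requests.get(url, cookies=jar) as in the question.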

EDIT 3

I checked both wget and curl, and both use an expiry time of 0 to denote session cookies, which makes it the de facto standard. However, Python's implementation uses an empty string for the same purpose, hence the problem raised in the question. I think Python's behavior in this regard should be in line with what wget and curl do, which is why I filed a bug at http://bugs.python.org/issue17164
Note that replacing the 0s with empty strings in the 5th column of the input file and passing ignore_discard=True to load() is an alternative way of solving the problem (there is no need to change the expiry time in this case).
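
A sketch of that alternative in Python 3 follows. The helper names blank_zero_expiry and load_session_cookies are hypothetical, and the rewrite assumes a well-formed file with seven tab-separated fields per cookie line.

```python
from http.cookiejar import MozillaCookieJar


def blank_zero_expiry(src_path, dst_path):
    """Copy a cookies.txt file, replacing a 0 expiry (5th column) with an empty string."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # Only touch non-comment cookie lines with the expected 7 fields.
            if not line.startswith("#") and len(fields) == 7 and fields[4] == "0":
                fields[4] = ""
                line = "\t".join(fields) + "\n"
            dst.write(line)


def load_session_cookies(path):
    """Load the rewritten file, keeping cookies that are marked as session cookies."""
    jar = MozillaCookieJar(path)
    jar.load(ignore_discard=True)
    return jar
```

Cookies loaded this way have no expiry time at all, so they are not considered expired later on.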

Firecrest answered 7/2, 2013 at 19:47 Comment(8)
Yes, I'm calling load, and I've added the right header to the top of the file, but it's still not working (I also tried with urllib2 instead of requests). This is a complete mystery. - Schoonover
@Schoonover Please never, ever give made-up code without any warning again. - Firecrest
And what do you mean by that exactly? What "made up code" did I give without warning? - Schoonover
@Schoonover The following two lines: cj = cookielib.MozillaCookieJar('cookies.txt') and r = requests.get(url, cookies=cj), which do not contain the call to the .load() method that you DO have in your code somewhere in between these two lines. So it's not the real code you have but a made-up one. - Firecrest
Thanks for the update! I was full of hope, because you're right: the ignore_expires argument does make a difference, but unfortunately it's still the same result: I cannot log in. I'm wondering if there's a way I could compare what wget does with what my script does (i.e. in terms of exact HTTP transactions)? - Schoonover
Don't lose hope yet. I'm here to help you. The fact that session cookies were deemed expired during loading should warn us that they may be treated the same way further down the path... - Firecrest
Right, it works (with the trick in your second edit): excellent! Since this is the appropriate answer to my actual question (contrary to mine that I also just posted), I'm accepting it as the answer, of course. Many thanks Piotr! - Schoonover
FYI: The cookielib module has been renamed to http.cookiejar in Python 3. - Satisfy
A
12

I tried taking into account everything that Piotr Dobrogost had valiantly figured out about MozillaCookieJar but to no avail. I got fed up and just parsed the damn cookies.txt myself and now all is well:

import re
import requests

def parseCookieFile(cookiefile):
    """Parse a cookies.txt file and return a dictionary of key value pairs
    compatible with requests."""

    cookies = {}
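    with open(cookiefile, 'r') as fp:
        for line in fp:
            # skip comment lines and blank lines
            if re.match(r'^\#', line) or not line.strip():
                continue
            lineFields = line.strip().split('\t')
            # columns: domain, flag, path, secure, expiry, name, value
            if len(lineFields) >= 7:
                cookies[lineFields[5]] = lineFields[6]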
    return cookies

cookies = parseCookieFile('cookies.txt')

import pprint
pprint.pprint(cookies)

r = requests.get('https://example.com', cookies=cookies)

Anciently answered 12/2, 2019 at 22:15 Comment(1)
Saved my day... Make sure to use a try/except block if you have some empty lines in your txt file, like I did. I'll post an answer myself. - Amphibrach
L
5

This worked for me:

from http.cookiejar import MozillaCookieJar
from pathlib import Path
import requests

cookies = Path('/Users/name/cookies.txt')
jar = MozillaCookieJar(cookies)
jar.load()
requests.get('https://path.to.site.com', cookies=jar)
<Response [200]>
Lastex answered 24/3, 2021 at 6:10 Comment(0)
A
3

I tried editing Tristan's answer to add some info to it, but it seems the SO edit queue is full. So I am writing this answer instead, since I struggled badly with using existing cookies with Python Requests.

  1. First, get the cookies from Chrome. The easiest way is to use an extension called 'cookies.txt':
https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/related
  2. After downloading the cookies, use the code below to make sure that you can parse the file without any issues.
import re, requests, pprint

def parseCookieFile(cookiefile):
    """Parse a cookies.txt file and return a dictionary of key value pairs
    compatible with requests."""

    cookies = {}
    with open(cookiefile, 'r') as fp:
        for line in fp:
            if not re.match(r'^\#', line):
                # split on any whitespace; note this breaks values that contain spaces
                lineFields = re.findall(r'[^\s]+', line)
                try:
                    cookies[lineFields[5]] = lineFields[6]
                except IndexError as e:
                    # blank or short lines end up here
                    print(e)

    return cookies

cookies = parseCookieFile('cookies.txt')  # replace the filename
pprint.pprint(cookies)
  3. Next, use those cookies with Python Requests.
# verify=False turns off TLS certificate verification; remove it unless you need it
x = requests.get('your__url', verify=False, cookies=cookies)
print(x.content)

This should save you from going through different SO posts and trying cookielib and the other methods, which never worked for me.

Amphibrach answered 6/1, 2022 at 6:50 Comment(0)
S
0

I finally found a way to make it work (I got the idea by looking at curl's verbose output): instead of loading my cookies from a file, I simply created a dict with the required name/value pairs:

cd = {'n1': 'v1', 'n2': 'v2'}
r = requests.get(url, cookies=cd)

and it worked (although it doesn't explain why the previous method didn't). Thanks for all the help, it's really appreciated.

Schoonover answered 7/2, 2013 at 22:21 Comment(1)
I'm glad you didn't ask the question you meant to ask - "How to send cookies using Requests|urllib2|Python?" because a) this had been already asked and answered, b) we got a chance to learn something new. :) - Firecrest
