Python: Log in to a website using urllib

I want to log in to this website: https://www.fitbit.com/login. This is the code I use:

import urllib2
import urllib
import cookielib

login_url = 'https://www.fitbit.com/login'
acc_pwd = {'login':'Log In','email':'username','password':'pwd'}
cj = cookielib.CookieJar() ## add cookies
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent',
                      'Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.1)')]
data = urllib.urlencode(acc_pwd)
try:
    opener.open(login_url,data,10)
    print 'log in - success!'
except:
    print 'log in - times out!', login_url

I used Chrome to inspect the input elements of the form and tried many key/value pairs, but none of them works. Can anyone take a look at this website? What is the correct data I should put in my acc_pwd variable?

Thank you very much

Desinence answered 13/5, 2014 at 19:5 Comment(0)

You're forgetting the hidden fields of the form:

<form id="loginForm" class="validate-enabled failure form" method="post" action="https://www.fitbit.com/login" name="login">
    <input type="hidden" value="Log In" name="login">
    <input type="hidden" value="" name="includeWorkflow">
    <input id="loginRedirect" type="hidden" value="" name="redirect">
    <input id="disableThirdPartyLogin" type="hidden" value="false" name="disableThirdPartyLogin">
    <input class="field email" type="text" tabindex="23" name="email" placeholder="E-mail">
    <input class="field password" type="password" tabindex="24" name="password" placeholder="Mot de passe">
</form>

so you may want to update:

acc_pwd = {'login':'Log In',
           'email':'username',
           'password':'pwd',
           'disableThirdPartyLogin':'false',
           'redirect':'',
           'includeWorkflow':''
          }

which might get checked by their service. Though, given the name of the field disableThirdPartyLogin, I'm wondering whether there's some dirty javascript bound to the form's submit action that adds a value before actually doing the POST. You might want to check that with the developer tools, looking at the values that actually get POSTed.

Testing it, it looks like it does not, though the javascript does add some values, which may come from cookies:

__fp                     w686jv_O1ZZztQ7FkK21Ry2MI7JbqWTf
_sourcePage              tJvTQfA5dkvGrJMFkFsv6XbX0f6OV1Ndj1zeGcz7OKzA3gkNXMXGnj27D-H9WXS-
disableThirdPartyLogin   false
email                    [email protected]
includeWorkflow
login                    Log In
password                 aeou
redirect
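
If you don't want to hard-code those hidden fields, you could also scrape whatever inputs are present in the HTML of the login page and only override email/password. A minimal sketch (it only picks up fields that are in the markup, not the ones javascript adds later, and it assumes the form keeps the loginForm id shown above):

import urllib2
from lxml import etree

# fetch and parse the login page
html = urllib2.urlopen('https://www.fitbit.com/login').read()
page = etree.HTML(html)

# collect every named <input> of the login form, keeping the default values
form_data = {}
for field in page.xpath('//form[@id="loginForm"]//input[@name]'):
    form_data[field.get('name')] = field.get('value') or ''

# then overwrite only the fields you care about (placeholder credentials)
form_data['email'] = 'username'
form_data['password'] = 'pwd'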

here's my take on doing this using requests (which has a better API than urllib ;-) )

>>> import requests
>>> import cookielib
>>> jar = cookielib.CookieJar()
>>> login_url = 'https://www.fitbit.com/login'
>>> acc_pwd = {'login':'Log In',
...            'email':'username',
...            'password':'pwd',
...            'disableThirdPartyLogin':'false',
...            'redirect':'',
...            'includeWorkflow':''
...           }
>>> r = requests.get(login_url, cookies=jar)
>>> r = requests.post(login_url, cookies=jar, data=acc_pwd)

and don't forget to first hit the login page with a GET, to fill your cookie jar in!
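
Note that with the module-level requests.get/requests.post calls above, the cookies the server sets end up on each response (r.cookies) rather than back in your jar object. If you'd rather let requests keep them between the GET and the POST for you, a requests.Session does that automatically; a minimal sketch with the same placeholder credentials:

import requests

login_url = 'https://www.fitbit.com/login'
acc_pwd = {'login':'Log In',
           'email':'username',      # placeholder
           'password':'pwd',        # placeholder
           'disableThirdPartyLogin':'false',
           'redirect':'',
           'includeWorkflow':''}

s = requests.Session()
s.headers.update({'User-Agent': 'Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.1)'})

s.get(login_url)                      # cookies set here are stored on the session
r = s.post(login_url, data=acc_pwd)   # and sent back automatically with the POST
print(r.status_code)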

Finally, I can't help you further, as I don't have a valid account on fitbit.com and I don't need/want one. So I can only get to the login failure page for my tests.

edit:

to parse the output, you can then use:

>>> from lxml import etree
>>> p = etree.HTML(r.text)

for example to get the error messages:

>>> p.xpath('//ul[@class="errorList"]/li/text()')
["L'utilisateur n'existe pas ou le mot de passe est incorrect."]

(that is, "the user does not exist or the password is incorrect")
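
and, to turn that into a quick pass/fail check (a sketch, assuming a failed login always renders that errorList markup):

>>> errors = p.xpath('//ul[@class="errorList"]/li/text()')
>>> if errors:
...     print('login failed: %s' % '; '.join(errors))
... else:
...     print('no errorList found, the login probably went through')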

resources: the requests and lxml documentation; both packages are on pypi:

pip install lxml requests

HTH

Civilian answered 13/5, 2014 at 19:14 Comment(5)
Thanks! But what do I do next, how can I make use of r? I have never used the requests library before. – Desinence
You can use r.status_code to get the status code, r.cookies to get the cookie jar (or you can use jar), and you can use r.text and pass it to lxml or BeautifulSoup to help you parse the resulting page. Have a look at python-requests.org to convince yourself how great that library is :-) – Civilian
I mean, what I usually do is: 1. opener.open(login_url, data) 2. the usual urllib2.urlopen(...) calls. So if I use requests to log in, does that mean I have to use requests to pull the HTML out of the URL? I tried my regular second step and it doesn't work. – Desinence
@Civilian is there a need to specify ALL the "loginForm" parameters? I'm asking because, for example, on eBay there are more than a dozen parameters for a simple email+password loginForm – apart from the email/pass, all of them are hidden params. Why can't they just receive their default value? Thanks! – Sainfoin
It might be needed or not, depending on the site's implementation. The only way to know for sure is to actually try with and without, and see for yourself how it behaves. You might want to set all meaningful values instead of relying on defaults, as defaults can change, and determinism is always safer for your implementation. – Civilian

You are going to have a hard time with just urllib.

You will likely need to use the approved methods: https://wiki.fitbit.com/display/API/Fitbit+API;jsessionid=7D918DE258862E80575153385C02507D

which will require an OAuth token ... which will require opening a web page and having the user log in.
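
For what it's worth, here is roughly what that three-legged OAuth flow looks like with requests_oauthlib. This is an untested sketch; the consumer key/secret placeholders and the three endpoint URLs are assumptions on my part, so check the wiki page above for the authoritative values:

from requests_oauthlib import OAuth1Session

CLIENT_KEY = 'your-consumer-key'        # placeholder: issued when you register your app
CLIENT_SECRET = 'your-consumer-secret'  # placeholder

oauth = OAuth1Session(CLIENT_KEY, client_secret=CLIENT_SECRET, callback_uri='oob')

# step 1: get a temporary request token (endpoint URL is an assumption, see the wiki)
oauth.fetch_request_token('https://api.fitbit.com/oauth/request_token')

# step 2: the user opens this page in a browser, logs in, and gets a verifier code
print(oauth.authorization_url('https://www.fitbit.com/oauth/authorize'))
verifier = raw_input('verifier code: ')

# step 3: exchange the verifier for an access token used to sign API calls
tokens = oauth.fetch_access_token('https://api.fitbit.com/oauth/access_token',
                                  verifier=verifier)
print(tokens)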

Athenian answered 13/5, 2014 at 20:35 Comment(0)
