HTTP Basic Authentication not working with Python 3

I am trying to access an intranet site with HTTP Basic Authentication enabled.

Here's the code I'm using:

from bs4 import BeautifulSoup
import urllib.request, base64, urllib.error

request = urllib.request.Request(url)
string = '%s:%s' % ('username','password')

base64string = base64.standard_b64encode(string.encode('utf-8'))

request.add_header("Authorization", "Basic %s" % base64string)
try:
    u = urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    print(e)
    print(e.headers)

soup = BeautifulSoup(u.read(), 'html.parser')

print(soup.prettify())

The request fails with 401 Authorization Required, and I can't figure out why.

Grandam answered 7/11, 2017 at 12:1 Comment(1)
Does nobody have an answer? – Grandam

The solution given here works without any modifications.

from bs4 import BeautifulSoup
import urllib.request

# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL
u = opener.open(url)

soup = BeautifulSoup(u.read(), 'html.parser')
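
To see which URLs the stored credentials will be offered for, the password manager can be queried directly, without any network access. This is only a minimal sketch: the URLs, the realm name and the `alice`/`secret` credentials are placeholders, not values from the question.

```python
import urllib.request

# Placeholder values for illustration only.
top_level_url = "http://example.com/foo/"
username, password = "alice", "secret"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# realm=None registers these credentials as the fallback for any realm.
password_mgr.add_password(None, top_level_url, username, password)

# URLs at or below top_level_url match, whatever realm the server announces.
inside = password_mgr.find_user_password(
    "Some Realm", "http://example.com/foo/page.html"
)
# URLs outside that prefix get no credentials.
outside = password_mgr.find_user_password(
    "Some Realm", "http://example.com/bar/"
)

print(inside)
print(outside)
```

If the URL passed to `opener.open()` is not under `top_level_url`, the handler has nothing to send and the request will still end in a 401.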

The code from the question works as well. You just have to decode the Base64-encoded bytes to a string; otherwise the "%s" formatting puts the bytes representation (b'...') into the header instead of the plain token.

from bs4 import BeautifulSoup
import urllib.request, base64, urllib.error

request = urllib.request.Request(url)
string = '%s:%s' % ('username','password')

base64string = base64.standard_b64encode(string.encode('utf-8'))

request.add_header("Authorization", "Basic %s" % base64string.decode('utf-8'))
try:
    u = urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    print(e)
    print(e.headers)

soup = BeautifulSoup(u.read(), 'html.parser')

print(soup.prettify())
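
The difference between the two versions can be checked without sending a request, by comparing the header value each one builds. A small sketch, using the same placeholder credentials and an assumed example URL:

```python
import base64
import urllib.request

credentials = "%s:%s" % ("username", "password")
encoded = base64.standard_b64encode(credentials.encode("utf-8"))  # bytes

# Formatting bytes with %s embeds their repr, including the b'...' wrapper.
broken = "Basic %s" % encoded
# Decoding first yields the plain Base64 text the server expects.
fixed = "Basic %s" % encoded.decode("utf-8")

# The header can be inspected on the Request object before it is sent.
request = urllib.request.Request("http://example.com/")  # placeholder URL
request.add_header("Authorization", fixed)

print(broken)  # Basic b'dXNlcm5hbWU6cGFzc3dvcmQ='
print(fixed)   # Basic dXNlcm5hbWU6cGFzc3dvcmQ=
print(request.get_header("Authorization"))
```

Printing the header this way is also a quick way to compare it with what the browser sends.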
Grandam answered 9/11, 2017 at 11:31 Comment(0)

UTF-8 encoding might not work. You can try to use ASCII or ISO-8859-1 encoding instead.

Also, try to access the intranet site with a web browser and check how the Authorization header is different from the one you are generating.
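
Note that for credentials made up only of ASCII characters, these encodings produce identical bytes, so the choice should only matter when the username or password contains non-ASCII characters. A quick check, with made-up credentials:

```python
import base64

# ASCII-only credentials: every encoding yields the same Base64 token.
ascii_only = "username:password"
encodings = ("utf-8", "ascii", "iso-8859-1")
tokens = {enc: base64.b64encode(ascii_only.encode(enc)) for enc in encodings}
same = len(set(tokens.values())) == 1
print(same)

# A non-ASCII character (here "\u00e4", a-umlaut) encodes differently.
non_ascii = "user:p\u00e4ssword"
utf8_token = base64.b64encode(non_ascii.encode("utf-8"))
latin1_token = base64.b64encode(non_ascii.encode("iso-8859-1"))
print(utf8_token != latin1_token)
```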

Merridie answered 9/11, 2017 at 10:33 Comment(5)
Thanks. I got the authentication to work using the instructions given here. But for learning purposes, how do I check what header is being generated by the browser? – Grandam
Great that you got it working. You should submit the solution as an answer and accept it so others may benefit from it. – Merridie
You can see the headers in the browser's dev tools, for example the Network tab in Chrome dev tools. #4423561 – Merridie
OK, I checked, and the Authorization header I'm generating with the code above is exactly the same as what the browser is sending, so I don't understand what the problem is. – Grandam
OK, I figured out what the problem is. Modifying my answer with the solution. – Grandam

Encode using "ascii". This worked for me.

import base64
import urllib.request

url = "http://someurl/path"
username = "someuser"
token = "239487svksjdf08234"

request = urllib.request.Request(url)
base64string = base64.b64encode((username + ":" + token).encode("ascii"))
request.add_header("Authorization", "Basic {}".format(base64string.decode("ascii")))
response = urllib.request.urlopen(request)

response.read()  # response body as bytes; call .decode() on it to get a string
Younker answered 15/5, 2021 at 18:21 Comment(0)
