How to download a file over http with authorization in python 3.0, working around bugs?

Asked 27/12, 2008 at 21:45 Answered 25/3, 2021 at 9:51

I have a script that I'd like to continue using, but it looks like I either have to find some workaround for a bug in Python 3, or downgrade back to 2.6, and thus having to downgrade other scripts as well...

Hopefully someone here have already managed to find a workaround.

The problem is that due to the new changes in Python 3.0 regarding bytes and strings, not all the library code is apparently tested.

I have a script that downloades a page from a web server. This script passed a username and password as part of the url in python 2.6, but in Python 3.0, this doesn't work any more.

For instance, this:

import urllib.request;
url = "http://username:password@server/file";
urllib.request.urlretrieve(url, "temp.dat");

fails with this exception:

Traceback (most recent call last):
  File "C:\Temp\test.py", line 5, in <module>
    urllib.request.urlretrieve(url, "test.html");
  File "C:\Python30\lib\urllib\request.py", line 134, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python30\lib\urllib\request.py", line 1476, in retrieve
    fp = self.open(url, data)
  File "C:\Python30\lib\urllib\request.py", line 1444, in open
    return getattr(self, name)(url)
  File "C:\Python30\lib\urllib\request.py", line 1618, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "C:\Python30\lib\urllib\request.py", line 1576, in _open_generic_http
    auth = base64.b64encode(user_passwd).strip()
  File "C:\Python30\lib\base64.py", line 56, in b64encode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str

Apparently, base64-encoding now needs bytes in and outputs a string, and thus urlretrieve (or some code therein) which builds up a string of username:password, and tries to base64-encode this for simple authorization, fails.

If I instead try to use urlopen, like this:

import urllib.request;
url = "http://username:password@server/file";
f = urllib.request.urlopen(url);
contents = f.read();

Then it fails with this exception:

Traceback (most recent call last):
  File "C:\Temp\test.py", line 5, in <module>
    f = urllib.request.urlopen(url);
  File "C:\Python30\lib\urllib\request.py", line 122, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python30\lib\urllib\request.py", line 359, in open
    response = self._open(req, data)
  File "C:\Python30\lib\urllib\request.py", line 377, in _open
    '_open', req)
  File "C:\Python30\lib\urllib\request.py", line 337, in _call_chain
    result = func(*args)
  File "C:\Python30\lib\urllib\request.py", line 1082, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python30\lib\urllib\request.py", line 1051, in do_open
    h = http_class(host, timeout=req.timeout) # will parse host:port
  File "C:\Python30\lib\http\client.py", line 620, in __init__
    self._set_hostport(host, port)
  File "C:\Python30\lib\http\client.py", line 632, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: 'password@server'

Apparently the url parsing in this "next gen url retrieval library" doesn't know what to do with username and passwords in the url.

What other choices do I have?

Subfusc answered 27/12, 2008 at 21:45 Comment(0)

Direct from the Py3k docs: http://docs.python.org/dev/py3k/library/urllib.request.html#examples

import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')

Saccharase answered 27/12, 2008 at 22:4 Comment(2)

did you mean to post that password? If not, then I suggest deleting the answer and posting a new one with dummy data there. Thanks for the answer though, this looks promising. – Subfusc 27/12, 2008 at 23:11

Klem is probably pretty pissed if that's his real password though :) – Saccharase 27/12, 2008 at 23:38

My advice would be to maintain your 2.* branch as your production branch until you can get the 3.0 stuff sorted.

I am going to wait a while before moving over to Python 3.0. There seems a lot of people in a rush, but I just want everything sorted out, and a decent selection of third-party libraries. This may take a year, it may take 18 months, but the pressure to "upgrade" is really low for me.

Chapbook answered 28/12, 2008 at 1:21 Comment(0)

You can use requests.get to download file. Try the sample code:

import requests
from requests.auth import HTTPBasicAuth

def download_file(user_name, user_pwd, url, file_path):
    file_name = url.rsplit('/', 1)[-1]
    with requests.get(url, stream = True, auth = HTTPBasicAuth(user_name, user_pwd)) as response:
        with open(file_path + "/" + file_name, 'wb') as f:
            for chunk in response.iter_content(chunk_size = 8192):
                f.write(chunk)

# You will download the login.html file to /home/dan/
download_file("dan", "password", "http://www.example.com/login.html", "/home/dan/")

Enjoy it!!

Same as: https://mcmap.net/q/1171482/-getting-a-file-from-an-authenticated-site-with-python-urllib-urllib2

Sheffie answered 25/3, 2021 at 9:51 Comment(0)

Recommended topics

Hot tags