How do I download a file using urllib.request in Python 3?

So, I'm messing around with urllib.request in Python 3 and am wondering how to write the result of getting an internet file to a file on the local machine. I tried this:

g = urllib.request.urlopen('http://media-mcw.cursecdn.com/3/3f/Beta.png')
with open('test.png', 'b+w') as f:
    f.write(g)

But I got this error:

TypeError: 'HTTPResponse' does not support the buffer interface

What am I doing wrong?

NOTE: I have seen this question, but it's related to Python 2's urllib2 which was overhauled in Python 3.

Equilibrant answered 6/4, 2013 at 1:25 Comment(1)
Possible duplicate of "Download file from web in Python 3" - Democratic

urlopen() returns an HTTPResponse object, but f.write() needs bytes. Change

f.write(g)

to

f.write(g.read())
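
For completeness, here is a minimal sketch of the corrected approach as a whole. The function name download is only an illustrative choice, not something urllib provides, and the example call is commented out so nothing is fetched on import.

```python
import urllib.request


def download(url, filename):
    """Fetch url and write the response body to filename."""
    # urlopen returns a response object, not bytes; read() returns the body.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # 'wb' opens the file for binary writing, which is what bytes require.
    with open(filename, 'wb') as f:
        f.write(data)


# Example call, using the URL and filename from the question:
# download('http://media-mcw.cursecdn.com/3/3f/Beta.png', 'test.png')
```

Note that read() loads the whole body into memory, which is fine for a small image but may not suit very large files.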
Wiatt answered 6/4, 2013 at 1:32 Comment(0)

I think an easier way (it only takes two lines) is to use urlretrieve:

import urllib.request
urllib.request.urlretrieve('http://media-mcw.cursecdn.com/3/3f/Beta.png', 'test.png')

As for the method you used: g = urllib.request.urlopen('http://media-mcw.cursecdn.com/3/3f/Beta.png') only opens the URL and returns a response object; it does not hand you the file's contents. You must call g.read(), g.readlines() or g.readline() to read it.

It's just like reading a normal file (apart from how you open it) and can be treated in a very similar way.
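
Because the response can be treated like a file opened for reading, one option is to copy it straight into a local file with shutil.copyfileobj. This is a sketch under that assumption; download_streaming is a hypothetical helper name chosen for illustration.

```python
import shutil
import urllib.request


def download_streaming(url, filename):
    """Copy the response body to filename without reading it all at once."""
    with urllib.request.urlopen(url) as response, open(filename, 'wb') as f:
        # copyfileobj reads from one file-like object and writes to another
        # in chunks, so the whole body is never held in memory together.
        shutil.copyfileobj(response, f)


# Example call, using the URL and filename from the question:
# download_streaming('http://media-mcw.cursecdn.com/3/3f/Beta.png', 'test.png')
```

This keeps the open-and-write structure of the original attempt while avoiding an explicit read() call.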

Bicapsular answered 21/8, 2017 at 21:39 Comment(8)
PEP 20 would have you use Request from urllib.request, but yours has one line less of code. Information about PEP 20 for Request. You can use open() chained to file.write(url.read()), as you mentioned. - Whitney
@Whitney Are you sure? The link says "Open the URL url, which can be either a string or a Request object." Here I passed a string, so I don't think Request is required in this case. - Bicapsular
That worked on Debian 9 using Python 3.5. I don't use 2.7 much. - Whitney
This doesn't work if you have to get around the 403: Forbidden issue using https://mcmap.net/q/167716/-urllib2-httperror-http-error-403-forbidden - Chirurgeon
@Sevenearths That's true. However, that's a different issue. Out of all the files I have used Python to download or read, only a handful have ever given me a 403 error. I don't think that is a big enough reason to avoid urlretrieve(). Obviously, if that issue is encountered, then what you have linked is the way forward. - Bicapsular
Interesting how experiences differ. While writing my app, the first URL I tried was https://medium.com/@tomaspueyo/coronavirus-the-hammer-and-the-dance-be9337092b56 and it gave me the 403: Forbidden. I wonder if it's just a Medium-related issue. - Chirurgeon
@Sevenearths 403 is a Forbidden error. This usually happens when a website (server) attempts to block a bot, or when you try to access a webpage with incorrect login/cert information (usually cookie related in my experience, such as passing outdated information). Seeing as the solution you linked sets a user agent, it strongly looks like that site attempts to block bots (which makes sense since it's a news site); a user agent makes the server think the request comes from a legitimate browser. - Bicapsular
@Sevenearths Personally, I usually use dedicated APIs (where this sort of thing never comes up, as they expect bots), which is probably why I don't encounter the problem much. - Bicapsular
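
The user-agent approach discussed in these comments can be sketched roughly as follows. Assumptions: build_request and download_with_user_agent are hypothetical helper names, and 'Mozilla/5.0' is only a placeholder value. Whether a given server accepts it, or still answers 403, depends entirely on that server.

```python
import urllib.request


def build_request(url, user_agent='Mozilla/5.0'):
    """Wrap url in a Request that carries a custom User-Agent header."""
    # 'Mozilla/5.0' is a placeholder; substitute whatever value you need.
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def download_with_user_agent(url, filename, user_agent='Mozilla/5.0'):
    """Fetch url with a custom User-Agent and save the body to filename."""
    req = build_request(url, user_agent)
    # urlopen accepts either a URL string or a Request object.
    with urllib.request.urlopen(req) as response, open(filename, 'wb') as f:
        f.write(response.read())
```

urlretrieve() can also be called after installing a customised opener, but passing a Request to urlopen() is the more direct way to attach a header to a single download.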
