What should I use to open a url instead of urlopen in urllib3

6

74

I wanted to write a piece of code like the following:

from bs4 import BeautifulSoup
import urllib2

url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url)
soup = BeautifulSoup(html)

But I found that I now have to install the urllib3 package.

Moreover, I couldn't find any tutorial or example showing how to rewrite the above code; for example, urllib3 does not have urlopen.

Could anyone provide an explanation or an example?

P.S. I'm using Python 3.4.

Felicific answered 9/4, 2016 at 11:33 Comment(4)
Why do you have to install urllib3 when the example works? - Civics
Because it doesn't work for me; no urllib2 found. - Felicific
@Felicific In Python 3, urllib2 was merged into urllib. from urllib.request import urlopen should work for this case. - Lunsford
Don't use urllib3. Do this: import urllib.request, then urllib.request.urlopen('https://...') - Flashback
N
62

urllib3 is a different library from urllib and urllib2. It offers many features beyond those of the urllib modules in the standard library, such as connection re-use, if you need them. The documentation is here: https://urllib3.readthedocs.org/

If you'd like to use urllib3, you'll need to pip install urllib3. A basic example looks like this:

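from bs4 import BeautifulSoup
import urllib3

# A PoolManager handles connection pooling and re-use across requests
http = urllib3.PoolManager()

url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
# response.data holds the raw response body as bytes
soup = BeautifulSoup(response.data, 'html.parser')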
Noway answered 9/4, 2016 at 18:33 Comment(2)
response.read() does not work, at least in Python 2.7. As per the documentation (urllib3.readthedocs.io/en/latest/user-guide.html), it should be html = response.data. - Prizefight
This example gives me an exception (urllib3.exceptions.MaxRetryError) in Python 3. - Redshank
S
40

You do not have to install urllib3. You can choose any HTTP library that fits your needs and feed the response to BeautifulSoup. The usual choice, though, is requests, because of its rich feature set and convenient API. You can install requests by entering pip install requests on the command line. Here is a basic example:

from bs4 import BeautifulSoup
import requests

url = "url"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")
Sumner answered 9/4, 2016 at 11:50 Comment(3)
FWIW, you still need to install requests if you want to use requests. Neither of them comes with Python. - Noway
Requests depends on urllib3. - Indocile
@CeesTimmerman I tried requests without urllib3 and it works; why does it depend on urllib3? - Redshank
I
13

The new urllib3 library has nice documentation.
To get your desired result, you should do the following:

import urllib3
from bs4 import BeautifulSoup

url = 'http://www.thefamouspeople.com/singers.php'

http = urllib3.PoolManager()
response = http.request('GET', url)
# response.data is bytes; decode it to get a str
soup = BeautifulSoup(response.data.decode('utf-8'), 'html.parser')

The decode('utf-8') part is optional. It worked without it when I tried, but I included it anyway.
Source: User Guide

Inconsecutive answered 7/11, 2017 at 13:57 Comment(2)
Is requests simply using urllib3 behind the scenes? - Caracalla
@Caracalla It is. - Dissimulation
P
0

With gazpacho you could pipeline the page straight into a parse-able soup object:

from gazpacho import Soup
url = "http://www.thefamouspeople.com/singers.php"
soup = Soup.get(url)

And run finds on top of it:

soup.find("div")
Philpot answered 9/10, 2020 at 20:41 Comment(0)
P
0

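In urllib3 there's no urlopen function; instead, try this:

import requests
url = 'http://www.thefamouspeople.com/singers.php'  # URL from the question
html = requests.get(url)  # a Response object; html.text holds the page source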
Philae answered 10/1, 2021 at 11:52 Comment(0)
F
-1

You should use urllib.request, not urllib3.

import urllib.request   # not urllib - important!
urllib.request.urlopen('https://...')
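
For completeness, here is a minimal sketch of how the urllib2 code from the question might look with only the standard library on Python 3. The helper name fetch_html, the timeout value and the fallback to UTF-8 are my own assumptions for illustration, not something urllib prescribes:

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10.0):
    """Fetch a URL and return its body as text (hypothetical helper)."""
    # urlopen returns a response object that can be used as a context manager
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, if any;
        # falling back to UTF-8 is an assumption, not a guarantee
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can then be handed to BeautifulSoup as in the other answers, e.g. BeautifulSoup(fetch_html(url), "html.parser").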
Flashback answered 12/2, 2021 at 11:37 Comment(0)
