What should I use to open a url instead of urlopen in urllib3

Asked 9/4, 2016 at 11:33 Answered 12/2, 2021 at 11:37

python web-scraping beautifulsoup urllib3

I wanted to write a piece of code like the following:

from bs4 import BeautifulSoup
import urllib2

url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url)
soup = BeautifulSoup(html)

But I found that I have to install the urllib3 package now.

Moreover, I couldn't find any tutorial or example showing how to rewrite the code above. For example, urllib3 does not have urlopen.

Any explanation or example, please?!

P/S: I'm using python 3.4.

Felicific answered 9/4, 2016 at 11:33 Comment(4)

why do you have to install urllib3 when the example works? – Civics 9/4, 2016 at 11:40

Because it doesn't work for me, no urllib2 found. – Felicific 9/4, 2016 at 11:55

@Felicific In Python 3.4, urllib2 was folded into urllib (as urllib.request and urllib.error). from urllib.request import urlopen should work for this case. – Lunsford 25/10, 2016 at 19:3

Don't use urllib3. Do this: import urllib.request urllib.request.urlopen('https://...') – Flashback 12/2, 2021 at 11:39

urllib3 is a different library from urllib and urllib2. It offers a number of features that the urllib modules in the standard library lack, such as re-using connections, if you need them. The documentation is here: https://urllib3.readthedocs.org/

If you'd like to use urllib3, you'll need to pip install urllib3. A basic example looks like this:

from bs4 import BeautifulSoup
import urllib3

http = urllib3.PoolManager()

url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data, 'html.parser')  # name the parser explicitly to avoid bs4's warning
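
For anything beyond a one-off script, it may be worth setting a timeout and a retry policy on the pool and catching urllib3.exceptions.MaxRetryError, which urllib3 raises once its retries are used up. A minimal sketch; the function name fetch_html and the specific timeout and retry values are my own illustrative choices, not recommendations from the urllib3 docs:

```python
import urllib3
from urllib3.exceptions import MaxRetryError
from urllib3.util import Retry, Timeout

# One PoolManager can be shared across requests so connections are re-used.
# The timeout and retry numbers below are arbitrary illustrative values.
http = urllib3.PoolManager(
    timeout=Timeout(connect=2.0, read=5.0),
    retries=Retry(total=2, backoff_factor=0.1),
)


def fetch_html(url):
    """Return the response body as bytes, or None if the request fails."""
    try:
        response = http.request('GET', url)
    except MaxRetryError as exc:
        # Raised after the retry budget is used up (DNS failure, refused
        # connection, timeout, ...); exc.reason holds the underlying error.
        print('request failed:', exc.reason)
        return None
    if response.status != 200:
        print('unexpected HTTP status:', response.status)
        return None
    return response.data
```

If fetch_html(url) returns bytes rather than None, they can be passed to BeautifulSoup the same way as response.data.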

Noway answered 9/4, 2016 at 18:33 Comment(2)

response.read() does not work at least in Python 2.7. As per the documentation urllib3.readthedocs.io/en/latest/user-guide.html it should be html = response.data. – Prizefight 12/2, 2017 at 3:57

This example gives me an exception (urllib3.exceptions.MaxRetryError) in Python 3. – Redshank 20/6, 2021 at 20:22

You do not have to install urllib3. You can choose any HTTP library that fits your needs and feed the response to BeautifulSoup. The usual choice, though, is requests because of its rich feature set and convenient API. You can install requests by running pip install requests on the command line. Here is a basic example:

from bs4 import BeautifulSoup
import requests

url = "url"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")
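
In practice you may also want a timeout and a status check, since requests.get has no timeout by default and does not raise on 4xx/5xx responses on its own. A rough sketch; the helper name fetch_page and the 10-second timeout are assumptions of mine:

```python
import requests


def fetch_page(url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx responses into requests.HTTPError.
        response.raise_for_status()
    except requests.RequestException as exc:
        # Base class for connection errors, timeouts, HTTP errors
        # and malformed URLs.
        print("request failed:", exc)
        return None
    return response.content
```

After checking for None, the result can be handed to BeautifulSoup(..., "html.parser") just like response.content.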

Sumner answered 9/4, 2016 at 11:50 Comment(3)

FWIW, you still need to install requests if you want to use requests. Neither of them comes with Python. – Noway 9/4, 2016 at 18:30

Requests depends on urllib3. – Indocile 27/7, 2018 at 12:54

@CeesTimmerman I tried requests without urllib3 and it works, so why does it depend on urllib3? – Redshank 20/6, 2021 at 20:25

The new urllib3 library has nice documentation here.
To get your desired result, you should do the following:

import urllib3
from bs4 import BeautifulSoup

url = 'http://www.thefamouspeople.com/singers.php'

http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'), 'html.parser')

The "decode utf-8" part is optional. It worked without it when I tried, but I posted the option anyway.
Source: User Guide
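
On the decoding step: BeautifulSoup accepts raw bytes and works out the encoding itself, which is why the decode is optional. If you do want a string first, hard-coding 'utf-8' can garble pages served in another encoding. One option, sketched below, is to read the charset from the Content-Type header and fall back to UTF-8. The helper name decode_body is hypothetical, and the header parsing uses only the standard library:

```python
from email.message import Message


def decode_body(data, content_type=None, default='utf-8'):
    """Decode raw bytes using the charset named in a Content-Type header."""
    charset = None
    if content_type:
        # Let the stdlib parse e.g. 'text/html; charset=ISO-8859-1'.
        msg = Message()
        msg['Content-Type'] = content_type
        charset = msg.get_content_charset()
    try:
        return data.decode(charset or default)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: fall back to the
        # default and replace anything that cannot be decoded.
        return data.decode(default, errors='replace')
```

With urllib3 this could be called as decode_body(response.data, response.headers.get('Content-Type')).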

Inconsecutive answered 7/11, 2017 at 13:57 Comment(2)

Is requests simply using urllib3 behind the scenes? – Caracalla 10/4, 2018 at 13:12

@Caracalla It is. – Dissimulation 13/1, 2019 at 19:3

With gazpacho you could pipeline the page straight into a parse-able soup object:

from gazpacho import Soup
url = "http://www.thefamouspeople.com/singers.php"
soup = Soup.get(url)

And run finds on top of it:

soup.find("div")

Philpot answered 9/10, 2020 at 20:41 Comment(0)

In urllib3 there's no .urlopen; instead, try this:

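import requests

url = 'http://www.thefamouspeople.com/singers.php'  # URL from the question
html = requests.get(url)  # a Response object; the markup is in html.text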

Philae answered 10/1, 2021 at 11:52 Comment(0)

You should use urllib.request, not urllib3.

import urllib.request   # not urllib - important!
urllib.request.urlopen('https://...')
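
A slightly fuller sketch of the same approach, using only the standard library. The helper name fetch_bytes, the timeout, and the User-Agent string are assumptions of mine; the header is there because some sites reject urllib's default user agent:

```python
import urllib.request


def fetch_bytes(url, timeout=10):
    """Return the raw response body for url using only the standard library."""
    request = urllib.request.Request(
        url,
        headers={'User-Agent': 'Mozilla/5.0 (example script)'},  # hypothetical value
    )
    # The with-block closes the response once the body has been read.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

The returned bytes can be passed to BeautifulSoup(fetch_bytes(url), 'html.parser') in place of the urllib2 call from the question.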

Flashback answered 12/2, 2021 at 11:37 Comment(0)
