I've rechecked my code and compared it with similar examples of opening a URL and passing the web data into Beautiful Soup, but for some reason my code just doesn't return anything, even though it looks like it's in the correct form:
>>> from bs4 import BeautifulSoup
>>> from urllib3 import poolmanager
>>> connectBuilder = poolmanager.PoolManager()
>>> content = connectBuilder.urlopen('GET', 'http://www.crummy.com/software/BeautifulSoup/')
>>> content
<urllib3.response.HTTPResponse object at 0x00000000032EC390>
>>> soup = BeautifulSoup(content)
>>> soup.title
>>> soup.title.name
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'name'
>>> soup.p
>>> soup.get_text()
''
>>> content.data
a stream of data follows...
As shown, urlopen() returns an HTTP response, which is captured by the variable content. It makes sense that I can read the status of the response, but after it's passed into Beautiful Soup, the web data doesn't seem to get converted into a Beautiful Soup object (the variable soup). You can see that I've tried to read a few tags and the text, and get_text() returns an empty string, which is strange.
Strangely, when I access the web data via content.data, the data does show up, but that isn't useful on its own since I can't use Beautiful Soup to parse it. What is my problem? Thanks.
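To be concrete, this is roughly how I'm inspecting the response object itself (attribute names as in urllib3's HTTPResponse; the slice length is arbitrary and only for display):
# The response object exposes the status and headers directly,
# and .data holds the downloaded body as raw bytes.
print(content.status)                        # e.g. 200
print(content.headers.get('Content-Type'))   # e.g. 'text/html'
print(content.data[:300])                    # first few hundred bytes of HTML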
The data did get converted into a BeautifulSoup object; otherwise, soup.title would have raised an exception rather than giving you None. A better way to tell is to print out type(soup). – Necrolatry
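To illustrate that point, here is a minimal sketch (not from the original post): parsing an empty document still yields a genuine BeautifulSoup object, and tag lookups on it simply return None.
from bs4 import BeautifulSoup

# Parsing an empty document still produces a real BeautifulSoup object.
soup = BeautifulSoup('', 'html.parser')
print(type(soup))   # <class 'bs4.BeautifulSoup'>
print(soup.title)   # None -- an empty document has no <title> tag to find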
urlopen is the lowest level. Unless you have some good reason, you do not want to use the lowest level, especially if you're just learning. That's why that same documentation recommends, at least twice, that you use one of the convenience methods. While you could learn all the nitty-gritty details of how urllib3 works under the covers, wouldn't you rather first learn how to use it the easy way, and write some working code you can play with to learn further? – Necrolatry
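For what it's worth, here is a minimal sketch of the higher-level route this comment points at, assuming urllib3's PoolManager.request() and the stock html.parser:
from bs4 import BeautifulSoup
import urllib3

http = urllib3.PoolManager()
# request() is the convenience method; by default it downloads the whole
# body for you, so the HTML ends up in resp.data as bytes.
resp = http.request('GET', 'http://www.crummy.com/software/BeautifulSoup/')

# Hand the raw bytes (not the response object) to Beautiful Soup.
soup = BeautifulSoup(resp.data, 'html.parser')
print(soup.title)        # the page's <title> tag
print(soup.title.name)   # 'title'
If urllib3's defaults behave the way its documentation describes, the body has already been read into .data by the time the response is returned, which would also explain why passing the response object itself to Beautiful Soup produced an empty soup.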