BeautifulSoup - TypeError: 'NoneType' object is not callable

I need to make my code backwards compatible with Python 2.6 and BeautifulSoup 3. It was written with Python 2.7 and BS4. When I run it on a Debian Squeeze server, which has Python 2.6 and BS3, I get this error:

try:
    from bs4 import BeautifulSoup
except ImportError:
    from BeautifulSoup import BeautifulSoup

gmp = open(fname, 'r')
soup = BeautifulSoup(gmp)
p = soup.body.div.find_all('p')

p = soup.body.div.find_all('p')
TypeError: 'NoneType' object is not callable

If I change to:

   p = soup.body.div.findAll('p')

then I get this error:

p = soup.body.div.findAll('p')
TypeError: 'NoneType' object is not callable

Update with the traceback of the thrown error:

  File "/home/user/openerp/7.0/addons/my_module/models/gec.py", line 401, in parse_html_data
    p = soup.body.div.findAll('p') #used findAll instead of find_all for backwards compatability to bs3 version
TypeError: 'NoneType' object is not callable

Both approaches work on my Ubuntu machine with Python 2.7 and BS4, but neither works on Squeeze. Is there another difference between those versions that I am missing and that causes this error?

Decrescent answered 28/10, 2014 at 8:5 Comment(2)
There is no point in falling back to from BeautifulSoup import BeautifulSoup (version 3) when using version 4 only syntax. - Abscission
As I wrote, I also tried the backwards-compatible syntax and still got the same error. - Decrescent

You are using BeautifulSoup 3, but are using BeautifulSoup 4 syntax.

Your fallback is at fault here:

try:
    from bs4 import BeautifulSoup
except ImportError:
    from BeautifulSoup import BeautifulSoup

If you want to use either version 3 or 4, stick to version 3 syntax:

p = soup.body.div.findAll('p')

because find_all is not a valid method in BeautifulSoup 3, so it is instead interpreted as a tag search. There is no find_all tag in your HTML, so None is returned, which you then try to call.
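
The mechanism can be reproduced without BeautifulSoup. The following is a minimal sketch, not the real BeautifulSoup 3 code: TinyTag is a hypothetical stand-in class that, like the behaviour described above, treats any unknown attribute as a child-tag lookup and returns None when no such child exists.

```python
class TinyTag(object):
    """Hypothetical stand-in for a BeautifulSoup 3 tag; not the real class."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def findAll(self, name):
        # A real method, so normal attribute lookup finds it.
        return [child for child in self.children if child.name == name]

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails. The unknown name
        # is treated as a child tag name; None means "no such child tag".
        for child in self.children:
            if child.name == attr:
                return child
        return None


div = TinyTag("div", [TinyTag("p"), TinyTag("p")])
body = TinyTag("body", [div])

# findAll is a real method, so this works and finds both <p> children.
matches = body.div.findAll("p")
print(len(matches))

# find_all is not defined, so it is looked up as a tag name and yields None.
print(body.div.find_all)

# Calling that None reproduces the error message from the question.
try:
    body.div.find_all("p")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The same lookup rule is why a misspelled method name on a tag tends to fail with this TypeError rather than with an AttributeError.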

Next, the parser used by BeautifulSoup 3 responds differently to broken or incomplete HTML. If you have lxml installed on Ubuntu, BeautifulSoup 4 uses it as the default parser, and it inserts a missing <body> tag for you. BeautifulSoup 3 may leave that out.

I strongly urge you to remove the fallback instead, and stick with BeautifulSoup version 4 only. Version 3 was discontinued years ago and contains unfixed bugs. BeautifulSoup 4 also offers additional features you may want to make use of.

BeautifulSoup is pure Python and easily installed into a virtual environment on any platform supported by Python. You are not tied to the system-supplied package here.

On Debian Squeeze, for example, you'd be stuck with BeautifulSoup 3.1.0, and even the BeautifulSoup developers do not want you to use it! Your problem with findAll almost certainly stems from using that release.
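
If the fallback import has to stay for now, it may help to at least report which release was actually loaded. A rough sketch under stated assumptions: load_beautifulsoup and is_discouraged are hypothetical helper names, and the check assumes the imported module exposes a __version__ string (both bs4 and the BeautifulSoup 3 module define one).

```python
def load_beautifulsoup():
    """Return (BeautifulSoup class, version string), or (None, None)."""
    try:
        import bs4
        return bs4.BeautifulSoup, bs4.__version__
    except ImportError:
        pass
    try:
        import BeautifulSoup as bs3  # BeautifulSoup 3 module name
        return bs3.BeautifulSoup, bs3.__version__
    except ImportError:
        return None, None


def is_discouraged(version):
    # The 3.1 series is the one the developers asked people not to use.
    return version is not None and version.startswith("3.1.")


soup_class, version = load_beautifulsoup()
if soup_class is None:
    print("BeautifulSoup is not installed")
elif is_discouraged(version):
    print("BeautifulSoup %s is a discouraged release" % version)
else:
    print("Using BeautifulSoup %s" % version)
```

Logging the version this way makes it obvious on each server which code path the try/except import actually took.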

Abscission answered 28/10, 2014 at 8:9 Comment(14)
Well, my intention was to use bs4. But Squeeze only has bs3 and I need the code to work there too. Why does findAll not work when it falls back to bs3, that is, when I use bs3? - Decrescent
@Andrius: I was about to post this on your question: what is the full traceback of the exception thrown for findAll()? Are you sure you copied the correct exception message there (it is the same as for find_all)? - Abscission
I just copy/pasted the exact error I get when using findAll. - Decrescent
@Andrius: interesting, because I cannot reproduce that with BeautifulSoup 3.2.1. I do have a Squeeze system; since 3.1.0 isn't available on PyPI anymore, I'll try to install it there. - Abscission
@Andrius: this looks like a 3.1.0-specific problem. That release was quickly replaced by the 3.2 series, and the developers told everyone not to use 3.1 back in 2009. Do not use that release, and you won't have problems. - Abscission
@Andrius: I am not, however, able to reproduce your exact error; that probably depends on the exact HTML you used. - Abscission
The exact version on Squeeze is 3.1.0.1. I don't have sudo rights on that server, so I can only ask for packages to be installed, and they might be even older than they should be. So I might need to go with virtualenv; it's just that the company that administers the server is a bit odd about it. They said they would not be responsible (they provide the server support) if anything went wrong on the server because we used virtualenv, and they recommend using global packages (without virtualenv) so that they can administer it "safely". - Decrescent
@Andrius: you don't need sudo to add a package next to your script. You can just put it in the same directory and it'll work. - Abscission
So is there a newer version that can be installed on Squeeze? They installed only this version, and Squeeze is supposed to be stable... - Decrescent
@Andrius: just like you can put your Python script on that server, you can put other Python files there too. No compilation or installation required. - Abscission
@Andrius: you can also install packages in ~/.local/lib/python2.6/site-packages and Python will automatically pick those up too. If you use pip, you can use pip install --user beautifulsoup4 to install into that directory. - Abscission
When I try the --user option I get a "no such option" error. I also tried --user=myuser, with the same error. Without the option, pip tries to install globally and I get a permission denied error. - Decrescent
@Andrius: oh the joys of outdated software. I tested with pip on a Debian Squeeze box, and the option was there, but I didn't check if I installed pip from a Debian-supplied package. You can just create the ~/.local/lib/python2.6/site-packages directories, then download BeautifulSoup 4, unpack the tarball and put the bs4 directory and everything in it inside ~/.local/lib/python2.6/site-packages. - Abscission
@Andrius: back at my desk I checked the pip version I have on the Debian Squeeze box, and it is a backport. I installed python-pip 1.0-1~bpo60+1 from squeeze-backports; the 0.7.2 version included in Squeeze is way too old to be much use. - Abscission
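
The manual install described in the comments above relies on Python's per-user site-packages directory. A minimal sketch, using only the standard library, that prints that directory and checks whether the running interpreter has it on sys.path; the helper names user_site_dir and is_on_sys_path are made up for illustration.

```python
import os
import site
import sys


def user_site_dir():
    # site.getusersitepackages() exists on Python 2.7+ and 3.x;
    # Python 2.6 only exposes the site.USER_SITE attribute.
    getter = getattr(site, "getusersitepackages", None)
    return getter() if getter is not None else site.USER_SITE


def is_on_sys_path(path):
    # Compare absolute paths so relative sys.path entries still match.
    wanted = os.path.abspath(path)
    return wanted in [os.path.abspath(entry) for entry in sys.path]


target = user_site_dir()
print("user site-packages: %s" % target)
print("directory exists:   %s" % os.path.isdir(target))
print("on sys.path:        %s" % is_on_sys_path(target))
```

Python only adds the per-user directory to sys.path at startup if it already exists, so create it before launching the interpreter that should import bs4 from it.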

I'm aware this is a 6-year-old post, but I'm posting this in case someone has a similar issue.

The string passed to requests.get on line 9 is supposed to be a formatted string (f-string). The code works fine after adding the f prefix.

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

product_all_pages = []

for i in range(1,15):
    response = requests.get(f"https://www.bol.com/nl/s/?page={i}&searchtext=hand+sanitizer&view=list")
    content = response.content
    parser = BeautifulSoup(content, 'html.parser')
    body = parser.body
    producten = body.find_all(class_="product-item--row js_item_root")
    product_all_pages.extend(producten)
print(len(product_all_pages))

price = float(product_all_pages[1].meta.get('content'))
productname = product_all_pages[1].find(class_="product-title--inline").a.getText()
print(price)
print(productname)

productlijst = []

for item in product_all_pages:
    if item.find(class_="product-prices").getText() == '\nNiet leverbaar\n':
        price = None
    else:
        price = float(item.meta['content'])
    product = item.find(class_="product-title--inline").a.getText()
    productlijst.append([product, price])
    
print(productlijst[:3])

df = pd.DataFrame(productlijst, columns=["Product", "price"])
print(df.shape)
print(df["price"].describe())
Binah answered 4/5, 2021 at 17:18 Comment(0)
