Error in reading html to data frame in Python “html5lib not found”
T

4

23

I've come across the following error about html5lib when trying to read an HTML table into a data frame.

Here is the code:

!pip install html5lib
!pip install lxml
!pip install beautifulsoup4

import html5lib
import lxml
from bs4 import BeautifulSoup
import pandas as pd

table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

This is the error:

ImportError                               Traceback (most recent call last)
<ipython-input-68-e24654a0a301> in <module>()
----> 1 table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)
    913                   thousands=thousands, attrs=attrs, encoding=encoding,
    914                   decimal=decimal, converters=converters, na_values=na_values,
--> 915                   keep_default_na=keep_default_na)

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parse(flavor, io, match, attrs, encoding, **kwargs)
    737     retained = None
    738     for flav in flavor:
--> 739         parser = _parser_dispatch(flav)
    740         p = parser(io, compiled_match, attrs, encoding)
    741 

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parser_dispatch(flavor)
    680     if flavor in ('bs4', 'html5lib'):
    681         if not _HAS_HTML5LIB:
--> 682             raise ImportError("html5lib not found, please install it")
    683         if not _HAS_BS4:
    684             raise ImportError(

ImportError: html5lib not found, please install it

Any help would be much appreciated. Thanks.

Tektite answered 1/3, 2018 at 3:36 Comment(0)
G
24

As the error message says, html5lib is not installed in the environment pandas is running in. Run:

pip install html5lib

in your terminal.


If you are installing from inside a Jupyter notebook (as you did with !pip), restart the kernel afterwards so that the newly installed packages are picked up.
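
One way to confirm whether the running kernel can actually see the packages is to ask the interpreter itself. This is only a minimal sketch, assuming the three parser packages from the question; the helper name `missing_parsers` is made up for illustration.

```python
import importlib.util
import sys


def missing_parsers(modules=("html5lib", "bs4", "lxml")):
    """Return the module names that this interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# The path printed here is the Python that the notebook kernel is using.
print("Interpreter in use:", sys.executable)
print("Missing parser modules:", missing_parsers())
```

If nothing is reported missing but the error persists, pandas was probably imported before the install, so restarting the kernel is still worth trying.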

Gimcrack answered 1/3, 2018 at 4:22 Comment(2)
Are you running your code in a Jupyter notebook? If yes, have you tried restarting the kernel? – Gimcrack
Yes, I'm using Jupyter. I just restarted the kernel and it runs fine now. Thanks Yilun ;) – Tektite
C
1

I had this exact error show up while trying to read a saved .htm file in the Spyder IDE.

This code raised the html5lib error:

import pandas as pd
df = pd.read_html("F:\xxxx\xxxxx\xxxxx\aaaa.htm")

I knew I had html5lib installed and working correctly because other scripts that used it worked.

It turned out the file path needed to be a raw string literal (an r in front of the opening quote), so that the backslashes in the Windows path are not interpreted as escape sequences.

This code works for me:

import pandas as pd
df = pd.read_html(r"F:\xxxx\xxxxx\xxxxx\aaaa.htm")
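
To see why the r prefix matters, it can help to compare the two literals directly. The path below is hypothetical, chosen only because \t and \n happen to be escape sequences; it is not the path from this answer.

```python
from pathlib import PureWindowsPath

# Hypothetical path: \t becomes a tab and \n a newline in a normal literal.
plain = "C:\temp\new\table.htm"
# With the r prefix every backslash is kept as a literal character.
raw = r"C:\temp\new\table.htm"

print(repr(plain))  # shows the tab and newline characters that crept in
print(repr(raw))    # shows the path exactly as typed
print(PureWindowsPath(raw).name)  # file name component of the raw path
```

Forward slashes, or doubling each backslash, avoid the same problem.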
Cecilia answered 18/8, 2022 at 13:17 Comment(0)
Y
0

I ran into this error when I gave the wrong path to the local file I was trying to open. So also be sure that you're pointing to the right place!
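
A quick guard like the following can rule out a bad path before pandas is involved at all. It is a sketch only; `existing_file` and the file name in the usage comment are hypothetical.

```python
from pathlib import Path


def existing_file(path):
    """Return the path as a string, or raise if no file exists there."""
    p = Path(path).expanduser()
    if not p.is_file():
        # Show the absolute location so a wrong working directory is obvious.
        raise FileNotFoundError(f"No such file: {p.resolve()}")
    return str(p)


# Usage (hypothetical file name):
# import pandas as pd
# tables = pd.read_html(existing_file("saved_page.htm"))
```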

Yvor answered 8/4, 2022 at 17:57 Comment(0)
J
0

On my MacBook I used the following to install it:

python3 -m pip install html5lib

I also upgraded pip itself using:

python3.11 -m pip install --upgrade pip

Once that was done, the problem was solved.
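
Running pip through a specific interpreter, as above, installs into that interpreter's environment. The same idea can be sketched from inside Python by building the command around `sys.executable`; the helper name `pip_install_command` is hypothetical, and the actual install is left commented out.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("html5lib")
print(" ".join(cmd))

# To actually run the install, uncomment the next line:
# subprocess.check_call(cmd)
```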

Jabalpur answered 17/4 at 19:53 Comment(0)