For various reasons I'm trying to switch from lxml.html.fromstring() to lxml.html.html5parser.document_fromstring(). The big difference between the two is that the first returns an lxml.html.HtmlElement, and the second returns an lxml.etree._Element.
Mostly this is OK, but when I run my code with the _Element object, it crashes with:
AttributeError: 'lxml.etree._Element' object has no attribute 'rewrite_links'
That makes sense. My question is: what's the best way to deal with this problem? I have a lot of code that expects HtmlElements, so I think the best solution is to convert to those, but I'm not sure that's possible.
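Converting without a serialization round trip does seem possible by rebuilding the tree node by node. This is a sketch under assumptions, not an lxml API: to_html_element is a hypothetical helper that copies tags, attributes, text, tails and comments into nodes created with lxml.html.Element, which builds them through lxml.html's parser and so gives them the HtmlElement class. It only copies the element it is given and its descendants (top-level siblings such as the doctype are not carried over), and it drops processing instructions while keeping their tail text. The demo uses etree.HTML() to get a plain _Element so it runs without html5lib installed; in the real code the input would come from html5parser.document_fromstring().

```python
import lxml.html
from lxml import etree


def to_html_element(el):
    """Copy a plain lxml.etree._Element subtree into lxml.html.HtmlElement nodes.

    Hypothetical helper (not part of lxml). Copies tags, attributes, text,
    tails and comments; processing instructions are dropped, but their tail
    text is kept.
    """
    # lxml.html.Element creates the node through lxml.html's parser, so it
    # gets the HtmlElement class (rewrite_links, cssselect, ...).
    new = lxml.html.Element(el.tag, dict(el.attrib))
    new.text = el.text
    for child in el:
        if isinstance(child.tag, str):
            copied = to_html_element(child)  # ordinary element: recurse
        elif child.tag is etree.Comment:
            copied = etree.Comment(child.text)
        else:
            # Processing instruction etc.: drop the node, but keep its tail
            # text by appending it to the previous sibling's tail or our text.
            if child.tail:
                if len(new):
                    new[-1].tail = (new[-1].tail or "") + child.tail
                else:
                    new.text = (new.text or "") + child.tail
            continue
        copied.tail = child.tail
        new.append(copied)
    return new


if __name__ == "__main__":
    # etree.HTML stands in for html5parser.document_fromstring here: both
    # return a plain _Element, and this one needs no html5lib install.
    plain = etree.HTML('<p><a href="/x">link</a><!-- note --> tail</p>')
    html_root = to_html_element(plain)
    html_root.rewrite_links(lambda link: "https://example.com" + link)
    print(type(plain).__name__, "->", type(html_root).__name__)
    print(lxml.html.tostring(html_root, encoding="unicode"))
```

The design choice here is to keep exactly the tree html5lib built, at the cost of copying every node in Python; nothing is re-parsed, so no second parser gets a chance to restructure the markup.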
Update
One terrible solution looks like this:
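from lxml.html import fromstring, tostring
from lxml.html import html5parser
# text holds the HTML source string (defined elsewhere)
e = html5parser.fromstring(text)  # parsed by html5lib, returns an lxml.etree._Element
html_element = fromstring(tostring(e))  # serialize, then re-parse with lxml.html to get an HtmlElement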
Obviously, that's pretty brute force, but it does work. I'm able to get an HtmlElement that's been parsed by the html5parser, which is what I'm after.
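If the round trip is the route taken, it can at least be packaged once so the rest of the code never sees an _Element. A minimal sketch, assuming the input is the root element returned by the html5 parser; reparse_as_html is a hypothetical name. One caveat worth knowing: the second parse is done by libxml2's HTML parser, not html5lib, so for unusual markup the resulting tree may not be identical to what html5lib produced.

```python
import lxml.html
from lxml import etree


def reparse_as_html(el):
    """Turn a plain lxml.etree._Element into an lxml.html.HtmlElement by
    serializing it and parsing the markup again with lxml.html.

    Hypothetical helper wrapping the round trip from the question. The second
    parse uses libxml2's HTML parser, not html5lib, so unusual markup may come
    back with a slightly different tree.
    """
    # lxml.html.tostring serializes with method="html" by default;
    # encoding="unicode" returns a str instead of bytes.
    markup = lxml.html.tostring(el, encoding="unicode")
    # document_fromstring returns the root <html> element of the new tree.
    return lxml.html.document_fromstring(markup)


if __name__ == "__main__":
    # etree.HTML stands in for html5parser.document_fromstring (no html5lib needed).
    plain = etree.HTML("<div><a href='a.html'>A</a><br>text</div>")
    html_root = reparse_as_html(plain)
    print(type(html_root).__name__, html_root.xpath("//a/@href"))
```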
The other option would be to work out how to do the rewrite_links and xpath queries that I rely on, but _Element objects don't seem to have a rewrite_links method (which, again, makes sense!)
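For the second option, xpath() is already defined on lxml.etree._Element, so only rewrite_links needs replacing. A rough stand-in could look like the sketch below. rewrite_links_plain is a hypothetical name, and it deliberately covers far less than lxml.html's rewrite_links: only the href and src attributes by default, with no handling of CSS url(...) values, <base href>, srcset and so on. It does mirror one behaviour of the real method: returning None from the callback removes the attribute.

```python
from lxml import etree


def rewrite_links_plain(root, link_repl_func, attributes=("href", "src")):
    """Apply link_repl_func to the given attributes of root and its descendants.

    Hypothetical, much smaller stand-in for HtmlElement.rewrite_links that
    works on any lxml.etree._Element.
    """
    for attr in attributes:
        # Attribute names are interpolated into the XPath expression, so pass
        # plain names only. "*" matches elements in any namespace.
        for el in root.xpath(f"descendant-or-self::*[@{attr}]"):
            old = el.get(attr)
            new = link_repl_func(old.strip())
            if new is None:
                # Like lxml.html's rewrite_links: None drops the attribute.
                del el.attrib[attr]
            elif new != old:
                el.set(attr, new)


if __name__ == "__main__":
    root = etree.HTML('<p><a href="/a">a</a><img src="/i.png"></p>')
    rewrite_links_plain(root, lambda link: "https://example.com" + link)
    print(etree.tostring(root, method="html", encoding="unicode"))
```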
etree.HTML()? – Acadian
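On the etree.HTML() suggestion: that function parses with libxml2's HTML parser rather than html5lib, and as far as I can tell it also hands back plain lxml.etree._Element objects, so it would not provide rewrite_links either. A quick way to check, comparing it with lxml.html.document_fromstring on the same made-up snippet:

```python
import lxml.html
from lxml import etree

SAMPLE = "<p><a href='/x'>x</a></p>"

# etree.HTML: libxml2's HTML parser with lxml.etree's default element classes.
via_etree = etree.HTML(SAMPLE)
# lxml.html.document_fromstring: libxml2's HTML parser too, but configured
# with lxml.html's element class lookup.
via_lxml_html = lxml.html.document_fromstring(SAMPLE)

for name, root in (("etree.HTML", via_etree), ("lxml.html", via_lxml_html)):
    print(name, type(root).__name__, hasattr(root, "rewrite_links"))
```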