I'm trying to parse an HTML fragment that contains a custom HTML tag using Nokogiri.
Example:
string = "<div>hello</div>\n<custom-tag></custom-tag>"
I tried to load it in many ways, but none is optimal.
If I use Nokogiri::HTML:
doc = Nokogiri::HTML(string)
When I use to_html, it adds a doctype and an html tag that wraps the content. It's undesired.
If I use Nokogiri::XML:
doc = Nokogiri::XML(string)
I get Error at line 2: Extra content at the end of the document, since in XML there must be a single root tag that wraps all the document content. If I try to save this content again, the output is <div>hello</div> (every tag after the first is removed).
I also tried Nokogiri::HTML.fragment:
doc = Nokogiri::HTML.fragment(string)
But it complains about the custom-tag.
How can I make Nokogiri parse this HTML fragment correctly?
custom-tag. I need to make a few xpath queries, edit the content, and serialize back to html without errors. – Arak
doc = Nokogiri::HTML(string).inner_html? – Savagism
doc.errors. Should I just ignore them? How can I be sure that the content will be intact? @AmitSharma inner_html seems to work the same as to_html... – Arak
inner_html does not add a doctype but it wraps the content with html – Savagism
html tag as well. This is not my request. I want to parse the content and save it back without any change. – Arak