I managed to work around this with both an HTML::Document and an HTML::DocumentFragment.
For background, I'm using Nokogiri to parse and modify "templates", "partials" and/or "components" of HTML. This means that the files I encounter are not valid HTML documents. They are, instead, pieces of an HTML document that gets put together by the framework I'm using.
For reference, HTML::Document adds the <!DOCTYPE> declaration and also wraps your content in <html> and <body> elements if they are not already present. Similarly, in my case HTML::DocumentFragment wrapped my fragment in a <p> element.
Rather than spending too much time digging into the Nokogiri library code to understand where these additional elements were being added, I decided to accept this opinionated implementation and work around it.
Solution
Here's how I write out my modified HTML:
html_str = doc.xpath("//body").children.to_html(encoding: 'UTF-8')
File.open(_filename, 'w') {|f| f.write(html_str)}
Final Word
This seems harder than it should be. I even tried using the SaveOptions setting save_with: Nokogiri::XML::Node::SaveOptions::NO_DECLARATION to no avail.
In any case, while this solution is a bit kludgey for my liking, it works.