Find comment or text nodes in a document fragment

I have to clean up a Nokogiri::HTML::DocumentFragment by removing all comment nodes and all text nodes that contain only whitespace. Here's an example:

html = "<p>paragraph</p><!-- comment --><p>paragraph</p>   <p>paragraph</p>"
doc = Nokogiri::HTML::DocumentFragment.parse html

The document fragment looks as you'd expect:

#(DocumentFragment:0x3fc65f9f5870 {
  name = "#document-fragment",
  children = [
    #(Element:0x3fc65f9f5064 { name = "p", children = [ #(Text "paragraph")] }),
    #(Comment " comment "),
    #(Element:0x3fc65f9f4f60 { name = "p", children = [ #(Text "paragraph")] }),
    #(Text "   "),
    #(Element:0x3fc65f9f4e48 { name = "p", children = [ #(Text "paragraph")] })
  ]
})

How can I find all comment or all text nodes in this document fragment?

Neither of the following works, because doc is a document fragment rather than a full document:

doc.search('//text()')
doc.search('//comment()')
Unhallow answered 24/11, 2016 at 13:27

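Figured it out: the XPath needs a leading dot, so that the search is relative to the fragment itself instead of starting at the document root: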

doc.search('.//text()')
doc.search('.//comment()')
Unhallow answered 24/11, 2016 at 13:40
