A better way to remove blank lines after Nokogiri Node removal

Perhaps this is nitpicky, but I have to ask.

I'm using Nokogiri to parse XML, remove certain tags, and write over the original file with the results. Using .remove leaves blank lines in the XML. I'm currently using a regex to get rid of the blank lines. Is there some built-in Nokogiri method I should be using?

Here's what I have:

require 'nokogiri'
io_path = "/path/to/metadata.xml"
io = File.read(io_path)
document = Nokogiri::XML(io)
document.xpath('//artwork_files', '//tracks', '//previews').remove

# write to file and remove blank lines with a regular expression
File.open(io_path, 'w') do |x|
  x << document.to_s.gsub(/\n\s+\n/, "\n")
end
Sisneros answered 24/11, 2009 at 20:5 Comment(2)
I don't know a method using Nokogiri, but I can tell you that your regular expression is wrong. It will only remove single blank lines, not multiple consecutive blank lines. I think this will work better: gsub(/^\s*\n/, "") - Kerf
Ah. Good point. So far I've only had to deal with single blank lines (even if the node takes up multiple lines), so it works fine. Perhaps if I alter the script to remove multiple lines it will no longer work. Thanks for pointing this out. - Sisneros
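
A quick plain-Ruby check of the two patterns discussed above, as a minimal sketch. The method names and sample strings are made up for illustration and are not from the asker's file. In this check both patterns collapse a run of whitespace-only lines (since `\s` also matches `\n`); the visible difference is a single completely empty line, which only the anchored pattern removes.

```ruby
# Pattern from the question: needs at least one whitespace character
# between two newlines.
def strip_blank_lines_original(text)
  text.gsub(/\n\s+\n/, "\n")
end

# Pattern from the comment: anchored at the start of every line.
def strip_blank_lines_suggested(text)
  text.gsub(/^\s*\n/, "")
end

# Hypothetical samples.
whitespace_run = "<root>\n  <a/>\n  \n  \n  \n  <b/>\n</root>\n"
single_empty   = "<a/>\n\n<b/>\n"

# Both patterns give the same result on the run of whitespace-only lines.
puts strip_blank_lines_original(whitespace_run) == strip_blank_lines_suggested(whitespace_run) # true

# Only the anchored pattern removes a lone, completely empty line.
puts strip_blank_lines_original(single_empty).inspect   # "<a/>\n\n<b/>\n"
puts strip_blank_lines_suggested(single_empty).inspect  # "<a/>\n<b/>\n"
```

Note that either pattern runs over the whole serialized document, so it would also touch blank lines inside text content, if the file has any.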

There is no built-in method, but we can add one:

class Nokogiri::XML::Document
  def remove_empty_lines!
    xpath("//text()").each do |text|
      text.content = text.content.gsub(/\n(\s*\n)+/, "\n")
    end
    self
  end
end
Ibson answered 6/12, 2009 at 19:26 Comment(1)
This isn't working for me. :( I tried it here, but it did not remove the blank lines, so I removed that part from the answer. Any help? - Cockerham

This removed blank lines for me:

doc.xpath('//text()').find_all {|t| t.to_s.strip == ''}.map(&:remove)
Lenoralenore answered 19/1, 2017 at 11:12 Comment(1)
You could just use ...each(&:remove) since you're not interested in the final result. - Sidwel

Doing a substitution on each text node didn't work for me either. The problem is that after removing nodes, text nodes that just became adjacent don't get merged. When you loop over text nodes, each one has only a single newline, but there are now several of them in a row.

One rather messy solution I found was to reparse the document:

xml = Nokogiri::XML.parse xml.to_xml

Now adjacent text nodes will be merged and you can do regexes on them.

But this looks like it's probably a better option:

https://github.com/tobym/nokogiri-pretty

Positronium answered 20/11, 2014 at 16:30 Comment(0)