How does one remove <![CDATA[ ]]> tags from around text in XML using Hpricot?

I just want the text out of there, without those tags. Does Hpricot.XML have any methods for this?

Ardith answered 22/8, 2010 at 19:13 Comment(0)
A
7

Use element.inner_text instead of #inner_html; it removes the CDATA tags for you.

Ardith answered 22/8, 2010 at 19:24 Comment(1)
You will probably want #inner_text.strip to get rid of the (almost guaranteed) extraneous whitespace. - Wheels
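The difference between a node's serialized markup and its text value can be tried out without Hpricot installed. The following is a minimal sketch that uses REXML (shipped with Ruby) as a stand-in, not Hpricot itself; the sample XML and the `plain_text` helper are hypothetical and only illustrate the idea behind `inner_text.strip`.

```ruby
require 'rexml/document'

# Hypothetical sample document: a title whose text is wrapped in CDATA
# and padded with whitespace, as is common in feeds.
SAMPLE = <<~EOS
  <videos>
    <video>
      <title>
        <![CDATA[Teen catches 800-pound gator]]>
      </title>
    </video>
  </videos>
EOS

# Hypothetical helper: join the values of all text and CDATA children
# of the element, then strip the surrounding whitespace.
def plain_text(element)
  element.texts.map(&:value).join.strip
end

doc = REXML::Document.new(SAMPLE)
title = doc.elements['videos/video/title']

puts title.to_s         # serialized markup, CDATA markers included
puts plain_text(title)  # text only
```

In Hpricot the equivalent of `plain_text(title)` would be the `inner_text.strip` call described above.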
P
2
# Swap each CDATA node that the search yields for its plain text content.
doc.search("*") do |element|
  element.swap element.content if element.kind_of? Hpricot::CData
end
Princess answered 22/8, 2010 at 19:31 Comment(0)
P
1
require 'hpricot'
require 'open-uri'

doc = Hpricot::XML(open('http://www.cnn.com/.element/ssi/www/auto/2.0/video/xml/most_popular.xml'))
(doc/:cnn_video/:video).each do |status|
  ['tease_txt'].each do |el|
    # inner_text returns the element's text without the CDATA wrapper
    puts status.at(el).inner_text
  end
end

Example output (looks spammy but this is not spam!):

New Reno air crash video shows impact
Teen catches 800-pound gator
Resuming careers post 'don't ask' repeal
Creepy skirt peepers
Bus-sized satellite to hit Earth thi ...
'DWTS' cast hits ballroom for first time
What caused trainer's death at SeaWorld?
What led to Troy Davis clemency denial?

Prig answered 20/9, 2011 at 18:27 Comment(0)
