How to convert Nokogiri Document object into JSON
Asked Answered
L

3

21

I have some parsed Nokogiri::XML::Document objects that I want to print as JSON.

I can go the route of making it a string, parsing it into a hash, with active-record or Crack and then Hash.to_json; but that is both ugly and depending on way too manay libraries.

Is there not a simpler way?

As per request in the comment, for example the XML <root a="b"><a>b</a></root> could be represented as JSON:

<root a="b"><a>b</a></root> #=> {"root":{"a":"b"}}
<root foo="bar"><a>b</a></root> #=> {"root":{"a":"b","foo":"bar"}}

That is what I get with Crack now too. And, sure, collisions between entities and child-tags are a potential problem, but I build most of the XML myself, so it is easiest for me to avoid these collisions alltogether :)

Lunneta answered 25/6, 2011 at 13:25 Comment(2)
What is the "correct"/desired JSON representation for <root a="b"><a>b</a></root>?Carol
It does not really matter. I will update the question to explain that.Lunneta
C
15

Here's one way to do it. As noted by my comment, the 'right' answer depends on what your output should be. There is no canonical representation of XML nodes in JSON, and hence no such capability is built into the libraries involved:

require 'nokogiri'
require 'json'
class Nokogiri::XML::Node
  def to_json(*a)
    {"$name"=>name}.tap do |h|
      kids = children.to_a
      h.merge!(attributes)
      h.merge!("$text"=>text) unless text.empty?
      h.merge!("$kids"=>kids) unless kids.empty?
    end.to_json(*a)
  end
end
class Nokogiri::XML::Document
  def to_json(*a); root.to_json(*a); end
end
class Nokogiri::XML::Text
  def to_json(*a); text.to_json(*a); end
end
class Nokogiri::XML::Attr
  def to_json(*a); value.to_json(*a); end
end

xml = Nokogiri::XML '<root a="b" xmlns:z="zzz">
  <z:a>Hello <b z:x="y">World</b>!</z:a>
</root>'
puts xml.to_json
{
  "$name":"root",
  "a":"b",
  "$text":"Hello World!",
  "$kids":[
    {
      "$name":"a",
      "$text":"Hello World!",
      "$kids":[
        "Hello ",
        {
          "$name":"b",
          "x":"y",
          "$text":"World",
          "$kids":[
            "World"
          ]
        },
        "!"
      ]
    }
  ]
}

Note that the above completely ignores namespaces, which may or may not be what you want.


Converting to JsonML

Here's another alternative that converts to JsonML. While this is a lossy conversion (it does not support comment nodes, DTDs, or namespace URLs) and the format is a little bit "goofy" by design (the first child element is at [1] or [2] depending on whether or not attributes are present), it does indicate namespace prefixes for elements and attributes:

require 'nokogiri'
require 'json'
class Nokogiri::XML::Node
  def namespaced_name
    "#{namespace && "#{namespace.prefix}:"}#{name}"
  end
end
class Nokogiri::XML::Element
  def to_json(*a)
    [namespaced_name].tap do |parts|
      unless attributes.empty?
        parts << Hash[ attribute_nodes.map{ |a| [a.namespaced_name,a.value] } ]
      end
      parts.concat(children.select{|n| n.text? ? (n.text=~/\S/) : n.element? })
    end.to_json(*a)
  end
end
class Nokogiri::XML::Document
  def to_json(*a); root.to_json(*a); end
end
class Nokogiri::XML::Text
  def to_json(*a); text.to_json(*a); end
end
class Nokogiri::XML::Attr
  def to_json(*a); value.to_json(*a); end
end

xml = Nokogiri::XML '<root a="b" xmlns:z="zzz">
  <z:a>Hello <b z:x="y">World</b>!</z:a>
</root>'
puts xml.to_json
#=> ["root",{"a":"b"},["z:a","Hello ",["b",{"z:x":"y"},"World"],"!"]]
Carol answered 27/6, 2011 at 15:44 Comment(0)
C
50

This one works for me:

Hash.from_xml(@nokogiri_object.to_xml).to_json

This method is using active support, so if you are not using rails then include active support core extensions manually:

require 'active_support/core_ext/hash'
Commanding answered 29/5, 2012 at 6:22 Comment(5)
Using that example, this is what is produced {"root":{"foo":"bar","a":"b"}}Conga
Hash.from_xml is from the gem 'activesupport'Minnow
this is the better answerOleaceous
To be specific, use require 'active_support/core_ext/hash' to require the relevant methodsGumption
OMG! Best answer!Asternal
C
15

Here's one way to do it. As noted by my comment, the 'right' answer depends on what your output should be. There is no canonical representation of XML nodes in JSON, and hence no such capability is built into the libraries involved:

require 'nokogiri'
require 'json'
class Nokogiri::XML::Node
  def to_json(*a)
    {"$name"=>name}.tap do |h|
      kids = children.to_a
      h.merge!(attributes)
      h.merge!("$text"=>text) unless text.empty?
      h.merge!("$kids"=>kids) unless kids.empty?
    end.to_json(*a)
  end
end
class Nokogiri::XML::Document
  def to_json(*a); root.to_json(*a); end
end
class Nokogiri::XML::Text
  def to_json(*a); text.to_json(*a); end
end
class Nokogiri::XML::Attr
  def to_json(*a); value.to_json(*a); end
end

xml = Nokogiri::XML '<root a="b" xmlns:z="zzz">
  <z:a>Hello <b z:x="y">World</b>!</z:a>
</root>'
puts xml.to_json
{
  "$name":"root",
  "a":"b",
  "$text":"Hello World!",
  "$kids":[
    {
      "$name":"a",
      "$text":"Hello World!",
      "$kids":[
        "Hello ",
        {
          "$name":"b",
          "x":"y",
          "$text":"World",
          "$kids":[
            "World"
          ]
        },
        "!"
      ]
    }
  ]
}

Note that the above completely ignores namespaces, which may or may not be what you want.


Converting to JsonML

Here's another alternative that converts to JsonML. While this is a lossy conversion (it does not support comment nodes, DTDs, or namespace URLs) and the format is a little bit "goofy" by design (the first child element is at [1] or [2] depending on whether or not attributes are present), it does indicate namespace prefixes for elements and attributes:

require 'nokogiri'
require 'json'
class Nokogiri::XML::Node
  def namespaced_name
    "#{namespace && "#{namespace.prefix}:"}#{name}"
  end
end
class Nokogiri::XML::Element
  def to_json(*a)
    [namespaced_name].tap do |parts|
      unless attributes.empty?
        parts << Hash[ attribute_nodes.map{ |a| [a.namespaced_name,a.value] } ]
      end
      parts.concat(children.select{|n| n.text? ? (n.text=~/\S/) : n.element? })
    end.to_json(*a)
  end
end
class Nokogiri::XML::Document
  def to_json(*a); root.to_json(*a); end
end
class Nokogiri::XML::Text
  def to_json(*a); text.to_json(*a); end
end
class Nokogiri::XML::Attr
  def to_json(*a); value.to_json(*a); end
end

xml = Nokogiri::XML '<root a="b" xmlns:z="zzz">
  <z:a>Hello <b z:x="y">World</b>!</z:a>
</root>'
puts xml.to_json
#=> ["root",{"a":"b"},["z:a","Hello ",["b",{"z:x":"y"},"World"],"!"]]
Carol answered 27/6, 2011 at 15:44 Comment(0)
L
0

If you are trying to convert a SOAP request to REST, this one works too:

require 'active_support/core_ext/hash'
require 'nokogiri'

xml_string = "<root a=\"b\"><a>b</a></root>"
doc = Nokogiri::XML(xml_string)
Hash.from_xml(doc.to_s)
Lota answered 22/7, 2020 at 3:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.