How do I get Nokogiri to add the right XML encoding?
H

3

21

I have created an XML document with Nokogiri (a Nokogiri::XML::Document).

The header of my file is <?xml version="1.0"?>, but I'd expect <?xml version="1.0" encoding="UTF-8"?>. Is there an option I can use so that the encoding appears?

Hilarius answered 7/12, 2010 at 21:19 Comment(0)
V
43

Are you using Nokogiri XML Builder? You can pass an encoding option to the new() method:

new(options = {})

Create a new Builder object. options are sent to the top level Document that is being built.

Building a document with a particular encoding for example:

  Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
    ...
  end

Also this page says you can do the following (when not using Builder):

doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')

Presumably you could change 'EUC-JP' to 'UTF-8'.

Vet answered 7/12, 2010 at 22:21 Comment(1)
It's funny that this has been one of my most highly upvoted answers. I have never used Nokogiri or Ruby, just XML and Google search. - Vet
C
5

When parsing the doc you can set the encoding like this:

doc = Nokogiri::XML::Document.parse(xml_input, nil, "UTF-8")

For me, the serialized document then starts with <?xml version="1.0" encoding="UTF-8"?>

Capacious answered 7/12, 2010 at 21:55 Comment(1)
In fact, I do not parse an existing file but create a new one using Nokogiri::XML::Document.new. - Hilarius
A
0

If you're not using Nokogiri::XML::Builder but rather creating a document object directly, you can just set the encoding with Document#encoding=:

doc = Nokogiri::XML::Document.new
# => #<Nokogiri::XML::Document:0x1180 name="document">
puts doc.to_s
# <?xml version="1.0"?>
doc.encoding = 'UTF-8'
puts doc.to_s
# <?xml version="1.0" encoding="UTF-8"?>
Aloysius answered 9/3, 2021 at 22:30 Comment(0)
