Dealing with namespaces while parsing XML in Go

3

6

I am trying to parse a piece of XML in Go:

package main

import (
    "encoding/xml"
    "fmt"
)

type XML struct {
    Foo string `xml:"foo"`
}

func main() {
    rawXML := []byte(`
<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`)

    x := new(XML)
    xml.Unmarshal(rawXML, x)
    fmt.Printf("foo: %s\n", x.Foo)
}

This outputs:

foo: B

While I expected it to produce:

foo: A

How do I get the content of the first foo tag (i.e. the one without a namespace)?

Foothold answered 3/1, 2013 at 19:33 Comment(1)
I think this may be a bug in Go itself: codereview.appspot.com/6868044 and code.google.com/p/go/issues/detail?id=3526. See the goplay example in the second link. I believe it is the reverse of the issue you face, but if it fails in reverse it likely fails in your direction too. Hebbe
10

I don't think the XML decoder lets you specify, via struct tags, that an element must have no namespace. It can, however, record each element's namespace for you, so you can post-process the data afterwards to get the same result:

package main

import (
    "encoding/xml"
    "fmt"
)

type Foo struct {
    XMLName xml.Name
    Data string `xml:",chardata"`
}

type XML struct {
    Foo []Foo `xml:"foo"`
}

func main() {
    rawXML := []byte(`
<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`)

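    x := new(XML)
    if err := xml.Unmarshal(rawXML, x); err != nil {
        fmt.Println("unmarshal error:", err)
        return
    }
    // Keep only the <foo> elements that carry no namespace.
    for _, el := range x.Foo {
        if el.XMLName.Space == "" {
            fmt.Printf("non-namespaced foo: %q\n", el.Data)
        }
    }
}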

http://play.golang.org/p/aDEFPmHPc0

Tedmann answered 4/1, 2013 at 7:18 Comment(0)
3

Your XML document contains two foo elements in sequence, but your struct has room for only one value. The XML parser stores the first one and then overwrites it with the second.

Change Foo to a slice in the struct and then you'll get both values.

http://play.golang.org/p/BRgsuMQ7rK

package main

import (
    "encoding/xml"
    "fmt"
)

type XML struct {
    Foo []string `xml:"foo"`
}

func main() {
    rawXML := []byte(`
<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`)

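    x := new(XML)
    if err := xml.Unmarshal(rawXML, x); err != nil {
        fmt.Println("unmarshal error:", err)
        return
    }
    // Foo holds both values in document order.
    fmt.Printf("foo: %s\n", x.Foo[0])
    fmt.Printf("both: %v\n", x.Foo)
}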
Misfit answered 3/1, 2013 at 19:44 Comment(1)
Is it possible to get just the one without a namespace? (The example is contrived; in the real thing there are a lot of tags with various namespaces in unspecified order, and I need just the one that does not have any.) Foothold
1

The xml:"foo" struct tag accepts an optional namespace, as in xml:"ns foo". A tag without a namespace matches foo in any namespace, and the syntax offers no way to select only elements that have no namespace.

One workaround is to set xml.Decoder.DefaultSpace, which assigns a namespace to elements that have none, so that you can then select them with the xml:"<ns> <tag>" syntax:

https://play.golang.org/p/1UggvqLFT9x

package main

import (
    "encoding/xml"
    "fmt"
    "strings"
)

type Doc struct {
    Foo string `xml:"_ foo"` // <-- <foo> will now be <_:foo>
    NsFoo string `xml:"ns foo"`
}

var input = `<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`

func main() {
    decoder := xml.NewDecoder(strings.NewReader(input))
    decoder.DefaultSpace = "_"

    doc := &Doc{}
    if err := decoder.Decode(doc); err != nil {
        fmt.Println("decode error:", err)
        return
    }

    fmt.Printf("<foo>: %#v\n", doc.Foo)
    fmt.Printf("<ns:foo>: %#v\n", doc.NsFoo)
}

Prints:

<foo>: "A"
<ns:foo>: "B"
Wheezy answered 11/6, 2020 at 6:21 Comment(0)
