mixed XML decoding in golang preserving order
I need to extract offers from an XML document while preserving the order of the nodes:

<items>
  <offer/>
  <product>
    <offer/>
    <offer/>
  </product>
  <offer/>
  <offer/>
</items>

The following struct decodes the values, but into two separate slices, so the original order is lost:

type Offers struct {
    Offers   []offer `xml:"items>offer"`
    Products []offer `xml:"items>product>offer"`
}

Any ideas?

Biddy answered 24/8, 2015 at 16:19 Comment(2)
Do not unmarshal the XML: decode it element by element with an xml.Decoder by calling its Token method. (Sorry, I do not have an example at hand.) - Battologize
...or use XPath to query your document for all nodes named "offer" located under the element "items". XPath works reasonably well on short-to-mid-sized documents; otherwise I'd go with what @Battologize proposed. - Limitation

One way would be to implement the UnmarshalXML method yourself. Let's say our input looks like this:

<doc>
    <head>My Title</head>
    <p>A first paragraph.</p>
    <p>A second one.</p>
</doc>

We want to deserialize the document and preserve the order of the head and paragraphs. For order we will need a slice. To accommodate both head and p, we will need an interface. We could define our document like this:

type Document struct {
    XMLName  xml.Name `xml:"doc"`
    Contents []Mixed  `xml:",any"`
}

The ,any option collects every child element that is not matched by another field into Contents. The slice elements are of type Mixed, which we still need to define:

type Mixed struct {
    Type  string      // just keep "head" or "p" in here
    Value interface{} // keep the value, we could use string here, too
}

We need more control over the deserialization process, so we turn Mixed into an xml.Unmarshaler by implementing UnmarshalXML. We choose the code path based on the name of the start element, e.g. head or p. Here, we only populate our Mixed struct with some values, but you can do basically anything at this point:

func (m *Mixed) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    switch start.Name.Local {
    case "head", "p":
        var e string
        if err := d.DecodeElement(&e, &start); err != nil {
            return err
        }
        m.Value = e
        m.Type = start.Name.Local
    default:
        return fmt.Errorf("unknown element: %s", start.Name.Local)
    }
    return nil
}

Putting it all together, usage of the above structs could look like this:

func main() {
    s := `
    <doc>
        <head>My Title</head>
        <p>A first paragraph.</p>
        <p>A second one.</p>
    </doc>
    `

    var doc Document
    if err := xml.Unmarshal([]byte(s), &doc); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("#%v", doc)
}   

This prints:

#{{ doc} [{head My Title} {p A first paragraph.} {p A second one.}]}

We preserved order and kept some type information. Instead of a single type like Mixed, you could use many different types for the deserialization. The cost of this approach is that the values in your container (here the Value field of each entry in Contents) are stored as an interface{}. To do anything element-specific, you'll need a type assertion or some helper method.
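
Such a helper could look roughly like the sketch below. The Text method is a hypothetical name, not part of the code above, and the Mixed type is repeated only to keep the example self-contained.

```go
package main

import "fmt"

// Mixed mirrors the struct defined earlier in this answer.
type Mixed struct {
	Type  string
	Value interface{}
}

// Text is a hypothetical helper: it returns the value as a string and
// reports whether the type assertion succeeded.
func (m Mixed) Text() (string, bool) {
	s, ok := m.Value.(string)
	return s, ok
}

func main() {
	contents := []Mixed{
		{Type: "head", Value: "My Title"},
		{Type: "p", Value: "A first paragraph."},
		{Type: "p", Value: 42}, // deliberately not a string
	}
	for _, m := range contents {
		if s, ok := m.Text(); ok {
			fmt.Printf("%s: %s\n", m.Type, s)
		} else {
			fmt.Printf("%s: unexpected value of type %T\n", m.Type, m.Value)
		}
	}
}
```

The two-value form of the assertion avoids a panic when an element carries something other than a string.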

Complete code on play: https://play.golang.org/p/fzsUPPS7py

Pocosin answered 9/8, 2016 at 12:38 Comment(0)
