Read XML file into struct

I am trying to write a program that reads an XML file into a previously defined Rust struct.

Something like this:

<?xml version="1.0" encoding="UTF-8"?>
<note name="title">
  <body name="main_body">
    <layer content_type="something" count="99">
      <data id="13">
        Datacontent
      </data>
    </layer>
  </body>
</note>

Into this:

struct Note {
    name: String,
    body: Body,
}

struct Body {
    name: String,
    layers: Vec<Layer>,
}

struct Layer {
    content_type: String,
    count: u8,
    data: Vec<Data>,
}

struct Data {
    id: u8,
    // Datacontent?
}

I looked at xml-rs because it currently appears to be the most popular XML library. Being new to Rust, I have a hard time figuring out how to perform this task.

Oliver answered 22/6, 2016 at 14:1 Comment(0)

Rust has great support for automatically generating (de)serialization code. There's the legacy rustc-serialize which requires very little setup. Then there's the serde crate which is a completely new (de)serialization framework that allows many formats and detailed custom configurations, but requires a little more initial setup.

I'm going to describe how to use serde + serde_xml_rs to deserialize the XML into the Rust structs.

Add the crates to your Cargo.toml

We could either implement the deserialization code manually, or we can generate it automatically by using the serde_derive crate.

[dependencies]
serde_derive = "1.0"
serde = "1.0"
serde-xml-rs = "0.3.1"

Add annotations to your structs

Serde needs to know about your structs. Rather than generating code for every struct in your project, it only does so for the structs you annotate. The Debug derive is there so we can print the structs with println! and check that everything worked. The Deserialize derive is what tells serde to generate the deserialization code. If you want to treat the contents of an XML tag as text, you need to rename the field that should contain the text to $value. The name $value was chosen arbitrarily when the serde_xml_rs crate was created, but it can never collide with an actual field, because field names can't contain $ signs.

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_xml_rs;

#[derive(Deserialize, Debug)]
struct Note {
    name: String,
    body: Body,
}

#[derive(Deserialize, Debug)]
struct Body {
    name: String,
    #[serde(rename="layer")]
    layers: Vec<Layer>,
}

#[derive(Deserialize, Debug)]
struct Layer {
    content_type: String,
    count: u8,
    data: Vec<Data>,
}

#[derive(Deserialize, Debug)]
struct Data {
    id: u8,
    #[serde(rename="$value")]
    content: String,
}

Turn a String containing xml into an object

Now comes the easy part. You call serde_xml_rs::from_reader on the bytes of your string (releases before 0.3 called this function deserialize) and you get either an error or a value of type Note:

fn main() {
    let note: Note = serde_xml_rs::from_reader(r##"
<?xml version="1.0" encoding="UTF-8"?>
<note name="title">
  <body name="main_body">
    <layer content_type="something" count="99">
      <data id="13">
        Datacontent
      </data>
    </layer>
  </body>
</note>
    "##.as_bytes()).unwrap();
    println!("{:#?}", note);
}
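
To make the mapping concrete, here is a hand-built value that the parsed note would be expected to match: attributes become plain fields, repeated child elements become Vec entries, and the text inside <data> ends up in content. This is only a sketch using the standard library. The helper name expected_note is made up for illustration, and it assumes the parser trims the whitespace around Datacontent, which may depend on the crate version and its configuration.

```rust
// Plain copies of the structs above, without the serde derives, so this
// compiles with the standard library alone.
#[derive(Debug, PartialEq)]
struct Note {
    name: String,
    body: Body,
}

#[derive(Debug, PartialEq)]
struct Body {
    name: String,
    layers: Vec<Layer>,
}

#[derive(Debug, PartialEq)]
struct Layer {
    content_type: String,
    count: u8,
    data: Vec<Data>,
}

#[derive(Debug, PartialEq)]
struct Data {
    id: u8,
    content: String,
}

// Hypothetical helper: builds by hand the value the XML is expected to map to.
fn expected_note() -> Note {
    Note {
        name: "title".to_string(),
        body: Body {
            name: "main_body".to_string(),
            layers: vec![Layer {
                content_type: "something".to_string(),
                count: 99,
                data: vec![Data {
                    id: 13,
                    // Assumption: the parser trims surrounding whitespace.
                    content: "Datacontent".to_string(),
                }],
            }],
        },
    }
}

fn main() {
    println!("{:#?}", expected_note());
}
```

Comparing the real parse result against a value like this (after adding PartialEq to the derived structs) is one way to check that the renames are doing what you expect.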
Narcosynthesis answered 22/6, 2016 at 15:36 Comment(2)
I had to add this to Cargo.toml, instead: serde-xml-rs = "0.2.1" - Ka
The explanatory text for parsing refers to serde_xml::from_str whereas the code uses serde_xml_rs::deserialize. (Has one been edited without the other?) The deserialize method was removed in serde_xml_rs 0.3, but you can parse bytes in the current version 0.4.0 with from_reader instead (because &[u8] implements Read). But what with the shifting API and its general eagerness to choke on normal XML stuff such as a Unicode BOM or DOCTYPE declaration at the start of the file, serde_xml_rs doesn't seem particularly fit for purpose yet. serde_xml is abandoned. - Daedal
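
The remark that &[u8] implements Read can be checked without any XML crate. In the sketch below, read_all is a made-up function standing in for any API that takes a reader. It is not part of serde_xml_rs, and it only shows why passing the result of as_bytes() type-checks.

```rust
use std::io::Read;

// Hypothetical stand-in for any function that accepts a reader,
// in the way a from_reader-style API does.
fn read_all<R: Read>(mut reader: R) -> std::io::Result<String> {
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(text)
}

fn main() {
    let xml = r#"<data id="13">Datacontent</data>"#;
    // `as_bytes` yields a `&[u8]`, which implements `Read`, so an in-memory
    // string can be passed wherever a reader is expected.
    let text = read_all(xml.as_bytes()).unwrap();
    assert_eq!(text, xml);
    println!("{}", text);
}
```

The same reasoning applies to a std::fs::File, which also implements Read, so a reader-based function can take either an in-memory string or an opened file.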
