Deserializing JSON with fields that can be of multiple types with Serde [duplicate]

I have some JSON text data with fields that can be either strings or arrays of strings. Here are four possible examples:

{
        "keya": "some string",
        "keyb": "some string"
}


{
        "keya": "some string",
        "keyb": ["some string", "some string"]
}

{
        "keya": ["some string", "some string"],
        "keyb": "some string"
}

{
        "keya": ["some string", "some string"],
        "keyb": ["some string", "some string"]
}

How can I create a type that allows me to deserialize such JSON text data using Serde?

Posthaste answered 29/11, 2022 at 14:38 Comment(1)
How much of this is just an example? Could your JSON have any structure, or is it restricted to the options listed here? Gilligan

This answer predates the extra requirements given as comments by the OP, but is left as-is because the extra requirements make it a separate question.


To handle trailing commas

The input data originally provided in the question was not valid JSON, because it had a trailing comma before every closing brace }. If you must work with trailing commas, then the conventional serde_json crate doesn't suit your needs, and you may want to replace all usages of serde_json with a crate that supports trailing commas, such as the json5 crate. The json5 crate provides an API similar to that of serde_json, so the rest of this answer still applies.

To handle fields that can be of multiple types

JSON fields with multiple possible value types can be handled with an enum that holds either a String or a Vec<String> and carries the #[serde(untagged)] attribute. With that attribute, Serde tries each variant in declaration order and keeps the first one that deserializes successfully. See Enum representations in the official Serde documentation for details about the attribute.

Full example:

use serde::{Serialize, Deserialize};

// Untagged: the JSON value itself decides which variant is used.
#[derive(Debug, Serialize, Deserialize)]
#[serde(untagged)]
enum StringOrStringVec {
    String(String),
    Vec(Vec<String>),
}

#[derive(Debug, Serialize, Deserialize)]
struct MyObj {
    keya: StringOrStringVec,
    keyb: StringOrStringVec,
}

fn main() {
    // keya is a string, keyb is an array of strings.
    let input_json = r#"
        {
            "keya": "some string",
            "keyb": ["some string", "some string"]
        }
    "#;
    let my_obj: MyObj = serde_json::from_str(input_json).unwrap();
    println!("{:?}", my_obj);

    // The other way around: keya is an array, keyb is a string.
    let input_json = r#"
        {
            "keya": ["some string", "some string"],
            "keyb": "some string"
        }
    "#;
    let my_obj: MyObj = serde_json::from_str(input_json).unwrap();
    println!("{:?}", my_obj);
}

Example output:

MyObj { keya: String("some string"), keyb: Vec(["some string", "some string"]) }
MyObj { keya: Vec(["some string", "some string"]), keyb: String("some string") }

See it in action on Rust Playground

Calamint answered 29/11, 2022 at 14:43 Comment(6)
Hi, the trailing commas were a mistake and I've removed them. You might want to edit your answer? Posthaste
Thanks for the above. My problem is that I don't always know what the key name will be; this is GraphQL related. Posthaste
Then use a HashMap<String, StringOrStringVec> or simply a serde_json::Value instead of MyObj. Also, that's a separate question. Wilt
I'm going to leave this answer as-is, since varying key names makes it a different question, which should be asked separately. The enum and attribute part of this answer still applies, and as @Wilt said, using a HashMap is a potential solution. Calamint
If I follow the StringOrStringVec example, I then can't iterate over the result when it's an array. If I take a row, even when I know it is a Vec<String> (it's GraphQL, so I can check the type), how can I implement to_iter? Posthaste
Use match on the enum to handle the case where it is a String and the case where it is a Vec<String>. If you can make sure that a value of type StringOrStringVec is of the variant StringOrStringVec::Vec, then you may panic! on the other match arm that can never be reached. Calamint
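
A minimal sketch of the match-based approach from the last comment: instead of panicking on the String arm, it treats a single string as a one-element slice, so both variants can be iterated the same way. The as_slice method is a hypothetical helper, not part of the answer, and the Serde derives are left out so that the snippet compiles with the standard library alone.

```rust
// Same shape as the enum in the answer, minus the serde derives.
#[derive(Debug)]
enum StringOrStringVec {
    String(String),
    Vec(Vec<String>),
}

impl StringOrStringVec {
    // Hypothetical helper: view either variant as a slice of strings.
    fn as_slice(&self) -> &[String] {
        match self {
            // A single string becomes a slice of length one.
            StringOrStringVec::String(s) => std::slice::from_ref(s),
            StringOrStringVec::Vec(v) => v.as_slice(),
        }
    }
}

fn main() {
    let single = StringOrStringVec::String("a".to_string());
    let many = StringOrStringVec::Vec(vec!["b".to_string(), "c".to_string()]);

    // Both variants are iterated through the same slice interface.
    for s in single.as_slice().iter().chain(many.as_slice()) {
        println!("{s}");
    }
}
```

Returning a slice keeps the caller free to use iter(), len(), or indexing without caring which variant was deserialized.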

If you already use the either crate (with its serde feature enabled), you can define an "either a string or a vector of strings" type that deserializes transparently from JSON by using the either::serde_untagged module (https://docs.rs/either/1.10.0/either/serde_untagged/index.html):

struct with inner Either

use either::Either;
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, Debug)]
#[serde(transparent)]
struct StringOrVec {
    // Try each side of the Either in turn instead of expecting a tagged value.
    #[serde(with = "either::serde_untagged")]
    inner: Either<String, Vec<String>>,
}

Use it like this:


#[derive(Serialize, Deserialize, Debug)]
struct MyObj {
    keya: StringOrVec,
    keyb: StringOrVec,
}

The resulting output:

MyObj { keya: StringOrVec { inner: Left("some string") }, keyb: StringOrVec { inner: Right(["some string", "some string"]) } }
MyObj { keya: StringOrVec { inner: Right(["some string", "some string"]) }, keyb: StringOrVec { inner: Left("some string") } }

enum StringOrVec

Using Either is less convenient than the enum from @kotatsuyaki's answer, since the value ends up wrapped in an inner field with generic Left and Right variants. That's why I still prefer defining the type as an enum like this:

use serde::{Serialize, Deserialize};

#[derive(Debug, Serialize, Deserialize)]
#[serde(untagged)]
enum StringOrVec {
    String(String),
    Vec(Vec<String>)
}

Output:

MyObj { keya: String("some string"), keyb: Vec(["some string", "some string"]) }
MyObj { keya: Vec(["some string", "some string"]), keyb: String("some string") }
Ethnocentrism answered 16/2, 2024 at 16:0 Comment(0)
