Deserialize JSON list of hex strings as bytes
Iʼm trying to read a JSON stream, part of which looks like

  "data": [
    "c1a8f800a4393e0cacd05a5bc60ae3e0",
    "bbac4013c1ca3482155b584d35dac185",
    "685f237d4fcbd191c981b94ef6986cde",
    "a08898e81f1ddb6612aa12641b856aa9"
  ]

(There are more entries in the data list and each entry is longer, but this should be illustrative; both the length of the list and the length of each hex string are known at compile time.)

Ideally Iʼd want a single [u8; 64] (the actual size is known at compile time), or failing that, a Vec<u8>, but I imagine itʼs easier to deserialize it as a Vec<[u8; 16]> and merge them later. However, Iʼm having trouble doing even that.

The hex crate has a way to deserialize a single hex string as a Vec or array of u8, but I canʼt figure out how to tell Serde to do that for each entry of the list. Is there a simple way to do that Iʼm overlooking, or do I need to write my own list deserializer?

Hermaphrodite answered 7/1, 2022 at 0:6 Comment(4)
Your arrays have length 32, not 16. - Creel
@ChayimFriedman They do! 32 hex digits = 16 bytes after deserialization. - Hermaphrodite
Oops, my mistake. - Creel
You can easily wrap the deserialize() function of hex::serde with a newtype Hash that implements Deserialize. - Creel
S
10

Serde has the power to use serializers and deserializers from other crates in a nested fashion using #[serde(with = "...")]. Since hex has a serde feature, this can be easily done.

Here is a simple example using serde_json and hex.

Cargo.toml

[dependencies]
serde = { version = "1.0.133", features = ["derive"] }
serde_json = "1.0.74"
hex = { version = "0.4", features = ["serde"] }

main.rs

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize, Debug)]
struct MyData {
    data: Vec<MyHex>,
}

#[derive(Serialize, Deserialize, Debug)]
#[serde(transparent)]
struct MyHex {
    #[serde(with = "hex::serde")]
    hex: Vec<u8>,
}


fn main() -> Result<()> {
    let data = r#"
    {
        "data": [
            "c1a8f800a4393e0cacd05a5bc60ae3e0",
            "bbac4013c1ca3482155b584d35dac185",
            "685f237d4fcbd191c981b94ef6986cde",
            "a08898e81f1ddb6612aa12641b856aa9"
        ]
    }
    "#;

    let my_data: MyData = serde_json::from_str(data)?;

    println!("{:?}", my_data); // MyData { data: [MyHex { hex: [193, 168, 248, 0, 164, 57, 62, 12, 172, 208, 90, 91, 198, 10, 227, 224] }, MyHex { hex: [187, 172, 64, 19, 193, 202, 52, 130, 21, 91, 88, 77, 53, 218, 193, 133] }, MyHex { hex: [104, 95, 35, 125, 79, 203, 209, 145, 201, 129, 185, 78, 246, 152, 108, 222] }, MyHex { hex: [160, 136, 152, 232, 31, 29, 219, 102, 18, 170, 18, 100, 27, 133, 106, 169] }] }


    Ok(())
}

Serde With Reference
Hex Serde Reference
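
The struct above yields a list of separately decoded blocks, while the question ultimately asks for one [u8; 64]. As a rough sketch of the "merge them later" step, using only the standard library: the helper name merge_blocks and its error type are hypothetical, and it assumes the blocks have already been pulled out as plain Vec<u8> values.

```rust
use std::convert::TryInto;

/// Concatenates decoded blocks into one fixed-size array.
/// On a length mismatch, returns the actual total byte count as the error.
/// (Hypothetical helper, not part of serde or hex.)
fn merge_blocks(blocks: &[Vec<u8>]) -> Result<[u8; 64], usize> {
    // Join all blocks into one contiguous Vec<u8>.
    let flat: Vec<u8> = blocks.concat();
    let len = flat.len();
    // Vec<u8> -> [u8; 64] only succeeds if the length is exactly 64.
    flat.try_into().map_err(|_| len)
}
```

With the MyData type from this answer, the input could be built with something like my_data.data.into_iter().map(|h| h.hex).collect::<Vec<_>>(). The length check happens at run time here, which is the price of going through Vec<u8>.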

Sevenfold answered 7/1, 2022 at 1:7 Comment(1)
Ah, of course! I should have thought of deserializing to a custom type like that; thanks. - Hermaphrodite
T
4

In some performance-critical situations, it may be advantageous to implement your own deserializer and use it with serde(deserialize_with = …).

If you go that route, you have to:

  • Implement a deserialization function for data
  • Implement a visitor which takes a sequence of precisely 4 blocks
  • Implement another deserialization function for these blocks that turns a string into [u8; 16]

use serde::{Deserialize, Deserializer};

#[derive(Deserialize, Debug)]
pub struct Foo {
    #[serde(deserialize_with = "deserialize_array_of_hex")]
    pub data: [u8; 64],
}

fn deserialize_array_of_hex<'de, D: Deserializer<'de>>(d: D) -> Result<[u8; 64], D::Error> {
    use serde::de;
    use std::fmt;

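    // Uses the Deserialize derive imported at the top, so no direct
    // dependency on the serde_derive crate is needed.
    #[derive(Deserialize)]
    struct Block(#[serde(with = "hex::serde")] [u8; 16]);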

    struct VecVisitor;
    impl<'de> de::Visitor<'de> for VecVisitor {
        type Value = [u8; 64];

        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
            write!(formatter, "a list containing 4 hex strings")
        }

        fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
        where
            A: de::SeqAccess<'de>,
        {
            let mut data = [0; 64];
            for i in 0..4 {
                let block = seq
                    .next_element::<Block>()?
                    .ok_or_else(|| de::Error::custom("too short"))?;
                for j in 0..16 {
                    data[i * 16 + j] = block.0[j];
                }
            }
            if seq.next_element::<String>()?.is_some() {
                return Err(de::Error::custom("too long"))
            }
            Ok(data)
        }
    }

    d.deserialize_seq(VecVisitor)
}

Full example playground. One could also implement DeserializeSeed for Block and pass only a reference to the [u8; 64] to be written into, but I suspect that copying 16 bytes is negligibly cheap. (Edit: I measured it, and it turns out to be about 10% faster than the other two solutions in this post when using hex::decode_to_slice in visit_str.)
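
To illustrate the idea of writing each block straight into its place in the output, here is a minimal sketch without serde. It is not the measured implementation: decode_hex_into is a hand-rolled stand-in for hex::decode_to_slice so that the snippet needs no crates, and all function names are hypothetical.

```rust
/// Converts one ASCII hex digit to its 4-bit value.
fn nibble(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Decodes `src` into `dst`; fails unless `src` has exactly two hex
/// digits per output byte. Stand-in for `hex::decode_to_slice`.
fn decode_hex_into(src: &str, dst: &mut [u8]) -> Result<(), String> {
    let src = src.as_bytes();
    if src.len() != dst.len() * 2 {
        return Err(format!("expected {} hex digits, got {}", dst.len() * 2, src.len()));
    }
    for (out, pair) in dst.iter_mut().zip(src.chunks_exact(2)) {
        let hi = nibble(pair[0]).ok_or("invalid hex digit")?;
        let lo = nibble(pair[1]).ok_or("invalid hex digit")?;
        *out = (hi << 4) | lo;
    }
    Ok(())
}

/// Decodes four 32-digit hex strings into one 64-byte buffer, giving each
/// string its own 16-byte window, with no intermediate per-block array.
fn decode_blocks(blocks: [&str; 4]) -> Result<[u8; 64], String> {
    let mut data = [0u8; 64];
    for (window, block) in data.chunks_exact_mut(16).zip(blocks.iter()) {
        decode_hex_into(block, window)?;
    }
    Ok(data)
}
```

In a DeserializeSeed-based version, the window would presumably be the mutable slice handed to the seed, and visit_str would do the decoding.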


Actually, never mind implementing your own deserializer for performance: the above solution performs about the same as the following.

use serde::Deserialize;

#[derive(Deserialize, Debug)]
#[serde(from = "MyDataPre")]
pub struct MyData {
    pub data: [u8; 64],
}

impl From<MyDataPre> for MyData {
    fn from(p: MyDataPre) -> Self {
        let mut data = [0; 64];
        for b in 0..4 {
            for j in 0..16 {
                data[b * 16 + j] = p.data[b].0[j];
            }
        }
        MyData { data }
    }
}

#[derive(Deserialize, Debug)]
pub struct MyDataPre {
    data: [MyHex; 4],
}

#[derive(Deserialize, Debug)]
struct MyHex (#[serde(with = "hex::serde")] [u8; 16]);

The trick here is the use of #[serde(from = …)], which allows you to deserialize to some other struct, and then tell serde how to convert that to the struct you originally wanted.
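
As a side note on the conversion step, the nested index loops could probably be replaced by slice copies. A small sketch under that assumption, standard library only; flatten_blocks is a hypothetical name and it takes plain byte arrays rather than the MyHex wrapper:

```rust
/// Flattens four 16-byte blocks into one 64-byte array.
/// (Hypothetical helper; the sizes are fixed to match the example.)
fn flatten_blocks(blocks: &[[u8; 16]; 4]) -> [u8; 64] {
    let mut data = [0u8; 64];
    // Pair each 16-byte window of the output with its source block.
    for (window, block) in data.chunks_exact_mut(16).zip(blocks.iter()) {
        window.copy_from_slice(block);
    }
    data
}
```

Since all sizes are fixed by the types, this conversion cannot fail, which is what lets it sit inside a From impl rather than a TryFrom one.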

Taryn answered 7/1, 2022 at 2:11 Comment(2)
Thanks! Your example playground link doesnʼt seem to be working and I donʼt have time to look into that now (I havenʼt written a deserializer before so the debugging will take longer than usual), but this looks like what I want from your description so Iʼll look at it more tomorrow. - Hermaphrodite
Uff, I pasted the wrong playground link and had to rewrite some of the code… I added a code example to the post, too, just to be safe. - Taryn
