How to pass options to Rust's serde that can be accessed in Deserialize::deserialize()?

For context: I'm writing a ray tracer in Rust, but I'm struggling to find a good way to load the scene in a filesystem-agnostic way. I'm using serde so that I don't have to invent my own file format (yet). The assets (image textures and mesh data) are stored separately from the scene file, which only stores their paths. Because the ray tracer is supposed to be a platform-agnostic library (I want to be able to compile it to WebAssembly for the browser), it has no idea about the file system. I intend to load the assets while deserializing the scene, but this is causing me real problems:

I need to pass an implementation of the file system interfacing code to serde that I can use in Deserialize::deserialize() but there doesn't seem to be any easy way to do that. I came up with a way to do it with generics, but I'm not happy about it.

Here's the way I'm doing it at the moment, stripped down as an MCVE (packages used are serde and serde_json):

The library code (lib.rs):

use std::marker::PhantomData;
use serde::{Serialize, Serializer, Deserialize, Deserializer};

pub struct Image {}

pub struct Texture<L: AssetLoader> {
    path: String,
    image: Image,
    phantom: PhantomData<L>,
}

impl<L: AssetLoader> Serialize for Texture<L> {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        self.path.serialize(serializer)
    }
}

impl<'de, L: AssetLoader> Deserialize<'de> for Texture<L> {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Texture<L>, D::Error> {
        let path = String::deserialize(deserializer)?;

        // This is where I'd much rather have an instance of AssetLoader
        let image = L::load_image(&path);

        Ok(Texture {
            path,
            image,
            phantom: PhantomData,
        })
    }
}

pub trait AssetLoader {
    fn load_image(path: &str) -> Image;
    // load_mesh(), load_hdr(), ...
}

#[derive(Serialize, Deserialize)]
pub struct Scene<L: AssetLoader> {
    textures: Vec<Texture<L>>,
    // meshes, materials, lights, ...
}

The platform-specific code (main.rs):

use serde::{Serialize, Deserialize};
use assetloader_mcve::{AssetLoader, Image, Scene};

#[derive(Serialize, Deserialize)]
struct AssetLoaderImpl {}

impl AssetLoader for AssetLoaderImpl {
    fn load_image(path: &str) -> Image {
        println!("Loading image: {}", path);
        // Load the file from disk, the web, ...
        Image {}
    }
}

fn main() {
    let scene_str = r#"
    {
      "textures": [
        "texture1.jpg",
        "texture2.jpg"
      ]
    }
    "#;

    let scene: Scene<AssetLoaderImpl> = serde_json::from_str(scene_str).unwrap();

    // ...
}

What I don't like about this approach:

  • AssetLoaderImpl has to implement Serialize and Deserialize even though it's never (de-)serialized
  • I'm also using typetag which causes a compilation error because "deserialization of generic impls is not supported yet"
  • Caching assets will be very difficult because I don't have an instance of AssetLoaderImpl which could cache them in a member variable
  • Passing the AssetLoader type parameter around is getting unwieldy when Texture (or other assets) are nested deeper
  • It just doesn't feel right, mostly because of the PhantomData and the abuse of generics

This makes me think that I'm not going about this the right way, but I'm struggling to come up with a better solution. I thought about using a mutable global variable in the library holding an instance of AssetLoader (maybe with lazy_static), but that also doesn't seem right. Ideally I'd pass an instance of AssetLoader (probably a Box<dyn AssetLoader>) to serde when deserializing, so that I can access it in the impl Deserialize for Texture. I haven't found any way to do that, and I'd really appreciate it if anybody could point me in the right direction.

Veto answered 7/8, 2020 at 16:58 Comment(0)

To pass state into deserialization, use the DeserializeSeed trait. The documentation for DeserializeSeed addresses exactly this use case:

DeserializeSeed is the stateful form of the Deserialize trait. If you ever find yourself looking for a way to pass data into a Deserialize impl, this trait is the way to do it.

Stateful AssetLoader

Like you said, passing AssetLoader as a generic parameter means you aren't able to store a cache (or other things) within it. Using DeserializeSeed, we're able to pass an instance of our AssetLoader struct, so let's modify AssetLoader's functions to give access to self:

pub trait AssetLoader {
    // Adding `&mut self` allows implementers to store data in a cache or 
    // whatever else they want to do.
    fn load_image(&mut self, path: &str) -> Image;
}

Now we can update AssetLoaderImpl to match the new definition. It no longer needs to derive Serialize and Deserialize:

struct AssetLoaderImpl {
    // cache, etc.
}

impl AssetLoader for AssetLoaderImpl {
    fn load_image(&mut self, path: &str) -> Image {
        // Access cache here.
        println!("Loading image: {}", path);
        Image {}
    }
}
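
As a rough sketch of what "access cache here" could look like, assuming Image is cheap to clone (in a real ray tracer you would more likely hand out an Rc or Arc): CachingLoader and its loads counter are hypothetical names, and the counter exists only to make the caching observable.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Image {}

pub trait AssetLoader {
    fn load_image(&mut self, path: &str) -> Image;
}

/// Hypothetical loader that remembers every image it has produced.
#[derive(Default)]
pub struct CachingLoader {
    cache: HashMap<String, Image>,
    /// Number of cache misses, kept only to make the behaviour observable.
    pub loads: usize,
}

impl AssetLoader for CachingLoader {
    fn load_image(&mut self, path: &str) -> Image {
        // Cache hit: return a copy without touching the file system.
        if let Some(image) = self.cache.get(path) {
            return image.clone();
        }
        // Cache miss: placeholder for the real platform-specific loading code.
        let image = Image {};
        self.loads += 1;
        self.cache.insert(path.to_owned(), image.clone());
        image
    }
}
```

This only works because load_image now takes &mut self; with the original associated-function signature there would be no instance to hold the map.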

Deserializing with the AssetLoader

Now we can use an AssetLoader during deserialization using the DeserializeSeed trait. Since we want this to work for any implementer of AssetLoader (allowing us to keep the filesystem logic separate from our deserialization logic), we still have to use a generic L: AssetLoader, but it no longer has to be attached to the Texture struct (or any structs containing Texture).

A good pattern is to introduce a separate TextureDeserializer type to handle the stateful deserialization, and implement DeserializeSeed on that struct. We can set the Value associated type to indicate that the deserialization should return a Texture.

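// Imports used by this and the following snippets.
use std::fmt;

use serde::de::{self, Deserialize, DeserializeSeed, Deserializer, MapAccess, SeqAccess, Visitor};

pub struct Texture {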
    path: String,
    image: Image,
}

struct TextureDeserializer<'a, L> {
    asset_loader: &'a mut L,
}

impl<'de, L> DeserializeSeed<'de> for TextureDeserializer<'_, L>
where
    L: AssetLoader,
{
    type Value = Texture;

    fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
    where
        D: Deserializer<'de>,
    {
        let path = String::deserialize(deserializer)?;

        let image = self.asset_loader.load_image(&path);

        Ok(Texture { path, image })
    }
}

Notice that the generic AssetLoader is no longer used by Texture directly.

We now have to implement DeserializeSeed all the way up the chain to Scene's deserialization logic, since the AssetLoader state has to be carried through the whole process. This may seem very verbose, and it is unfortunate that we can't just derive it with serde_derive, but the advantage of not having deserialization state tied up in the structs we are deserializing far outweighs the extra verbosity.

To deserialize a Vec<Texture>, we define a TexturesDeserializer:

struct TexturesDeserializer<'a, L> {
    asset_loader: &'a mut L,
}

impl<'de, L> DeserializeSeed<'de> for TexturesDeserializer<'_, L>
where
    L: AssetLoader,
{
    type Value = Vec<Texture>;

    fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct TexturesVisitor<'a, L> {
            asset_loader: &'a mut L,
        }

        impl<'de, L> Visitor<'de> for TexturesVisitor<'_, L>
        where
            L: AssetLoader,
        {
            type Value = Vec<Texture>;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("a sequence of Textures")
            }

            fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
            where
                A: SeqAccess<'de>,
            {
                let mut textures = Vec::new();

                while let Some(texture) = seq.next_element_seed(TextureDeserializer {
                    asset_loader: self.asset_loader,
                })? {
                    textures.push(texture);
                }

                Ok(textures)
            }
        }

        deserializer.deserialize_seq(TexturesVisitor {
            asset_loader: self.asset_loader,
        })
    }
}

And a SceneDeserializer to deserialize the Scene itself:

pub struct Scene {
    textures: Vec<Texture>,
}

pub struct SceneDeserializer<'a, L> {
    pub asset_loader: &'a mut L,
}

impl<'de, L> DeserializeSeed<'de> for SceneDeserializer<'_, L>
where
    L: AssetLoader,
{
    type Value = Scene;

    fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct SceneVisitor<'a, L> {
            asset_loader: &'a mut L,
        }

        impl<'de, L> Visitor<'de> for SceneVisitor<'_, L>
        where
            L: AssetLoader,
        {
            type Value = Scene;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("struct Scene")
            }

            fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
            where
                A: MapAccess<'de>,
            {
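                // Read the key as an owned String so this also works with
                // deserializers that cannot hand out borrowed strings.
                match map.next_key::<String>()? {
                    Some(key) if key == "textures" => {}
                    Some(key) => return Err(de::Error::unknown_field(&key, FIELDS)),
                    None => return Err(de::Error::missing_field("textures")),
                }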

                let textures = map.next_value_seed(TexturesDeserializer {
                    asset_loader: self.asset_loader,
                })?;

                Ok(Scene { textures })
            }
        }

        const FIELDS: &[&str] = &["textures"];
        deserializer.deserialize_struct(
            "Scene",
            FIELDS,
            SceneVisitor {
                asset_loader: self.asset_loader,
            },
        )
    }
}

Note that the DeserializeSeed implementations above are very similar to what #[derive(Deserialize)] would generate (in the case of Scene) and to what serde already defines for Vec<T>. However, writing these custom implementations allows state to be passed through the whole process into the deserialization of Texture.

Putting it all together

Now we can use serde_json to deserialize from our JSON input. Note that serde_json does not provide any helper methods for deserializing with DeserializeSeed (there has been discussion on this in the past), so we have to use the serde_json::Deserializer manually. Lucky for us, it's pretty simple to use:

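// The DeserializeSeed trait must be in scope to call .deserialize() on the seed.
use serde::de::DeserializeSeed;

fn main() {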
    let mut asset_loader = AssetLoaderImpl {
        // cache, etc.
    };

    let scene_str = r#"
    {
      "textures": [
        "texture1.jpg",
        "texture2.jpg"
      ]
    }
    "#;

    // Deserializer::from_str is shorthand for wrapping the input in StrRead.
    let mut deserializer = serde_json::Deserializer::from_str(scene_str);
    let scene = SceneDeserializer {
        asset_loader: &mut asset_loader,
    }
    .deserialize(&mut deserializer)
    .unwrap();

    // ...
}

Now we can deserialize a Scene with a stateful AssetLoader. This can easily be extended to give other members of Scene access to other resources during deserialization as well. And best of all, it keeps the deserialization state decoupled from the deserialized structs themselves, meaning you don't need to care which AssetLoader was used once deserialization is done.

Anthropogeography answered 31/3, 2023 at 20:42 Comment(2)
That seems like a slightly nicer solution, although I'm kind of put off by the large amount of boilerplate code. - Veto
It is the solution for passing data into deserialization, as is pointed out in the serde docs. It is a lot of boilerplate, but it's no different than manually implementing Deserialize. The difference is that there isn't a derive macro for DeserializeSeed like there is for Deserialize, so the boilerplate-heavy way is really the only option currently. You can always open an issue requesting a derive macro for it though, or find an existing issue about it and state your use case there. - Anthropogeography
