How do I express mutually recursive data structures in safe Rust?
Asked Answered
D

4

46

I am trying to implement a scenegraph-like data structure in Rust. I would like an equivalent to this C++ code expressed in safe Rust:

struct Node
{
    Node* parent; // should be mutable, and nullable (no parent)
    std::vector<Node*> child;

    virtual ~Node() 
    { 
        for(auto it = child.begin(); it != child.end(); ++it)
        {
            delete *it;
        }
    }

    void addNode(Node* newNode)
    {
        if (newNode->parent)
        {
            newNode->parent.child.erase(newNode->parent.child.find(newNode));
        }
        newNode->parent = this;
        child.push_back(newNode);
    }
}

Properties I want:

  • the parent takes ownership of its children
  • the nodes are accessible from the outside via a reference of some kind
  • events that touch one Node can potentially mutate the whole tree
Danby answered 22/3, 2016 at 23:32 Comment(3)
What is your question?Instinctive
The ownership is not expressed through types in your current example, did you meant to write std::vector<std::unique_ptr<Node>> children;? Also, why is the destructor virtual? Is this supposed to be an interface?Hokanson
The node owns its children and is owned by the parent. I didn't want to use more templates than nesessary.Danby
C
74

Rust tries to ensure memory safety by forbidding you from doing things that might potentially be unsafe. Since this analysis is performed at compile-time, the compiler can only reason about a subset of manipulations that are known to be safe.

In Rust, you could easily store either a reference to the parent (by borrowing the parent, thus preventing mutation) or the list of child nodes (by owning them, which gives you more freedom), but not both (without using unsafe). This is especially problematic for your implementation of addNode, which requires mutable access to the given node's parent. However, if you store the parent pointer as a mutable reference, then, since only a single mutable reference to a particular object may be usable at a time, the only way to access the parent would be through a child node, and you'd only be able to have a single child node, otherwise you'd have two mutable references to the same parent node.

If you want to avoid unsafe code, there are many alternatives, but they'll all require some sacrifices.


The easiest solution is to simply remove the parent field. We can define a separate data structure to remember the parent of a node while we traverse a tree, rather than storing it in the node itself.

First, let's define Node:

#[derive(Debug)]
struct Node<T> {
    data: T,
    children: Vec<Node<T>>,
}

impl<T> Node<T> {
    fn new(data: T) -> Node<T> {
        Node { data: data, children: vec![] }
    }

    fn add_child(&mut self, child: Node<T>) {
        self.children.push(child);
    }
}

(I added a data field because a tree isn't super useful without data at the nodes!)

Let's now define another struct to track the parent as we navigate:

#[derive(Debug)]
struct NavigableNode<'a, T: 'a> {
    node: &'a Node<T>,
    parent: Option<&'a NavigableNode<'a, T>>,
}

impl<'a, T> NavigableNode<'a, T> {
    fn child(&self, index: usize) -> NavigableNode<T> {
        NavigableNode {
            node: &self.node.children[index],
            parent: Some(self)
        }
    }
}

impl<T> Node<T> {
    fn navigate<'a>(&'a self) -> NavigableNode<T> {
        NavigableNode { node: self, parent: None }
    }
}

This solution works fine if you don't need to mutate the tree as you navigate it and you can keep the parent NavigableNode objects around (which works fine for a recursive algorithm, but doesn't work too well if you want to store a NavigableNode in some other data structure and keep it around). The second restriction can be alleviated by using something other than a borrowed pointer to store the parent; if you want maximum genericity, you can use the Borrow trait to allow direct values, borrowed pointers, Boxes, Rc's, etc.


Now, let's talk about zippers. In functional programming, zippers are used to "focus" on a particular element of a data structure (list, tree, map, etc.) so that accessing that element takes constant time, while still retaining all the data of that data structure. If you need to navigate your tree and mutate it during the navigation, while retaining the ability to navigate up the tree, then you could turn a tree into a zipper and perform the modifications through the zipper.

Here's how we could implement a zipper for the Node defined above:

#[derive(Debug)]
struct NodeZipper<T> {
    node: Node<T>,
    parent: Option<Box<NodeZipper<T>>>,
    index_in_parent: usize,
}

impl<T> NodeZipper<T> {
    fn child(mut self, index: usize) -> NodeZipper<T> {
        // Remove the specified child from the node's children.
        // A NodeZipper shouldn't let its users inspect its parent,
        // since we mutate the parents
        // to move the focused nodes out of their list of children.
        // We use swap_remove() for efficiency.
        let child = self.node.children.swap_remove(index);

        // Return a new NodeZipper focused on the specified child.
        NodeZipper {
            node: child,
            parent: Some(Box::new(self)),
            index_in_parent: index,
        }
    }

    fn parent(self) -> NodeZipper<T> {
        // Destructure this NodeZipper
        let NodeZipper { node, parent, index_in_parent } = self;

        // Destructure the parent NodeZipper
        let NodeZipper {
            node: mut parent_node,
            parent: parent_parent,
            index_in_parent: parent_index_in_parent,
        } = *parent.unwrap();

        // Insert the node of this NodeZipper back in its parent.
        // Since we used swap_remove() to remove the child,
        // we need to do the opposite of that.
        parent_node.children.push(node);
        let len = parent_node.children.len();
        parent_node.children.swap(index_in_parent, len - 1);

        // Return a new NodeZipper focused on the parent.
        NodeZipper {
            node: parent_node,
            parent: parent_parent,
            index_in_parent: parent_index_in_parent,
        }
    }

    fn finish(mut self) -> Node<T> {
        while let Some(_) = self.parent {
            self = self.parent();
        }

        self.node
    }
}

impl<T> Node<T> {
    fn zipper(self) -> NodeZipper<T> {
        NodeZipper { node: self, parent: None, index_in_parent: 0 }
    }
}

To use this zipper, you need to have ownership of the root node of the tree. By taking ownership of the nodes, the zipper can move things around in order to avoid copying or cloning nodes. When we move a zipper, we actually drop the old zipper and create a new one (though we could also do it by mutating self, but I thought it was clearer that way, plus it lets you chain method calls).


If the above options are not satisfactory, and you must absolutely store the parent of a node in a node, then the next best option is to use Rc<RefCell<Node<T>>> to refer to the parent and Weak<RefCell<Node<T>>> to the children. Rc enables shared ownership, but adds overhead to perform reference counting at runtime. RefCell enables interior mutability, but adds overhead to keep track of the active borrows at runtime. Weak is like Rc, but it doesn't increment the reference count; this is used to break reference cycles, which would prevent the reference count from dropping to zero, causing a memory leak. See DK.'s answer for an implementation using Rc, Weak and RefCell.

Corrinacorrine answered 23/3, 2016 at 2:47 Comment(6)
+1! I had seen zippers before in the context of immutable data structures and never even thought about using them in the context of mutable ones!Hokanson
Wouldn't storing both the children and the parent as Rc cause a cyclic reference count? Isn't a Weak pointer better suited to storing a reference to the parent in a child to prevent this?Faraway
Also, just to clarify as I'm interested in trying out the zipper technique, the tree would consist of these zipper-wrappers instead of the nodes themselves? And if the parent zipper is optional, then shouldn't the id_in_parent be optional as well?Faraway
The zipper owns the nodes, so in a way, the zipper is the tree. Indeed, when the parent is None, the index_in_parent is irrelevant. It would make sense to bundle the parent and the index together in a single Option (either you have neither, or you have both).Polio
@FrancisGagné Right, so if I wanted to create a structure TreeWithMetadatathat contained the tree as one field and information about the tree in other fields, it would look something like TreeWithMetadata{ tree: TreeZipper<T>, .. } and not TreeWithMetadata{ tree: Node<T>, .. }. So the structure that owns the tree would actually own the TreeZipper representing the tree. The child and parent methods would then shift focus to either the parent of the current node or one of the children. These could be used to traverse the tree.Faraway
The zipper involves allocation or deallocation when traversing, thus slower than using arena.Achromatize
C
30

The problem is that this data structure is inherently unsafe; it doesn't have a direct equivalent in Rust that doesn't use unsafe. This is by design.

If you want to translate this into safe Rust code, you need to be more specific about what, exactly, you want from it. I know you listed some properties above, but often people coming to Rust will say "I want everything I have in this C/C++ code", to which the direct answer is "well, you can't."

You're also, unavoidably, going to have to change how you approach this. The example you've given has pointers without any ownership semantics, mutable aliasing, and cycles; all of which Rust will not allow you to simply ignore like C++ does.

The simplest solution is to just get rid of the parent pointer, and maintain that externally (like a filesystem path). This also plays nicely with borrowing because there are no cycles anywhere:

pub struct Node1 {
    children: Vec<Node1>,
}

If you need parent pointers, you could go half-way and use Ids instead:

use std::collections::BTreeMap;

type Id = usize;

pub struct Tree {
    descendants: BTreeMap<Id, Node2>,
    root: Option<Id>,
}

pub struct Node2 {
    parent: Id,
    children: Vec<Id>,
}

The BTreeMap is effectively your "address space", bypassing borrowing and aliasing issues by not directly using memory addresses.

Of course, this introduces the problem of a given Id not being tied to the particular tree, meaning that the node it belongs to could be destroyed, and now you have what is effectively a dangling pointer. But, that's the price you pay for having aliasing and mutation. It's also less direct.

Or, you could go whole-hog and use reference-counting and dynamic borrow checking:

use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Note: do not derive Clone to make this move-only.
pub struct Node3(Rc<RefCell<Node3_>>);

pub type WeakNode3 = Weak<RefCell<Node3>>;

pub struct Node3_ {
    parent: Option<WeakNode3>,
    children: Vec<Node3>,
}

impl Node3 {
    pub fn add(&self, node: Node3) {
        // No need to remove from old parent; move semantics mean that must have
        // already been done.
        (node.0).borrow_mut().parent = Some(Rc::downgrade(&self.0));
        self.children.push(node);
    }
}

Here, you'd use Node3 to transfer ownership of a node between parts of the tree, and WeakNode3 for external references. Or, you could make Node3 cloneable and add back the logic in add to make sure a given node doesn't accidentally stay a child of the wrong parent.

This is not strictly better than the second option, because this design absolutely cannot benefit from static borrow-checking. The second one can at least prevent you from mutating the graph from two places at once at compile time; here, if that happens, you'll just crash.

The point is: you can't just have everything. You have to decide which operations you actually need to support. At that point, it's usually just a case of picking the types that give you the necessary properties.

Cappuccino answered 23/3, 2016 at 2:31 Comment(2)
while I agree that the example provided by OP is unsafe but note that parent pointer is not unsafe by itself since it's not the owner of the parent memory region. The problem is with vector<Node*> which should've been unique_ptr or shared_ptr atleast.Lamarckism
The second solution doesn't compile for meGrassland
I
18

In certain cases, you can also use an arena. An arena guarantees that values stored in it will have the same lifetime as the arena itself. This means that adding more values will not invalidate any existing lifetimes, but moving the arena will. Thus, such a solution is not viable if you need to return the tree.

This solves the problem by removing the ownership from the nodes themselves.

Here's an example that also uses interior mutability to allow a node to be mutated after it is created. In other cases, you can remove this mutability if the tree is constructed once and then simply navigated.

use std::{
    cell::{Cell, RefCell},
    fmt,
};
use typed_arena::Arena; // 1.6.1

struct Tree<'a, T: 'a> {
    nodes: Arena<Node<'a, T>>,
}

impl<'a, T> Tree<'a, T> {
    fn new() -> Tree<'a, T> {
        Self {
            nodes: Arena::new(),
        }
    }

    fn new_node(&'a self, data: T) -> &'a mut Node<'a, T> {
        self.nodes.alloc(Node {
            data,
            tree: self,
            parent: Cell::new(None),
            children: RefCell::new(Vec::new()),
        })
    }
}

struct Node<'a, T: 'a> {
    data: T,
    tree: &'a Tree<'a, T>,
    parent: Cell<Option<&'a Node<'a, T>>>,
    children: RefCell<Vec<&'a Node<'a, T>>>,
}

impl<'a, T> Node<'a, T> {
    fn add_node(&'a self, data: T) -> &'a Node<'a, T> {
        let child = self.tree.new_node(data);
        child.parent.set(Some(self));
        self.children.borrow_mut().push(child);
        child
    }
}

impl<'a, T> fmt::Debug for Node<'a, T>
where
    T: fmt::Debug,
{
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{:?}", self.data)?;
        write!(f, " (")?;
        for c in self.children.borrow().iter() {
            write!(f, "{:?}, ", c)?;
        }
        write!(f, ")")
    }
}

fn main() {
    let tree = Tree::new();
    let head = tree.new_node(1);
    let _left = head.add_node(2);
    let _right = head.add_node(3);

    println!("{:?}", head); // 1 (2 (), 3 (), )
}
Instinctive answered 14/7, 2017 at 18:38 Comment(0)
G
1

TL;DR: DK.'s second version doesn't compile because parent has another type than self.0, fix it by converting it to a WeakNode. Also, in the line directly below, "self" doesn't have a "children" attribute but self.0 has.


I corrected the version of DK. so it compiles and works. Here is my Code:

dk_tree.rs

use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Note: do not derive Clone to make this move-only.
pub struct Node(Rc<RefCell<Node_>>);


pub struct WeakNode(Weak<RefCell<Node_>>);

struct Node_ {
    parent: Option<WeakNode>,
    children: Vec<Node>,
}

impl Node {
    pub fn new() -> Self {
        Node(Rc::new(RefCell::new(Node_ {
            parent: None,
            children: Vec::new(),
        })))
    }
    pub fn add(&self, node: Node) {
        // No need to remove from old parent; move semantics mean that must have
        // already been done.
        node.0.borrow_mut().parent = Some(WeakNode(Rc::downgrade(&self.0)));
        self.0.borrow_mut().children.push(node);
    }
    // just to have something visually printed
    pub fn to_str(&self) -> String {
        let mut result_string = "[".to_string();
        for child in self.0.borrow().children.iter() {
            result_string.push_str(&format!("{},", child.to_str()));
        }
        result_string += "]";
        result_string
    }
}

and then the main function in main.rs:

mod dk_tree;

use crate::dk_tree::{Node};


fn main() {
    let root = Node::new();
    root.add(Node::new());
    root.add(Node::new());
    let inner_child = Node::new();
    inner_child.add(Node::new());
    inner_child.add(Node::new());
    root.add(inner_child);
    let result = root.to_str();
    println!("{result:?}");
}

The reason I made the WeakNode be more like the Node is to have an easier conversion between the both

Grassland answered 10/12, 2022 at 5:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.