MongoDB Tree Model: Get all ancestors, Get all descendants
Asked Answered
H

3

8

I have an arbitrary tree structure.

Example data structure:

root
  |--node1
  |     |--node2
  |     |     |--leaf1
  |     |
  |     |--leaf2
  |
  |--node3
        |--leaf3

Each node and leaf has 2 properties: id and name.


The important queries:

1.: A leaf id is given. The query should return the whole path from root to that leaf, with all node's id and name properties.

It's not important if the return value is an sorted array of nodes or if it's an object where the nodes are nested.

Example: If the id of leaf2 is given, the query should return: root(id, name), node1(id, name), leaf2(id, name).


2.: Given any node id: Get the whole (sub)tree. Here it would be nice to retrieve a single object where each node has a children array.


Thoughts, trials and errors:

1.: First I tried to simply model the tree as a single JSON document, but then the query would become impossible: There's no way to find out at which nesting level the leaf is. And if I knew the whole path of ids from root to the leaf, I'd had to use a projection with multiple positional operators and that's not supported by MongoDB at the moment. Additionally it's not possible to index the leaf ids because the nesting can be infinite.

2.: Next idea was to use a flat data design, where each node has an array which contains the node's ancestor ids:

{
  id: ...,
  name: ...,
  ancestors: [ rootId, node1Id, ... ]
}

This way I'd have to do 2 queries, to get the whole path from root to some node or leaf, which is quite nice.

Questions:

If I choose data model 2.: How can I get the whole tree, or a subtree?

Getting all descendants is easy: find({ancestors:"myStartingNodeId"}). But those will of course be not sorted or nested.

Is there a way using the aggregation framework or a completely different data model to solve this problem?

Thank you!

Harber answered 25/7, 2015 at 22:33 Comment(0)
H
4

Here's what data structure I finally came up with. It's optimized for read queries. Some write queries (like moving subtrees) can be painful.

{
  id: "...",
  ancestors: ["parent_node_id", ..., "root_node_id"], // order is important!
  children: ["child1_id", "child2_id", ...]
}

Benefits:

  • Easy to get all documents for a sub-tree

  • Easy to get all documents from some node to the root

  • Easy to check if some document is parent/child/ancestor/descendant of some node

  • Children are sorted. Can be easily moved by changing the children array order

How to use it:

  • Get by ID: findOne({ id: "..." })

  • Get Parent: findOne({ children: "..." })

  • Get all Ancestors: first do Get by ID, then grab the ancestors array and look for all documents that match the given list of IDs

  • Get all Children: find({ 'ancestors.0': "..." })

  • Get all Descendants: find({ ancestors: "..." })

  • Get all Descendants up to x generations: find({ $and: [ {ancestors: "..."}, {ancestors: {$size: x}} ] })

Drawbacks:

  • The application code has to take care of correct order.

  • The application code has to build nested objects (maybe this is possible using MongoDB Aggregation framework).

  • Every insert must be done using 2 queries.

  • Moving whole sub-trees between nodes must update a lot of documents.

Harber answered 14/12, 2016 at 19:15 Comment(0)
H
4

MongoDB is not a graph database and doesn't provide graph traversal operations, so there is no direct solution.

You can use data model you have described in point 2. (nodes with ancestor list), the query find({ancestors:"myStartingNodeId"}) and sort/nest the results in your application code.

Another possibility is to use data model where _id (or some other field) represents full path, for example 'root.node1.node2'. Then graph queries can be transformed to substring queries and correct ordering can be achieved (I hope) just by sorting by this _id.


Update: btw. there are some tree structure patterns described in MongoDB docs: Model Tree Structures in MongoDB

Humiliation answered 25/7, 2015 at 22:51 Comment(3)
I know that MongoDB is no graph database, but I didn't want to use an additional database for that one use case. Maybe another option would be to store both data models, and then query the model that fits best? The point is: The tree will get created once and then will be modified every few months. Though there won't be any relevant overhead. What do you think about it?Harber
I don't like model 1 - nested documents - because it's too dynamic. Like you wrote, it's hard or impossible to write queries or index. If the graph grows too much then you can hit document size limit (16 MB). I have bad experience with processing large number of big documents, because of overhead in networking, de/serializing etc. But you can use it. I would store "primary data" in separate documents (model 2) but then you can aggregate them to "model 1" documents as a cache.Humiliation
I know modified my model a bit: I added an children array to each node, so I can easily go forwards and backwards within the tree, and for each node I can see instantly if there are child elements or if it's a leaf. Then concerning model 1: I will completely dismiss it. After the query someone has to create (in my case) Java objects from the data, and it really doesn't matter if it's done by a framework or if I do it myself. This way I can even enrich the data with additional information on the fly.Harber
H
4

Here's what data structure I finally came up with. It's optimized for read queries. Some write queries (like moving subtrees) can be painful.

{
  id: "...",
  ancestors: ["parent_node_id", ..., "root_node_id"], // order is important!
  children: ["child1_id", "child2_id", ...]
}

Benefits:

  • Easy to get all documents for a sub-tree

  • Easy to get all documents from some node to the root

  • Easy to check if some document is parent/child/ancestor/descendant of some node

  • Children are sorted. Can be easily moved by changing the children array order

How to use it:

  • Get by ID: findOne({ id: "..." })

  • Get Parent: findOne({ children: "..." })

  • Get all Ancestors: first do Get by ID, then grab the ancestors array and look for all documents that match the given list of IDs

  • Get all Children: find({ 'ancestors.0': "..." })

  • Get all Descendants: find({ ancestors: "..." })

  • Get all Descendants up to x generations: find({ $and: [ {ancestors: "..."}, {ancestors: {$size: x}} ] })

Drawbacks:

  • The application code has to take care of correct order.

  • The application code has to build nested objects (maybe this is possible using MongoDB Aggregation framework).

  • Every insert must be done using 2 queries.

  • Moving whole sub-trees between nodes must update a lot of documents.

Harber answered 14/12, 2016 at 19:15 Comment(0)
P
2

You can use graphLookup

Documentation: https://docs.mongodb.com/manual/reference/operator/aggregation/graphLookup/

Puffball answered 1/10, 2019 at 15:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.