Materialized Path is a method for representing hierarchy in SQL. Each node contains the path itself and all its ancestors (grandparent/parent/self
).
The django-treebeard
implementation of MP (docs):
Each step of the path is a fixed length for consistent performance.
Each node contains
depth
andnumchild
fields (fast reads at minimal cost to writes).The path field is indexed (with a standard b-tree index):
The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.
Implementation of get_ancestors
(link):
Match nodes with a path that contains a subset of the current path (steplen
is the fixed length of a step).
paths = [
self.path[0:pos]
for pos in range(0, len(self.path), self.steplen)[1:]
]
return get_result_class(self.__class__).objects.filter(
path__in=paths).order_by('depth')
Implementation of get_descendants
(link):
Match nodes with a depth greater than self and a path which starts with current path.
return cls.objects.filter(
path__startswith=parent.path,
depth__gte=parent.depth
).order_by(
'path'
)
Potential downsides to this approach:
- A deeply nested hierarchy will result in long paths, which can hurt read performance.
- Moving a node requires updating the path of all descendants.
Postgres includes the ltree
extension which provides a custom GiST index (docs).
I am not clear which benefits ltree
provides over django-treebeard
's implementation. This article argues that only ltree
can answer the get_ancestors
question, but as demonstrated earlier, figuring out the ancestors (or descendants) of a node is trivial.
[As an aside, I found this Django ltree
library - https://github.com/mariocesar/django-ltree].
Both approaches use an index (django-treebeard
uses b-tree, ltree
uses a custom GiST). I am interested in understanding the implementation of the ltree
GiST and why it might be a more efficient index than a standard b-tree for this particular use case (materialized path).
Additional links
What are the options for storing hierarchical data in a relational database?