For a while now I've been wrestling with how best to handle hierarchies in SQL. Frustrated by the limitations of adjacency lists and the complexity of MPTT/nested sets, I began thinking about simply storing key paths instead, as a simple node_key/node_key/...
string. I decided to compile the pros and cons of the three techniques:
Number of calls required to create/delete/move a node:
- Adjacency = 1
- MPTT = 3
- Path = 1 (Replace old node path with new node path across all nodes that contain that path)
Number of calls required to get a tree:
- Adjacency = [number of sub-levels]
- MPTT = 1
- Path = 1
Number of calls required to get path to a node / ancestry:
- Adjacency = [number of super-levels]
- MPTT = 1
- Path = 0
Number of calls required to get number of subnodes:
- Adjacency = [number of sub-levels]
- MPTT = 0 (Can be calculated from right/left values)
- Path = 1
Number of calls required to get depth of node:
- Adjacency = [number of super-levels]
- MPTT = 1
- Path = 0
DB fields required:
- Adjacency = 1 (parent)
- MPTT = 3 (parent,right,left)
- Path = 1 (path)
Conclusion
The stored path technique uses the same or less calls than the other techniques in every use case except one. By this analysis, storing paths is a clear winner. Not to mention, it's a lot simpler to implement, human readable, etc.
So the question is, shouldn't stored paths be considered a stronger technique than MPTT? Why are stored paths not a more commonly used technique, and why would you not use them over MPTT in a given instance?
Also, if you think this analysis is incomplete please let me know.
UPDATE:
Here are at least 2 things MPTT can do out of the box that a stored path solution won't:
- Allows calculation of subnode count for each node without any additional queries (mentioned above).
- Imposes an order on nodes at a given level. The other solutions are unordered.
SELECT * FROM adjacency a /* for($number_of_levels_required.. $prevname = $level > 1 ? 'a'+$level-1 : 'a'; */ INNER JOIN adjacency a/*$level*/ ON a/*$level*/.parent_id = /*$prevname*/.id /* endfor*/
– Martsenreferential integrity
as Bill Karwin mentions in his answer. – JohannejohannesSELECT * FROM items WHERE path LIKE 'my/child/path%';
. Yes, referential integrity suffers but the trade off is much more efficient queries. – Obla