Implementing a hierarchical data structure in a database
Asked Answered
L

6

12

I know there are two approaches: adjacency list and nested tree. It's said that adjacency list can become slow to use on traversal because of numerous queries. But I don't know any realistic figures for this. The site I'm making will have in the region of 200 pages. Is traversal to generate (for example) a sitemap going to take longer than about 0.3 seconds?

Running on MySQL (innoDB) with LAMP stack.

I'd prefer to implement adjacency if possible because of the more simplistic design.

Thanks.

Langrage answered 13/2, 2009 at 3:48 Comment(7)
so basically, your question is if some unknown implementation of some datastructure, running on an unknown piece of hardware going to take less than 0.3 sec? nice one.Cottrill
@Shy - MySQL innoDB database on a LAMP stack.Understate
It shouldn't be that difficult to throw together a prototype and do some bench testing. What RDBMS will host this?Leta
No seriously, which DBMS, not which wanna-be DBMS? :-)Trainor
@Pax - well we also use Oracle at work, but I choose MySQL despite the far less features I admit. Oracle gives so many problems when upgrading, it's like some software from the 80's.Understate
@Me, that's 'cause it was written in the '70s and not much improved since then. Can it still not tell the difference between NULL and empty string?Trainor
@Pax: correct, in Oracle, NULL is the same as ''.Britnibrito
B
16

There are more options than just the two you mention. There are:

  • Adjacency List (the "parent_id" one almost everyone uses)
  • Nested Sets
  • Path Enumeration
  • Closure Table (aka Adjacency Relation)

See my answer to "What is the most efficient/elegant way to parse a flat table into a tree?"

Or a couple of books:

Britnibrito answered 13/2, 2009 at 4:33 Comment(1)
Thanks, that was very comprehensive.Understate
I
4

The article Managing Hierarchical Data in MySQL goes in details about this.

I would recommend the "nested set" technique, as it allows you to get the whole tree (and its children) in one query. Basically reads are cheap but writes are expensive because the whole tree has to be re-balanced. But in cases where you have 99% reads then its totally justifiable.

Isiah answered 13/2, 2009 at 4:2 Comment(0)
R
3

The naive approach to parsing an adjacency list requires a lot of queries, and for large lists may take a significant amount of time to build in memory. For reference, the naive approach I'm referring to could be summarized as: Select all items with no parent, Then for each item recursively get it's children. This approach requires n+1 database queries.

I've used the following approach to build an adjacency list with 1 query. Select all items form the database. Transfer all items into an array indexed by their key. Traverse the array and assign a reference from the parent object to each of it's children. Traverse the array a second time and remove all of the child objects leaving behind only the root level objects.

Since you mentioned LAMP stack, PHP code to do this is roughly as follows:

<?php
// Assumes $src is the array if items from the database.
$tmp = array();

// Traverse the array and index it by id, ensuing each item has an empty array of children.
foreach ($src as $item) {
  $item['children'] = array();
  $tmp[$item['id']] = $item;
}

// Now traverse the array a second time and link children to their parents.
foreach ($tmp as $id => $item) {
  if ($item['parent_id'] != 0 || $item['parent_id'] !== NULL) {
    $tmp[$item['parent_id']]['children'][$id] = &$tmp[$id];
  }
}

// Finally create an array with just root level items.
$tree = array();
foreach ($tmp as $id => $item) {
  if ($item['parent_id'] == 0 || $item['parent_id'] === NULL) {
    $tree[$id] = $item;
  }
}

// $tree now contains our adjacency list in tree form.
?>

Please note this code is intended to illustrate a technique for building an adjacency list from a single database query. It could probably be optimized for less memory consumption, etc. It also hasn't been tested.

Jim,

Raker answered 13/2, 2009 at 5:4 Comment(0)
B
2

Here are a couple of questions that might help you:

SQL how to store and navigate hierarchies

Which is the best database schema for my navigation

Breda answered 13/2, 2009 at 3:54 Comment(1)
Not really my area of expertise but I do swing a mean search box.Breda
R
2

The other approach is called "nested set", I think, not "nested tree".

Anyway, a good thing about a site map is that you might know its maximum depth. I think that the problem with the adjacency model is that the corresponding SQL works on one level at a time, so if you have 'n' levels then you need a loop of 'n' SQL statements ... but I think (I'm not sure) that if you know the maximum 'n' in advance then you can code the corresponding fixed-number-of-multiple-levels SQL.

0.3 seconds sounds to me like a very long time to figure 200 pages, so that's probably OK.

Also a site map isn't updated very often; so even if it does take a long time to retrieve from SQL, you can probably cache the retrieved/calculated tree in RAM.

ALternatively, instead of worrying about the SQL to build a tree, you could just store it as simply as possible (as adjacency list), retrieve it from the database as a simple set of rows, and build the tree in RAM (using loops in your high-level programming language) instead of using loops in SQL to build the tree using SQL statements.

Risible answered 13/2, 2009 at 3:56 Comment(4)
Thanks. I know the 'n' number of levels at the moment (4), but it might change. I think perhaps it's worth me just implementing the nested set, rather than the easier method.Understate
I presumed that SQL statements would be a lot faster than loops in PHP.Understate
I don't know PHP. The benefit of a nested set is that you can retrieve an entire branch (not the whole tree) with one SELECT. Perhaps that's the only benefit, and unless you need to do that (and why would you?) then it's not worth the extra complexity.Risible
If you have only one tree containing only 200 pages, it's not a big deal to fetch the entire data set and figure out the hierarchy in application code. If you have hundreds of thousands of nodes, or thousands of parallel trees, that's different.Britnibrito
E
2

For completeneness: Oracle has the START_WITH and CONNECT_BY operators: see this Hierarchical Queries document.

Ethology answered 13/2, 2009 at 4:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.