Dealing with nested sets in mysql?
Asked Answered
H

3

7

I have decided to follow http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html

So now I am looking for some help with the code.

I am using their data for my testing, So, I visualized the tree being like so:

    array('value' => 'Richard Shakespeare',
        array('value' => 'Henry',
            array('value' => 'Joan'),
            array('value' => 'Margaret'),
            array('value' => 'William',
                array('value' => 'Susana',
                    array('value' => 'Elizabeth Hall',
                        array('value' => 'John Bernard'))),
                array('value' => 'Hamnet'),
                array('value' => 'Judith',
                    array('value' => 'Shakespeare Quiney'),
                    array('value' => 'Richard Quiney'),
                    array('value' => 'Thomas Quiney'))),
            array('value' => 'Gilbert'),
            array('value' => 'Joan',
                array('value' => 'William Hart'),
                array('value' => 'Mary Hart'),
                array('value' => 'Thomas Hart'),
                array('value' => 'Micheal Hart')),
            array('value' => 'Anne'),
            array('value' => 'Richard'),
            array('value' => 'Edmond')),
        array('value' => 'John'));

So if we want to insert that into the database we want to end up with

Array
(
    [0] => Array
        (
            [value] => Richard Shakespeare
            [left] => 1
            [right] => 46
        )

    [1] => Array
        (
            [value] => Henry
            [left] => 2
            [right] => 43
        )

    [2] => Array
        (
            [value] => Joan
            [left] => 3
            [right] => 4
        )

    [3] => Array
        (
            [value] => Margaret
            [left] => 5
            [right] => 6
        )

    [4] => Array
        (
            [value] => William
            [left] => 7
            [right] => 24
        )

    [5] => Array
        (
            [value] => Susana
            [left] => 8
            [right] => 13
        )

    [6] => Array
        (
            [value] => Elizabeth Hall
            [left] => 9
            [right] => 12
        )

    [7] => Array
        (
            [value] => John Bernard
            [left] => 10
            [right] => 11
        )

    [8] => Array
        (
            [value] => Hamnet
            [left] => 14
            [right] => 15
        )

    [9] => Array
        (
            [value] => Judith
            [left] => 16
            [right] => 23
        )

    [10] => Array
        (
            [value] => Shakespeare Quiney
            [left] => 17
            [right] => 18
        )

    [11] => Array
        (
            [value] => Richard Quiney
            [left] => 19
            [right] => 20
        )

    [12] => Array
        (
            [value] => Thomas Quiney
            [left] => 21
            [right] => 22
        )

    [13] => Array
        (
            [value] => Gilbert
            [left] => 25
            [right] => 26
        )

    [14] => Array
        (
            [value] => Joan
            [left] => 27
            [right] => 36
        )

    [15] => Array
        (
            [value] => William Hart
            [left] => 28
            [right] => 29
        )

    [16] => Array
        (
            [value] => Mary Hart
            [left] => 30
            [right] => 31
        )

    [17] => Array
        (
            [value] => Thomas Hart
            [left] => 32
            [right] => 33
        )

    [18] => Array
        (
            [value] => Micheal Hart
            [left] => 34
            [right] => 35
        )

    [19] => Array
        (
            [value] => Anne
            [left] => 37
            [right] => 38
        )

    [20] => Array
        (
            [value] => Richard
            [left] => 39
            [right] => 40
        )

    [21] => Array
        (
            [value] => Edmond
            [left] => 41
            [right] => 42
        )

    [22] => Array
        (
            [value] => John
            [left] => 44
            [right] => 45
        )

)

So the issue comes to mind of, How best to do this?

My solution was:

$container = array();

function children($item){
  $children = 0;
  foreach($item as $node)
    if(is_array($node))
      $children += children($node)+1;
    return $children;
}

function calculate($item, &$container, $data = array(0,0)){
  //althought this one is actually of no use, it could be useful as it contains a count 
  $data[0]++; //$left

  $right = ($data[0]+(children($item)*2))+1;

  //store the values in the passed container
  $container[] = array(
    'value' => $item['value'],
    'left'  => $data[0],
    'right' => $right,
  );

  //continue looping
  $level = $data[1]++;
  foreach($item as &$node)
    if(is_array($node))
      $data = calculate($node, $container, $data);

    $data[1] = $level;
    $data[0]++;
    return $data;
}

calculate($tree, $container);

How efficient it is I do not know.

But now onto the queries.

To select all descendants of a node we can use

SELECT child.value AS 'Descendants of William', COUNT(*) AS `Level`
FROM tester AS parent
JOIN tester AS child ON child.`left` BETWEEN parent.`left` AND parent.`right`
WHERE parent.`left` > 7 AND parent.`right` < 24
GROUP BY child.value ORDER BY `level`;

To select all descendants of a node, to a specific depth we can use
Note that we are selecting Descendants of William to a depth of 2
Williams left: 7, Williams right: 24, Levels: 2

SELECT child.value AS 'Descendants of William', COUNT(*) AS `Level`
FROM tester AS parent
JOIN tester AS child ON child.`left` BETWEEN parent.`left` AND parent.`right`
WHERE parent.`left` > 7 AND parent.`right` < 24
GROUP BY child.value HAVING `level` <= 2 ORDER BY `level`;

So that's easy enough.

But now I want to know a few things,
Note that in the actual database as well as left/right all rows have a unique id, and a "parent" column containing their inviteers id, or null if not invited

  • Lets say I want to insert David as a child of Judith, How do I do that?
  • Lets say I want to get Mary Hart's Parent, and the Parents Parent (array('Henery', 'Joan', 'Mary Hart')), How do I do that?
  • Lets say I want to delete William Hart from Joan How so I do that?
Hols answered 9/6, 2011 at 22:8 Comment(1)
What have you tried? I hate to ask, but nested sets are complex, and your question sounds like you're crowd-sourcing your work. Plus, you haven't even begun to skim the difficulty of the problem. Such as: "if I want to invert david and judith, so that judith is now takes david's place as child and vice-versa, how do I do that [without re-indexing the tree twice]?"Uchida
Q
1

To update/delete you will need to increase/decrease left/right values of all elements of branch.
Examples of queries you can find here.

How efficient it is I do not know.

Nested sets works VERY slowly with big trees on update/insert/delete. And very fast to select.
So use this model only with static data, which will be stored without changes most of the time, and this tree will not contain thousands of nodes (or any update will take minutes to complete). Materialized path works much faster.

Querist answered 9/6, 2011 at 22:24 Comment(8)
So, If I am working with a user table, hence, The data will be inserted often, there will be hundreds of thousands of rows eventually, but it will not be updated/deleted often, but will be read on almost every page load (the tree structure is a central part of our site) should I go with nested sets?Hols
@OZ in fact not only elements of branch but all elements with a left_id > current_element.right_id.Zirconia
@OZ_: I downvote you because this is not true for MySQL. Maybe it is true for PostGreSql and also Celco is around here.Tumulus
@malko, right, thanks. @epitaph, pff, it's just reality. I spent many time to check it in past - go do it. If table will contain 70000 nodes, inserting of new node could require to update many many items, thousands and thousands.Querist
@Hailwood, no, don't do it with users table. Users in hierarchy? Try the Materialized paths model.Querist
@epitaph: you're very wrong. When used as is (i.e. integers instead of floats), it's dead slow in all DBs, sheerly because of the number of updates that may occur on the slightest insert/update/delete.Uchida
Materialized paths, Are they a good idea, when a tree of users could be say 800 users long?Hols
@Hailwood, not depends on count of users.Querist
Z
1
  • to get the parents of a node you need nodes with left_id < child.left_id and right_id > child.right_id, if you only want the direct ancestor choose the one from previous set with the highest left_id.

  • to delete a node remove it and then lower twice all left/right ids that are greater than deleted element right id. if( leftId > deleted.leftId ) leftId-=2 same for rightId

  • to insert a node make some space for it adding+2 to all nodes with leftId > parent.rightId then parent.rightId += 2 then insert node with leftId = parent.rightId-2 and rightId=parent.rightId-1

Zirconia answered 9/6, 2011 at 22:29 Comment(0)
T
0

All of your questions can be solved very easy if you use a DFS for each relationship and then use your function calculate() again if you want it more detailed.

Tumulus answered 9/6, 2011 at 22:21 Comment(1)
Hailwood: Depth-First-Search.Tumulus

© 2022 - 2024 — McMap. All rights reserved.