Nested Set Query to retrieve all ancestors of each node
Asked Answered
T

2

8

I have a MySQL query that I thought was working fine to retrieve all the ancestors of each node, starting from the top node, down to its immediate node. However when I added a 5th level to the nested set, it broke.

Below are example tables, queries and SQL Fiddles:

Four Level Nested Set:

CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
 `tree` int,
 `left` int,
 `right` int);

INSERT Tree
VALUES
("Food", 1, 1, 18),
('Fruit', 1, 2, 11),
('Red', 1, 3, 6),
('Cherry', 1, 4, 5),
('Yellow', 1, 7, 10),
('Banana', 1, 8, 9),
('Meat', 1, 12, 17),
('Beef', 1, 13, 14),
('Pork', 1, 15, 16);

The Query:

SELECT t0.title node
      ,(SELECT GROUP_CONCAT(t2.title)
                    FROM Tree t2
                    WHERE t2.left<t0.left AND t2.right>t0.right
                    ORDER BY t2.left) ancestors
FROM Tree t0
GROUP BY t0.title;

The returned result for node Banana is Food,Fruit,Yellow - Perfect. You can see this here SQL Fiddle - 4 Levels

When I run the same query on the 5 level table below, the 5th level nodes come back in the wrong order:

CREATE TABLE Tree
(title varchar(20) PRIMARY KEY,
 `tree` int,
 `left` int,
 `right` int);

INSERT Tree
VALUES
("Food", 1, 1, 24),
('Fruit', 1, 2, 13),
('Red', 1, 3, 8),
('Cherry', 1, 4, 7),
('Cherry_pie', 1, 5, 6),
('Yellow', 1, 9, 12),
('Banana', 1, 10, 11),
('Meat', 1, 14, 23),
('Beef', 1, 15, 16),
('Pork', 1, 17, 22),
('Bacon', 1, 18, 21),
('Bacon_Sandwich', 1, 19, 20);

The returned result for Bacon_Sandwich is Bacon,Food,Meat,Pork which is not the right order, it should be Food,Meat,Pork,Bacon - You can see this here SQL Fiddle - 5 Levels

I am not sure what is happening because I don't really understand subqueries well enough. Can anyone shed any light on this?

EDIT AFTER INVESTIGATION:

Woah!! Looks like writing all this out and reading up about ordering with GROUP_CONCAT gave me some inspiration.

Adding ORDER BY to the actual GROUP_CONCAT function and removing from the end of the subquery solved the issue. I now receive Food,Meat,Pork,Bacon for the node Bacon_Sandwich

SELECT t0.title node
      ,(SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
                    FROM Tree t2
                    WHERE t2.left<t0.left AND t2.right>t0.right
                    ) ancestors
FROM Tree t0
GROUP BY t0.title;

I still have no idea why though. Having ORDER BY at the end of the subquery works for 4 levels but not for 5?!?!

If someone could explain what the issue is and why moving the ORDER BY fixes it, I'd be most grateful.

Tokay answered 8/1, 2014 at 10:54 Comment(0)
A
6

First it's important to understand that you have an implicit GROUP BY

If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.

To make the point more understandable I'll leave out subqueries and reduce the problem to the banana. Banana is the set [10, 11]. The correct sorted ancestors are those:

SELECT "banana" as node, GROUP_CONCAT(title ORDER by `left`)
  FROM Tree WHERE `left` < 10 AND `right` > 11
  GROUP BY node;

The ORDER BY must be in GROUP_CONCAT() as you want the aggregation function to sort. ORDER BY outside sorts by the aggregated results (i.e. the result of GROUP_CONCAT()). The fact that it worked until level 4 is just luck. ORDER BY has no effect on an aggregate function. You would get the same results with or without the ORDER BY:

SELECT GROUP_CONCAT(title)
  FROM Tree WHERE `left` < 10 AND `right` > 11
  /* ORDER BY `left` */

It might help to understand what SELECT GROUP_CONCAT(title ORDER BY left) FROM Tree WHERE … ORDER BY left does:

  1. Get a selection (WHERE) which results in three rows in an undefined order:

    ("Food")
    ("Yellow")
    ("Fruit")
    
  2. Aggregate the result into one row (implicit GROUP BY) in order to be able to use an aggregate function:

    (("Food","Yellow", "Fruit"))
    
  3. Fire the aggregate function (GROUP_CONCAT(title, ORDER BY link)) on it. I.e. order by link and then concatenate:

    ("Food,Fruit,Yellow")
    
  4. And now finally it sorts that result (ORDER BY). As it's only one row, sorting changes nothing.

    ("Food,Fruit,Yellow")
    
Activism answered 8/1, 2014 at 10:54 Comment(3)
Yeah I understand how the GROUP_CONCAT() works, I just assumed it did it's thing after the entire query was run, which would have ordered the results with the ORDER BY at the end. To be honest I am still not really sure why this is the case. Surely the ONLY way GROUP_CONCAT() knows WHAT to concatenate, is by the SELECT query. And the SELECT query has ORDER BY in it, so it should return what to concatenate in order?!?!? I am obviously missing something.....? Just to be clear, I know that GROUP_CONCAT() is concatenating the results in to one row... that is not my confusion..Tokay
I think you are missing the point about aggregation. I intentionally put a GROUP BY into my query. GROUP_CONCAT() is not operating on selected fields, it's operating on the grouped fields. You're query (without the GROUP BY) is just a shortcut for a GROUP BY query.Activism
let us continue this discussion in chatActivism
F
4

You can get the result using JOIN or SUB-QUERY.

Using JOIN:

SELECT t0.title node, GROUP_CONCAT(t2.title ORDER BY t2.left) ancestors 
FROM Tree t0
LEFT JOIN Tree t2 ON t2.left < t0.left AND t2.right > t0.right
GROUP BY t0.title; 

Check this SQL FIDDLE DEMO

Using SUB-QUERY:

SELECT t0.title node, 
      (SELECT GROUP_CONCAT(t2.title ORDER BY t2.left)
       FROM Tree t2 WHERE t2.left<t0.left AND t2.right>t0.right) ancestors
FROM Tree t0
GROUP BY t0.title;

Check this SQL FIDDLE DEMO

OUTPUT

|           NODE |             ANCESTORS |
|----------------|-----------------------|
|          Bacon |        Food,Meat,Pork |
| Bacon_Sandwich |  Food,Meat,Pork,Bacon |
|         Banana |     Food,Fruit,Yellow |
|           Beef |             Food,Meat |
|         Cherry |        Food,Fruit,Red |
|     Cherry_pie | Food,Fruit,Red,Cherry |
|           Food |                (null) |
|          Fruit |                  Food |
|           Meat |                  Food |
|           Pork |             Food,Meat |
|            Red |            Food,Fruit |
|         Yellow |            Food,Fruit |

In your sub query you had used ORDER BY after WHERE clause which won't affect the output. By default GROUP_CONCAT() function will orders the output string in ascending order of column value. It won't consider you explicit ORDER BY clause.

If you check your output of first query which returns the data in ascending order of title column. So the returned result for node Banana is Food,Fruit,Yellow.

But in your second result for Bacon_Sandwich is Bacon,Food,Meat,Pork because in ascending order Bacon comes first than Food will come.

If you want to order the result based on left column than you have to specify ORDER BY inside the GROUP_CONCAT() function as above. Check my both queries.

I prefer that you use JOIN instead of SUB-QUERY for improving performance.

Frontogenesis answered 11/1, 2014 at 8:52 Comment(2)
I was certain I left a comment which points you to the manual. So again: GROUP_CONCAT() is not sorting at all by default.Activism
Thanks for the added explanation.Tokay

© 2022 - 2024 — McMap. All rights reserved.