Generating Depth based tree from Hierarchical Data in MySQL (no CTEs)
Asked Answered
V

4

31

Hi For many days I have been working on this problem in MySQL, however I can not figure it out. Do any of you have suggestions?

Basically, I have a category table with domains like: id, name (name of category), and parent (id of parent of the category).

Example Data:

1  Fruit        0
2  Apple        1
3  pear         1
4  FujiApple    2
5  AusApple     2
6  SydneyAPPLE  5
....

There are many levels, possibly more than 3 levels. I want to create an sql query that groups the datas according to he hierarchy: parent > child > grandchild > etc.

It should output the tree structure, as follows:

1 Fruit 0
 ^ 2 Apple 1
   ^ 4 FujiApple 2
   - 5 AusApple 2
     ^ 6 SydneyApple 5
 - 3 pear 1

Can I do this using a single SQL query? The alternative, which I tried and does work, is the following:

SELECT * FROM category WHERE parent=0

After this, I loop through the data again, and select the rows where parent=id. This seems like a bad solution. Because it is mySQL, CTEs cannot be used.

Vaasa answered 13/3, 2011 at 17:23 Comment(4)
still reading and understanding all solution, no sure which one to pick.Vaasa
Too bad you aren't using MSSQL - the HierachyId feature solves this problem and is extremely fast.Electromagnetism
#4048651Abysm
added some additional info to my answer :)Hardman
H
41

You can do it in a single call from php to mysql if you use a stored procedure:

Example calls

mysql> call category_hier(1);

+--------+---------------+---------------+----------------------+-------+
| cat_id | category_name | parent_cat_id | parent_category_name | depth |
+--------+---------------+---------------+----------------------+-------+
|      1 | Location      |          NULL | NULL                 |     0 |
|      3 | USA           |             1 | Location             |     1 |
|      4 | Illinois      |             3 | USA                  |     2 |
|      5 | Chicago       |             3 | USA                  |     2 |
+--------+---------------+---------------+----------------------+-------+
4 rows in set (0.00 sec)


$sql = sprintf("call category_hier(%d)", $id);

Hope this helps :)

Full script

Test table structure:

drop table if exists categories;
create table categories
(
cat_id smallint unsigned not null auto_increment primary key,
name varchar(255) not null,
parent_cat_id smallint unsigned null,
key (parent_cat_id)
)
engine = innodb;

Test data:

insert into categories (name, parent_cat_id) values
('Location',null),
   ('USA',1), 
      ('Illinois',2), 
      ('Chicago',2),  
('Color',null), 
   ('Black',3), 
   ('Red',3);

Procedure:

drop procedure if exists category_hier;

delimiter #

create procedure category_hier
(
in p_cat_id smallint unsigned
)
begin

declare v_done tinyint unsigned default 0;
declare v_depth smallint unsigned default 0;

create temporary table hier(
 parent_cat_id smallint unsigned, 
 cat_id smallint unsigned, 
 depth smallint unsigned default 0
)engine = memory;

insert into hier select parent_cat_id, cat_id, v_depth from categories where cat_id = p_cat_id;

/* http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */

create temporary table tmp engine=memory select * from hier;

while not v_done do

    if exists( select 1 from categories p inner join hier on p.parent_cat_id = hier.cat_id and hier.depth = v_depth) then

        insert into hier 
            select p.parent_cat_id, p.cat_id, v_depth + 1 from categories p 
            inner join tmp on p.parent_cat_id = tmp.cat_id and tmp.depth = v_depth;

        set v_depth = v_depth + 1;          

        truncate table tmp;
        insert into tmp select * from hier where depth = v_depth;

    else
        set v_done = 1;
    end if;

end while;

select 
 p.cat_id,
 p.name as category_name,
 b.cat_id as parent_cat_id,
 b.name as parent_category_name,
 hier.depth
from 
 hier
inner join categories p on hier.cat_id = p.cat_id
left outer join categories b on hier.parent_cat_id = b.cat_id
order by
 hier.depth, hier.cat_id;

drop temporary table if exists hier;
drop temporary table if exists tmp;

end #

Test runs:

delimiter ;

call category_hier(1);

call category_hier(2);

Some performance testing using Yahoo geoplanet places data

drop table if exists geoplanet_places;
create table geoplanet_places
(
woe_id int unsigned not null,
iso_code  varchar(3) not null,
name varchar(255) not null,
lang varchar(8) not null,
place_type varchar(32) not null,
parent_woe_id int unsigned not null,
primary key (woe_id),
key (parent_woe_id)
)
engine=innodb;

mysql> select count(*) from geoplanet_places;
+----------+
| count(*) |
+----------+
|  5653967 |
+----------+

so that's 5.6 million rows (places) in the table let's see how the adjacency list implementation/stored procedure called from php handles that.

     1 records fetched with max depth 0 in 0.001921 secs
   250 records fetched with max depth 1 in 0.004883 secs
   515 records fetched with max depth 1 in 0.006552 secs
   822 records fetched with max depth 1 in 0.009568 secs
   918 records fetched with max depth 1 in 0.009689 secs
  1346 records fetched with max depth 1 in 0.040453 secs
  5901 records fetched with max depth 2 in 0.219246 secs
  6817 records fetched with max depth 1 in 0.152841 secs
  8621 records fetched with max depth 3 in 0.096665 secs
 18098 records fetched with max depth 3 in 0.580223 secs
238007 records fetched with max depth 4 in 2.003213 secs

Overall i'm pretty pleased with those cold runtimes as I wouldn't even begin to consider returning tens of thousands of rows of data to my front end but would rather build the tree dynamically fetching only several levels per call. Oh and just incase you were thinking innodb is slower than myisam - the myisam implementation I tested was twice as slow in all counts.

More stuff here : http://pastie.org/1672733

Hope this helps :)

Hardman answered 13/3, 2011 at 17:40 Comment(9)
I fear this method will have some serious performance issues.Aroma
i am trying this method, writing my own one, then maybe i will check out the processing timeVaasa
you can do some performance and stress testing with Yahoo GeoPlanet data found here: developer.yahoo.com/geo/geoplanet/dataHardman
looks good, but gonna take forever for me to digest this..thx anyway..upvotedVince
@Vince - happy to help you if required :)Hardman
Cool solution. I modified it a bit to support building up the "full path" for the name, and to accept NULL id param to print ALL parents and their children. Code is at gist.github.com/jdmullin/9377818Furbish
Very useful and relatively easy to modify to suit my own requirements - thanks @JonBlack :)Deutsch
FYI: If anyone is trying to put this into PHPMyAdmin under the 'routines' tab, you've got to set up the variable yourself, and remove everything from the procedure before the 'begin' line. It should then work. If it doesnt, copy and paste the SQL for the procedure and run under the SQL tab.Kalgoorlie
@JonBlack Sorry to bother you - when I run this procedure (or Routine) it works, but then when I run it a second time I get Undefined index: ORDER BY. The data still shows but the message "Some errors have been detected on the server" are showing. I'm using PhpMyAdmin - any advice?Kalgoorlie
T
9

There are two common ways of storing hierarchical data in an RDBMS: adjacency lists (which you are using) and nested sets. There is a very good write-up about these alternatives in Managing Hierarchical Data in MySQL. You can only do what you want in a single query with the nested set model. However, the nested set model makes it more work to update the hierarchical structure, so you need to consider the trade-offs depending on your operational requirements.

Taxiway answered 13/3, 2011 at 17:31 Comment(0)
A
3

You can't achieve this using a single query. Your hierarchical data model is ineffective in this case. I suggest you try two other ways of storing hierarchical data in a database: the MPTT model or the "lineage" model. Using either of those models allows you to do the select you want in a single go.

Here is an article with further details: http://articles.sitepoint.com/article/hierarchical-data-database

Aroma answered 13/3, 2011 at 17:30 Comment(5)
Here's a description of the lineage model.Taxiway
Why did you remove the MySQL tag?Summary
This is not MySQL specific. It's a general SQL subject.Aroma
@Aroma - But if, for example, the OP was on SQL Server this could be achieved with a recursive CTE. Someone did waste time providing such an answer only to delete it when they found out the OP was on MySQL.Summary
@Martin: actually all other major DBMS support recursive CTE - except MySQLBicentenary
B
0

The linear way:

I am using a ugly function to create a tree in a simple string field.

/              topic title
/001           message 1
/002           message 2
/002/001       reply to message 2
/002/001/001/  reply to reply
/003           message 3
etc...

the table can be used to select all the rows in the tree order with a simple SQL Query:

select * from morum_messages where m_topic=1234 order by m_linear asc

INSERT is just select the parent linear (and children) and calculate the string as needed.

select M_LINEAR FROM forum_messages WHERE m_topic = 1234 and M_LINEAR LIKE '{0}/___' ORDER BY M_LINEAR DESC limit 0,1  
/* {0} - m_linear of the parent message*/

DELETE is simple as delete the message, or delete by linear all replies of the parent one.

Beneficial answered 4/5, 2012 at 9:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.