Is it possible to query a tree structure table in MySQL in a single query, to any depth?
Asked Answered
S

9

58

I'm thinking the answer is no, but I'd love it it anybody had any insight into how to crawl a tree structure to any depth in SQL (MySQL), but with a single query

More specifically, given a tree structured table (id, data, data, parent_id), and one row in the table, is it possible to get all descendants (child/grandchild/etc), or for that matter all ancestors (parent/grandparent/etc) without knowing how far down or up it will go, using a single query?

Or is using some kind of recursion require, where I keep querying deeper until there are no new results?

Specifically, I'm using Ruby and Rails, but I'm guessing that's not very relevant.

Suprasegmental answered 4/10, 2008 at 5:42 Comment(1)
PostgreSQL allows direct use of tail-recursive (= iterative) queries, but as far as I know MySQL does not. I have successfully used this technique before in Postgres as a substitute for the nested-sets approach. postgresql.org/docs/8.4/static/queries-with.htmlJohnajohnath
H
37

Yes, this is possible, it's a called a Modified Preorder Tree Traversal, as best described here

Joe Celko's Trees and Hierarchies in SQL for Smarties

A working example (in PHP) is provided here

http://www.sitepoint.com/article/hierarchical-data-database/2/

Himes answered 4/10, 2008 at 6:55 Comment(6)
Ok... I've gotta get that book now. I've used that structure several times, with a different source for info each time...Countermeasure
Seriously - get Celko's SQL for Smarties. It IS the bible for SQLHimes
There's a whole book on that?! Can't it be summarized in 2 or 3 pages? (like that article on SitePoint!)Humorous
The sitePoint article is just great!Oversize
I've come across this idea a few times but as far as I remember, that system, though seeming very clever, abandons a lot of what makes RDBMS fast and safe. If you have a large tree and need to insert something into one of the oldest branches, you will have to update nearly every record in the table, and the same if you delete. I think I would recommend kids take the road more-traveled when it comes to this problem. But I'm no expert.Muskrat
@ButtleButkus If you read the tree more than you update it, then the structure presented is really advantageous, and IMO, that is the most common scenario you will face.Finery
L
23

Here are several resources:

Basically, you'll need to do some sort of cursor in a stored procedure or query or build an adjacency table. I'd avoid recursion outside of the db: depending on how deep your tree is, that could get really slow/sketchy.

Litchfield answered 4/10, 2008 at 5:55 Comment(2)
that second link where it discusses Nested Sets is really good - I'd never thought about doing it that way!Judaica
That link also mentions Joe Celko's book.Countermeasure
L
3

Daniel Beardsley's answer is not that bad a solution at all when the main questions you are asking are 'what are all my children' and 'what are all my parents'.

In response to Alex Weinstein, this method actually results in less updates to nodes on a parent movement than in the Celko technique. In Celko's technique, if a level 2 node on the far left moves to under a level 1 node on the far right, then pretty much every node in the tree needs updating, rather than just the node's children.

What I would say however is that Daniel possibly stores the path back to root the wrong way around.

I would store them so that the query would be

SELECT FROM table WHERE ancestors LIKE "1,2,6%"

This means that mysql can make use of an index on the 'ancestors' column, which it would not be able to do with a leading %.

Latria answered 5/3, 2012 at 15:7 Comment(7)
Storing comma separated values in a column has several integrity and performance problems. It's really that bad.Bullfinch
52 SO users disagree with you: Is storing a comma separated list in a database column really that bad?Bullfinch
OK, so you back up your argument now. Just saying 'that's not a good idea, without backing it up with some reasoning is no use to anyone. The example you give is where someone is using a comma separated list to store a one-to-many relationship. This was clearly wrong. However here we are using it as a cache to speed up query performance. It's a totally different use case. We are not proposing replacing the parent_id column in the table, merely storing a cached path to root in a form where we don't have to recursively query the structure.Latria
It's obvious I didn't just say it's not a good idea. I said "there are several integrity and performance issues."Bullfinch
( it's also unsaid, but obvious , that this technique would only work for certain lengths of id to a certain depth. For very large trees you'd run out of space in the column )Latria
I don;t want to get into a flame war, but you can't just say 'there are issues with this', without saying what those issues are. It doesn't add anything to the discussion.Latria
I can see advantage in the simplicity of this approach as well (the LIKE query you suggest, and FIND_IN_SET clauses). Updating ancestry for descendants could be achieved in a single UPDATE table SET ancestors=REPLACE(ancestors, old_path, new_path) WHERE ancestors LIKE old_path+'%'; I am trying to think of any major flaws in this approach besides the integrity concerns (which seem easy enough to manage with a well designed data layer). Performance-wise, I am actually curious if lookups with this approach may be faster than some recursive approaches I've come across.Oulu
D
2

I came across this problem before and had one wacky idea. You could store a field in each record that is concatenated string of it's direct ancestors' ids all the way back to the root.

Imagine you had records like this (indentation implies heirarchy and the numbers are id, ancestors.

  • 1, "1"
    • 2, "2,1"
      • 5, "5,2,1"
      • 6, "6,2,1"
        • 7, "7,6,2,1"
        • 11, "11,6,2,1"
    • 3, "3,1"
      • 8, "8,3,1"
      • 9, "9,3,1"
      • 10, "10,3,1"

Then to select the descendents of id:6, just do this

SELECT FROM table WHERE ancestors LIKE "%6,2,1"

Keeping the ancestors column up to date might be more trouble than it's worth to you, but it's feasible solution in any DB.

Diluent answered 4/10, 2008 at 6:45 Comment(2)
maintaining this would be total hell. imagine if node 3 changes. You need to do an UPDATE on all of its children.Jourdan
Supposing that you reversed the order of the path so that the root ancestor is at the left (ex for id 5 and 6, ancestors='1,2'), and most direct parent is at the right (as @Latria suggests in his answer), updating the children could be achieved in a single query that sets ancestors = REPLACE(ancestors, old_path, new_path). If your hierarchy structure is not very large or doesn't change often, I don't think this is a bad way to go. Using LIKE "1,2%" or FIND_IN_SET with this structure, ancestry-based queries (for descendants or ancestors) could be greatly simplified.Oulu
H
2

Celko's technique (nested sets) is pretty good. I also have used an adjacency table with fields "ancestor" and "descendant" and "distance" (e.g. direct children/parents have a distance of 1, grandchildren/grandparents have a distance of 2, etc).

This needs to be maintained, but is fairly easy to do for inserts: you use a transaction, then put the direct link (parent, child, distance=1) into the table, then INSERT IGNORE a SELECTion of existing parent&children by adding distances (I can pull up the SQL when I have a chance), which wants an index on each of the 3 fields for performance. Where this approach gets ugly is for deletions... you basically have to mark all the items that have been affected and then rebuild them. But an advantage of this is that it can handle arbitrary acyclic graphs, whereas the nested set model can only do straight hierarchies (e.g. each item except the root has one and only one parent).

Hermeneutics answered 8/12, 2008 at 19:43 Comment(0)
E
1

SQL isn't a Turing Complete language, which means you're not going to be able to perform this sort of looping. You can do some very clever things with SQL and tree structures, but I can't think of a way to describe a row which has a certain id "in its hierarchy" for a hierarchy of arbitrary depth.

Your best bet is something along the lines of what @Dan suggested, which is to just work your way through the tree in some other, more capable language. You can actually generate a query string in a general-purpose language using a loop, where the query is just some convoluted series of joins (or sub-queries) which reflects the depth of the hierarchy you are looking for. That would be more efficient than looping and multiple queries.

Ethben answered 4/10, 2008 at 6:7 Comment(0)
C
1

This can definitely be done and it isn't that complicated for SQL. I've answered this question and provided a working example using mysql procedural code here:

MySQL: How to find leaves in specific node

Booth: If you are satisfied, you should mark one of the answers as accepted.

Catchpole answered 9/10, 2012 at 21:9 Comment(0)
M
1

I used the "With Emulator" routine described in https://stackoverflow.com/questions/27013093/recursive-query-emulation-in-mysql (provided by https://stackoverflow.com/users/1726419/yossico). So far, I've gotten very good results (performance wise), but I don't have an abundance of data or a large number of descendents to search through/for.

Mentor answered 12/1, 2015 at 21:15 Comment(0)
B
-1

You're almost definitely going to want to employ some recursion for that. And if you're doing that, then it would be trivial (in fact easier) to get the entire tree rather than bits of it to a fixed depth.

In really rough pseudo-code you'll want something along these lines:

getChildren(parent){
    children = query(SELECT * FROM table WHERE parent_id = parent.id)
    return children
}

printTree(root){
    print root
    children = getChildren(root)
    for child in children {
        printTree(child)
    }
}

Although in practice you'd rarely want to do something like this. It will be rather inefficient since it's making one request for every row in the table, so it'll only be sensible for either small tables, or trees that aren't nested too deeply. To be honest, in either case you probably want to limit the depth.

However, given the popularity of these kinds of data structure, there may very well be some MySQL stuff to help you with this, specifically to cut down on the numbers of queries you need to make.

Edit: Having thought about it, it makes very little sense to make all these queries. If you're reading the entire table anyway, then you can just slurp the whole thing into RAM - assuming it's small enough!

Bridgework answered 4/10, 2008 at 5:54 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.