Hierarchical SQL data (Recursive CTE vs HierarchyID vs closure table)
P

1

22

I have a set of hierarchical data being used in a SQL Server database. The data is stored with a guid as the primary key, and a parentGuid as a foreign key pointing to the objects immediate parent. I access the data most often through Entity Framework in a WebApi project. To make the situation a little more complex I also need to manage permission based on this hierarchy such that a permission applied to a parent applies to all of its descendants. My question is this:

I have searched all over and cannot decide which would be best to handle this situation. I know I have the following options.

  1. I can create Recursive CTEs, Common Table Expression, (aka RCTE) to handle the hierarchical data. This seems to be the most simple approach for normal access, but I'm worried it may be slow when used to determine permission levels for child objects.
  2. I can create a hierarchyId data type field in the table and use SQL Server provided functions such as GetAncestor(), IsDescendantOf(), and etc. This seems like it would make querying fairly easy, but seems to require a fairly complex insert/update trigger to keep the hierarchyId field correct through inserts and moves
  3. I can create a closure table, which would store all of the relationships in the table. I imagine it as such: parent column and child column, each parent -> child relationship would be represented. (ie 1->2 2->3 would be represented in the database as 1-2, 1-3, 2-3). The downside is that this requires insert, update, and delete triggers even though they are fairly simple, and this method generates a lot of records.

I have tried searching all over and can't find anything giving any advice between these three methods.

PS I am also open to any alternative solutions to this problem

Petronilapetronilla answered 25/6, 2013 at 16:30 Comment(3)
Please tag your question with the version of SQL Server that you are using. Do your queries tend to progress from the child to the parent, or the other way about? An RCTE walking up the tree following parent links for a single child shouldn't be too bad. Going the other way for all children is where it gets slow.Drizzle
I can't check the version now, but will later. I think though that its either 2008 or newer. It is likely that I would more often be getting children of parents than getting parents of childrenPetronilapetronilla
I can't add another tag, but it is SQL Server 2008 r2.Petronilapetronilla
O
17

I have used all three methods. It's mostly a question of taste.

I agree that hierarchy with parent-child relationships in the table is the simplest. Moving a subtree is simple and it's easy to code the recursive access with CTEs. Performance is only going to be an issue if you have very large tree structures and you are frequently accessing the hierarchical data. For the most part, recursive CTEs are very fast when you have the correct indexes on the table.

The closure table is more like a supplement to the above. Finding all the descendants of a given node is lightning fast, you don't need the CTEs, just one extra join, so it's sweet. Yes, the number of records blows up, but I think it is no more than N-1 times the number of nodes for a tree of depth N (e.g. a tertiary tree of depth 5 would require 1 + 3 + 9 + 27 + 81 = 121 connections when storing only the parent-child relationship vs. 1 + 3 + (9 * 2) + (27 * 3) + (81 * 4) = 427 for the closure table). In addition, the closure table records are so narrow (just 2 ints at a minimum) that they take up almost no space. Generating the list of records to insert into the closure table when a new record is inserted into the hierarchy takes a tiny bit of overhead.

I personally like HierarchyId since it really combines the benefit of the above two, which is compact storage, and lightning fast access. Once you get it set up, it is easy to query and takes very little space. As you mentioned, it's a little tricky to move subtrees around, but it's manageable. Anyway, how often do you really move a subtree in a hierarchy? There are some links you can find that will suggest some methods, e.g.:

http://sqlblogcasts.com/blogs/simons/archive/2008/03/31/SQL-Server-2008---HierarchyId---How-do-you-move-nodes-subtrees-around.aspx

The main drawback I have found to hierarchyId is the learning curve. It's not as obvious how to work with it as the other two methods. I have worked with some very bright SQL developers who would frequently get snagged on it, so you end up with one or two resident experts who have to field questions from everyone else.

Oud answered 25/6, 2013 at 21:44 Comment(3)
Two drawbacks I ran into on my initial foray into hierarchyId was, first, handling bulk inserts/updates in the insert/update trigger and, second, using the hierarchyId'd table in C# through Entity Framework. Do you have any advice or info about handling either of those? Also, this answer was very helpful, but I am going to wait a bit before I select it to encourage anyone else who might have helpful information.Petronilapetronilla
There is another drawback that you run into when trying to use these kinds of relationships in Analysis Services regardless of the method chosen to represent them. Self-joins, loops, and hierarchyid are all unsupported in Analysis Services which means that you end up having to do one of two workarounds; (1) create flattening views to represent the dimensions stored this way, or (2) use SSIS to flatten the dimensions into a BI database (star or snowflake schema). Either workaround requires maintenance whenever a new item is added to the hierarchy.Zygodactyl
@Petronilapetronilla HierarchyId is now a closed issue for EF Core 8.0.0-preview2. When 8.0.0 is released later this year it will be supported by the EF Core team. https://mcmap.net/q/589355/-entity-framework-core-hierarchyidFleeta

© 2022 - 2024 — McMap. All rights reserved.