Indexes and Nested Sets

Asked 5/7, 2011 at 14:9 Answered 5/7, 2011 at 20:12

Solved sql sql-server indexing nested-sets

I am using a nested set to represent a hierarchy in my application, and am wondering where the best place to put indexes (clustered or otherwise) is. I am using Microsoft SQL Server 2008.

Operations:

About 40 times a day, a new hierarchy will be added just off the root.
Hierarchies will probably never be deleted.
Hierarchies are accessed often during the day by parentId to incrementally fill combo boxes.
Hierarchies are VERY rarely moved. Perhaps not even once a month.
Biggest access is via left and right when linking with other tables. This is by far the most frequent access against the hierarchy.

I toyed with putting a clustered index on left and right (as the majority of the time, it will be queried with a val BETWEEN @left AND @right. But is clustering on left and right the correct way to go about it?

Many thanks in advance to anyone with more experience of SQL indexes than I!

Schema as it stands

_id       INT IDENTITY NOT NULL
_idParent INT IDENTITY NULL
_name     NVARCHAR(64)
_left     INT NOT NULL
_right    INT NOT NULL

Infuse answered 5/7, 2011 at 14:9 Comment(1)

Can you provide an idea of the schema as it stands? – Ruelu 5/7, 2011 at 14:22

Your best bet is to test various index configurations and just see which works best. At first glance though, clustered on lft and rgt seems like it would be best. It sounds like there isn't much DML on the table, so it shouldn't have to rearrange the data very often and the clustered index on lft and rgt should turn most of your queries into clustered index scans/seeks.

The only downside that I see is that if you're putting hierarchies right below the root then it might involve moving a lot of other hierarchies as well. Will you always be adding onto the "right" side of the root? That would only involve updating the rgt column in the root row, which would be good. If you add into the middle of the left side of the root then you'll have to shift over all other hierarchies to the right of the new one. Also, how large is your table? That will have some affect on things. If it's small enough then shifting those hierarchies might not be a big deal anyway. You definitely want to try to insert onto the right side of the root if you can though.

EDIT: One other thing... have you looked into the SQL Server built-in hierarchyid data type?

Regain answered 5/7, 2011 at 14:19 Comment(6)

Hi, thanks for your prompt response. Generally yes, it is only added on the right. Basically, below the root there will be a list of companies. This is currently in the region of a 130 or so, but this could easily go in to the thousands. Beneath each is usually 1, but as many as 8 sub nodes. It never goes deeper than this, however. – Infuse 5/7, 2011 at 14:23

In that case you're probably in good shape for the index as you propose. BTW, you shouldn't need the _idParent column that you have if it just points to the hierarchical parent. It's a duplication of data and a bad idea. Also, if you will be joining to this table based on the _id then consider a non-clustered index with an INCLUDES on the other columns (assuming that your schema above shows all of the columns). Again, test various setups of course. – Regain 5/7, 2011 at 14:40

I use parentId as I sometimes have to walk "up" the tree given an arbitrary item at any level. Also, I use it to get only direct descendants of a given node (for incrementally filling a user-interface element). Naturally, if there are better ways to do this I'd appreciate your assistance, though it probably falls in to the category of "another question"? – Infuse 5/7, 2011 at 15:38

I was going to suggest Googling "Joe Celko nested set model", but it's hard to find specific queries for it. If you can find a copy of his book on the subject it's a pretty good read. I don't have my copy with me. Finding immediate children and parents involves subqueries to get a COUNT(*) of parents so that you can determine the level that the item is in the hierarchy. Alternatively, you can include that column in your table to make queries easier, although I just berated another poster for including duplicate data in a database :) – Regain 5/7, 2011 at 16:7

sometimes we have to do what we have to do to make life just that little bit simpler. It's odd you suggested googling as that is what I did before I wrote my comment to you and I cannot seem to find the original article I based my code off. Oracle has also pulled the one that used to be on the mysql site, that most people used to link to. Good job Oracle! Many thanks for your assistance and prompt response. – Infuse 5/7, 2011 at 16:26

Heh... that's why I withdrew my suggestion to Google. Every link I followed seemed to be broken. :) – Regain 5/7, 2011 at 16:56

I'd be nervous about using a clustered index in this way - it's very fast for querying, but any inserts/updates/deletes could require re-writing the data on disk; this could create serious performance issues (esp. for large tables).

I'd also suggest that in practice, you're unlikely to notice the difference between a clustered and non-clustered index - especially with indexing integers. If you have gigantic tables - hundreds of millions of records - you might be able to measure the difference, but on modern hardware I don't think it makes a noticeable difference.

SO, I'd agree with Tom H. - try it out, and measure. Make sure you measure insert/update/delete as well as query. Unless you notice really significant performance benefits, I'd be inclined to use the clustered index for the primary key - because by definition, this is immutable.

Ruelu answered 5/7, 2011 at 20:12 Comment(0)

Recommended topics

Hot tags