Should every User Table have a Clustered Index?
Asked Answered
K

6

26

Recently I found a couple of tables in a Database with no Clustered Indexes defined. But there are non-clustered indexes defined, so they are on HEAP.

On analysis I found that select statements were using filter on the columns defined in non-clustered indexes.

Not having a clustered index on these tables affect performance?

Kiehl answered 3/8, 2012 at 0:44 Comment(1)
Yes, because otherwise you have a Heap is bad. Heap gets fragmented, so this is not good.Idol
C
56

It's hard to state this more succinctly than SQL Server MVP Brad McGehee:

As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases–such as an identity column, or some other column where the value is increasing–and is unique. In many cases, the primary key is the ideal column for a clustered index.

BOL echoes this sentiment:

With few exceptions, every table should have a clustered index.

The reasons for doing this are many and are primarily based upon the fact that a clustered index physically orders your data in storage.

  • If your clustered index is on a single column that monotonically increases, inserts occur in order on your storage device and page splits will not happen.

  • Clustered indexes are efficient for finding a specific row when the indexed value is unique, such as the common pattern of selecting a row based upon the primary key.

  • A clustered index often allows for efficient queries on columns that are often searched for ranges of values (between, >, etc.).

  • Clustering can speed up queries where data is commonly sorted by a specific column or columns.

  • A clustered index can be rebuilt or reorganized on demand to control table fragmentation.

  • These benefits can even be applied to views.

You may not want to have a clustered index on:

  • Columns that have frequent data changes, as SQL Server must then physically re-order the data in storage.

  • Columns that are already covered by other indexes.

  • Wide keys, as the clustered index is also used in non-clustered index lookups.

  • GUID columns, which are larger than identities and also effectively random values (not likely to be sorted upon), though newsequentialid() could be used to help mitigate physical reordering during inserts.

  • A rare reason to use a heap (table without a clustered index) is if the data is always accessed through nonclustered indexes and the RID (SQL Server internal row identifier) is known to be smaller than a clustered index key.

Because of these and other considerations, such as your particular application workloads, you should carefully select your clustered indexes to get maximum benefit for your queries.

Also note that when you create a primary key on a table in SQL Server, it will by default create a unique clustered index (if it doesn't already have one). This means that if you find a table that doesn't have a clustered index, but does have a primary key (as all tables should), a developer had previously made the decision to create it that way. You may want to have a compelling reason to change that (of which there are many, as we've seen). Adding, changing or dropping the clustered index requires rewriting the entire table and any non-clustered indexes, so this can take some time on a large table.

Coalition answered 6/8, 2012 at 20:51 Comment(9)
can you explain the third bullet A clustered index allows for efficient queries on columns that are often searched for ranges of values (between, >, etc.). ?Dionnadionne
A clustered index works well in this manner because the next higher or lower keyed rows are guaranteed to be physically adjacent to each other in storage. Thus, once the first value is found, it is unnecessary to search for the remaining rows; you've already found them.Coalition
@TimLehner - very good post ... +1 for that ... I learned something today!Macle
They are not guaranteed to be physically adjacent in storage. They are only guaranteed to be logically adjacent. This is why you need to reorganize or rebuild indexes.Swound
@MartinSmith, good call, you're right. No index is immune to fragmentation. Technet could use an update on this sentence: "[...] rows with subsequent indexed values are guaranteed to be physically adjacent."Coalition
You mention that heaps are rare, however I don't think it is that uncommon if you are using GUID identifiers. Other than insert dates and sequential IDs, there are usually not many other candidates for columns that fulfill the requirements of being monotonically increasing, infrequently changing and reasonably frequently accessed.Bove
Interesting article pushing heaps here: use-the-index-luke.com/blog/2014-01/…Willms
How big is the penalty you get for having a cluster index on a column whose values are not monotonically increasing? What if all your columns in the table are like that? Should you still have a clustered index?Myeshamyhre
If you frequently query specific rows using an alternate index on a clustered table, you have to do two index lookups, because the alternate index contains the key to use on the clustered index. On a heap, it contains the RID which is basically the location of the row.Fda
E
5

I would not say "Every table should have a clustered index", I would say "Look carefully at every table and how they are accessed and try to define a clustered index on it if it makes sense". It's a plus, like a Joker, you have only one Joker per table, but you don't have to use it. Other database systems don't have this, at least in this form, BTW.

Putting clustered indices everywhere without understanding what you're doing can also kill your performance (in general, the INSERT performance because a clustered index means physical re-ordering on the disk, or at least it's a good way to understand it), for example with GUID primary keys as we see more and more.

So, read Tim Lehner's exceptions and reason.

Encephalo answered 7/8, 2012 at 16:38 Comment(0)
D
1

Performance is a big hairy problem. Make sure you are optimizing for the right thing.

Free advice is always worth its price, and there is no substitute for actual experimentation.

The purpose of an index is to find matching rows and help retrieve the data when found.

A non-clustered index on your search criteria will help to find rows, but there needs to be additional operation to get at the row's data.

If there is no clustered index, SQL uses an internal rowId to point to the location of the data.

However, if there is a clustered index on the table, that rowId is replaced by the data values in the clustered index.

So the step of reading the rows data would not be needed, and would be covered by the values in the index.

Even if a clustered index isn't very good at being selective, if those keys are frequently most or all of the results requested - it may be helpful to have them as the leaf of the non-clustered index.

Devi answered 6/8, 2012 at 20:13 Comment(0)
I
0

Yes you should have clustered index on a table.So that all nonclustered indexes perform in better way.

Itinerate answered 3/8, 2012 at 1:45 Comment(2)
use-the-index-luke.com/blog/2014-01/… This article strongly contradicts your statementSteepen
Indeed, this answer is patently wrong. A non-clustered index performs worse on a clustered table if it's used to find a particular row to fetch data from. If the non-clustered index is covering, such that all data is included in the index that the original row does not need to be fetched, then there's no difference between a heap and a clustered table.Fda
D
0

Consider using a clustered index when Columns that contain a large number of distinct values so to avoid the need for SQL Server to add a "uniqueifier" to duplicate key values

Disadvantage : It takes longer to update records if only when the fields in the clustering index are changed.

Avoid clustering index constructions where there is a risk that many concurrent inserts will happen on almost the same clustering index value

Searches against a nonclustered index will appear slower is the clustered index isn't build correctly, or it does not include all the columns needed to return the data back to the calling application. In the event that the non-clustered index doesn't contain all the needed data then the SQL Server will go to the clustered index to get the missing data (via a lookup) which will make the query run slower as the lookup is done row by row.

Dionnadionne answered 3/8, 2012 at 9:48 Comment(0)
V
0

Yes, every table should have a clustered index. The clustered index sets the physical order of data in a table. You can compare this to the ordering of music at a store, by bands name and or Yellow pages ordered by a last name. Since this deals with the physical order you can have only one it can be comprised by many columns but you can only have one.

It’s best to place the clustered index on columns often searched for a range of values. Example would be a date range. Clustered indexes are also efficient for finding a specific row when the indexed value is unique. Microsoft SQL will place clustered indexes on a PRIMARY KEY constraint automatically if no clustered indexes are defined.

Clustered indexes are not a good choice for:

Columns that undergo frequent changes

  • This results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.

Wide keys

  • The key values from the clustered index are used by all nonclustered indexes as lookup keys and therefore are stored in each nonclustered index leaf entry.
Vein answered 6/8, 2012 at 21:15 Comment(2)
can you explain it with a help of an example It’s best to place the clustered index on columns often searched for a range of values. Example would be a date range and this point The key values from the clustered index are used by all nonclustered indexes as lookup keys and therefore are stored in each nonclustered index leaf entry.Dionnadionne
Another factor to consider is, if you are constantly updating, inserting and deleting data, each index, including the clustered one (if any), has to be re-ordered, etc. A heap table (no clustered index) is generally fastest for those operations.Idol

© 2022 - 2024 — McMap. All rights reserved.