Why NonClustered index scan faster than Clustered Index scan?
Asked Answered
T

4

4

As I know, heap tables are tables without clustered index and has no physical order. I have a heap table "scan" with 120k rows and I am using this select:

SELECT id FROM scan

If I create a non-clustered index for the column "id", I get 223 physical reads. If I remove the non-clustered index and alter the table to make "id" my primary key (and so my clustered index), I get 515 physical reads.

If the clustered index table is something like this picture:

enter image description here

Why Clustered Index Scans workw like the table scan? (or worse in case of retrieving all rows). Why it is not using the "clustered index table" that has less blocks and already has the ID that I need?

Treasure answered 28/11, 2014 at 17:58 Comment(0)
B
5

SQL Server indices are b-trees. A non-clustered index just contains the indexed columns, with the leaf nodes of the b-tree being pointers to the approprate data page. A clustered index is different: its leaf nodes are the data page itself and the clustered index's b-tree becomes the backing store for the table itself; the heap ceases to exist for the table.

Your non-clustered index contains a single, presumably integer column. It's a small, compact index to start with. Your query select id from scan has a covering index: the query can be satisfied just by examining the index, which is what is happening. If, however, your query included columns not in the index, assuming the optimizer elected to use the non-clustered index, an additional lookup would be required to fetch the data pages required, either from the clustering index or from the heap.

To understand what's going on, you need to examine the execution plan selected by the optimizer:

Byzantium answered 28/11, 2014 at 18:14 Comment(4)
Got it. My mistake was thinking that the b-tree for non-clustered and clustered are equivalent.Treasure
@Mucida: the b-tree (the navigation tree) for both types of indexes is roughly equivalent - what's significantly different are the leaf level nodes in the tree...Tiffanietiffanle
@marc_s, but when I have a Clsutered and A Non Clsutered index at same time in the table, the leafs for the non-clustered b-tree change and includ the clustered index? They seem to run fast when I select these two indexed columns, even the execution plan using non-clustered index scanTreasure
@Mucida: yes, that's what I was trying to explain: the navigation tree for both index types is roughly the same - a b-tree with the index columns. The leaf level is very different: the actual data pages for the clustered index, and the nonclustered index contains the index columns, plus the clustered index columns (if a clustered index exists - or the RID = row identifier if a clustered index doesn't exist)Tiffanietiffanle
T
2

A clustered index generally is about as big as the same data in a heap would be (assuming the same page fullness). It should use just a little more reads than a heap would use because of additional B-tree levels.

A CI cannot be smaller than a heap would be. I don't see why you would think that. Most of the size of a partition (be it a heap or a tree) is in the data.

Note, that less physical reads does not necessarily translate to a query being faster. Random IO can be 100x slower than sequential IO.

Tabitha answered 28/11, 2014 at 18:6 Comment(4)
The problem is that the Clustered Index Scan is using the same number of physical reads that the table scan. Why Non-Clustered Index Scan is running so faster than Clustered Insdex scan in this case?Treasure
I explained that in the 1st paragraph. What exactly did you not understand? Why do you think the CI scan should be faster?Tabitha
I think that is my doubt: Does the picture really represents a Clustered Index (with pointer to the rest of rows)? Or just the Non-Clustered Index is like that? If the last one is correct, I can understand why NCI worked better than CITreasure
The CI contains all columns inline. The NCI contains only those columns that you specify plus all CI key columns.Tabitha
C
1

When to use Clustered Index-

Query Considerations:
1) Return a range of values by using operators such as BETWEEN, >, >=, <, and <= 2) Return large result sets
3) Use JOIN clauses; typically these are foreign key columns
4) Use ORDER BY, or GROUP BY clauses. An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.

Column Considerations : Consider columns that have one or more of the following attributes: 1) Are unique or contain many distinct values 2) Defined as IDENTITY because the column is guaranteed to be unique within the table 3) Used frequently to sort the data retrieved from a table

Clustered indexes are not a good choice for the following attributes: 1) Columns that undergo frequent changes 2) Wide keys

When to use Nonclustered Index-

Query Considerations:
1) Use JOIN or GROUP BY clauses. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
2) Queries that do not return large result sets
3) Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches

Column Considerations :
Consider columns that have one or more of the following attributes:
1) Cover the query. For more information, see Index with Included Columns
2) Lots of distinct values, such as a combination of last name and first name, if a clustered index is used for other columns
3) Used frequently to sort the data retrieved from a table

Database Considerations:
1) Databases or tables with low update requirements, but large volumes of data can benefit from many nonclustered indexes to improve query performance.
2) Online Transaction Processing applications and databases that contain heavily updated tables should avoid over-indexing. Additionally, indexes should be narrow, that is, with as few columns as possible.

Crabbed answered 17/5, 2016 at 17:0 Comment(0)
E
0

Try running

DBCC DROPCLEANBUFFERS

Before the queries...

If you really want to compare them. Physical reads don't mean the same as logical reads when optimizing a query

Elbert answered 28/11, 2014 at 18:9 Comment(3)
Given that his table seems to be only 4MB in size I think we can assume that he did this. If not he would not have received a single physical read.Tabitha
Good point usr, but I am not sure if the cache data for a table is maintained after an alter table command....Elbert
Does the picture really represents a Clustered Index (with pointer to the rest of rows)? Or just the Non-Clustered Index is like that? If the last one is correct, I can understand why NCI worked better than CITreasure

© 2022 - 2024 — McMap. All rights reserved.