What is an indexm and can a non-clustered index be non-unique?
Asked Answered
M

2

12

Subquestion to my other question about what the UNIQUE argument on INDEX creation is for:

All definitions of (MS SQL Server) indexes (that I could find) are ambiguous and all explanations, based on it, narrate something using undefined or ambiguously defined terms.

What is the definition of index?

For example, the most common definition of index from Wikipedia is:

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Indexes can be created using one or more columns of a database table...

†SQL server creates a clustered index on a primary key by default. The data is present in random order, but the logical ordering is specified by the index. The data rows may be randomly spread throughout the table. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the page and the row number in the data page.

It still feels ambiguous to me. One can understand an index as:

  1. An ordered data structure, a tree, containing intermediate and leaf nodes;
  2. Leaf node data containing values from indexed columns + "pointer to the page and the row number in the data page"

Can non-clustered index be non-unique, considering 2)? or, even, 1) ?
It doesn't seem so to me ...

But does TSQL imply the existence of a non-unique non-clustered index?

If yes, then What is understood by non-clustered index in the MS Create Index docs, and to what the argument UNIQUE is applied there?

Is it:

  1. Leaf node data containing values from indexed columns but without pointer + row number

If it is 3), then again question 1) arises - why apply constraints to copy of real data in an "index", instead of real data in-situ?


Is a bookmark (pointer+row number) to a real data row unique (does it uniquely identify rows)?
Doesn't this bookmark constitute part of the index and thereby make the index unique?
Can you give me the definition of the index instead of explaining how to use it UNDEFINED? The latter part I already know (or can read myself).


This paragraph no longer exists in the current revision of the Wikipedia page, but did at time of posting.

Mccain answered 27/9, 2010 at 3:27 Comment(2)
It's unclear what you want to know... Are you interested in the general definition of an INDEX, or in using them?Cumbersome
Yes, just definition. The question about non-uniqueness of index is rhetoric. Its application and implementations are knownPanegyric
T
25

An index is a data structure designed to optimize querying large data sets. As such, no claim is made about whether or not anything is unique at this point.

You can definitely have non-unique non-clustered indices - how else could you index on lastname, firstname ?? That's never going to be unique (e.g. on Facebook.....)

You can define an index as being unique - this just adds the extra check to it that no duplicate values are allowed. If you would make your index on (lastname, firstname) UNIQUE, then the second Brad Pitt to sign up on your site couldn't do so, since that unique index would reject his data.

One exception is the primary key on any given table. The primary key is the logical identifier used to uniquely and precisely identify each single row in your database. As such, it must be unique over all rows and cannot contain any NULL values.

The clustered index in SQL Server is special in that they do contain the actual data in their leaf nodes. There's no restriction up to this point - however: the clustered index is also being used to uniquely locate (physically locate) the data in your database, and thus, the clustered index must be unique - it must be able to tell Brad Pitt #1 and Brad Pitt #2 apart. If you don't take care and provide a unique set of columns to your clustered index, SQL Server will add a "uniquefier" (a 4-byte INT) to those rows that aren't unique, e.g. you'd get BradPitt001 and BradPitt002 (or something like that).

The clustered index is used as the "pointer" to the actual data row in your SQL Server table, so it's included in every single non-clustered index, too. So your non-clustered, non-unique index on (lastname, firstname) would not only contain these two fields, but in reality, it also contains the clustered key on that table - that's why it's important the clustered key on a SQL Server table is small, stable, and unique - typically an INT.

So your non-clustered index on (lastname, firstname) will really have (lastname, firstname, personID) and will have entries like (Pitt, Brad, 10176), (Pitt, Brad, 17665) and so forth. When you search for "Brad Pitt" in your non-clustered index, SQL Server will now find these two entries, and for both, it has the "physical pointer" to where to find the rest of the data for those two guys, so if you ask for more than just the first- and last name, SQL Server could now go grab the whole row for each of the two Brad Pitt entries and provide you with the data the query requires.

Twig answered 27/9, 2010 at 5:14 Comment(5)
"You can definitely have non-unique non-clustered indices - how else could you index on lastname, firstname ?? That's never going to be unique" -- by appending the table's key to the end of the column list e.g. CREATE UNIQUE CLUSTERED INDEX uq__Persons ON Persons (lastname, firstname, personID); ...but didn't you go on to say that yourself? I'm confused.Evalynevan
@onedaywhen: from your perspective, that index is non-unique - you can definitely have multiple entries with the same first and last name. Internally, for SQL Server, it will add the clustering key to the index entry to allow for data lookup. So YOU see a non-unique index - while SQL Server will add some unique data to your index entries (but you're not concerned with that, you don't see that, you don't notice that.....)Twig
thanks but ignore: despite quoting you, I didn't properly read what you originally said; now that I have, it makes perfect sense. Apologies.Evalynevan
if I were Brad Pitt, I'd create facebook with unique first/last name constraint so there wouldn't be another Brad Pitt.Ruction
I think Facebook has moved a little past creating indices on lastname :)Handwriting
C
0

The definition of an index is the first part of Wikipedia definition "A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space."

Then you have unique indexes, as a special kind of index, which ensure that indexed values are unique.

How it's implemented... depends on the DBMS. But it does not change the definition of index, or unique index.

As an implementation detail, MS SQL allows non-clustered (the usual kind, which is a tree with pointers to the actual row contents in a separate space, which you numbered 2.), and clustered (where rows are stored in the index, according the indexed value, which you numbered 1.) indexes.

So an non-unique non-clustered index is just (conceptualy) a tree of values with, for each value, a set of pointers to table rows containing this value.

Cumbersome answered 27/9, 2010 at 4:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.