Primary Key / Clustered key for Junction Tables

Asked 9/3, 2010 at 16:32 Answered 9/3, 2010 at 21:53

Solved sql clustered-index junction-table

Let's say we have a Product table, and Order table and a (junction table) ProductOrder.

ProductOrder will have an ProductID and an OrderID.
In most of our systems these tables also have an autonumber column called ID.

What is the best practice for placing the primary key (and therefor clustered key)?

Should I keep the primary key of the ID field and create a non-clustered index for the foreign key pair (ProductID and OrderID)
Or should I put the primary key of the foreign key pair (ProductID and OrderID) and put a non-clustered index on the ID column (if even necessary)
Or ... (smart remark by one of you :))

Fructify answered 9/3, 2010 at 16:32 Comment(0)

I know these words might make you cringe, but "it depends."

It is most likely that you want the order to be based on the ProductID and/or OrderId and not the autonumber (surrogate) column since the autonumber has no natural meaning in your database. You probably want to order the join table by the same field as the parent table.

First understand why and how you are using the surrogate key ID in the first place; that will often dictate how you index it. I assume you are using the surrogate key because you are using some framework that works well with single column keys. If there is no specific design reason, then for a join table, I'd simplify the problem and just remove the autonumber ID, if it brings no other benefit. The primary key becomes the (ProductID, OrderID). If not, you need to at least make sure your index on the (ProductID, OrderID) tuple is unique to preserve data integrity.
Clustered indexes are good for sequential scans/joins when the query needs the results in the same order that the index is ordered. So, look at your access patterns, figure out by which key(s) you will be doing sequential, multi-row selects / scans, and by which key you'll be doing random, individual row access, and create the clustered index on the key you'll scan most, and the non-clustered key index on the key you'll use for random access. You have to choose one or the other, since you cannot cluster both.

NOTE: If you have conflicting requirements, there is a technique ("trick") that may help. If all of the columns in a query are found in an index, then that index is a candidate table for the database engine to use to satisfy the requirements of the query. You can use this fact to store data in more than one order even if they are in conflict of one another. Just be aware of the pros and cons of adding more fields to an index, and make a conscious decision after understanding nature and frequency of queries that will be processed.

Johnny answered 9/3, 2010 at 16:42 Comment(0)

The correct and only answer is:

Primary key is ('orderid' , 'productid')
Another index on ('productid' , 'orderid')
Either can be clustered, but PK is by default

Because:

You don't need an index on orderid or productid alone: the optimiser will use one of the indexes
You'll most likely use the table "both" ways
You don't need a surrogate key because you already have them on the linked tables. So a 3rd columns wastes space.

Arpeggio answered 9/3, 2010 at 21:53 Comment(2)

I did read that it's considered bad practice to add columns from the clustered index to the non-clustered indexes: The clustered index is always added to the non-clustered index. Therefor the index 'productid' + 'orderid' wouldn't make sense? – Fructify 10/3, 2010 at 8:42

@Zyphrax: the column order makes it a completely different index – Arpeggio 10/3, 2010 at 11:20

This appears to be for a dynamic system where many orders will be added. The clustered index should therefore be on your autonumbered column.

You can make index the primary key and put another unique index on the pair of columns. Or, you can make the pair of columns the primary (but non-clustered) key.

The choice of using the primary key or a unique index key is up to you. But I would make sure that the one that is clustered is for your autonumber column.

Unjust answered 9/3, 2010 at 16:38 Comment(2)

I disagree. The autonumber column has no meaning, except "this is the order the records were created". The cluster will order the rows by the cluster key. Autonumber "in order of creation" is often not the order you query the table. Most likely, this table will be used in a join against products and orders using the ProductID and OrderID. I'd cluster for ordering so the cluster key is in the same order as the driving table of the most common query, otherwise you are wasting the cluster index. – Johnny 9/3, 2010 at 16:49

I guess the question comes down to how busy is this table? If there are many inserts, then you could get to a state where the table has to be restructured on every insert. This would slow the system down, especially with a big table. With the clustered index on the autonumber, this does not happen. The other index could have padding set to give extra room to avoid restructuring. – Unjust 9/3, 2010 at 17:4

My preference has always been to create an autonumber for Primary Keys. Then I create a unique index on the two Foreign keys so that they are not duplicated.

The reason I do this is because the more I normalize my data, the more keys I have to use in joins. I have ended up with designs going six to seven levels deep, and if I use keys flowing from one level to another, I could potentially end up with a n^2 keys in the join.

Try convincing my SQL Developers to use all of that for a single query, and they will really like me.

I keep it simple.

Wimsatt answered 9/3, 2010 at 16:58 Comment(3)

Why add a 3rd column when the linked tables already use autonumber columns? Pointless – Arpeggio 9/3, 2010 at 21:57

Don't let developers dictate the design just because they are too lazy to write joins or views. Linking tables have no business having an autonumber column, as already noted, because they are an implementation detail that shouldn't be obvious. You shouldn't be querying or exposing the linking table by itself, it will always be part of a join. Consider writing a view, let the developers use the view, and the linking table will never be evident. – Johnny 10/3, 2010 at 15:53

@Arpeggio and @mrjoltcola: Most Join tables have additional attributes. An OrderItem table may to have the ItemPrice so changes to the lookup Items table do not change up order price from Quoting to Shipment. So I make an OrderItemId for this table. When this is joined down to the ShippingLocationItems table, I have a 2 key join instead of a 1 key join. When I track back with the DeliverConfirmatioForItems, I have again, a 1 column foreign key instead of 8 or more cols for the shipping loc details, the delivery and signature mechanisms. Do this for all join tables and keep things uniform. – Wimsatt 11/3, 2010 at 14:4

Recommended topics

Hot tags