What's the best practice for primary keys in tables? [closed]
Asked Answered
M

21

287

When designing tables, I've developed a habit of having one column that is unique and that I make the primary key. This is achieved in three ways depending on requirements:

  1. Identity integer column that auto increments.
  2. Unique identifier (GUID)
  3. A short character(x) or integer (or other relatively small numeric type) column that can serve as a row identifier column

Number 3 would be used for fairly small lookup, mostly read tables that might have a unique static length string code, or a numeric value such as a year or other number.

For the most part, all other tables will either have an auto-incrementing integer or unique identifier primary key.

The Question :-)

I have recently started working with databases that have no consistent row identifier and primary keys are currently clustered across various columns. Some examples:

  • datetime/character
  • datetime/integer
  • datetime/varchar
  • char/nvarchar/nvarchar

Is there a valid case for this? I would have always defined an identity or unique identifier column for these cases.

In addition there are many tables without primary keys at all. What are the valid reasons, if any, for this?

I'm trying to understand why tables were designed as they were, and it appears to be a big mess to me, but maybe there were good reasons for it.

A third question to sort of help me decipher the answers: In cases where multiple columns are used to comprise the compound primary key, is there a specific advantage to this method vs. a surrogate/artificial key? I'm thinking mostly in regards to performance, maintenance, administration, etc.?

Millstream answered 3/12, 2008 at 15:30 Comment(1)
I found Database Skills: A Sane Approach To Choosing Primary Keys to be a good read and I follow most of the points outlined.Worden
R
301

I follow a few rules:

  1. Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character formats. This is because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, the less pages in the cache you will use.
  2. Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause of ripple effect of changes.
  3. Do NOT use "your problem primary key" as your logic model primary key. For example passport number, social security number, or employee contract number as these "natural keys" can change in real world situations. Make sure to add UNIQUE constraints for these where necessary to enforce consistency.

On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.

Rowlandson answered 3/12, 2008 at 19:25 Comment(9)
I like it! Do you have any documentation for the basis of your "rules"? Thanks!Millstream
No, just experience. When dealing with "small" databases this stuff doesn't matter so much. But when you deal with large db's all of the little things matter. Just imagine if you have 1 billion rows with int or long pk's compared to using text or guid's. There's a huge difference!Rowlandson
Just remember to put that unique index on the natural key (if one actually exists which is often not the case) when you use an artifical key.Candler
@Lloyd Cotten: Here's what a big data engine provider says in support of rule number 1: skyfoundry.com/forum/topic/24. It convinced me to go back to IntsGenevieve
even if you "know" that "the natural key is small and will never change", think twice. "we never re-use those codes" are famous last words .... About the only things that fall into the categories of small, never changing are iso and other standards (country codes, iata airport codes, ). Things like "what is the 2 letter representation for this internal brand" ... think twice before assuming that "it" will never change, you're one finance decision away from a database rebuild.Paulita
Just to add to point 3... a natural key, also, may be a non public data and, generally, the keys on a client-side select/option or input hidden are just the same of your table... in this case, generate temporary client-side keys might require some processing which could be avoided - along with the data security issue - using surrogate keys.Pressurecook
Passport number can change. Countries can go bankrupt. Date formats can change (calendars have changed over history). Many real-world situations can change but what's wrong with using a natural key for something that you KNOW won't change?Pless
when would you define a primary key as small?Gile
My PK are always unsigned integersAardvark
B
100

Natural verses artifical keys is a kind of religious debate among the database community - see this article and others it links to. I'm neither in favour of always having artifical keys, nor of never having them. I would decide on a case-by-case basis, for example:

  • US States: I'd go for state_code ('TX' for Texas etc.), rather than state_id=1 for Texas
  • Employees: I'd usually create an artifical employee_id, because it's hard to find anything else that works. SSN or equivalent may work, but there could be issues like a new joiner who hasn't supplied his/her SSN yet.
  • Employee Salary History: (employee_id, start_date). I would not create an artifical employee_salary_history_id. What point would it serve (other than "foolish consistency")

Wherever artificial keys are used, you should always also declare unique constraints on the natural keys. For example, use state_id if you must, but then you'd better declare a unique constraint on state_code, otherwise you are sure to eventually end up with:

state_id    state_code   state_name
137         TX           Texas
...         ...          ...
249         TX           Texas
Buckden answered 3/12, 2008 at 16:19 Comment(8)
In some cases with SQL server 2005/2008 the natural (text) key can be faster than an int key. I have an app with a 7-8 character friendly code that we use as the primary key and that was faster (and often more convenient) than an int surrogate. We needed the code anyway so that we could have a human readable/memorable code that we could safely transfer without conflict to a different application instance (multiple sites that aggregate into a bigger site).Vibrate
+1 Good answer. However, I would get the personnel officer to be the trusted source of an employee identifier i.e. the officer responsible for verifying employees in real life who are likely to use identifiers such as SSN, take up references, etc. The personnel department must the be trusted source of employee identifiers, not the DBMS!Ngo
@onedaywhen- I wouldnt. trust the personnel officer. People leave, new ones come and have different ideas. Provide them with access to the identifier they think is unique/ they want to use, but internally for the db, dba should be making their own decisionSemple
Note that the SSN is not necessarily unique in every country. At least in Austria, multiple people might share the same numberInstant
Also in some countries (I think even in the US) they actually recommend not to share the SSN.Blackamoor
Yeah, for personal info, it's best to have an artificial ID that cannot be linked back to the natural ID. E.g., SSN should be internal-only, never displayed on communications, etc. (even encrypted in the db!). I remember when my US college used SSN as the account number... Then had to change it due to PII concerns.Chelicera
In the USA, under certain circumstances, your SSN can CHANGE!!! They are also RE-USEDPhilippe
@Philippe wow, as a EU citizen i did not know that. Seems SSN is the perfect example why you should never use natural keys .. because as a programmer, i would not have known that 1. they are reused in USA and 2. multiple people can share the same SSN in Austria. Changing a primary key is inconvenient, but possible. Violating integral consistency is a no-go.Duenna
D
33

I avoid using natural keys for one simple reason -- human error. Although natural unique identifiers are often available (SSN, VIN, Account Number, etc.), they require a human to enter them correctly. If you're using SSNs as a primary key, someone transposes a couple of numbers during data entry, and the error isn't discovered immediately, then you're faced with changing your primary key.

My primary keys are all handled by the database program in the background and the user is never aware of them.

Drice answered 12/7, 2011 at 20:55 Comment(2)
I've worked with a few databases that used SSNs or Tax IDs as primary keys. Inefficient when it comes to storage and foreign key references. Not to mention that a person's SSN can change. So I completely agree with you.Patricapatrice
Also, human error on the programmers site is also not unlikely. Who would have assumed that in some countries SSN can be shared by multiple individuals, that SSN can be reused in some countries, and that SSN can change? Also, can you make sure SSN can always be provided at the time when the new record is created? Maybe you want users to let them enter details later. And lets not even start on all the other "assumed" unique natural identifiers that I've learned the hard way that they are actually NOT unique... (or might in some cases .. just not exist)Duenna
A
26

Just an extra comment on something that is often overlooked. Sometimes not using a single surrogate key as primary has benefits in the child tables. Let's say we have a design that allows you to run multiple companies within the one database (maybe it's a hosted solution, or whatever).

Let's say we have these tables and columns:

Company:
  CompanyId   (primary key)

CostCenter:
  CompanyId   (primary key, foreign key to Company)
  CostCentre  (primary key)

CostElement
  CompanyId   (primary key, foreign key to Company)
  CostElement (primary key)

Invoice:
  InvoiceId    (primary key)
  CompanyId    (primary key, in foreign key to CostCentre, in foreign key to CostElement)
  CostCentre   (in foreign key to CostCentre)
  CostElement  (in foreign key to CostElement)

In case that last bit doesn't make sense, Invoice.CompanyId is part of two foreign keys, one to the CostCentre table and one to the CostElement table. The primary key is (InvoiceId, CompanyId).

In this model, it's not possible to screw-up and reference a CostElement from one company and a CostCentre from another company. If a single surrogate key was used as primary on the CostElement and CostCentre tables, and without the foreign key relations in the Invoice table, it would be.

The fewer chances to screw up, the better.

Agamemnon answered 5/12, 2008 at 10:38 Comment(4)
This is an under-cited disadvantage when using surrogate keys. If the table has a surrogate key I can still use it for these kinds of constraints. Unfortunately though the constraint requires an index and it's just weird to create a unique index on (surrogate_key, other_column) when (surrogate_key) is unique by itself. Also, (other_column) is often totally redundant in a map table since (surrogate_key) is unique in the foreign one. Surrogates can really muck things up.Unstopped
These are still surrogate keys and not natural keys. (as per definition by Hall, Owlett and Todd) You can still do this design with auto increments / sequence columns for CostCentre and CostElement. I edited your answer accordingly and gave an upvote, because this is some good advice. But its not an argument for the "natural vs surrogate key" discussion.Duenna
My argument is that a combination of multiple surrogates is still a surrogate and not a "natural" key. A natural key is coming from the real world (no matter how many attributes together you need to build it), which a combination of multiple surrogates is clearly not.Duenna
Also, this is a violation of several normal forms and a proper normalized design should not allow this. In table Invoice, the column CompanyId seems to be redundant and should be inherently referenced by CostCentre. CostElements could have a primary single surrogate + FK to company.Duenna
D
13

There´s no problem in making your primary key from various fields, that's a Natural Key.

You can use a Identity column (associated with a unique index on the candidate fields) to make a Surrogate Key.

That´s an old discussion. I prefer surrogate keys in most situations.

But there´s no excuse for the lack of a key.

RE: EDIT

Yeah, there´s a lot of controversy about that :D

I don´t see any obvious advantage on natural keys, besides the fact that they are the natural choice. You will always think in Name, SocialNumber - or something like that - instead of idPerson.

Surrogate keys are the answer to some of the problems that natural keys have (propagating changes for example).

As you get used to surrogates, it seems more clean, and manageable.

But in the end, you´ll find out that it's just a matter of taste - or mindset -. People "think better" with natural keys, and others don´t.

Darkroom answered 3/12, 2008 at 16:4 Comment(1)
People "think better" with natural keys. Machines and databases, don't.Alexandro
A
13

Besides all those good answers, I just want to share a good article I just read, The great primary-key debate.

Just to quote a few points:

The developer must apply a few rules when choosing a primary key for each table:

  • The primary key must uniquely identify each record.
  • A record’s primary-key value can’t be null.
  • The primary key-value must exist when the record is created.
  • The primary key must remain stable—you can’t change the primary-key field(s).
  • The primary key must be compact and contain the fewest possible attributes.
  • The primary-key value can’t be changed.

Natural keys (tend to) break the rules. Surrogate keys comply with the rules. (You better read through that article, it is worth your time!)

Ayeshaayin answered 3/6, 2013 at 1:46 Comment(1)
You said - The primary key-value must exist when the record is created. I would say that for any new entity you are creating, the PK will absolutely not exist until it is created during the INSERT. This assumes you are using some form of programmatically generated primary key, which I would recommend. If you are sharp, you can use this fact to perform UPSERTs rather than INSERTs or UPDATEs.Hartal
D
12

Tables should have a primary key all the time. When it doesn't it should have been an AutoIncrement fields.

Sometime people omit primary key because they transfer a lot of data and it might slow down (depend of the database) the process. BUT, it should be added after it.

Some one comment about link table, this is right, it's an exception BUT fields should be FK to keep the integrity, and is some case those fields can be primary keys too if duplicate in links is not authorized... but to keep in a simple form because exception is something often in programming, primary key should be present to keep the integrity of your data.

Deplorable answered 3/12, 2008 at 15:33 Comment(5)
I agree. And in the case where a lot of data is to be inserted, remove the primary key constraint (or use INSERT IDENTITY ON in TSQL) and put it back afterwards :)Supraliminal
There are exceptions: link tables obviouslyWhale
Another reason: If there is no PK/unique key, table browsers (I mean, something like Access / SQL Server Management Studio) will refused to update/delete a single row with duplicated row. You'll have to write SQL for that.Cointreau
It's quite common to omit a PK from a data warehouse fact table. In Oracle you can reference the ROWID pseudocolumn as a unique identifier in the short term (ie. don't store it somewhere and expect it to not change)Beech
Internally, some DB engines like MySQL InnoDB / MariaDB or even Oracle (see Davids comment above) will create an invisible auto increment PK in the background, because the underlying data structure (clustered index) needs a PK. So they just generate one for you in the background, but sometimes inaccessable. (i.e. in MySQL/Maria .. it's there, but you can not reference ROWID like in Oracle)Duenna
H
9

Here are my own rule of thumbs I have settled on after 25+ years of development experience.

  • All tables should have a single column primary key that auto increments.
  • Include it in any view that is meant to be updateable
  • The primary key should not have any meaning in the context of your application. This means that it should not be a SKU, or an account number or an employee id or any other information that is meaningful to your application. It is merely a unique key associated with an entity.

The primary key is used by the database for optimization purposes and should not be used by your application for anything more than identifying a particular entity or relating to a particular entity.

Always having a single value primary key makes performing UPSERTs very straightforward.

  • Favor multiple indices on single columns over multi-column indices.
    For example, if you have a two column key, favor creating an index on each column over creating a two column index. If we create a multi-column key on firstname + lastname, we can't do indexed lookups on lastname without providing a firstname as well. Having indices on both columns allows the optimizer to perform indexed lookups on either or both columns regardless of how they are expressed in your WHERE clause.

  • If your tables are massive, explore partitioning the table into segments based on the most prominent search criteria.

  • If you have a table that has a significant number of Id fields in it, consider removing all except the primary key to a single table which has an id (PK), an org_id (FK to original table) and an id_type column. Create indices for all columns on the new table and relate it to the original table. In this manner, you can now perform indexed lookups of any number of ids using only a single index.

Hartal answered 5/3, 2018 at 19:19 Comment(0)
M
6

A natural key, if available, is usually best. So, if datetime/char uniquely identifies the row and both parts are meaningful to the row, that's great.

If just the datetime is meaningful, and the char is just tacked on to make it unique, then you might as well just go with an identify field.

Mescal answered 3/12, 2008 at 15:34 Comment(11)
Usually best? I don't have any scientific basis but I'm almost positive most people prefer a surrogate key over natural. In many cases there is no natural key.Cali
There should ALWAYS be a natural key for any row in your database. That "natural" key may be something generated in the business world or by your technical system, but it should always exist.Papillose
So you're calling a guid PK column a natural key?Cali
If, in your world, that's what has been determined to be the only way to identify a row in the table, then yes. Of course, when a designer chooses to create a GUID for a PK it's usually because they haven't done the work to find the REAL natural key, so in that case the GUID is NOT the natural key.Papillose
1. Asking the database to use string (etc) comparisons when doing indexing or joins doesn't feel right - ints are much more suitable for that.Abagael
2. If you take your key from the natural world, the natural world will change to break your key. If you use the telephone number, you'll get two users from the same household. If you use the last name, they get married. If you use SSN, privacy laws will change and require you remove them.Abagael
@Barry: RE: #2. if the natural world changes and that causes your natural key to change that means that you did a poor job selecting a natural key. By definition, a natural key does not change over time.Papillose
I should amend that... a natural key should be VERY VERY unlikely to change over time. Using a last name as part of a natural key is a bad decision on the designer's part and not a strike against natural keys.Papillose
@TomH. Their difficulty to select correctly does sound like a strike against them. People on this page have suggested both SSN and Last name as possible natural keys.Sofer
@TomH Are you serious? "There should ALWAYS be a natural key for any row in your database.", when I have invoice items for an invoice?Hourglass
Every invoicing system that I've seen typically has invoice items assigned to a line on the invoice, so your natural key would be the invoice # and the line #. That's a pretty standard one.Papillose
P
6

What is special about the primary key?

What is the purpose of a table in a schema? What is the purpose of a key of a table? What is special about the primary key? The discussions around primary keys seem to miss the point that the primary key is part of a table, and that table is part of a schema. What is best for the table and table relationships should drive the key that is used.

Tables (and table relationships) contain facts about information you wish to record. These facts should be self-contained, meaningful, easily understood, and non-contradictory. From a design perspective, other tables added or removed from a schema should not impact on the table in question. There must be a purpose for storing the data related only to the information itself. Understanding what is stored in a table should not require undergoing a scientific research project. No fact stored for the same purpose should be stored more than once. Keys are a whole or part of the information being recorded which is unique, and the primary key is the specially designated key that is to be the primary access point to the table (i.e. it should be chosen for data consistency and usage, not just insert performance).

  • ASIDE: The unfortunately side effect of most databases being designed and developed by application programmers (which I am sometimes) is that what is best for the application or application framework often drives the primary key choice for tables. This leads to integer and GUID keys (as these are simple to use for application frameworks) and monolithic table designs (as these reduce the number of application framework objects needed to represent the data in memory). These application driven database design decisions lead to significant data consistency problems when used at scale. Application frameworks designed in this manner naturally lead to table at a time designs. “Partial records” are created in tables and data filled in over time. Multi-table interaction is avoided or when used causes inconsistent data when the application functions improperly. These designs lead to data that is meaningless (or difficult to understand), data spread over tables (you have to look at other tables to make sense of the current table), and duplicated data.

It was said that primary keys should be as small as necessary. I would says that keys should be only as large as necessary. Randomly adding meaningless fields to a table should be avoided. It is even worse to make a key out of a randomly added meaningless field, especially when it destroys the join dependency from another table to the non-primary key. This is only reasonable if there are no good candidate keys in the table, but this occurrence is surely a sign of a poor schema design if used for all tables.

It was also said that primary keys should never change as updating a primary key should always be out of the question. But update is the same as delete followed by insert. By this logic, you should never delete a record from a table with one key and then add another record with a second key. Adding the surrogate primary key does not remove the fact that the other key in the table exists. Updating a non-primary key of a table can destroy the meaning of the data if other tables have a dependency on that meaning through a surrogate key (e.g. a status table with a surrogate key having the status description changed from ‘Processed’ to ‘Cancelled’ would definitely corrupt the data). What should always be out of the question is destroying data meaning.

Having said this, I am grateful for the many poorly designed databases that exist in businesses today (meaningless-surrogate-keyed-data-corrupted-1NF behemoths), because that means there is an endless amount of work for people that understand proper database design. But on the sad side, it does sometimes make me feel like Sisyphus, but I bet he had one heck of a 401k (before the crash). Stay away from blogs and websites for important database design questions. If you are designing databases, look up CJ Date. You can also reference Celko for SQL Server, but only if you hold your nose first. On the Oracle side, reference Tom Kyte.

Periscope answered 3/1, 2013 at 18:57 Comment(2)
"By this logic, you should never delete a record from a table with one key and then add another record with a second key." -- There is a case for this, and that is effectively what an "ON DELETE RESTRICT" clause on a foreign key will do. In some instances (say where an audit trail is required) a "deleted" boolean field would be better than allowing the record to be deleted.Mere
It's a decent rant, but I'm not sure it answers any questions or provides much in the way of guidance.Hartal
S
4

I suspect Steven A. Lowe's rolled up newspaper therapy is required for the designer of the original data structure.

As an aside, GUIDs as a primary key can be a performance hog. I wouldn't recommend it.

Supraliminal answered 3/12, 2008 at 15:30 Comment(3)
To say its a performance hog is a premature optimization. Guids are required in some cases (disconnected clients, future table merging, replication)Cali
"Premature optimization" is an overused phrase on SO (IMHO)! Yes, GUIDs may be required in SOME cases, but Andrew is right to point out that they shouldn't be used as the default data type whether required or not.Buckden
OK, it wasn't actually a premature optimization. What I meant is that most people don't experience the volume required to notice the performance difference. Yes, use autoincrement if you know you will never need a guid.Cali
B
4

Natural versus artificial keys to me is a matter of how much of the business logic you want in your database. Social Security number (SSN) is a great example.

"Each client in my database will, and must, have an SSN." Bam, done, make it the primary key and be done with it. Just remember when your business rule changes you're burned.

I don't like natural keys myself, due to my experience with changing business rules. But if your sure it won't change, it might prevent a few critical joins.

Bahrain answered 3/12, 2008 at 19:26 Comment(3)
And I have seen data where SSN is not unique even though it should be. Be very wary of natural keys if you import your data from another source!Candler
If you are a subject to identity theft, you can have your social security number changed. There are four more situations where they will change your number and they are listed on the ssa.gov site.Pless
SSN is probably the best anti-example you could have picked as an example for programmers naivety on choosing natural primary keys. Who would have guessed, that in some countries: SSN can be shared by multiple individuals, SSN can be changed, SSN can be reissued/reused, SSN might not even exist (yet) for a (new) individual? Always using surrogates consistently saves you from your own stupidness and a lot of headache in production environments.Duenna
C
3

I too always use a numeric ID column. In oracle I use number(18,0) for no real reason above number(12,0) (or whatever is an int rather than a long), maybe I just don't want to ever worry about getting a few billion rows in the db!

I also include a created and modified column (type timestamp) for basic tracking, where it seems useful.

I don't mind setting up unique constraints on other combinations of columns, but I really like my id, created, modified baseline requirements.

Crelin answered 3/12, 2008 at 15:35 Comment(1)
I must also point out that I don't put IDs on link/join tables, only on the tables containing data.Crelin
M
3

I look for natural primary keys and use them where I can.

If no natural keys can be found, I prefer a GUID to a INT++ because SQL Server use trees, and it is bad to always add keys to the end in trees.

On tables that are many-to-many couplings I use a compound primary key of the foreign keys.

Because I'm lucky enough to use SQL Server I can study execution plans and statistics with the profiler and the query analyzer and find out how my keys are performing very easily.

Mogul answered 3/12, 2008 at 19:33 Comment(4)
Do you have any documentation to back up this statement: 'if no natural keys can be found, I prefer a GUID to a INT++ because SQL Server use trees, and it is bad to always add keys to the end in trees.' Not skeptical, just trying to compile some documentation.Millstream
@Lloyd - Glad you're taking an interest into something I find very fascinating myself. A good starting point at msdn.microsoft.com/en-us/library/ms177443(SQL.90).aspxMogul
@Mogul the linked article only provides a simple explanation how B-Trees work, but does not support your argument. I think its actually the other way round: inserting IDs in the middle leads to more reorganization of data on disk in a B-Tree. Appending has better write performance.Duenna
Here is support for my opposite argument. Percona refers to MySQL, but B-Tree is B-Tree and hence should apply to SQL Server too: percona.com/blog/2019/11/22/…Duenna
S
2

You should use a 'composite' or 'compound' primary key that comprises of multiple fields.

This is a perfectly acceptable solution, go here for more info :)

Safekeeping answered 3/12, 2008 at 15:35 Comment(0)
H
2

I always use an autonumber or identity field.

I worked for a client who had used SSN as a primary key and then because of HIPAA regulations was forced to change to a "MemberID" and it caused a ton of problems when updating the foreign keys in related tables. Sticking to a consistent standard of an identity column has helped me avoid a similar problem in all of my projects.

Harpist answered 3/12, 2008 at 15:53 Comment(2)
Poor selection of a natural key by a developer doesn't mean that natural keys are bad.Papillose
A tool being hard to use is somehow not a point against that tool?Sofer
I
2

GUIDs can be used as a primary key, but you need to create the right type of GUID so that it performs well.

You need to generate COMB GUIDs. A good article about it and performance statistics is The Cost of GUIDs as Primary Keys.

Also some code on building COMB GUIDs in SQL is in Uniqueidentifier vs identity(archive).

Infrastructure answered 3/12, 2008 at 19:49 Comment(2)
IMHO, guid's should only be used when you need to synchronize data across databases. In which an automatically generated id is problematic. The difference between using a guid and using a basic numeric type is that a guid will require 16 bytes per row, while a numeric will be much smaller.Rowlandson
If you go to the link I provided above there is very little difference in performance using COMB Guids.Infrastructure
D
1

All tables should have a primary key. Otherwise, what you have is a HEAP - this, in some situations, might be what you want (heavy insert load when the data is then replicated via a service broker to another database or table for instance).

For lookup tables with a low volume of rows, you can use a 3 CHAR code as the primary key as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT unless you have a reference table that perhaps has a composite primary key made up from foreign keys from associated tables.

Despoliation answered 3/12, 2008 at 16:29 Comment(0)
P
1

If you really want to read through all of the back and forth on this age-old debate, do a search for "natural key" on Stack Overflow. You should get back pages of results.

Papillose answered 3/12, 2008 at 16:34 Comment(0)
I
0

We do a lot of joins and composite primary keys have just become a performance hog. A simple int or long takes care of many problems even though you are introducing a second candidate key, but it's a lot easier and more understandable to join on one field versus three.

Invincible answered 3/12, 2008 at 15:41 Comment(2)
This strategy falls apart when you now have to traverse 6 tables to join the actual two tables that you need because composite keys were not propagated. It also ends up requiring the use of loops/cursors for multiple inserts which can be a HUGE performance hog.Papillose
I'm not to big to learn something new. I'd love to see an example of what you're saying, it would be helpful to inject a little rational fact into some of these religious arguments.Invincible
L
0

I'll be up-front about my preference for natural keys - use them where possible, as they'll make your life of database administration a lot easier. I established a standard in our company that all tables have the following columns:

  • Row ID (GUID)
  • Creator (string; has a default of the current user's name (SUSER_SNAME() in T-SQL))
  • Created (DateTime)
  • Timestamp

Row ID has a unique key on it per table, and in any case is auto-generated per row (and permissions prevent anyone editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM systems need a single ID key, this is the one to use.

Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:

  • People - use surrogate key, e.g. INT. If it's internal, the Active Directory user GUID is an acceptable choice
  • Lookup tables (e.g. StatusCodes) - use a short CHAR code; it's easier to remember than INTs, and in many cases the paper forms and users will also use it for brevity (e.g. Status = "E" for "Expired", "A" for "Approved", "NADIS" for "No Asbestos Detected In Sample")
  • Linking tables - combination of FKs (e.g. EventId, AttendeeId)

So ideally you end up with a natural, human-readable and memorable PK, and an ORM-friendly one-ID-per-table GUID.

Caveat: the databases I maintain tend to the 100,000s of records rather than millions or billions, so if you have experience of larger systems which contraindicates my advice, feel free to ignore me!

Libb answered 30/12, 2008 at 22:34 Comment(2)
Are you suggesting to create both GUID and INT SKs for tables with no strong natural key?Bougainville
You don't have to, but the benefits are: a) it makes replication easier if you need it, b) when dealing with ORM, you can assign a unique ID to your object in code before saving it (which is useful if you have to do lots of editing on your object, maybe saving into a session cache, before saving it). The key is the INT in this instace; the GUID is just a bonus.Libb

© 2022 - 2024 — McMap. All rights reserved.