Should each and every table have a primary key?
Asked Answered
W

17

410

I'm creating a database table and I don't have a logical primary key assigned to it. Should each and every table have a primary key?

Willamina answered 8/5, 2009 at 14:49 Comment(3)
Could you give some more details about the table? The answer is probably "yes" though.Zoniazoning
Yes, each and every table should have primary key.Angelaangele
Yes, unless you have a very good reason not to do it ( bulk loaded tables that you are never going to query)Pulverulent
C
363

Short answer: yes.

Long answer:

  • You need your table to be joinable on something
  • If you want your table to be clustered, you need some kind of a primary key.
  • If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?

In MySQL, the InnoDB storage engine always creates a primary key if you didn't specify it explicitly, thus making an extra column you don't have access to.

Note that a primary key can be composite.

If you have a many-to-many link table, you create the primary key on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.

Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a unique index.

And since any primary key involves creating a unique index, you should declare it and get both logical consistency and performance.

See this article in my blog for why you should always create a unique index on unique data:

P.S. There are some very, very special cases where you don't need a primary key.

Mostly they include log tables which don't have any indexes for performance reasons.

Continually answered 8/5, 2009 at 14:52 Comment(11)
they still have a primary key, the composite oneKassi
@annakata: they should have a composite primary keyContinually
do it while you can if you're using MySQL, because, when you go back later, you will have to shut the database off... and it can take a while to add a key to a MySql DB....Tumulus
"And since any PRIMARY KEY involves creating a UNIQUE index" is not true for Oracle. One can use a non-unique index to enforce a primary key. In fact, it is sometimes REQUIRED that unique and PK constraints use non-unique indexes.Dowie
Just a comment on the rhetorical question "Why keep identical records?". Note that just adding a PK will not ensure that there is no duplication. Often the PK is not visible to the user, hence what is important is in the visible fields, which could contain duplicate data. Depending on your design this may be desirable or not.Naumachia
Keys have nothing to to with joinability. And the clustering argument is dependent on which DBMS you use, and mixes logical and physical considerations.Circuit
@JonHeggland is right, the lack of keys does not prevent you from joining tables (of course, it is usually desirable to have an index of some sort that speeds up the join).Cephalonia
As stated, a log table is a perfect example where a PK is not necessary because it doesn't matter if there are duplicate entries. Some will suggest adding an Auto Number field for the PK but I fail to see an immediate benefit unless you plan to use that field elsewhere.Superposition
@MikeLowery: autoincrement number and PK are different things. You can have former without the latter and and vice versa.Continually
@Continually I'm well aware of that. I wrote "adding an Auto Number field for the PK" as in the case where no other natural key exists.Superposition
- "You need your table to be joinable on something" - tables can be joined without keys, - "Why keep identical records" - 1) you don't need a primary key for uniqueness, a unique constraint is enough and the question is about primary keys specifically.Atonement
O
42

Disagree with the suggested answer. The short answer is: NO.

The purpose of the primary key is to uniquely identify a row on the table in order to form a relationship with another table. Traditionally, an auto-incremented integer value is used for this purpose, but there are variations to this.

There are cases though, for example logging time-series data, where the existence of a such key is simply not needed and just takes up memory. Making a row unique is simply ...not required!

A small example: Table A: LogData

Columns:  DateAndTime, UserId, AttribA, AttribB, AttribC etc...

No Primary Key needed.

Table B: User

Columns: Id, FirstName, LastName etc. 

Primary Key (Id) needed in order to be used as a "foreign key" to LogData table.

Onyx answered 29/4, 2019 at 9:52 Comment(2)
does the DB internally create an auto-increment counter for example 1 (LogData) if no primary key is specified?Casino
The short answer is absolutely "yes". The long answer is "no". It's like someone asking "Do I have to have a driver's license to drive?" Technically, the answer is no, but if they are asking that question then all of the rest of us DEFINITELY want them to have one. Same here, if OP is asking "should I use PKs?" then absolutely the answer is "yes" because it is unlikely they will understand the nuances of when they shouldn't. Much better to err on the side of having them.Prohibitory
J
38

Always best to have a primary key. This way it meets first normal form and allows you to continue along the database normalization path.

As stated by others, there are some reasons not to have a primary key, but most will not be harmed if there is a primary key

Jabber answered 8/5, 2009 at 15:34 Comment(1)
@PaulSuart Data need not always be in their normal forms. In fact, when data gets huge, it shouldn't be kept in its normal form otherwise accessing data would be horrendously slow for queries doing table joins etc. Normal forms is an "idealization" and practically possible only when data is not expected to grow huge.Karachi
D
17

Except for a few very rare cases (possibly a many-to-many relationship table, or a table you temporarily use for bulk-loading huge amounts of data), I would go with the saying:

If it doesn't have a primary key, it's not a table!

Deary answered 8/5, 2009 at 15:11 Comment(4)
Strictly speaking that sentence is wrong. Tables can be "View Tables" created by your Query Language. A RDBMS is comprised of relations not tables. That sentence should say: "If it doesn't have a primary key, it's not a relation!".Bizet
Or perhaps, "if there are no candidate keys then it isn't a relational table". But see the very rare cases where it's ok to have a table that doesn't represent a relation.Superannuated
Why wouldn't a many-to-many table have a primary key? You could create a separate primary key and then create a unique index for the surrogate of foreign keys. I think it's better to have a primary key on every table. Even on a bulk loading table you might want to separately identify a primary key that is not inclusive of the data being imported as it may help you identify duplicate records in an ETL process. It seems to me that every table should still have a primary key, even if it is a little more storage. A table created by a view is a subset of a table not a table itself.Faze
In a many to many relationship table, you could create a composite primary key consisting of both ids to the relationships.Provide
P
15

Pretty much any time I've created a table without a primary key, thinking I wouldn't need one, I've ended up going back and adding one. I now create even my join tables with an auto-generated identity field that I use as the primary key.

Parterre answered 8/5, 2009 at 14:54 Comment(2)
A join table IS a primary key - a composite one, consisting of the PK's of both records being joined. E.g. CREATE TABLE PersonOrder (PersonId int, OrderId int, PRIMARY KEY(PersonId, OrderId)).Bonnice
Yes, but what if the Link Table has also a third attribute, lets say "OrderDate". Would you add that to the composite key as well? IMHO no - because it is further reducable and does not serve the not-reducable characterstic a primary key should have.Bizet
R
9

Just add it, you will be sorry later when you didn't (selecting, deleting. linking, etc)

Rabia answered 8/5, 2009 at 14:55 Comment(0)
V
8

Will you ever need to join this table to other tables? Do you need a way to uniquely identify a record? If the answer is yes, you need a primary key. Assume your data is something like a customer table that has the names of the people who are customers. There may be no natural key because you need the addresses, emails, phone numbers, etc. to determine if this Sally Smith is different from that Sally Smith and you will be storing that information in related tables as the person can have mulitple phones, addesses, emails, etc. Suppose Sally Smith marries John Jones and becomes Sally Jones. If you don't have an artifical key onthe table, when you update the name, you just changed 7 Sally Smiths to Sally Jones even though only one of them got married and changed her name. And of course in this case withouth an artificial key how do you know which Sally Smith lives in Chicago and which one lives in LA?

You say you have no natural key, therefore you don't have any combinations of field to make unique either, this makes the artficial key critical.

I have found anytime I don't have a natural key, an artifical key is an absolute must for maintaining data integrity. If you do have a natural key, you can use that as the key field instead. But personally unless the natural key is one field, I still prefer an artifical key and unique index on the natural key. You will regret it later if you don't put one in.

Voorhees answered 8/5, 2009 at 15:5 Comment(0)
B
6

It is a good practice to have a PK on every table, but it's not a MUST. Most probably you will need a unique index, and/or a clustered index (which is PK or not) depending on your need.

Check out the Primary Keys and Clustered Indexes sections on Books Online (for SQL Server)

"PRIMARY KEY constraints identify the column or set of columns that have values that uniquely identify a row in a table. No two rows in a table can have the same primary key value. You cannot enter NULL for any column in a primary key. We recommend using a small, integer column as a primary key. Each table should have a primary key. A column or combination of columns that qualify as a primary key value is referred to as a candidate key."

But then check this out also: http://www.aisintl.com/case/primary_and_foreign_key.html

Borderer answered 9/2, 2012 at 11:24 Comment(8)
Check this also, sql-server-performance.com/2006/primary-key-index-clusteredBorderer
that page is quite stupid. First, a primary key is needed for performance reasons. By reading his page I learn that adding an ID to a book table is useless because the book's text is unique; obviously, the guy never worked with databases.But he also has problems in understanding what he criticizes. Page written that 1) a PK value references a row 2) you can join 2 tables by any set of columns. There is no contraddiction. It is amazing that an academic article author doesn't understand the very basic of relational theory.Loera
"First, a primary key is needed for performance reasons" this is incorrect, PK doesn't direct affect on performance. Not having a PK may lead many problems (identifying a row, joining etc.) but performance is not one of them. When you create PK on a table SQL server creates a unique-clustered index, that index affects the performance not the PK itself. As a real example, my table has a clustered index on date column and a PK on a GUID field, because my rows should be physically ordered over date column in the table since all queries have a date range (in my case).Borderer
That clustered index is a flavor of the primary key, created by SQL Server and several other DBMS's. Are you sure that it is a good idea to use it? For example, in MySQL, it isn't for several undocumented reasons.Loera
@FedericoRazzoli Clustered index is, roughly, an index that the data stored physically. So it directly affects performance choosing the most suitable column or columns together as clustered index. Usually programmers, pust an auto-inc ID as PK, to make CRUD easily, that ID becomes clustered index automatically, but it doesn't have to be. In my case the data was predefined, no chance to change it, so I created a PK on a GUID column and clustered index on date column and it gives me a good performance improvement. See the dev.mysql.com/doc/refman/5.7/en/innodb-index-types.htmlBorderer
Keep in mind that GUID is not an optimal type for PKs, in InnoDB. All indexes contain a reference to the PK so, the bigger is the PK, the bigger will be all other indexes.Loera
@FedericoRazzoli, I know, it's too big to be a good candidate for being PK, but as I said in my case the data was already predefined and already have a ID which uniquely identify rows in GUID format (data was in CSV/file format need to be loaded to a table periodically). I had no chance to add/remove other values/columns so I had to choose the PK and the clustered index. Thanks for informing anyway.Borderer
Np. For the record, the main problem is that it has more mutexes than a regular primary key. Also the binlog will contain all columns, not just the 6 bytes "clustered index". And anyway the PKs are mandatory when using Galera cluster.Loera
M
5

Late to the party but I wanted to add my two cents:

Should each and every table have a primary key?

  • If you are talking about "Relational Albegra", the answer is Yes. Modelling data this way requires the entities and tables to have a primary key. The problem with relational algebra (apart from the fact there are like 20 different, mismatching flavors of it), is that it only exists on paper. You can't build real world applications using relational algebra.

  • Now, if you are talking about databases from real world apps, they partially/mostly adhere to the relational algebra, by taking the best of it and by overlooking other parts of it. Also, database engines offer massive non-relational functionality nowadays (it's 2020 now). So in this case the answer is No. In any case, 99.9% of my real world tables have a primary key, but there are justifiable exceptions. Case in point: event/log tables (multiple indexes, but not a single key in sight).

Bottom line, in transactional applications that follow the entity/relationship model it makes a lot of sense to have primary keys for almost (if not) all of the tables. If you ever decide to skip the primary key of a table, make sure you have a good reason for it, and you are prepared to defend your decision.

Methodical answered 11/8, 2020 at 15:0 Comment(0)
D
4

To make it future proof you really should. If you want to replicate it you'll need one. If you want to join it to another table your life (and that of the poor fools who have to maintain it next year) will be so much easier.

Dorso answered 8/5, 2009 at 14:58 Comment(0)
P
4

I am in the role of maintaining application created by offshore development team. Now I am having all kinds of issues in the application because original database schema did not contain PRIMARY KEYS on some tables. So please dont let other people suffer because of your poor design. It is always good idea to have primary keys on tables.

Physiology answered 9/5, 2009 at 16:32 Comment(0)
N
4

I'd like to find something official like this - 15.6.2.1 Clustered and Secondary Indexes - MySQL.

If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.

So, why not create primary key or something like it by yourself? Besides, ORM cannot identify this hidden ID, meaning that you cannot use ID in your code.

Newlin answered 19/1, 2021 at 7:17 Comment(0)
D
3

I know that in order to use certain features of the gridview in .NET, you need a primary key in order for the gridview to know which row needs updating/deleting. General practice should be to have a primary key or primary key cluster. I personally prefer the former.

Dielectric answered 8/5, 2009 at 14:54 Comment(0)
G
2

I always have a primary key, even if in the beginning I don't have a purpose in mind yet for it. There have been a few times when I eventually need a PK in a table that doesn't have one and it's always more trouble to put it in later. I think there is more of an upside to always including one.

Goodnight answered 8/5, 2009 at 14:54 Comment(0)
H
2

If you are using Hibernate its not possible to create an Entity without a primary key. This issues can create problem if you are working with an existing database which was created with plain sql/ddl scripts, and no primary key was added

Haematoma answered 8/5, 2009 at 15:0 Comment(0)
E
1

In short, no. However, you need to keep in mind that certain client access CRUD operations require it. For future proofing, I tend to always utilize primary keys.

Eisegesis answered 8/5, 2009 at 14:52 Comment(0)
A
-1

If you define a table without a primary key, it means that you don't have clustered index on your table. Such a table is called a heap table and writing into the table can be done efficiently without worrying about maintaining the clustered index.

Astrahan answered 25/3, 2023 at 20:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.