When should I use a composite primary key?
Asked Answered
O

2

8

ETA: My question is based upon keeping an optimal database. What is the difference in database performance/size between having an all composite primary key for ProjectUserBooleanAttribute, which would presumably have indexes for PUAT_Id and UserID and a non-composite table using an auto-increment PK but then having indexes for both PUAT_Id and UserID? From further reading it seems if I went with the non-composite approach I would have to create a unique index on the those two columns. Would I still need to create indexes on those two columns? If so, doesn't this essentially means every column in that table would have it's own index?

Is this the quintessential dilemma of database size (indexes) vs performance?


So I have the following entities I wish to create

  • Projects
  • Users
  • ProjectUserAttributeTypes

In this simplified example, all of my ProjectUserAttributeTypes are going to be boolean, so I'm only showing the ProjectUserBooleanAttribute table.

Let's say I want to create two boolean ProjectUserAttributeType called Silver and Gold. I would simply create two rows in ProjectUserAttributeTypes. Now if I want to assign a user as having that attribute, I would add a row to the ProjectUserBooleanAttribute

A DBA has warned me off from using composite primary keys in general for performance reasons. However in this instance I fail to see what I gain by NOT using composites. In both cases I will need to make sure that ProjectUserBooleanAttribute has non null and unique values for all columns. I will surely need indexes as well.

NOTE: My ultimate objective is to be able to query my database and find all users that have certain attributes combinations. I would join the tables to filter by project, then use where clauses to filter even further. A few examples:

  • (GOLD OR SILVER)
  • (GOLD XOR SILVER)
  • ((GOLD OR SILVER) AND NOT (BRONZE))

COMPOSITE PK

Compsite ERD

VERSUS NON-COMPOSITE PK

No Composite ERD

Orabelle answered 11/7, 2015 at 19:11 Comment(3)
possible duplicate of Composite primary keyCirsoid
So if I added a unique index on (PUAT_ID, UserID) that would prevent 2 rows like (1,2,3) and (2,2,3)? If I needed fast searches for both of those columns, would I also need to create single indexes on those columns, or would that be redundant to the unique index?Orabelle
1. Unique/key (PUAT_Id,UserId) would prevent there being two rows like that, because they both have a subrow (2,3). You should add such a unique/key constraint to keep out table values with such pairs. 2, If you mean you need fast searches for each of those columns s)parately then for a typical DBMS the index on the first column of a composite index would be redundant.. Note that single-column indexes would not be unique indexes.Bezanson
G
19

There are two main designs for relational databases:

  • natural keys
  • IDs

With natural keys you use the keys given: A project is identified by its project number, a user by their login name or number, etc. This often leads to compound (or composite) keys:

  • project (project_no, name, ...)
  • user (user_name, first_name, last_mname, ...)
  • project_user (project_no, user_name, role, ...)

The table project_user has a compound key: the project number plus the user name uniquely identifying a record, telling us who is working on which project.

With IDs, you usually add a technical ID that is only used to link records and has no meaning to a user:

  • project (project_id, project_no, name, ...)
  • user (user_id, user_name, first_name, last_mname, ...)
  • project_user (project_user_id, project_id, user_id, role, ...)

The tables contain the same fields plus the IDs and you need the same unique and not null constraints as with natural keys plus the constraints on the IDs.

The project_user_id in project_user is only necessary of course when there is any table needing that reference. But often every table gets an ID whether needed or not, just to make them all look alike (and so the IDs are already there in case they are needed later).

At first glance it seems that a database based on IDs is only more work, more indexes and nothing gained, but this is not the case. An ID concept is often chosen, because it gives more freedom. An example: What would happen, if a project number could change? With the natural keys, the project number is in many tables and would have to get updated cascadingly somehow, which can become quite a task. In an ID database, you would just change the project number in its one place.

And what would happen, if suddenly project numbers were only unique within a company? In an ID based database, you would add a company_id to the projects table, add a unique index on company_id and project_no and be done with it. With natural keys a company number (an ILN? an artificial number?) would have to be added to the primary key and would have to be introduced in all child tables. So: When you design a database with natural keys, you must think it all through to get to stable natural keys - and sometimes there are none, and then you have to invent some. With IDs you don't care that much whether fields can change or not. An ID based database is thus easier to implement.

Then there is hierarchy. Lets say there are several companies in your database, each with its own items, own warehouses.

Natural keys:

  • company (company_code, name, ...)
  • item (company_code, item_no, name, ...)
  • warehouse (company_code, warehouse_no, address, ...)
  • stock (company_code, warehouse_no, item_no, amount, ...)

IDs:

  • company (company_id, name, ...)
  • item (item_id, item_no, name, company_id, ...)
  • warehouse (warehouse_id, address, company_id, ...)
  • stock (stock_id, warehouse_id, item_id, amount, ...)

With the ID concept, you don't have to name the company_id again in the stock table, because its known from a parent table. To store it there would even be redundant, whereas in a natural key concept it is needed, because it's part of the compound key and without it we would lose the link to its parent tables. This purity is seen by some people as a great advantage of the ID concept over natural keys. However, there comes a disadvantage with it. In the natural key database it is guaranteed that a company's items are in the company's warehouses, as the company is part of the key of the stock table. With an ID concept the linked warehouse record could belong to company 1 and the linked item record to company 2. Inconsistent data caused by some wrong insert statement which the DBMS could not prevent us from. With the natural keys such an error can not occur.

And if I want to know how many stock a company has, I simply select from stock with natural keys. But I would have to select from stock plus another table to get the company in an ID database.

With much hierarchy you may get queries with many, many more tables involved when the database is ID based. So far I haven't seen an ID based database outperform one with natural keys. But I have seen natural key based databases outperform ID based ones by far. This may be because I have mainly seen big databases with much hierarchy.

As to your database: It seems to be ID based, provided project ID and user ID are only technical internal numbers - otherwise your database would be a mixed concept (natural project number, natural user ID, technical ID for ProjectUserBooleanAttribute). So your question has not really to do with compound keys or not.

PUAT_ID and UserID will both have to be in ProjectUserBooleanAttribute, they would be not null, and you should have a unique constraint (a unique index) on them. So they have all qualities a primary key needs, no matter whether you call this "primary key" or not. Whether you add a technical ID is just for the looks of it. It doesn't really change anything. The concept stays the same.

In a natural key concept you would make the fields the primary key. But then, you would not have a PUAT_Id, but some composite key here (ProjectId plus AttributeType?).

In a technical ID concept you would not make this the primary key, but make the fields not null and add a unique constraint (which makes it the key, only it is not called "primary"). Then either add a technical ID as the primary key or have the table without ID, and thus without a primary key. It doesn't matter. If someone asks for a key, give them the ID, if not, you can go without it. It is superfluous, as long as not needed by any other table.

Gemmule answered 13/7, 2015 at 10:27 Comment(4)
I think the first example is bad. I would have project (project_id, project_no, ..), user(user_id, first_name, ..), project_user(project_id, user_id, role, ..). Why would you use anything that could change as a PK?Compositor
@displayname: You are misunderstanding. I am using project number and user name as two examples for unique identifyes that don't change. Something must uniquely identify a project, not only in your database but (also) in the real world. In my example this is a project number. Project 4013 will never be renamed to project 5532. Why should it? It's the projects ID by which you identify it's documents, diagrams etc., not only in the computer, but also on real paper in real folders and desk drawers :-)Gemmule
@displayname: You are showing yet another design: Technical IDs that you also use for composite keys. This is rather uncommon, and for some reason. If you added two other tables project_details and project_user_details, what would your PKs be? project_details(project_details_id) and project_user_details(project_id, user_id, project_details_id)? Or would you rather have project_user_details(user_id, project_details_id) thus avoiding redundancy, but not being able to reference table project_user in a foreign key constraint?Gemmule
@displayname: With technical IDs better stick to non-composite PKs. As mentioned, you would have the pair project_id, user_id as unique and not nullable in project_user anyway - same as a primary key. You can add project_user_id as the table's PK immediately or not, just as you like. You would add it at the latest when you wanted to create child tables to table project_user.Gemmule
B
1

When you add an id column to a table it unavoidably adds overhead to managing that table. But the benefit is that other tables can now reference rows of that table by a single id column instead of the old composite key. This can make storage and indexes smaller and access faster for those tables. Also it can make references to the corresponding entities briefer (only one column) and more apparent (by the kind of id).

Note that in tables that now use that added id as FK, if you keep any of the old composite FK columns along with the id then you should have a constraint that the values for those columns are the same as the values for those columns in the row that the id references.

Bezanson answered 20/7, 2015 at 9:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.