Should I create a surrogate key instead of a composite key?
6

Structure:

Actor <=== ActorMovie ===> Movie

ActorMovie: ActorID (fk), MovieId (fk)... ===> pk: (ActorID, MovieID)

Should I create a surrogate key for the ActorMovie table, like this?

ActorMovie: ActorMovieID (pk), ActorID (fk), MovieId (fk)...
Production answered 25/11, 2013 at 10:11 Comment(1)
If an actor plays more than one role in a movie, how will this be represented? - Taitaichung
8

Conventions are good if they are helpful

"SQL Antipatterns", Chapter 4, "ID Required"

Intention of primary key

A primary key is what you use to identify a row by its unique address in the table. That means a surrogate column is not the only thing that can serve as a primary key. In fact, a primary key should be:

  • Unique. It identifies each row. If it's compound, every combination of the columns' values must be unique.
  • Minimal. It can't be reduced (i.e. if it's compound, no column can be omitted without losing uniqueness).
  • Single. Each table can have only one primary key.

Compound versus surrogate

There are cases where a surrogate key has benefits. The most common example is a table of people's names. Can the combination of first_name + last_name + taxpayer_id be unique? In most cases, yes. But in theory duplicates could occur. This is the case where a surrogate key guarantees that every row can still be uniquely identified.

However, if we're talking about a many-to-many link between tables, the linking table should obviously contain each pair only once. In fact, you'd even need to check that a duplicate does not already exist before writing to that table (otherwise it's a redundant row, because it holds no additional information, unless your design deliberately intends to store it). Therefore your combination of ActorID + MovieID satisfies all the conditions for a primary key, and there's no need to create a surrogate key. You can do that, but it makes little sense (if any), because the surrogate carries no meaning beyond numbering the rows. On the other hand, with a compound key you'll have:

  • Uniqueness check by design. Your rows will be unique; no duplicates will be allowed in the linking table. And that makes sense, because there's no need to create a link if it already exists.
  • No redundant (and thus less comprehensible) column in the design. That makes your design simpler and more readable.
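The "uniqueness check by design" point can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the asker's actual schema: the question targets MySQL, and the table names, column names, sample rows and the `link` helper are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Junction table: the pair itself is the primary key, no surrogate column.
    CREATE TABLE actor_movie (
        actor_id INTEGER NOT NULL REFERENCES actor(actor_id),
        movie_id INTEGER NOT NULL REFERENCES movie(movie_id),
        PRIMARY KEY (actor_id, movie_id)
    );
    INSERT INTO actor VALUES (1, 'Actor A');
    INSERT INTO movie VALUES (10, 'Movie X');
    INSERT INTO actor_movie VALUES (1, 10);
""")

def link(conn, actor_id, movie_id):
    """Hypothetical helper: try to insert a link, return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO actor_movie VALUES (?, ?)", (actor_id, movie_id))
        return True
    except sqlite3.IntegrityError:
        return False

# Inserting the same (actor, movie) pair again violates the composite primary key.
duplicate_accepted = link(conn, 1, 10)
link_count = conn.execute("SELECT COUNT(*) FROM actor_movie").fetchone()[0]
```

Here the second insert of the same pair is rejected by the key itself, so no separate "does this link already exist?" query is needed.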

In conclusion: yes, there are cases where a surrogate key should (or even must) be used, but in your particular case it would definitely be an antipattern. Use the compound key.

Poyssick answered 25/11, 2013 at 10:37 Comment(5)
I really cannot imagine how a surrogate key can be an antipattern... Do not forget, in this case or in any other similar one, that the uniqueness of your field combination is a business rule. Business rules do change. And it's always better to separate business rules from record identifying methods. - Olli
I've mentioned that there are cases where "ID required" is a normal way to resolve the matter. There are also cases where a surrogate key is the only option. But in this particular case using a surrogate key is an antipattern since, as I've mentioned, it doesn't help maintain data integrity and, besides, makes no sense while a natural key exists in such a table. - Poyssick
It's good to leave a comment when downvoting. That will let me know what's wrong and improve the answer. - Poyssick
The concept of "antipattern" opposes the concept of patterns as understood when building a database following the normal forms rules. There is nothing, absolutely nothing, in these rules, that allows you to say that adding a surrogate key to a table does not follow these rules. Even if you estimate that it's not needed!!! So it cannot be an antipattern. - Olli
The definition of "pattern" is sometimes vague (for example, the well-known singleton, which is a pattern and an antipattern at the same time). The OP's question isn't an easy design issue because, yes, there are cases when it's a pattern and cases when it's an antipattern. The best definition of the line between them is in the title above: conventions are good if they are helpful. In the OP's case it's quite clear that there's no need for such a convention. That's why in this particular case using a surrogate key is an antipattern. It would reduce the meaning of natural row addressing and add useless data to the table design. - Poyssick
4

I'd always go with the composite key. My reasoning:

  • You will probably never use the surrogate key anywhere.
  • You will reduce the number of indexes/constraints on the table, as you will most certainly still need indexes over actor and movie.
  • You will always search for either movie or an actor anyway.

Unless you have a scenario where you will actually use the surrogate key outside of its own table, I'd go with the composite key.
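As a rough sketch of the "you will always search for either a movie or an actor" point, here is the composite-key table queried in both directions using Python's sqlite3 module. The names, the sample rows and the secondary index `idx_actor_movie_movie` are assumptions for illustration; foreign keys are left out to keep it short, and how indexes are actually used depends on the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor_movie (
        actor_id INTEGER NOT NULL,
        movie_id INTEGER NOT NULL,
        PRIMARY KEY (actor_id, movie_id)   -- leading column serves actor -> movies
    );
    -- Assumed secondary index for the reverse direction (movie -> actors).
    CREATE INDEX idx_actor_movie_movie ON actor_movie (movie_id, actor_id);
    INSERT INTO actor_movie VALUES (1, 10), (1, 11), (2, 10);
""")

def movies_of(conn, actor_id):
    """Return the movie ids linked to one actor."""
    rows = conn.execute(
        "SELECT movie_id FROM actor_movie WHERE actor_id = ? ORDER BY movie_id",
        (actor_id,),
    )
    return [r[0] for r in rows]

def actors_in(conn, movie_id):
    """Return the actor ids linked to one movie."""
    rows = conn.execute(
        "SELECT actor_id FROM actor_movie WHERE movie_id = ? ORDER BY actor_id",
        (movie_id,),
    )
    return [r[0] for r in rows]

movies_of_actor_1 = movies_of(conn, 1)
actors_in_movie_10 = actors_in(conn, 10)
```

Neither lookup ever mentions a surrogate id, which is the sense in which such a column would go unused here.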

Taitaichung answered 25/11, 2013 at 10:34 Comment(0)
4

Let me just mention a detail that seems to have been missed by other posters: InnoDB tables are clustered.

If you have just a primary key, your whole table will be represented by a lone B-Tree, which is very efficient. Adding a surrogate would just create another B-Tree (and "fatter" than expected to boot, due to how clustering works), without benefit to offset the added overhead.

Surrogates have their place, but junction tables are usually not it.

Gupta answered 25/11, 2013 at 14:59 Comment(0)
1

If you want to associate other data elements with the join table, such as the name(s) of the role(s) played (which might be a child table) then I certainly would. If you were sure that you never wanted to then I'd consider it as optional.
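A minimal sketch of that situation, using Python's sqlite3 module: the junction table gets a surrogate id so that a child table of roles can point at it with a single-column foreign key. The `role` table, all column names and the sample values are hypothetical, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE actor_movie (
        actor_movie_id INTEGER PRIMARY KEY,      -- surrogate key
        actor_id INTEGER NOT NULL,
        movie_id INTEGER NOT NULL,
        UNIQUE (actor_id, movie_id)              -- the pair is still kept unique
    );
    -- Hypothetical child table: one row per role played in a given movie.
    CREATE TABLE role (
        role_id INTEGER PRIMARY KEY,
        actor_movie_id INTEGER NOT NULL REFERENCES actor_movie(actor_movie_id),
        role_name TEXT NOT NULL
    );
    INSERT INTO actor_movie (actor_movie_id, actor_id, movie_id) VALUES (100, 1, 10);
    INSERT INTO role (actor_movie_id, role_name) VALUES (100, 'Hero'), (100, 'Narrator');
""")

roles = [r[0] for r in conn.execute(
    "SELECT role_name FROM role WHERE actor_movie_id = ? ORDER BY role_name", (100,)
)]

# A role pointing at a non-existent actor/movie link is rejected by the foreign key.
try:
    conn.execute("INSERT INTO role (actor_movie_id, role_name) VALUES (999, 'Ghost')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
conn.rollback()
```

A child table could also reference the composite key (actor_id, movie_id) directly; the surrogate mainly keeps that foreign key down to one column.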

Olav answered 25/11, 2013 at 10:35 Comment(0)
1

Consider the first normal form (1NF) of database design normalization.

I would have made ActorID and MovieID a unique key combination and then created a primary key, ActorMovieID.
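To see what the unique key combination contributes in that design, here is a small sketch with Python's sqlite3 module comparing a surrogate-only table with a surrogate-plus-UNIQUE table. Both table names and the `insert_pair_twice` helper are made up for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate key only: nothing stops the same pair from being stored twice.
    CREATE TABLE link_surrogate_only (
        actor_movie_id INTEGER PRIMARY KEY,
        actor_id INTEGER NOT NULL,
        movie_id INTEGER NOT NULL
    );
    -- Surrogate key plus a unique constraint on the pair, as described above.
    CREATE TABLE link_surrogate_unique (
        actor_movie_id INTEGER PRIMARY KEY,
        actor_id INTEGER NOT NULL,
        movie_id INTEGER NOT NULL,
        UNIQUE (actor_id, movie_id)
    );
""")

def insert_pair_twice(conn, table):
    """Insert the pair (1, 10) twice and return how many rows were stored."""
    # Table names cannot be bound as parameters; `table` is one of the two fixed names above.
    for _ in range(2):
        try:
            with conn:
                conn.execute(f"INSERT INTO {table} (actor_id, movie_id) VALUES (1, 10)")
        except sqlite3.IntegrityError:
            pass  # second insert rejected by the UNIQUE constraint
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

rows_without_unique = insert_pair_twice(conn, "link_surrogate_only")
rows_with_unique = insert_pair_twice(conn, "link_surrogate_unique")
```

The surrogate column alone accepts the duplicate link; only the constraint on the (ActorID, MovieID) pair keeps each link unique.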

See the same question here: Two foreign keys instead of primary

Thanksgiving answered 25/11, 2013 at 10:38 Comment(1)
What does a surrogate key have to do with 1NF? - Longdrawnout
0

On this subject, my point is very simple: surrogate primary keys ALWAYS work, while composite keys MIGHT NOT ALWAYS work one of these days, and this for multiple reasons.

So when you start asking yourself 'is composite better than surrogate', you have already started wasting your time. Go for the surrogate. It always works. Then move on to the next step.

Olli answered 25/11, 2013 at 10:42 Comment(5)
A surrogate key does NOT "work" instead of a composite if the business rule is that any actor should only be assigned to a movie once. Data integrity is lost if you don't enforce the composite key. - Longdrawnout
So you have a surrogate key and an additional integrity constraint on a combination of fields. Fair enough: separating business rules ('an actor plays or does not play in a film') from the record identifying method is always a safe decision. - Olli
So to be clear, your answer to the question here is "No": one should NOT create a surrogate key instead of the composite key. I agree with that. The composite is presumably essential to meet the business requirement and leaving it out shouldn't be an option. Creating a surrogate is superfluous, optional and potentially a wasteful overhead in complexity and storage. I'd suggest only creating the surrogate where there is a clearly defined reason to do so. - Longdrawnout
Superfluous: a personal opinion. Optional: true. A wasteful overhead in complexity and storage: no, definitely. Checking the uniqueness of one field is definitely easier for the machine than checking the uniqueness of a pair of fields!!! Then if it's optional and efficient, why shouldn't we do it? Airbags on your car can be optional, but then they can be very efficient, and this is the reason why you'll pay for this option. Here the option is (almost) free: don't give it a thought. Take it! - Olli
Philippe, you are mistaken. Creating that additional key constraint for the surrogate will use more storage than the natural key alone (in MySQL and in other DBMSs). Validating two key constraints and updating two key indexes instead of one will also be a greater workload. I'm not against surrogates. They are extremely useful in some situations but they also have disadvantages, so the decision to use them needs to be evaluated on a case-by-case basis. To apply them blindly in every case without proper consideration is simply careless, counter-productive and doesn't make any practical sense. - Longdrawnout
