Should many to many tables have a primary key?
Asked Answered
E

7

36

If I have two objects that have a many-to-many relationship, I would typically model them in my database schema with a many-to-many table to relate the two. But should that many-to-many table (or "join table") have a primary key of its own (integer auto-incremented)?

For example, I might have tables A and B, each with an ID, and a table called A_B that has a foreign key tuple of (A_ID, B_ID). But should A_B have a primary key auto-incremented ID column of its own, or not?

What are the advantages and disadvantages of adding it? I personally like natural keys for many-to-many joins. But what added benefit would a primary key add?

Eatmon answered 21/12, 2010 at 21:0 Comment(2)
Duplicate of #2190772.Allium
Should we delete this question, then?Eatmon
F
23

I agree with everything Oded said except

"It can't reasonably be used as a foreign key either."

In this case it's a pick your poison, the mapping table absolutely can be a parent, it's just a matter of the child using a multicolumn FK or not.

Take a simple case of Car and color. Each Year Auto Makers have a certain pallet of colors and each model only comes in limited number of those colors. Many - Many :: Colors to Cars models

So now design the Order table where new cars orders are stored. Clearly Color and Model will be on the Order table. If you make a FK to each of those tables, the database will permit an incorrect model/color combination to be selected. (Of course you can enforce this with code, you can't do so declaratively.) If you make the parent be the many:many table, you'll only get combinations that have been specified.

SO would you rather have a multicolumn FK and point to a PK built on both ModelID and ColorID or do you want a single column FK?

Pick your poison.

EDIT

But if it's not a parent of something, no table needs a surrogate key.

Followthrough answered 21/12, 2010 at 22:7 Comment(0)
T
15

Such a surrogate key adds nothing except overhead.

Use the natural keys, make them a composite primary key if you care about duplication in this table.

To expand:

In the application, this key will be meaningless and will remain unused.

In the database, it will have no function, as you can't reasonably use it in a query for any type of meaningful result.

It can't reasonably be used as a foreign key either.

Thrift answered 21/12, 2010 at 21:5 Comment(2)
Yes, the only use I can see is that maybe if you want to limit deletion to one record -- but that would cause an additional lookup for the ID, assuming you had both foreign key IDs. I concur.Eatmon
@Eatmon - You shouldn't have duplicate rows in this table (or any table, for that matter). And if you don't, then deleting a single row will simply require both foreign keys.Thrift
F
2

If the table tracking the many to many relationship has it's own primary key and that key is used as a foreign key anywhere else in the database then you create a dependency on that relationship. The relationship can never be removed.

For instance in the car color example, if the color for a car is ever discontinued (removed from the many to many relationship table) then any table (i.e. purchase history) referencing the primary key would be broken.

Fractional answered 17/8, 2020 at 15:23 Comment(0)
B
1

I have done it both ways. Sometimes it is beneficial for adding a feature down the road. For instance, if there was ever a time that a row in the table would ever contain anything more than just the 2 id's. If you don't lack space I would put one in there just because it can't hurt. Sometimes it can interfere with ORM tools like hibernate or ADO.NET but that is minor.

So to sum it up... PROS 1. Allows potential future growth.

CONS 1. Space 2. Confuses some ORM tools.

Braca answered 21/12, 2010 at 21:10 Comment(5)
Regarding potential future growth, note that as soon as you add meaningful data to the association table then it becomes more than just persistence logic. It becomes its own entity. This could cause headaches in the domain. I think it's best to separate business entities from persistence logic and keep them separated. If you need to add data to an association table, re-think from a wider perspective what it is you're doing and if there's a better way, because the logical impact is greater than just adding a column to a table.Biotic
I concur. Although it's hard to think of an example where an M:M relation would become an entity with a unique primary key.Eatmon
Fair enough, I've just never had a headache from deciding to have done it that way. I have had Entities that acted as a M:M relation that contained other useful data. This data sometimes was edited in it's own webpage where having a single ID to pass was handy.Braca
I see what you're saying. For example, a blog that allows multiple authors could have a authors_post table, with a (later added) author_added_on_date field so we can go back into revisions and see who had permission when. Hmm. I would still say "add the surrogate key when you need it, not before."Eatmon
ahes999 - Yes that is the exact kind of situation i refer to. I would agree, you can always add one if you need it.Braca
S
1

The term "join table" is often used but I don't think I've ever seen it properly defined or explained. Personally I avoid using that term. As I understand it, a "join table" means any table with two foreign keys (or possibly more than two?).

I think the criteria for selecting keys in a table with more than one foreign key should be much the same as in any other table. Ask yourself what dependencies you need to enforce, what is unique and irreducible. Select keys on the criteria of Familiarity, Stability and Simplicity. Add surrogate keys only when you have a good reason to.

Swordtail answered 21/12, 2010 at 23:44 Comment(0)
B
0

It doesn't really provide anything useful. Keep in mind the purpose of a key, which is to uniquely refer to "something." An association table like this isn't by itself "something" but is rather persistence structure for two other "somethings" which already have keys. Outside of the persistence medium (the database) it doesn't have a meaning and shouldn't even really exist or be known (such as in the business domain), so there would (should) never be a reason to refer to it by its own ID.

Biotic answered 21/12, 2010 at 21:9 Comment(2)
Precisely my argument. Real objects should have IDs to identify them as such.Eatmon
@ashes999: Ya, going to the original question I can't really think of a benefit to adding a separate primary key to such a table. @dko's suggestion is, in my opinion, more of a risk than a benefit. It may provide something of immediate value in certain situations, but I wouldn't be comfortable with the doors it opens up.Biotic
M
0

There are two cases:

1. Unique associations

A website has Users and Roles, every Role can be associated to a User just once, and the database doesn't keep track of the history of permissions.

In the UserRole table an autonomous PK column would be just useless: it's not referenced by anybody else (you associate claims/permissions to Roles or to Users, not to UserRoles) and you still need the FK UserRole->User and UserRole->Role.

Conclusion: it makes sense to create a PK using the tuple (UserId, RoleId), avoiding extra columns.

2. Duplicated associations

A website keeps track of the matches between tennis players. The Matches table is sort of a many-to-many between players, but playerA may have played against playerB multiple times.

In this case you can't create a PK using the tuple (Player1Id, Player2Id), because it would end in duplicated keys. The Matches table will have a MatchDate column for the time of the game, therefore you could create a PK on the tuple (Player1Id, Player2Id, MatchDate), but it starts to be awkward.

Conclusion: in this case an additional column for the PK seems more practical to me.

Mameluke answered 17/2 at 16:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.