Why are composite keys discouraged in Hibernate?

6

42

This is from the official Hibernate tutorial:

There is an alternative <composite-id> declaration that allows access to legacy data with composite keys. Its use is strongly discouraged for anything else.

Why are composite keys discouraged? I am considering using a 3-column table where all of the columns are foreign keys and together form a primary key that is a meaningful relationship in my model. I don't see why this is a bad idea, especially since I will be using an index on them.

What's the alternative? Create an additional automatically generated column and use it as a primary key? I still need to query my 3 columns anyway!

In short, why is this statement true, and what's the better alternative?

Doubletongue answered 1/1, 2013 at 18:1 Comment(1)
You should define a surrogate primary key, then redefine your existing composite as a unique (natural, or business) key. This leads to a cleaner, more maintainable database design.Snout
48

They discourage them for several reasons:

  • they're cumbersome to use. Each time you need to reference an object (or row), for example in your web application, you need to pass 3 parameters instead of just one.
  • they're inefficient. Instead of simply hashing an integer, the database needs to hash a composite of 3 columns.
  • they lead to bugs: developers inevitably implement the equals and hashCode methods of the primary key class incorrectly. Or they make it mutable, and modify its value once it is stored in a HashSet or HashMap.
  • they pollute the schema. If another table needs to reference this 3-column table, it will need 3 columns instead of just one as a foreign key. Now suppose you follow the same design and make this 3-column foreign key part of the primary key of this new table: you'll quickly have a 4-column primary key, and then a 5-column PK in the next table, and so on, leading to duplication of data and a dirty schema.
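To make the equals/hashCode point concrete, here is a minimal sketch of a value-based, immutable key class in plain Java. It is not a mapped Hibernate class, and the name EnrollmentKey and its three fields are hypothetical. It only shows the contract that a composite key class has to honor so that lookups in a HashSet or HashMap keep working.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical three-column key; the field names are illustrative only.
final class EnrollmentKey {
    // Final fields: the key cannot change after it has been put in a hash-based collection.
    private final long studentId;
    private final long courseId;
    private final long termId;

    EnrollmentKey(long studentId, long courseId, long termId) {
        this.studentId = studentId;
        this.courseId = courseId;
        this.termId = termId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EnrollmentKey)) return false;
        EnrollmentKey other = (EnrollmentKey) o;
        // Equality compares all three components, nothing else.
        return studentId == other.studentId
                && courseId == other.courseId
                && termId == other.termId;
    }

    @Override
    public int hashCode() {
        // Must be computed from exactly the fields used in equals.
        return Objects.hash(studentId, courseId, termId);
    }
}

class EnrollmentKeyDemo {
    public static void main(String[] args) {
        Set<EnrollmentKey> seen = new HashSet<>();
        seen.add(new EnrollmentKey(1, 2, 3));
        // A different instance with the same values is still found.
        System.out.println(seen.contains(new EnrollmentKey(1, 2, 3)));
        // A key that differs in one component is not.
        System.out.println(seen.contains(new EnrollmentKey(1, 2, 4)));
    }
}
```

If the fields were mutable and one of them changed after the add, the stored key would sit in the wrong hash bucket and contains would stop finding it, which is the bug described above.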

The alternative is to have a single-column, auto-generated primary key, in addition to the other three columns. If you want to make the tuple of three columns unique, then use a unique constraint.
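As a rough illustration of that layout, the sketch below simulates it in memory with plain Java: a generated id identifies each row, while a separate set plays the role of the unique constraint over the three columns. The class LinkTable and its methods are hypothetical stand-ins, not Hibernate or JDBC API. In a real schema this would be a generated id column plus a UNIQUE constraint, which JPA can declare through @Table(uniqueConstraints = ...).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for a table with a generated id and UNIQUE(a, b, c).
class LinkTable {
    private long nextId = 1;                                        // plays the role of an auto-increment column
    private final Map<Long, List<Long>> rowsById = new HashMap<>(); // primary key -> row
    private final Set<List<Long>> uniqueTriples = new HashSet<>();  // plays the role of the unique constraint

    // Inserts a row and returns its generated id; rejects duplicate triples.
    long insert(long a, long b, long c) {
        List<Long> triple = List.of(a, b, c);
        if (!uniqueTriples.add(triple)) {
            throw new IllegalStateException("unique constraint violated: " + triple);
        }
        long id = nextId++;
        rowsById.put(id, triple);
        return id;
    }

    // Other code references the row by this single value.
    List<Long> findById(long id) {
        return rowsById.get(id);
    }
}

class LinkTableDemo {
    public static void main(String[] args) {
        LinkTable table = new LinkTable();
        long id = table.insert(10, 20, 30);
        System.out.println("row " + id + " = " + table.findById(id));
        try {
            table.insert(10, 20, 30); // same three values again
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The three columns stay queryable and stay unique; they just stop being what other tables and application code use to point at the row.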

Unfit answered 1/1, 2013 at 18:11 Comment(11)
But in use you have to have those 3 parameters to get the unique one. DBs also often use B-trees, not hashes, so accessing the key costs no less, and having a separate unique id is an overhead. There is an unresolved debate in design between those who prefer generated ids and those who want physical keys, which leads to composite keys.Hexa
An index on a single numeric column is faster, and takes less space than an index on 3 columns. And that's the index that will be used the most in the application. Also see the additional point I have added to my answer. I have seen people using those kinds of composite PKs, leading to tables with a 9-column primary key that is just a reference to another table, plus 1 or 2 functional columns. It was horrible.Unfit
I was going to +1 this, but then you added the last bullet point and now I want to +10 it. Anyone who has faced the horrible problem of 'cascading composite keys' will really start to appreciate simple surrogate keys as an essential part of database design.Snout
What about simple many-to-many link tables? They don't have most of these problems and the obvious way to make them on the DB level is to have a composite primary key of all the fields.Acrobatics
Hibernate uses a composite key in this case. But they aren't referenced by any other table, and they're limited to two columns. If it was mapped as an entity (and would thus have two ManyToOne associations), I would add a surrogate key to the table.Unfit
I think that the 4th bullet is the most important one :), and -alone- it's enough to mark natural keys deprecated.Exponent
@MattiVirkkunen - most ORM frameworks automatically map link tables, aka there are no physical entities created to represent them.Snout
If you are facing any of the reasons above, then you need to reconsider your design. Else, you've made a GOOD decision using a composite key. About the limitation with Hibernate, take a look at mwsiri's answer.Lally
Probably I am missing something very basic, but suppose I use an auto-generated primary key for an entity which in a business sense is identified by its unique string name. How will I find the entity in the DB using its name? Using a 'WHERE' clause in the SELECT statement, as I won't know its id at the application layer?Friedman
@Anmol you will execute a query: select u from User u where u.name = :nameUnfit
But these points don't explain why it's bad for Hibernate; they only explain why it could be bad for the DB.Opprobrious
41

Even if it is perhaps too late to answer your question, I want to give another (hopefully more moderate) point of view on Hibernate's need (is it really advice?) to use surrogate keys.

First of all, I want to be clear on the fact that both surrogate keys (artificial auto-generated ones) and natural keys (composed of column(s) with domain meaning) have pros and cons. I am not trying to say that one key type is better than the other. I am trying to say that depending on your requirements, natural keys might be a better choice than surrogate ones and vice versa.

Myths on natural keys

  1. Composite keys are less efficient than surrogate keys. No! It depends on the database engine used:
  2. Natural keys don't exist in real life. Sorry, but they do exist! In the aviation industry, for example, the following tuple will always be unique for a given scheduled flight: (airline, departureDate, flightNumber, operationalSuffix). More generally, when a set of business data is guaranteed to be unique by a given standard, then this set of data is a [good] natural key candidate.
  3. Natural keys "pollute the schema" of child tables. For me this is more a feeling than a real problem. Having a 4-column primary key of 2 bytes each might be more efficient than a single column of 11 bytes. Besides, the 4 columns can be used to query the child table directly (by using the 4 columns in a where clause) without joining to the parent table.

Disadvantages of surrogate keys

Surrogate keys are:

  1. Source of performance problems:
    • They are usually implemented using auto-incremented columns, which means:
      • A round trip to the database each time you want to get a new Id (I know that this can be improved using caching or [seq]hilo-like algorithms, but those methods still have their own drawbacks).
      • If one day you need to move your data from one schema to another (it happens quite regularly in my company at least), then you might encounter Id collision problems. And yes, I know that you can use UUIDs, but those require 32 hexadecimal digits! (If you care about database size then it can be an issue.)
      • If you are using one sequence for all your surrogate keys then - for sure - you will end up with contention on your database.
  2. Error prone. A sequence has a max_value limit so, as a developer, you have to pay attention to the following facts:
    • You must cycle your sequence (when the max-value is reached it goes back to 1,2,...).
    • If you are using the sequence as an ordering (over time) of your data then you must handle the case of cycling (the row with Id 1 might be newer than the row with Id max-value - 1).
    • Make sure that your code (and even your client interfaces, which should not happen since it is supposed to be an internal Id) supports the 32-bit/64-bit integers that you use to store your sequence values.
  3. They don't guarantee non-duplicated data. You can always have 2 rows with all the same column values but with a different generated value. For me this is THE problem of surrogate keys from a database design point of view.
  4. More in Wikipedia...
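The round-trip point in item 1 mentions hilo-like algorithms. As a rough sketch of the idea (a simplified allocator written for illustration, not Hibernate's actual generator), one call to the database reserves a whole block of ids, and the following ids are handed out from memory. The class BlockIdAllocator and the nextHi supplier are hypothetical; the supplier stands in for a sequence call.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Simplified hi/lo-style allocator: one "database" call reserves blockSize ids.
class BlockIdAllocator {
    private final LongSupplier nextHi; // stand-in for a sequence call, one round trip per block
    private final int blockSize;
    private long hi;                   // block number currently being consumed
    private int lo;                    // offset of the next id inside the block

    BlockIdAllocator(LongSupplier nextHi, int blockSize) {
        this.nextHi = nextHi;
        this.blockSize = blockSize;
        this.lo = blockSize;           // forces a fetch on first use
    }

    synchronized long nextId() {
        if (lo == blockSize) {         // block exhausted: fetch the next block number
            hi = nextHi.getAsLong();
            lo = 0;
        }
        return hi * blockSize + lo++;
    }
}

class BlockIdAllocatorDemo {
    public static void main(String[] args) {
        AtomicLong fakeSequence = new AtomicLong(); // pretend database sequence starting at 0
        BlockIdAllocator allocator = new BlockIdAllocator(fakeSequence::getAndIncrement, 3);
        for (int i = 0; i < 7; i++) {
            System.out.print(allocator.nextId() + " ");
        }
        System.out.println();
        System.out.println("sequence calls: " + fakeSequence.get());
    }
}
```

One of the drawbacks alluded to above is visible here: if the process stops in the middle of a block, the unused ids of that block are never handed out, so the generated values have gaps.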

Why does Hibernate prefer/need surrogate keys?

As stated in the Java Persistence with Hibernate reference:

More experienced Hibernate users use saveOrUpdate() exclusively; it’s much easier to let Hibernate decide what is new and what is old, especially in a more complex network of objects with mixed state. The only (not really serious) disadvantage of exclusive saveOrUpdate() is that it sometimes can’t guess whether an instance is old or new without firing a SELECT at the database—for example, when a class is mapped with a natural composite key and no version or timestamp property.

Some manifestations of the limitation (this is what I think we should call it) can be found here.
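A simplified picture of why that limitation arises, written as plain Java rather than as Hibernate's real logic: with a generated id, an instance that has never been saved still has a null id, so the decision needs no database access. With an application-assigned natural key, the key is filled in from the start, so without a version or timestamp the only way to know is to look the key up. The class and method names below are hypothetical, and the set of stored keys stands in for the SELECT.

```java
import java.util.Set;

class NewOrOldSketch {
    // Generated id: null means the object was never saved, so no lookup is needed.
    static boolean isNewByGeneratedId(Long id) {
        return id == null;
    }

    // Assigned natural key: always populated, so the store has to be consulted.
    static boolean isNewByAssignedKey(String naturalKey, Set<String> storedKeys) {
        return !storedKeys.contains(naturalKey); // stands in for the extra SELECT
    }
}

class NewOrOldSketchDemo {
    public static void main(String[] args) {
        Set<String> stored = Set.of("XX|2013-12-27|1234|D"); // made-up key already in the "database"
        System.out.println(NewOrOldSketch.isNewByGeneratedId(null));
        System.out.println(NewOrOldSketch.isNewByGeneratedId(42L));
        System.out.println(NewOrOldSketch.isNewByAssignedKey("XX|2013-12-27|1234|D", stored));
        System.out.println(NewOrOldSketch.isNewByAssignedKey("XX|2013-12-28|1234|", stored));
    }
}
```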

Conclusion

Please don't be too squared on your opinions. Use natural keys when it is relevant to do so and use surrogate keys when it is better to use them.

Hope that this helped someone!

Cathcart answered 27/12, 2013 at 15:56 Comment(8)
I love the use of composite keys where it's appropriate, loudly and proudly! It makes much cleaner design. When it's the appropriate place, you don't face any of the disadvantages being told about.Lally
An important statement within your answer is, "More generally, when a set of business data is guaranteed to be unique by a given standard then this set of data is a [good] natural key candidate." Often, it is not the technical folks who ultimately enforce the standard. Usually, it's some non-technical manager who mandates that an extra key be added to the pk while riding some Steve Jobs-like power trip/delusion, in response to an anomaly in the business. Your own example may suggest this. Why wouldn't an airline, departureDate and flightNumber be unique? Why is the suffix even needed?Gatling
Regarding "Why is the suffix even needed?": it is needed for the specific case when a regular daily flight is delayed to the next day (e.g. originally scheduled for 23:00, then rescheduled for 00:30). If we use only the departureDate and the flightNumber, then we will have a duplicate flight (the flight scheduled for day n+1 and the one originally scheduled for day n but rescheduled to day n+1 due to the delay). In order to keep the same flightNumber and avoid client confusion, we just add a suffix ('D', for example).Cathcart
It's true that sometimes, unfortunately, non-technical people end up setting standards for technical people. However, I believe that standards should be set jointly by technical and functional people. The suffix example is a good demonstration that some non-technical considerations (people being confused by the change of their flight's number) have to be taken into account when defining a standard. Moreover, when a standard is already widely adopted, it no longer matters whether it is a "good" or "bad" standard.Cathcart
Neat advice: "Please don't be too squared on your opinions. Use natural keys when it is relevant to do so and use surrogate keys when it is better to use them."Leaper
This debate is off topic in my opinion. As stated, both surrogate and natural keys have pros and cons, but we are really discussing this from a Hibernate/JPA point of view. It really is easier to pass a long or int value to a getModel() method rather than getModel(MySuperComplicatedPK pk). Also, lots of the code and lots of the operations are handled by the frameworks (Hibernate and/or Spring), and it is really easier for those tools to always work with "predictable" primary keys rather than complicated classes that change from one domain to another.Tungsten
" Besides, the 4 columns can be used to query the child table directly" That sounds like some nasty data duplicationAmmeter
What is the difference between duplicating numbers (for foreign key referencing) and other types of data ?Cathcart
10

I would consider the problem from a design point of view. It's not just if Hibernate considers them good or bad. The real question is: are natural keys good candidates to be good identifiers for my data?

In your business model, it can be convenient today to identify a record by some of its data, but business models evolve over time. And when this happens, you'll find that your natural key no longer uniquely identifies your data. And with referential integrity in other tables, this will make things MUCH harder to change.

Having a surrogate PK is convenient because it doesn't tie how your data is identified in your storage to the structure of your business model.

Natural keys cannot be generated from a sequence, and the case of data which cannot be identified by its own data is much more frequent. This is evidence that natural keys differ from a storage key, and they cannot be taken as a general (and good) approach.

Using surrogate keys simplifies the design of the application and database. They are easier to use, are more performant, and do a perfect job.

Natural keys bring only disadvantages: I cannot think of a single advantage for using natural keys.

That said, I think Hibernate has no real issues with natural (composite) keys. But you'll probably find some problems (or bugs) sometimes, and issues with the documentation or with trying to get help, because the Hibernate community widely acknowledges the benefits of surrogate keys. So, prepare a good answer for why you chose a composite key.

Exponent answered 1/1, 2013 at 18:29 Comment(1)
+1, also there is something to be said for identifiers that are under control of the application developers instead of being tied to the business or controlled by an external entity (such as ssns or abas).Webbing
1

If the Hibernate documentation is read properly:

"There is an alternative <composite-id> declaration that allows access to legacy data with composite keys. Its use is strongly discouraged for anything else."

This appears in section 5.1.4, on the <id> XML tag, which defines the primary key mapping. From that context we can conclude that the Hibernate documentation discourages using the <composite-id> XML tag instead of <id> for mapping a composite primary key; it does NOT say anything negative about using composite primary keys as such.

Aphanite answered 6/10, 2016 at 23:16 Comment(2)
There's already an answer to the given question. If you have any idea on how to improve it, kindly post your suggestion on the chosen answer to help update it.Rockey
My intention is to refute the claim that the Hibernate documentation advises against composite primary keys, and to steer the discussion away from misread quotes, because who would argue against the documentation? I am not against the use of composite primary keys, but I am against quotes that are badly understood.Aphanite
0

Applications that treat the database as a tool definitely benefit from keeping their workflow on surrogate keys, using clustered indices for query optimization.

Special care does need to be taken, however, with Data Warehousing and OLAP-style systems that use a massive fact table to tie the surrogate keys of dimensions together. In this case the data dictates the dashboard/application that can be used to maintain records.

So instead of one method being preferable to another, perhaps one directive is more advantageous than another for key construction: you won't very easily develop a Hibernate app that harnesses direct access to an SSAS system instance.

I develop using both kinds of key, and I find that to implement a solid star or snowflake pattern, a surrogate with a clustered index is typically my first choice.

So, for the OP and others passing by: if you want to stay db invariant with your development (which Hibernate specializes in) -- use the surrogate method, and when data reads tend to slow, or you notice certain queries drain performance, go back to your specific database and add composite, clustered indices that optimize query order.

Countermeasure answered 26/11, 2014 at 15:49 Comment(0)
0

Do not confuse a primary key with a unique index. If you use natural keys, you link your key to your business and to business data, and that's not so good. So, even if a set of data could be used to define a composite key, it is not recommended. From my point of view, composite keys are mainly usable when you have an existing schema.

Siana answered 1/2, 2021 at 12:44 Comment(0)
