Still Confused About Identifying vs. Non-Identifying Relationships

So, I've been reading up on identifying vs. non-identifying relationships for my database design, and a number of the answers on SO seem contradictory to me. Here are the two questions I am looking at:

  1. What's the Difference Between Identifying and Non-Identifying Relationships
  2. Trouble Deciding on Identifying or Non-Identifying Relationship

Looking at the top answers to each question, I seem to get two different ideas of what an identifying relationship is.

The first question's response says that an identifying relationship "describes a situation in which the existence of a row in the child table depends on a row in the parent table." An example of this that is given is, "An author can write many books (1-to-n relationship), but a book cannot exist without an author." That makes sense to me.

However, when I read the response to question two, I get confused, as it says, "if a child identifies its parent, it is an identifying relationship." The answer then gives examples: a Social Security Number identifies a Person, but an address does not (because many people can live at the same address). To me, this sounds more like the decision between a primary key and a non-primary key.

My own gut feeling (and additional research on other sites) points to the first question and its response being correct. However, I wanted to verify before continuing, because I don't want to learn something wrong while I am working to understand database design.

Belloir asked 11/5, 2010 at 21:09

The technical definition of an identifying relationship is that a child's foreign key is part of its primary key.

CREATE TABLE AuthoredBook (
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors(author_id),
  FOREIGN KEY (book_id) REFERENCES Books(book_id)
);

See? book_id is a foreign key, but it's also one of the columns in the primary key. So this table has an identifying relationship with the referenced table Books. Likewise it has an identifying relationship with Authors.
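To see what "the foreign keys are part of the primary key" does in practice, here is a minimal sketch that loads the AuthoredBook table above into an in-memory SQLite database via Python's standard sqlite3 module. It assumes SQLite rather than MySQL, trims Authors and Books down to their keys, and uses a hypothetical helper, try_insert. Each AuthoredBook row is identified by its two parents: a row needs both parents to exist, and the same (author, book) pair can't appear twice.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build the Authors/Books/AuthoredBook schema in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
    conn.executescript("""
        CREATE TABLE Authors (author_id INTEGER PRIMARY KEY);
        CREATE TABLE Books (book_id INTEGER PRIMARY KEY);
        CREATE TABLE AuthoredBook (
          author_id INT NOT NULL,
          book_id INT NOT NULL,
          PRIMARY KEY (author_id, book_id),  -- both FKs are part of the PK
          FOREIGN KEY (author_id) REFERENCES Authors(author_id),
          FOREIGN KEY (book_id) REFERENCES Books(book_id)
        );
        INSERT INTO Authors VALUES (1), (2);
        INSERT INTO Books VALUES (10);
    """)
    return conn


def try_insert(conn: sqlite3.Connection, author_id: int, book_id: int) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO AuthoredBook VALUES (?, ?)", (author_id, book_id))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(try_insert(conn, 1, 10))  # True: both parent rows exist
print(try_insert(conn, 2, 10))  # True: same book, another author
print(try_insert(conn, 1, 10))  # False: (author_id, book_id) already exists
print(try_insert(conn, 3, 10))  # False: no author 3, so the FK check fails
```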

A comment on a YouTube video has an identifying relationship with the respective video. The video_id should be part of the primary key of the Comments table.

CREATE TABLE Comments (
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  PRIMARY KEY (video_id, user_id, comment_dt),
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

It may be hard to understand this because it's such common practice these days to use only a serial surrogate key instead of a compound primary key:

CREATE TABLE Comments (
  comment_id SERIAL PRIMARY KEY,
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

This can obscure cases where the tables have an identifying relationship.
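One concrete way the surrogate key hides things: once comment_id is the primary key, nothing stops two rows with the same (video_id, user_id, comment_dt) unless you add a UNIQUE constraint, as a comment below points out. The sketch below runs the same double insert against both Comments definitions and against the surrogate version plus a compound UNIQUE index. It assumes SQLite (INTEGER PRIMARY KEY standing in for MySQL's SERIAL) and a hypothetical index name, comments_natural_key.

```python
import sqlite3

# Minimal parent tables (keys only) with one row each.
PARENTS = """
    CREATE TABLE Videos (video_id INTEGER PRIMARY KEY);
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY);
    INSERT INTO Videos VALUES (1);
    INSERT INTO Users VALUES (7);
"""

# First version: the FKs are part of the compound PK.
COMPOSITE = """
    CREATE TABLE Comments (
      video_id INT NOT NULL,
      user_id INT NOT NULL,
      comment_dt DATETIME NOT NULL,
      PRIMARY KEY (video_id, user_id, comment_dt),
      FOREIGN KEY (video_id) REFERENCES Videos(video_id),
      FOREIGN KEY (user_id) REFERENCES Users(user_id)
    );
"""

# Second version: surrogate key (INTEGER PRIMARY KEY stands in for SERIAL).
SURROGATE = """
    CREATE TABLE Comments (
      comment_id INTEGER PRIMARY KEY,
      video_id INT NOT NULL,
      user_id INT NOT NULL,
      comment_dt DATETIME NOT NULL,
      FOREIGN KEY (video_id) REFERENCES Videos(video_id),
      FOREIGN KEY (user_id) REFERENCES Users(user_id)
    );
"""

# Surrogate key plus a compound UNIQUE index on the natural key.
SURROGATE_UNIQUE = SURROGATE + """
    CREATE UNIQUE INDEX comments_natural_key
      ON Comments (video_id, user_id, comment_dt);
"""


def rows_after_double_insert(ddl: str) -> int:
    """Insert the same (video, user, timestamp) twice; return how many rows are kept."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(PARENTS + ddl)
    for _ in range(2):
        try:
            conn.execute(
                "INSERT INTO Comments (video_id, user_id, comment_dt) VALUES (?, ?, ?)",
                (1, 7, "2010-05-11 21:42:00"),
            )
        except sqlite3.IntegrityError:
            pass  # the second insert is rejected as a duplicate
    return conn.execute("SELECT COUNT(*) FROM Comments").fetchone()[0]


print(rows_after_double_insert(COMPOSITE))  # 1
print(rows_after_double_insert(SURROGATE))  # 2
print(rows_after_double_insert(SURROGATE_UNIQUE))  # 1
```

The UNIQUE index gives the same row-level guarantee as the compound PK, but the foreign keys are still not part of the primary key.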

I would not consider SSN to represent an identifying relationship. Some people exist but do not have an SSN. Other people may file to get a new SSN. So the SSN is really just an attribute, not part of the person's primary key.
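Treating SSN as "just an attribute" might look like the following SQLite sketch. It assumes a hypothetical Persons table with a surrogate person_id and made-up SSN values. SSN is nullable, so people without one can exist. It is UNIQUE, but SQLite treats NULLs as distinct, so several people may lack one. Issuing a new SSN is an ordinary UPDATE that leaves the person's identity alone.

```python
import sqlite3


def person_rows() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Persons (
          person_id INTEGER PRIMARY KEY,  -- surrogate identity (hypothetical)
          ssn TEXT UNIQUE                 -- plain attribute: optional, may change
        );
        INSERT INTO Persons (person_id, ssn) VALUES (1, '111-11-1111');
        INSERT INTO Persons (person_id, ssn) VALUES (2, NULL);  -- no SSN at all
        INSERT INTO Persons (person_id, ssn) VALUES (3, NULL);  -- UNIQUE allows several NULLs
        -- Person 1 files for a new SSN: an ordinary UPDATE, identity unchanged
        UPDATE Persons SET ssn = '222-22-2222' WHERE person_id = 1;
    """)
    return conn.execute("SELECT person_id, ssn FROM Persons ORDER BY person_id").fetchall()


print(person_rows())  # [(1, '222-22-2222'), (2, None), (3, None)]
```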


Re comment from @Niels:

So if we use a surrogate key instead of a compound primary key, there is no notable difference between using an identifying or a non-identifying relationship?

I suppose so. I hesitate to say yes, because we haven't changed the logical relationship between the tables by using a surrogate key. That is, you still can't make a Comment without referencing an existing Video. But that just means video_id must be NOT NULL. And the logical aspect is, to me, really the point about identifying relationships.
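The logical point, that you still can't make a Comment without an existing Video, holds with a surrogate key as long as video_id is NOT NULL and the foreign key is enforced. Here is a small SQLite sketch that assumes a trimmed-down Comments table and a hypothetical helper, comment_accepted. Note that SQLite only checks foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3


def comment_accepted(video_id) -> bool:
    """Insert a Comment for video_id into a surrogate-keyed table; report success."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Videos (video_id INTEGER PRIMARY KEY);
        CREATE TABLE Comments (
          comment_id INTEGER PRIMARY KEY,  -- surrogate key
          video_id INT NOT NULL,           -- mandatory, but not part of the PK
          FOREIGN KEY (video_id) REFERENCES Videos(video_id)
        );
        INSERT INTO Videos VALUES (1);
    """)
    try:
        conn.execute("INSERT INTO Comments (video_id) VALUES (?)", (video_id,))
        return True
    except sqlite3.IntegrityError:
        return False


print(comment_accepted(1))     # True: video 1 exists
print(comment_accepted(42))    # False: no such video, FK check fails
print(comment_accepted(None))  # False: NOT NULL check fails
```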

But there's a physical aspect of identifying relationships as well: the foreign key column is part of the primary key. (The primary key is not necessarily a composite key. It could be a single column that is both the primary key of Comments and the foreign key to the Videos table, but that would mean you could store only one comment per video.)
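The single-column case in that parenthetical is easy to check. In this SQLite sketch, video_id alone is both the primary key of Comments and the foreign key to Videos, so a second comment on the same video is rejected. The body column and the function name are hypothetical.

```python
import sqlite3


def comments_stored_for_video_1() -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Videos (video_id INTEGER PRIMARY KEY);
        CREATE TABLE Comments (
          video_id INT NOT NULL PRIMARY KEY,  -- one column is both PK and FK
          body TEXT NOT NULL,                 -- hypothetical payload column
          FOREIGN KEY (video_id) REFERENCES Videos(video_id)
        );
        INSERT INTO Videos VALUES (1);
    """)
    for body in ("first comment", "second comment"):
        try:
            conn.execute("INSERT INTO Comments VALUES (1, ?)", (body,))
        except sqlite3.IntegrityError:
            pass  # video 1 already has its single permitted comment
    return conn.execute("SELECT COUNT(*) FROM Comments").fetchone()[0]


print(comments_stored_for_video_1())  # 1
```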

Identifying relationships seem to be important only for the sake of entity-relationship diagramming, and this comes up in GUI data modeling tools.

Fledgling answered 11/5, 2010 at 21:42 (9 comments)
So if we use a surrogate key instead of a compound primary key, there is no notable difference between using an identifying or a non-identifying relationship? - Ruminant
So every weak entity has an identifying relationship? - Labored
Good answer, but could you please answer this question about why it even matters in ERD design? #34846918 - Precipitation
@Ochado, I don't answer questions on Stack Overflow any more. - Fledgling
In the second CREATE statement for the Comments table, you can remove the "obscure" part by adding a compound UNIQUE index on video_id, user_id and comment_dt, because a primary key makes its columns UNIQUE. - Rothberg
@Pollin, you're right, that would constrain the columns in the same way as if they had been the PRIMARY KEY, as in my first example. But the point of an identifying relationship is that the foreign keys are also part of the primary key. - Fledgling
What I think is missing, beyond a doubt, is the reason why an identifying relationship is created rather than a NOT NULL foreign key. I keep feeling like concepts from normalization apply, but in an English-language explanation. To say that an identifying relationship is one where blah blah is part of the primary key is basically rote learning without understanding why. Just venting my complaints about answers. I think you did touch on this a bit, but when people explain this, more emphasis should be on why it is done rather than just what it is. :) - Rankle
@BillR, I am not sure it's anything but an academic term to describe the case where a table's foreign key column is part of a candidate key. That is, the column is needed to uniquely identify a row in the table. You do need at least one candidate key for a table to be a proper relation. - Fledgling
:) I think I am winding myself up a bit by working with a logical model today, when normally I don't. A many-to-many relationship, where a lookup table would lie in between, kind of throws a wrench into it for me. Happy Wednesday. Thanks for sharing your knowledge. - Rankle

"as I don't want to learn something wrong".

Well, if you really mean that, then you can stop worrying about ER lingo and terminology. It is imprecise, confused, confusing, not at all generally agreed-upon, and for the most part irrelevant.

ER is a bunch of rectangles and straight lines drawn on a piece of paper. ER is deliberately intended to be a means for informal modeling. As such, it is a valuable first step in database design, but it is also just that: a first step.

Never shall an ER diagram get anywhere near the preciseness, accuracy and completeness of a database design formally written out in D.

Gaius answered 11/5, 2010 at 23:25 (7 comments)
So, if I read your response right, ER modeling is just a tool to help conceptualize the database (similar to how UML modeling is a tool used to conceptualize software systems). While each tool is helpful, each comes with the caveat that it has its own syntax and problems, which can add further confusion. I hadn't thought of this aspect. Thanks. - Belloir
If ER means 'Entity-Relationship', what does D mean? - Gawk
D is the family of all languages that adhere to the rules laid out in "Databases, Types & the Relational Model" and/or "The Third Manifesto". - Gaius
@pacerier, question marks are usually preceded by a question!!! Do you have one? - Gaius
@ErwinSmout, yes, it was abbreviated above into 0 characters. What I meant was: D is the family of all languages that adhere to the rules laid out in "Databases, Types & the Relational Model" and/or "The Third Manifesto"?? - Quass
The Third Manifesto, TTM for short, by Chris Date & Hugh Darwen, is their blueprint for what a data[base] processing language ought to look like in the 21st century. It defines the rules and requirements that such a 21st-century language must abide by. One of those requirements is the ability to express/declare any database constraint whatsoever in a formally precise way. Don't misunderstand "database constraint" to mean "only the types of constraint that 20th-century SQL engines can support". No, "database constraint" really means "any constraint whatsoever that governs the database". - Gaius
That formally precise way of expressing "any database constraint whatsoever" will come rather close, syntactically, to the database design specification language/syntax used in "Applied Mathematics for Database Professionals". It will (inevitably) look quite radically different from the constraint specification techniques of more traditional methods such as ERD, and even of Halpin's ORM (whose support for constraint specification is far more complete than ERD's). - Gaius

Identifying / non-identifying relationships are concepts in ER modelling - a relationship being an identifying one if it is represented by a foreign key that is part of the referencing table's primary key. This is usually of very little importance in relational modelling terms because primary keys in the relational model and in SQL databases do not have any special significance or function as they do in an ER model.

For example, suppose your table enforces two candidate keys, A and B, and suppose A is also a foreign key in that table. The relationship thus represented is deemed "identifying" if A is designated as the "primary" key, but non-identifying if B is the primary key. Yet the form, function and meaning of the table are identical in each case! This is why, in my opinion, the identifying / non-identifying concept is not really very important.
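That claim can be tested. The SQLite sketch below builds the table twice, once with A as the primary key and once with B, both times keeping the other column UNIQUE and A as a foreign key. It then runs the same sequence of inserts against each version. The table names P and T, the sample values, and the helper outcomes are hypothetical. Accepted and rejected rows come out the same either way.

```python
import sqlite3

PARENT = """
    CREATE TABLE P (a INTEGER PRIMARY KEY);
    INSERT INTO P VALUES (1), (2);
"""

# A designated primary: the relationship to P would be called "identifying".
A_IS_PRIMARY = """
    CREATE TABLE T (
      a INT NOT NULL PRIMARY KEY REFERENCES P(a),
      b INT NOT NULL UNIQUE
    );
"""

# B designated primary: the same relationship would be "non-identifying".
B_IS_PRIMARY = """
    CREATE TABLE T (
      a INT NOT NULL UNIQUE REFERENCES P(a),
      b INT NOT NULL PRIMARY KEY
    );
"""

ATTEMPTS = [(1, 100), (2, 100), (1, 200), (2, 200), (3, 300)]


def outcomes(ddl: str) -> list:
    """Try each (a, b) insert in order; record True if accepted, False if rejected."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(PARENT + ddl)
    results = []
    for a, b in ATTEMPTS:
        try:
            conn.execute("INSERT INTO T (a, b) VALUES (?, ?)", (a, b))
            results.append(True)
        except sqlite3.IntegrityError:
            results.append(False)
    return results


print(outcomes(A_IS_PRIMARY))  # [True, False, False, True, False]
print(outcomes(B_IS_PRIMARY))  # [True, False, False, True, False]
```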

Procathedral answered 11/5, 2010 at 22:13 (2 comments)
+1 - Thanks for clearing this up! A co-worker (also not familiar with database design) and I were struggling with this, as we couldn't see why one or the other mattered when both achieved the same effect. This really helps. - Belloir
To follow up on your answer, could you please answer or comment on this question about why it even matters in ERD design? #34846918 - Precipitation

I believe the only difference between an identifying and a non-identifying relationship is the nullability of the foreign key. If an FK cannot be NULL, the relationship it represents is identifying (a child cannot exist without a parent); otherwise it is non-identifying.
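The nullability distinction this answer relies on can be seen directly in SQLite. A NULL foreign key is never checked against the parent, so a nullable FK admits a child with no parent, while a NOT NULL FK rejects it. This is a sketch with hypothetical Parent/Child tables. In both variants parent_id sits outside Child's primary key, which is the point the comment below raises.

```python
import sqlite3


def accepts_parentless_child(fk_nullable: bool) -> bool:
    """Try to insert a Child whose parent_id is NULL; report whether it is accepted."""
    null_clause = "" if fk_nullable else "NOT NULL"
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE Parent (parent_id INTEGER PRIMARY KEY);
        CREATE TABLE Child (
          child_id INTEGER PRIMARY KEY,  -- the FK is outside the PK in both variants
          parent_id INT {null_clause} REFERENCES Parent(parent_id)
        );
    """)
    try:
        conn.execute("INSERT INTO Child (parent_id) VALUES (NULL)")
        return True
    except sqlite3.IntegrityError:
        return False


print(accepts_parentless_child(fk_nullable=True))   # True
print(accepts_parentless_child(fk_nullable=False))  # False
```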

Biform answered 23/12, 2013 at 11:37 (1 comment)
But in @bill-karwin's answer here, he said that a non-identifying relationship can be optional or mandatory. - Extenuate

Part of the issue here is the confusion of terminology. Identifying relationships are useful for avoiding long join paths.

The best definition I have seen is: "an identifying relationship includes the PK of the parent in the child PK." In other words, the PK of the child includes the FK to the parent as well as the "actual" PK of the child.
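The "long join paths" remark can be illustrated with a chain of two identifying relationships. In this SQLite sketch, video_id is carried from Videos into the Comments PK and then into the Replies PK. As a result, a video's replies can be counted from Replies alone. The Replies table, the comment_no/reply_no columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Videos (video_id INTEGER PRIMARY KEY);
    -- Identifying: the parent's PK (video_id) is part of the child's PK.
    CREATE TABLE Comments (
      video_id INT NOT NULL REFERENCES Videos(video_id),
      comment_no INT NOT NULL,
      PRIMARY KEY (video_id, comment_no)
    );
    -- Identifying again: Comments' whole PK is carried into Replies' PK.
    CREATE TABLE Replies (
      video_id INT NOT NULL,
      comment_no INT NOT NULL,
      reply_no INT NOT NULL,
      PRIMARY KEY (video_id, comment_no, reply_no),
      FOREIGN KEY (video_id, comment_no) REFERENCES Comments(video_id, comment_no)
    );
    INSERT INTO Videos VALUES (1), (2);
    INSERT INTO Comments VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO Replies VALUES (1, 1, 1), (1, 2, 1), (1, 2, 2), (2, 1, 1);
""")


def replies_for_video(video_id: int) -> int:
    """Count a video's replies straight from Replies, with no join through Comments."""
    return conn.execute(
        "SELECT COUNT(*) FROM Replies WHERE video_id = ?", (video_id,)
    ).fetchone()[0]


print(replies_for_video(1))  # 3
print(replies_for_video(2))  # 1
```

With surrogate keys, Replies would hold only a comment_id, so the same count would need a join from Replies to Comments to reach video_id.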

Flowery answered 2/9, 2014 at 13:17 (1 comment)
+1 for "identifying relationships are useful for avoiding long join paths". It would be great if you elaborated more on this. - Catharine

Yes, go with the first one, but I don't think the second one contradicts the first one. It's just formulated a little confusingly...

I just checked. The second question's answer is wrong in some of its assumptions. Book-author is not necessarily a 1:n relationship, as it could be m:n. In relational databases, you create an intersection table for this m:n relationship, and you get identifying relationships between the intersection table and the other two tables.
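An intersection table of the kind described here might look like the following SQLite sketch. It reuses the AuthoredBook shape from the top answer and adds hypothetical name/title columns and sample rows. One book has two authors, and one author has two books. Each AuthoredBook row is identified by the pair of parent keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Books (book_id INTEGER PRIMARY KEY, title TEXT);
    -- Intersection table: identifying relationships to both parents
    CREATE TABLE AuthoredBook (
      author_id INT NOT NULL REFERENCES Authors(author_id),
      book_id INT NOT NULL REFERENCES Books(book_id),
      PRIMARY KEY (author_id, book_id)
    );
    INSERT INTO Authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Books VALUES (10, 'Book X'), (20, 'Book Y');
    -- Book 10 has two authors, and author 1 wrote two books: m:n
    INSERT INTO AuthoredBook VALUES (1, 10), (2, 10), (1, 20);
""")


def authors_of(book_id: int) -> list:
    return [name for (name,) in conn.execute(
        "SELECT a.name FROM AuthoredBook ab JOIN Authors a USING (author_id) "
        "WHERE ab.book_id = ? ORDER BY a.name", (book_id,))]


def books_of(author_id: int) -> list:
    return [title for (title,) in conn.execute(
        "SELECT b.title FROM AuthoredBook ab JOIN Books b USING (book_id) "
        "WHERE ab.author_id = ? ORDER BY b.title", (author_id,))]


print(authors_of(10))  # ['Ann', 'Bob']
print(books_of(1))     # ['Book X', 'Book Y']
```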

Hidie answered 11/5, 2010 at 21:19

An identifying relationship gives a one-to-many (optional) relationship in the parent-to-child direction, and a one-to-exactly-one relationship in the child-to-parent direction. Since the parent entity's primary key is part of the child entity's primary key, each child entity instance identifies its parent entity instance. It is represented by a solid line in an ER diagram.

A non-identifying relationship, on the other hand, will have a many-to-many relationship. For a child entity instance to exist, there should be a parent entity instance, but each child entity instance may be related to many instances of the parent entity. This is why the primary key of the parent entity becomes a foreign key in the child entity, but the child entity does not take the parent's primary key as part of its own primary key. It has its own primary key.

A many-to-many relationship doesn't exist in a real-world ER diagram, so it needs to be resolved.

Equites answered 18/12, 2013 at 0:15

An identifying relationship is indeed an ERD concept, as this is the domain of conceptual modelling: modelling our understanding of the 'universe of discourse'. It is a parent-child relationship whereby we model the fact that the identity of each child object is (at least in part) established/determined by the identity of the parent object. It is therefore mandatory, and immutable.

A real-world example is with the perennial challenge of identifying people. A person's unique identity can be (in part) defined by their relationship with their birth mother and father. When known, these are immutable facts. Therefore the relationship between birth parent and child is an identifying relationship in that it contributes (immutably) to defining the identity of the child.

It is these qualities, together with the use of relational DBMS constructs, that result in the PK of the child being a composite key that includes, via FK, the PK of the parent. As a PK, the identity of the child is mandatory and immutable (it can't change). A 'change' in a PK is in fact the instantiation of a new object. Therefore it must not be possible to change the PK, and the immutability of a PK should also be enforced; DB constraints can be used to implement that quality of PKs.
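One way to enforce "the PK must not be able to be changed" is sketched below in SQLite. Neither SQLite nor standard SQL offers a declarative "immutable column" constraint, so this sketch uses a trigger as the enforcing mechanism. The Persons/Children tables, the child_no and name columns, and the trigger name are hypothetical. The child's composite PK includes the parent's PK through the FK. Updates to ordinary attributes go through, but any UPDATE that actually changes a PK column is aborted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Persons (person_id INTEGER PRIMARY KEY);
    CREATE TABLE Children (
      parent_id INT NOT NULL REFERENCES Persons(person_id),
      child_no INT NOT NULL,   -- hypothetical per-parent sequence number
      name TEXT,               -- ordinary, mutable attribute
      PRIMARY KEY (parent_id, child_no)
    );
    -- Abort any UPDATE that actually changes a PK column.
    CREATE TRIGGER children_pk_immutable
    BEFORE UPDATE OF parent_id, child_no ON Children
    WHEN NEW.parent_id IS NOT OLD.parent_id OR NEW.child_no IS NOT OLD.child_no
    BEGIN
      SELECT RAISE(ABORT, 'primary key of Children is immutable');
    END;
    INSERT INTO Persons VALUES (1), (2);
    INSERT INTO Children VALUES (1, 1, 'Kim');
""")


def update_allowed(sql: str) -> bool:
    """Run an UPDATE; return False if the trigger (or any constraint) aborts it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


print(update_allowed("UPDATE Children SET name = 'Kimberly' WHERE parent_id = 1"))  # True
print(update_allowed("UPDATE Children SET parent_id = 2 WHERE parent_id = 1"))  # False
print(update_allowed("UPDATE Children SET child_no = 5 WHERE parent_id = 1"))  # False
```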

Apollonius answered 25/11, 2018 at 10:31
