This question requires some hypothetical background. Let's consider an employee
table that has columns name
, date_of_birth
, title
, salary
, using MySQL as the RDBMS. Since if any given person has the same name and birth date as another person, they are, by definition, the same person (barring amazing coincidences where we have two people named Abraham Lincoln born on February 12, 1809), we'll put a unique key on name
and date_of_birth
that means "don't store the same person twice." Now consider this data:
id name date_of_birth title salary
1 John Smith 1960-10-02 President 500,000
2 Jane Doe 1982-05-05 Accountant 80,000
3 Jim Johnson NULL Office Manager 40,000
4 Tim Smith 1899-04-11 Janitor 95,000
If I now try to run the following statement, it should and will fail:
INSERT INTO employee (name, date_of_birth, title, salary)
VALUES ('Tim Smith', '1899-04-11', 'Janitor', '95,000')
If I try this one, it will succeed:
INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', '40,000')
And now my data will look like this:
id name date_of_birth title salary
1 John Smith 1960-10-02 President 500,000
2 Jane Doe 1982-05-05 Accountant 80,000
3 Jim Johnson NULL Office Manager 40,000
4 Tim Smith 1899-04-11 Janitor 95,000
5 Jim Johnson NULL Office Manager 40,000
This is not what I want but I can't say I entirely disagree with what happened. If we talk in terms of mathematical sets,
{'Tim Smith', '1899-04-11'} = {'Tim Smith', '1899-04-11'} <-- TRUE
{'Tim Smith', '1899-04-11'} = {'Jane Doe', '1982-05-05'} <-- FALSE
{'Tim Smith', '1899-04-11'} = {'Jim Johnson', NULL} <-- UNKNOWN
{'Jim Johnson', NULL} = {'Jim Johnson', NULL} <-- UNKNOWN
My guess is that MySQL says, "Since I don't know that Jim Johnson with a NULL
birth date isn't already in this table, I'll add him."
My question is: How can I prevent duplicates even though date_of_birth
is not always known? The best I've come up with so far is to move date_of_birth
to a different table. The problem with that, however, is that I might end up with, say, two cashiers with the same name, title and salary, different birth dates and no way to store them both without having duplicates.
user_id
,team_scope
,team_id
. Here,team_scope
is an enumerated field whose values represent "specific team", "all teams of which I am captain" and "all teams in my club"; only in the first case willteam_id
be populated. But whenteam_scope
is one of the other values, the uniqueness constraint should still work. – Energeticsteam_scope
and define magic values ofteam_id
, though this isn't ideal from a theoretical point of view.) – Energetics