Support for Using Surrogate Keys in Table Definitions
Thanks to @Branko Dimitrijevic for opening the topic of relational database primary keys (PK's) by describing the role of SURROGATE KEYS
and getting to the center of the discussion. Surrogate keys are by definition devoid of any intrinsic meaning aside from their uniqueness between their values in each record of their table.
Thanks also to @Mattias Åslund for your additional wisdom:
Mind you most things that "will never change" tends to do so anyway given enough time, like someones social security number...
I add that even if the value chosen as "unchangeable" really does not change, it is also just as likely for the rules of the supported business or organization itself to drift and change over time in ways that may affect the core assumptions of given design.
A useful discussion on the integration of demographic and biometric key values for tracking individuals can be found in this section on Choosing an Appropriate Key for New Databases put out by the Computer Professionals for Social Responsibility.
Cases of Alternate Unique Database Keys: A Sample Schema
I plan to approach the comments on this post with a discussion around a specific example design to explain the kinds of things that can go wrong with assigning a Primary Key that isn't a surrogate key. Many of these assumptions are taken from observations from real applications. They are remembered well because of the complexity their design introduced into other business processes as other systems and data sources became dependent on their assumptions.
The design and sample data is below, borrowed loosely from Oracle's infamous Scott/TIGER database design.
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE employee
(
fake_ssn varchar(15) primary key,
last_name varchar(40),
first_name varchar(40),
dept_id varchar(15),
hire_date date,
salary int,
email varchar(100)
);
INSERT INTO employee
(fake_ssn, last_name, first_name, dept_id, hire_date, salary,
email)
VALUES
('130-60-0101', 'MARLOWE', 'JACOB', '1200-05', date('2009/01/25'),
8000, '[email protected]'),
('967-22-5025', 'CRACHITT', 'BOB', '1200-05', date('2010/02/05'),
500, '[email protected]'),
('040-36-5555', 'PERRY', 'VICTORIA', '1200-02', date('2011/05/25'),
2700, '[email protected]'),
('203-89-1010', 'STEVENS', 'KEVIN', '2955-03', date('2007/04/25'),
1800, '[email protected]'),
('409-99-1111', 'MCLANE', 'JONATHAN', '2955-03', date('2009/03/02'),
4200, '[email protected]');
CREATE TABLE department
(
dept_id varchar(15) primary key,
dept_manager varchar(40),
dept_title varchar(40)
);
INSERT INTO department
(dept_id, dept_manager, dept_title)
VALUES
('1200-05', 'MARLOWE', 'FINANCE'),
('1200-02', null, 'HR'),
('2955-03', 'JOHNM', 'MARKETING');
COMMIT;
Query 1:
SELECT fake_ssn, last_name, first_name, dept_id, hire_date,
salary, email
FROM employee
Results:
| FAKE_SSN | LAST_NAME | FIRST_NAME | DEPT_ID | HIRE_DATE | SALARY | EMAIL |
|-------------|-----------|------------|---------|---------------------------------|--------|--------------------------------|
| 040-36-5555 | PERRY | VICTORIA | 1200-02 | May, 25 2011 00:00:00+0000 | 2700 | [email protected] |
| 130-60-0101 | MARLOWE | JACOB | 1200-05 | January, 25 2009 00:00:00+0000 | 8000 | [email protected] |
| 203-89-1010 | STEVENS | KEVIN | 2955-03 | April, 25 2007 00:00:00+0000 | 1800 | [email protected] |
| 409-99-1111 | MCLANE | JONATHAN | 2955-03 | March, 02 2009 00:00:00+0000 | 4200 | [email protected] |
| 967-22-5025 | CRACHITT | BOB | 1200-05 | February, 05 2010 00:00:00+0000 | 500 | [email protected] |
Query 2:
SELECT dept_id, dept_manager, dept_title
FROM department
Results:
| DEPT_ID | DEPT_MANAGER | DEPT_TITLE |
|---------|--------------|------------|
| 1200-02 | (null) | HR |
| 1200-05 | MARLOWE | FINANCE |
| 2955-03 | JOHNM | MARKETING |
Fake Social Security Number
(The FAKE
name is just a reminder that these are all randomly generated values.) Although this is often a popular "unqiue" value in personnel records and databases, according to the U.S. Social Security Administration, this value is not unique. It is also problematic because this value and its transfer is subject to strict regulation because of recently passed privacy laws.
Name Combinations
Even with the additional combinations created by including the middle initial, somehow there are still just too many people with the same name. A look at what the Social Security Administration has to say about the registered names of babies born in 2012:
Two decades from now, when the JACOB's and SOPHIA's of 2012 graduate from school, they will flood into the workforce beside thousands of others like them...
Name changes by marriage or legal reasons are also threats to the referential integrity of database records relying on their values as business key values.
Department ID
Some companies will try to derive keys from other values to make SMART KEYS
. These types of keys when observed in practice are NOT smart at all. The values in the example: 1200-02
, 1200-05
, 2955-03
were intended to resemble a "smart key". The first value might be a street address or building number for a company campus or multiple-location business. The second value, ("-02", "-03", "-05") might be the floor of the building where the department is located.
Changing buildings, moving a department or relocating a business completely would render this location dependency of the DEPARTMENT ID
useless.
Department Manager
This one is subtle, but there is a hole in this relational connection. A MANAGER
is ALSO an employee, which makes the relational join between EMPLOYEE
and DEPARTMENT
a circular one:
- Is the
MANAGER
(from DEPARTMENT
) a foreign key constraint on the EMPLOYEE
table, or
- Is
DEPT_ID
(FROM EMPLOYEE
) a foreign key constraint on the DEPARTMENT
table?
If you give up the Foreign Key constraint between MANAGER
and some key column on EMPLOYEE
(LAST_NAME
or FIRST_NAME
+ LAST_NAME
), you risk non uniform values for MANAGER
.
... Looking at
Query of The DEPARTMENT
Table:
| DEPT_ID | DEPT_MANAGER | DEPT_TITLE |
|---------|--------------|------------|
| 1200-02 | (null) | HR |
| 1200-05 | MARLOWE | FINANCE |
| 2955-03 | JOHNM | MARKETING |
The misplacement of DEPT_MANAGER
within the DEPARTMENT
table, as there are three different ways the name of the Department Manager are represented: none (null), ALL-CAPS Last Name, ALL-CAPS First Name, Last Initial.
Conclusions
An important lesson to take from this posting is that making a key more than a key either by integrating derived values, making values based on assumptions about business rules tie down the flexibility of a database design because if the business rules change, so will the values such as Primary Keys or Joining Key values.
As the developer or maintainer of the business application, you (or your team) are better able to support the prevailing business conditions if you have taken control and own the parts that represent the internal structure of the business application itself. The primary key might never actually present itself in a customer or user-facing situation, but it should be protected so that the relations it represents does not change with time.
Special Thanks:
Image Credit from the popular Baby Names page of 2012:
http://www.ssa.gov/OACT/babynames/#ht=0