Nullable Foreign Key bad practice?
Asked Answered
O

11

125

Let's say you have a table Orders with a foreign key to a Customer Id. Now, suppose you want to add an Order without a Customer Id, (whether that should be possible is another question) you would have to make the foreign key NULL... Is that bad practice or would you rather work with a link table between Orders and Customers? Although the relationship is 1 to n, a link table would make it n to n. On the other hand, with a link table, I don't have those NULLS anymore...

There won't actually be a lot of NULL's in the database, because a record with a foreign key to NULL is just temporarily until a customer for the order is added.

(In my case it isn't an Order and a Customer).

EDIT: What about a unassigned Customer to link to?

Obeded answered 12/11, 2009 at 17:7 Comment(2)
That IS one of the main purposes of having NULLs available in a database schema. Furthermore, that is why you can declare fields NULL or NOT NULL, so that the specific requirements of your schema can be met.Caulk
I initially read the question as Nullable Primary keys, and was about to wade in with some strong advice... :-)Barbel
C
62

Having the link table is probably a better option. At least it does not violate normalization BCNF (Boyce-Codd normal form). however I would favor being pragmatic. If you have very few of these null values and they are only temporary I think you should skip the link table since it only adds complexity to the scheme.

On a side note; using a link table doesn't necessarily make it n to n, if you in the link table use the foreign key that's pointing to your orders table as the primary key in that link table the relationship is still 1..n. There can only be one entry in that link table per order.

Class answered 12/11, 2009 at 17:14 Comment(3)
source__destination_link or SourceDestinationLash
I'd be interested in hearing about a situation when having a link table is better, I have never run into a situation where it would have improved process flow in any way.Manus
As pointed out in my answer I'd be pragmatic in this specific case and not use a link table. I'm sure the normal forms were not invented to improve process flow but rather to ensure consistency and to avoid redundancy. It's a very general discussion though, I think it has to be viewed on a case to case basis.Unruly
S
125

No There is nothing wrong with Nullable FKs. This is common when the entity the FK points to is in a (zero or one) to (1 or many) relationship with the primary Key referenced table.

An example might be if you had both a Physical address and a Mailing address attribute (column) in a table, with FKs to an Address table. You might make the Physical address nullable to handle when the entity only has a post office box (mailing address), and the mailing address nullable to handle when the mailing address is the same as the physical address (or not).

Scholiast answered 12/11, 2009 at 17:12 Comment(2)
Yeah, but then you wouldn't make the addresses the foreign key.Reinhold
If you are implying that you can't do that, sure you can... Try it! If you are saying you wouldn't, why on earth not? Having a column that ostensibly points to a PK in another table (That's what a FK is) that is allowed to contain a value that does not exist in the parent table would be a horrible idea.Scholiast
C
62

Having the link table is probably a better option. At least it does not violate normalization BCNF (Boyce-Codd normal form). however I would favor being pragmatic. If you have very few of these null values and they are only temporary I think you should skip the link table since it only adds complexity to the scheme.

On a side note; using a link table doesn't necessarily make it n to n, if you in the link table use the foreign key that's pointing to your orders table as the primary key in that link table the relationship is still 1..n. There can only be one entry in that link table per order.

Class answered 12/11, 2009 at 17:14 Comment(3)
source__destination_link or SourceDestinationLash
I'd be interested in hearing about a situation when having a link table is better, I have never run into a situation where it would have improved process flow in any way.Manus
As pointed out in my answer I'd be pragmatic in this specific case and not use a link table. I'm sure the normal forms were not invented to improve process flow but rather to ensure consistency and to avoid redundancy. It's a very general discussion though, I think it has to be viewed on a case to case basis.Unruly
F
45

Nullable columns can be in 1NF through 5NF, but not in 6NF according to what I've read.

Only if you know better than Chris Date "what first normal form really means". If x and y are both nullable, and indeed in some row x and y are both null, then WHERE x=y does not yield true. This proves beyond reasonable doubt that null is not a value (because any real value is always equal to itself). And since the RM prescribes that "there must be a value in every cell of a table", any thing that possibly contains nulls, is not a relational thing, and thus the question of 1NF doesn't even arise.

I've heard it argued that Nullable columns in general break the first degree of normalization.

See above for the sound reason underlying that argument.

But in practice it's very practical.

Only if you're immune to the headaches that it usually causes in the entire rest of the world. One such headache (and it's only a minor one, comparatively to other null phenomenons) is the fact that WHERE x=y in SQL actually means WHERE x is not null and y is not null and x=y, but that most programmers simply aren't aware of that fact and just read over it. Sometimes without any harm, other times not.

In fact, nullable columns violate one of the most fundamental database design rules : don't combine distinct information elements in one column. Nulls do exactly that because they combine the boolean value "this field is/is not really present" with the actual value.

Funiculus answered 12/11, 2009 at 18:39 Comment(18)
+1 for "WHERE x is not null and y is not null and x=y". Wasn't aware of that.Josefjosefa
Very nicely laid out argument and examples.Inhuman
One problem. When the value "does not exist" (Which IS a real-world scenario), and the database attribute does not allow nulls, whatever value is in the attribute is WRONG. As to the headaches, remember, KISS, it does not just mean keep it simple, it means Keep it as simple as possible, but no simpler. If the "relational model" require an unrealistic, stupid result, then perhaps the rules need to expand to handle what is necessary real-world data?Scholiast
Recognizing that two-valued logic was insufficient to handle real-world data, Codd himself, some years after publication of his 1970 model, proposed a three-valued logic (True, False, Missing/NULL) version of it to deal with missing information.Scholiast
.... and yes, I have seen dcs.warwick.ac.uk/~hugh/TTM/Missing-info-without-nulls.pdf. I don't consider this solution simpler or easier to use than nulls.Scholiast
It has been shown that a three-valued logic leads to a need for four-valued logic, and that leads to a need for five-valued logic, etc. etc. 2-valued logic is sufficient, but the data structures we get when applyiing it make "as simple as possible" still far less simple than "as simple as we'd want".Funiculus
Well, as soon as you include the word "want", "as simple as we want" becomes, of course, subjective, It then depends on how simple we want it, doesn't it? And do you have a reference for how 3-valued logic requires 4-valued, etc.? The logic behind that argument should be entertaining.Scholiast
Chris Date, Logic & Databases, Chpt 6, "Why relational DBMS logic must not be many-valued", pg 145. The list of references to that chapter should also be interesting, especially those involving McGoveran.Funiculus
@CharlesBretana Yes. Whenever someone tells me, "You must use this awkward, difficult, complicated solution instead of the simple and easy solution because someone wrote in a book that the easy solution is wrong", I can only say, "Why should I follow the rules in this book? Following this rule just makes our lives more difficult." Of course the answer I get is usually, "No, you don't understand. See, here's the rule in the book. This really is the rule."Lilylivered
@Lilylivered Nobody cares if your life is made "more difficult". Your job is to worry about how difficult the life of your users is, not your own. If you do things with the perspective/purpose of making your own life "less difficult", you can bet your arse that some of the complexities will not be addressed where they should be, ending up making the end user's life "more difficult". Complexity always goes somewhere.Funiculus
@ErwinSmout I said "our lives", not "my life". :-) Sure, a lazy programmer might do things to make his own job easier but that make life harder for the user. Inadequate error checking is an obvious example. But "complexity always goes somewhere". No, it doesn't. Sometimes people create complexity out of thin air. I was talking about the sort of rule that creates complexity for no apparent gain. Like if we made a rule that all functions must have exactly four parameters, that would make our programs more difficult to read and understand, while serving no useful purpose.Lilylivered
@Jay, Implicit in your comment are the ideas that a) that using nulls is making my life "easier", or necessitates that error checking be omitted, and b) that making the developer's life "easier" is wrong. Both implications are wrong. When the value can be unknown, using Nulls makes the system more closely reflect reality, and that makes everyone (developers and users)'s lives easier.Scholiast
That's probably why Tony Hoare called it "his billion-dollar mistake". Because it made everyone's lives easier.Funiculus
@CharlesBretana Wait, we may be talking in circles. I agree that using nulls in cases where a value is unknown or inapplicable often makes a program simpler, easier to understand and maintain, and other good things. My comment about error checking was an analogy, I didn't mean that it had anything in particular to do with use of nulls. I am most definitely not against making developers' lives easier, I just conceded the caveat that there's a difference between "made my life easier because this is a better way to do the job" and "made my life easier because I was too lazy to do the job right".Lilylivered
@ErwinSmout So ... I write a post saying that I refuse to accept something as Unquestionable Truth just because somebody wrote it in a book, and your rebuttal is to say that that's not true because somebody said so in a book? :-)Lilylivered
Actually it was not in a book but in a lecture, and it is more than telling that you equate Tony Hoare with "somebody".Funiculus
This was an interesting discussion. I just wanted to chip in, Charles Bretana is correct while Tony Hoare may have called nulls his "billion dollar mistake", he was playfully only looking at the downsides of the null references and none of the benefits. Allowing null references has also caused billions of dollars in benefits, for example by allowing developers to roll out features more quickly. There are real world considerations, for example there is a performance gain to denormalization that translates into $, not having a linking table means less hard drive space that translates into $Hypercatalectic
I would also add that in certain contexts database queries can be much less complex and easier with nulls. A newcomer to SQL could easily query around a single table, using where and select, throw in two more tables and doing a join using a linking table, is a much more complicated process. NULL in a database is one of those examples where breaking the forms results in a much more practical and almost always quicker solution. Time has value to the customers and to the developers and also translates into $. If Tony Hoare did not invent null someone else would have.Hypercatalectic
C
14

I can't see anything wrong with that it is just an optional n-1 relationship that will be represented with a null in the foreign-key. Otherwise if you put your link table then you'll have to manage that it doesn't become a n-n relationship, so causing even more trouble.

Certification answered 12/11, 2009 at 17:10 Comment(3)
Actually it's a 0-N relationship, not an optional 1-N relationship. But I agree with you.Collaboration
Manage? It's a simple UNIQUE constraint on the 0-to-1 side!Raleigh
Yes it is a UNIQUE constraint, but you'll also have to deal with possible exceptions later on with in your code due that constraint...Certification
W
5

Optional relationships are definitely possible in the relational model.

You can use nulls to express the absence of a relationship. They are convenient, but they will cause you the same headaches that nulls cause you elsewhere. One place where they don't cause any trouble is joins. Rows that have a null in the foreign key don't match any rows in the referenced table. So they drop out of an inner join. If you do outer joins, you are going to be dealing with nulls anyway.

If you really want to avoid nulls (6th normal form), you can decompose the table. One of the two decomposed tables has two foreign key columns. One is the optional foreign key you have, and the other is a foreign key referencing the primary key of the original table. Now you have to use constraints to prevent the relationship from becoming many-to-many, it you want to prevent that.

Warrantor answered 12/11, 2009 at 17:47 Comment(0)
N
2

Using NULL would be a good way to clean up incomplete orders:

SELECT * FROM `orders`
WHERE `started_time` < (UNIX_TIMESTAMP() + 900) AND `customer_id` IS NULL

The above would show orders older than 15 minutes without a related customer ID.

Nippers answered 12/11, 2009 at 17:14 Comment(0)
R
1

If you are only adding the order temporarily with no customer id until a customer is defined, would it not be simpler to add the customer and order in a single transaction, thereby removing the need for the NULL foreign key entry and avoiding any constraints or triggers you've set up being violated?

Normally this situation arises ins web apps where the order is detailed before the customer defines who he/she is. And in those situations the order is kept in server state or in a cookie until all the necessary state for a complete order is supplied at which point the order is persisted to the database.

NULL foreign keys are ok for things like addresses, as mentioned above. But a NULL customer field doesn't make sense for an order and should be constrained.

Revamp answered 12/11, 2009 at 17:31 Comment(2)
The order-customer was as an example. In my app it's idd more like addresses. Couldn't find immediately an example that was correct all the way. thx.Obeded
This could be a valid scenario if the database was being used to store items in a shopping cart, where the shopping cart does not belong to a registered user.Evolutionary
T
1

You could always add an artificial row to your Customer table, something like Id=-1 and CustomerName = 'Unknown' and then in cases when you would normally set your CustomerId in Order NULL set it to -1.

This allows you to have no nullable FKs but still represent the lack of data appropriately (and will save you from downstream users not knowing how to deal with NULLs).

Taurus answered 20/5, 2015 at 14:55 Comment(2)
Just to add to this, remember that NULLS don't get stored in an index (in oracle) so this means that skipping the link table and going for the nullable FK would make sense - performance wise. The other other thing it could depend on is if you want to store anything else in this link table, for example, WHO made the link and when? Is the link now inactive/deleted (but once was?)Rhyme
This is a bad idea. If you have a foreign key that is set, and the data it is pointing to is later removed, you won't get the foreign key exception and now your data is senseless. Worse, if something else is later on assigned to that key you're pointing to entirely the wrong customerCytoplast
T
0

Nullable FKs for optional many-to-one relations are totally fine.

Tartuffe answered 12/11, 2009 at 17:11 Comment(0)
L
0

I've heard it argued that Nullable columns in general are break the first degree of normalization. But in practice it's very practical.

Linton answered 12/11, 2009 at 17:16 Comment(1)
Nullable columns can be in 1NF through 5NF, but not in 6NF according to what I've read.Warrantor
N
-1

Yes theres something wrong. Its not a foreign key if its nullable. Its database design by code. Maybe you make a zero link to unassigned. or "Unassigned" if your using a character col. Keep the integrity of your data 100%.

Nanceenancey answered 27/1, 2012 at 21:8 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.