Why doesn't SQL support "= null" instead of "is null"?
Asked Answered
E

13

28

I'm not asking if it does. I know that it doesn't.

I'm curious as to the reason. I've read support docs such as as this one on Working With Nulls in MySQL but they don't really give any reason. They only repeat the mantra that you have to use "is null" instead.

This has always bothered me. When doing dynamic SQL (those rare times when it has to be done) it would be so much easier to pass "null" into where clause like this:

@where = "where GroupId = null"

Which would be a simple replacement for a regular variable. Instead we have to use if/else blocks to do stuff like:

if @groupId is null then
     @where = "where GroupId is null"
else
     @where = "where GroupId = @groupId"
end

In larger more-complicated queries, this is a huge pain in the neck. Is there a specific reason that SQL and all the major RDBMS vendors don't allow this? Some kind of keyword conflict or value conflict that it would create?

Edit:

The problem with a lot of the answers (in my opinion) is that everyone is setting up an equivalency between null and "I don't know what the value is". There's a huge difference between those two things. If null meant "there's a value but it's unknown" I would 100% agree that nulls couldn't be equal. But SQL null doesn't mean that. It means that there is no value. Any two SQL results that are null both have no value. No value does not equal unknown value. Two different things. That's an important distinction.

Edit 2:

The other problem I have is that other HLLs allow null=null perfectly fine and resolve it appropriately. In C# for instance, null=null returns true.

Exploration answered 16/8, 2011 at 13:20 Comment(5)
Relevant Wikipedia article Null (SQL)Saffian
Instead of this clause, it's perfectly acceptable to make your where clause something like "where GroupId is null OR GroupId = @groupId"... depending on your RDBMS, certain phrasing will be more optimal than others.Mcalister
Good point, Jeremy. And I've done that in a lot of queries. I forgot about OR-ing the other state in there. Still, I find it frustrating that I have to.Exploration
possible duplicate of Why does NULL = NULL evaluate to false in SQL serverLexielexigraphy
NULL is a placeholder for a value that is missing. Why that value is missing is semantics which are provided by your data model and/or your application, not the SQL language. This includes your "No value" interpretation as well as everyone else's. As for the history, Ted Codd proposed it and SQL implemented it. Codd went on to propose a second kind of NULL but everyone ignored him that time.Miltonmilty
C
28

The reason why it's off by default is that null is really not equal to null in a business sense. For example, if you were joining orders and customers:

select * from orders o join customers c on c.name = o.customer_name

It wouldn't make a lot of sense to match orders with an unknown customer with customers with an unknown name.

Most databases allow you to customize this behaviour. For example, in SQL Server:

set ansi_nulls on
if null = null  
    print 'this will not print' 
set ansi_nulls off
if null = null  
    print 'this should print'
Cajole answered 16/8, 2011 at 13:36 Comment(0)
G
11

Equality is something that can be absolutely determined. The trouble with null is that it's inherently unknown. If you follow the truth table for three-value logic, null combined with any other value is null - unknown. Asking SQL "Is my value equal to null?" would be unknown every single time, even if the input is null. I think the implementation of IS NULL makes it clear.

Gallfly answered 16/8, 2011 at 13:23 Comment(10)
See, I don't agree with that - the truth tables I mean. Why can't SQL determine that the comparison value is null and return a true? I don't understand why that must be the case. In JavaScript, NaN doesn't equal other NaN variables because you could have two different values both of which are NaN (such as "abc" and "def"). But here, with SQL null, the unknown is always the same.Exploration
This has its roots in First-Order Logic and Set Theory. I'd say it was carried into ANSI SQL for consistency.Gallfly
@HLGEM: Of course it can. As others have pointed out, in SQL Server, you can turn off ANSI_NULL which makes NULL=NULL return true. (So your last sentence is incorrect)Exploration
Turning off ANSI_NULL is a product feature. It has no place in the logic theory that created the concept of null in the first place. Furthermore, taking advantage of turning off ANSI_NULL can be very, very confusing and lead to SQL scripts that are hard to debug or have unexpected results.Gallfly
Thinking through this more, my main problem is the equivalency everyone is making between no value and unknown value. Those are two very different things. Two different SQL results that are both null mean that both of them have no value. The value isn't merely unknown, it doesn't exist.Exploration
null can actually mean "not applicable", "unknown", or "value not present". Those are not equivalent situations.Gallfly
@HLGEM: My "wants" have nothing to do with it. Stick to the debate at hand. As far as math goes, you're wrong. Null does not mean "unknown", it means "no value". Furthermore, to suggest that null=null is mathematically incorrect is to suggest that every other HLL that accepts that (C#, C++, VB, PHP, etc) is also mathematically incorrect. As one looks across the programming world, one notices that null=null is accepted by the majority of HLLs. SQL is in the minority.Exploration
@Sohtimsso1970 - you're ascribing a meaning to four letters (NULL) that doesn't have a specific, inherent meaning. It no more means no value than it means unknown. However, in the SQL language, it is defined to be a placeholder for missing or unknown information. What it means in other languages has no relevance when asking a question about SQL.Silvas
@Damien_The_Unbeliever: Look at the Wikipedia page for SQL null: "indicates... does not exist". Other HLL environments use the same definition. To me, there's a clear difference between a value being unknown and a value not existing. If the value was merely unknown, I'd agree that you couldn't equate the two. The known absence of a value should be comparable though.Exploration
@Sohtimsso1970 - "that a data value does not exist in the database" - it's not an assertion that the value doesn't necessarily exist for the item being modelled, just that it is "missing information [or] inapplicable information".Silvas
D
6

It's a language semantic.

Null is the lack of a value.

is null makes sense to me. It says, "is lacking a value" or "is unknown". Personally I've never asked somebody if something is, "equal to lacking a value".

Dalt answered 16/8, 2011 at 13:23 Comment(3)
so if it's just a language semantic, why not let the language parser interpret = null the same way as is nullLiquidator
And in fact, MySQL, according to the part of the documentation linked to by the original poster, just plays with you by replying “NULL” if you ask if anything “is equal to NULL” :-)Dumpcart
Agreed - "=null" seems to imply that two non-existent things are equal... ie., lochNessMonster=bigFoot , just because neither of them exist (at least, I think not ;-) )Nagging
S
5

I can't help but feel that you're still not satisfied with the answers that have been given so far, so I thought I'd try another tack. Let's have an example (no, I've no idea why this specific example has come into my head).

We have a table for employees, EMP:

EMP
---
EMPNO           GIVENNAME
E0001           Boris
E0002           Chris
E0003           Dave
E0004           Steve
E0005           Tony

And, for whatever bizarre reason, we're tracking what colour trousers each employee chooses to wear on a particular day (TROUS):

TROUS
-----
EMPNO       DATE        COLOUR
E0001       20110806    Brown
E0002       20110806    Blue
E0003       20110806    Black
E0004       20110806    Brown
E0005       20110806    Black
E0001       20110807    Black
E0003       20110807    Black
E0004       20110807    Grey

I could go on. We write a query, where we want to know the name of every employee, and what colour trousers they had on on the 7th August:

SELECT e.GIVENNAME,t.COLOUR
FROM
    EMP e
        LEFT JOIN
    TROUS t
        ON
             e.EMPNO = t.EMPNO and
             t.DATE = '20110807'

And we get the result set:

GIVENNAME       COLOUR
Chris           NULL
Steve           Grey
Dave            Black
Boris           Black
Tony            NULL

Now, this result set could be in a view, or CTE, or whatever, and we might want to continue asking questions about these results, using SQL. What might some of these questions be?

  1. Were Dave and Boris wearing the same colour trousers on that day? (Yes, Black==Black)

  2. Were Dave and Steve wearing the same colour trousers on that day? (No, Black!=Grey)

  3. Were Boris and Tony wearing the same colour trousers on that day? (Unknown - we're trying to compare with NULL, and we're following the SQL rules)

  4. Were Boris and Tony not wearing the same colour trousers on that day? (Unknown - we're again comparing to NULL, and we're following SQL rules)

  5. Were Chris and Tony wearing the same colour trousers on that day? (Unknown)

Note, that you're already aware of specific mechanisms (e.g. IS NULL) to force the outcomes you want, if you've designed your database to never use NULL as a marker for missing information.

But in SQL, NULL has been given two roles (at least) - to mark inapplicable information (maybe we have complete information in the database, and Chris and Tony didn't turn up for work that day, or did but weren't wearing trousers), and to mark missing information (Chris did turn up that day, we just don't have the information recorded in the database at this time)

If you're using NULL purely as a marker of inapplicable information, I assume you're avoiding such constructs as outer joins.


I find it interesting that you've brought up NaN in comments to other answers, without seeing that NaN and (SQL) NULL have a lot in common. The biggest difference between them is that NULL is intended for use across the system, no matter what data type is involved.

You're biggest issue seems to be that you've decided that NULL has a single meaning across all programming languages, and you seem to feel that SQL has broken that meaning. In fact, null in different languages frequently has subtly different meanings. In some languages, it's a synonym for 0. In others, not, so the comparison 0==null will succeed in some, and fail in others. You mentioned VB, but VB (assuming you're talking .NET versions) does not have null. It has Nothing, which again is subtly different (it's the equivalent in most respects of the C# construct default(T)).

Silvas answered 17/8, 2011 at 4:45 Comment(3)
In what language does null=0? Never seen that before. I keep returning to the definition of null being "does not exist" (which is the same as saying "nothing" in VB) because that's what all the definitions say. I shared the Wikipedia link earlier. There are plenty of other sources. The only places I see where null=unknown is in discussions such as this where commenters insist that it means that. In the same Wikipedia page, there's a section specifically talking about "unknown" comparisons and it says that that definition is in conflict with the original spec.Exploration
I don't personally "care" or "want" it to be one way or another. I've just noticed, objectively, that a) almost all other programming languages allow null=null and b) that SQL defines null as "does not exist" which is in keeping with the same principal. I guess my beef is more with the assertion that null=unknown.Exploration
Thanks for the added information here. +1 for that. Hope I'm not coming across needlessly argumentative. I appreciate everyone talking about it.Exploration
W
3

The concept is that NULL is not an equitable value. It denotes the absence of a value.

Therefore, a variable or a column can only be checked if it IS NULL, but not if it IS EQUAL TO NULL.

Once you open up arithmetic comparisions, you may have to contend with IS GREATER THAN NULL, or IS LESS THAN OR EQUAL TO NULL

Witenagemot answered 16/8, 2011 at 13:23 Comment(0)
M
3

NULL is unknown. It is neither true nor false so when you are comparing anything to unknown, the only answer is "unknown" Much better article on wikipedia http://en.wikipedia.org/wiki/Null_(SQL)

Mode answered 16/8, 2011 at 13:24 Comment(0)
M
2

Because in ANSI SQL, null means "unknown", which is not a value. As such, it doesn't equal anything; you can just evaluate the value's state (known or unknown).

Mcalister answered 16/8, 2011 at 13:23 Comment(6)
Why can't "not a value" equal "not a value" when doing a comparison? In JavaScript, NaN doesn't equal other NaN variables because you could have two different variable values both of which are NaN. But here, with SQL null, the unknown is always the same.Exploration
It's an ANSI SQL definition. In SQL Server, if you shut off ANSI_NULLS option (NULL = NULL) is true ... otherwise, (NULL = NULL) is NULL.Mcalister
@sohtimsso1970: NaN means “not a number”. It has no specified value, and the fact that you could interpret it as having some specific value in a particular implementation is irrelevant. The same applies to NULL in SQL.Dumpcart
@Arthur: Not exactly. In SQL, null is null is null is null. NaN is different in that the underlying value can be many different things. There is no difference between one SQL result that is null and another.Exploration
I made an edit on the OP to try and clarify my position. I think the problem is trying to say that no value is the same as unknown value. (My point about NaN is that you can have two variables that are different, say "abc" and "def", which are both NaN but are not equal)Exploration
@Exploration And this is exactly the case with SQL NULL too. NULL can mean "unknown", not only "no value". If two values are unknown, then you don't know if they're equal. That is, their equality is unknown. Or in SQL, NULL = NULL is NULL.Brownedoff
S
2

a. Null is not the "lack of a value"

b. Null is not "empty"

c. Null is not an "unset value"

It's all of the above and none of the above.

By technical rights, NULL is an "unknown value". However, like uninitialized pointers in C/C++, you don't really know what your pointing at. With databases, they allocate the space but do not initialize the value in that space.

So, it is an "empty" space in the sense that it's not initialized. If you set a value to NULL, the original value stays in that storage location. If it was originally an empty string (for example), it will remain that.

It's a "lack of a value" in the fact that it hasn't been set to what the database deems a valid value.

It's an "unset value" in that if the space was just allocated, the value that is there has never been set.

"Unknown" is the closest that we can truly come to knowing what to expect when we examine a NULL.


Because of that, if we try to compare this "unknown" value, we will get a comparison that

a) may or may not be valid

b) may or may not have the result we expect

c) may or may not crash the database.

So, the DBMS systems (long ago) decided that it doesn't even make sense to use equality when it comes to NULL.

Therefore, "= null" makes no sense.

Sansculotte answered 19/8, 2011 at 22:12 Comment(0)
D
1

In addition to all that has already been said, I wish to stress that what you write in your first line is wrong. SQL does support the “= NULL” syntax, but it has a different semantic than “IS NULL” – as can be seen in the very piece of documentation you linked to.

Dumpcart answered 16/8, 2011 at 13:33 Comment(0)
P
1

I agree with the OP that

where column_name = null

should be syntactic sugar for

where column_name is null

However, I do understand why the creators of SQL wanted to make the distinction. In three-valued logic (IMO this is a misnomer), a predicate can return two values (true or false) OR unknown which is technically not a value but just a way to say "we don't know which of the two values this is". Think about the following predicate in terms of three-valued logic:

A == B

This predicate tests whether A is equal to B. Here's what the truth table looks like:

    T U F
    -----
T | T U F
U | U U U
F | F U T

If either A or B is unknown, the predicate itself always returns unknown, regardless of whether the other one is true or false or unknown.

In SQL, null is a synonym for unknown. So, the SQL predicate

column_name = null

tests whether the value of column_name is equal to something whose value is unknown, and returns unknown regardless of whether column_name is true or false or unknown or anything else, just like in three-valued logic above. SQL DML operations are restricted to operating on rows for which the predicate in the where clause returns true, ignoring rows for which the predicate returns false or unknown. That's why "where column_name = null" doesn't operate on any rows.

Persistent answered 22/10, 2014 at 18:19 Comment(0)
M
0

NULL doesn't equal NULL. It can't equal NULL. It doesn't make sense for them to be equal.

A few ways to think about it:

  1. Imagine a contacts database, containing fields like FirstName, LastName, DateOfBirth and HairColor. If I looked for records WHERE DateOfBirth = HairColor, should it ever match anything? What if someone's DateOfBirth was NULL, and their HairColor was too? An unknown hair color isn't equal to an unknown anything else.

  2. Let's join the contacts table with purchases and product tables. Let's say I want to find all the instances where a customer bought a wig that was the same color as their own hair. So I query WHERE contacts.HairColor = product.WigColor. Should I get matches between every customer I don't know the hair color of and products that don't have a WigColor? No, they're a different thing.

  3. Let's consider that NULL is another word for unknown. What's the result of ('Smith' = NULL)? The answer is not false, it's unknown. Unknown is not true, therefore it behaves like false. What's the result of (NULL = NULL)? The answer is also unknown, therefore also effectively false. (This is also why concatenating a string with a NULL value makes the whole string become NULL -- the result really is unknown.)

Maidservant answered 18/8, 2011 at 14:24 Comment(0)
I
0

Why don't you use the isnull function?

@where = "where GroupId = "+ isnull(@groupId,"null")
Illusion answered 17/10, 2013 at 13:20 Comment(0)
A
0

The problem here is that even if we accept the idea that NULL should mean "unknown", SQL handles it incorrectly. Here's why.

The result of a comparison involving "unknown" is itself unknown. That's fine. And as long as you're dealing entirely with three-valued logic, there's no problem... at least in a theoretical sense. (From a practical standpoint, it will still cause issues, but generally minor ones.)

The problem comes up when this value is dropped into a WHERE clause. A WHERE clause can't maintain this indeterminate "unknown" state. It's got a bunch of rows, and for each row it has to decide whether to include it in the result set, or exclude it.

What is the correct thing for a computer to do when it's presented with a decision for which it has insufficient data? Simple: Throw an error! This would be annoying to deal with, but at least it would immediately alert you to a mistake.

And this makes perfect sense if you step back and consider what's going on. You want to find each record that meets X condition. If you encounter a record where you can't determine if it meets the condition or not, then there is a possibility that it does meet the condition, so you shouldn't leave it out. But there is also a possibility that it doesn't, so you shouldn't put it in. The correct response is to go back to the person who asked for this data and say, "What should I do here?"

But SQL for some reason prefers to coerce that NULL value into FALSE whenever it meets a WHERE clause, and thus was born the insanity that every novice SQL dev endures as a rite of passage.

TL;DR: Silent type coercion is the root of all evil.

Azimuth answered 13/5 at 18:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.