NULL values inside NOT IN clause
Asked Answered
F

12

323

This issue came up when I got different records counts for what I thought were identical queries one using a not in where constraint and the other a left join. The table in the not in constraint had one null value (bad data) which caused that query to return a count of 0 records. I sort of understand why but I could use some help fully grasping the concept.

To state it simply, why does query A return a result but B doesn't?

A: select 'true' where 3 in (1, 2, 3, null)
B: select 'true' where 3 not in (1, 2, null)

This was on SQL Server 2005. I also found that calling set ansi_nulls off causes B to return a result.

Folderol answered 24/9, 2008 at 18:51 Comment(0)
S
342

Query A is the same as:

select 'true' where 3 = 1 or 3 = 2 or 3 = 3 or 3 = null

Since 3 = 3 is true, you get a result.

Query B is the same as:

select 'true' where 3 <> 1 and 3 <> 2 and 3 <> null

When ansi_nulls is on, 3 <> null is UNKNOWN, so the predicate evaluates to UNKNOWN, and you don't get any rows.

When ansi_nulls is off, 3 <> null is true, so the predicate evaluates to true, and you get a row.

Subrogate answered 24/9, 2008 at 19:1 Comment(10)
Has anybody ever pointed out that converting NOT IN to a series of <> and changes the semantic behavior of not in this set to something else?Agnate
@Ian - It looks like "A NOT IN ( 'X', 'Y' )" actually is an alias for A <> 'X' AND A <> 'Y' in SQL. (I see that you discovered this yourself in #3925194, but wanted to make sure your objection was addressed in this question.)Flunky
I guess this explains why SELECT 1 WHERE NULL NOT IN (SELECT 1 WHERE 1=0); yields a row instead of the empty resultset I expected.Tjader
This is a very poor behavior of SQL server, because if it expects NULL-comparison using "IS NULL", then it should expand IN clause to that same behavior, and not stupidly apply the wrong semantics to itself.Passacaglia
@binki, You query executes if run here rextester.com/l/sql_server_online_compiler but does not work if run here sqlcourse.com/cgi-bin/interpreter.cgi.Quinones
@IstiaqueAhmed Are you sure that sqlcourse.com’s thing works for any query? I can’t find something that it accepts…Tjader
@binki, it would be helpful if you could elaborate more on what you saidQuinones
@IstiaqueAhmed I’m pretty sure that sqlcourse is just broken: imgur.com/a/zXJle . So there’s no reason for my code to work on it if it is incapable of running T-SQL.Tjader
@Ian Not exactly. If you interpret the meaning of null to be "some unknown value", then the semantics are consistent: 3 in (1, 2, 3, unknown) is true because you know that 3 is in the group. 3 not in (1, 2, unknown) is neither true nor false because it could be in the group. "Given 1, 2, and something we don't know the actual value of, is 3 not in this set?" I'm not sure.Platino
One potential point of confusion here is that case cast(null to boolean) then 'yes' else 'no' end results in no. On the other hand, case cast(null to boolean) = false then 'yes' else 'no' end also results in no... case cast(null to boolean) is null then 'yes' else 'no' end will result in a yes.Platino
D
78

NOT IN returns 0 records when compared against an unknown value

Since NULL is an unknown, a NOT IN query containing a NULL or NULLs in the list of possible values will always return 0 records since there is no way to be sure that the NULL value is not the value being tested.

Daviddavida answered 24/9, 2008 at 19:1 Comment(1)
This is the answer in a nutshell. I found this to be easier to understand even without any example.Veneaux
F
66

Whenever you use NULL you are really dealing with a Three-Valued logic.

Your first query returns results as the WHERE clause evaluates to:

    3 = 1 or 3 = 2 or 3 = 3 or 3 = null
which is:
    FALSE or FALSE or TRUE or UNKNOWN
which evaluates to 
    TRUE

The second one:

    3 <> 1 and 3 <> 2 and 3 <> null
which evaluates to:
    TRUE and TRUE and UNKNOWN
which evaluates to:
    UNKNOWN

The UNKNOWN is not the same as FALSE you can easily test it by calling:

select 'true' where 3 <> null
select 'true' where not (3 <> null)

Both queries will give you no results

If the UNKNOWN was the same as FALSE then assuming that the first query would give you FALSE the second would have to evaluate to TRUE as it would have been the same as NOT(FALSE).
That is not the case.

There is a very good article on this subject on SqlServerCentral.

The whole issue of NULLs and Three-Valued Logic can be a bit confusing at first but it is essential to understand in order to write correct queries in TSQL

Another article I would recommend is SQL Aggregate Functions and NULL.

Federate answered 25/9, 2008 at 9:54 Comment(0)
C
19

Compare to null is undefined, unless you use IS NULL.

So, when comparing 3 to NULL (query A), it returns undefined.

I.e. SELECT 'true' where 3 in (1,2,null) and SELECT 'true' where 3 not in (1,2,null)

will produce the same result, as NOT (UNDEFINED) is still undefined, but not TRUE

Constellation answered 24/9, 2008 at 18:53 Comment(1)
Great point. select 1 where null in (null) does not return rows (ansi).Amora
H
11

SQL uses three-valued logic for truth values. The IN query produces the expected result:

SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE col IN (NULL, 1)
-- returns first row

But adding a NOT does not invert the results:

SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE NOT col IN (NULL, 1)
-- returns zero rows

This is because the above query is equivalent of the following:

SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE NOT (col = NULL OR col = 1)

Here is how the where clause is evaluated:

| col | col = NULL⁽¹⁾  | col = 1 | col = NULL OR col = 1 | NOT (col = NULL OR col = 1) |
|-----|----------------|---------|-----------------------|-----------------------------|
| 1   | UNKNOWN        | TRUE    | TRUE                  | FALSE                       |
| 2   | UNKNOWN        | FALSE   | UNKNOWN⁽²⁾            | UNKNOWN⁽³⁾                  |

Notice that:

  1. The comparison involving NULL yields UNKNOWN
  2. The OR expression where none of the operands are TRUE and at least one operand is UNKNOWN yields UNKNOWN (ref)
  3. The NOT of UNKNOWN yields UNKNOWN (ref)

You can extend the above example to more than two values (e.g. NULL, 1 and 2) but the result will be same: if one of the values is NULL then no row will match.

Hydrostatics answered 28/11, 2019 at 8:44 Comment(0)
S
10

IF you want to filter with NOT IN for a subquery containg NULLs justcheck for not null

SELECT blah FROM t WHERE blah NOT IN
        (SELECT someotherBlah FROM t2 WHERE someotherBlah IS NOT NULL )
Sarene answered 31/10, 2014 at 22:6 Comment(2)
I had problem with outer join query that not returned any records in special situations, So checked this solution for both Null and exist records scenario and it worked for me, If another issues occurred I'll be mention here, Thanks so much.Morly
+1 I came to the same conclusion, even if I was hoping that there was some built-in method for avoiding doing this IS NOT NULL check.Belen
S
9

The title of this question at the time of writing is

SQL NOT IN constraint and NULL values

From the text of the question it appears that the problem was occurring in a SQL DML SELECT query, rather than a SQL DDL CONSTRAINT.

However, especially given the wording of the title, I want to point out that some statements made here are potentially misleading statements, those along the lines of (paraphrasing)

When the predicate evaluates to UNKNOWN you don't get any rows.

Although this is the case for SQL DML, when considering constraints the effect is different.

Consider this very simple table with two constraints taken directly from the predicates in the question (and addressed in an excellent answer by @Brannon):

DECLARE @T TABLE 
(
 true CHAR(4) DEFAULT 'true' NOT NULL, 
 CHECK ( 3 IN (1, 2, 3, NULL )), 
 CHECK ( 3 NOT IN (1, 2, NULL ))
);

INSERT INTO @T VALUES ('true');

SELECT COUNT(*) AS tally FROM @T;

As per @Brannon's answer, the first constraint (using IN) evaluates to TRUE and the second constraint (using NOT IN) evaluates to UNKNOWN. However, the insert succeeds! Therefore, in this case it is not strictly correct to say, "you don't get any rows" because we have indeed got a row inserted as a result.

The above effect is indeed the correct one as regards the SQL-92 Standard. Compare and contrast the following section from the SQL-92 spec

7.6 where clause

The result of the is a table of those rows of T for which the result of the search condition is true.

4.10 Integrity constraints

A table check constraint is satisfied if and only if the specified search condition is not false for any row of a table.

In other words:

In SQL DML, rows are removed from the result when the WHERE evaluates to UNKNOWN because it does not satisfy the condition "is true".

In SQL DDL (i.e. constraints), rows are not removed from the result when they evaluate to UNKNOWN because it does satisfy the condition "is not false".

Although the effects in SQL DML and SQL DDL respectively may seem contradictory, there is practical reason for giving UNKNOWN results the 'benefit of the doubt' by allowing them to satisfy a constraint (more correctly, allowing them to not fail to satisfy a constraint): without this behaviour, every constraints would have to explicitly handle nulls and that would be very unsatisfactory from a language design perspective (not to mention, a right pain for coders!)

p.s. if you are finding it as challenging to follow such logic as "unknown does not fail to satisfy a constraint" as I am to write it, then consider you can dispense with all this simply by avoiding nullable columns in SQL DDL and anything in SQL DML that produces nulls (e.g. outer joins)!

Scalable answered 23/9, 2011 at 8:39 Comment(12)
I honestly did not think there was anything left to say on this subject. Interesting.Folderol
@Jamie Ide: Actually, I have another answer on the subject: because NOT IN (subquery) involving nulls can give unexpected results, it's tempting to avoid IN (subquery) completely and always use NOT EXISTS (subquery) (as I once did!) because it seems that it always handles nulls correctly. However, there are cases where NOT IN (subquery) gives the expected result whereas NOT EXISTS (subquery) gives unexpected results! I may get around to writing this up yet if I can find my notes on the subject (need notes because it's non-intuitive!) The conclusion is the same, though: avoid nulls!Scalable
@Scalable I'm confused by your assertion that NULL would need to be special cased to have consistent behaviour (internally consistent, not consistent with the spec). Would it not be enough to change 4.10 to read "A table check constraint is satisfied if and only if the specified search condition is true"?Subway
@DylanYoung: No, the spec is worded that way for a crucial reason: SQL suffers from three value logic, where those values are TRUE, FALSE and UNKNOWN. I suppose 4.10 could have read, "A table check constraint is satisfied if and only if the specified search condition is TRUE or UNKNOWN for every row of a table" - note my the change to the end of the sentence -- which you omitted -- from "for any" to "for all'. I feel the need to capitalise the logical values because the meaning of 'true' and 'false' in natural language must surely refer to classical two-valued logic.Scalable
@Scalable Your change reintroduces the internal inconsistency you mentioned above. My question is, why is the constraint satisfied when the result is UNKNOWN. Or alternatively, why isn't the result row returned when the result is UNKNOWN. You haven't explained the reason for this inconsistency. Just that it exists in the spec. Three value boolean logic (fuzzy logic) is internally consistent (at least within the limits discovered by Godel et. al.), so why isn't the SQL three-value logic?Subway
Ah, but I think I get it. It's an artificial limitation to allow for expected user behaviour (unique constraints aren't applied to NULLs). According to three-value logic though, this behaviour is actually wrong: if each null represents an unknown value then there is nothing to say that when these values are known they won't break the constraint. And of course, I now see what you mean by special casing NULLs.Subway
Though I still don't see (yet) why the opposite change can't be made: return all rows that are either UNKNOWN or TRUE... I'm sure this breaks something else, I just can't pinpoint what. Let me know if you can think of something!Subway
I would say the opposite: our current mental models of computation suffer from the cultural myopia of two-value logic (we only know how to think in binary logic and about binary values). The problem crops up quite frequently in statistics as well, where almost everything is UNKNOWN. Zen Koans can be a good exercise to break through this myopia.Subway
Consider: CREATE TABLE T ( a INT NOT NULL UNIQUE, b INT CHECK( a = b ) ); - the intent here is that b must either equal a or be null. If a constraint had to result TRUE to be satisfied then we'd need to change the constraint to explicitly handle nulls e.g. CHECK( a = b OR b IS NULL ). Thus, every constraint would need to have ...OR IS NULL logic added by the user for each nullable column involved: more complexity, more bugs when they forgot to do so, etc. So I think the SQL standards committee were just try to be pragmatic.Scalable
@DylanYoung: Note that SQL DML isn't consistent as regards nulls. Consider the above table has multiple rows, some but not all of which have null in column b: (1) SELECT COUNT( b ) FROM T returns a number greater than 0. (2) SELECT SUM( b ) FROM T returns null. I think the committee took into consideration how pre-existing SQL products handled each case and what most users would find more useful and decided to be inconsistent. Maybe they did so reluctantly but it wasn't an oversight. There are other 'special cases' to learn too!Scalable
@Scalable I get that part. The question is, why can't the DML be made consistent with the DDL? i.e. by returning all rows for which the where condition evaluates to True or Unknown. The COUNT, SUM case doesn't seem like an inconsistency at all (int(NULL) == UNKNOWN whereas len([null]) == 1; the operators involved in a count and sum are different)Subway
Woops. I see the inconsistency in COUNT now (nothing to do with SUM I think). It treats NULL as non-existent!!! That's a real zinger :( Interestingly, it doesn't treat (NULL, NULL) in the same fashion. SUM seems fine; on PostgreSQL it errors out when passed a NULL, which is roughly equivalent to returning NULL. I suspect this is a concession to reality as you say, but better would be to fix bad programming practice: if you don't want to COUNT NULLs then you should explicitly exclude them from your query.Subway
R
8

In A, 3 is tested for equality against each member of the set, yielding (FALSE, FALSE, TRUE, UNKNOWN). Since one of the elements is TRUE, the condition is TRUE. (It's also possible that some short-circuiting takes place here, so it actually stops as soon as it hits the first TRUE and never evaluates 3=NULL.)

In B, I think it is evaluating the condition as NOT (3 in (1,2,null)). Testing 3 for equality against the set yields (FALSE, FALSE, UNKNOWN), which is aggregated to UNKNOWN. NOT ( UNKNOWN ) yields UNKNOWN. So overall the truth of the condition is unknown, which at the end is essentially treated as FALSE.

Roundshouldered answered 24/9, 2008 at 18:58 Comment(0)
V
7

Null signifies and absence of data, that is it is unknown, not a data value of nothing. It's very easy for people from a programming background to confuse this because in C type languages when using pointers null is indeed nothing.

Hence in the first case 3 is indeed in the set of (1,2,3,null) so true is returned

In the second however you can reduce it to

select 'true' where 3 not in (null)

So nothing is returned because the parser knows nothing about the set to which you are comparing it - it's not an empty set but an unknown set. Using (1, 2, null) doesn't help because the (1,2) set is obviously false, but then you're and'ing that against unknown, which is unknown.

Vertex answered 24/9, 2008 at 19:8 Comment(0)
S
7

It may be concluded from answers here that NOT IN (subquery) doesn't handle nulls correctly and should be avoided in favour of NOT EXISTS. However, such a conclusion may be premature. In the following scenario, credited to Chris Date (Database Programming and Design, Vol 2 No 9, September 1989), it is NOT IN that handles nulls correctly and returns the correct result, rather than NOT EXISTS.

Consider a table sp to represent suppliers (sno) who are known to supply parts (pno) in quantity (qty). The table currently holds the following values:

      VALUES ('S1', 'P1', NULL), 
             ('S2', 'P1', 200),
             ('S3', 'P1', 1000)

Note that quantity is nullable i.e. to be able to record the fact a supplier is known to supply parts even if it is not known in what quantity.

The task is to find the suppliers who are known supply part number 'P1' but not in quantities of 1000.

The following uses NOT IN to correctly identify supplier 'S2' only:

WITH sp AS 
     ( SELECT * 
         FROM ( VALUES ( 'S1', 'P1', NULL ), 
                       ( 'S2', 'P1', 200 ),
                       ( 'S3', 'P1', 1000 ) )
              AS T ( sno, pno, qty )
     )
SELECT DISTINCT spx.sno
  FROM sp spx
 WHERE spx.pno = 'P1'
       AND 1000 NOT IN (
                        SELECT spy.qty
                          FROM sp spy
                         WHERE spy.sno = spx.sno
                               AND spy.pno = 'P1'
                       );

However, the below query uses the same general structure but with NOT EXISTS but incorrectly includes supplier 'S1' in the result (i.e. for which the quantity is null):

WITH sp AS 
     ( SELECT * 
         FROM ( VALUES ( 'S1', 'P1', NULL ), 
                       ( 'S2', 'P1', 200 ),
                       ( 'S3', 'P1', 1000 ) )
              AS T ( sno, pno, qty )
     )
SELECT DISTINCT spx.sno
  FROM sp spx
 WHERE spx.pno = 'P1'
       AND NOT EXISTS (
                       SELECT *
                         FROM sp spy
                        WHERE spy.sno = spx.sno
                              AND spy.pno = 'P1'
                              AND spy.qty = 1000
                      );

So NOT EXISTS is not the silver bullet it may have appeared!

Of course, source of the problem is the presence of nulls, therefore the 'real' solution is to eliminate those nulls.

This can be achieved (among other possible designs) using two tables:

  • sp suppliers known to supply parts
  • spq suppliers known to supply parts in known quantities

noting there should probably be a foreign key constraint where spq references sp.

The result can then be obtained using the 'minus' relational operator (being the EXCEPT keyword in Standard SQL) e.g.

WITH sp AS 
     ( SELECT * 
         FROM ( VALUES ( 'S1', 'P1' ), 
                       ( 'S2', 'P1' ),
                       ( 'S3', 'P1' ) )
              AS T ( sno, pno )
     ),
     spq AS 
     ( SELECT * 
         FROM ( VALUES ( 'S2', 'P1', 200 ),
                       ( 'S3', 'P1', 1000 ) )
              AS T ( sno, pno, qty )
     )
SELECT sno
  FROM spq
 WHERE pno = 'P1'
EXCEPT 
SELECT sno
  FROM spq
 WHERE pno = 'P1'
       AND qty = 1000;
Scalable answered 27/9, 2011 at 15:0 Comment(1)
Omg. Thank you for actually writing this up.... this was driving me crazy..Veneaux
G
1

this is for Boy:

select party_code 
from abc as a
where party_code not in (select party_code 
                         from xyz 
                         where party_code = a.party_code);

this works regardless of ansi settings

Gag answered 23/6, 2009 at 10:2 Comment(2)
for the original question: B: select 'true' where 3 not in (1, 2, null) a way to remove nulls must be done e.g. select 'true' where 3 not in (1, 2, isnull(null,0)) the overall logic is, if NULL is the cause, then find a way to remove NULL values at some step in the query.Gag
select party_code from abc as a where party_code not in ( select party_code from xyz where party_code is not null) but good luck if you forgot the field allows nulls, which is often the caseGag
F
0

also this might be of use to know the logical difference between join, exists and in http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

Flesh answered 24/9, 2008 at 22:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.