It may be concluded from answers here that NOT IN (subquery)
doesn't handle nulls correctly and should be avoided in favour of NOT EXISTS
. However, such a conclusion may be premature. In the following scenario, credited to Chris Date (Database Programming and Design, Vol 2 No 9, September 1989), it is NOT IN
that handles nulls correctly and returns the correct result, rather than NOT EXISTS
.
Consider a table sp
to represent suppliers (sno
) who are known to supply parts (pno
) in quantity (qty
). The table currently holds the following values:
VALUES ('S1', 'P1', NULL),
('S2', 'P1', 200),
('S3', 'P1', 1000)
Note that quantity is nullable i.e. to be able to record the fact a supplier is known to supply parts even if it is not known in what quantity.
The task is to find the suppliers who are known supply part number 'P1' but not in quantities of 1000.
The following uses NOT IN
to correctly identify supplier 'S2' only:
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND 1000 NOT IN (
SELECT spy.qty
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
);
However, the below query uses the same general structure but with NOT EXISTS
but incorrectly includes supplier 'S1' in the result (i.e. for which the quantity is null):
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND NOT EXISTS (
SELECT *
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
AND spy.qty = 1000
);
So NOT EXISTS
is not the silver bullet it may have appeared!
Of course, source of the problem is the presence of nulls, therefore the 'real' solution is to eliminate those nulls.
This can be achieved (among other possible designs) using two tables:
sp
suppliers known to supply parts
spq
suppliers known to supply parts in known quantities
noting there should probably be a foreign key constraint where spq
references sp
.
The result can then be obtained using the 'minus' relational operator (being the EXCEPT
keyword in Standard SQL) e.g.
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1' ),
( 'S2', 'P1' ),
( 'S3', 'P1' ) )
AS T ( sno, pno )
),
spq AS
( SELECT *
FROM ( VALUES ( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT sno
FROM spq
WHERE pno = 'P1'
EXCEPT
SELECT sno
FROM spq
WHERE pno = 'P1'
AND qty = 1000;
NOT IN
to a series of<> and
changes the semantic behavior of not in this set to something else? – Agnate