In standard SQL there are no parentheses in DISTINCT colA
. DISTINCT
is not a function.
SELECT DISTINCT colA
FROM mytable
WHERE colA NOT IN (SELECT DISTINCT colB FROM mytable);
Added DISTINCT
to the sub-select as well. For many duplicates it can speed up the query. But, typically, there are more efficient query techniques than NOT IN
for this:
The deciding factor for speed will be indexes. You need indexes on colA
and colB
for this query to be fast.
A CTE might be faster, depending on your DBMS. I additionally demonstrate LEFT JOIN
as alternative to exclude the values in valB
, and an alternative way to get distinct values with GROUP BY
:
WITH x AS (SELECT colB FROM mytable GROUP BY colB)
SELECT m.colA
FROM mytable m
LEFT JOIN x ON x.colB = m.colA
WHERE x.colB IS NULL
GROUP BY m.colA;
Or, simplified further, and with a plain subquery (probably fastest):
SELECT DISTINCT m.colA
FROM mytable m
LEFT JOIN mytable x ON x.colB = m.colA
WHERE x.colB IS NULL;
NOT IN
slows down as the set size grows, and there's often a limit to how many rows can be in theNOT IN
clause. Outside of small result sets I've found it better to use other means to get the difference between two result sets. – Ramonaramonda