I've recently implemented INTERSECT ALL and EXCEPT ALL and now found there aren't many resources on that on SO.
Consider the example of the following data.
You can reproduce example on sqlfiddle.com, use postgres 9.3.
Be aware that most of popular databases does not support INTERSECT ALL and EXCEPT ALL. Workarounds using row_number() over ()
are of course possible.
create table x (V1 numeric);
create table y (V1 numeric);
insert into x values (1),(2),(2),(2),(3),(4),(4);
insert into y values (2),(3),(4),(4),(4),(5);
EXCEPT [ALL] matches on all columns from both tables, column types and order must match.
select * from x except select * from y;
| v1
----
| 1
select * from x except all select * from y;
| v1
----
| 1
| 2
| 2
In EXCEPT sql process distinct datasets, so any duplicates are automatically removed leaving only single copy of each row. This results to exclude a row based on match just one in row in second dataset.
On the other hand EXCEPT ALL process datasets taking duplicate rows quantities into account. This results to return the exact difference in duplicate rows between tables. Precisely max(0, x.N - y.N)
.
Another tricky operator, pretty similar to EXCEPT ALL is INTERSECT ALL, this one returns min(x.N, y.N)
duplicates of each matching row.
As the project I've commited the feature is open-source I'm glad to put a link to here: github.com/Rdatatable/data.table. May be useful if you would look for the tools to benchmark against with. data.table
is in-memory and mostly C implemented data processing. It is open-source around 10 years already.
ALL
means (ANSI SQL) "leave duplicates intact" – Skate