I have 2 tables, "transaksi" and "buku". "transaksi" has around ~250k rows, and buku has around ~170k rows. Both tables have column called "k999a", and both tables use no indexes. Now I check these 2 statements.
Statement 1:
explain select k999a from transaksi where k999a not in (select k999a from buku);
Statement 1 outputs:
Seq Scan on transaksi (cost=0.00..721109017.46 rows=125426 width=9)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..5321.60 rows=171040 width=8)
-> Seq Scan on buku (cost=0.00..3797.40 rows=171040 width=8)
Statement 2:
explain select k999a from transaksi where k999a in (select k999a from buku);
Statement 2 outputs:
Hash Semi Join (cost=6604.40..22664.82 rows=250853 width=9)
Hash Cond: (transaksi.k999a = buku.k999a)
-> Seq Scan on transaksi (cost=0.00..6356.53 rows=250853 width=9)
-> Hash (cost=3797.40..3797.40 rows=171040 width=8)
-> Seq Scan on buku (cost=0.00..3797.40 rows=171040 width=8)
Why in the NOT IN query, postgresql does loop join, making the query takes a long time?
PS: postgresql version 9.6.1 on windows 10
buku
in one case, and materializebuku
in memory in the other case. The bottom line is that if you really wanted these queries to run fast, you'd indexk999a
in thebuku
table. – Frasquitok999a
? Are you saying Postgres would not take advantage of this index, if it existed? – Frasquito(select k999a from buku)
you get a materialize table and you need the whole table to checkNOT IN
so in that case an index doesnt help. On the second one even when also a select looks like planner do aSEMI JOIN
in that case the index would help – Nebulose