Postgresql huge performance difference when using IN vs NOT IN - McMap

About

Postgresql huge performance difference when using IN vs NOT IN

Asked 4/10, 2018 at 3:30 Answered 4/10, 2018 at 3:34

Solved postgresql sql-execution-plan explain

S

1

6

I have 2 tables, "transaksi" and "buku". "transaksi" has around ~250k rows, and buku has around ~170k rows. Both tables have column called "k999a", and both tables use no indexes. Now I check these 2 statements.

Statement 1:

explain select k999a from transaksi where k999a not in (select k999a from buku);

Statement 1 outputs:

 Seq Scan on transaksi  (cost=0.00..721109017.46 rows=125426 width=9)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.00..5321.60 rows=171040 width=8)
           ->  Seq Scan on buku  (cost=0.00..3797.40 rows=171040 width=8)

Statement 2:

explain select k999a from transaksi where k999a in (select k999a from buku);

Statement 2 outputs:

Hash Semi Join  (cost=6604.40..22664.82 rows=250853 width=9)
   Hash Cond: (transaksi.k999a = buku.k999a)
   ->  Seq Scan on transaksi  (cost=0.00..6356.53 rows=250853 width=9)
   ->  Hash  (cost=3797.40..3797.40 rows=171040 width=8)
         ->  Seq Scan on buku  (cost=0.00..3797.40 rows=171040 width=8)

Why in the NOT IN query, postgresql does loop join, making the query takes a long time?

PS: postgresql version 9.6.1 on windows 10

Sanguinary answered 4/10, 2018 at 3:30 Comment(7)

why no indexes? – Breastplate 4/10, 2018 at 3:36

I don't know why Postgres chose to hash buku in one case, and materialize buku in memory in the other case. The bottom line is that if you really wanted these queries to run fast, you'd index k999a in the buku table. – Frasquito 4/10, 2018 at 3:48

@TimBiegeleisen What difference make an index on buku? He is doing a full scan of that table anyway. – Nebulose 4/10, 2018 at 3:58

@JuanCarlosOropeza Who says we do a full scan on that table if there is a B-tree built around k999a? Are you saying Postgres would not take advantage of this index, if it existed? – Frasquito 4/10, 2018 at 3:59

Im saying on first select (select k999a from buku) you get a materialize table and you need the whole table to check NOT IN so in that case an index doesnt help. On the second one even when also a select looks like planner do a SEMI JOIN in that case the index would help – Nebulose 4/10, 2018 at 4:2

@Used_By_Already a dummy table for learning, wanna check the entire plan 1st – Sanguinary 4/10, 2018 at 12:15

@JuanCarlosOropeza just tried using index on buku.k999a, no difference on plan – Sanguinary 4/10, 2018 at 12:17

C

14

This is to be expected. You may get better performance using WHERE NOT EXISTS instead:

SELECT k999a
FROM transaksi
WHERE NOT EXISTS (
    SELECT 1 FROM buku WHERE buku.k999a = transaksi.k999a LIMIT 1
);

Here is a good explanation as to why for each of the methods: https://explainextended.com/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/

Culet answered 4/10, 2018 at 3:34 Comment(4)

@TimBiegeleisen - I can tell you that is certainly not the case in PostgreSQL. EXISTS and IN are often the same, but not true for anti-joins – Culet 4/10, 2018 at 3:44

@TimBiegeleisen - it is possible only when subselect result should not be NULL. PostgreSQL cannot to ensure this behave, and then NOT IN cannot be translated to antijoin. Other databases maybe can ensure this behave or their implementation is not semantically correct. – Deltadeltaic 4/10, 2018 at 3:46

@Nicarus so basically, the problem is caused by the implementation of NOT IN by postgres? at least that's what I get from the link you gave – Sanguinary 4/10, 2018 at 12:27

@Culet Given that the article is very old, is this still true today? – Trussing 19/9, 2023 at 12:59

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.