Fastest check if row exists in PostgreSQL

Asked 19/9, 2011 at 13:22 Answered 20/8, 2019 at 2:3

300

I have a bunch of rows that I need to insert into table, but these inserts are always done in batches. So I want to check if a single row from the batch exists in the table because then I know they all were inserted.

So its not a primary key check, but shouldn't matter too much. I would like to only check single row so count(*) probably isn't good, so its something like exists I guess.

But since I'm fairly new to PostgreSQL I'd rather ask people who know.

My batch contains rows with following structure:

userid | rightid | remaining_count

So if table contains any rows with provided userid it means they all are present there.

Regular answered 19/9, 2011 at 13:22 Comment(8)

You want to see if the table has ANY rows, or any rows from your batch? – Pounce 19/9, 2011 at 13:24

any rows from my batch yes. they all share same field ill edit a little. – Regular 19/9, 2011 at 13:25

Please clarify your question. You want to add a batch of records, all or nothing? Is there something special about count ? (BTW a reserved word, impractical as a column name) – Atabrine 19/9, 2011 at 14:8

okay, I was trying to simplify actual situation a little but we are getting closer and closer to real implementation. Once those rows are inserted (theres another field for_date) I begin decrementing rights for specified user as they use specific rights, once rights become 0 they cannot perform those actions anymore for that date. thats the real story – Regular 19/9, 2011 at 14:11

Just show (the relevant part of) the table definitions, and tell what you intend to do. – Atabrine 19/9, 2011 at 14:15

[userid, rightid, for_date, remainingCount] thats the real table. I intend to insert into this table when user attempts to use any right. I could have created a speciall process that is responsible for inserting rows daily but currently I am checking whether another approach will suit me performance wise. – Regular 19/9, 2011 at 14:20

remainingCount with a capital "C"? I'd advice to stick to lower case (remaining_count), or you'll always have to quote this identifier: "remainingCount". Also there is no such thing as "postgre". It's "PostgreSQL" or "postgres". And lastly, it's still unclear whether all rows to be inserted at a time share the same userid. – Basel 19/9, 2011 at 19:41

oh well. I copy pasted that from parameter list in C# code don't worry Erwin. I thought it should be clear, but to specify that more, batch contains rows which do share userid, thats why, checking that single row with specified userid makes sense first of all.. – Regular 19/9, 2011 at 19:56

582

Use the EXISTS keyword for TRUE / FALSE return:

SELECT EXISTS(SELECT 1 FROM contact WHERE id=12)

Circuitry answered 9/5, 2013 at 17:30 Comment(11)

Extension on this, you can name the returned column for easy reference. Eg select exists(select 1 from contact where id=12) AS "exists" – Tourer 15/9, 2013 at 23:16

This is better, because it will always return a value (true or false) instead of sometimes None (depending on your programing language) which might not expand the way you expect. – Michell 10/6, 2014 at 23:18

I have Seq Scan with using this method. I do something wrong? – Denigrate 6/7, 2016 at 7:42

@Denigrate :: This might answer your question: #5204255 – Circuitry 6/7, 2016 at 17:14

@Michael.M I have DB table with 30 millions rows and when I use exists or limit 1 I have strong performance drop because Postgres uses Seq Scan instead of Index Scan. And analyze doesn't help. – Denigrate 7/7, 2016 at 10:41

Would limit 1 in subquery here help or slowen the query? – Dalpe 26/9, 2018 at 7:27

@Dalpe please understand that ‘id’ is a primary key, so “LIMIT 1” would be pointless since there is only one record with that id – Circuitry 26/9, 2018 at 18:45

Is this better than SELECT id FROM table WHERE something=%s – Macedonian 27/11, 2019 at 23:31

@CodeGuru, Depends. A SERIAL is 4 times bigger than a BYTE. For every 1MM positives you will marshal 3MB more data than just using exists, and you just end up throwing that data away. 7MB more for BIGSERIAL. Also, it may be more efficient, depending on your calling code, to just handle the bool in the response rather than evaluating if there even is a record or not. Without knowing specifics of the OP's case, its impossible to tell. In any case, the code is easier to read if we know we always get a bool. – Circuitry 29/11, 2019 at 23:24

@MikeM Thank you for explaining and yes I tested and went with your recommendation. – Macedonian 30/11, 2019 at 4:4

@Dalpe It helps if you’re not comparing a UNIQUE (including PRIMARY KEY) column. Otherwise, pointless. – Swimmingly 27/4, 2021 at 6:55

How about simply:

select 1 from tbl where userid = 123 limit 1;

where 123 is the userid of the batch that you're about to insert.

The above query will return either an empty set or a single row, depending on whether there are records with the given userid.

If this turns out to be too slow, you could look into creating an index on tbl.userid.

if even a single row from batch exists in table, in that case I don't have to insert my rows because I know for sure they all were inserted.

For this to remain true even if your program gets interrupted mid-batch, I'd recommend that you make sure you manage database transactions appropriately (i.e. that the entire batch gets inserted within a single transaction).

Debbiedebbra answered 19/9, 2011 at 13:38 Comment(3)

It might be sometimes be programatically easier to "select count(*) from (select 1 ... limit 1)" as it's guaranteed to always return a row with a value of count(*) of 0 or 1. – Kapp 9/5, 2013 at 18:33

@DavidAldridge count(*) still means that all the rows have to be read, whereas limit 1 stops at the first record and returns – Duwalt 12/12, 2013 at 13:36

@Duwalt I think you've misinterpreted the query. The COUNT acts on a nested SELECT that has at most 1 row (because the LIMIT is in the subquery). – Catharinecatharsis 17/1, 2014 at 11:37

INSERT INTO target( userid, rightid, count )
  SELECT userid, rightid, count 
  FROM batch
  WHERE NOT EXISTS (
    SELECT * FROM target t2, batch b2
    WHERE t2.userid = b2.userid
    -- ... other keyfields ...
    )       
    ;

BTW: if you want the whole batch to fail in case of a duplicate, then (given a primary key constraint)

INSERT INTO target( userid, rightid, count )
SELECT userid, rightid, count 
FROM batch
    ;

will do exactly what you want: either it succeeds, or it fails.

Atabrine answered 19/9, 2011 at 13:40 Comment(5)

This will check each row. He wants to do a single check. – Pounce 19/9, 2011 at 13:42

No, it does a single check. The subquery is uncorrelated. It will bail out once one matching pair is found. – Atabrine 19/9, 2011 at 13:45

Right you are, I thought it referred to the outer query. +1 to you – Pounce 19/9, 2011 at 13:47

BTW: since the query is inside a transaction, nothing will happen if a duplicate id were to be inserted, hence the subquery can be omitted. – Atabrine 19/9, 2011 at 13:56

hmm I am not sure I understand. After rights are inserted, I begin to decrement count column. (just some details for picture) If rows already exist and subquery is omitted I think ill get errors with duplicate unique key thrown or? (userid&right form that unique key) – Regular 19/9, 2011 at 14:0

select true from tablename where condition limit 1;

I believe that this is the query that postgres uses for checking foreign keys.

In your case, you could do this in one go too:

insert into yourtable select $userid, $rightid, $count where not (select true from yourtable where userid = $userid limit 1);

Neediness answered 19/9, 2011 at 13:38 Comment(0)

If you think about the performace ,may be you can use "PERFORM" in a function just like this:

 PERFORM 1 FROM skytf.test_2 WHERE id=i LIMIT 1;
  IF FOUND THEN
      RAISE NOTICE ' found record id=%', i;  
  ELSE
      RAISE NOTICE ' not found record id=%', i;  
 END IF;

Armalla answered 20/9, 2011 at 3:25 Comment(2)

doesn't work with me : I get a syntax error near perform – Carree 5/2, 2015 at 3:10

that's pl/pgsql, not SQL, hence the syntax error for "PERFORM" if trying to run it as SQL – Canthus 29/11, 2016 at 18:28

SELECT 1 FROM user_right where userid = ? LIMIT 1

If your resultset contains a row then you do not have to insert. Otherwise insert your records.

Stelly answered 19/9, 2011 at 13:37 Comment(2)

if bunch contains 100 rows it will return me 100 rows, you think thats good? – Regular 19/9, 2011 at 13:39

You can limit it to 1 row. Should perform better. Have a look at edited answer from @aix for that. – Stelly 19/9, 2011 at 13:44

as @MikeM pointed out.

select exists(select 1 from contact where id=12)

with index on contact, it can usually reduce time cost to 1 ms.

CREATE INDEX index_contact on contact(id);

Broadwater answered 20/8, 2019 at 2:3 Comment(1)

Cost of 1ms is huge - can only do 1000 such checks per second. Something around 10M checks per second should be aimed. – Recidivism 4/11, 2021 at 3:56

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags