Fastest check if row exists in PostgreSQL
Asked Answered
R

7

300

I have a bunch of rows that I need to insert into table, but these inserts are always done in batches. So I want to check if a single row from the batch exists in the table because then I know they all were inserted.

So its not a primary key check, but shouldn't matter too much. I would like to only check single row so count(*) probably isn't good, so its something like exists I guess.

But since I'm fairly new to PostgreSQL I'd rather ask people who know.

My batch contains rows with following structure:

userid | rightid | remaining_count

So if table contains any rows with provided userid it means they all are present there.

Regular answered 19/9, 2011 at 13:22 Comment(8)
You want to see if the table has ANY rows, or any rows from your batch?Pounce
any rows from my batch yes. they all share same field ill edit a little.Regular
Please clarify your question. You want to add a batch of records, all or nothing? Is there something special about count ? (BTW a reserved word, impractical as a column name)Atabrine
okay, I was trying to simplify actual situation a little but we are getting closer and closer to real implementation. Once those rows are inserted (theres another field for_date) I begin decrementing rights for specified user as they use specific rights, once rights become 0 they cannot perform those actions anymore for that date. thats the real storyRegular
Just show (the relevant part of) the table definitions, and tell what you intend to do.Atabrine
[userid, rightid, for_date, remainingCount] thats the real table. I intend to insert into this table when user attempts to use any right. I could have created a speciall process that is responsible for inserting rows daily but currently I am checking whether another approach will suit me performance wise.Regular
remainingCount with a capital "C"? I'd advice to stick to lower case (remaining_count), or you'll always have to quote this identifier: "remainingCount". Also there is no such thing as "postgre". It's "PostgreSQL" or "postgres". And lastly, it's still unclear whether all rows to be inserted at a time share the same userid.Basel
oh well. I copy pasted that from parameter list in C# code don't worry Erwin. I thought it should be clear, but to specify that more, batch contains rows which do share userid, thats why, checking that single row with specified userid makes sense first of all..Regular
C
582

Use the EXISTS keyword for TRUE / FALSE return:

SELECT EXISTS(SELECT 1 FROM contact WHERE id=12)
Circuitry answered 9/5, 2013 at 17:30 Comment(11)
Extension on this, you can name the returned column for easy reference. Eg select exists(select 1 from contact where id=12) AS "exists"Tourer
This is better, because it will always return a value (true or false) instead of sometimes None (depending on your programing language) which might not expand the way you expect.Michell
I have Seq Scan with using this method. I do something wrong?Denigrate
@Denigrate :: This might answer your question: #5204255Circuitry
@Michael.M I have DB table with 30 millions rows and when I use exists or limit 1 I have strong performance drop because Postgres uses Seq Scan instead of Index Scan. And analyze doesn't help.Denigrate
Would limit 1 in subquery here help or slowen the query?Dalpe
@Dalpe please understand that ‘id’ is a primary key, so “LIMIT 1” would be pointless since there is only one record with that idCircuitry
Is this better than SELECT id FROM table WHERE something=%sMacedonian
@CodeGuru, Depends. A SERIAL is 4 times bigger than a BYTE. For every 1MM positives you will marshal 3MB more data than just using exists, and you just end up throwing that data away. 7MB more for BIGSERIAL. Also, it may be more efficient, depending on your calling code, to just handle the bool in the response rather than evaluating if there even is a record or not. Without knowing specifics of the OP's case, its impossible to tell. In any case, the code is easier to read if we know we always get a bool.Circuitry
@MikeM Thank you for explaining and yes I tested and went with your recommendation.Macedonian
@Dalpe It helps if you’re not comparing a UNIQUE (including PRIMARY KEY) column. Otherwise, pointless.Swimmingly
D
47

How about simply:

select 1 from tbl where userid = 123 limit 1;

where 123 is the userid of the batch that you're about to insert.

The above query will return either an empty set or a single row, depending on whether there are records with the given userid.

If this turns out to be too slow, you could look into creating an index on tbl.userid.

if even a single row from batch exists in table, in that case I don't have to insert my rows because I know for sure they all were inserted.

For this to remain true even if your program gets interrupted mid-batch, I'd recommend that you make sure you manage database transactions appropriately (i.e. that the entire batch gets inserted within a single transaction).

Debbiedebbra answered 19/9, 2011 at 13:38 Comment(3)
It might be sometimes be programatically easier to "select count(*) from (select 1 ... limit 1)" as it's guaranteed to always return a row with a value of count(*) of 0 or 1.Kapp
@DavidAldridge count(*) still means that all the rows have to be read, whereas limit 1 stops at the first record and returnsDuwalt
@Duwalt I think you've misinterpreted the query. The COUNT acts on a nested SELECT that has at most 1 row (because the LIMIT is in the subquery).Catharinecatharsis
A
11
INSERT INTO target( userid, rightid, count )
  SELECT userid, rightid, count 
  FROM batch
  WHERE NOT EXISTS (
    SELECT * FROM target t2, batch b2
    WHERE t2.userid = b2.userid
    -- ... other keyfields ...
    )       
    ;

BTW: if you want the whole batch to fail in case of a duplicate, then (given a primary key constraint)

INSERT INTO target( userid, rightid, count )
SELECT userid, rightid, count 
FROM batch
    ;

will do exactly what you want: either it succeeds, or it fails.

Atabrine answered 19/9, 2011 at 13:40 Comment(5)
This will check each row. He wants to do a single check.Pounce
No, it does a single check. The subquery is uncorrelated. It will bail out once one matching pair is found.Atabrine
Right you are, I thought it referred to the outer query. +1 to youPounce
BTW: since the query is inside a transaction, nothing will happen if a duplicate id were to be inserted, hence the subquery can be omitted.Atabrine
hmm I am not sure I understand. After rights are inserted, I begin to decrement count column. (just some details for picture) If rows already exist and subquery is omitted I think ill get errors with duplicate unique key thrown or? (userid&right form that unique key)Regular
N
5
select true from tablename where condition limit 1;

I believe that this is the query that postgres uses for checking foreign keys.

In your case, you could do this in one go too:

insert into yourtable select $userid, $rightid, $count where not (select true from yourtable where userid = $userid limit 1);
Neediness answered 19/9, 2011 at 13:38 Comment(0)
A
5

If you think about the performace ,may be you can use "PERFORM" in a function just like this:

 PERFORM 1 FROM skytf.test_2 WHERE id=i LIMIT 1;
  IF FOUND THEN
      RAISE NOTICE ' found record id=%', i;  
  ELSE
      RAISE NOTICE ' not found record id=%', i;  
 END IF;
Armalla answered 20/9, 2011 at 3:25 Comment(2)
doesn't work with me : I get a syntax error near performCarree
that's pl/pgsql, not SQL, hence the syntax error for "PERFORM" if trying to run it as SQLCanthus
S
4
SELECT 1 FROM user_right where userid = ? LIMIT 1

If your resultset contains a row then you do not have to insert. Otherwise insert your records.

Stelly answered 19/9, 2011 at 13:37 Comment(2)
if bunch contains 100 rows it will return me 100 rows, you think thats good?Regular
You can limit it to 1 row. Should perform better. Have a look at edited answer from @aix for that.Stelly
B
4

as @MikeM pointed out.

select exists(select 1 from contact where id=12)

with index on contact, it can usually reduce time cost to 1 ms.

CREATE INDEX index_contact on contact(id);
Broadwater answered 20/8, 2019 at 2:3 Comment(1)
Cost of 1ms is huge - can only do 1000 such checks per second. Something around 10M checks per second should be aimed.Recidivism

© 2022 - 2024 — McMap. All rights reserved.