Check for uniqueness of column in postgres table
Asked Answered
S

3

7

I need to ensure that the values in a column from a table are unique as part of a larger process.

I'm aware of the UNIQUE constraint, but I'm wondering if there is a better way to do the check.

I'm running the queries using psycopg2 so adding that tag on the off chance there's something in there that can help with this.

If the column is unique I can add a constraint. If the column is not unique adding the constraint will return an error.

If there is already a constraint of the same name a useful error is returned. in this case would prefer to just check for the existing constraint.

If the column is the primary key, the unique constraint can be added without error but in this case it would be preferable to just recognize that the column must be unique based on the primary key.

Code examples of this below.

DROP TABLE IF EXISTS unique_test;
CREATE TABLE unique_test (
pkey INT PRIMARY KEY,
unique_yes CHAR(1),
unique_no CHAR(1)
);

INSERT INTO unique_test (pkey, unique_yes, unique_no)
VALUES(1, 'a', 'a'),
        (2, 'b', 'a');
    
CREATE UNIQUE INDEX CONCURRENTLY u_test_1  ON unique_test (unique_yes);

ALTER TABLE unique_test
ADD CONSTRAINT unique_target_1
UNIQUE USING INDEX u_test_1;

-- the above runs no problem

-- check what happens when column is not unique

CREATE UNIQUE INDEX CONCURRENTLY u_test_2  ON unique_test (unique_no);

ALTER TABLE unique_test
ADD CONSTRAINT unique_target_2
UNIQUE USING INDEX u_test_2;

-- returns:
-- SQL Error [23505]: ERROR: could not create unique index "u_test_2"
--  Detail: Key (unique_no)=(a) is duplicated.


CREATE UNIQUE INDEX CONCURRENTLY u_test_1  ON unique_test (unique_yes);

ALTER TABLE unique_test
ADD CONSTRAINT unique_target_1
UNIQUE USING INDEX u_test_1;

-- returns
-- SQL Error [42P07]: ERROR: relation "unique_target_1" already exists


-- test what happens if adding constrint to primary key column

CREATE UNIQUE INDEX CONCURRENTLY u_test_pkey  ON unique_test (pkey);

ALTER TABLE unique_test
ADD CONSTRAINT unique_target_pkey
UNIQUE USING INDEX u_test_pkey;

-- this runs no problem but is inefficient.
Selfconsistent answered 12/8, 2020 at 13:36 Comment(1)
I am a little unclear on which column you are referring to and what results you want.Sanctified
C
12

If all you want to do is verify that values are unique, then use a query:

select unique_no, count(*)
  from unique_test
 group by unique_no
having count(*) > 1;

If it needs to be boolean output:

select not exists (
  select unique_no, count(*)
    from unique_test
   group by unique_no
  having count(*) > 1
);
Copacetic answered 12/8, 2020 at 13:43 Comment(3)
obviously, thank you, just getting to fancy for my own good.Selfconsistent
If your process makes enough of these checks to slow it down unacceptably, then you can look into the information_schema.constraint_column_usage and information_schema.table_constraints to bypass such queries.Copacetic
Quick question; Since you used "not exists," a "True" output would mean, that there are no duplicates, right?Peroration
S
6

If you just want a flag, you can use:

select count(*) <> count(distinct uniq_no) as duplicate_flag
from unique_test;
Sanctified answered 12/8, 2020 at 13:49 Comment(0)
T
1
DELETE FROM
zoo x
USING zoo y
WHERE
x.animal_id < y.animal_id
AND x.animal = y.animal;

I think this is simpler, https://kb.objectrocket.com/postgresql/delete-duplicate-rows-in-postgresql-762 for reference

Teryl answered 8/7, 2022 at 18:25 Comment(1)
that's a helpful query but I can't necessarily just discard a duplicate row as it's potentially a problem with an intended primary key rather than a true duplicate.Selfconsistent

© 2022 - 2024 — McMap. All rights reserved.