I'm trying to create a validation procedure during a load that checks to make sure data isn't duplicated. Vertica doesn't support this natively:
Vertica checks for constraint violations when queries are run, not when data is loaded. To detect constraint violations as part of the load process, use a COPY (page 667) statement with the NO COMMIT option. By loading data without committing it, you can run a post-load check of your data using the ANALYZE_CONSTRAINTS function. If the function finds constraint violations, you can roll back the load because you have not committed it.
The problem is that I can't figure out how to do this programmatically. I suspect that I need a stored procedure, but I'm not familiar with the stored procedure syntax/limitations for vertica. Can you help? Here's what I have:
-- Create a new table. "id" is auto-incremented and "name" must be unique
CREATE TABLE IF NOT EXISTS my_table (
id IDENTITY
, name varchar(50) UNIQUE NOT NULL
, type varchar(20)
, description varchar(200)
);
--Insert a record
begin;
copy my_table from stdin
abort on error
NO COMMIT; -- this begins the load
name1|type1|description1 --this is the load
\. -- this closes the load
commit;
-- insert the duplicate record
begin;
copy my_table from stdin
abort on error
NO COMMIT; -- this begins the load
name1|type1|description1 --this is the load
\. -- this closes the load
commit; -- Surprisingly, the load executes successfully! What's going on?!?!
-- Check constraints. We see that there is a failed constraints:
select analyze_constraints('my_table');
My thinking is to do some conditional logic. Psudo-code is below. Can you help me prepare it for Vertica?
Begin
load data
if (select count(*) from (select analyze_constraints('my_table')) sub) == 0:
commit
else rollback