Update with function called once for each row in Postgres 8.4

Asked 30/12, 2011 at 8:12 Answered 30/12, 2011 at 8:20

Solved sql database postgresql sql-update

I have the following UPDATE statement:

update mytable
   set a = first_part(genid()),
       b = second_part(genid()),
       c = third_path(genid())
 where package_id = 10;

In this example the function genid() is called three times for each row, which is wrong - I want it to be called only once for each row of mytable.

I'm using PostgreSQL 8.4 database. How to write the correct update?

I've tried something like this:

update mytable
   set a = first_part(g),
       b = second_part(g),
       c = third_path(g)
 where package_id = 10
  from genid() as g;

But it didn't work, because genid() has been called only once for the whole update statement.

Specie answered 30/12, 2011 at 8:12 Comment(6)

Can't you put that in a stored procedure and first call genid(), store it in a variable and then use it in your update statement? – Repertory 30/12, 2011 at 8:17

you simplified too much the example, I ques you have gen_id(column) – Loafer 30/12, 2011 at 8:42

@Florin No, genid takes no parameters, it generates a unique value on each call, based on an arbitrary sequence and random generator. That's why I want it to be called once for the each row of the mytable. – Specie 30/12, 2011 at 9:59

Looks like there is already a working solution in the answers, but is defining an additional column for the full genid + a row-level trigger that updates a, b and c on insert/update of that column an option? – Slate 30/12, 2011 at 11:45

Additional column is not an option, we have about 2000 such tables, and the schema is rather constant. This column would be too much overhead for millions of records. – Specie 30/12, 2011 at 12:55

@Specie can you try with a cursor? as I suggested in my updated answer – Loafer 30/12, 2011 at 14:11

Have you tried Postgres' non-standard UPDATE .. FROM clause? I imagine, this would work

update mytable
   set a = first_part(gen.id),
       b = second_part(gen.id),
       c = third_path(gen.id)
  from (
          select genid() as genid, id
          from mytable 
          where package_id = 10
       ) gen
 where mytable.id = gen.id;
 --and package_id = 10 -- This predicate is no longer necessary as the subquery
                       -- already filters on package_id, as Erwin mentioned

Note that I'm forcing genid() to be called exactly once per record in mytable within the subselect. Then I'm self-joining mytable and gen using a hypothetical id column. See the documentation here:

http://www.postgresql.org/docs/current/interactive/sql-update.html

~~This seems to have been introduced with Postgres 9.0 only, though.~~ If that seems too complicated (i.e. not very readable), you can still resort to pgplsql as user Florin suggested here.

Quits answered 30/12, 2011 at 8:18 Comment(7)

This works great, thanks. I have some performance concerns, though, because of the hash joins this involves. – Specie 30/12, 2011 at 15:58

The second where package_id = 10 is redundant. The subquery already filters (as it should). If mytable.id is indexed (as it should be), the redundant check is of no use. – Repetitive 30/12, 2011 at 16:20

@Cezariusz: Have you analysed the execution plan? If you self-join on primary keys, the join should be fairly efficient. – Quits 30/12, 2011 at 16:44

@ErwinBrandstetter: You're right. I hadn't thought about it that way. I'll fix it – Quits 30/12, 2011 at 16:45

+1 Now I like your answer. :) You should probably link to the manual for version 8.4 (OP's version) or the current version. Preferably to the /interactive branch. Consider this. – Repetitive 30/12, 2011 at 16:57

@ErwinBrandstetter: Nice link, I wasn't aware of "/current" documentation sub-paths. They're usually not the ones found on google... :) btw: You can edit answers too, if you feel something should be improved (e.g. a link) – Quits 30/12, 2011 at 17:24

@LukasEder: When dealing with experienced users like you I only edit when I am 100% certain and rather default to a comment. The high reputation tells me you generally know what you are doing. – Repetitive 30/12, 2011 at 17:31

Can you check if this is working in postgres?

update 
  (select 
     a, b, c, genid() as val
  from mytable
  where package_id = 10
  )
set a = first_part(val),
       b = second_part(val),
       c = third_path(val);

Update: just for the ideea: Can you try it with a cursor?

DECLARE c1 CURSOR FOR 
      select 
         a, b, c, genid() as val
      from mytable
      where package_id = 10; 
...
UPDATE table SET ... WHERE CURRENT OF c1;

Loafer answered 30/12, 2011 at 8:19 Comment(1)

Doesn't work with any version of PostgreSQL, a.k.a. "wrong answer". – Repetitive 1/1, 2012 at 18:32

I would suggest to use following -

DECLARE @genID VARCHAR(100) -- or appropriate datatype

SET @genID = genid() -- Or select @genID = genid()    
update mytable    
   set a = first_part(@genID),
       b = second_part(@genID),
       c = third_path(@genID)
where package_id = 10;

Emeritaemeritus answered 30/12, 2011 at 8:20 Comment(4)

That's probably the path of least resistance, compared to my rather verbose solution... (if pgplsql is allowed in this case) – Quits 30/12, 2011 at 8:36

In this case genid() will be called only once for the whole table update, am I right? I need it to be called once for each row. – Specie 30/12, 2011 at 9:53

Yes that will be called only once for whole table update. And in your case I think User Lukas has suggested most appropriate way to achieve this. – Emeritaemeritus 30/12, 2011 at 10:32

That won't work with PostgreSQL (you cannot declare variables like that) – Trilbie 30/12, 2011 at 13:37

Recommended topics

Hot tags