Update multiple columns that start with a specific string
Asked Answered
A

2

2

I am trying to update a bunch of columns in a DB for testing purposes of a feature. I have a table that is built with hibernate so all of the columns that are created for an embedded entity begin with the same name. I.e. contact_info_address_street1, contact_info_address_street2, etc.

I am trying to figure out if there is a way to do something to the affect of:

UPDATE table SET contact_info_address_* = null;

If not, I know I can do it the long way, just looking for a way to help myself out in the future if I need to do this all over again for a different set of columns.

Alexine answered 2/5, 2013 at 17:28 Comment(1)
You'll need to read metadata. Check tables pg_%.Covetous
S
1

There's no handy shortcut sorry. If you have to do this kind of thing a lot, you could create a function to dynamically execute sql and achieve your goal.

CREATE OR REPLACE FUNCTION reset_cols() RETURNS boolean AS $$ BEGIN 
    EXECUTE (select 'UPDATE table SET ' 
                  || array_to_string(array(
                              select column_name::text 
                              from information_schema.columns 
                              where table_name = 'table' 
                              and column_name::text like 'contact_info_address_%'
                     ),' = NULL,') 
                  || ' = NULL'); 
    RETURN true; 
 END; $$ LANGUAGE plpgsql;

-- run the function
SELECT reset_cols();

It's not very nice though. A better function would be one that accepts the tablename and column prefix as args. Which I'll leave as an exercise for the readers :)

Sapowith answered 2/5, 2013 at 18:11 Comment(2)
I like the answer, if nothing better comes along i will mark it as accepted. (thanks for the info =) )Cortez
Careful, column names need to be escaped. This allows SQL injection.Husted
H
2

You need dynamic SQL for this. So you must defend against possible SQL injection.

Basic query

The basic query to generate the DML command needed can look like this:

SELECT format('UPDATE tbl SET (%s) = (%s)'
               ,string_agg (quote_ident(attname), ', ')
               ,string_agg ('NULL', ', ')
             )
FROM   pg_attribute
WHERE  attrelid = 'tbl'::regclass
AND    NOT attisdropped 
AND    attnum > 0
AND    attname ~~ 'foo_%';

Returns:

UPDATE tbl SET (foo_a, foo_b, foo_c) = (NULL, NULL, NULL);

Make use of the "column-list syntax" of UPDATE to shorten the code and simplify the task.

I query the system catalogs instead of information schema because the latter, while being standardized and guaranteed to be portable across major versions, is also notoriously slow and sometimes unwieldy. There are pros and cons, see:

quote_ident() for the column names prevents SQL-injection - also necessary for identifiers.

string_agg() requires 9.0+.

Full automation with PL/pgSQL function

CREATE OR REPLACE FUNCTION f_update_cols(_tbl regclass, _col_pattern text
                                       , OUT row_ct int, OUT col_ct bigint)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sql text;
BEGIN
   SELECT INTO _sql, col_ct
          format('UPDATE tbl SET (%s) = (%s)'
               , string_agg (quote_ident(attname), ', ')
               , string_agg ('NULL', ', ')
                )
        , count(*)
   FROM   pg_attribute
   WHERE  attrelid = _tbl
   AND    NOT attisdropped            -- no dropped columns
   AND    attnum > 0                  -- no system columns
   AND    attname LIKE _col_pattern;  -- only columns matching pattern

   -- RAISE NOTICE '%', _sql;         -- output SQL for debugging
   EXECUTE _sql;

   GET DIAGNOSTICS row_ct = ROW_COUNT;
END
$func$;

COMMENT ON FUNCTION f_update_cols(regclass, text)
IS 'Updates all columns of table _tbl ($1)
that match _col_pattern ($2) in a LIKE expression.
Returns the count of columns (col_ct) and rows (row_ct) affected.';

Call:

SELECT * FROM f_update_cols('myschema.tbl', 'foo%');

To make the function more practical, it returns information as described in the comment. More about obtaining the result status in plpgsql in the manual.

I use the variable _sql to hold the query string, so I can collect the number of columns found (col_ct) in the same query.

The object identifier type regclass is the most efficient way to automatically avoid SQL injection (and sanitize non-standard names) for the table name, too. You can use schema-qualified table names to avoid ambiguities. I would advise to do so if you (can) have multiple schemas in your db! See:

db<>fiddle here
Old sqlfiddle

Husted answered 2/5, 2013 at 23:0 Comment(0)
S
1

There's no handy shortcut sorry. If you have to do this kind of thing a lot, you could create a function to dynamically execute sql and achieve your goal.

CREATE OR REPLACE FUNCTION reset_cols() RETURNS boolean AS $$ BEGIN 
    EXECUTE (select 'UPDATE table SET ' 
                  || array_to_string(array(
                              select column_name::text 
                              from information_schema.columns 
                              where table_name = 'table' 
                              and column_name::text like 'contact_info_address_%'
                     ),' = NULL,') 
                  || ' = NULL'); 
    RETURN true; 
 END; $$ LANGUAGE plpgsql;

-- run the function
SELECT reset_cols();

It's not very nice though. A better function would be one that accepts the tablename and column prefix as args. Which I'll leave as an exercise for the readers :)

Sapowith answered 2/5, 2013 at 18:11 Comment(2)
I like the answer, if nothing better comes along i will mark it as accepted. (thanks for the info =) )Cortez
Careful, column names need to be escaped. This allows SQL injection.Husted

© 2022 - 2024 — McMap. All rights reserved.