Combine two columns and add into one new column
Asked Answered
M

3

44

In PostgreSQL, I want to use an SQL statement to combine two columns and create a new column from them.

I'm thinking about using concat(...), but is there a better way?
What's the best way to do this?

Mojica answered 7/9, 2012 at 2:36 Comment(0)
A
104

Generally, I agree with @kgrittn's advice. Go for it.

But to address your basic question about concat(): it is useful if you need to deal with null values - and null has neither been ruled out in your question nor in the one you refer to.

If you can rule out null values, the good old (SQL standard) concatenation operator || is still the best choice, and @luis' answer is just fine:

SELECT col_a || col_b;

If either of your columns can be null, the result would be null in that case. You could defend with COALESCE:

SELECT COALESCE(col_a, '') || COALESCE(col_b, '');

But that gets tedious quickly with more arguments. That's where concat() comes in, which never returns null, not even if all arguments are null. The manual:

NULL arguments are ignored.

SELECT concat(col_a, col_b);

The remaining corner case for both alternatives is where all input columns are null in which case we still get an empty string ''. To get null instead:

SELECT CASE
          WHEN col_a IS NULL THEN col_b
          WHEN col_b IS NULL THEN col_a
          ELSE col_a || col_b
       END;

This gets more complex with more columns quickly. Again, use concat(), but add a check for the special condition:

SELECT CASE WHEN (col_a, col_b) IS NULL THEN NULL
            ELSE concat(col_a, col_b) END;

How does this work?
(col_a, col_b) is shorthand for ROW (col_a, col_b). And a row type is only null if all columns are null. Detailed explanation:

Also, use concat_ws() to add separators between elements (ws for "with separator").


An expression like the one in Kevin's answer:

SELECT $1.zipcode || ' - ' || $1.city || ', ' || $1.state;

is tedious to prepare for null values in PostgreSQL 8.3 (without concat()). One way (of many):

SELECT COALESCE(
         CASE
            WHEN $1.zipcode IS NULL THEN $1.city
            WHEN $1.city    IS NULL THEN $1.zipcode
            ELSE $1.zipcode || ' - ' || $1.city
         END, '')
       || COALESCE(', ' || $1.state, '');

Function volatility is only STABLE

concat() and concat_ws() are STABLE functions, not IMMUTABLE because they can invoke datatype output functions (like timestamptz_out) that depend on locale settings.
Explanation by Tom Lane.

This prohibits their direct use in index expressions. If you know that the result is actually immutable in your case, you can work around this with an IMMUTABLE function wrapper. Example here:

Acculturize answered 7/9, 2012 at 14:45 Comment(3)
Nice! I had not picked up on the different NULL-handling of the concat() function versus the SQL-standard || concatenation operator. (You notice I dodged the issue in my answer by using NOT NULL columns.) I'm now armed with one more tool in my toolbox.Neoprene
I hope I could choose multiple accepted answers. Thanks for the full explanation. All answers are appreciated!Mojica
wow, very nice explanation! extra thanks to ROW (col_a, col_b), I've never heard of it, and it is very handy for other uses tooThus
N
19

You don't need to store the column to reference it that way. Try this:

To set up:

CREATE TABLE tbl
  (zipcode text NOT NULL, city text NOT NULL, state text NOT NULL);
INSERT INTO tbl VALUES ('10954', 'Nanuet', 'NY');

We can see we have "the right stuff":

\pset border 2
SELECT * FROM tbl;
+---------+--------+-------+
| zipcode |  city  | state |
+---------+--------+-------+
| 10954   | Nanuet | NY    |
+---------+--------+-------+

Now add a function with the desired "column name" which takes the record type of the table as its only parameter:

CREATE FUNCTION combined(rec tbl)
  RETURNS text
  LANGUAGE SQL
AS $$
  SELECT $1.zipcode || ' - ' || $1.city || ', ' || $1.state;
$$;

This creates a function which can be used as if it were a column of the table, as long as the table name or alias is specified, like this:

SELECT *, tbl.combined FROM tbl;

Which displays like this:

+---------+--------+-------+--------------------+
| zipcode |  city  | state |      combined      |
+---------+--------+-------+--------------------+
| 10954   | Nanuet | NY    | 10954 - Nanuet, NY |
+---------+--------+-------+--------------------+

This works because PostgreSQL checks first for an actual column, but if one is not found, and the identifier is qualified with a relation name or alias, it looks for a function like the above, and runs it with the row as its argument, returning the result as if it were a column. You can even index on such a "generated column" if you want to do so.

Because you're not using extra space in each row for the duplicated data, or firing triggers on all inserts and updates, this can often be faster than the alternatives.

Neoprene answered 7/9, 2012 at 3:23 Comment(2)
"which takes the record type of the table as its only parameter" : how do we find that after creating a table?Libertarian
The record type for the table will generally have the same name as the table. See the example -- specifically that there is CREATE TABLE tbl and CREATE FUNCTION combined(rec tbl). For the definitive answer, find the reltype column of the pg_class row for the table, and use that to find the matching pg_type row.Neoprene
P
16

Did you check the string concatenation function? Something like:

update table_c set column_a = column_b || column_c 

should work. More here

Pioneer answered 7/9, 2012 at 2:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.