How to replace multiple special characters in Postgres 9.5
Asked Answered
C

6

16

I have a table containing a list of name which might contain special character:

id   name
1    Johän
2    Jürgen
3    Janna
4    Üdyr
...

Is there a function that replaces each character for another specific one? (Not necessarily an unaccented one). Something like this:

SELECT id, function('ä,ü',name,'ae,ue');
Result:

    id   name
    1    Johaen
    2    Juergen
    3    Janna
    4    UEdyr
    ...
Colloid answered 27/7, 2016 at 16:57 Comment(3)
Possible duplicate of Multiple split and assign order_id CHECK the TRANSLATE part. How convert every char to <space>Hickson
@JuanCarlosOropeza this is little bit different task - you cannot use translate because it is designed for single characters only.Dariadarian
I had a similar problem and what I did was just use replace twice. Though I don’t know if that’s the best / efficient way to go about it. replace(replace(column,‘ä’,’ae’), ‘ü’,’ue’)Rumsey
D
11

No, there are no this function. Probably is not to hard to write optimized C extension what does it. But C language is not necessary always. You can try SQL or PLpgSQL function:

CREATE OR REPLACE FUNCTION xx(text, text[], text[])
RETURNS text AS $$
   SELECT string_agg(coalesce($3[array_position($2, c)],c),'')
      FROM regexp_split_to_table($1,'') g(c)
$$ LANGUAGE sql;

postgres=# select xx('Jürgen', ARRAY['ä','ü'], ARRAY['ae','ue']);
┌─────────┐
│   xx    │
╞═════════╡
│ Juergen │
└─────────┘
(1 row)

On my comp it does 6000 transformation under 200ms (but I have developer build of PostgreSQL - it is slower).

Dariadarian answered 27/7, 2016 at 17:39 Comment(6)
I wondering what you mean developer build, you mean production build will be faster? If that is the case can you point me to what check to improve the build speed.Hickson
just in case: array_position is 9.5+ not available 9.4 But yes, is fast. 12000 random name. With ARRAY['a','e'], ARRAY['ae','ue'] took 300 msHickson
The question's subject is related to PostgreSQL 9.5, so I used 9.5 function. I have developer build with enabled assertions - there are much more memory checks, etc. When you use Postgres from rpm, deb, ... then asserts should be disabled. The check: show debug_assertions;Dariadarian
Good job. Note however that the function works well if both encoding and client_encoding are set to UTF8.Warfourd
@PavelStehule Is my fault didnt saw the 9.5 on the title.Hickson
Well. you can define array_position for earlier versions than 9.5. Here it is: create or replace function array_pos(arr anyarray, arg_x anyelement) returns integer language sql immutable strict as $function$ with t as ( select x, row_number() over () as pos from (select unnest (arr)) t(x) ) select pos::integer from t where x = arg_x; $function$;Sausa
B
22

replace()

If you want just to replace one or few characters you can use function replace(string text, from text, to text) that replaces all occurrences in string substring. The replace function can be used to replace one character to several characters.

translate()

If you want to translate some letters to other letters you can user function translate(string text, from text, to text) that replaces any character in a string that matches a character in the from by the corresponding character in the to set.

Some data to play with:

drop table if exists xyz;

create table xyz (
    id serial not null,
    name varchar(30)
);

insert into xyz (name) values
    ('Juhänäo'),
    ('Jürgüen'),
    ('Dannäu'),
    ('Übüdyr');

Example of replace function:

select replace(name, 'ä', 'a') from xyz;

This function replaces letter ä in the name column with letter a. Juhänäo becomes Juhanao.

select replace(name, 'ä', 'ae') from xyz;

Now it replaces letter ä with ae.

select replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue') from xyz;

Not very nice, but in the example all ä become ae, ü become ue, and Ü become 'Ue'.

update xyz set name = replace(replace(replace(name, 'ä', 'ae'), 'ü', 'ue'), 'Ü', 'Ue');

Changes letters and updates rows. The result of the update is following:

Juhaenaeo
Juergueen
Dannaeu
Uebuedyr

Example of translate function:

select translate(name, 'ä,ü,Ü', 'a,u,U') from xyz;

Translates all letters ä to a, ü to u and Ü to U.

update xyz set name = translate(name, 'ä,ü,Ü', 'a,u,U');

Updates table so all predefined letters are translated and the change is saved to the database. The result of the update is following:

Juhanao
Jurguen
Dannau
Ubudyr

More information:

Replace characters with multi-character strings

Postgresql string functions

Bosson answered 27/7, 2016 at 18:58 Comment(1)
The commas in translate are superfluous; the correct syntax according to the docs would be translate(name, 'äüÜ', 'auU');.Fax
D
11

No, there are no this function. Probably is not to hard to write optimized C extension what does it. But C language is not necessary always. You can try SQL or PLpgSQL function:

CREATE OR REPLACE FUNCTION xx(text, text[], text[])
RETURNS text AS $$
   SELECT string_agg(coalesce($3[array_position($2, c)],c),'')
      FROM regexp_split_to_table($1,'') g(c)
$$ LANGUAGE sql;

postgres=# select xx('Jürgen', ARRAY['ä','ü'], ARRAY['ae','ue']);
┌─────────┐
│   xx    │
╞═════════╡
│ Juergen │
└─────────┘
(1 row)

On my comp it does 6000 transformation under 200ms (but I have developer build of PostgreSQL - it is slower).

Dariadarian answered 27/7, 2016 at 17:39 Comment(6)
I wondering what you mean developer build, you mean production build will be faster? If that is the case can you point me to what check to improve the build speed.Hickson
just in case: array_position is 9.5+ not available 9.4 But yes, is fast. 12000 random name. With ARRAY['a','e'], ARRAY['ae','ue'] took 300 msHickson
The question's subject is related to PostgreSQL 9.5, so I used 9.5 function. I have developer build with enabled assertions - there are much more memory checks, etc. When you use Postgres from rpm, deb, ... then asserts should be disabled. The check: show debug_assertions;Dariadarian
Good job. Note however that the function works well if both encoding and client_encoding are set to UTF8.Warfourd
@PavelStehule Is my fault didnt saw the 9.5 on the title.Hickson
Well. you can define array_position for earlier versions than 9.5. Here it is: create or replace function array_pos(arr anyarray, arg_x anyelement) returns integer language sql immutable strict as $function$ with t as ( select x, row_number() over () as pos from (select unnest (arr)) t(x) ) select pos::integer from t where x = arg_x; $function$;Sausa
F
1

If you are after German letters, then this works:

CREATE OR REPLACE FUNCTION public.udf_transliterate_german(
    german_word character varying)
    RETURNS character varying
    LANGUAGE 'sql'
    COST 100
    IMMUTABLE PARALLEL UNSAFE
AS $BODY$
SELECT REPLACE(
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(german_word,
                                'ä','ae'),
                            'ö','oe' ),
                        'ü','ue'),
                    'ß','ss'),
                'Ä', 'AE'),
            'Ö', 'OE'),
        'Ü', 'UE'),
    'ẞ', 'SS');
$BODY$;

It is not elegant though.

Francescafrancesco answered 19/7, 2021 at 14:4 Comment(0)
C
1

Avoid writing your own unaccent function. Instead, I recommend using unaccent extension.

create extension unaccent;
select unaccent('Juhänäo');
select unaccent(name) from xyz;

Here is a link to Postgres documentation for Unaccent

Castalia answered 19/10, 2021 at 12:14 Comment(0)
F
1

you simply use this :

select  translate(column, 'âàãáéêèíóôõüúç', 'aaaaeeeiooouuc') from table

Good luck ^^

Formic answered 10/11, 2022 at 20:8 Comment(0)
M
0

This is a generic recursive function written in python (but can easily be replicated in whatever language you prefer) That takes in the original string and a list of substrings to be removed:

def replace_psql_string(str_to_remove, query):

    ss = str_to_remove.split(',')
    if ss[0] != '':
        query = "REGEXP_REPLACE('{0}', '{1}', '', 'gi')".format(query, ss[0])
        return self.replace_psql_string(','.join(ss[1:]), query)
    else:
        return query

# Run it
replace_psql_string("test,foo", "input test string")

The code splits the comma separated string to a list of substrings to remove, Creates a REGEXP_REPLACE function in sql with the first substring. Then, recursively calls the function again with [1:] elements and append each REGEXP query to the passed parameter. So that finally, when no more substrings are in the list to remove, the final aggregated query is returned.

This will output:

REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi')

Then use it in whatever part of your query that makes sense. Can be tested by appending a SELECT in front:

SELECT REGEXP_REPLACE(REGEXP_REPLACE('input test string, 'test', '', 'gi'), 'foo', '', 'gi');


regexp_replace 
----------------
 input  string
(1 row)
Mitinger answered 11/12, 2022 at 11:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.