unaccent() does not work with Greek letters in plpgsql dynamic query

Asked 15/4, 2018 at 17:54 Answered 16/4, 2018 at 3:32

Solved postgresql plpgsql dynamic-sql accent-insensitive unaccent

I use PostgreSQL 10 and I run CREATE EXTENSION unaccent; succesfully. I have a plgsql function that contains the following whereText := 'lower(unaccent(place.name)) LIKE lower(unaccent($1))';

later, according to what user chose, more clauses may be added to the whereText.

The whereText is finally used in the query:

placewithkeys := '%'||placename||'%';
RETURN QUERY EXECUTE format('SELECT id, name FROM '||fromText||' WHERE '||whereText)
     USING  placewithkeys , event, date;

The whereText := 'LOWER(unaccent(place.name)) LIKE LOWER(unaccent($1))'; does not work, even if I remove the LOWER part.

I do select __my_function('Τζι'); and I get nothing back, even though I should get back results, because in the database there is the name Τζίμα

If I remove the unaccent and leave the LOWER it works, but not for accents : τζ brings Τζίμα back as it should. It seems like the unaccent is causing a problem.

What am I missing? How can I fix this?

Since there were comments about the syntax and possible SQLi , I provide the whole function definition, now changed to work accent-insensitive and case-insensitive in Greek:

CREATE  FUNCTION __a_search_place
(placename text, eventtype integer, eventdate integer, eventcentury integer, constructiondate integer, constructioncentury integer, arstyle integer, artype integer)
RETURNS TABLE
(place_id bigint, place_name text, place_geom geometry) 
AS $$
DECLARE
selectText text;
fromText text;
whereText text;
usingText text; 
placewithkeys text;
BEGIN   
    fromText := '
    place
    JOIN cep ON place.id = cep.place_id
    JOIN event ON cep.event_id = event.id                     
    ';  
    whereText := 'unaccent(place.name) iLIKE  unaccent($1)';   
    placewithkeys := '%'||placename||'%';
    IF constructiondate IS NOT NULL OR constructioncentury IS NOT NULL OR arstyle IS NOT NULL OR artype IS NOT NULL THEN
        fromText := fromText || '
        JOIN construction ON cep.construction_id = construction.id
        JOIN construction_atype ON construction.id = construction_atype.construction_id
        JOIN construction_astyle ON construction.id = construction_astyle.construction_id
        JOIN atype ON atype.id = construction_atype.atype_id
        JOIN astyle ON astyle.id = construction_astyle.astyle_id  
        ';   
    END IF;    
    IF eventtype IS NOT NULL THEN
        whereText := whereText || 'AND event.type = $2 ';
    END IF;
    IF eventdate IS NOT NULL THEN
        whereText := whereText || 'AND event.date = $3 ';
    END IF;
    IF eventcentury IS NOT NULL THEN
        whereText := whereText || 'AND event.century = $4 ';
    END IF;    
    IF constructiondate IS NOT NULL THEN
        whereText := whereText || 'AND construction.date = $5 ';
    END IF;
    IF constructioncentury IS NOT NULL THEN
        whereText := whereText || 'AND construction.century = $6 ';
    END IF;
    IF arstyle IS NOT NULL THEN
        whereText := whereText || 'AND astyle.id = $7 ';
    END IF;
    IF artype IS NOT NULL THEN
        whereText := whereText || 'AND atype.id = $8 ';
    END IF;   
    whereText := whereText || '    
    GROUP BY place.id, place.geom, place.name
    ';    

    RETURN QUERY EXECUTE format('SELECT place.id, place.name, place.geom FROM '||fromText||' WHERE '||whereText)      
    USING  placewithkeys, eventtype, eventdate, eventcentury, constructiondate, constructioncentury, arstyle, artype ;

END;
$$
LANGUAGE plpgsql;

Phonation answered 15/4, 2018 at 17:54 Comment(8)

hi slevin: which language is that? Greek? since I clearly don't speak it, could you provide a list of words with accent and how they should look like without accent? With German it looks like it works fine select lower(unaccent('MÜNSTER')) like lower(unaccent('mÜNsTer')); – Manlove 15/4, 2018 at 18:59

attention: your code is sql injection vulnerable – Denazify 16/4, 2018 at 5:5

@JimJones Yes, this is Greek. In general all the vowels are having a simple mark that denotes tone/accent. So χαρτί, ήλιος, ξύλο, βάση have an accent , while χαρτι, ηλιος, ξυλο, βαση they dont. Notice a simple mark missing in ι, η, υ, α. How is this helping the solution tho? – Phonation 16/4, 2018 at 10:15

@PavelStehule Because of this part placewithkeys := '%'||placename||'%'; I assume ? Please elaborate. Thanks. – Phonation 16/4, 2018 at 10:17

I believe the issue is rather that unaccent does not support Greek. I asked for the examples so that I could test, in case it was another language. – Manlove 16/4, 2018 at 10:18

@PavelStehule: I don't think the code fragments are vulnerable to SQLi. I added some more to my answer. – Nankeen 16/4, 2018 at 14:9

@ErwinBrandstetter: hard to say - nobody knows what is under fromText, and how fromText was created. – Denazify 16/4, 2018 at 16:34

@PavelStehule Thanks for the remarks, check the whole code in my original question. – Phonation 17/4, 2018 at 10:54

Postgres 12

unaccent() now works for Greek letters, too. Diacritic signs are removed:

db<>fiddle here

Quoting the release notes:

Allow unaccent to remove accents from Greek characters (Tasos Maschalidis)

Postgres 11 or older

unaccent() does not yet work for Greek letters. The call:

SELECT unaccent('
ἀ ἁ ἂ ἃ ἄ ἅ ἆ ἇ Ἀ Ἁ Ἂ Ἃ Ἄ Ἅ Ἆ Ἇ
ἐ ἑ ἒ ἓ ἔ ἕ         Ἐ Ἑ Ἒ Ἓ Ἔ Ἕ     
ἠ ἡ ἢ ἣ ἤ ἥ ἦ ἧ Ἠ Ἡ Ἢ Ἣ Ἤ Ἥ Ἦ Ἧ
ἰ ἱ ἲ ἳ ἴ ἵ ἶ ἷ Ἰ Ἱ Ἲ Ἳ Ἴ Ἵ Ἶ Ἷ
ὀ ὁ ὂ ὃ ὄ ὅ         Ὀ Ὁ Ὂ Ὃ Ὄ Ὅ     
ὐ ὑ ὒ ὓ ὔ ὕ ὖ ὗ     Ὑ   Ὓ   Ὕ   Ὗ
ὠ ὡ ὢ ὣ ὤ ὥ ὦ ὧ Ὠ Ὡ Ὢ Ὣ Ὤ Ὥ Ὦ Ὧ
ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ     
ᾀ ᾁ ᾂ ᾃ ᾄ ᾅ ᾆ ᾇ ᾈ ᾉ ᾊ ᾋ ᾌ ᾍ ᾎ ᾏ
ᾐ ᾑ ᾒ ᾓ ᾔ ᾕ ᾖ ᾗ ᾘ ᾙ ᾚ ᾛ ᾜ ᾝ ᾞ ᾟ
ᾠ ᾡ ᾢ ᾣ ᾤ ᾥ ᾦ ᾧ ᾨ ᾩ ᾪ ᾫ ᾬ ᾭ ᾮ ᾯ
ᾰ ᾱ ᾲ ᾳ ᾴ   ᾶ ᾷ Ᾰ Ᾱ Ὰ Ά ᾼ ᾽ ι ᾿
῀ ῁ ῂ ῃ ῄ   ῆ ῇ Ὲ Έ Ὴ Ή ῌ ῍ ῎ ῏
ῐ ῑ ῒ ΐ         ῖ ῗ Ῐ Ῑ Ὶ Ί     ῝ ῞ ῟
ῠ ῡ ῢ ΰ ῤ ῥ ῦ ῧ Ῠ Ῡ Ὺ Ύ Ῥ ῭ ΅ `
        ῲ ῳ ῴ   ῶ ῷ Ὸ Ό Ὼ Ώ ῼ ´ ῾ ');

... returns all letters unchanged, no diacritic signs removed as we would expect.
(I extracted this list from the Wikipedia page on Greek diacritics.)

db<>fiddle here

Looks like a shortcoming of the unaccent module. You can extend the default unaccent dictionary or create your own. There are instructions in the manual. I created several dictionaries in the past and it's simple. And you are not to first to need this:

Postgres unaccent rules for greek characters:

https://gist.github.com/jfragoulis/9914900

Unaccent rules plus greek characters for Postgres 9.6:

https://gist.github.com/marinoszak/7d5d6a8670faae0f4589c2da988f2ba3

You need write access to the file system of the server, though - the directory containing the unaccent files. So, not possible on most cloud services ...

Or you might report a bug and ask to include Greek diacritic signs.

Aside: Dyamic SQL and SQLi

The code fragments you presented are not vulnerable to SQL injection. $1 is concatenated as literal string and only resolved in the EXECUTE command later, where the value is safely passed with the USING clause. So, no unsafe concatenation there. I would do it like this, though:

RETURN QUERY EXECUTE format(
   $q$
   SELECT id, name
   FROM   place ... 
   WHERE  lower(unaccent(place.name)) LIKE '%' || lower(unaccent($1)) || '%'
   $q$
   )
USING  placename, event, date;

Notes:

Less confusing - your original even confused Pavel in the comments, a professional in the field.
Assignments in plpgsql are slightly expensive (more so than in other PL), so adopt a coding style with few assignments.
Concatenate the two % symbols for LIKE into the main query directly, giving the query planner the information that the pattern is not anchored to start or end, which may help a more efficient plan. Only the user input is (safely) passed as variable.
Since your WHERE clause references table place, The FROM clause needs to include this table anyway. So you cannot concatenate the FROM clause independently to begin with. Probably better to keep it all in a single format().
Use dollar-quoting so you don't have to escape single quotes additionally.
- Insert text with single quotes in PostgreSQL
- What are '$$' used for in PL/pgSQL
Maybe just use ILIKE instead of lower(...) LIKE lower(...). If you work with trigram indexes (like would seem best for this query): those work with ILIKE as well:
- LOWER LIKE vs iLIKE
I assume you are aware that you may need to escape characters with special meanings in LIKE pattern?
- How to escape string while matching pattern in PostgreSQL
- Escape function for regular expression or LIKE patterns

Audited function

After you provided your complete function ...

CREATE OR REPLACE FUNCTION __a_search_place(
        placename             text
      , eventtype             int = NULL
      , eventdate             int = NULL
      , eventcentury          int = NULL
      , constructiondate      int = NULL
      , constructioncentury   int = NULL
      , arstyle               int = NULL
      , artype                int = NULL)
  RETURNS TABLE(place_id bigint, place_name text, place_geom geometry) AS
$func$
BEGIN
   -- RAISE NOTICE '%', concat_ws(E'\n' -- to debug
   RETURN QUERY EXECUTE concat_ws(E'\n'
 ,'SELECT p.id, p.name, p.geom
   FROM   place p
   WHERE  unaccent(p.name) ILIKE (''%'' || unaccent($1) || ''%'')'  -- no $-quotes
              -- any input besides placename ($1)
, CASE WHEN NOT ($2,$3,$4,$5,$6,$7,$8) IS NULL THEN
  'AND    EXISTS (
      SELECT
      FROM   cep
      JOIN   event e ON e.id = cep.event_id' END
               -- constructiondate, constructioncentury, arstyle, artype
 , CASE WHEN NOT ($5,$6,$7,$8) IS NULL THEN

     'JOIN   construction    con ON cep.construction_id = con.id
      JOIN   construction_atype  ON con.id = construction_atype.construction_id
      JOIN   construction_astyle ON con.id = construction_astyle.construction_id' END
              -- arstyle, artype
, CASE WHEN NOT ($7,$8) IS NULL THEN
     'JOIN   atype               ON atype.id = construction_atype.atype_id
      JOIN   astyle              ON astyle.id = construction_astyle.astyle_id' END
 , CASE WHEN NOT ($2,$3,$4,$5,$6,$7,$8) IS NULL THEN
     'WHERE  cep.place_id = p.id' END
 , CASE WHEN eventtype           IS NOT NULL THEN 'AND e.type = $2'      END
 , CASE WHEN eventdate           IS NOT NULL THEN 'AND e.date = $3'      END
 , CASE WHEN eventcentury        IS NOT NULL THEN 'AND e.century = $4'   END
 , CASE WHEN constructiondate    IS NOT NULL THEN 'AND con.date = $5'    END
 , CASE WHEN constructioncentury IS NOT NULL THEN 'AND con.century = $6' END
 , CASE WHEN arstyle             IS NOT NULL THEN 'AND astyle.id = $7'   END
 , CASE WHEN artype              IS NOT NULL THEN 'AND atype.id = $8'    END
 , CASE WHEN NOT ($2,$3,$4,$5,$6,$7,$8) IS NULL THEN
     ')' END
   );
   USING  placename
        , eventtype
        , eventdate
        , eventcentury
        , constructiondate
        , constructioncentury
        , arstyle
        , artype;
END
$func$  LANGUAGE plpgsql;

This is a complete rewrite with several improvements. Should make the function considerably. Also SQLi-safe (like your original). Should be functionally identical except the cases where I join fewer tables, which might not filter rows that are filtered by joining to the tables alone.

Major features:

Use EXISTS() instead of lots of joins in the outer level plus GROUP BY. This contributes the lion share to the better performance. Related:
- Search a JSON array for an object containing a value matching a pattern
format() is typically a good choice to concatenate SQL from user input. But since you encapsulated all code elements and only pass flags, you don't need it in this case. Instead, concat_ws() is of help. Related:
- How to concatenate columns in a Postgres SELECT?
Only concatenate JOINs you actually need.
Fewer assignments, shorter code.
Default values for parameters. Allows simplified call with missing parameters. Like:
```
SELECT __a_search_place('foo', 2, 3, 4);
SELECT __a_search_place('foo');
```
Related:
- Optional argument in PL/pgSQL function
About the short ROW() syntax for testing whether any value is NOT NULL:
- Why is IS NOT NULL false when checking a row type?

Nankeen answered 16/4, 2018 at 3:32 Comment(6)

Thanks for your answer and the general comments. I put the accent/unaccent characters in the unaccent rules and it work. I dont see a need to escape characters in LIKE. As for your other remarks, check my edited original question. – Phonation 17/4, 2018 at 10:48

Oh! For the lower(unaccent('%' || $1 || '%')) part, I keep getting syntax errors like

ERROR:  operator is not unique: unknown % unknown LINE 1: ...OWER (unaccent(place.name)) LIKE LOWER (unaccent('%' || $1 |...

This is why I use the placewithkeys := '%'||placename||'%'; solution – Phonation 17/4, 2018 at 10:51

@slevin: You shouldn't get syntax errors. I guess your single quotes are in disarray. That's why I suggested dollar-quoting to simplify. While being at it, I improved some more: '%' || lower(unaccent($1)) || '%' is slightly better than lower(unaccent('%' || $1 || '%')). – Nankeen 17/4, 2018 at 12:38

@slevin: Consider the audited function I added. – Nankeen 17/4, 2018 at 14:50

Hi again, took me a while to check what is going on and google some details, but I finally got what you did in the rewrite -and the mindset helped me syntax other functions that are also used for search and combine different criteria. Your logic is great, more structured than my solution and more slick handling the various cases. Nice choice to use concat_ws. Overall, I also think that this version is safe , since there is the USING. Thank you very much. I appreciate your effort and your ability to share knowledge. – Phonation 20/4, 2018 at 13:6

I submitted a bug report as this is causing frustration and forces us to come up with workarounds for something that should be very simple to improve from the postgres side (just adding those characters in the official dictionary). Anyone that needs this, could also submit another bug report in hopes this gets noticed and solved quickly, so we and other developers can save time spent on finding workarounds in the future. – Mucoprotein 22/8, 2018 at 15:11

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Postgres 12

Postgres 11 or older

Aside: Dyamic SQL and SQLi

Audited function

Recommended topics

Hot tags