postgresql 11 - create database with ICU locale

Asked 5/4, 2020 at 20:8 Answered 20/3, 2024 at 11:48

Solved postgresql locale collation icu postgresql-11

I want to use an ICU system-insensitive sorting collation, to avoid sorting differences between postgres11-on-mac vs postgres11-on-Ubuntu. My first test was to dump out my existing Collate=en_US.UTF-8 and pg_restore them into a db created with Collate=en-US-x-icu

Create Database doc has this to say:

To create a database music with a different locale:

CREATE DATABASE music LC_COLLATE 'sv_SE.utf8' LC_CTYPE 'sv_SE.utf8' TEMPLATE template0;

I seem to have the required icu locales already:

select collname, collprovider from pg_collation where collname like 'en_US%';

        collname        | collprovider
------------------------+--------------
 en_US.UTF-8            | c
 en_US                  | c
 en_US.ISO8859-15       | c
 en_US.ISO8859-1        | c
 en_US                  | c
 en_US                  | c
 en-US-x-icu            | i 👈
 en-US-u-va-posix-x-icu | i 👈
(8 rows)

But no luck when creating a database with either icu locales.

CREATE DATABASE test LC_COLLATE = 'en-US-x-icu' TEMPLATE template0;
ksysdb=# CREATE DATABASE test LC_COLLATE = 'en-US-x-icu' TEMPLATE template0;
ERROR:  invalid locale name: "en-US-x-icu"

I can use `LC_COLLATE` with other locales:

The LC_COLLATE clause does seem to come with some strings attached, such as watching your encoding and specifying an appropriate template. But it seems to give error hints w non-ICU locales.

This works, for example: CREATE DATABASE test LC_COLLATE = 'en_US' TEMPLATE template0;

and this one gives a helpful user message:

ksysdb=# CREATE DATABASE test LC_COLLATE = 'en_US.ISO8859-15' TEMPLATE template0;
ERROR:  encoding "UTF8" does not match locale "en_US.ISO8859-15"
DETAIL:  The chosen LC_COLLATE setting requires encoding "LATIN9".

Note: a related question, PostgreSQL 10 on Linux - LC_COLLATE locale en_US.utf-8 not valid, doesn't seem all that relevant, as the answer talks about generating an OS-level locale to fix the issue. While the ICU locales, as far as I understand, are expressly intended to be separated from the underlying OS.

Roup answered 5/4, 2020 at 20:8 Comment(8)

Currently you can't use ICU locales as the default locale – Lethbridge 5/4, 2020 at 20:15

so.... specify it via ALTER COLUMN only? or at CREATE TABLE time? And is this also true on postgresql12, even though the question is for 11 right now? – Roup 5/4, 2020 at 20:16

Yes, that's correct – Lethbridge 5/4, 2020 at 20:16

if you write it up and link to some postgresql doc, so that people can see when/if the restriction is lifted, I'll accept. – Roup 5/4, 2020 at 20:17

The only documentation is the source: check_locale in src/backend/utils/adt/pg_locale.c uses setlocale to check if the locale name is valid. That is a C library function, consequently it will only work with C library collations. – Amand 6/4, 2020 at 6:40

This seems to be a feature of the commercial 'EDB Postgres Advanced Server'. It seems you can set default ICU collations per database like CREATE DATABASE collation_db TEMPLATE template0 ENCODING 'UTF8' ICU_SHORT_FORM = 'AN_CU_EX_NX_LROOT'; and even per cluster via initdb --icu-short-form. Nice examples in the old docs. – Stepaniestepbrother 26/11, 2020 at 16:4

Over at stackoverflow.com/questions/59168841 a nice conversion script is given to convert all text columns to icu collations. – Stepaniestepbrother 26/11, 2020 at 16:9

Is this still true for version 15? – Sonny 16/2, 2023 at 16:33

This is possible from PostgreSQL v15 on:

CREATE DATABASE test
   LOCALE_PROVIDER icu
   ICU_LOCALE "en-US"
   LOCALE "en_US.utf8"
   TEMPLATE template0;

You still need to provide the libc locale, but the ICU library is used for collations.

Amand answered 15/5, 2023 at 12:21 Comment(7)

This doesn't work in PG16. I guess you should skip LOCALE completely. The docs "locale [...] Can be overridden by setting lc_collate, lc_ctype, or icu_locale individually." postgresql.org/docs/16/sql-createdatabase.html – Strabismus 18/3, 2024 at 14:57

@Strabismus Oh yes, that works in PostgreSQL v16. But perhaps your PostgreSQL is built without ICU support, or the C library locales have a different name on your operating system. – Amand 18/3, 2024 at 15:35

I have a fresh install of PG 16.2 on Ubuntu and it doesn't work. It does work when I remove LOCALE "en_US.utf8". – Strabismus 18/3, 2024 at 17:24

@Strabismus As I said, that C library locale does not exist on your system. Run locale -a in your shell. – Amand 18/3, 2024 at 18:50

dpkg says otherwise (dpkg -l | grep libicu). I have libicu66:amd64 "International Components for Unicode". If I use bogus ICU_LOCALE it fails, but this works fine:

CREATE DATABASE "test_mlnet1"  WITH OWNER = molnet  LOCALE_PROVIDER icu ICU_LOCALE "pl-PL"  TABLESPACE = pg_default TEMPLATE template0 ;

– Strabismus 20/3, 2024 at 10:54

@Strabismus Then look at \l and see what C library locale you got. – Amand 20/3, 2024 at 11:12

I did. You might want to check my answer. ICU should not depend on the system. That is the whole point AFAIK. And so far it seem to work as advertised :) – Strabismus 20/3, 2024 at 11:50

PG15 and later

From PG16 (at least in version 16.2) you might want to skip "LOCALE" all together.

This will work fine and should be enough:

CREATE DATABASE temp_test
LOCALE_PROVIDER icu ICU_LOCALE "pl-PL" TEMPLATE template0;

This is how you can test that the ICU_LOCALE actually works and override collation:

DROP TABLE IF EXISTS table1;
CREATE TABLE table1 (
    id SERIAL PRIMARY KEY,
    text_column VARCHAR(255) NOT NULL
);
INSERT INTO table1 (text_column) VALUES ('Zażółć');
INSERT INTO table1 (text_column) VALUES ('Abecki');
INSERT INTO table1 (text_column) VALUES ('Ącki');
INSERT INTO table1 (text_column) VALUES ('definicja');
INSERT INTO table1 (text_column) VALUES ('Żółć');
INSERT INTO table1 (text_column) VALUES ('<nieokreślony>');


SELECT * FROM table1 ORDER BY text_column;

Expected order (notice how national characters are mixed in):

<nieokreślony>
Abecki
Ącki
definicja
Zażółć
Żółć

Note that - quite confusingly - a database listing might show a different collation. But the test above shows that the collation is in fact provided by ICU (the listing is from PGv16.2, this might change in future).

PG10 up to PG15

I should mention that for PG version 10 (and later) this would work, but only works on Windows:

CREATE DATABASE "temp_test"
    ENCODING 'UTF8'
    LC_COLLATE 'pl-PL-x-icu'
    LC_CTYPE 'pl-PL-x-icu'
    TEMPLATE template0
;

On Linux and Mac OS you would have to use the COLLATE keyword on each column separately:

CREATE TABLE table1 (
    id SERIAL PRIMARY KEY,
    text_column VARCHAR(255) COLLATE "pl-PL-x-icu" NOT NULL
);

So updating to Postgres version 15 or later is advised if you want to have a consistent sorting on each operating system.

Strabismus answered 20/3, 2024 at 11:48 Comment(5)

I don't know if that's so great, because you ended up with the default C library locale from the shell, which happened to be British English. So you would get for example an English decimal separator for to_char(12.34, '99D99'), which may not be what you want. It would have been better to explicitly specify "pl_PL.UTF-8" for LOCALE (note the spelling). – Amand 20/3, 2024 at 12:36

The spelling of utf8 is not problem. It still fails: ERROR: invalid LC_COLLATE locale name: "pl_PL.UTF-8"; HINT: If the locale name is specific to ICU, use ICU_LOCALE. – Strabismus 20/3, 2024 at 14:51

Since you are on Debian/Ubuntu: sudo locale-gen pl_PL.utf8. Then restart PostgreSQL and run SELECT pg_import_system_collations('pg_catalog');. Then you'll have the locale. – Amand 20/3, 2024 at 15:17

The app I develop is multilingual and formatting is done in the application, but that does seem to work. Kind of weird that formatting still relays on system components in PG. – Strabismus 20/3, 2024 at 16:43

Open source projects tend not to reinvent the wheel. We write a database server, and the resources are limited. – Amand 21/3, 2024 at 6:50

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

I seem to have the required icu locales already:

I can use LC_COLLATE with other locales:

PG15 and later

PG10 up to PG15

Recommended topics

Hot tags

I can use `LC_COLLATE` with other locales: