How to convert literal \u sequences into UTF-8? [duplicate]

About

How to convert literal \u sequences into UTF-8? [duplicate]

Asked 11/4, 2012 at 18:17 Answered 12/4, 2012 at 2:57

I am loading data dump from external source and some strings contain \uXXXX sequences for the UTF8 chars, like this one:

\u017D\u010F\u00E1r nad S\u00E1zavou

I can check the contents by using E'' constant in psql, but cannot find any function/operator to return me proper value.

I'd like to ask, if it's possible to convert this string with unicode escapes into normal UTF8 without using PL/pgSQL functions?

Harbour answered 11/4, 2012 at 18:17 Comment(9)

The \E'' literal format is not a psql client thing -- any SQL code running on the PostgreSQL server can use that syntax. You can put the string into a literal like E'\u017D\u010F\u00E1r nad S\u00E1zavou' to have it properly interpreted; what else would you like to do? – Karikaria 11/4, 2012 at 19:48

I want to be able to do something like: UPDATE table SET proper = somefunc('\u017D\u010F\u00E1r nad S\u00E1zavou') WHERE id=1;. And get the expected UTF8 string. – Harbour 11/4, 2012 at 19:54

If the database is using UTF8 encoding, you should be able to do something like: UPDATE table SET proper = E'\u017D\u010F\u00E1r nad S\u00E1zavou' WHERE id=1; What do you get when you run SHOW server_encoding;? How about SHOW client_encoding;? – Karikaria 11/4, 2012 at 20:45

Encoding is fine, UTF8. Problem is that I cannot use literal constants. I need a table-wide query, like: UPDATE table SET proper = somefunc(badtext); for a 8M row table. Copy-paste is not an option. – Harbour 11/4, 2012 at 21:4

Where is badtext coming from? What encoding is it in? Is this in a disk file where you could just run a encoding conversion utility on it? Depending on context, one of the convert_* functions in this table might possibly help: postgresql.org/docs/9.1/static/… – Karikaria 11/4, 2012 at 21:9

The mentioned \uXXXX sequences are not the result of the bad encoding conversion. They're just there, literally. – Harbour 11/4, 2012 at 22:1

I found and fixed a weakness in my answer. – Vikiviking 2/3, 2013 at 21:55

@CraigRinger, this question was asked 2 years ago, yet you've marked it as a duplicate for a quite recent one. Should be the other way around, don't you think so? – Harbour 15/5, 2014 at 9:37

It's not just age, it's also how searchable it is, how clear the question is, and how good the answer(s) are. In this case I'd prefer to merge them entirely, but SO doesn't provide for that. Note that there's nothing bad about something being marked a duplicate, it's not a statement that there's something wrong with the question, just that there are similar ones out there. – Littlefield 15/5, 2014 at 11:29

I don't think there is a built in method for that. Easiest way I can think of is the plpgsql function you wanted to avoid:

CREATE OR REPLACE FUNCTION str_eval(text, OUT t text)
  LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
BEGIN
   EXECUTE 'SELECT E''' || replace($1, '''', '''''') || ''''
   USING $1
   INTO t;
END
$func$;

The updated version safeguards against SQLi and is faster, too.

Vikiviking answered 12/4, 2012 at 2:57 Comment(4)

Hum... It is not working, no string change at SELECT E'Ceci\u008Dlia'. Expected Cecília. Same problem when using in select str_eval(x) from t(x). – Consternation 16/6, 2020 at 17:43

@PeterKrauss: E'Ceci\u00EDia' works for me. – Vikiviking 19/6, 2020 at 2:17

oops, sorry Erwin (and readers) my fast and non-checked comment. \u008D is not a diacrilic, it is a non-printable ASCII... Was a "clean bad encode" problem, solved here. – Consternation 20/6, 2020 at 8:52

Now a really important comment... About your 2012 parse-string function str_eval() and PostgreSQL project: why in nowadays (2020) there are no internal function for string unescape? The 3k pageviews here + 5k pageviews here shows that many programmers was looking for it (!)... Small continuous quality improvement needs small democracy, but there is no "user community demand" consensus mechanism to vote on small optimizations for pg library. – Consternation 20/6, 2020 at 8:56

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags