I try to use this solution and this (for str_eval()
) but seems other encode or other UTF8's Normalization Form, perhaps combining diacritical marks...
select distinct logradouro, str_eval(logradouro)
from logradouro where logradouro like '%CECi%';
-- logradouro | str_eval
------------------------------+----------------------------
-- AV CECi\u008DLIA MEIRELLES | AV CECi\u008DLIA MEIRELLES
PROBLEM: how to select all rows of the table where the problem exists?
That is, where \u
occurs?
- not works with
like '%CECi\u%'
neitherlike '%CECi\\u%'
- works with
like E'%CECi\u008D%'
but is not generic
For Google, edited after solved question: this is a typical XY problem. In the original question (above) I used ~wrong hypothesis. All the solutions bellow are answers to the following (objective) question:
How to select only printable ASCII text?
"Printable ASCII" is a subset of UTF8, it is "all ASCII that is not a 'control character'".
The "non-printable" control characters are UNICODE hexadecimal 00 to 1F and 7F
(HTML entity �
to 
+ 
or decimal 0 to 31 + 127).
PS1: the zero (�
) is the "end of text" mark of PostgreSQL text datatype internal representation, so not need to be checked, but no problems to include it in the range.
PS2: about the secondary question "how to convert a word with encode bug to a valid word?",
see an heuristic at my answer.
\u....
but no regular or previsible code point. – Freudian