invalid byte sequence for encoding "UTF8"
I'm trying to import some data into my database. So I've created a temporary table,

create temporary table tmp(pc varchar(10), lat decimal(18,12), lon decimal(18,12), city varchar(100), prov varchar(2));

And now I'm trying to import the data,

 copy tmp from '/home/mark/Desktop/Canada.csv' delimiter ',' csv

But then I get the error,

ERROR:  invalid byte sequence for encoding "UTF8": 0xc92c

How do I fix that? Do I need to change the encoding of my entire database (if so, how?) or can I change just the encoding of my tmp table? Or should I attempt to change the encoding of the file?

Potentilla answered 1/2, 2011 at 19:51 Comment(2)
Change the encoding option on import. I set mine to "Windows-1251" and it worked without complaint. – Intensifier
Thanks @BrianD, I was facing this issue as well and this worked for me. – Spinks
E
134

If you need to store UTF8 data in your database, you need a database that accepts UTF8. You can check the encoding of your database in pgAdmin. Just right-click the database, and select "Properties".

But that error seems to be telling you there's some invalid UTF8 data in your source file. That means that the copy utility has detected or guessed that you're feeding it a UTF8 file.

If you're running under some variant of Unix, you can check the encoding (more or less) with the file utility.

$ file yourfilename
yourfilename: UTF-8 Unicode English text

(I think that will work on Macs in the terminal, too.) Not sure how to do that under Windows.

If you use that same utility on a file that came from a Windows system (that is, a file that's not encoded in UTF8), it will probably show something like this:

$ file yourfilename
yourfilename: ASCII text, with CRLF line terminators

If things stay weird, you might try to convert your input data to a known encoding, to change your client's encoding, or both. (We're really stretching the limits of my knowledge about encodings.)
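Before converting anything, it can help to locate the bad byte. A minimal Python sketch (not part of the original answer; the sample data imitates the question's CSV) uses the fact that a failed decode reports the exact offset of the first invalid sequence:

```python
# Report the byte offset of the first invalid UTF-8 sequence in a file.
def first_invalid_utf8_offset(path):
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return None  # the whole file is valid UTF-8
    except UnicodeDecodeError as exc:
        return exc.start  # byte offset of the offending sequence

# Demonstration on in-memory bytes: 0xC9 followed by ',' is the 0xc92c
# sequence from the error message (0xC9 is 'É' in Latin-1, invalid here).
sample = b"pc,lat,lon\n" + b"\xc9,45.0,-73.0\n"
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start)  # 11: the stray 0xC9 byte right after the header line
```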

You can use the iconv utility to change encoding of the input data.

iconv -f original_charset -t utf-8 originalfile > newfile
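If iconv isn't handy, the same conversion can be sketched with Python's standard library (assuming here that the source is Latin-1; substitute whatever encoding the file actually uses):

```python
# Re-encode a file from Latin-1 (ISO-8859-1) to UTF-8, roughly equivalent to:
#   iconv -f latin1 -t utf-8 originalfile > newfile
def reencode(src, dst, src_encoding="latin-1"):
    with open(src, "r", encoding=src_encoding) as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line)

# Every byte value is defined in Latin-1, so decoding cannot fail; e.g. the
# byte 0xC9 from the error message decodes to 'É' and re-encodes as 0xC3 0x89.
assert b"\xc9".decode("latin-1") == "\u00c9"
assert "\u00c9".encode("utf-8") == b"\xc3\x89"
```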

You can change psql (the client) encoding following the instructions on Character Set Support. On that page, search for the phrase "To enable automatic character set conversion".

Eddings answered 1/2, 2011 at 20:36 Comment(10)
Says the file is ASCII, but it contains accented characters, so that must be wrong? – Potentilla
@Mark: Well, either PostgreSQL copy is guessing wrong, or the file utility is guessing wrong. For a Windows file that has accents, file tells me "ISO-8859 text, with CRLF line terminators". The docs for copy say "Input data is interpreted according to the current client encoding, . . . even if the data does not pass through the client but is read from or written to a file directly by the server." Hmmm. I think I know how to change that. – Smocking
I'm not sure whether your best bet is to try to convert the data to a known encoding, to change the client encoding during the import, or both. I'll edit my answer. – Smocking
Will accept this answer, but I think the problem was actually with the data (updated Q). – Potentilla
@MikeSherrill'Catcall' I'm facing a similar problem, as my input file is a *.tsv in UTF-16BE encoding format. Do you know any tool that can help? – Amon
If it really is UTF-16BE, and you're still getting invalid byte sequence errors, you probably have bad data. But you can try converting the tsv file to the same encoding your shell uses. (Use iconv.) – Smocking
I found this helpful, thanks. By the way, it runs on OS X terminals as well. – Warbeck
file only looks at the first two kilobytes of the file. In any case, it's impossible to verify UTF-8 by inspection (except for files that contain only ASCII characters); file is just guessing, which is all it ever does when inspecting content. – Spermous
This worked for me, but in a slightly different way. The iconv command actually bombed on my file, but it died right where the problem was: some weird kind of "-" character. Anyway, I removed that, and my file was able to load into Postgres. Thanks for the tip! – Unreasonable
Just to help others and the search engines: this works for converting a Stripe CSV export with unreadable characters back into UTF-8: iconv -f ISO-8859-15 -t utf-8 customers.csv > customers-utf8.csv – Lexicostatistics
P
74
psql=# copy tmp from '/path/to/file.csv' with delimiter ',' csv header encoding 'windows-1251';

Adding the encoding option worked in my case.

Purplish answered 21/5, 2014 at 21:23 Comment(3)
It will complete without error, but it may or may not give useful results; you need to know the intended encoding of the data. – Spermous
How did the above query work in my scenario? I have a CSV file encoded in UTF8 and a DB encoded in UTF8. – Pyroelectric
In my case it worked with encoding "latin1" for a file with many accented characters. – E
E
24

If you are ok with discarding nonconvertible characters, you can use the -c flag:

iconv -c -t utf8 filename.csv > filename.utf8.csv

and then copy them to your table
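The same discarding behaviour exists in Python, via the errors="ignore" decode handler, if you'd rather clean the file there (a sketch, not the answer's exact workflow):

```python
# Drop undecodable bytes while reading: the analogue of iconv's -c flag.
raw = b"city,\xc9,name"  # stray 0xC9 byte, invalid as UTF-8
cleaned = raw.decode("utf-8", errors="ignore")
print(cleaned)  # city,,name  (the bad byte is silently dropped)

# For a whole file, open() accepts the same errors= argument:
# with open("filename.csv", encoding="utf-8", errors="ignore") as f: ...
```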

Elwina answered 3/11, 2017 at 12:3 Comment(1)
On Mac it was iconv -c -t UTF-8 filename.csv > filename.utf8.csv for me. – Mendoza
P
17

Apparently I can just set the encoding on the fly,

 set client_encoding to 'latin1'

And then re-run the query. Not sure what encoding I should be using though.


latin1 made the characters legible, but most of the accented characters were upper-case where they shouldn't have been. I assumed this was due to a bad encoding, but I think it's actually the data that was bad. I ended up keeping the latin1 encoding, but pre-processing the data to fix the casing issues.
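The kind of pre-processing described might look like the following hypothetical sketch. The function name and the rule (lowercase accented capitals that appear mid-word) are assumptions about what the bad data looked like, not the author's actual script:

```python
import re

# Lowercase accented capitals that appear in the middle of a word,
# e.g. "MontrÉal" -> "Montréal", while leaving word-initial capitals alone.
def fix_midword_accents(s):
    return re.sub(r"(?<=\w)([ÀÂÄÇÈÉÊËÎÏÔÖÙÛÜ])",
                  lambda m: m.group(1).lower(), s)

print(fix_midword_accents("MontrÉal"))  # Montréal
print(fix_midword_accents("École"))     # École (word-initial, unchanged)
```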

Potentilla answered 23/9, 2015 at 20:44 Comment(1)
Interestingly, I got the error on a SELECT statement! This solved it because it was my psql client giving the error, not the database itself. (Which would have rejected the data in the first place had the encoding forbidden it.) – Physiological
S
10

I had the same problem: my file was not encoded as UTF-8. I have solved it by opening the file with notepad++ and changing the encoding of the file.

Go to "Encoding" and select "Convert to UTF-8". Save changes and that's all!

Studer answered 3/3, 2020 at 10:39 Comment(0)
P
8

This error means that the encoding of the records in the file differs from the connection's encoding. In this case iconv may return an error, sometimes even despite the //IGNORE flag:

iconv -f ASCII -t utf-8//IGNORE < b.txt > a.txt

iconv: illegal input sequence at position (some number)

The trick is to find the incorrect characters and replace them. To do that on Linux, use the vim editor:

vim (your text file), press the ESC key and type ":goto (number returned by iconv)"

To find non-ASCII characters you may use the following command:

grep --color='auto' -P -n "[\x80-\xFF]" yourfile

After removing the incorrect characters, check whether you still need to convert your file: the problem may already be solved.
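A Python equivalent that reports the offending line numbers (a small sketch, assuming the file is scanned line by line):

```python
# List the line numbers that contain non-ASCII bytes, similar in spirit to
# grep -n -P "[\x80-\xFF]" yourfile.
def non_ascii_lines(path):
    hits = []
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            if any(byte > 0x7F for byte in line):
                hits.append(lineno)
    return hits
```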

Pretrice answered 1/2, 2011 at 19:51 Comment(1)
iconv -c -f utf8 -t utf8//IGNORE < dirty.txt > clean.txt – Spermous
B
6

Follow the steps below to solve this issue in pgAdmin:

  1. SET client_encoding = 'ISO_8859_5';

  2. COPY tablename(column names) FROM 'D:/DB_BAK/csvfilename.csv' WITH DELIMITER ',' CSV ;

Bray answered 25/8, 2016 at 12:33 Comment(0)
I
5

It depends on what type of machine/encoding generated your import file.

If you're getting it from an English or Western European version of Windows, your best bet is probably setting it to 'WIN1252'. If you are getting it from a different source, consult the list of character encodings here:

http://www.postgresql.org/docs/8.3/static/multibyte.html

If you're getting it from a Mac, you may have to run it through the "iconv" utility first to convert it from MacRoman to UTF-8.

Imperforate answered 1/2, 2011 at 20:8 Comment(0)
C
5

Well, I was facing the same problem, and this is what solved it:

In Excel, click "Save As". From "Save as type", choose .csv. Click "Tools", then choose "Web Options" from the drop-down list. Under the Encoding tab, save the document as Unicode (UTF-8). Click OK and save the file. Done!

Cut answered 26/5, 2016 at 20:43 Comment(0)
M
5

I ran into this problem under Windows while working exclusively with psql (no graphical tools). To fix this problem, permanently change the default encoding of psql (client) to match the default encoding of the PostgreSQL server. Run the following command in CMD or Powershell:

setx PGCLIENTENCODING UTF8

Close and reopen your command prompt/Powershell for the change to take effect.

Change the encoding of the backup file from Unicode (which Notepad uses to mean UTF-16) to UTF8 by opening it with Notepad and going to File -> Save As. Change the Encoding dropdown from Unicode to UTF8. (Also change the Save as type from Text Documents (.txt) to All Files in order to avoid adding the .txt extension to your backup file's name.) You should now be able to restore your backup.

Mccallister answered 29/3, 2020 at 22:42 Comment(0)
L
4
copy tablename from 'filepath\filename' DELIMITERS '=' ENCODING 'WIN1252';

You can try this to specify the source file's encoding directly in the COPY command.

Liederman answered 4/9, 2017 at 6:53 Comment(0)
E
4

A short example of solving this problem in PHP:

$val = "E'\377'";
iconv(mb_detect_encoding($val, mb_detect_order(), true), "UTF-8", $val);

Error detail: since the Postgres database does not accept characters in encodings other than UTF-8, passing input like the above to a column gives the error "invalid byte sequence for encoding "UTF8": 0xab".

So just convert the value to UTF-8 before inserting it into the Postgres database.

Expressionism answered 23/7, 2019 at 17:54 Comment(0)
M
4

Open your CSV file in Excel, and save it in the "CSV UTF-8" format.

Mixture answered 7/6, 2021 at 10:36 Comment(0)
M
3

You can replace the backslash character with, for example, a pipe character using sed:

sed -i -- 's/\\/|/g' filename.txt
Mooneye answered 14/6, 2015 at 22:23 Comment(0)
K
3

I had the same problem, and found a nice solution here: http://blog.e-shell.org/134

This is caused by a mismatch in your database encodings, surely because the database from where you got the SQL dump was encoded as SQL_ASCII while the new one is encoded as UTF8. .. Recode is a small tool from the GNU project that let you change on-the-fly the encoding of a given file.

So I just recoded the dumpfile before playing it back:

postgres> gunzip -c /var/backups/pgall_b1.zip | recode iso-8859-1..u8 | psql test

On Debian or Ubuntu systems, recode can be installed from the standard package repositories.

Keef answered 12/3, 2017 at 22:4 Comment(0)
T
2

This error may also occur if the input data contains the escape character itself. By default the escape character is the "\" symbol, so if your input text contains a "\" character, try changing the default value using COPY's ESCAPE option.

Tyro answered 27/5, 2014 at 11:8 Comment(0)
M
1

For Python (with the pg8000 driver), you need to use

pg8000.types.Bytea(str): Bytea is a str-derived class that is mapped to the PostgreSQL bytea type.

or

pg8000.Binary(value): constructs an object holding binary data.

Madelaine answered 18/5, 2017 at 11:9 Comment(0)
G
1

Open the CSV file with Notepad++. Choose the menu Encoding > Convert to UTF-8, then fix the few affected cells manually.

Then try the import again.

Gammy answered 19/4, 2018 at 3:19 Comment(0)
M
1

Alternative cause on Windows with pgadmin v4.4:

Column names with non-ASCII characters will somehow mess up the psql import command, and give you this unintuitive error message. Your UTF8 csv data is probably fine.

Solution:

Rename your fields.

Example:

"Résultat" -> resultat
Mere answered 20/10, 2020 at 15:7 Comment(0)
N
1

It's better to first identify the problem lines, with:

grep -naxv '.*' source_data.txt

(-a treats the file as text and -x '.*' matches only lines that are fully valid in the current locale, so with -v inverting the match, only lines containing invalid byte sequences are printed, with -n giving their line numbers.)
Nonanonage answered 23/2, 2022 at 9:47 Comment(0)
D
0

It is also very possible with this error that the field is encrypted in place. Be sure you are looking at the right table, in some cases administrators will create an unencrypted view that you can use instead. I recently encountered a very similar issue.

Davidadavidde answered 8/2, 2016 at 20:38 Comment(0)
I
0

I got the same error when I was trying to copy a csv generated by Excel to a Postgres table (all on a Mac). This is how I resolved it:

1) Open the File in Atom (the IDE that I use)

2) Make an insignificant change in the file. Save the file. Undo the change. Save again.

Presto! The copy command now worked.

(I think Atom saved it in a format that worked.)

Intranuclear answered 27/3, 2017 at 7:10 Comment(0)
H
0

If your CSV is exported from SQL Server, is huge, and has Unicode characters, you can export it with the encoding set to UTF-8:

Right-Click DB > Tasks > Export > 'SQL Server Native Client 11.0' >> 'Flat File Destination > File name: ... > Code page: UTF-8 >> ...

On the next page it asks whether you want to copy data from a table or write a query. If you have char or varchar data types in your table, select the query option and cast those columns as nvarchar(max). E.g. if myTable has two columns where the first one is varchar and the second one int, I cast the first one to nvarchar:

select cast (col1 as nvarchar(max)) col1
       , col2
from myTable
Hospitium answered 23/9, 2019 at 19:39 Comment(0)
F
-1

Some solutions may be very simple: a space in a column name can cause this problem.

Review every column name. For example, "column_name " (with a trailing space) is wrong; "column_name" is right.

Feverous answered 22/12, 2020 at 11:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.