rodbc character encoding error with PostgreSQL
Asked Answered
J

4

7

I'm getting a new error which I've never gotten before when connecting from R to a GreenPlum PostgreSQL database using RODBC. I've gotten the error using both EMACS/ESS and RStudio, and the RODBC call has worked as is in the past.

library(RODBC)
gp <- odbcConnect("greenplum", believeNRows = FALSE)
data <- sqlQuery(gp, "select * from mytable")

> data
[1] "22P05 7 ERROR: character 0xc280 of encoding \"UTF8\" has no equivalent in  "WIN1252\";\nError while executing the query" 
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select * from mytable'"

EDIT: Just tried querying another table and did get results. So I guess it's not an RODBC problem but a PostgreSQL table encoding problem.

R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RODBC_1.3-2
> 
Jillene answered 23/8, 2011 at 20:43 Comment(2)
Does it work in a normal R session? Output from sessionInfo() might be helpful in this case. It looks like something has changed such that one or both systems locales/encodings has changed. (By the way, sn't that a typo in the argument name believeNRows in the odbcConnect() call?)Fondue
@Gavin no it doesn't work from normal R session - just tried. I just added the output from sessionInfo(), and fixed the typo.Jillene
L
4

First, the issue arises because R is trying to convert to a Windows locale that supports UTF8. Unfortunately, Brian Ripley has reported numerous times that Windows has no UTF8 locales. From hours spent searching the web, StackOverflow, Microsoft, etc., I have come to the conclusion that Microsoft hates UTF-8 Windows won't support UTF8.

As a result, I'm not sure that there's an easy solution to this, if there is any solution at all. The best I can recommend is to wrap some kind of conversion on the server side, look at filtering the data if you can, or try a different language, if appropriate (e.g. Chinese, Japanese, Korean).

If you do decide to wrap a converter, unicode.org recommends this ICU toolkit.

Lakshmi answered 23/8, 2011 at 21:40 Comment(2)
thanks. I don't know why I'm getting funky characters in the data, it's all english and the table I am querying is all aggregations. I'll look into it.Jillene
Shhhh, don't say that explicitly. You should write it in gobbledygook or you'll get us all in trouble. ;-) My encryption algorithm on Windows is to just convert everything to UTF8 - voila!Lakshmi
P
3

0xc280 is a control element ( U+0080 in Unicode) that is causing trouble pretty often when using SQL and the likes. The problem often lies in the conversion chain that invariably happens when you use different applications that use different encoding schemes. Windows has UTF-8 included by now, so it's not strictly a Windows problem. I believe the problem arises before R reads the data in.

In fact, in the chain the character sequence 0x80 in UNICODE will be mapped to 0xc280 in UTF-8. This is supposed to be a control sequence, and cannot be printed. But chances are big that the 0x80 is in fact not UNICODE, but Windows Latin-1 or Latin-2. In that case, the 0x80 represents the euro sign. That might explain how it ends up in your data. Check if you can find something like that in the data, that would explain something already.

My guess is that the solution will not lie at the R-end of this workchain, but before that. It will try automatic conversion, but this one is reported to fail in some cases (also for SQL and Oracle btw). Check in which encoding you're working in Postgresql, and try to use any of the latin types. There might be other links involved (a Putty or similar terminal for example). I'm pretty sure all the encodings there are ISO8859-1, which is Latin-1. Somewhere UTF-8 gets thrown in between, and when the 0x80 character gets wrongly mapped to 0xc280, you get trouble.

So check the encodings in your complete workchain, and make sure that they all match. If they don't, the automatic conversion done between each step is bound to give trouble for some characters.

Hope this helps.

Purslane answered 24/8, 2011 at 1:6 Comment(1)
thanks for the explanation. I think the source of the problem is that a field in the table contains url query strings which may contain weird international characters. However, I've been able to query this table using other clients via ODBC such as PgAdmin and RazorSQL. Weird...Jillene
C
1

I might have posted this response elswhere but here goes.

I get similar error when connecting to Postgres DB from MS SQL Management client. Tyring to fix the source data is almost impossible in my case.

My Scenario:

  1. Trying to connect to Postgress using MS SQL Linked Objects via an ODBC System DSN, and see errors such as "ERROR: character 0xc280 of encoding "UTF8" has no equivalent in"WIN1252";
  2. Select statements on some tables work and others throw this error.

Fix: Use an ODBC driver that supports Unicode. I am using an ODBC driver from PostgreSQL Global Development Group. Go to Configure DSN/Manage DSN and select the Unicode driver.

Good luck.

Cloistral answered 4/10, 2013 at 20:34 Comment(0)
I
0

By default Greenplum use UTF8 for character encoding. You could check this by logging in to Greenplum server and launching psql - console client for Greenplum. In this console application you could issue command: \l to list all of the databases configured in the Greenplum - this should also describe character set for database.

I think your prblem is that R doesnt support UTF8 for chars (You use different locale) But you could use On-the-fly transcoding in ODBC driver. Not sure about all ODBC drivers but DataDirect drivers support extra option in odbc.ini file (usually located in user home directory) - IANAAppCodePage.

You could find appropriate code for this parameter on this link: http://www.iana.org/assignments/character-sets

Here is the example od ODBC.ini content:

[ODBC]
Driver=/opt/odbc/lib/S0gplm60.so
IANAAppCodePage=2252
AlternateServers=
ApplicationUsingThreads=1
ConnectionReset=0
ConnectionRetryCount=0
ConnectionRetryDelay=3
Database=mysdb
EnableDescribeParam=1
ExtendedColumnMetadata=0
FailoverGranularity=0
FailoverMode=0
FailoverPreconnect=0
FetchRefCursor=1
FetchTSWTZasTimestamp=0
FetchTWFSasTime=0
HostName=192.168.1.100
InitializationString=
LoadBalanceTimeout=0
LoadBalancing=0
LoginTimeout=15
LogonID=
MaxPoolSize=100
MinPoolSize=0
Password=
Pooling=0
PortNumber=5432  
QueryTimeout=0
ReportCodepageConversionErrors=0
TransactionErrorBehavior=1
XMLDescribeType=-10
Indogermanic answered 14/11, 2011 at 16:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.