How do I configure pyodbc to correctly accept strings from SQL Server using freeTDS and unixODBC?

Asked 11/12, 2012 at 21:20 Answered 3/9, 2013 at 16:26

Solved sql-server-2008 unicode pyodbc freetds unixodbc

I can not get a valid string from an MSSQL server into python. I believe there is an encoding mismatch somewhere. I believe it is between the ODBC layer and python because I am able to get readable results in tsql and isql.

What character encoding does pyodbc expect? What do I need to change in the chain to get this to work?

Specific Example

Here is a simplified python script as an example:

#!/usr/bin/env python
import pyodbc

dsn = 'yourdb'
user = 'import'
password = 'get0lddata'
database = 'YourDb'

def get_cursor():
    con_string = 'DSN=%s;UID=%s;PWD=%s;DATABASE=%s;' % (dsn, user, password, database)
    conn = pyodbc.connect(con_string)
    return conn.cursor()

if __name__ == '__main__':
    c = get_cursor()
    c.execute("select id, name from recipe where id = 4140567")

    row = c.fetchone()
    if row:
        print row

The output of this script is:

(Decimal('4140567'), u'\U0072006f\U006e0061\U00650067')

Alternatively, if the last line of the script is changed to:

print "{0}, '{1}'".format(row.id, row.name)

Then the result is:

Traceback (most recent call last):
  File "/home/mdenson/projects/test.py", line 20, in <module>
    print "{0}, '{1}'".format(row.id, row.name)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

A transcript using tsql to execute the same query:

root@luke:~# tsql -S cmw -U import -P get0lddata
locale is "C"
locale charset is "ANSI_X3.4-1968"
using default charset "UTF-8"
1> select id, name from recipe where id = 4140567
2> go
id      name
4140567 orange2
(1 row affected)

and also in isql:

root@luke:~# isql -v yourdb import get0lddata
SQL>  select id, name from recipe where id = 4140567
+----------------------+--------------------------+
| id                   | name                     |
+----------------------+--------------------------+
| 4140567              | orange2                  |
+----------------------+--------------------------+
SQLRowCount returns 1
1 rows fetched

So I have worked at this for the morning and looked high and low and haven't figured out what is amiss.

Details

Here are version details:

Client is Ubuntu 12.04
freetds v0.91
unixodbc 2.2.14
python 2.7.3
pyodbc 2.1.7-1 (from ubuntu package) & 3.0.7-beta06 (compiled from source)
Server is XP with SQL Server Express 2008 R2

Here are the contents of a few configuration files on the client.

/etc/freetds/freetds.conf

[global]
    tds version = 8.0
    text size = 64512
[cmw]
    host = 192.168.90.104
    port = 1433
    tds version = 8.0
    client charset = UTF-8

/etc/odbcinst.ini

[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPTimeout =
CPReuse =
FileUsage = 1

/etc/odbc.ini

[yourdb]
Driver = FreeTDS
Description = ODBC connection via FreeTDS
Trace = No
Servername = cmw
Database = YourDB
Charset = UTF-8

Tidy answered 11/12, 2012 at 21:20 Comment(1)

In my continuing work on this I have now compiled the latest version of pyodbc, 3.0.7-beta06, but no change in behavior. – Tidy 11/12, 2012 at 22:37

So after continued work I am now getting unicode characters into python. Unfortunately the solution I've stumbled upon is about as satisfying as kissing your cousin.

I solved the problem by installing the python3 and python3-dev packages and then rebuilding pyodbc with python3.

Now that I've done this my scripts now work even though I am still running them with python 2.7.

So I don't know what was fixed by doing this, but it now works and I can move on to the project I started with.

Tidy answered 18/12, 2012 at 19:55 Comment(1)

It's not the python version, it's ubuntu's package. Debian had the same problem. I removed debian's package and installed pyodbc via pip and everything works fine, no python 3 required. – Accuracy 19/12, 2012 at 17:32

Any chance you're having a problem with a BOM (Byte Order Marker)? If so, maybe this snippet of code will help:

import codecs
if s.beginswith( codecs.BOM_UTF8 ):
    # The byte string s begins with the BOM: Do something.
    # For example, decode the string as UTF-8

if u[0] == unicode( codecs.BOM_UTF8, "utf8" ):
    # The unicode string begins with the BOM: Do something.
    # For example, remove the character.

# Strip the BOM from the beginning of the Unicode string, if it exists
u.lstrip( unicode( codecs.BOM_UTF8, "utf8" ) )

I found that snippet on this page.

Wrench answered 12/12, 2012 at 0:19 Comment(2)

Hmm. I've read your answer and the link and I am not sure this is the problem or at least I don't know how to do something with it. It does appear from the string u'\U0072006f\U006e0061\U00650067' that each pair of letters is swapped but there is a missing 7th character and no sign of a BOM that I can see. – Tidy 12/12, 2012 at 1:39

I was really just taking a stab at the problem based on the error saying it couldn't "...encode the characters in positions 0-2 because they are not in range". Sorry Matthew. – Wrench 12/12, 2012 at 3:3

If you upgrade the pyodbc to version 3 the problem will be solved.

Overzealous answered 3/9, 2013 at 16:26 Comment(0)

Recommended topics

Hot tags