UnicodeDecodeError: 'utf-16-le'
Asked Answered
C

4

8

I'm trying to pull a list of tables from an access database. The script goes through and displays about a third of the tables and gets the following error:

Traceback (most recent call last): File "C:/GageInfo/sourcecode for GageTrakNotify/__test script.py", line 31, in for fld in cursor2.columns(rows.table_name): UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 132-133: illegal encoding

Any tips would be appreciated.

import pyodbc


cursor1 = conn.cursor()
cursor2 = conn.cursor()

tblCount = 0
for rows in cursor1.tables():
   if rows.table_type == "TABLE":
       tblCount += 1
       print(rows.table_name)
       for fld in cursor2.columns(rows.table_name):
           print(fld.table_name, fld.column_name)

conn.close()
Crosswind answered 5/12, 2017 at 2:38 Comment(1)
Use the output from your current code to identify the last successful print, then open the database in Access itself and check the table in Design View to determine the column name (or table name) that is causing the error. Then, edit your question to show us what it is.Spirituous
S
5

This is almost certainly caused by a known issue with cursor.columns() when using the Access ODBC driver if the table metadata includes "Descriptions" for the columns. The following GitHub issue includes a discussion of the problem and possible workarounds:

https://github.com/mkleehammer/pyodbc/issues/328

TL;DR: The Access ODBC driver returns trailing garbage bytes under those circumstances and some combinations of the bytes will not be valid UTF-16LE encoded characters. A pyodbc output converter function can be used to intercept the byte string and decode it without throwing an exception. The specific workaround is described here:

https://github.com/mkleehammer/pyodbc/issues/328#issuecomment-419655266

Spirituous answered 1/8, 2021 at 23:19 Comment(0)
C
1

Thanks for the tips and feedback. I resolved it by capturing the information in a list and then printing out results, and using a try statement to handle the error and apply the right unicode.

Your suggestions pointed me in the right direction.

Crosswind answered 6/12, 2017 at 16:31 Comment(0)
Q
0

This is my first answer, I apologize if it is very generic.

Set decoding with:

conn.setdecoding(pyodbc.SQL_WCHAR, encoding='latin-1')

Place it before conn.cursor(), also, you might be missing it:

conn_str = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=path\to\your\access\database.accdb;"
    r"unicode_results=True;"
)
conn = pyodbc.connect(conn_str)
Quiet answered 29/3, 2023 at 22:36 Comment(0)
P
-1

It sounds like you have a Unicode (non-ASCII) embedded in a table name somewhere. Figuring out what table it's stopping on will confirm whether that's the case or not, but ultimately you want the Python script to just handle Unicode, which can be done with the string decode method:

for fld in cursor2.columns(rows.table_name.decode('utf-16-le')):

This should be done anywhere there is a possibility of Unicode characters cropping up (for instance, the column names as well).

Petulia answered 5/12, 2017 at 17:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.