Using xlrd to read Excel xls file containing Chinese and/or Hindi characters
http://scienceoss.com/read-excel-files-from-python/comment-page-1/#comment-1051

I used the utility from the link above to read an XLS file. If the XLS file contains characters from other languages, such as Chinese or Hindi, they are not output correctly. Is there a workaround for this?

After Googling, I found this:

import xlrd

def upload_xls(dir, file, request):
    try:
        global msg
        global row_num
        row_num = []
        header_arr = []
        global file_path
        file_path = dir
        #reader = csv.reader(open(file), delimiter='#', quotechar='"')
        # Intended to specify the text encoding (note: cp1252 is not UTF-8).
        # 'encoding' is not a keyword that open_workbook accepts.
        book = xlrd.open_workbook('dodgy.xls', encoding='cp1252')
        book.sheet_names()
        sh = book.sheet_by_index(0)
        valid_xl_format = 0
        invalid_xl_format = 0
    except Exception as exc:
        # Report the actual exception instead of hiding it.
        print("Error: %s" % exc)

But there is an error in the line book = open_workbook('dodgy.xls',encoding='cp1252'):

TypeError: open_workbook() got an unexpected keyword argument 'encoding'

Depopulate answered 18/8, 2010 at 11:53 Comment(3)
Can you post the rest of your code? And the exact error that line gives? It sounds like you're trying to use a function you haven't defined or imported. - Goggle
You are not reading a CSV, you're reading an XLS. - Kussell
encoding_override="cp1252" is extremely unlikely to be a fix for a problem with Chinese or Hindi characters; see my answer. - Euphemia
S
6

According to the xlrd module documentation, the correct parameter is: encoding_override="cp1252" and not encoding="cp1252".

Given the way you import the xlrd module, you should call the function as xlrd.open_workbook, but in the example code you call it directly, as if you had used from xlrd import *.
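A minimal sketch of the corrected call, assuming xlrd is installed and the file is an .xls workbook. The helper name read_first_sheet is hypothetical; open_workbook, sheet_by_index, nrows and row_values are the xlrd calls it relies on.

```python
def read_first_sheet(path, encoding_override=None):
    """Return the rows of the first worksheet as lists of cell values."""
    # Imported inside the function so this sketch can be loaded even where
    # xlrd is not installed; a normal script would import it at the top.
    import xlrd

    # The keyword is encoding_override; passing encoding= raises TypeError.
    book = xlrd.open_workbook(path, encoding_override=encoding_override)
    sheet = book.sheet_by_index(0)
    return [sheet.row_values(index) for index in range(sheet.nrows)]
```

For example, read_first_sheet('dodgy.xls', encoding_override='cp1252') would pass the override through, although it only matters for the old files described in the other answer.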

Shipmaster answered 18/8, 2010 at 12:20 Comment(1)
Formally correct but very unlikely to solve the OP's real problem. See my answer. - Euphemia
E
10

[dis]claimer: I'm the author of xlrd.

"If the XLS file contains different language characters like Chinese or Hindi, it does not output the exact wording. Is there a workaround for this?"

The encoding_override argument is (as explained in the documentation) used ONLY for OLD files, meaning those produced by versions of Excel earlier than Excel 97 (that's the year 1997), and only when the internally recorded "codepage" is missing or incorrect.

Note: Old file with Chinese characters: Overriding with 'cp1252' is guaranteed to raise an exception.

Note: Old file with "Hindi" (Devanagari?) characters: very unlikely ... as far as I know there never was an officially-supported codepage for any of the ISCII scripts, and I haven't heard of any unofficial one. Any information on this topic and/or sample files would be very welcome.

Excel 97 and later versions record all text data in (effectively) UTF-16LE. The encoding_override is ignored if the file is a valid Excel-97-or-later file.
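To illustrate why cp1252 is beside the point for such text, here is a small sketch using only the standard library (Python 3 syntax assumed). The helper names are hypothetical, and the sample string is just Chinese and Devanagari characters written as escapes.

```python
def utf16le_roundtrip(text):
    """Encode text as UTF-16LE and decode it again."""
    raw = text.encode("utf-16-le")
    return raw, raw.decode("utf-16-le")


def fits_in_cp1252(text):
    """Report whether every character of text exists in cp1252."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


# Two Chinese characters, a space, and a Devanagari word.
sample = "\u4e2d\u6587 \u0939\u093f\u0928\u094d\u0926\u0940"
raw, decoded = utf16le_roundtrip(sample)
```

All of these characters survive the UTF-16LE round trip, whereas fits_in_cp1252(sample) is false because cp1252 has no Chinese or Devanagari characters at all.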

Whatever the version of Excel that produced the file, (as documented) xlrd returns unicode strings. Your problems are much more likely to be related to how you are displaying or converting those unicode strings.
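One way to check where the text gets damaged is to inspect the strings without relying on the terminal, then write them out with an explicit encoding. This is a sketch under the assumption of Python 3; describe and dump_cells are hypothetical helper names, and the cell values stand in for whatever xlrd returned.

```python
def describe(value):
    """Show a string's code points using ASCII-only escapes."""
    return ascii(value)


def dump_cells(values, path):
    """Write each value on its own line to a UTF-8 encoded text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for value in values:
            handle.write("%s\n" % (value,))


# Stand-ins for cell values read from a worksheet.
cells = ["\u4e2d\u6587", "\u0939\u093f\u0928\u094d\u0926\u0940"]
```

If describe shows the expected code points but the console shows garbage, the data is fine and the display or output encoding is the part to fix.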

For further assistance, edit your question to show examples of the actual output together with the "exact wording".

Euphemia answered 19/8, 2010 at 0:25 Comment(0)
C
1

There is a csv module in the standard library, which handles unicode in Python 3.1.

Warning: in Python 2.x the csv library does not handle unicode.
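As a rough illustration of the Python 3 behaviour, the sketch below writes rows containing Chinese and Devanagari text with the csv module. The function name rows_to_csv is hypothetical, and this concerns CSV output only, not reading XLS files.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise rows of text to a CSV string."""
    # newline="" leaves line endings to the csv module.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv([["\u4e2d\u6587", "\u0939\u093f\u0928\u094d\u0926\u0940"]])
# Encode explicitly when the result has to be stored as bytes.
csv_bytes = csv_text.encode("utf-8")
```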

Cornflakes answered 18/8, 2010 at 12:1 Comment(0)
S
0

There is a similar question. The answer there was that the output step was causing the issue, not xlrd.

Answer on how to set your script to UTF-8: https://mcmap.net/q/37576/-changing-default-encoding-of-python
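A possible alternative to changing the default encoding is to make the output encoding explicit. The sketch below assumes Python 3; utf8_writer is a hypothetical helper that wraps any binary stream so that text written to it is encoded as UTF-8.

```python
import io


def utf8_writer(byte_stream):
    """Wrap a binary stream so that written text is encoded as UTF-8."""
    return io.TextIOWrapper(byte_stream, encoding="utf-8", line_buffering=True)


# Demonstration against an in-memory byte buffer.
buffer = io.BytesIO()
out = utf8_writer(buffer)
out.write("\u4e2d\u6587\n")
out.flush()
encoded = buffer.getvalue()
```

The same wrapper could be given sys.stdout.buffer, so that printing does not depend on the console's default encoding.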

Sides answered 13/5, 2016 at 3:25 Comment(0)
