xlrd.biffh.XLRDError: Excel xlsx file; not supported [duplicate]

I am trying to read a macro-enabled Excel workbook (.xlsm) using pandas.read_excel with the xlrd library. It runs fine locally, but when I push the same code to PCF (Pivotal Cloud Foundry), I get this error:

2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] df1=pd.read_excel(os.path.join(APP_PATH, os.path.join("Data", "aug_latest.xlsm")),sheet_name=None)

2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] return open_workbook(filepath_or_buffer)
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] File "/home/vcap/deps/0/python/lib/python3.8/site-packages/xlrd/__init__.py", line 170, in open_workbook
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] xlrd.biffh.XLRDError: Excel xlsx file; not supported

How can I resolve this error?

Ramakrishna asked 11/12, 2020 at 15:53
Does this answer your question? pandas cannot open xlsx file - Heller

As noted in the release email, linked to from the release tweet, in the large orange warning on the front page of the documentation, and (less orange, but still present) in the readme on the repository and in the release on PyPI:

As of version 2.0.0, xlrd has explicitly removed support for anything other than .xls files.
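For illustration, here is a minimal sketch of what that split means when calling pandas: use xlrd only for legacy .xls files, and openpyxl for the Office Open XML formats, including the macro-enabled .xlsm from the question. The helper name `pick_excel_engine` is hypothetical and not part of pandas; pandas 1.2 and later make a similar choice on their own when `engine` is left unset.

```python
from pathlib import Path


def pick_excel_engine(filename: str) -> str:
    """Hypothetical helper: choose a pandas read_excel engine from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".xls":
        # xlrd 2.0.0 and later still read legacy .xls workbooks
        return "xlrd"
    if suffix in {".xlsx", ".xlsm", ".xltx", ".xltm"}:
        # Office Open XML formats, including macro-enabled .xlsm
        return "openpyxl"
    raise ValueError(f"unsupported Excel extension: {suffix!r}")
```

Usage: `pd.read_excel(path, engine=pick_excel_engine(path))`.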

In your case, the solution is to:

  • make sure you are on a recent version of Pandas, at least 1.0.1, and preferably the latest release. 1.2 will make this even clearer.
  • install openpyxl: https://openpyxl.readthedocs.io/en/stable/
  • change your Pandas code to be:
    import os
    import pandas as pd

    df1 = pd.read_excel(
        os.path.join(APP_PATH, "Data", "aug_latest.xlsm"),
        engine='openpyxl',
    )
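The question passes `sheet_name=None`, which the snippet above leaves out. A minimal sketch that keeps it, assuming openpyxl is installed; `load_all_sheets` is a hypothetical name:

```python
import pandas as pd


def load_all_sheets(path: str) -> dict:
    """Return {sheet name: DataFrame} for every sheet in an .xlsx/.xlsm workbook."""
    # sheet_name=None (as in the question) reads all sheets into a dict;
    # engine="openpyxl" bypasses xlrd, which reads only .xls files from 2.0.0 onwards.
    return pd.read_excel(path, sheet_name=None, engine="openpyxl")
```

Usage, with APP_PATH as in the question: `sheets = load_all_sheets(os.path.join(APP_PATH, "Data", "aug_latest.xlsm"))`.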
    
Heller answered 12/12, 2020 at 14:49
What if you don't know the sheet name? Can you pass this to pd.ExcelFile? - Afflict
Chris, thanks for the xlrd update to support Python 3.9. However, this is a major change in the package with no deprecation warning, so I would suggest a more informative error message, e.g. clarifying when (date and version) xlrd dropped support for non-xls files. - Delsiedelsman
@kyox - there was a notice on the repo for over a year and various announcements on the mailing list and elsewhere going back over four years. - Heller
@ChristopherTurnbull specifying the sheet name is optional. If you omit it, the first sheet in the file will be opened. - Senegambia
As per this pandas developer and the discussion above it (link), xlrd apparently no longer supports .xlsx files. One should wait for the newest pandas version, 1.2.0, or pass the parameter read_excel(engine='openpyxl'). - Lucie
I installed pandas==1.1.4 and xlrd==1.2.0. - Kriskrischer
Installing the module (pip install openpyxl) and passing the openpyxl engine to all my read_excel calls, read_excel("my.xlsx", engine='openpyxl'), saved my code and my time! Thank you so much @ChrisWithers! - Gentleman
As a user who didn't actually KNOW pandas was using xlrd to open xlsx files, a deprecation warning coming from the code would have been REALLY useful... I can't read all of the mailing lists of all of the libraries that I might POSSIBLY be using, somewhere 3 layers deep in my code... - Stanstance
Good answer, but the passive-aggressive, condescending tone isn't helpful to the numerous less technical users of pandas. Like a grumpy TSA screener, you're assuming that every member of the public is as deeply familiar as you are with a piece of software. - Cockburn
2022 update - this answer is still correct, but recent versions of pandas (1.2 and later) read xlsx files with openpyxl by default, provided openpyxl is installed. I had this error with version 1.1.4, but after upgrading to 1.3.5 the error is gone: pip install --upgrade pandas - Rescission
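On the sheet-name question raised in these comments: `pd.ExcelFile` accepts an `engine` argument and exposes `sheet_names`, and `read_excel` opens the first sheet when `sheet_name` is omitted. A minimal sketch, assuming openpyxl is installed; both function names are hypothetical:

```python
import pandas as pd


def sheet_names(path: str) -> list:
    """List the sheet names of an .xlsx/.xlsm workbook without knowing any in advance."""
    # ExcelFile opens the workbook once; engine="openpyxl" avoids xlrd for .xlsx/.xlsm
    with pd.ExcelFile(path, engine="openpyxl") as xls:
        return xls.sheet_names


def read_first_sheet(path: str) -> pd.DataFrame:
    # sheet_name defaults to 0, i.e. the first sheet, so no name is required
    return pd.read_excel(path, engine="openpyxl")
```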

The previous version, xlrd 1.2.0, may appear to work, but it could also expose you to security vulnerabilities. With that warning out of the way, if you still want to give it a go, run the following command:

pip install xlrd==1.2.0
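Because the same code works locally but fails on PCF, it is worth checking what each environment actually installed. A plausible but unconfirmed explanation is that the cloud build resolved an unpinned xlrd to 2.x while the local machine still had 1.2.0. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def xlrd_can_read_xlsx() -> bool:
    # xlrd 2.0.0 and later only open legacy .xls files
    v = installed_version("xlrd")
    return v is not None and int(v.split(".")[0]) < 2


for pkg in ("pandas", "xlrd", "openpyxl"):
    print(pkg, installed_version(pkg))
```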
Dukes answered 11/12, 2020 at 16:47
This is absolutely the wrong answer. Do not use xlrd for reading xlsx files; use openpyxl (openpyxl.readthedocs.io/en/stable). - Heller
@Dukes What do you mean by "potential security vulnerabilities"? - Carboxylase
@RicS - that was from my edit. .xlsx files are zip files containing XML, and both zip and XML parsing have well-published security issues that xlrd did a poor job of addressing. - Heller
@ChrisWithers Why this decision instead of fixing support for xlsx? - Epicure
Older versions of xlrd might have some vulnerabilities, but some (old) libraries require this exact version of xlrd. - Denunciation
I honestly still don't understand which security vulnerabilities you are talking about. - Appositive
@Chris Withers - What if I have the file content (downloaded with requests.get) rather than an Excel file on disk? Do I now have to write a file, instead of passing file_contents=content to xlrd? - Weksler
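For the in-memory case in the last comment, pandas accepts a file-like object, so downloaded bytes can be wrapped in `io.BytesIO` rather than written to disk. The sketch also uses the point made above that .xlsx files are zip archives: it checks for a zip container, or for the OLE2 signature used by legacy .xls files. This sniffing is a heuristic assumption (any zip or OLE2 file would pass), and the function names are hypothetical:

```python
import io
import zipfile

import pandas as pd

# Signature of the OLE2 compound-document container used by legacy .xls files
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_excel_engine(content: bytes) -> str:
    """Guess a read_excel engine from raw bytes (heuristic, not a full format check)."""
    if content.startswith(OLE2_MAGIC):
        return "xlrd"  # legacy .xls, which xlrd 2.x still reads
    if zipfile.is_zipfile(io.BytesIO(content)):
        return "openpyxl"  # .xlsx/.xlsm are zip archives containing XML
    raise ValueError("content does not look like an Excel workbook")


def read_excel_bytes(content: bytes, **kwargs) -> pd.DataFrame:
    # Wrap the downloaded bytes in a file-like object; no temporary file needed
    engine = sniff_excel_engine(content)
    return pd.read_excel(io.BytesIO(content), engine=engine, **kwargs)
```

Usage with a download: `df = read_excel_bytes(requests.get(url).content)`.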
