Openpyxl can't read xlsx file, but if i save the file, it opens
Asked Answered
S

3

6

So, I tried to open an excel file with openpyxl with this line

wb_bs = openpyxl.load_workbook(filename=filepath)

And got this error:


C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py:214: UserWarning: Workbook contains no default style, apply openpyxl's default
  warn("Workbook contains no default style, apply openpyxl's default")
Traceback (most recent call last):
    wb_bs = openpyxl.load_workbook(filename=url_nova, data_only=True)
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
    reader.read()
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 280, in read
    self.read_worksheets()
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 228, in read_worksheets
    ws_parser.bind_all()
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\worksheet\_reader.py", line 434, in bind_all
    self.bind_cells()
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\worksheet\_reader.py", line 337, in bind_cells
    for idx, row in self.parser.parse():
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\worksheet\_reader.py", line 149, in parse
    obj = prop[1].from_tree(element)
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
    return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'address'
PS C:\Users\T-Gamer\Python programs\cmtrat\Cmtrat Helper> & C:/Users/T-Gamer/AppData/Local/Programs/Python/Python38-32/python.exe "c:/Users/T-Gamer/Python programs/cmtrat/Cmtrat Helper/excel_scripts/ostest.py"
C:\Users\T-Gamer\Python programs\cmtrat\Cmtrat Helper\excel_scripts\copias\diario_padrao.xlsx
C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py:214: UserWarning: Workbook contains no default style, apply openpyxl's default
  warn("Workbook contains no default style, apply openpyxl's default")
Traceback (most recent call last):
    wb_bs = openpyxl.load_workbook(filename=url_nova, data_only=True)
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
    reader.read()
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 280, in read
    self.read_worksheets()
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 228, in read_worksheets
    ws_parser.bind_all()
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\worksheet\_reader.py", line 434, in bind_all
    self.bind_cells()
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\worksheet\_reader.py", line 337, in bind_cells
    for idx, row in self.parser.parse():
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\worksheet\_reader.py", line 149, in parse
    obj = prop[1].from_tree(element)
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "C:\Users\T-Gamer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
    return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'address'

The thing is:

  • If I create the .xlsx file, it opens
  • If I download the file from this specific source(the one I need) and try to open it straight away, it generates the error.
  • If I run the code after I open and save the .xlsx file(no changes), it works.

I suppose it has something to do with excel version conflict, but I've tried everything and nothing seems to work.

openpyxl==3.0.5 python==3.8.5

Spikes answered 29/9, 2020 at 19:15 Comment(2)
Any chance you have found a solution to this issue? Experiencing the same and manually opening and saving the file for me is not a viable solution.Boll
The only way out I can think of is creating files on an older version of Excel(older than the one AKX mentioned below).Spikes
C
1

The reason may be the security prevention of MS-Windows: Whenever you download an MS-Office file from an outer source (internet), MS-Windows inserts a flag in that file which marks the file to be opened in protected view only. That protection stays still until you enable editing and save the file with the security flag set off.

The warning text that appears when you open a newly downloaded MS-Office file:

PROTECTED VIEW
Be careful - files from the Internet can contain viruses.
Unless you need to edit, it's safer to stay in Protected View.
Corsiglia answered 5/3, 2021 at 5:42 Comment(0)
F
4

This seems to be an Openpyxl bug regarding some Excel files, reported here: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1071

Unfortunately it looks like there's no fix, just the workaround you've found:

In the just released version of Excel (Version 1803 (Build 9126.2259 Click-to-Run)), Microsoft has modified the way hyperlinks are stored in Excel files. As a workaround, you could try opening and re-saving the file in Google Sheets or LibreOffice. I am experiencing similar issues with data validation.
Source

Feuar answered 29/9, 2020 at 19:18 Comment(2)
The truth I didn't want to read :(( Anyway, thank you for the answer.Spikes
Not a bug in openpyxl.Unleash
C
1

The reason may be the security prevention of MS-Windows: Whenever you download an MS-Office file from an outer source (internet), MS-Windows inserts a flag in that file which marks the file to be opened in protected view only. That protection stays still until you enable editing and save the file with the security flag set off.

The warning text that appears when you open a newly downloaded MS-Office file:

PROTECTED VIEW
Be careful - files from the Internet can contain viruses.
Unless you need to edit, it's safer to stay in Protected View.
Corsiglia answered 5/3, 2021 at 5:42 Comment(0)
L
0

I got the same problem recently.

And when I looked into it, I did find something, that I think I need to share, in case it helps.

The problem probably occurs due to the format incompatibility.

Let's see,

I got a A_old.xlsx just downloaded from internet.

and if I open it and save it (without doing anything else), I got A_new.xlsx


Read A_old.xlsx

When I try to read A_old.xlsx with:

sheets = pd.read_excel(xls_path, sheet_name=None, header=0, engine="openpyxl")

It did read out something like sheets' names. But if I want to read the sheet's content, I got a empty DataFrames. And also the warning:

UserWarning: Workbook contains no default style, apply openpyxl's default

Read A_new.xlsx

However, the same code works perfectly with A_new.xlsx, even without any warning.


So, there must be some difference between A_old.xlsx and A_new.xlsx

Use winrar to open the two files and compare, you'll find out why (some unimportant files are not listed):

A_old.xlsx

/_rels
/docProps
    app.xml
    core.xml
/xl
    /worksheets
        /sheet1.xml
        /sheet2.xml
        /sheet3.xml
    /sharedStrings.xml
    /styles.xml
    /workbook.xml
[Content_Types].xml

A_new.xlsx

/_rels
/docProps
    app.xml
    core.xml
    custom.xml (*)
/xl
    /theme
        /theme1.xml (*)
    /worksheets
        /sheet1.xml (*)
        /sheet2.xml (*)
        /sheet3.xml (*)
    /sharedStrings.xml (*)
    /styles.xml (*)
    /workbook.xml (*)
[Content_Types].xml (*)

I marked some of the changed files with (*).

Most important of the changes, the sheet?.xml files and sharedStrings.xml are largely altered in A_new.xlsx, these files are the essential part of how content was encoded.

Seems like Microsoft somehow changed the format of xlsx at some point but I failed to find any notice on websites.

That should be the reason why we failed to read A_old.xlsx (because the encoding/decoding algorithm is different) while success on A_new.xlsx.

(I tried to replace on the sheet?.xml and sharedStrings.xml in A_old.xlsx, and read it again with the same code, it works!)

By the way, the theme folder should explain why we got the warning:

UserWarning: Workbook contains no default style, apply openpyxl's default

Solution:

Although we find the reason now. Seems like python/pandas/openpyxl engine does not have the ability to deal with the decoding of file like A_old.xlsx.

So either we write one by ourselves which should not be complicated but it does take some time, or we just accept it. Sorry I didn't find any simple solution for that.

Litigate answered 10/3 at 14:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.