Error opening Excel (XLSX) files from pandas xlsxwriter
Asked Answered
W

1

8

Upon opening an XLSX file in MS Excel, an error dialog is presented: "We found a problem with some content in filename.xlsx ..." Clicking "Yes" to attempt recovery yields the following XML error message:

 <?xml version="1.0" encoding="UTF-8" standalone="true"?>
-<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
 <logFileName>error359720_09.xml</logFileName>
 <summary>Errors were detected in file 'C:\Users\username\Github\Project\Data\20200420b.xlsx'</summary>
-<removedRecords>
 <removedRecord>Removed Records: Formula from /xl/worksheets/sheet1.xml part</removedRecord>
 </removedRecords>
 </recoveryLog>

The Excel file was produced by pandas with the xlsxwriter engine. My python code is reading a CSV file as a pandas dataframe, manipulating some of the data, formatting the output with text_wrap, aligning text to the left and top of the cell, setting column-specific widths, and for some columns setting date formats. Excerpts of the code appear below.

import pandas as pd
.
# read CSV into pandas dataframe, etc.
.
xlsxfile = '20200420b.xlsx'
writer = pd.ExcelWriter(xlsxfile, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
.
# several worksheet.add_format() and worksheet.set_column() statements define the formats
.
writer.save()

Digging into the Excel file XML and doing a diff of the original and repaired Sheet1.xml files identifies the following differences:

  1. There are additional xmlns parameters in the worksheet tag for the repaired file.
  2. The repaired file includes 'x14ac:dyDescent="0.25"' in the Sheet1.xml sheetFormatPr tag and elsewhere in the XML file.
  3. Several equivalent but distinct column formats for adjacent columns in the original are consolidated into ranges in the repaired file.

Questions:

- Does anyone have insights into the cause of the issue and a potential resolution?
- This issue happens consistently with a given input CSV file but does not happen with all CSV files.
- Is there any way to trick pandas/xlsxwriter into using the excel_version = 2010 setting?
- Would this likely resolve the error on opening issue?
- Other suggestions?
- Thank you in advance for any thoughts, questions, or suggestions.


Details:

- Microsoft Excel for Office 365 MSO (16.0.11929.20436) 32-bit
- Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 22:20:52) [MSC v.1916 32 bit (Intel)]
- pandas v. 1.0.3
- XlsxWriter v. 1.2.8
- My python script reports xlsxwriter worksheet.excel_version = 2007.
- The XlsxWriter code (https://github.com/jmcnamara/XlsxWriter/blob/master/xlsxwriter/worksheet.py) includes a statement to include the 'x14ac:dyDescent="0.25"' value if worksheet.excel_value = 2010.


The original and repaired XML files (in part, excluding cell data) are included below.

Original:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet 
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<dimension ref="A1:S746"/>
<sheetViews>
<sheetView tabSelected="1" zoomScale="80" zoomScaleNormal="80" workbookViewId="0"/>
</sheetViews>
<sheetFormatPr defaultRowHeight="15"/>
<cols>
<col min="1" max="1" width="9.140625" style="1"/>
<col min="2" max="2" width="13.42578125" style="1" customWidth="1"/>
<col min="3" max="3" width="11.28515625" style="1" customWidth="1"/>
<col min="4" max="4" width="36.7109375" style="1" customWidth="1"/>
<col min="5" max="5" width="18.7109375" style="1" customWidth="1"/>
<col min="6" max="6" width="40.7109375" style="1" customWidth="1"/>
<col min="7" max="7" width="9.140625" style="1"/>
<col min="8" max="8" width="9.140625" style="1"/>
<col min="9" max="9" width="9.140625" style="1"/>
<col min="10" max="10" width="15.5703125" style="1" customWidth="1"/>
<col min="11" max="11" width="13.7109375" style="1" customWidth="1"/>
<col min="12" max="12" width="14.28515625" style="1" customWidth="1"/>
<col min="13" max="13" width="22.28515625" style="2" customWidth="1"/>
<col min="14" max="14" width="22.28515625" style="2" customWidth="1"/>
<col min="15" max="15" width="13" style="2" customWidth="1"/>
<col min="16" max="16" width="24.140625" style="2" customWidth="1"/>
<col min="17" max="17" width="23.5703125" style="2" customWidth="1"/>
<col min="18" max="18" width="13" style="2" customWidth="1"/>
<col min="19" max="19" width="9.140625" style="1"/>
</cols>
<sheetData>

Repaired:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
mc:Ignorable="x14ac xr xr2 xr3" 
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" 
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" 
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" 
xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" 
xr:uid="{00000000-0001-0000-0000-000000000000}">
<dimension ref="A1:S746"/>
<sheetViews>
<sheetView tabSelected="1" zoomScale="80" zoomScaleNormal="80" workbookViewId="0"/>
</sheetViews>
<sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
<cols>
<col min="1" max="1" width="9.140625" style="1"/>
<col min="2" max="2" width="13.42578125" style="1" customWidth="1"/>
<col min="3" max="3" width="11.28515625" style="1" customWidth="1"/>
<col min="4" max="4" width="36.7109375" style="1" customWidth="1"/>
<col min="5" max="5" width="18.7109375" style="1" customWidth="1"/>
<col min="6" max="6" width="40.7109375" style="1" customWidth="1"/>
<col min="7" max="9" width="9.140625" style="1"/>
<col min="10" max="10" width="15.5703125" style="1" customWidth="1"/>
<col min="11" max="11" width="13.7109375" style="1" customWidth="1"/>
<col min="12" max="12" width="14.28515625" style="1" customWidth="1"/>
<col min="13" max="14" width="22.28515625" style="2" customWidth="1"/>
<col min="15" max="15" width="13" style="2" customWidth="1"/>
<col min="16" max="16" width="24.140625" style="2" customWidth="1"/>
<col min="17" max="17" width="23.5703125" style="2" customWidth="1"/>
<col min="18" max="18" width="13" style="2" customWidth="1"/>
<col min="19" max="19" width="9.140625" style="1"/>
</cols>
<sheetData>

Extra:

For comparison, the Sheet1.xml from a comparable file that does not give an error upon opening in MS Excel is given below. It lacks the 'x14ac:dyDescent="0.25"' value.

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<worksheet xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<dimension ref="A1:AI105"/>
<sheetViews>
<sheetView workbookViewId="0" zoomScaleNormal="80" zoomScale="80" tabSelected="1"/>
</sheetViews>
<sheetFormatPr defaultRowHeight="15"/>
<cols>
<col customWidth="1" style="1" width="13.42578125" max="1" min="1"/>
<col customWidth="1" style="1" width="10.7109375" max="2" min="2"/>
<col customWidth="1" style="1" width="36.7109375" max="3" min="3"/>
<col customWidth="1" style="1" width="13.7109375" max="4" min="4"/>
<col customWidth="1" style="1" width="13.7109375" max="5" min="5"/>
<col customWidth="1" style="1" width="11.5703125" max="6" min="6"/>
<col customWidth="1" style="1" width="18.7109375" max="7" min="7"/>
<col customWidth="1" style="1" width="20.7109375" max="8" min="8"/>
<col customWidth="1" style="1" width="24.85546875" max="9" min="9"/>
<col customWidth="1" style="1" width="34.85546875" max="10" min="10"/>
<col customWidth="1" style="1" width="23.5703125" max="11" min="11"/>
<col customWidth="1" style="1" width="37.140625" max="12" min="12"/>
<col customWidth="1" style="1" width="44.42578125" max="13" min="13"/>
<col customWidth="1" style="1" width="16.140625" max="14" min="14"/>
<col customWidth="1" style="1" width="14.28515625" max="15" min="15"/>
<col customWidth="1" style="1" width="15.85546875" max="16" min="16"/>
<col customWidth="1" style="1" width="13.42578125" max="17" min="17"/>
<col customWidth="1" style="1" width="20.42578125" max="18" min="18"/>
<col customWidth="1" style="1" width="11.42578125" max="19" min="19"/>
<col style="1" width="9.140625" max="20" min="20"/>
<col customWidth="1" style="1" width="13.7109375" max="21" min="21"/>
<col customWidth="1" style="1" width="14.42578125" max="22" min="22"/>
<col customWidth="1" style="1" width="18.42578125" max="23" min="23"/>
<col customWidth="1" style="2" width="13" max="24" min="24"/>
<col customWidth="1" style="2" width="24.140625" max="25" min="25"/>
<col customWidth="1" style="2" width="22.28515625" max="26" min="26"/>
<col customWidth="1" style="2" width="24" max="27" min="27"/>
<col customWidth="1" style="2" width="23.5703125" max="28" min="28"/>
<col customWidth="1" style="1" width="18.7109375" max="29" min="29"/>
<col customWidth="1" style="1" width="18" max="30" min="30"/>
<col customWidth="1" style="1" width="19.140625" max="31" min="31"/>
<col customWidth="1" style="1" width="30.42578125" max="32" min="32"/>
<col customWidth="1" style="1" width="19.85546875" max="33" min="33"/>
<col customWidth="1" style="1" width="18.28515625" max="34" min="34"/>
<col customWidth="1" style="1" width="40.7109375" max="35" min="35"/>
</cols>
<sheetData>
Willpower answered 21/4, 2020 at 22:54 Comment(5)
Good analysis. I would say the issue is with a unintended formula in the file, since the Excel removedRecord message is about a formula. Possibly a string like '=something' that is being interpreted, by default, as a formula. Try turning off formula conversion: pd.ExcelWriter('filename.xlsx', engine='xlsxwriter', options={'strings_to_formulas': False}).Cryptogram
BTW, the issue isn't related to x14ac:dyDescent. That is a completely optional parameter and wouldn't cause an issue like this (or any other issue). There could possibly by an issue with overlapping column ranges but from your XML output that doesn't look like the case.Cryptogram
Thank you for the feedback and the suggestion. Adding the options={'strings_to_formulas':False} to the statement immediately fixed the issue with no other changes to the code. BTW, worksheet.excel_version now reports that it is 2010 (vs 2007). It had been my intent to have an Excel file that was entirely free of formulas, intended or unintended. This is good information. Thank you for the prompt resolution.Willpower
I was getting the same error when opening my excel file. I had Pandas installed, but not xlsxwriter. In my Conda Environment I installed xlsxwriter 1.3.7 and the error went away? The script even runs fine and the excel file is created without import xlsxwriter? However, there is a warning: C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\xlsxwriter\workbook.py:329: UserWarning: Calling close() on already closed file. warn("Calling close() on already closed file.")Giorgione
Solved for me by installing xlsxwriter. Also got the error Calling close() on an already closed file. Resolved by removing writer.save() from the code.Humus
W
11

As suggested by @jmcnamara, the author of XlsxWriter, the issue was identified to be an unintended formula. By turning off the default conversion of strings to formulas with the strings_to_formulas option, the issue was resolved.

pd.ExcelWriter('filename.xlsx', engine='xlsxwriter', options={'strings_to_formulas': False})

Note (if using the code above gives you a future warning):

FutureWarning: Use of **kwargs is deprecated, use engine_kwargs instead.

pd.ExcelWriter('filename.xlsx', engine='xlsxwriter', engine_kwargs={'options':{'strings_to_formulas': False}})

Thanks to Suprateem Bannerjee for identifying this solution here.

Willpower answered 22/4, 2020 at 12:0 Comment(3)
Hi @BalooRM, I've seen the same error by using the engine of openpyxl, and that strings_to_formulas solution didn't work for that engine. Do you know if there is any way we can resolve the same error by using openpyxl? Thanks!Biotope
@XavierSun, I have not experienced that issue with openpyxl.Willpower
Thanks @BalooRM. Hopefully someone will have the solution.Biotope

© 2022 - 2024 — McMap. All rights reserved.