Upon opening an XLSX file in MS Excel, an error dialog is presented: "We found a problem with some content in filename.xlsx ..." Clicking "Yes" to attempt recovery yields the following XML error message:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
-<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error359720_09.xml</logFileName>
<summary>Errors were detected in file 'C:\Users\username\Github\Project\Data\20200420b.xlsx'</summary>
-<removedRecords>
<removedRecord>Removed Records: Formula from /xl/worksheets/sheet1.xml part</removedRecord>
</removedRecords>
</recoveryLog>
The Excel file was produced by pandas with the xlsxwriter engine. My python code is reading a CSV file as a pandas dataframe, manipulating some of the data, formatting the output with text_wrap, aligning text to the left and top of the cell, setting column-specific widths, and for some columns setting date formats. Excerpts of the code appear below.
import pandas as pd
.
# read CSV into pandas dataframe, etc.
.
xlsxfile = '20200420b.xlsx'
writer = pd.ExcelWriter(xlsxfile, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
.
# several worksheet.add_format() and worksheet.set_column() statements define the formats
.
writer.save()
Digging into the Excel file XML and doing a diff of the original and repaired Sheet1.xml files identifies the following differences:
- There are additional xmlns parameters in the worksheet tag for the repaired file.
- The repaired file includes 'x14ac:dyDescent="0.25"' in the Sheet1.xml sheetFormatPr tag and elsewhere in the XML file.
- Several equivalent but distinct column formats for adjacent columns in the original are consolidated into ranges in the repaired file.
Questions:
- Does anyone have insights into the cause of the issue and a potential resolution?
- This issue happens consistently with a given input CSV file but does not happen with all CSV files.
- Is there any way to trick pandas/xlsxwriter into using the excel_version = 2010 setting?
- Would this likely resolve the error on opening issue?
- Other suggestions?
- Thank you in advance for any thoughts, questions, or suggestions.
Details:
- Microsoft Excel for Office 365 MSO (16.0.11929.20436) 32-bit
- Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 22:20:52) [MSC v.1916 32 bit (Intel)]
- pandas v. 1.0.3
- XlsxWriter v. 1.2.8
- My python script reports xlsxwriter worksheet.excel_version = 2007.
- The XlsxWriter code (https://github.com/jmcnamara/XlsxWriter/blob/master/xlsxwriter/worksheet.py) includes a statement to include the 'x14ac:dyDescent="0.25"' value if worksheet.excel_value = 2010.
The original and repaired XML files (in part, excluding cell data) are included below.
Original:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<dimension ref="A1:S746"/>
<sheetViews>
<sheetView tabSelected="1" zoomScale="80" zoomScaleNormal="80" workbookViewId="0"/>
</sheetViews>
<sheetFormatPr defaultRowHeight="15"/>
<cols>
<col min="1" max="1" width="9.140625" style="1"/>
<col min="2" max="2" width="13.42578125" style="1" customWidth="1"/>
<col min="3" max="3" width="11.28515625" style="1" customWidth="1"/>
<col min="4" max="4" width="36.7109375" style="1" customWidth="1"/>
<col min="5" max="5" width="18.7109375" style="1" customWidth="1"/>
<col min="6" max="6" width="40.7109375" style="1" customWidth="1"/>
<col min="7" max="7" width="9.140625" style="1"/>
<col min="8" max="8" width="9.140625" style="1"/>
<col min="9" max="9" width="9.140625" style="1"/>
<col min="10" max="10" width="15.5703125" style="1" customWidth="1"/>
<col min="11" max="11" width="13.7109375" style="1" customWidth="1"/>
<col min="12" max="12" width="14.28515625" style="1" customWidth="1"/>
<col min="13" max="13" width="22.28515625" style="2" customWidth="1"/>
<col min="14" max="14" width="22.28515625" style="2" customWidth="1"/>
<col min="15" max="15" width="13" style="2" customWidth="1"/>
<col min="16" max="16" width="24.140625" style="2" customWidth="1"/>
<col min="17" max="17" width="23.5703125" style="2" customWidth="1"/>
<col min="18" max="18" width="13" style="2" customWidth="1"/>
<col min="19" max="19" width="9.140625" style="1"/>
</cols>
<sheetData>
Repaired:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
mc:Ignorable="x14ac xr xr2 xr3"
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"
xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3"
xr:uid="{00000000-0001-0000-0000-000000000000}">
<dimension ref="A1:S746"/>
<sheetViews>
<sheetView tabSelected="1" zoomScale="80" zoomScaleNormal="80" workbookViewId="0"/>
</sheetViews>
<sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
<cols>
<col min="1" max="1" width="9.140625" style="1"/>
<col min="2" max="2" width="13.42578125" style="1" customWidth="1"/>
<col min="3" max="3" width="11.28515625" style="1" customWidth="1"/>
<col min="4" max="4" width="36.7109375" style="1" customWidth="1"/>
<col min="5" max="5" width="18.7109375" style="1" customWidth="1"/>
<col min="6" max="6" width="40.7109375" style="1" customWidth="1"/>
<col min="7" max="9" width="9.140625" style="1"/>
<col min="10" max="10" width="15.5703125" style="1" customWidth="1"/>
<col min="11" max="11" width="13.7109375" style="1" customWidth="1"/>
<col min="12" max="12" width="14.28515625" style="1" customWidth="1"/>
<col min="13" max="14" width="22.28515625" style="2" customWidth="1"/>
<col min="15" max="15" width="13" style="2" customWidth="1"/>
<col min="16" max="16" width="24.140625" style="2" customWidth="1"/>
<col min="17" max="17" width="23.5703125" style="2" customWidth="1"/>
<col min="18" max="18" width="13" style="2" customWidth="1"/>
<col min="19" max="19" width="9.140625" style="1"/>
</cols>
<sheetData>
Extra:
For comparison, the Sheet1.xml from a comparable file that does not give an error upon opening in MS Excel is given below. It lacks the 'x14ac:dyDescent="0.25"' value.
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<worksheet xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<dimension ref="A1:AI105"/>
<sheetViews>
<sheetView workbookViewId="0" zoomScaleNormal="80" zoomScale="80" tabSelected="1"/>
</sheetViews>
<sheetFormatPr defaultRowHeight="15"/>
<cols>
<col customWidth="1" style="1" width="13.42578125" max="1" min="1"/>
<col customWidth="1" style="1" width="10.7109375" max="2" min="2"/>
<col customWidth="1" style="1" width="36.7109375" max="3" min="3"/>
<col customWidth="1" style="1" width="13.7109375" max="4" min="4"/>
<col customWidth="1" style="1" width="13.7109375" max="5" min="5"/>
<col customWidth="1" style="1" width="11.5703125" max="6" min="6"/>
<col customWidth="1" style="1" width="18.7109375" max="7" min="7"/>
<col customWidth="1" style="1" width="20.7109375" max="8" min="8"/>
<col customWidth="1" style="1" width="24.85546875" max="9" min="9"/>
<col customWidth="1" style="1" width="34.85546875" max="10" min="10"/>
<col customWidth="1" style="1" width="23.5703125" max="11" min="11"/>
<col customWidth="1" style="1" width="37.140625" max="12" min="12"/>
<col customWidth="1" style="1" width="44.42578125" max="13" min="13"/>
<col customWidth="1" style="1" width="16.140625" max="14" min="14"/>
<col customWidth="1" style="1" width="14.28515625" max="15" min="15"/>
<col customWidth="1" style="1" width="15.85546875" max="16" min="16"/>
<col customWidth="1" style="1" width="13.42578125" max="17" min="17"/>
<col customWidth="1" style="1" width="20.42578125" max="18" min="18"/>
<col customWidth="1" style="1" width="11.42578125" max="19" min="19"/>
<col style="1" width="9.140625" max="20" min="20"/>
<col customWidth="1" style="1" width="13.7109375" max="21" min="21"/>
<col customWidth="1" style="1" width="14.42578125" max="22" min="22"/>
<col customWidth="1" style="1" width="18.42578125" max="23" min="23"/>
<col customWidth="1" style="2" width="13" max="24" min="24"/>
<col customWidth="1" style="2" width="24.140625" max="25" min="25"/>
<col customWidth="1" style="2" width="22.28515625" max="26" min="26"/>
<col customWidth="1" style="2" width="24" max="27" min="27"/>
<col customWidth="1" style="2" width="23.5703125" max="28" min="28"/>
<col customWidth="1" style="1" width="18.7109375" max="29" min="29"/>
<col customWidth="1" style="1" width="18" max="30" min="30"/>
<col customWidth="1" style="1" width="19.140625" max="31" min="31"/>
<col customWidth="1" style="1" width="30.42578125" max="32" min="32"/>
<col customWidth="1" style="1" width="19.85546875" max="33" min="33"/>
<col customWidth="1" style="1" width="18.28515625" max="34" min="34"/>
<col customWidth="1" style="1" width="40.7109375" max="35" min="35"/>
</cols>
<sheetData>
removedRecord
message is about a formula. Possibly a string like'=something'
that is being interpreted, by default, as a formula. Try turning off formula conversion:pd.ExcelWriter('filename.xlsx', engine='xlsxwriter', options={'strings_to_formulas': False})
. – Cryptogramx14ac:dyDescent
. That is a completely optional parameter and wouldn't cause an issue like this (or any other issue). There could possibly by an issue with overlapping column ranges but from your XML output that doesn't look like the case. – Cryptogramxlsxwriter
. In my Conda Environment I installedxlsxwriter 1.3.7
and the error went away? The script even runs fine and the excel file is created withoutimport xlsxwriter
? However, there is a warning:C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\xlsxwriter\workbook.py:329: UserWarning: Calling close() on already closed file. warn("Calling close() on already closed file.")
– Giorgionexlsxwriter
. Also got the errorCalling close() on an already closed file
. Resolved by removingwriter.save()
from the code. – Humus