Pandas Excel Writer using Openpyxl with existing workbook
Asked Answered
O

1

10

I have code from a while ago that I am re-using for a new task. The task is to write a new DataFrame into a new sheet, into an existing excel file. But there is one part of the code that I do not understand, but it just makes the code "work".

working:

from openpyxl import load_workbook
import pandas as pd
file = r'YOUR_PATH_TO_EXCEL_HERE'

df1 = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book # <---------------------------- piece i do not understand
df1.to_excel(writer, sheet_name='New', index=None)
writer.save()

The little line of writer.book=book has me stumped. Without that piece of code, the Excel file will delete all other sheets, except the sheet used in the sheetname= parameter in df1.to_excel.

i looked at xlsxwriter's documentation as well as openpyxl's, but cannot seem to figure out why that line gives me my expected output. Any ideas?

edit: i believe this post is where i got the original idea from.

Omora answered 20/7, 2017 at 16:20 Comment(4)
I think panda will just use the writer to write the data to a new book thus ovewriting the existing book if you don't specify that the book already has data.Anatomy
I did not know there was a way to prevent pandas from overwriting the whole workbook. This is very good to know!Discourtesy
@Discourtesy glad I could help! I thought this was neat as well. Now it's just getting to the bottom of why this works and how I can expand on it after i figure this out!Omora
In terms of understanding what is going on, it is worth noting that there is no such thing as "specifying that the book already has data". Even when you are using the genuine Excel program from Microsoft, whenever you save, it is always writing a completely, utterly brand-new file. It so happens that the brand-new file can have the exact same name as the old file, and in that case, it seems as though the file has been "updated", but really the old file was overwritten by the new.Heterogeneity
C
5

In the source code of ExcelWriter, with openpyxl, it initializes empty workbook and delete all sheets. That's why you need to add it explicitly

class _OpenpyxlWriter(ExcelWriter):
    engine = 'openpyxl'
    supported_extensions = ('.xlsx', '.xlsm')

    def __init__(self, path, engine=None, **engine_kwargs):
        # Use the openpyxl module as the Excel writer.
        from openpyxl.workbook import Workbook

        super(_OpenpyxlWriter, self).__init__(path, **engine_kwargs)

        # Create workbook object with default optimized_write=True.
        self.book = Workbook()

        # Openpyxl 1.6.1 adds a dummy sheet. We remove it.
        if self.book.worksheets:
            try:
                self.book.remove(self.book.worksheets[0])
            except AttributeError:

                # compat
                self.book.remove_sheet(self.book.worksheets[0])
Corinthians answered 16/3, 2018 at 5:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.