How to write to an existing excel file without overwriting data (using pandas)?
E

17

169

I use pandas to write to an Excel file in the following fashion:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx') 

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Masterfile.xlsx already contains a number of different tabs. However, it does not yet contain "Main".

Pandas correctly writes to the "Main" sheet, but unfortunately it also deletes all the other tabs.

Electrophoresis answered 26/11, 2013 at 14:5 Comment(4)
can you give an example or ExcelReader? I haven't found anything like this in the documentation.Electrophoresis
I think there is no such thing like ExcelReader in pandas. I use read_excel to read data from excel. I don't think it would save data to excel.Electrophoresis
@nrathaus there doesn't seem to be an ExcelReaderPneumonic
Note that there is some confusion in the answers about what exactly the question is asking. Some answers assume that "Main" does not yet exist, and the OP is simply adding a new sheet to an existing excel workbook. Others assume "Main" already exists, and that the OP wants to append new data to the bottom of "Main".Populous
G
189

The pandas docs say it uses openpyxl for xlsx files. A quick look through the code in ExcelWriter suggests that something like this might work:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()
Goldston answered 26/11, 2013 at 15:45 Comment(13)
Can you please explain what is writer.sheets for?Electrophoresis
ExcelWriter for some reason uses this variable to access the sheet. If you leave it empty it will not know that sheet Main is already there and will create a new sheet.Goldston
This solution works fine. It has one drawback though. It breaks formulas and connections within the spreadsheet. Any ideas how to change this behaviour?Electrophoresis
What exactly gets broken? You might ask this as a separate question, tag it with openpyxl and provide enough details: what kind of formulas you have, how the data is updated, and how it breaks the formulas. Right now I can't help; there are too many things I don't know.Goldston
#20262948Electrophoresis
Hi, this doesn't work for xls files. do you know how to do the same with xls files?Oreste
This allows you to add content to an existing spreadsheet, but unfortunately this approach strips the spreadsheet of any graphs, or any other content that openpyxl can't natively understand. I've not found a way round this, except adding data to a separate Excel file which the graphs then reference.Bishop
what does dict((ws.title, ws) for ws in book.worksheets) mean?Executory
can it be used with .xlsm files instead?Snitch
@Lumos, it generates a dictionary that includes all the worksheet names. The writer will look in it to see if it's in that list before creating a new sheet. If blank, it will create a new sheet instead.Deprived
The 'writer.save()' at the end of this code block was causing issues for me. Changing to 'writer.close()' fixed all my problems, but I'm not sure why.Deprived
In my case it worked after adding these lines at the end: writer.save() writer.close() book.save('Masterfile.xlsx') book.close().Redoubt
writes a corrupt excel file for meMaggot
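The expression asked about in the comments above, dict((ws.title, ws) for ws in book.worksheets), builds a mapping from sheet title to worksheet object. Here is a minimal sketch of that idea, using a hypothetical FakeSheet stand-in rather than real openpyxl worksheets; target_sheet is also a made-up helper for illustration, not pandas' internal logic:

```python
from collections import namedtuple

# Hypothetical stand-in for an openpyxl worksheet; only .title matters here.
FakeSheet = namedtuple("FakeSheet", ["title"])

worksheets = [FakeSheet("Summary"), FakeSheet("Raw")]

# Same shape as: dict((ws.title, ws) for ws in book.worksheets)
sheets = {ws.title: ws for ws in worksheets}


def target_sheet(sheets, name):
    """Return the known sheet for `name`, or None if a new one would be needed."""
    return sheets.get(name)
```

If the mapping is left empty, every lookup misses, which matches the behaviour described in the answer: the writer does not know the sheet exists and creates a new one.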
F
64

UPDATE: Starting from Pandas 1.3.0 the following function will not work properly, because functions DataFrame.to_excel() and pd.ExcelWriter() have been changed - a new if_sheet_exists parameter has been introduced, which has invalidated the function below.

Here you can find an updated version of the append_df_to_excel(), which is working for Pandas 1.3.0+.


Here is a helper function:

import os

import pandas as pd
from openpyxl import load_workbook


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name, 
            startrow=startrow if startrow is not None else 0, 
            **to_excel_kwargs)
        return
    
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)
    
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    
    # copy existing sheets
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

Tested with the following versions:

  • Pandas 1.2.3
  • Openpyxl 3.0.5
Fawne answered 10/12, 2017 at 15:21 Comment(19)
This solution worked perfect for me, the other ones posted here do not work. Thanks a lot! Just one comment: when the file does not exist, I get an error "NameError: global name 'FileNotFoundError' is not defined"Merocrine
@cholo14, thank you for pointing this out! I've tested it on Python 3.x, so i missed that bug. I've fixed it in the answer...Fawne
This worked for me but is there a way to maintain the xlsx formatting (from the original xlsx file)?Villa
@2one, I don't know exactly - give it a try or ask a new SO questionFawne
is there a way to write to columns instead of only rows? Like I wanna update a sheet automatically, but not appending new rows, but columns thanks!Woorali
Did that before posting the comment. :)Rivi
@BhushanDhamale, glad my answer was helpful :)Fawne
I've tried many times to get this to work for me. I get a zipfile error when passing a simple .xlsx file to it. Each time, the file gets corrupted and I have to delete it and replace it with a copy. I'm also tyring to save to a file that was created with a macro enabled workbook (.xlsm) but saved as a macro free workbook (.xlsx). I've also tried using abspath as well as text string. I get this error: File "C:\Users\lee\Anaconda3\lib\zipfile.py", line 1325, in _RealGetContents raise BadZipFile("File is not a zip file") zipfile.BadZipFile: File is not a zip fileDeviltry
As of pandas 1.2.0 the code will create a problem (it works just fine up to 1.1.5), raising BadZipFile exception, because at the time when you instantiate pd.ExcelWriter it creates empty file with size 0 bytes and overwrites the existing file. One has to specify mode='a'. see https://mcmap.net/q/145195/-export-pandas-dataframe-to-xlsx-dealing-with-the-openpyxl-issue-on-python-3-9 and https://mcmap.net/q/145194/-openpyxl-load_workbook-on-a-legit-xlsx-file-leads-to-a-zipfile-badzipfile-error/4046632Blameful
@leeprevost, check my comment aboveBlameful
@MaxU, a small note - I've been advised that with mode='a' it will raise error if the file does not exists. It was unexpected that the file is not created in this case.Blameful
@buran, thanks again) I have updated the answer correspondinglyFawne
@MaxU: Thank you, this works perfectly! How is it that a function like this is not part of the base language??Choke
@MaxU: (1/2) One follow-up question: What would be the proper way to close the writer in this case? (I assume it isn't closing the writer because I get a sharing violation when I try to save the file after calling append_df_to_excel(). ) I tried changing writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a') to with pd.ExcelWriter(filename, engine='openpyxl', mode='a') as writer:, and then indenting everything that comes after. The function still writes to Excel, but now Excel gives me an error message when I open the file:Choke
(2/2) "We found a problem with some content in 'test.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes." ...Any idea why I'm getting this error?Choke
@MaxU: Actually, correction to what I just said: My attempted fix via the context manager works the first time I call the function. I can then open the file and save it with no issue. But if I then call the function on the same Excel file (and all the other arguments still the same), I get the "We found a problem..." error when I try to open the Excel file. Any suggestions would be greatly appreciated!Choke
Ok, I think I found a solution to my problem: #54094672. I had written a string containing a formula to Excel. I put options['strings_to_formulas'] = False and then passed this options dictionary to the constructor of ExcelWriter. The problem seems to be gone now!Choke
@Leonidas, congrats! Sorry i was bit busy and couldn't look into your question)Fawne
@MaxU: No worries at all! I actually have another follow-up question. If you get a chance, could you please take a look? #67894860. Thanks!Choke
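For newer pandas versions, where the helper above no longer works, the same "append below existing rows" idea can probably be expressed without assigning writer.book. The following is only a sketch: it assumes pandas 1.4 or later (for if_sheet_exists="overlay") with openpyxl installed, and append_rows is a hypothetical name, not the updated function the answer links to.

```python
import os

import pandas as pd


def append_rows(path, df, sheet_name="Sheet1"):
    """Append df below the existing data in sheet_name (hypothetical helper)."""
    # mode="a" needs an existing file, so create it on the first call.
    if not os.path.isfile(path):
        df.to_excel(path, sheet_name=sheet_name, index=False, engine="openpyxl")
        return

    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        book = writer.book
        if sheet_name in book.sheetnames:
            # max_row is 1-based and startrow is 0-based, so this points
            # at the first empty row below the existing data.
            startrow = book[sheet_name].max_row
            header = False
        else:
            startrow, header = 0, True
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    header=header, index=False)
```

The header is written only when the sheet is new, which avoids the repeated header rows mentioned elsewhere in this thread.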
N
28

With openpyxl version 2.4.0 and pandas version 0.19.2, the process @ski came up with gets a bit simpler:

import pandas
from openpyxl import load_workbook

with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
#That's it!
Nev answered 6/3, 2017 at 23:46 Comment(5)
This doesn't work for me. If there is already a "Main" worksheet, it will create a new one called "Main1" with the new data only and leave the "Main" worksheet content unchanged.Fondafondant
@Fondafondant I think there might be confusion on this question between two different goals. This allows you to add additional sheets to an existing workbook. It is not intended to append additional data to an existing sheet. If there is a sheet naming conflict, it renames the sheet. This is a feature, not a bug.Populous
As @Fondafondant said, this only creates more sheets with different names. The first solution, from MaxU, works, and the output you get will be the df in the first sheet as many times as you want (that is, with the headers repeated as many times as well). One simple technique: on each iteration, append the dataframe to a list, and concat only at the end. If they follow the same structure, it works like a charm. list_my_dfs = [df1, df2, ...] # list of your dataframes; my_dfs_together = pd.concat(list_my_dfs) # concat the dataframes into a single dfVentricle
@SusanaSilvaSantos, take a look at what T.C Proctor commented just before you. The OP wanted to add a Nonexistent worksheet to an existing workbook. This code does that. Appending data to an existing sheet within the workbook was not part of the scope. If that is not needed, this will suffice.Nev
there's an if_sheet_exists parameter for ExcelWriter which allows you to specify "replace" this parameter can only be used with mode 'a'Inexecution
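The if_sheet_exists parameter mentioned in the last comment can be sketched as follows. This assumes pandas 1.3 or later with openpyxl installed; write_sheet is a hypothetical helper name, and the file at path must already exist because mode="a" is used.

```python
import pandas as pd


def write_sheet(path, df, sheet_name):
    """Write df to sheet_name in an existing workbook, keeping other sheets."""
    # "replace" drops and recreates only the named sheet; a sheet that does
    # not exist yet is simply added. Other sheets are left in place.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

Calling it twice with the same sheet name should leave a single "Main" sheet holding the latest data, rather than "Main" and "Main1".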
M
25

Starting in pandas 0.24 you can simplify this with the mode keyword argument of ExcelWriter:

import pandas as pd

with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer:
    data_filtered.to_excel(writer)
Marcelline answered 14/1, 2019 at 20:42 Comment(4)
overwrites for me.Unbalance
@Unbalance I think there might be confusion on this question between two different goals. This allows you to add additional sheets to an existing workbook. It is not intended to append additional data to an existing sheet.Populous
mode = 'a' adds more sheets, but what if I want to overwrite data on the existing sheets?Ramiah
there's an if_sheet_exists parameter for ExcelWriter which allows you to specify "replace" this parameter can only be used with mode 'a'Inexecution
M
12

I know this is an older thread, but this is the first item you find when searching, and the above solutions don't work if you need to retain charts in a workbook that you already have created. In that case, xlwings is a better option - it allows you to write to the excel book and keeps the charts/chart data.

simple example:

import xlwings as xw
import pandas as pd

#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5

#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')

ws = wb.sheets['chartData']

ws.range('A1').options(index=False).value = df

# save the updated workbook under a new name
wb.save('C:\\data\\bookwithChart_updated.xlsx')

xw.apps[0].quit()
Murex answered 24/10, 2017 at 16:40 Comment(6)
Is there a way to create file if it doesnt exist first?Toothlike
Yes, did you explore the docs? docs.xlwings.org/en/stable/api.htmlMurex
wb = xw.Book(filename) on their website says it creates a book. but it doesntToothlike
wb = xw.Book() creates a new empty book, when you pass it a path you are trying to load an existing book.Murex
i am not getting referenced tab to update(i.e. trunk and load data. any ideas?Toothlike
Note: xlwings interacts with a running instance of Excel and therefore does not run on Linux.Pneumonic
C
11

Old question, but I am guessing some people still search for this - so...

I find this method nice because all worksheets are loaded into a dictionary of sheet name and dataframe pairs, created by pandas with the sheetname=None option. It is simple to add, delete or modify worksheets between reading the spreadsheet into the dict format and writing it back from the dict. For me the xlsxwriter works better than openpyxl for this particular task in terms of speed and format.

Note: future versions of pandas (0.21.0+) will change the "sheetname" parameter to "sheet_name".

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheetname=None)

# all worksheets are accessible as dataframes.

# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']

# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df

# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe

# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

For the example in the 2013 question:

ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheetname=None)

ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]

with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)
Cavendish answered 22/4, 2017 at 21:51 Comment(2)
This sort of worked, however, my merged cells, cell colors, and cell widths were not preserved.Pneumonic
Yes, with this method that type of formatting will be lost, because each worksheet is converted to a pandas dataframe (with none of that Excel formatting) and then converted from dataframes to worksheets within a new Excel workbook (which has the same name as the original file). It appears that a new "append" method using openpyxl may be forthcoming which might preserve the original file's worksheet formatting: github.com/pandas-dev/pandas/pull/21251Cavendish
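With the renamed sheet_name parameter, the read-everything-then-rewrite approach from this answer might look like the sketch below. It assumes a recent pandas with openpyxl installed, and it swaps in the openpyxl engine where the answer uses xlsxwriter; replace_sheet_via_dict is a hypothetical helper name.

```python
import pandas as pd


def replace_sheet_via_dict(path, sheet_name, df):
    """Rewrite the whole workbook with sheet_name set to df (hypothetical helper)."""
    # sheet_name=None returns {sheet title: DataFrame} for every sheet.
    sheets = pd.read_excel(path, sheet_name=None)
    sheets[sheet_name] = df

    # The file is rewritten from scratch, so cell formatting, charts and
    # formulas in the original workbook are not preserved.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for title, frame in sheets.items():
            frame.to_excel(writer, sheet_name=title, index=False)
    return list(sheets)
```

The returned list of titles is only a convenience for checking which sheets were written.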
N
10

There is a better solution in pandas 0.24:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

so upgrade your pandas now:

pip install --upgrade pandas
Nodus answered 22/3, 2019 at 6:5 Comment(3)
This is a duplicate of this earlier answerPopulous
Just a heads-up for the future: this does not work with the XlsxWriter engine.Anchor
By default it also does not work with engine=openpyxl, as it will just add a new worksheet named after the only existing worksheet with a 1 appended.Garry
H
3

The solution of @MaxU is not working for the updated version of python and related packages. It raises the error: "zipfile.BadZipFile: File is not a zip file"

I generated a new version of the function that works fine with the updated version of python and related packages and tested with python: 3.9 | openpyxl: 3.0.6 | pandas: 1.2.3

In addition I added more features to the helper function:

  1. It now resizes all columns based on cell content width, so all variables are visible (see "resizeColumns")
  2. You can choose whether NaN is displayed as NaN or as an empty cell (see "na_rep")
  3. Added "startcol": you can start writing from a specific column; otherwise it starts from col = 0

Here is the function:

import pandas as pd

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
    truncate_sheet=False, resizeColumns=True, na_rep = 'NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file

      resizeColumns: default = True. It resizes all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''


    Returns: None

    *******************

    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)


    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpyxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now it resizes all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN, if you want NaN displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol", you can decide to start writing from a specific column, otherwise it will start from col = 0

    *******************



    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

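    try:
        # check that the file exists and is readable; close it right away
        with open(filename):
            pass
    except IOError:
        # file not accessible: create an empty workbook first,
        # because mode='a' below requires an existing file
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)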

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')


    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        # startrow = -1
        startrow = 0

    if startcol is None:
        startcol = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)


    if resizeColumns:

        ws = writer.book[sheet_name]

        def auto_format_cell_width(ws):
            # column indexes are 1-based; + 1 so the last column is included
            for letter in range(1, ws.max_column + 1):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2

        auto_format_cell_width(ws)

    # save the workbook
    writer.save()

Example Usage:

# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue'],
                    'NaNcheck': [float('NaN'), 1, float('NaN')],
                    })

# EDIT YOUR PATH FOR THE EXPORT 
filename = r"C:\DataScience\df.xlsx"   

# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCELFILE 
  
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN
Hobbism answered 12/3, 2021 at 12:26 Comment(3)
This piece of code helped me a lot. Will take it to my collection. Surprisingly works even when excel file are opened. Also the function are thread-safe, tried it with 40 threads, each writes to a file a single dataframe row with 1s interval.Slowmoving
One minor change may be required: if you are using pandas 1.4+, add the extra argument if_sheet_exists='replace' in the writer assignment: writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a', if_sheet_exists='replace'). Otherwise your solution works perfectly. Thank you for making the effort and posting a solution.Cussed
For some reason I still get this: writer.book = load_workbook(filename) AttributeError: can't set attributeMoulden
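The column-resizing step in this answer boils down to "longest cell text per column, plus some padding". Below is a library-free sketch of that calculation on plain lists of rows; column_widths and its padding default are illustrative names and values, not part of openpyxl.

```python
def column_widths(rows, padding=2):
    """Return one width per column: the longest str(value) plus padding."""
    n_cols = max((len(row) for row in rows), default=0)
    widths = [0] * n_cols
    for row in rows:
        for i, value in enumerate(row):
            # str() mirrors the answer's len(str(cell.value)) measurement.
            widths[i] = max(widths[i], len(str(value)))
    return [w + padding for w in widths]
```

Every column index from the first to the last is visited here, which is the property the answer's "all variables will be visible" claim depends on.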
S
2

I used the answer described here

from openpyxl import load_workbook
writer = pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a')
writer.book = load_workbook(p_file_name)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
df.to_excel(writer, 'Data', startrow=10, startcol=20)
writer.save()
Stonehenge answered 11/3, 2021 at 16:21 Comment(0)
M
2

2023: Short answer (pandas 1.4.0+, where the "overlay" option was added)

Only updating cells values. Keeping the formatting and other cells unchanged.

with pd.ExcelWriter('my-existing-file.xlsx', engine='openpyxl', mode="a",
                    if_sheet_exists="overlay"  # => update value only
                    ) as writer:
    df.to_excel(writer, sheet_name="my-existing-sheet", index=None, startcol=2, startrow=3)  # Start at C4 
Millsaps answered 5/10, 2023 at 9:11 Comment(0)
S
1
import pandas
from openpyxl import load_workbook

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                               header=None,
                                                               index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

        master_writer.save()
    except Exception as e:
        raise e

This works perfectly fine; the only drawback is that the formatting of the master file (the file to which we add the new sheet) is lost.

Sileas answered 7/2, 2017 at 15:20 Comment(0)
D
1
writer = pd.ExcelWriter('prueba1.xlsx', engine='openpyxl', keep_date_col=True)

I hope "keep_date_col" helps you.

Dirkdirks answered 14/11, 2017 at 19:46 Comment(0)
L
0
book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(xlsFilename, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()
Lustig answered 8/11, 2018 at 0:26 Comment(1)
While this might answer the authors' question, it lacks some explaining words and/or links to documentation. Raw code snippets are not very helpful without some phrases around them. You may also find how to write a good answer very helpful. Please edit your answer.Pereyra
C
0

Solution by @MaxU worked very well. I have just one suggestion:

If truncate_sheet=True is specified, then "startrow" should NOT be retained from the existing sheet. I suggest:

        if startrow is None and sheet_name in writer.book.sheetnames:
            if not truncate_sheet: # truncate_sheet would use startrow if provided (or zero below)
                startrow = writer.book[sheet_name].max_row

Cerumen answered 28/11, 2020 at 16:55 Comment(0)
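The rule suggested above can be isolated into a small pure function, which makes the three cases easy to check. This is only a sketch; resolve_startrow and its parameters are hypothetical names, and max_row stands for the sheet's current last row as reported by openpyxl.

```python
def resolve_startrow(startrow, sheet_exists, max_row, truncate_sheet):
    """Pick the row to start writing at, following the suggestion above."""
    if startrow is not None:
        # An explicit startrow always wins, truncated or not.
        return startrow
    if sheet_exists and not truncate_sheet:
        # Append below the existing data.
        return max_row
    # New sheet, or a sheet that is about to be truncated: start at the top.
    return 0
```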
M
0

I'd recommend using xlwings (https://docs.xlwings.org/en/stable/api.html); it is really powerful for this application. This is how I use it:

import xlwings as xw
import pandas as pd
import xlsxwriter

# function to get the active workbook
def getActiveWorkbook():
    try:
        # logic from xlwings to grab the current excel file
        activeWb = xw.books.active
    except Exception:
        # print error message if unable to get the current workbook, then stop
        print('Unable to grab the current Workbook')
        raise SystemExit(1)
    else:
        return activeWb

# function that returns the last row number and last cell of a sheet
def getLastRow(myBook, sheetName):
    lastRow = myBook.sheets[sheetName].range("A1").current_region.last_cell.row
    # xlwings column numbers are 1-based, xl_col_to_name expects a 0-based index
    lastCol = str(xlsxwriter.utility.xl_col_to_name(myBook.sheets[sheetName].range("A1").current_region.last_cell.column - 1))
    return str(lastRow), lastCol + str(lastRow)

activeWb = getActiveWorkbook()
df = pd.DataFrame(data=[1,2,3])

# look at worksheet = Part Number Status
sheetName = "Sheet1"
ws = activeWb.sheets[sheetName]
lastRow, lastCell = getLastRow(activeWb, sheetName)
if int(lastRow) > 1:
    ws.range("A1:" + lastCell).clear()
ws.range("A1").options(index=False, header=False).value = df.fillna('')

This seems to work very well for my applications because .xlsm workbooks can be very tricky. You can execute this as a Python script or turn it into an executable with pyinstaller and then run the .exe through an Excel macro. You can also call VBA macros from Python using xlwings, which is very useful.

Mishamishaan answered 24/8, 2022 at 21:8 Comment(7)
but does it write more data on the same sheet or create a new sheet without deleting the older sheets? i have been dealing with this for awhile now. tried pretty much everything that is supposed to work on so and its not working on my data for some reason..Maggot
@Maggot Did you try it? I use it to write to existing sheets, you can even use that getLastRow function to write to the bottom of a column of data or to the end of a row with pre-existing data. Or you can slightly modify it to create a new sheet if you want.Mishamishaan
how do you modify it to create a new sheet? i m lost in this excel stuff. never used it before with python and its a messMaggot
@Maggot So the ws = activeWb.sheets[sheetName] line establishes ws as an existing sheet. I believe if you do ws = activeWb.sheets.add(name="NameOfNewSheet", before="sheetName", after="sheetName") instead, you can still use ws.range("A1").options(index=False, header=False).value = df.fillna('') to send your dataframe to that ws. Check it out hereMishamishaan
kool thank you so much ! hopefully its running faster than pyopenxl cuz writing million rows data takes about an hr and not sure how to improve thatMaggot
@Maggot So I've never had an application to use it, but I'm pretty sure that ws.range("A1").options(index=False, header=False).value = df.fillna('') can be rewritten to something like ws.range("A1").options(index=False, header=False, chunksize=###).value = df.fillna(''). Then this is the definition of range().options(chunksize=) Use a chunksize, e.g. 10000 to prevent timeout or memory issues when reading or writing large amounts of data. Works with all formats, including DataFrames, NumPy arrays, and list of lists.Mishamishaan
thank you - will let you know when i try it!Maggot
C
-1

Method:

  • Can create file if not present
  • Append to existing excel as per sheet name
import pandas as pd
from openpyxl import load_workbook

def write_to_excel(df, file, **kwds):
    try:
        book = load_workbook(file)
        writer = pd.ExcelWriter(file, engine='openpyxl') 
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, **kwds)
        writer.save()
    except FileNotFoundError as e:
        df.to_excel(file, **kwds)

Usage:

df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_to_excel(df_a, "test.xlsx", sheet_name="Sheet a", columns=['a'], index=False)
write_to_excel(df_b, "test.xlsx", sheet_name="Sheet b", columns=['b'])
Chalkstone answered 11/8, 2020 at 7:5 Comment(0)
P
-1

You can write to an existing Excel file without overwriting data using pandas by using the pandas.DataFrame.to_excel() method and specifying the mode parameter as 'a' (append mode).

Here's an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Write the DataFrame to an existing Excel file in append mode
df.to_excel('existing_file.xlsx', engine='openpyxl', mode='a', index=False, sheet_name='Sheet1')
Podophyllin answered 1/2, 2023 at 12:45 Comment(1)
TypeError: to_excel() got an unexpected keyword argument 'mode'Assumed
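As the comment points out, mode is not a to_excel() argument; it belongs to pd.ExcelWriter. A hedged sketch of what this answer was probably aiming for, assuming pandas 1.3 or later with openpyxl installed (add_sheet is a hypothetical helper name):

```python
import os

import pandas as pd


def add_sheet(path, df, sheet_name):
    """Add df as a new sheet, creating the workbook if it does not exist."""
    # mode="a" raises if the file is missing, so fall back to "w" then.
    mode = "a" if os.path.isfile(path) else "w"
    with pd.ExcelWriter(path, engine="openpyxl", mode=mode) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

With the default if_sheet_exists setting, writing to a sheet name that already exists raises an error instead of overwriting it, so this sketch only covers adding new sheets.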

© 2022 - 2024 — McMap. All rights reserved.