Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file?
Asked Answered
U

8

31

Many spreadsheets have formulas and formatting that Python tools for reading and writing Excel files cannot faithfully reproduce. That means that any file I want to create programmatically must be something I basically create from scratch, and then other Excel files (with the aforementioned sophistication) have to refer to that file (which creates a variety of other dependency issues).

My understanding of Excel file 'tabs' is that they're actually just a collection of XML files. Well, is it possible to use pandas (or one of the underlying read/write engines such as xlsxwriter or openpyxl to modify just one of the tabs, leaving other tabs (with more wicked stuff in there) intact?

EDIT: I'll try to further articulate the problem with an example.

  • Excel Sheet test.xlsx has four tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4
  • I read Sheet3 into a DataFrame (let's call it df) using pandas.read_excel()
  • Sheet1 and Sheet2 contain formulas, graphs, and various formatting that neither openpyxl nor xlrd can successfully parse, and Sheet4 contains other data. I don't want to touch those tabs at all.
  • Sheet2 actually has some references to cells on Sheet3
  • I make some edits to df and now want to write it back to sheet3, leaving the other sheets untouched (and the references to it from other worksheets in the workbook intact)

Can I do that and, if so, how?

Uncircumcised answered 25/1, 2015 at 22:38 Comment(0)
K
11

I had a similar question regarding the interaction between excel and python (in particular, pandas), and I was referred to this question.

Thanks to some pointers by stackoverflow community, I found a package called xlwings that seems to cover a lot of the functionalities HaPsantran required.

To use the OP's example:

Working with an existing excel file, you can drop an anchor in the data block (Sheet3) you want to import to pandas by naming it in excel and do:

# opened an existing excel file

wb = Workbook(Existing_file)

# Find in the excel file a named cell and reach the boundary of the cell block (boundary defined by empty column / row) and read the cell 

df = Range(Anchor).table.value

# import pandas and manipulate the data block
df = pd.DataFrame(df) # into Pandas DataFrame
df['sum'] = df.sum(axis= 1)

# write back to Sheet3
Range(Anchor).value = df.values

tested that this implementation didn't temper existing formula in the excel file

Let me know if this solves your problem and if there's anything I can help.

Big kudos to the developer of xlwings, they made this possible.


Below is an update to my earlier answer after further question from @jamzsabb, and to reflect a changed API after xlwings updated to >= 0.9.0.

import xlwings as xw
import pandas as pd
target_df = xw.Range('A7').options(pd.DataFrame, expand='table').value # only do this if the 'A7' cell (the cell within area of interest) is in active worksheet
#otherwise do:
#sht = xw.Book(r'path to your xlxs file\name_of_file.xlsx`).sheets['name of sheet']
#target_df = sht.Range('A7').options(pd.DataFrame, expand='table').value # you can also change 'A7' to any name that you've given to a cell like 'interest_table`
Kinnard answered 18/9, 2015 at 6:33 Comment(7)
This looks very promising. I will test it when time permits and return to comment after that. Thanks for the heads up about xlwings.Uncircumcised
I've been messing with this for a while and I'm having trouble understanding it. What is the anchor object? Is that the sheet name in the workbook? When df is declared, how is it linked to the Workbook object? I've been on xlwings documentation and not seeing anything useful for editing a single sheet in an excel anywhere.....Cenesthesia
Sorry for not being clear, Anchor means the named cell in the worksheet. It should be a string like "Hero". I just thought it may be easier to name a cell and then access it in python quickly, instead of locating it via the column, row coordinates. This way your business logic get clearer as well. And the scope of my answer is more focused on editing a particular area on a specific spreadsheet.Kinnard
Thanks for that @PaulDong, looks like another issue I was having was my xlwings was version 0.7 for some reason, even when I updated with pip. Had to manually install, now everything works great. Thanks for the detailed help!Cenesthesia
@PaulDong, would you be able to please go into some more detail in your response? I am trying to understand how to use this if I have a pandas dataframe, for example, with 3 tabs (sheets) and I want to completely replace 2 of them, each to be a coming from a separate pandas dataframe. I don't really follow your code, how can I apply it to this situation?Greenheart
@mkheifetz gladly. sorry for the late reply. But it seems not to be something the original OP was asking. so to change the focus of the answer may create unnecessary distraction for this topic. in fact pandas.to_excel can achieve the functionality once you specify the sheet_name argument. if it is still unclear please ask a new question and I shall provide a working snippet.Kinnard
@Kinnard I managed how to figure out how to get the thing to work based on another stackoverflow question and answer. Do you by any chance to know what I can do for some help with my question here, nobody seems to be responding to it: #49492787Greenheart
C
6

I'm 90% confident the answer to "can pandas do this" is no. Posting a negative is tough, because there always might be something clever that I've missed, but here's a case:

Possible interface engines are xlrd/xlwt/xlutils, openpyxl, and xlsxwriter. None will work for your purposes, as xlrd/wt don't support all formulae, xlsxwriter can't modify existing xlsx files, and openpyxl loses images and charts.

Since I often need to do this, I've taken to only writing simple output to a separate file and then calling the win32api directly to copy the data between the workbooks while preserving all of my colleague's shiny figures. It's annoying, because it means I have to do it under Windows instead of *nix, but it works.

If you're working under Windows, you could do something similar. (I wonder if it makes sense to add a native insert option using this approach to help people in this situation, or if we should simply post a recipe.)


P.S.: This very problem has annoyed me enough from time to time that I've thought of learning enough of the modern Excel format to add support for this to one of the libraries.

P.P.S.: But since ignoring things you're not handling and returning them unmodified seems easy enough, the fact that no one seems to support it makes me think there are some headaches, and where Redmond's involved I'm willing to believe it. @john-machin would know the details, if he's about..

Cahoon answered 31/1, 2015 at 17:26 Comment(3)
Thanks, @DSM; this is kind of what I suspected. Do you happen to know if there's a Mac equivalent to modify Excel files the way you can with win32api? There are answers like this, but it'd be great to know if it's even worth exploring.Uncircumcised
Note that xlwings is putting a wrapper around pywin32 on Windows and appscript on Mac to get a fully cross-platform compatible solution, also working around a few of pywin32's limitations.Nitty
@DSM, how about a more simplified version of this? If I just need to modify say 2 out of 3 worksheets in excel, both of which I have as pandas dataframes. How can I do this? I looked at a ton of answers and even posted my own question to which nobody responded, I am thinking this should be easier then the original question asked here and doable?: #49414324Greenheart
S
6

I'm adding an answer that uses openpyxl. As of version 2.5, you can preserve charts in existing files (further details on the issue are available here).

For demonstration purposes, I create an xlsx file using pandas following the OPs guidelines. The tab named 'Sheet2' has formulas that reference 'Sheet3' and contains a chart.

import pandas as pd

df = pd.DataFrame({'col_a': [1,2,3],
                  'col_b': [4,5,6]})

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook=writer.book
worksheet = writer.sheets['Sheet1']
df.head(0).to_excel(writer, sheet_name='Sheet2', index=False)
workbook=writer.book
worksheet = writer.sheets['Sheet2']
for i in range(2, len(df) + 2):
    worksheet.write_formula('A%d' % (i), "=Sheet3!A%d" % (i))
    worksheet.write_formula('B%d' % (i), "=Sheet3!B%d" % (i))
chart = workbook.add_chart({'type': 'column'})

chart.add_series({'values': '=Sheet2!$A$2:$A$4'})
chart.add_series({'values': '=Sheet2!$B$2:$B$4'})

worksheet.insert_chart('A7', chart)

df.to_excel(writer, sheet_name='Sheet3', index=False)
df.to_excel(writer, sheet_name='Sheet4', index=False)

writer.save()

Expected test.xlsx after running the code above:

test.xlsx after first block of code

Then if we run the code below, using openpyxl, we can modify the data in 'Sheet3' while preserving formulas and chart in 'Sheet2' and the updated data is now in this file.

from openpyxl import load_workbook

wb = load_workbook('test.xlsx')
ws = wb['Sheet3']
ws['B2'] = 7
ws['B3'] = 8
ws['B4'] = 9
wb.save('test.xlsx')

Expected test.xlsx after running the second block of code:

test.xlsx after second block of code

Subsonic answered 9/8, 2018 at 4:47 Comment(0)
H
1

As far as I know Pandas does not do that by itself.

I wrote some small utility library pandasxltable (based on openpyxl) in order to facilitate the interaction between a excel template and pandas data-frames. The library allows you to fetch as data-frame and update Excel Data Tables (not really a tab but part of it)from dataframe.

Holophrastic answered 19/8, 2019 at 14:8 Comment(0)
S
0

if you're talking about 'sheets' as 'tabs', then it is possible to modify just one of the tabs by accessing the particular one using the parse(sheet_name) function.

an example is here: Reading an Excel file in python using pandas

to write back to excel, (while controlling the sheets) use the to_excel function, here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html

Scalariform answered 25/1, 2015 at 23:56 Comment(1)
If you could post a block of code that demonstrates this (such that the source file, which contains other sheets in addition to the one(s) you read in, is unaltered except for the sheets you write), that would be helpful.Uncircumcised
P
0

Required: call path to exist excels file.

Input: List string.

Output: append row.

from datetime import datetime,timedelta
from openpyxl import load_workbook,Workbook
       
   def write_log_excels(status):
       """
       Function to write log in excel  
       """
               try:
                   # Point
                   log_list = ["1","2","3","4","5","6","7","8", "9"]
                   date_n = datetime.now()
                   date_n = date_n.strftime("%Y-%m-%d %H:%M:%S")
                   sdate = date_n
   
                   wk = load_workbook('filename.xlsx')
                   wh = wk.active
                   lenth = wh.max_row
                   # wk.close()
                   pl = log_list
                   if lenth == 0:
                       # ws = Workbook()
                       # wb = ws.active
                       wh['A1'] = 'TITLE1'
                       wh['B1'] = 'TITLE2'
                       wh['C1'] = 'TITLE3'
                       wh['D1'] = 'TITLE4'
                       wh['E1'] = 'TITLE5'
                       wh['F1'] = 'TITLE6'
                       wh['G1'] = 'TITLE7'
                       wh['H1'] = 'TITLE8'
                       wh['I1'] = 'TITLE9'
                       lenth = 1
                   if pl is not None:
                       w = lenth + 1
                       wh['A{}'.format(w)] =  pl[0]
                       wh['B{}'.format(w)] =  pl[1]
                       wh['C{}'.format(w)] =  pl[2]
                       wh['D{}'.format(w)] =  pl[3]
                       wh['E{}'.format(w)] =  pl[4]
                       wh['F{}'.format(w)] =  pl[5]
                       wh['G{}'.format(w)] =  pl[3]
                       wh['H{}'.format(w)] =  pl[4]
                       wh['I{}'.format(w)] =  pl[5]
                   wk.save('filename.xlsx')
       
                   log_list.clear()
               except Exception as e:
                   print('write_log_excels :' + str(e))
       write_log_excels('')

Or using this for auto create col,row.

def work_sheet(wsheet):
    data_sheet = []
    col = [] #column in sheet
    for c in range(wsheet.max_column):
        #got alphabels with max_(len)_column found in worksheet
        col.append(string.ascii_uppercase[c])

    for r in range(2,wsheet.max_row + 1):
        data_row = []
        for c in range(len(col)):
            #got values exactly with "sheet[colum-row]"
            data = wsheet['{}{}'.format(col[c],r)].value
            data_row.append(data)
        data_sheet.append(data_row)
    return data_sheet
Polloch answered 1/12, 2020 at 4:48 Comment(0)
B
0

This is quite an old question, but I believe you can do it this way (tested with pandas 1.4.3):

df = pd.read_excel(pd.ExcelFile('file.xlsx'), sheet_name='Sheet1')
# make modifications to your dataframe
df.to_excel('file.xlsx', sheet_name=sheet_name)

This is because to_excel with sheet_name as param will write to that single sheet only, keeping the other ones intact

Baecher answered 20/7, 2022 at 8:52 Comment(0)
H
0

Since I faced the same problem in 2024..

Yes, pandas can save specific sheets only, and moreover, it can modify only specific cells, which can be quite useful in some cases where you do not want to have mess with tricky formatting, None, Nan etc.

import pandas as pd

with pd.ExcelWriter("file.xlsx", mode="a", if_sheet_exists="overlay") as writer:
    df_cells.to_excel(writer, sheet_name="my_sheet", index=False, startrow=10, startcol=11)

tested with pandas 2.2.2

Hypertension answered 21/5 at 14:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.