Modify an Excel file with Pandas, with minimal change of the layout
Asked Answered
G

4

5

I've already read Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file? but here my question is specific to the layout mentioned hereafter.

How to open an Excel file with Pandas, do some modifications, and save it back:

  • (1) without removing that there is a Filter on the first row

    enter image description here

  • (2) without modifying the "displayed column width" of the columns as displayed in Excel

  • (3) without removing the formulas which might be present on some cells

?

Here is what I tried, it's a short example (in reality I do more processing with Pandas):

import pandas as pd

df = pd.read_excel('in.xlsx')
df['AB'] = df['A'].astype(str) + ' ' + df['B'].astype(str)  # create a new column from 2 others
del df['Date']                                              # delete columns
del df['Time']
df.to_excel('out.xlsx', index=False)

With this code, the Filter of the first row is removed and the displayed column width are set to a default, which is not very handy (because we would have to manually set the correct width for all columns).

Gloam answered 1/3, 2021 at 11:55 Comment(4)
can you mention your current OS/enviroment as well?Cothurnus
@Manakin I'm using Windows + Python 3.7Gloam
This solution helps solve (3) #32773454Benignant
@Benignant can you maybe post an answer showing an example of code using the solution you mentioned (using pandas + openpyxl)? With only openpyxl, that's easy; but the difficulty here is to use pandas + openpyxl.Gloam
C
5

If you are using a machine that has Excel installed on it, then I highly recommend using the flexible xlwings API. This answers all your questions.

Let's assume I have an Excel file called demo.xlxs in the same directory as my program.

enter image description here

app.py

import xlwings as xw # pip install xlwings
import pandas as pd

wb = xw.Book('demo.xlsx')

enter image description here

This will create a initiate an xl workbook instance and open your Excel editor to allow you to invoke Python commands.

enter image description here


Let's assume we have the following dataframe that we want to use to replace the ID and Name column:

    new_name
A   John_new
B  Adams_new
C     Mo_new
D  Safia_new

wb.sheets['Sheet1']['A1:B1'].value = df

enter image description here


Finally, you can save and close.

wb.save()
wb.close()
Cothurnus answered 9/3, 2021 at 4:45 Comment(0)
E
2

I would recommend xlwings, as it interfaces with excel's COM interfaces (like built-in vba), so it is more powerful. I never tested the "preservation of filtering or formula", official doc may provide ways.

For my own use, I just build everything into python, filtering, formulas, so I don't even touch the excel sheet.

Demo:

# [step 0] boiler plate stuff
df = pd.DataFrame(
    index=pd.date_range("2020-01-01 11:11:11", periods=100, freq="min"), 
    columns=list('abc'))
df['a'] = np.random.randn(100, 1) 
df['b'] = df['a'] * 2 + 10

# [step 1] google xlwings, and pip/conda install xlwings

# [step 2] open a new excel sheet, no need to save 
#        (basically this code will indiscriminally wipe whatever sheet that is active on your desktop)

# [step 3] magic, ...and things you can do
import xlwings as xw

wb = xw.books.active
ws = wb.sheets.active
ws.range('A1').current_region.options(index=1).value = df 
# I believe this preserves existing formatting, HOWEVER, it will destory filtering

if 1:
# show casing some formatting you can do
    active_window = wb.app.api.ActiveWindow
    active_window.FreezePanes = False
    active_window.SplitColumn = 2 # const_splitcolumn
    active_window.SplitRow = 1
    active_window.FreezePanes = True


    ws.cells.api.Font.Name = 'consolas'
    ws.api.Rows(1).Orientation = 60
    ws.api.Columns(1).Font.Bold = True
    ws.api.Columns(1).Font.ColorIndex = 26
    ws.api.Rows(1).Font.Bold = True
    ws.api.Rows(1).Borders.Weight = 4
    ws.autofit('c') # 'c' means columns, autofitting columns
    ws.range(1,1).api.AutoFilter(1)
Epithelioma answered 7/3, 2021 at 19:4 Comment(0)
G
2

This is a solution for (1), (2), but not (3) from my original question. (If you have an idea for (3), a comment and/or another answer is welcome).

In this solution, we open the input Excel file two times:

  • once with openpyxl: this is useful to keep the original layout (which seems totally discarded when reading as a pandas dataframe!)
  • once as a pandas dataframe df to benefit from pandas' great API to manipulate/modify the data itself. Note: data modification is handier with pandas than with openpyxl because we have vectorization, filtering df[df['foo'] == 'bar'], direct access to the columns by name df['foo'], etc.

The following code modifies the input file and keeps the layout: the first row "Filter" is not removed and the column width of each colum is not modified.

import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import load_workbook
wb = load_workbook('test.xlsx')  # load as openpyxl workbook; useful to keep the original layout
                                 # which is discarded in the following dataframe
df = pd.read_excel('test.xlsx')  # load as dataframe (modifications will be easier with pandas API!)
ws = wb.active
df.iloc[1, 1] = 'hello world'    # modify a few things
rows = dataframe_to_rows(df, index=False)
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx, column=c_idx, value=value)
wb.save('test2.xlsx')
Gloam answered 8/3, 2021 at 17:59 Comment(0)
D
0

I think this is not field of pandas, you must use openpyxl in order to take care of all formatting, blocked_rows, name ranges and so on. Main difference is that you cannot use vectorial computation as in pandas so you need to introduce some loop.

Decontrol answered 6/3, 2021 at 18:25 Comment(2)
That's what I noticed indeed: with pandas we totally lose the layout of the input XLSX, but we have a great API to modify the data. With openpyxl, we keep the layout but the data modification API is much less handy with no vectorization. Would be great if it's possible to mix them to have the best of both worlds...Gloam
Here is a solution using openpyxl, which solves (1), (2), but not (3). Do you have an idea of how to modify it to also have criteria (3) from my original question?Gloam

© 2022 - 2024 — McMap. All rights reserved.