Merge rows based on value (pandas to excel - xlsxwriter)
Asked Answered
F

5

17

I'm trying to output a Pandas dataframe into an excel file using xlsxwriter. However I'm trying to apply some rule-based formatting; specifically trying to merge cells that have the same value, but having trouble coming up with how to write the loop. (New to Python here!)

See below for output vs output expected:

enter image description here

(As you can see based off the image above I'm trying to merge cells under the Name column when they have the same values).

Here is what I have thus far:

#This is the logic you use to merge cells in xlsxwriter (just an example)
worksheet.merge_range('A3:A4','value you want in merged cells', merge_format)

#Merge Car type Loop thought process...
#1.Loop through data frame where row n Name = row n -1 Name
#2.Get the length of the rows that have the same Name
#3.Based off the length run the merge_range function from xlsxwriter, worksheet.merge_range('range_found_from_loop','Name', merge_format)


for row_index in range(1,len(car_report)):
     if car_report.loc[row_index, 'Name'] == car_report.loc[row_index-1, 'Name'] 
     #find starting point based off index, then get range by adding number of rows to starting point. for example lets say rows 0-2 are similar I would get 'A0:A2' which I can then put in the code below
     #from there apply worksheet.merge_range('A0:A2','[input value]', merge_format)

Any help is greatly appreciated!

Thank you!

Fineberg answered 14/4, 2020 at 21:57 Comment(0)
R
15

Your logic is almost correct, however i approached your problem through a slightly different approach:

1) Sort the column, make sure that all the values are grouped together.

2) Reset the index (using reset_index() and maybe pass the arg drop=True).

3) Then we have to capture the rows where the value is new. For that purpose create a list and add the first row 1 because we will start for sure from there.

4) Then start iterating over the rows of that list and check some conditions:

4a) If we only have one row with a value the merge_range method will give an error because it can not merge one cell. In that case we need to replace the merge_range with the write method.

4b) With this algorithm you 'll get an index error when trying to write the last value of the list (because it is comparing it with the value in the next index postion, and because it is the last value of the list there is not a next index position). So we need to specifically mention that if we get an index error (which means we are checking the last value) we want to merge or write until the last row of the dataframe.

4c) Finally i did not take into consideration if the column contains blank or null cells. In that case code needs to be adjusted.

Lastly code might look a bit confusing, you have to take in mind that the 1st row for pandas is 0 indexed (headers are separate) while for xlsxwriter headers are 0 indexed and the first row is indexed 1.

Here is a working example to achieve exactly what you want to do:

import pandas as pd

# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})

# Create the list where we 'll capture the cells that appear for 1st time,
# add the 1st row and we start checking from 2nd row until end of df
startCells = [1]
for row in range(2,len(df)+1):
    if (df.loc[row-1,'Name'] != df.loc[row-2,'Name']):
        startCells.append(row)


writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})


lastRow = len(df)

for row in startCells:
    try:
        endRow = startCells[startCells.index(row)+1]-1
        if row == endRow:
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, endRow, 0, df.loc[row-1,'Name'], merge_format)
    except IndexError:
        if row == lastRow:
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, lastRow, 0, df.loc[row-1,'Name'], merge_format)


writer.save()

Output:

enter image description here

Relaxation answered 15/4, 2020 at 12:6 Comment(6)
thanks for the solution. How would you update the code if I've got rows in multiple columns that need to be merged?Bullard
Hello Lisa, i m not sure if i have understood correctly your requirements. Maybe raise a separate issue and ping me so i can have a look?Relaxation
Hey Dimitris, here I've posted a new one #68209003 thanks in advance ;)Bullard
I see that someone already posted an answer which is one line! It would had taken me many more lines to get this working but i am glad that someone helped you on this.Relaxation
How can I merge cells like B2,C2 like thatBushing
Hello, i would recommend you to have a look at xlsxwriter's documentation or examples on how to use the merge functionality xlsxwriter.readthedocs.io/example_merge1.htmlRelaxation
D
7

Alternate Approach: One can use the unique() function to find the index assigned to each unique value (car name in this example). Using the above test data,

import pandas as pd

# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})

for car in df['Name'].unique():
    # find indices and add one to account for header
    u=df.loc[df['Name']==car].index.values + 1

    if len(u) <2: 
        pass # do not merge cells if there is only one car name
    else:
        # merge cells using the first and last indices
        worksheet.merge_range(u[0], 0, u[-1], 0, df.loc[u[0],'Name'], merge_format)
writer.save()
Dyarchy answered 5/5, 2021 at 1:49 Comment(0)
H
1

I think this is a better answer to your problem

df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})

# Use the groupby() function to group the rows by 'Name'
grouped = df.groupby('Name')

# Use the first() function to find the first row of each group
first_rows = grouped.first()

# Create a new column 'start_row' that contains the index of the first row of each group
first_rows['start_row'] = first_rows.index.map(lambda x: (df['Name'] == x).idxmax())

# Create a new column 'end_row' that contains the index of the last row of each group
first_rows['end_row'] = grouped.last().index.map(lambda x: (df['Name'] == x).idxmax())

# Create an empty list to store the merge ranges
merge_ranges = []

# Iterate over the first_rows dataframe and add the merge ranges to the list
for index, row in first_rows.iterrows():
    merge_ranges.append((row['start_row'], 0, row['end_row'], 0))

# Write the dataframe to an excel file and apply the merge ranges
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
worksheet = writer.sheets['Sheet1']
for merge_range in merge_ranges:
    worksheet.merge_range(*merge_range, "", worksheet.get_default_format())
writer.save()
Hairball answered 16/1, 2023 at 22:52 Comment(0)
V
0

Alternate Approach : Other than xlsxwriter you can also use a pivot table.

dataframe=pd.pivot_table(df,index=[column name...])
df.to_excel(dataframe)
Valenta answered 7/12, 2022 at 12:56 Comment(1)
#76331844. Please can someone help me with this , im stuck with this from a long time , tried all possible ways but not workingKinser
D
0

Should "just work" with set_index() and to_excel()

my_index_cols = ['Name'] # this can also be a list of multiple columns
df.set_index(my_index_cols).to_excel('filename.xlsx', index=True, header=None)

see also: https://mcmap.net/q/745129/-merge-rows-based-on-value-pandas-to-excel-xlsxwriter-addition

Distend answered 15/2, 2023 at 9:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.