Write pandas dataframe to Excel with xlsxwriter and include `write_rich_string` formatting
Asked Answered
A

2

10

The following is reproducible and generates the desired output.

import xlsxwriter, pandas as pd

workbook = xlsxwriter.Workbook('pandas_with_rich_strings.xlsx')
worksheet = workbook.add_worksheet()

# Set up some formats to use.
bold = workbook.add_format({'bold': True})
italic = workbook.add_format({'italic': True})
red = workbook.add_format({'color': 'red'})

df = pd.DataFrame({
    'numCol': [1, 50, 327],
    'plainText': ['plain', 'text', 'column'],
    'richText': [
        ['This is ', bold, 'bold'],
        ['This is ', italic, 'italic'],
        ['This is ', red, 'red']
    ]
}) 


headRows = 1


for colNum in range(len(df.columns)):
    xlColCont = df[df.columns[colNum]].tolist()
    worksheet.write_string(0, colNum , str(df.columns[colNum]), bold)
    for rowNum in range(len(xlColCont)):
        if df.columns[colNum] == 'numCol': 
            worksheet.write_number(rowNum+headRows, colNum , xlColCont[rowNum])        
        elif df.columns[colNum] == 'richText':
            worksheet.write_rich_string(rowNum+headRows, colNum , *xlColCont[rowNum])
        else:
            worksheet.write_string(rowNum+headRows, colNum , str(xlColCont[rowNum]))


workbook.close()

However, how would I do the same thing without iterating over each column and write the entire pandas dataframe to the Excel file in one go and include the write_rich_string formatting?

The following does not work.

writer = pd.ExcelWriter('pandas_with_rich_strings.xlsx', engine='xlsxwriter')
workbook = xlsxwriter.Workbook('pandas_with_rich_strings.xlsx')
worksheet = workbook.add_worksheet('pandas_df')
df.to_excel(writer,'pandas_df')
writer.save()
Amatruda answered 14/6, 2018 at 19:49 Comment(0)
S
11

I'm not sure that my answer is much better than the way you do it, but I've cut it down to use only one for loop and make use of pandas.DataFrame.to_excel() to initially put the dataframe in excel. Please note that I then overwrite the last column using worksheet.write_rich_string().

import pandas as pd

writer = pd.ExcelWriter('pandas_with_rich_strings.xlsx', engine='xlsxwriter')
workbook  = writer.book
bold = workbook.add_format({'bold': True})
italic = workbook.add_format({'italic': True})
red = workbook.add_format({'color': 'red'})
df = pd.DataFrame({
    'numCol': [1, 50, 327],
    'plainText': ['plain', 'text', 'column'],
    'richText': [
        ['This is ', bold, 'bold'],
        ['This is ', italic, 'italic'],
        ['This is ', red, 'red']
    ]
}) 
df.to_excel(writer, sheet_name='Sheet1', index=False)
worksheet = writer.sheets['Sheet1']
# you then need to overwite the richtext column with
for idx, x in df['richText'].iteritems():
    worksheet.write_rich_string(idx + 1, 2, *x)
writer.save()

With Expected Outputted .xlsx:

Expected Outputted .xlsx

Stonechat answered 14/6, 2018 at 22:1 Comment(1)
hi, i'm curious what is the meaning of '*' in worksheet.write_rich_string(idx + 1, 2, *x)?Andrews
U
2

Your code could be simplified with the use of worksheet.add_write_handler() to detect the list and call worksheet.write_rich_string() automatically from worksheet.write without having to manually check the type. You'd think

worksheet.add_write_handler(list, xlsxwriter.worksheet.Worksheet.write_rich_string)

should work but doesn't because of some issue with the varargs for that method getting mangled (the last arg is optional styling for the whole cell). So instead the following does work

worksheet.add_write_handler(list, lambda worksheet, row, col, args: worksheet._write_rich_string(row, col, *args))

Unfortunately, this approach isn't easily compatible with pd.to_excel because it has to be set on the worksheet before the data is written and because ExcelWriter serializes lists and dicts to strings before writing (there is a note in the documentation that says this is to be compatible with CSV writers). Subclassing pd.io.excel._xlsxwriter._XlsxWriter can work:

import xlsxwriter, pandas as pd

class RichExcelWriter(pd.io.excel._xlsxwriter._XlsxWriter):
    def __init__(self, *args, **kwargs):
        super(RichExcelWriter, self).__init__(*args, **kwargs)

    def _value_with_fmt(self, val):
        if type(val) == list:
            return val, None
        return super(RichExcelWriter, self)._value_with_fmt(val)

    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0, freeze_panes=None):
        sheet_name = self._get_sheet_name(sheet_name)
        if sheet_name in self.sheets:
            wks = self.sheets[sheet_name]
        else:
            wks = self.book.add_worksheet(sheet_name)
            #add handler to the worksheet when it's created
            wks.add_write_handler(list, lambda worksheet, row, col, list, style: worksheet._write_rich_string(row, col, *list))
            self.sheets[sheet_name] = wks
        super(RichExcelWriter, self).write_cells(cells, sheet_name, startrow, startcol, freeze_panes)



writer = RichExcelWriter('pandas_with_rich_strings_class.xlsx')
workbook  = writer.book
bold = workbook.add_format({'bold': True})
italic = workbook.add_format({'italic': True})
red = workbook.add_format({'color': 'red'})
df = pd.DataFrame({
    'numCol': [1, 50, 327],
    'plainText': ['plain', 'text', 'column'],
    'richText': [
        ['This is ', bold, 'bold'],
        ['This is ', italic, 'italic'],
        ['This is ', red, 'red']
    ]
})

df.to_excel(writer, sheet_name='Sheet1', index=False)

writer.save()

Alternately we can keep the use of xlsxwriter directly and the use the ExcelFormatter from pandas which also handles the header formatting and takes many of the same arguments that to_excel takes.

import xlsxwriter, pandas as pd
from pandas.io.formats.excel import ExcelFormatter

workbook = xlsxwriter.Workbook('pandas_with_rich_strings.xlsx')
worksheet = workbook.add_worksheet()

# Set up some formats to use.
bold = workbook.add_format({'bold': True})
italic = workbook.add_format({'italic': True})
red = workbook.add_format({'color': 'red'})

df = pd.DataFrame({
    'numCol': [1, 50, 327],
    'plainText': ['plain', 'text', 'column'],
    'richText': [
        ['This is ', bold, 'bold'],
        ['This is ', italic, 'italic'],
        ['This is ', red, 'red']
    ]
})

worksheet.add_write_handler(list, lambda worksheet, row, col, args: worksheet.write_rich_string(row, col, *args))
cells = ExcelFormatter(df, index=False).get_formatted_cells()

for cell in cells:
    worksheet.write(cell.row, cell.col ,cell.val)

workbook.close()

This produces the desired output and no need to loop over the data twice. In fact it uses the same generator that pandas would have used so it's just as efficient as pandas.to_excel() and the formatter class takes many of the same arguments.

Uneventful answered 11/9, 2020 at 19:22 Comment(1)
Note that as of pandas 1.5 the made write_cells() of _XlsxWriter private -> _write_cells() (see release notes, PR). After changing this also in your custom class RichExcelWriter it works like charm!Khaddar

© 2022 - 2024 — McMap. All rights reserved.