Is it possible to extract data that I've written to a xlsxwriter.worksheet?
Yes. Even though XlsxWriter is write only, it stores the table values in an internal structure and only writes them to file when workbook.close()
is executed.
Every Worksheet has a table
attribute. It is a dictionary, containing entries for all populated rows (row numbers starting at 0 are the keys). These entries are again dictionaries, containing entries for all populated cells within the row (column numbers starting at 0 are the keys).
Therefore, table[row][col]
will give you the entry at the desired position (but only in case there is an entry, it will fail otherwise).
Note that these entries are still not the text, number or formula you are looking for, but named tuples, which also contain the cell format. You can type check the entries and extract the contents depending on their nature. Here are the possible outcomes of type(entry)
and the fields of the named tuples that are accessible:
xlsxwriter.worksheet.cell_string_tuple
: string, format
xlsxwriter.worksheet.cell_number_tuple
: number, format
xlsxwriter.worksheet.cell_blank_tuple
: format
xlsxwriter.worksheet.cell_boolean_tuple
: boolean, format
xlsxwriter.worksheet.cell_formula_tuple
: formula, format, value
xlsxwriter.worksheet.cell_arformula_tuple
: formula, format, value, range
For numbers, booleans, and formulae, the contents can be accessed by reading the respective field of the named tuple.
For array formulae, the contents are only present in the upper left cell of the output range, while the rest of the cells are represented by number entries with 0 value.
For strings, the situation is more complicated, since Excel's storage concept has a shared string table, while the individual cell entries only point to an index of this table. The shared string table can be accessed as the str_table.string_table
attribute of the worksheet. It is a dictionary, where the keys are strings and the values are the associated indices. In order to access the strings by index, you can generate a sorted list from the dictionary as follows:
shared_strings = sorted(worksheet.str_table.string_table, key=worksheet.str_table.string_table.get)
I expanded your example from above to include all the explained features. It now looks like this:
import xlsxwriter
output = "test.xlsx"
workbook = xlsxwriter.Workbook(output)
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, 'top left')
worksheet.write(0, 1, 42)
worksheet.write(0, 2, None)
worksheet.write(2, 1, True)
worksheet.write(2, 2, '=SUM(X5:Y7)')
worksheet.write_array_formula(2,3,3,4, '{=TREND(X5:X7,Y5:Y7)}')
worksheet.write(4,0, 'more text')
worksheet.write(4,1, 'even more text')
worksheet.write(4,2, 'more text')
worksheet.write(4,3, 'more text')
for row in range(5):
row_dict = worksheet.table.get(row, None)
for col in range(5):
if row_dict != None:
col_entry = row_dict.get(col, None)
else:
col_entry = None
print(row,col,col_entry)
shared_strings = sorted(worksheet.str_table.string_table, key=worksheet.str_table.string_table.get)
print()
if type(worksheet.table[0][0]) == xlsxwriter.worksheet.cell_string_tuple:
print(shared_strings[worksheet.table[0][0].string])
# type checking omitted for the rest...
print(worksheet.table[0][1].number)
print(bool(worksheet.table[2][1].boolean))
print('='+worksheet.table[2][2].formula)
print('{='+worksheet.table[2][3].formula+'}')
workbook.close()