Formula and encoding issues when saving df to Excel
Asked Answered
A

1

0

I'm developing a script which gathers some YouTube data. The script of course creates a pandas dataframe which is later exported to Excel. I'm experiencing two major issues which somehow seem to be related to each other.

So, Excel 365 allows users to insert an image to a cell using IMAGE() formula (https://support.microsoft.com/en-au/office/image-function-7e112975-5e52-4f2a-b9da-1d913d51f5d5). Script extracts YouTube thumbnail link to a video and saves it to a defaultdict(list) dictionary. Next and in parallel, the IMAGE() formula string is created. After saving the df to .xlsx by a dedicated ExcelWriter (as recommended here: https://mcmap.net/q/940002/-save-pandas-dataframes-with-formulas-to-xlsx-files) my formulas are always followed by =@ no matter which name and settings - English or local - I use. It's strange because xlsxwriter requires English names: https://xlsxwriter.readthedocs.io/working_with_formulas.html).

Code (some parts are deleted for better readability):

if export_by_xlsxwriter:
    # English formula name - recommended by xlsxwriter guide
    channel_videos_data_dict["thumbnailHyperlink_en"].append(
        fr'=IMAGE("{thumbnail_url}",,1)')
    
    # local formula name
    # note: in my local language formula arguments are splitted by ";" - not ","
    # interestingly, using ";" makes workbook corrupted
    channel_videos_data_dict["thumbnailHyperlink_locale"].append(
        fr'=OBRAZ("{thumbnail_url}",,1)')

writer: pd.ExcelWriter = pd.ExcelWriter("data.xlsx", engine = "xlsxwriter")
df.to_excel(writer)
writer.save()
writer.close()

I managed to save this df to .csv. Formulas now work fine (written in local language!) but I lose all the implicit formatting (Excel automatically converts urls to hyperlinks etc.), encoding is crashed and some videos IDs which are followed by - are mistakenly considered as formulas (ironically). Code:

df.to_csv("data.csv", encoding = "utf-8", sep = ";")

I thought I can at least deal with encoding issues:

df.to_csv("data.csv", encoding = "windows-1250", sep = ";")

...but I get this error:

# ironically again, this is "loudly crying face" emoji 😭
UnicodeEncodeError: 
'charmap' codec can't encode character '\U0001f62d' in position 305: character maps to <undefined>

Thus, my questions are:

  1. How to save the df using xlsxwriter with formulas preserved and working? (get rid of @ in short)
  2. Alternatively, how to save the df to .csv with proper encoding and videos IDs starting with - treated as text and text only?
Antiperiodic answered 25/5, 2023 at 17:30 Comment(0)
A
5

The Implicit Intersection Operator @ in a formula usually means that an array formula is returning a scalar value (see the XlsxWriter docs) although it can sometimes indicate an unknown formula.

In your case the =IMAGE() function is relatively new to Excel and as such it is classified as "future function" (see the XlsxWriter docs section on Formulas added in Excel 2010 and later).

As a result you will need to prefix it with _xlfn..

import xlsxwriter

workbook = xlsxwriter.Workbook("image.xlsx")
worksheet = workbook.add_worksheet()


# Make the cell bigger for clarity.
worksheet.set_row(0, 80)
worksheet.set_column(0, 0, 14)

# Insert an image via a formula.
worksheet.write(0, 0, '=_xlfn.IMAGE("https://support.content.office.net/en-us/media/35aecc53-b3c1-4895-8a7d-554716941806.jpg")')

workbook.close()

This will fix the @ issue but the formula may still not work. I get this the first time I try to load the file:

enter image description here

However, once I allow/trust the image/url it loads as expected:

enter image description here

You may get different results depending on your security setting and/or OS.

Archenteron answered 25/5, 2023 at 18:11 Comment(1)
Thanks, it works now! Concerning second issue (local encoding) the solution is very simple: df.to_csv("data.csv", encoding = "windows-1250", sep = ";", errors = "ignore").Antiperiodic

© 2022 - 2024 — McMap. All rights reserved.