I'm developing a script which gathers some YouTube data. The script of course creates a pandas
dataframe which is later exported to Excel. I'm experiencing two major issues which somehow seem to be related to each other.
So, Excel 365 allows users to insert an image to a cell using IMAGE()
formula (https://support.microsoft.com/en-au/office/image-function-7e112975-5e52-4f2a-b9da-1d913d51f5d5). Script extracts YouTube thumbnail link to a video and saves it to a defaultdict(list)
dictionary. Next and in parallel, the IMAGE()
formula string is created. After saving the df
to .xlsx
by a dedicated ExcelWriter
(as recommended here: https://mcmap.net/q/940002/-save-pandas-dataframes-with-formulas-to-xlsx-files) my formulas are always followed by =@
no matter which name and settings - English or local - I use. It's strange because xlsxwriter
requires English names: https://xlsxwriter.readthedocs.io/working_with_formulas.html).
Code (some parts are deleted for better readability):
if export_by_xlsxwriter:
# English formula name - recommended by xlsxwriter guide
channel_videos_data_dict["thumbnailHyperlink_en"].append(
fr'=IMAGE("{thumbnail_url}",,1)')
# local formula name
# note: in my local language formula arguments are splitted by ";" - not ","
# interestingly, using ";" makes workbook corrupted
channel_videos_data_dict["thumbnailHyperlink_locale"].append(
fr'=OBRAZ("{thumbnail_url}",,1)')
writer: pd.ExcelWriter = pd.ExcelWriter("data.xlsx", engine = "xlsxwriter")
df.to_excel(writer)
writer.save()
writer.close()
I managed to save this df
to .csv
. Formulas now work fine (written in local language!) but I lose all the implicit formatting (Excel automatically converts urls to hyperlinks etc.), encoding is crashed and some videos IDs which are followed by -
are mistakenly considered as formulas (ironically). Code:
df.to_csv("data.csv", encoding = "utf-8", sep = ";")
I thought I can at least deal with encoding issues:
df.to_csv("data.csv", encoding = "windows-1250", sep = ";")
...but I get this error:
# ironically again, this is "loudly crying face" emoji 😭
UnicodeEncodeError:
'charmap' codec can't encode character '\U0001f62d' in position 305: character maps to <undefined>
Thus, my questions are:
- How to save the
df
usingxlsxwriter
with formulas preserved and working? (get rid of@
in short) - Alternatively, how to save the
df
to.csv
with proper encoding and videos IDs starting with-
treated as text and text only?
df.to_csv("data.csv", encoding = "windows-1250", sep = ";", errors = "ignore")
. – Antiperiodic