I've read a .xlsb file and parsed date columns using a code below:
dateparser = lambda x: pd.to_datetime(x)
data = pd.read_excel(r"test.xlsb", engine="pyxlsb",
parse_dates=["start_date","end_date"],
date_parser=dateparser
)
My input columns in the .xlsb file have format DD/MM/YYYY (e.g. 26/01/2008). As an output of the above-mentioned code I get, for example: 1970-01-01 00:00:00.000038840. Only the last 5 digits changes.
If I read the same file without parsing dates, the same columns are of float64 type and containing only the last 5 digits of output before (e.g. 38840.0).
I assume this is a problem associated with date encoding itself. Does anyone know how to fix this issue?
pyxlsb
operates this way. I often face use cases where I wouldn't know in advance that a column in a sheet is supposed to be a date, andpyxlsb
reading it as a float means having to depend on the column name actually including "date", or some other signifier. Other Excel engines don't have this issue. – Grekin