Pandas.read_excel reads date into timestamp, I want a string
I read a large Excel file into pandas using .read_excel, and the file has date columns. When read into pandas, the dates default to a timestamp. Since the file is large, I would like to read the dates as a string.

If that is not possible, then I would at least like to export the date back to Excel in the same format as it is in the original file (e.g. "8/18/2009").

My two questions are:

  1. Can I avoid converting the Excel date into a timestamp in pandas?
  2. If not possible, how can I write back the date in the original format efficiently?
Byword answered 23/2, 2016 at 13:37 Comment(8)
"When read into pandas the date defaults to a timestamp or, at least, when I export it back to Excel." Which of the two is it? - Geriatrics
According to the comments in this question, there is no way to avoid converting Excel dates into timestamps: #34157330 - Geriatrics
You could try this: https://mcmap.net/q/162609/-faster-way-to-read-excel-files-to-pandas-dataframe - Geriatrics
The code "f.write(vbscript.encode('utf-8'))" from the third comment doesn't work in Python 3. I put it in the 2to3 converter and it didn't make changes. Any suggestions? - Byword
What is the error message? - Geriatrics
The file wasn't opened in binary mode. I changed "f = open('ExcelToCsv.vbs','w')" to "f = open('ExcelToCsv.vbs','wb')" - Byword
You could try to ask the author of the answer, by adding a comment to his answer. This is outside of my area of expertise unfortunately. - Geriatrics
The problem is that Excel doesn't store dates as strings, it stores them as numbers with a special format code. - Raptor
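
To illustrate the last comment: a minimal sketch of how a stored date number maps back to a calendar date and then to a string like "8/18/2009". It assumes the workbook uses Excel's default 1900 date system and that the serial number is a whole number later than February 1900. The function names are hypothetical, not part of pandas.

```python
from datetime import date, timedelta

# Day zero commonly used to convert serials from Excel's 1900 date system.
# Assumption: the workbook does not use the 1904 date system.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: int) -> date:
    """Convert a whole-number Excel serial date to a Python date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def format_mdy(d: date) -> str:
    """Format a date as month/day/year without zero padding."""
    return f"{d.month}/{d.day}/{d.year}"


print(format_mdy(excel_serial_to_date(40043)))  # prints 8/18/2009
```

The "8/18/2009" shown in Excel is only a display format applied to that number, which is why a reader hands back a date value rather than the text you see in the cell.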
  1. I am not sure how to read the dates with read_excel without converting them into timestamps.
  2. Because the dates are already converted to datetime when read into a dataframe, here is how they can be written back to Excel in the original format. I have used 'mm/dd/yyyy'.
import pandas as pd

df = pd.read_excel(
    "file_to_read.xlsx",
    sheet_name="sheetname",
)
# Use the writer as a context manager so the file is saved on exit.
with pd.ExcelWriter(
    "file_to_write.xlsx",
    engine="xlsxwriter",
    datetime_format="mm/dd/yyyy",
) as writer:
    df.to_excel(
        writer,
        index=False,
        header=True,
        sheet_name="sheetname",
    )
Bev answered 28/1, 2022 at 20:14 Comment(0)

This is similar to the issue here: Leave dates as strings using read_excel function from pandas in python

Check the answers:

  • Use the converters={'Date': str} option of pandas.read_excel:
    pandas.read_excel(xlsx, sheet, converters={'Date': str})
  • You can try converting the timestamp back to a formatted string (adjust the format codes to match your original format):
    df['Date'][0].strftime('%Y/%m/%d')
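
For a large file, formatting one value at a time is slow, so here is a minimal sketch of the same idea applied to a whole column at once. The DataFrame and the column name 'Date' are made-up sample data standing in for what read_excel returns, and the sketch assumes the column has no missing dates.

```python
import pandas as pd

# Hypothetical sample data standing in for a column read_excel parsed as timestamps.
df = pd.DataFrame({"Date": pd.to_datetime(["2009-08-18", "2010-01-05"])})

# Zero-padded month/day/year, e.g. "08/18/2009".
df["Date_padded"] = df["Date"].dt.strftime("%m/%d/%Y")

# Unpadded variant built from the date parts, e.g. "8/18/2009".
df["Date_unpadded"] = (
    df["Date"].dt.month.astype(str)
    + "/"
    + df["Date"].dt.day.astype(str)
    + "/"
    + df["Date"].dt.year.astype(str)
)

print(df[["Date_padded", "Date_unpadded"]])
```

The unpadded version is built from the month, day and year parts because strftime's %m and %d always zero-pad, while the question's example ("8/18/2009") does not.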
Curtsy answered 13/12, 2016 at 11:9 Comment(0)

I had the same problem. This is what solved the issue for me:

df = pd.read_excel(excel_link, sheet_name, dtype=str)

This works if you don't mind reading the whole DataFrame, or an entire column, as strings.

Nourishing answered 7/10, 2020 at 21:14 Comment(0)
