I have several CSV files with the format:
Year,Day,Hour,Min,Sec.,P1'S1
2003, 1, 0, 0,12.22, 0.541
2003, 1, 1, 0,20.69, 0.708
2003, 1, 2, 0, 4.95, 0.520
2003, 1, 3, 0,13.42, 0.539
...
(where day, is the day of the year) and I'm trying to read them using the pandas library (seems a fantastic lib so far).
There is a built-in function to read CSV in pandas, and even better, that function supposedly checks the columns for a date type. and automatically uses that as an index (which would be exactly perfect for what I'm doing).
The thing is, I cannot get it to work with date data in this format.
I tried:
data = pd.read_csv("csvFile.csv", index_col=[0, 1], , index_col=[0, 1, 2, 3, 4] parse_dates=True)
but it only gets the year correctly:
In [36]: data.index
Out[36]:
MultiIndex
[(<Timestamp: 2003-09-04 00:00:00>, 1, 0, 0, 12.22)
(<Timestamp: 2003-09-04 00:00:00>, 1, 1, 0, 20.69)
(<Timestamp: 2003-09-04 00:00:00>, 1, 2, 0, 4.95) ...,
(<Timestamp: 2003-09-04 00:00:00>, 365, 21, 0, 3.77)
(<Timestamp: 2003-09-04 00:00:00>, 365, 22, 0, 14.6)
(<Timestamp: 2003-09-04 00:00:00>, 365, 23, 0, 13.36)]
From the documentation, I see that you can specify the "date_parser" attribute in the read_csv function of pandas. But the documentation doesn't show how and I'm not being able to figure it out. Anyone with experience in the subject that can give a hand.
Cheers, Bruno