python linear regression predict by date
M

6

34

I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format.

This is the dataframe I have:

data_df = 
date          value
2016-01-15    1555
2016-01-16    1678
2016-01-17    1789
...  

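import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

y = np.asarray(data_df['value'])
X = data_df[['date']]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=.7, random_state=42)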

model = LinearRegression() #create linear regression object
model.fit(X_train, y_train) #train model on train data
model.score(X_train, y_train) #check score

print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
coefs = zip(model.coef_, X.columns)
model.__dict__
print("sl = %.1f + " % model.intercept_ +
      " + ".join("%.1f %s" % coef for coef in coefs))  # linear model

I tried to convert the date, without success:

data_df['conv_date'] = data_df.date.apply(lambda x: x.toordinal())

data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%M-%D")
Minimus answered 24/10, 2016 at 11:35 Comment(1)
You might want to look into ARMA or ARIMA models for time series data. - Vacuum
P
43

Linear regression doesn't work on date data, so we need to convert the dates into numerical values. The following code converts each date into its ordinal day number:

import datetime as dt
import pandas as pd

# Parse the strings into datetimes first, then map each one to its ordinal.
data_df['Date'] = pd.to_datetime(data_df['Date'])
data_df['Date'] = data_df['Date'].map(dt.datetime.toordinal)
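A minimal sketch of how the converted column might then be used to predict a value at a future date. The rows are the three shown in the question; the lowercase column names, the helper column `date_ordinal`, and the target date 2016-01-20 are assumptions made for illustration.

```python
import datetime as dt

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows from the question (the real frame has more rows).
data_df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})

# Parse the strings first, then map each timestamp to its ordinal day number.
data_df["date"] = pd.to_datetime(data_df["date"])
data_df["date_ordinal"] = data_df["date"].map(dt.datetime.toordinal)

X = data_df[["date_ordinal"]]  # 2-D feature matrix
y = data_df["value"]

model = LinearRegression()
model.fit(X, y)

# Predict for a future date by converting it the same way.
future = pd.DataFrame({"date_ordinal": [dt.date(2016, 1, 20).toordinal()]})
predicted = model.predict(future)[0]
print(round(predicted, 1))
```

The future date has to go through exactly the same conversion as the training dates, otherwise the model is asked about a number on a different scale.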
Parthenogenesis answered 24/10, 2016 at 12:5 Comment(4)
This unfortunately doesn't work. I get this error message: TypeError: descriptor 'toordinal' requires a 'datetime.date' object but received a 'str' - Minimus
Could I do this? data_df['date'] = pd.to_datetime(data_df['date'], format='%Y-%m-%d') - Minimus
Hi jeangelj, please add these lines: import datetime as dt; data_df['Date'] = pd.to_datetime(data_df['Date']); data_df['Date'] = data_df['Date'].map(dt.datetime.toordinal) - Parthenogenesis
Please share a code snippet to convert it back to the original value. Once I have converted the date to a number and predicted a numerical date value, I want to convert it back to the original format. - Sanctified
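On the last comment: one way to go from an ordinal back to a calendar date is `datetime.date.fromordinal`, the inverse of `toordinal`. A small sketch with illustrative sample dates, assuming any fractional predicted value is rounded to a whole day first:

```python
import datetime as dt

import pandas as pd

dates = pd.Series(pd.to_datetime(["2016-01-15", "2016-01-16", "2016-01-17"]))
ordinals = dates.map(dt.datetime.toordinal)

# fromordinal needs an int, so round a fractional prediction first.
restored = ordinals.map(lambda n: dt.date.fromordinal(int(round(n))))
print(restored.tolist())
```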
B
6

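Convert as follows:

1) Make the date column the DataFrame index (it must already be a datetime type, e.g. via pd.to_datetime):

df = df.set_index('date', append=False)

2) Convert the datetime index to float64 Julian dates, stored in a new column so the rest of the frame is kept:

df['julian_date'] = df.index.to_julian_date()

Then run the regression with the Julian date as the independent variable.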
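A hedged sketch of these steps on the three rows from the question. The column name `julian` is made up for this example, and NumPy's `polyfit` stands in for the regression step; it is not what the answer prescribes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-15", "2016-01-16", "2016-01-17"]),
    "value": [1555, 1678, 1789],
})
df = df.set_index("date")

# to_julian_date() returns float64 day counts; keep them in a new column.
df["julian"] = df.index.to_julian_date()

# Shift the dates to start at zero so the fit stays well conditioned.
x = df["julian"] - df["julian"].iloc[0]
slope, intercept = np.polyfit(x, df["value"], deg=1)
print(slope, intercept)
```

Julian dates are fractional days, so unlike `toordinal` they also carry the time of day.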

Bengt answered 25/10, 2016 at 20:31 Comment(0)
S
2

Linear regression works on numerical data, and a datetime column is not suitable as it stands. Split it into three separate numeric columns (year, month and day), then drop the original column.
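A minimal sketch of that split using the pandas `.dt` accessor, with the sample rows from the question; the new column names `year`, `month` and `day` are chosen for this example.

```python
import pandas as pd

data_df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})

parsed = pd.to_datetime(data_df["date"])
data_df["year"] = parsed.dt.year
data_df["month"] = parsed.dt.month
data_df["day"] = parsed.dt.day

# Drop the original date column; the three numeric columns are the features.
X = data_df.drop(columns=["date", "value"])
print(X)
```

Note that a linear model treats these as three independent features, so it has no notion that day 31 of one month is followed by day 1 of the next.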

Shipman answered 24/10, 2016 at 11:49 Comment(0)
C
1

When using

dt.datetime.toordinal

be careful: it only converts the date part and ignores hours, minutes, seconds and so on. To get a number from full datetime objects you can use something like:

import time
df['Datetime column'].apply(lambda x: time.mktime(x.timetuple()))
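To make the difference concrete, here is a small sketch with two made-up timestamps one hour apart on the same day. `toordinal` maps both to the same number, while `time.mktime` keeps them 3600 seconds apart.

```python
import time

import pandas as pd

stamps = pd.Series(pd.to_datetime(["2021-01-15 10:00:00", "2021-01-15 11:00:00"]))

# toordinal ignores the time of day, so both rows collapse to the same number.
ordinals = stamps.apply(lambda x: x.toordinal())

# mktime keeps the time of day; it interprets the tuple in the local timezone.
seconds = stamps.apply(lambda x: time.mktime(x.timetuple()))

print(ordinals.tolist())
print(seconds.diff().tolist())
```

Because `mktime` uses the local timezone, the absolute values differ between machines; only the differences between rows are portable.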
Cupped answered 18/11, 2017 at 3:33 Comment(0)
P
0

It is really important to differentiate the data types that you want to use for regression/classification.

Time series modelling is a separate case, but if you want to use time data as a numerical input, you should convert it from datetime to float. If data_df['conv_date'] is not yet a datetime object, convert it first with: data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%m-%d")

I agree with Thomas Vetterli's answer. It is useful to be careful what kind of time data you are using.

If you are only using the date itself (year, month and day), then dt.datetime.toordinal is enough:

>>> import datetime
>>> data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%m-%d")
>>> data_df['conv_date'] = data_df['conv_date'].map(datetime.datetime.toordinal)
737577

But if you also want to use the hour, minute and second information, then time.mktime() is a better fit:

>>> import time
>>> data_df['conv_date'] = pd.to_datetime(data_df.date)
>>> data_df['conv_date'] = data_df['conv_date'].apply(lambda var: time.mktime(var.timetuple()))
1591016041.0

1591016044.0 is another example value from my data; the result changes with the seconds.

Pattern answered 26/8, 2020 at 8:59 Comment(0)
T
0

I'm diving into the different options given here and I just wanted to summarize them. It takes time to write a full answer but this is what I've researched.

Examples reference

I took the same date with different data types following the requirements of each method. Maybe I'm missing other options.

import datetime as dtt
import time
import pandas as pd

t = pd.Timestamp('2021-09-03 00:00:00')
   # Timestamp('2021-09-03 00:00:00')   pandas._libs.tslibs.timestamps.Timestamp
t2 = dtt.date(2021, 9, 3)
   #  datetime.date(2021, 9, 3)     datetime.date

Pandas methods

pandas.to_numeric(arg, errors='raise', downcast=None)
# arg: scalar, list, tuple, 1-d array, or Series
Example
st3 = pd.to_numeric(df_example.index, downcast='integer')
st3[0]  
1630627200000000000
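The value above is a count of nanoseconds since 1970-01-01, which makes for very large regression inputs. A hedged sketch of one way to rescale such an index to days since the epoch; the two sample dates are made up, and dividing by a `Timedelta` is my own choice here, not part of the `to_numeric` API.

```python
import pandas as pd

idx = pd.to_datetime(["2021-09-03", "2021-09-04"])

# Days since the Unix epoch as floats, independent of the index's time unit.
days = (idx - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1)
print(days.tolist())
```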

Python/Pandas with the same outcome

  • pandas.Timestamp.toordinal
  • Python-date.toordinal
    Note: The proleptic Gregorian ordinal counts days starting from 1 January of year 1, which has ordinal 1. It is called proleptic because it applies the Gregorian calendar, in use only since October 1582, to earlier dates as well.
# Sanity check: ordinal = 365 * (year - 1) + leap days before that year + day of the year
hoy = dtt.date.today()   # datetime.date(2022, 8, 3)
hoy.toordinal()  # 738370 = 365 * 2021 + 490 + 215
hoy.timetuple()  # tm_yday=215
Example
t2.toordinal()
    738036
pd.Timestamp.toordinal(t)
    738036  

Python methods

Example
time.mktime(t2.timetuple())     # interpreted in the local timezone, so the result varies by machine
1630638000.0
Tews answered 3/8, 2022 at 17:1 Comment(0)
