Use scikit-learn to do linear regression on a time series pandas data frame

I'm trying to do a simple linear regression on a pandas data frame using scikit-learn's LinearRegression. My data is a time series, and the data frame has a datetime index:

                value
2007-01-01    0.771305
2007-02-01    0.256628
2008-01-01    0.670920
2008-02-01    0.098047

Doing something as simple as

from sklearn import linear_model

lr = linear_model.LinearRegression()

lr.fit(data.index, data['value'])

didn't work:

float() argument must be a string or a number

So I tried to create a new column with the dates to try to transform it:

data['date'] = data.index
data['date'] = pd.to_datetime(data['date'])
lr.fit(data['date'], data['value'])

but now I get:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

So the regressor can't handle datetime. I saw a bunch of ways to convert integer data to datetime, but couldn't find a way to convert from datetime to integer, for example.

What is the proper way to do this?

PS: I'm interested in using scikit because I'm planning on doing more stuff with it later, so no statsmodels for now.

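Convert the index to the number of days elapsed since the first date. That gives a numeric 2-D feature array that scikit-learn accepts:

In [36]: X = (df.index - df.index[0]).days.values.reshape(-1, 1)
In [37]: y = df['value'].values
In [38]: linear_model.LinearRegression().fit(X, y)
Out[38]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)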
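For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the frame from the four rows shown in the question, and it assumes a reasonably recent pandas and scikit-learn. The names `df`, `model`, `new_dates` and the prediction date `2008-03-01` are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample frame shaped like the one in the question (values copied from it).
df = pd.DataFrame(
    {"value": [0.771305, 0.256628, 0.670920, 0.098047]},
    index=pd.to_datetime(
        ["2007-01-01", "2007-02-01", "2008-01-01", "2008-02-01"]
    ),
)

# Days elapsed since the first observation, as a 2-D float column
# of shape (n_samples, 1), which is what scikit-learn expects for X.
origin = df.index[0]
X = np.asarray((df.index - origin).days, dtype=float).reshape(-1, 1)
y = df["value"].to_numpy()

model = LinearRegression().fit(X, y)
slope_per_day = model.coef_[0]   # change in value per elapsed day
intercept = model.intercept_     # fitted value at the origin date

# To predict for a new date, convert it with the same origin.
# The date below is an arbitrary illustration.
new_dates = pd.to_datetime(["2008-03-01"])
X_new = np.asarray((new_dates - origin).days, dtype=float).reshape(-1, 1)
predicted = model.predict(X_new)
```

Because the feature is "days since the first date", the same `origin` has to be reused whenever new dates are passed to `predict`; otherwise the fitted slope and intercept no longer line up with the inputs.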
