Time series fitted values from a trend in Python

I have daily stock price data from Yahoo Finance in a DataFrame called price_data.

I would like to add a column containing the fitted values from a linear time trend of the Adj Close column.

Here is the structure of the data I am using:

In [41]: type(price_data)
Out[41]: pandas.core.frame.DataFrame

In [42]: list(price_data.columns.values)
Out[42]: ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

In [45]: type(price_data.index)
Out[45]: pandas.tseries.index.DatetimeIndex

What is the neatest way of achieving this in Python?

As an aside, the following achieved this in R:

all_time_fitted <- function(data)
{
    all_time_model  <- lm(Adj.Close ~ Date, data=data)
    fitted_value    <- predict(all_time_model)

    return(fitted_value)
}

Here is some sample data:

In [3]: price_data
Out[3]: 
             Open   High    Low  Close     Volume  Adj Close  
Date                                                                     
2005-09-27  21.05  21.40  19.10  19.30     961200   19.16418
2005-09-28  19.30  20.53  19.20  20.50    5747900   20.35573
2005-09-29  20.40  20.58  20.10  20.21    1078200   20.06777
2005-09-30  20.26  21.05  20.18  21.01    3123300   20.86214
2005-10-03  20.90  21.75  20.90  21.50    1057900   21.34869
2005-10-04  21.44  22.50  21.44  22.16    1768800   22.00405
2005-10-05  22.10  22.31  21.75  22.20     904300   22.04377
Ihram answered 30/4, 2015 at 6:51 Comment(4)
Could you please add a sample of the input data? (Varitype)
Added sample data to the question. (Ihram)
This might be a silly question, but what do you mean by a "fitted value"? (Varitype)
The expected value of y (Adj Close) calculated by the model for a given value of x (Date). Here is a reference: businessdictionary.com/definition/fitted-value.html (Ihram)

Quick and dirty ...

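# get some data
# (pandas.io.data has been removed from pandas; the separate pandas-datareader
# package offers the same DataReader interface, though its Yahoo source may not always work)
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2015, 4, 30)
df = web.DataReader("F", 'yahoo', start, end)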

# a bit of munging - better column name - Day as integer 
df = df.rename(columns={'Adj Close':'AdjClose'})
dayZero = df.index[0]
df['Day'] = (df.index - dayZero).days

# fit a linear regression of price on elapsed days
import statsmodels.formula.api as smf
fit = smf.ols(formula="AdjClose ~ Day", data=df).fit()
print(fit.summary())
# fitted values at the observed days (equivalent to fit.fittedvalues)
df['fitted'] = fit.predict(df)

# plot
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,4))
ax.scatter(df.index, df.AdjClose)
ax.plot(df.index, df.fitted, 'r')
ax.set_ylabel('$')
fig.suptitle('Ford (F) adjusted close with linear trend')

plt.show()


Manzoni answered 30/4, 2015 at 9:22 Comment(1)
Thanks Mark, this has done the job. I've up-voted for now and will mark it as accepted if there is no neater solution by tomorrow. By a neater solution I mean something that uses built-in capabilities for time-series trends, as opposed to requiring the day as an integer. (Ihram)
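
For anyone who wants something closer in shape to the R function in the question, here is a minimal sketch that hides the date-to-number conversion inside a helper. It is not a built-in time-series trend facility: it still converts the DatetimeIndex to elapsed days internally and fits a straight line with numpy.polyfit. The helper name all_time_fitted is borrowed from the R code, the column argument is my own addition, and the data are just the sample rows from the question.

```python
import numpy as np
import pandas as pd


def all_time_fitted(data, column="Adj Close"):
    """Fitted values of a straight-line trend of `column` against the DatetimeIndex."""
    # Elapsed calendar days since the first observation, as floats
    days = np.asarray((data.index - data.index[0]).days, dtype=float)
    # Degree-1 least-squares fit; polyfit returns the highest-order coefficient first
    slope, intercept = np.polyfit(days, data[column].to_numpy(dtype=float), 1)
    return pd.Series(intercept + slope * days, index=data.index, name="fitted")


# Sample rows copied from the question (Adj Close only)
price_data = pd.DataFrame(
    {"Adj Close": [19.16418, 20.35573, 20.06777, 20.86214,
                   21.34869, 22.00405, 22.04377]},
    index=pd.to_datetime(["2005-09-27", "2005-09-28", "2005-09-29", "2005-09-30",
                          "2005-10-03", "2005-10-04", "2005-10-05"]),
)
price_data.index.name = "Date"

price_data["fitted"] = all_time_fitted(price_data)
print(price_data)
```

Because the fit uses calendar days rather than row positions, weekends and holidays are treated as real gaps in time. If you would rather treat every trading day as one step, replace days with np.arange(len(data)).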
