Python statsmodels OLS: how to save learned model to file

I am trying to learn an ordinary least squares model using Python's statsmodels library, as described here.

sm.OLS.fit() returns the learned model. Is there a way to save it to a file and reload it later? My training data is huge and fitting the model takes around half a minute, so I was wondering whether OLS models have any save/load capability.

I tried the repr() method on the model object but it does not return any useful information.

Buhrstone answered 7/5, 2013 at 8:53 Comment(0)

The models and results instances all have a save and load method, so you don't need to use the pickle module directly.

Edit to add an example:

import statsmodels.api as sm

data = sm.datasets.longley.load_pandas()

data.exog['constant'] = 1

results = sm.OLS(data.endog, data.exog).fit()
results.save("longley_results.pickle")

# we should probably add a generic load to the main namespace
from statsmodels.regression.linear_model import OLSResults
new_results = OLSResults.load("longley_results.pickle")

# or more generally
from statsmodels.iolib.smpickle import load_pickle
new_results = load_pickle("longley_results.pickle")

Edit 2: We've now added a load method to the main statsmodels API in master, so you can just do

new_results = sm.load('longley_results.pickle')
Halyard answered 13/5, 2013 at 1:57 Comment(7)
Additionally, if you use the pickled results and model only for prediction, then it is possible to strip the training data (but many methods won't work anymore): statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.RegressionResults.save.html - Cogon
@Halyard could you give an example? - Buhrstone
Sure. Edited to add an example. - Halyard
jseabold: I tried the sm.load method but the interpreter complains that the module does not have a 'load' attribute. Is there a new version of statsmodels that I should be using? - Buhrstone
It is in master on github and will be in the next release. You need to install from source if you want to use it now. - Halyard
Any alternative strategy to save in a json file, for example? - Schoonover
You can use the json module (or pandasjson) just as you would the pickle module to dump results objects to json. We have plans to make something built-in for the next release. - Halyard
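
On the JSON question: the standard json module cannot serialize an arbitrary Python object such as a results instance as-is, so one workable option is to export only the fitted coefficients. The sketch below is not a statsmodels feature. The coefficient values, column names, file name, and the predict helper are all hypothetical, standing in for numbers you would copy out of results.params as plain Python floats.

```python
import json
import os
import tempfile

# Hypothetical coefficients, standing in for values taken from
# results.params (converted to plain Python floats).
coefficients = {"const": 1.5, "x1": 2.0, "x2": -0.5}

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "ols_params.json")
with open(path, "w") as f:
    json.dump(coefficients, f, indent=2)

with open(path) as f:
    restored = json.load(f)


def predict(row, params):
    """Linear prediction: sum of coefficient * feature value."""
    return sum(params[name] * value for name, value in row.items())


# One new observation, including the constant term.
new_row = {"const": 1.0, "x1": 3.0, "x2": 4.0}
prediction = predict(new_row, restored)
print(prediction)
```

This keeps only what is needed for point predictions. Standard errors, the covariance matrix, and everything else on the results object are lost unless you export them too.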

I've installed the statsmodels library and found that you can save the results using Python's pickle module.

Models and results are pickleable via save/load, optionally saving the model data. [source]

As an example:

Given that you have the results saved in the variable results:

To save the file:

import pickle
# pickle writes bytes, so open the file in binary mode
with open('learned_model.pkl', 'wb') as f:
    pickle.dump(results, f)

To read the file:

import pickle
# read back in binary mode as well
with open('learned_model.pkl', 'rb') as f:
    model_results = pickle.load(f)
Lozar answered 7/5, 2013 at 11:48 Comment(0)
