Use of statsmodels.imputation.mice

I am exploring the statsmodels.imputation.mice package for imputing missing values. I haven't seen any examples of its usage, though, outside of http://www.statsmodels.org. From what I gather, one creates an instance of mice.MICEData and uses it in conjunction with mice.MICE().fit(). Example from http://www.statsmodels.org/dev/generated/statsmodels.imputation.mice.MICE.html:

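>>> import statsmodels.api as sm
>>> from statsmodels.imputation import mice
>>> imp = mice.MICEData(data)  # data: a pandas DataFrame with missing values
>>> fml = 'y ~ x1 + x2 + x3 + x4'
>>> mi = mice.MICE(fml, sm.OLS, imp)  # renamed so the mice module is not shadowed
>>> results = mi.fit(10, 10)  # 10 burn-in cycles, 10 imputed data sets
>>> print(results.summary())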

The imputed values in an instance of MICEData are not fixed, though. What I mean is that, given

imp = mice.MICEData(data)

every call to

imp.update('x1')

(assuming data has a column 'x1') draws a new sample for the missing values using “predictive mean matching”. That's all good if I use MICEData with MICE.fit(). However, let's say I want to use this package to impute the missing values once, and then use a predictor from another package, say sklearn, to fit the data. I wonder what a reasonable approach would be. I can run update several times and average the imputed draws for each missing value. Alternatively, I can create several data sets with different imputed values and fit a model to each of them. However, if my data set is huge, that can get pretty expensive.
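To make the second option concrete, here is a minimal sketch of drawing several imputed data sets from one MICEData instance and fitting a separate sklearn model to each. The synthetic data, the LinearRegression stand-in predictor, and the helper draw_imputed_sets with its parameters n_sets, n_burnin and n_skip are all assumptions made for illustration, not something taken from the statsmodels docs.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.imputation import mice

# Assumption: MICEData draws from NumPy's global random state, so seed it.
np.random.seed(0)
rng = np.random.default_rng(0)

# Synthetic, purely illustrative data with missing values in x1.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - x2 + rng.normal(size=n)
data = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
data.loc[rng.choice(n, size=30, replace=False), "x1"] = np.nan
# MICEData checks that column names are plain (object-dtype) strings.
data.columns = data.columns.astype(object)


def draw_imputed_sets(imp, n_sets=5, n_burnin=10, n_skip=3):
    """Hypothetical helper: return n_sets completed copies of imp.data."""
    imp.update_all(n_burnin)          # burn-in cycles over all variables
    sets = []
    for _ in range(n_sets):
        imp.update_all(n_skip)        # a few cycles between saved draws
        sets.append(imp.data.copy())  # copy, since imp.data changes in place
    return sets


imp = mice.MICEData(data)
imputed_sets = draw_imputed_sets(imp)

# Fit one sklearn model per imputed data set.
features = ["x1", "x2"]
models = [LinearRegression().fit(d[features], d["y"]) for d in imputed_sets]

# Average the per-model predictions for new, complete rows.
X_new = pd.DataFrame({"x1": [0.0, 1.0], "x2": [0.5, -0.5]})
pred = np.mean([m.predict(X_new) for m in models], axis=0)
print(pred)
```

The cost grows linearly with n_sets, since both the imputation cycles and the model fit are repeated for every saved data set; that is exactly the expense the question is worried about for a huge data set.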
