I am exploring the statsmodels.imputation.mice package for imputing missing values. I haven't seen any examples of its usage, though, outside of http://www.statsmodels.org. From what I gather, one creates an instance of mice.MICEData and uses it in conjunction with mice.MICE().fit(). Here is the example from http://www.statsmodels.org/dev/generated/statsmodels.imputation.mice.MICE.html:
>>> imp = mice.MICEData(data)
>>> fml = 'y ~ x1 + x2 + x3 + x4'
>>> mice = mice.MICE(fml, sm.OLS, imp)
>>> results = mice.fit(10, 10)
>>> print(results.summary())
The imputed values in an instance of MICEData are not fixed, though. What I mean is that if
imp = mice.MICEData(data)
then every call to
imp.update('x1')
(assuming data has a column 'x1') draws a new sample for the missing values using “predictive mean matching”. That's fine if I use MICEData with MICE.fit(). However, let's say I want to use this package to impute the missing values once, and then fit the data with a predictor from another package, say sklearn. What would be a reasonable approach? I could run update several times and average the imputed values drawn for each missing entry. Alternatively, I could create several data sets with different imputed values and fit a model to each of them. However, if my data set is huge, that can get pretty expensive.
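To make the second option concrete, here is a minimal sketch of how it might look. It is not taken from the statsmodels documentation: the helper imputed_datasets, its parameters (n_datasets, n_burnin, n_skip), the synthetic data and the choice of LinearRegression are all assumptions made for illustration. It relies only on MICEData.update_all() and the MICEData.data attribute. Setting n_datasets=1 would give the "impute once" variant, at the price of ignoring imputation uncertainty.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.imputation import mice


def imputed_datasets(data, n_datasets=5, n_burnin=10, n_skip=3):
    """Hypothetical helper: yield completed copies of `data` from one MICEData chain."""
    imp = mice.MICEData(data)
    imp.update_all(n_burnin)        # burn-in: cycle through all incomplete columns
    for _ in range(n_datasets):
        imp.update_all(n_skip + 1)  # skip a few cycles between the draws we keep
        yield imp.data.copy()       # copy, because imp.data is updated in place


# Synthetic data (an assumption for this sketch): y is fully observed,
# x1 and x2 each lose roughly 20% of their values.
np.random.seed(0)                   # MICEData draws from NumPy's global random state
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - x2 + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
df.loc[rng.random(n) < 0.2, "x1"] = np.nan
df.loc[rng.random(n) < 0.2, "x2"] = np.nan

completed_sets = list(imputed_datasets(df, n_datasets=5))

# Fit one sklearn model per completed data set and average the predictions.
X_new = pd.DataFrame({"x1": [0.0, 1.0], "x2": [0.0, 1.0]})
preds = []
for completed in completed_sets:
    model = LinearRegression().fit(completed[["x1", "x2"]], completed["y"])
    preds.append(model.predict(X_new))
mean_pred = np.mean(preds, axis=0)
```

In this sketch the imputation chain is run only once and the cost grows with the number of completed data sets that are kept, so a small n_datasets may be a workable compromise on large data. Averaging the draws for each missing cell into a single data set is cheaper still, but it pulls the imputed values towards their conditional mean and so tends to understate their variability.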