It is common to want to append the results of predictions to the dataset used to make the predictions, but the statsmodels predict
function returns (non-indexed) results of a potentially different length than the dataset on which predictions are based.
For example, if the test dataset, test, contains any null entries, then
import statsmodels.api as sm
mod_fit = sm.Logit.from_formula('Y ~ A + B + C', train).fit()
preds = mod_fit.predict(test)
will produce an array with fewer entries than test has rows, which cannot be usefully appended with
test['preds'] = preds
And since the result of predict is not indexed, there is no way to recover the rows to which the results should be attached.
What is the idiom for associating predict results with the rows from which they were generated? Is there, perhaps, a way to get predict to return a dataframe that preserves the indices of its argument?
[...] predict even work this way? Why not return a dataframe with indices that match those of the rows from which the predictions are made? – Commentator
[...] 0.6.0.dev, and there, though you do not get a dataframe back, missing values are not dropped from predict output. – Fibered
[...] 0.6.0.dev [...] len(train) == len(preds), regardless of missing values in train? What is returned in preds where there are missing values in train? – Commentator
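One possible workaround, offered as a sketch rather than as the statsmodels idiom: drop the incomplete rows yourself before predicting, so the number of predictions is known, then write the results back by index label. The helper name attach_predictions, its parameters, and the output column name are hypothetical. The sketch assumes the dataframe index is unique and that predict returns one value per row it is given.

```python
import numpy as np
import pandas as pd


def attach_predictions(df, predict, columns, out_col="preds"):
    """Return a copy of df with predictions aligned to their source rows.

    df      : dataframe to predict on (assumed to have a unique index)
    predict : callable taking a dataframe, e.g. a fitted model's predict
    columns : predictor columns the model needs, e.g. ['A', 'B', 'C']
    out_col : name of the new column (hypothetical default)
    """
    out = df.copy()
    # Keep only rows with no missing predictors, so the number of
    # predictions should equal the number of rows passed in.
    complete = out.dropna(subset=columns)
    preds = np.asarray(predict(complete))
    if len(preds) != len(complete):
        raise ValueError("predict returned an unexpected number of rows")
    # Start with NaN everywhere, then fill in by index label so each
    # prediction lands on the row that generated it.
    out[out_col] = np.nan
    out.loc[complete.index, out_col] = preds
    return out
```

With the model from the question, this would be called as test = attach_predictions(test, mod_fit.predict, ['A', 'B', 'C']). Rows with a null predictor keep NaN in the new column instead of shifting the remaining predictions out of alignment.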