How to combine the output of amelia

I am handling missing data using imputation. I am exploring the Amelia and rms packages for imputation. I have some queries regarding these packages.

  1. I want to combine the imputed datasets from Amelia. I did see that Amelia has a function, mi.meld, which combines the results from multiple imputation. But I want to combine the datasets first and then train different models. I am not sure whether combining the datasets and then using that data to train the model is the right way. I want to do so because my testing data also has missing values. I want to handle them using imputation so that I can use the test data for prediction.

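    # NOTE: rpart trees have no regression coefficients or standard errors,
    # so model$coef is NULL here; mi.meld() is meant for models such as lm()
    # that report both
    b.out <- NULL
    se.out <- NULL
    for (i in seq_len(impute$m)) {
      model <- rpart(Y ~ X1 + X2 + X3 + X4 + X5,
                     data = impute$imputations[[i]], method = "anova",
                     control = rpart.control(cp = 0.001))
      b.out <- rbind(b.out, model$coef)
      se.out <- rbind(se.out, coef(summary(model))[, 2])
    }
    combined.results <- mi.meld(q = b.out, se = se.out)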
    
  2. I am also using the rms package for this purpose. Does the aregImpute function combine the imputed datasets?

    impute <- aregImpute(Y ~ X1 + X2 + X3 + X4 + X5, data = train_data, n.impute = 5, nk = 0)
    

Does anyone have suggestions on how I can combine multiple imputed datasets into one dataset?

Norsworthy answered 27/9, 2014 at 17:58

You may combine all imputed data sets in the Amelia output by using the command below:

    # save the Amelia output:
    a.out <- amelia(data, ...)

    # stack all imputed datasets, adding a new column called ImputationNumber to keep track of them:
    df_imputed_all <- do.call(rbind, Map(cbind, a.out$imputations, ImputationNumber = seq_along(a.out$imputations)))
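As a rough sketch of how the stacked data could then be used for training, the example below applies the same stacking idiom to a toy list of data frames. The list `imputations` is a hypothetical stand-in for `a.out$imputations` (filled with random draws, not real Amelia output), and the variable names `X1`, `X2`, `Y` are made up for illustration.

```r
library(rpart)  # recommended package bundled with standard R installs

set.seed(42)
n <- 100
base <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
base$Y <- 2 * base$X1 - base$X2 + rnorm(n, sd = 0.5)
miss <- sample(n, 20)  # rows whose X2 is treated as missing

# Toy stand-in for a.out$imputations (assumption): each copy fills the
# "missing" X2 values with different random draws
imputations <- lapply(1:3, function(i) {
  d <- base
  d$X2[miss] <- rnorm(length(miss), mean(base$X2[-miss]), sd(base$X2[-miss]))
  d
})

# Same stacking idiom as above
stacked <- do.call(rbind, Map(cbind, imputations,
                              ImputationNumber = seq_along(imputations)))

# Name the predictors explicitly so ImputationNumber is not used as a feature
fit <- rpart(Y ~ X1 + X2, data = stacked, method = "anova",
             control = rpart.control(cp = 0.001))

table(stacked$ImputationNumber)  # each imputation contributes n rows
```

Keep in mind that the stacked data has m times as many rows as the original, so any standard errors or significance tests computed on it will look more precise than they really are. For pure prediction this matters less.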

Alternatively, you can use the write.amelia function in the Amelia package to save the multiply imputed datasets to a single file (or to one file per imputation) and examine them.

The code below saves the stacked imputed datasets in .dta format (the Stata data format). Change the format option to "csv" or "table" if you prefer one of those formats.

    # save all imputed datasets, stacked, in a single dta file:
    write.amelia(obj = a.out, separate = FALSE, file.stem = "ameliaimputations", format = "dta")

Using this stacked imputed dataset to train a predictive model makes sense to me. Just make sure you don't introduce data leakage when imputing the test data. In other words, the test data must not influence the imputation model or any other parameters fitted on the training data.
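For comparison, the mi.meld route mentioned in the question pools estimates rather than rows. Here is a minimal sketch, assuming the Amelia package is installed and using its bundled `africa` dataset. `lm()` is swapped in for `rpart()` because mi.meld needs one row of coefficients and one row of standard errors per imputation, and the formula `civlib ~ trade` is only an arbitrary illustration.

```r
library(Amelia)

data(africa)  # example panel dataset shipped with Amelia
set.seed(123)

# Impute; p2s = 0 silences the progress output
a.out <- amelia(x = africa, m = 5, cs = "country", ts = "year",
                logs = "gdp_pc", p2s = 0)

# Collect one row of coefficients and one row of standard errors per imputation
b.out <- NULL
se.out <- NULL
for (i in seq_len(a.out$m)) {
  ols.out <- lm(civlib ~ trade, data = a.out$imputations[[i]])
  b.out <- rbind(b.out, coef(ols.out))
  se.out <- rbind(se.out, coef(summary(ols.out))[, 2])
}

# Pool with Rubin's rules: q.mi holds the combined estimates,
# se.mi the combined standard errors
combined.results <- mi.meld(q = b.out, se = se.out)
combined.results
```

This gives valid pooled inference for coefficient-based models, whereas the stacked dataset is the more practical option for tree models like rpart that have no coefficients to pool.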

Hypostasize answered 26/10, 2023 at 18:5
