My understanding of the algorithms is as follows:
Simple Imputer
The SimpleImputer uses the non-missing values in each column to estimate the missing values. For example, if you had an age column with 10% missing values, it would find the mean age and replace every missing value in that column with it.
It supports several other imputation strategies besides the mean, such as the median and the mode ('most_frequent'), as well as a constant value you define yourself ('constant' with fill_value). These last two can also be used on categorical values.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({'A': [np.nan, 2, 3],
                   'B': [3, 5, np.nan],
                   'C': [0, np.nan, 3],
                   'D': [2, 6, 3]})
print(df)
     A    B    C  D
0  NaN  3.0  0.0  2
1  2.0  5.0  NaN  6
2  3.0  NaN  3.0  3
imp = SimpleImputer()
imp.fit_transform(df)
array([[2.5, 3. , 0. , 2. ],
[2. , 5. , 1.5, 6. ],
[3. , 4. , 3. , 3. ]])
As you can see, the imputed values are simply the mean of each column.
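The other strategies work the same way. Here is a minimal sketch of 'most_frequent' and 'constant' on a hypothetical categorical column (the column name and fill value are just illustrative):

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

colours = pd.DataFrame({'colour': ['red', np.nan, 'red', 'blue']})

# mode imputation: NaN becomes the most frequent value ('red')
print(SimpleImputer(strategy='most_frequent').fit_transform(colours))

# constant imputation: NaN becomes whatever fill_value you choose
print(SimpleImputer(strategy='constant', fill_value='unknown').fit_transform(colours))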
Iterative Imputer
The Iterative Imputer can do a number of different things depending upon how you configure it. This explanation assumes the default values.
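Note that IterativeImputer is still experimental in scikit-learn, so it has to be enabled explicitly before it can be imported. Running it with all defaults looks like this (reusing the df from above):

from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# defaults: estimator=BayesianRidge(), initial_strategy='mean', max_iter=10
imp = IterativeImputer()
imp.fit_transform(df)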
Original Data
     A    B    C  D
0  NaN  3.0  0.0  2
1  2.0  5.0  NaN  6
2  3.0  NaN  3.0  3
Firstly, it does the same thing as the SimpleImputer, i.e. it fills in the missing values based upon the initial_strategy parameter (default = 'mean').
Initial Pass
     A    B    C  D
0  2.5  3.0  0.0  2
1  2.0  5.0  1.5  6
2  3.0  4.0  3.0  3
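This initial fill is exactly what the SimpleImputer would produce, and you can change it through initial_strategy; for example:

# accepts 'mean', 'median', 'most_frequent' or 'constant'
imp = IterativeImputer(initial_strategy='median')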
Secondly, it fits the estimator passed in (default = BayesianRidge) on each column that contains missing values, treating that column as the dependent variable and the remaining columns as the independent variables. In our case we have columns A, B, C, D; D is fully observed, so it only ever acts as a predictor, never as a target. For column A it fits on the rows where A was originally present, roughly like this (filled is the DataFrame after the initial pass, and the mask is recorded before the initial fill):

from sklearn.linear_model import BayesianRidge

missing = df['A'].isna()        # recorded before the initial fill
X = filled[['B', 'C', 'D']]     # predictors use their current imputed values
y = filled['A']
model = BayesianRidge().fit(X[~missing], y[~missing])
Then it calls the predict method of this newly fitted model for the rows that are flagged as missing and overwrites the initial fill with the predictions:

filled.loc[missing, 'A'] = model.predict(X[missing])
This method is repeated for every column that has missing values (the round robin described in the docs), so one full pass fits, in turn:

X = filled[['B', 'C', 'D']]
y = filled['A']
...
X = filled[['A', 'C', 'D']]
y = filled['B']
...
X = filled[['A', 'B', 'D']]
y = filled['C']
...

D is skipped as a target because it has nothing to impute.
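Putting that together, here is a rough, simplified sketch of one full round robin (my own simplification, reusing df from above; scikit-learn's actual implementation also handles feature ordering, posterior sampling and convergence checks):

from sklearn.linear_model import BayesianRidge

def one_pass(filled, missing_mask):
    # filled: DataFrame after the initial simple impute (updated in place)
    # missing_mask: boolean DataFrame marking the originally missing cells
    for col in filled.columns:
        if not missing_mask[col].any():
            continue  # fully observed columns (D here) are never targets
        others = [c for c in filled.columns if c != col]
        obs = ~missing_mask[col]
        model = BayesianRidge().fit(filled.loc[obs, others],
                                    filled.loc[obs, col])
        filled.loc[~obs, col] = model.predict(filled.loc[~obs, others])
    return filled

mask = df.isna()                 # record the mask first
filled = df.fillna(df.mean())    # the initial 'mean' pass
print(one_pass(filled, mask))    # one round robin = pass_1 below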
This round robin of training an estimator on each column with missing values makes up one pass. The process is repeated until either the stopping tolerance (tol) is met or the imputer reaches the maximum number of passes (max_iter, default = 10).
So if we run for three passes it looks like this:
Original Data
     A    B    C  D
0  NaN  3.0  0.0  2
1  2.0  5.0  NaN  6
2  3.0  NaN  3.0  3
Initial (simple) Pass
     A    B    C  D
0  2.5  3.0  0.0  2
1  2.0  5.0  1.5  6
2  3.0  4.0  3.0  3
pass_1
[[3.55243135 3. 0. 2. ]
[2. 5. 7.66666393 6. ]
[3. 3.7130697 3. 3. ]]
pass_2
[[ 3.39559017 3. 0. 2. ]
[ 2. 5. 10.39409964 6. ]
[ 3. 3.57003864 3. 3. ]]
pass_3
[[ 3.34980014 3. 0. 2. ]
[ 2. 5. 11.5269743 6. ]
[ 3. 3.51894112 3. 3. ]]
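Output like the passes above can be reproduced by capping max_iter and setting tol very low so the imputer never stops early (a sketch; the exact numbers may vary with your scikit-learn version):

from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

for n in (1, 2, 3):
    imp = IterativeImputer(max_iter=n, tol=1e-12)
    print(f'pass_{n}')
    print(imp.fit_transform(df))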
Obviously it doesn't work well on such a small example, because there isn't enough data to fit the estimator, so with a smaller dataset it may be best to stick with the SimpleImputer.