Debugging multiple imputation in mice

Harish Narayanan
Hello all,

I am trying to impute some missing data using the mice package. The data
set I am working with contains 125 variables (190 observations),
involving both categorical and continuous data. Some of these variables
are missing up to 30% of their data.

I am running into a peculiar problem which is illustrated by the
following example showing both the original data (blue) and the imputed
values (red).

As the plot shows, in each of the ten imputations mice seems to favour
just two or three distinct values. I would have expected the imputed
values to be more spread out. I see this behaviour in every imputed
variable I have looked at (about 80 variables are imputed in total).

I have tried both constructing a predictor matrix myself (to specify
the predictors) and leaving it out, so that mice picks sensible
defaults. I have also tried increasing the number of iterations per
imputation, hoping that would help the algorithm (pmm) converge to a
different solution, but that did not change the imputations either.
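
A minimal sketch of the two setups described above, assuming the small
nhanes example data set that ships with mice in place of the real data
(which is not shown here). The object names, the seed, maxit = 20 and
mincor = 0.1 are illustrative choices, not the values actually used:

```r
library(mice)

# nhanes (25 rows, 4 columns) is bundled with mice and stands in for the
# real 190 x 125 data set; it is an assumption made for illustration.

# Setup 1: let mice choose its default predictor matrix.
imp_default <- mice(nhanes, m = 10, maxit = 20, method = "pmm",
                    seed = 1, printFlag = FALSE)

# Setup 2: build a predictor matrix with quickpred(), keeping predictors
# whose absolute correlation with the target is at least mincor.
pred <- quickpred(nhanes, mincor = 0.1)
imp_pred <- mice(nhanes, m = 10, maxit = 20, method = "pmm",
                 predictorMatrix = pred, seed = 1, printFlag = FALSE)

# imp$imp$<variable> holds one column of imputed values per imputation,
# so this counts the distinct imputed values in each of the m = 10 runs.
n_distinct <- sapply(imp_default$imp$bmi, function(x) length(unique(x)))
print(n_distinct)

# Predictors or variables that mice dropped along the way are recorded
# here (NULL if nothing was logged).
print(imp_default$loggedEvents)

# Observed and imputed values of one variable, split by imputation number.
print(stripplot(imp_default, bmi ~ .imp, pch = 20))
```

Counting distinct imputed values per imputation puts a number on the
clustering seen in the plot, and loggedEvents is one place to check
whether predictors were removed before the imputation model was fitted.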

Could you please point me to where I should look to debug this
behaviour? I have been going through the recent mice manual[1], but is
there something in particular I should be looking at? A bigger question,
I guess, is whether I should also be experimenting with other packages
such as Amelia and mi.


