I am trying to impute missing data using the mice package in R. The data
set I am working with contains 125 variables (190 observations), a mix
of categorical and continuous data. Some of these variables are missing
up to 30% of their values.
I am running into a peculiar problem, illustrated by the following
example showing both the original data (blue) and the imputed data:
[Plot: original data (blue) and imputed values; image not included]
As the plot shows, mice seems to favour 2-3 distinct values in each of
the ten imputations, whereas I would have expected the imputed values to
be more spread out. I see the same behaviour in every imputed variable I
have checked (out of ~80 imputed variables).
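To put a number on "2-3 distinct values per imputation", one option is to count the distinct imputed values in each imputation and the share taken by the most frequent one. The sketch below is a hedged illustration in Python. It assumes the imputed values for a single variable have been exported as one list per imputation (in mice they live in the `imp` component of the returned object). The function name `summarize_imputations` and the example numbers are hypothetical.

```python
from collections import Counter


def summarize_imputations(imputations):
    """For each imputation (a list of imputed values for one variable),
    return (number of distinct values, share held by the most common value)."""
    summary = []
    for values in imputations:
        counts = Counter(values)
        top_share = counts.most_common(1)[0][1] / len(values)
        summary.append((len(counts), top_share))
    return summary


# Hypothetical imputed values for one variable across three imputations.
example = [
    [4.2, 4.2, 5.1, 4.2, 5.1],
    [3.8, 3.8, 3.8, 6.0, 6.0],
    [4.2, 5.1, 5.1, 5.1, 4.2],
]
for k, (n_distinct, top) in enumerate(summarize_imputations(example), start=1):
    print(f"imputation {k}: {n_distinct} distinct values, top value share {top:.0%}")
```

A low distinct count together with a high top-value share, repeated across variables, would match the pattern described above.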
I have tried both constructing my own predictor matrix (to specify which
predictors to use) and leaving mice to choose sensible defaults. I have
also tried increasing the number of iterations per imputation, hoping
that would help the algorithm (pmm) converge to a different solution,
but the imputations did not change.
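For context on pmm, here is a minimal Python sketch of the basic idea of predictive mean matching. It is a simplification and not mice's implementation. It assumes one continuous target with a single predictor and fits one least-squares line on the complete cases, without drawing regression parameters as proper multiple imputation would. The donor-pool size `n_donors` and the function names are hypothetical choices for illustration. The key property it shows is that each missing value is filled with an observed value copied from a nearby donor. Imputations can therefore only take values already present in the data, which is one possible reason imputed values cluster on a few values in a small data set.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (single predictor)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(x, y, n_donors=5, rng=None):
    """Fill None entries in y by predictive mean matching on predictor x.

    Simplified sketch: one regression fitted on complete cases, no
    parameter draws, so between-imputation uncertainty is understated.
    """
    rng = rng or random.Random()
    obs = [i for i, v in enumerate(y) if v is not None]
    a, b = fit_line([x[i] for i in obs], [y[i] for i in obs])
    pred = [a + b * xi for xi in x]  # predicted means for every case
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            # Rank observed cases by closeness of their predicted mean.
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:n_donors]
            # Copy the *observed* value of one randomly chosen donor.
            out[i] = y[rng.choice(donors)]
    return out


# Usage on synthetic data: every imputed value is an observed value.
rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]
for i in (3, 10, 17):
    y[i] = None
imputed = pmm_impute(x, y, rng=rng)
observed = {v for v in y if v is not None}
print(all(imputed[i] in observed for i in (3, 10, 17)))  # True
```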
Could you point me to where I should look to debug this behaviour? I
have been going through the recent mice manual; is there anything in
particular I should be looking at? More broadly, should I also be
experimenting with other packages such as Amelia and mi?