lm considers removed predictors when finding complete cases


lm considers removed predictors when finding complete cases

EDUARDO GARCIA PORTUGUES
Dear R-devel list,

I realized that removing a predictor in lm through the "-" operator in
formula() does not affect which complete cases are considered. A minimal
example is:

summary(lm(Wind ~ ., data = airquality))
# 42 observations deleted due to missingness

summary(lm(Wind ~ . - Ozone, data = airquality))
# still 42 observations deleted due to missingness, even if only 7 are
# missing for the response and the rest of the predictors

summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
# 7 observations deleted due to missingness
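
The counts above can be checked without fitting anything, and the model frame can be inspected to see whether Ozone is still carried along. This is only a sketch on the built-in airquality data; the names n_all, n_no_ozone, tt and mf are made up for illustration.

```r
# Missing values per column of the built-in airquality data
colSums(is.na(airquality))

# Rows with at least one NA: using every column vs. dropping Ozone first
n_all      <- sum(!complete.cases(airquality))
n_no_ozone <- sum(!complete.cases(subset(airquality, select = -Ozone)))
c(all_columns = n_all, without_ozone = n_no_ozone)

# With the "-" formula, Ozone is no longer a model term ...
tt <- terms(Wind ~ . - Ozone, data = airquality)
attr(tt, "term.labels")

# ... but it is still a column of the model frame, so na.omit sees its NAs
mf <- model.frame(Wind ~ . - Ozone, data = airquality)
names(mf)
nrow(mf)
```

So the rows are dropped when the model frame is built, before the term removed with "-" plays any role in the fit.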

I find this behaviour somewhat striking and I was wondering whether it is
intended, or whether it would be appropriate to document it in lm's help.

Any insight on this issue is appreciated.

Best regards,
--
Eduardo García Portugués
Assistant professor
Department of Statistics
Carlos III University of Madrid

Office: 7.3.J21 (Leganés)
Phone: (+34) 91624 8836


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: lm considers removed predictors when finding complete cases

David Winsemius

> On Dec 19, 2017, at 11:12 AM, EDUARDO GARCIA PORTUGUES <[hidden email]> wrote:
>
> Dear R-devel list,
>
> I realized that removing a predictor in lm through the "-" operator in
> formula() does not affect which complete cases are considered. A minimal
> example is:
>
> summary(lm(Wind ~ ., data = airquality))
> # 42 observations deleted due to missingness
>
> summary(lm(Wind ~ . - Ozone, data = airquality))
> # still 42 observations deleted due to missingness, even if only 7 are
> # missing for the response and the rest of the predictors
>
> summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
> # 7 observations deleted due to missingness
>
> I find this behaviour somewhat striking and I was wondering whether it is
> intended, or whether it would be appropriate to document it in lm's help.

The behavior in the second instance seems consistent with a desire to compare models (full versus reduced) based on the same data. Your expectation appears to be something else, but you have not really explained your rationale for a different expectation other than to call it "striking". If by "striking" you mean hitting your head and saying "Of course, I should have thought of that", then we would be in agreement.
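
As a rough illustration of the model-comparison point (a sketch only; the object names full, reduced, reduced_sub and cmp are invented here), the two fits on airquality use the same rows and can be passed to anova(), while the fit on the subsetted data uses more rows and cannot be compared that way:

```r
# Full and reduced models fitted from the same data frame
full    <- lm(Wind ~ ., data = airquality)
reduced <- lm(Wind ~ . - Ozone, data = airquality)

# Both fits use the same number of observations
c(full = nobs(full), reduced = nobs(reduced))

# so the nested comparison is well defined
anova(reduced, full)

# Dropping the Ozone column beforehand keeps more rows ...
reduced_sub <- lm(Wind ~ ., data = subset(airquality, select = -Ozone))
nobs(reduced_sub)

# ... and anova() then refuses to compare fits of different sizes
cmp <- try(anova(reduced_sub, full), silent = TRUE)
inherits(cmp, "try-error")
```

If the larger sample is what is wanted for the reduced model alone, subsetting the data first, as in the third call of the original example, is the way to get it.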

--
David.

>
> Any insight on this issue is appreciated.
>
> Best regards,
> --
> Eduardo García Portugués
> Assistant professor
> Department of Statistics
> Carlos III University of Madrid
>
> Office: 7.3.J21 (Leganés)
> Phone: (+34) 91624 8836

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
