variable selection when categorical variables are available

Mike Wolfgang
Dear All,

This is probably not a highly relevant question: why do the stepwise regression
functions in R (step() or stepAIC()) add/delete categorical variables as a
set? For example, I have a four-level factor variable d, so the dummies are
d1, d2, d3. When stepwise regression adds or removes d, d1, d2 and d3
are added/removed simultaneously. What is the concern with operating on the
dummies individually? Model interpretability, or something else? (It seems
shrinkage methods can operate on them one by one.)

Thanks
mike

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: variable selection when categorical variables are available

Frank Harrell
Mike Wolfgang wrote:

> Dear All,
>
> This is probably not a highly relevant question: why do the stepwise regression
> functions in R (step() or stepAIC()) add/delete categorical variables as a
> set? For example, I have a four-level factor variable d, so the dummies are
> d1, d2, d3. When stepwise regression adds or removes d, d1, d2 and d3
> are added/removed simultaneously. What is the concern with operating on the
> dummies individually? Model interpretability, or something else? (It seems
> shrinkage methods can operate on them one by one.)
>
> Thanks
> mike

You would be on shaky ground statistically and interpretation-wise to
break up the variables.  Stepwise regression causes enough problems
(invalidating most of the statistics from the final model) without
doing that.
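
To make the interpretation point concrete, here is a minimal base-R sketch
on simulated data (the data, seed and object names are made up for
illustration; they are not from the original question). It shows that
drop1(), which step() relies on, treats the factor as one term with 3 df,
and that removing a single dummy column gives the same fit as merging that
level into the reference level, so the result depends on which level
happens to be the reference.

```r
set.seed(1)
n <- 200
d <- factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
x <- rnorm(n)
# Simulated response: levels a and b share one mean, c and d another
y <- 1 + 0.5 * x + unname(c(a = 0, b = 0, c = 1, d = 1)[as.character(d)]) + rnorm(n)
dat <- data.frame(y, x, d)

fit <- lm(y ~ x + d, data = dat)

# The factor d appears as a single term with 3 degrees of freedom
tab <- drop1(fit)
print(tab)

# Treatment-coded dummy columns: (Intercept), db, dc, dd
mm <- model.matrix(~ d, data = dat)

# Fit without the dummy for level "b" ...
fit_no_db <- lm(y ~ x + mm[, c("dc", "dd")], data = dat)

# ... and fit with level "b" merged into the reference level "a"
d_merged <- d
levels(d_merged)[levels(d_merged) == "b"] <- "a"
fit_merged <- lm(y ~ x + d_merged, data = dat)

# Same residual sum of squares: dropping one dummy is a level merge
print(all.equal(deviance(fit_no_db), deviance(fit_merged)))
```

With a different reference level (for example after relevel()), dropping
the "same" dummy would correspond to a different merge.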

Shrinkage methods do not operate on them one by one; they shrink the
estimates to the mean of all 4 groups (see for example the ols function
in the Design package).
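
As a rough illustration of that kind of shrinkage, here is a hand-rolled
ridge sketch in base R. It is an assumption-laden toy, not the
penalization used by ols() in Design: it uses one indicator column per
level, an unpenalized intercept and a hypothetical ridge_fit() helper,
all on simulated data. Under this penalty the four level effects sum to
zero, so no level plays the role of a reference, and a very large penalty
pulls every group toward the overall mean.

```r
set.seed(2)
n <- 120
d <- factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
y <- unname(c(a = 0, b = 1, c = 2, d = 3)[as.character(d)]) + rnorm(n)

Z <- model.matrix(~ d - 1)        # one indicator column per level
X <- cbind(Intercept = 1, Z)      # intercept plus all four indicators
P <- diag(c(0, rep(1, ncol(Z))))  # penalize level effects, not the intercept

# Hypothetical helper: penalized least squares for a given lambda
ridge_fit <- function(lambda) {
  drop(solve(crossprod(X) + lambda * P, crossprod(X, y)))
}

b_small <- ridge_fit(1)
b_large <- ridge_fit(1e6)

print(b_small)
print(sum(b_small[-1]))  # level effects sum to (numerically) zero
print(b_large)           # effects near zero, intercept near mean(y)
```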

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Re: variable selection when categorical variables are available

Prof Brian Ripley
On Tue, 11 Apr 2006, Mike Wolfgang wrote:

> This is probably not a highly relevant question: why do the stepwise regression
> functions in R (step() or stepAIC()) add/delete categorical variables as a
> set?

Yes, those two do.  Others (e.g. in package leaps) may not.
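
A small sketch of the contrast, assuming the leaps package is installed
(the simulated data and object names are illustrative only). When
regsubsets() is given a formula containing a factor, the candidate
variables reported by summary() are the individual dummy columns, so they
can enter or leave the selected subsets separately.

```r
set.seed(3)
n <- 200
dat <- data.frame(
  x = rnorm(n),
  d = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
)
dat$y <- 1 + 0.5 * dat$x + ifelse(dat$d == "d", 1, 0) + rnorm(n)

wh <- NULL
if (requireNamespace("leaps", quietly = TRUE)) {
  rs <- leaps::regsubsets(y ~ x + d, data = dat)
  # Logical matrix: one row per subset size, one column per candidate
  wh <- summary(rs)$which
  print(wh)
} else {
  message("Package 'leaps' is not installed; skipping this example.")
}
```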

> For example, I have a four-level factor variable d, so the dummies are
> d1, d2, d3. When stepwise regression adds or removes d, d1, d2 and d3
> are added/removed simultaneously. What is the concern with operating on the
> dummies individually? Model interpretability, or something else? (It seems
> shrinkage methods can operate on them one by one.)


--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
