[R] Stepwise regression

Timothy Mak
Dear all,

I am wondering why the step() procedure in R has the description 'Select a
formula-based model by AIC'.

I have been using Stata and SPSS and neither package made any reference to
AIC in its stepwise procedure, and I read from an earlier R-Help post that
step() is really the 'usual' way for doing stepwise (R Help post from Prof
Ripley, Fri, 2 Apr 1999 05:06:03 +0100 (BST)).

My understanding of the 'usual' way of doing, say, forward regression is
that variables whose p value falls below a criterion (commonly 0.05)
become candidates for inclusion in the model, the one with the lowest p
among these is chosen, and the step is repeated until the p values of all
variables not yet in the model are above 0.05; cf. Hosmer and Lemeshow
(1989), Applied Logistic Regression. The procedure does not require
examination of the AIC.
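
For concreteness, the p-value procedure described above could be sketched
in R roughly as follows. This is an illustration only and is not what
step() does: the function name forward_by_p, the mtcars data, the
candidate variables and the 0.05 cutoff are all assumptions made for the
example. It uses add1() with test = "F" to obtain the p value for adding
each term that is not yet in the model.

```r
# Forward selection by p value (hypothetical helper, not part of R).
forward_by_p <- function(data, response, candidates, alpha = 0.05) {
  fit <- lm(reformulate("1", response), data = data)  # intercept-only start
  selected <- character(0)
  repeat {
    if (length(setdiff(candidates, selected)) == 0) break
    # F test for adding each candidate not already in the model
    tab <- add1(fit, scope = reformulate(candidates), test = "F")
    p <- tab[["Pr(>F)"]][-1]           # drop the "<none>" row
    names(p) <- rownames(tab)[-1]
    if (min(p) >= alpha) break         # nothing left below the cutoff
    best <- names(p)[which.min(p)]     # lowest p value enters the model
    selected <- c(selected, best)
    fit <- update(fit, as.formula(paste(". ~ . +", best)))
  }
  list(selected = selected, fit = fit)
}

# Illustrative call on a built-in data set
res <- forward_by_p(mtcars, "mpg", c("wt", "hp", "qsec", "drat"))
res$selected
```

The only thing that changes between this and an AIC-based search is the
rule used to pick and stop; the add-one-term-at-a-time loop is the same.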

I am not well enough acquainted with R to understand the code used in
step(), so can somebody tell me how step() works?

Thanks very much,

Tim

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Stepwise regression

Marc Schwartz
On Thu, 2006-12-14 at 14:37 +0000, [hidden email] wrote:

> Dear all,
>
> I am wondering why the step() procedure in R has the description 'Select a
> formula-based model by AIC'.
>
> I have been using Stata and SPSS and neither package made any reference to
> AIC in its stepwise procedure, and I read from an earlier R-Help post that
> step() is really the 'usual' way for doing stepwise (R Help post from Prof
> Ripley, Fri, 2 Apr 1999 05:06:03 +0100 (BST)).
>
> My understanding of the 'usual' way of doing, say, forward regression is
> that variables whose p value falls below a criterion (commonly 0.05)
> become candidates for inclusion in the model, the one with the lowest p
> among these is chosen, and the step is repeated until the p values of all
> variables not yet in the model are above 0.05; cf. Hosmer and Lemeshow
> (1989), Applied Logistic Regression. The procedure does not require
> examination of the AIC.
>
> I am not well enough acquainted with R to understand the code used in
> step(), so can somebody tell me how step() works?
>
> Thanks very much,
>
> Tim

> library(fortunes)

> fortune("stepwise")

Frank Harrell: Here is an easy approach that will yield results only
slightly less valid than one actually using the response variable:
  x <- data.frame(x1, x2, x3, x4, ..., other potential predictors)
  x[ , sample(ncol(x))]
Andy Liaw: Hmm... Shouldn't that be something like:
  x[, sample(ncol(x), ceiling(ncol(x) * runif(1)))]
   -- Frank Harrell and Andy Liaw (about alternative strategies for
      stepwise regression and `random parsimony')
      R-help (May 2005)


But seriously, using:

  RSiteSearch("stepwise")

will provide links to prior discussions on why stepwise model building
is to be avoided.

A copy of Frank's book (more info here):

  http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RmS

will also provide insight.


HTH,

Marc Schwartz

Re: [R] Stepwise regression

Gregory Snow
In reply to this post by Timothy Mak
You may want to look at a book that was published more recently than 17
years ago (computing has changed a lot since then).  Doing stepwise
regression using p-values is one approach (and when p-values were the
easiest, or only, thing to compute, it was reasonable to use them).  But
think about how many p-values you would be computing and comparing to
0.05 in a stepwise regression.  Now think about how many you would have
computed if your data had come from a different sample.  What is your
type I error rate?  Is the usual p-value theory even meaningful in this
situation?
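
The multiplicity point can be checked with a small simulation.  The
sketch below assumes 10 independent noise predictors, 50 observations and
500 replications (all arbitrary choices, as is the seed) and looks only
at the first forward step: how often does at least one pure-noise
variable pass the 0.05 test?  With 10 independent tests the expected
share is about 1 - 0.95^10, roughly 0.40, rather than 0.05.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 50; k <- 10; reps <- 500    # illustrative sizes, not from the thread

any_entered <- replicate(reps, {
  x <- matrix(rnorm(n * k), n, k)   # k predictors of pure noise
  y <- rnorm(n)                     # response unrelated to every predictor
  # p-value of each predictor in its own simple regression,
  # i.e. what the first forward step would compare to 0.05
  p <- apply(x, 2, function(xj) summary(lm(y ~ xj))$coefficients[2, 4])
  min(p) < 0.05
})

mean(any_entered)   # share of runs in which a noise variable would enter
```

Later steps add further tests on top of this, so the overall error rate
of a full stepwise search is harder still to pin down.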

There are several criteria that can be used in stepwise regression to
decide which term to add or drop.  The p-value (or F-statistic) is only
one; others include AIC, BIC, adjusted R-squared, PRESS, gut feeling,
prior knowledge, cost, ...
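
As a minimal sketch of how the criterion is chosen in step(): at each
step it evaluates every single-term addition or deletion allowed by the
scope, makes the move that gives the lowest AIC, and stops when no move
improves it.  The penalty per parameter is the argument k, so k = 2 (the
default) gives AIC and k = log(n) gives BIC.  The mtcars data and the
four predictors below are assumptions chosen only for illustration.

```r
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)  # largest model allowed
null <- lm(mpg ~ 1, data = mtcars)                      # starting model

# Forward search with the default penalty k = 2 (AIC)
by_aic <- step(null, scope = formula(full), direction = "forward",
               trace = 0)

# Same search with k = log(n), i.e. BIC
by_bic <- step(null, scope = formula(full), direction = "forward",
               k = log(nrow(mtcars)), trace = 0)

formula(by_aic)
formula(by_bic)
by_aic$anova      # the path taken: one row per step

# For a single-parameter term, an AIC improvement corresponds roughly to a
# likelihood-ratio statistic above 2, i.e. a p-value cutoff near:
pchisq(2, df = 1, lower.tail = FALSE)
```

Seen this way, AIC-based and p-value-based stepwise differ mainly in
where the cutoff sits, with AIC being the more liberal of the two for
one-parameter terms.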

Some of these have better properties than p-values, but most still
suffer from the fact that a small change in the data can result in a
very different model.
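
One way to see this instability is to rerun the selection on bootstrap
resamples of the same data and tabulate which models come out.  The
sketch below is only an illustration: the mtcars data, the five
predictors, the 100 resamples and the seed are all arbitrary assumptions.

```r
set.seed(42)   # arbitrary seed, for reproducibility only

selected <- replicate(100, {
  boot <- mtcars[sample(nrow(mtcars), replace = TRUE), ]  # resample rows
  full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = boot)
  fit  <- step(full, trace = 0)      # backward elimination by AIC
  # Label each selected model by its sorted term names
  paste(sort(attr(terms(fit), "term.labels")), collapse = " + ")
})

# How often each distinct model was chosen across the resamples
sort(table(selected), decreasing = TRUE)
```

If the procedure were stable, one model would dominate the table; in
practice several different models usually share the counts.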

Look at the lars, lasso2, and BMA packages for some more modern
alternatives to stepwise regression.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
(801) 408-8111
 

