You may want to look at a book that was published more recently than 17
years ago (computing has changed a lot since then). Doing stepwise
regression using p-values is one approach (and when p-values were the
easiest, or only, thing to compute, it was reasonable to use them). But
think about how many p-values you would be computing and comparing to
0.05 in a stepwise regression. Now think about how many you would have
computed if your data had come from a different sample. What is your
type I error rate? Is the usual p-value theory even meaningful in this
situation?
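
A small simulation can make the type I error question concrete. The
sketch below is only an illustration under assumed settings (the sample
size, number of predictors, number of simulations and the helper name
first_step_selects are all made up for this example): the response is
pure noise, and we count how often the first forward step would still
add a predictor at the 0.05 level.

```r
# Sketch: how often does the first step of p-value-based forward selection
# add a predictor when no predictor is related to the response?
set.seed(1)                      # arbitrary seed, for reproducibility only

n_obs  <- 50                     # assumed sample size
n_pred <- 10                     # assumed number of candidate predictors
n_sim  <- 200                    # assumed number of simulated data sets

first_step_selects <- function() {
  x <- matrix(rnorm(n_obs * n_pred), n_obs, n_pred)
  y <- rnorm(n_obs)              # response is independent of every predictor
  # p-value of the slope in each single-predictor model
  p <- apply(x, 2, function(xj) summary(lm(y ~ xj))$coefficients[2, 4])
  min(p) < 0.05                  # TRUE if some predictor would be added
}

# Proportion of noise-only data sets in which a predictor gets added
false_entry_rate <- mean(replicate(n_sim, first_step_selects()))
false_entry_rate
```

With ten independent noise predictors one would expect a rate of roughly
1 - 0.95^10, far above the nominal 0.05, and that is before any of the
later steps are taken into account.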

There are several criteria that can be used in stepwise regression to
decide which term to add or drop. The p-value (or F-statistic) is only
one; others include AIC, BIC, adjusted R-squared, PRESS, gut feeling,
prior knowledge, cost, ...
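
As a rough sketch of how the criterion can be changed in step(): the
penalty per parameter is set through its k argument, where k = 2 (the
default) gives AIC and k = log(n) gives BIC. Setting k to the 0.95
quantile of a chi-squared distribution with 1 df should behave
approximately like a 0.05 likelihood-ratio cutoff for single-df terms.
The use of the built-in mtcars data and the object names are just
choices made for this example.

```r
# Sketch: backward selection with step() under three different penalties.
full_fit <- lm(mpg ~ ., data = mtcars)   # mtcars ships with R
n        <- nrow(mtcars)

# k = 2 is the default and corresponds to AIC
fit_aic <- step(full_fit, trace = 0)

# k = log(n) corresponds to BIC
fit_bic <- step(full_fit, k = log(n), trace = 0)

# k of about 3.84 approximates a 0.05 likelihood-ratio cutoff (1-df terms)
k_p05   <- qchisq(0.05, df = 1, lower.tail = FALSE)
fit_p05 <- step(full_fit, k = k_p05, trace = 0)

# Compare which terms each criterion keeps
formula(fit_aic)
formula(fit_bic)
formula(fit_p05)
```

Setting trace = 1 instead prints the table of candidate terms that
step() compares at each stage, which is probably the quickest way to see
what it is doing.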

Some of these have properties better than p-values, but most still
suffer from the fact that a small change in the data can result in a
very different model.
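
One way to see this instability for yourself is to rerun the selection
on bootstrap resamples of the same data and tabulate which models come
out. The following is a sketch on simulated data; the data-generating
setup, the number of resamples and the helper name select_terms are
assumptions made purely for illustration.

```r
# Sketch: how stable is the model chosen by step() across bootstrap resamples?
set.seed(2)                          # arbitrary seed, for reproducibility only

n_obs <- 60                          # assumed sample size
x1 <- rnorm(n_obs)
x2 <- x1 + rnorm(n_obs, sd = 0.5)    # correlated with x1
x3 <- rnorm(n_obs)
x4 <- x3 + rnorm(n_obs, sd = 0.5)    # correlated with x3
x5 <- rnorm(n_obs)                   # unrelated to everything
sim_data   <- data.frame(x1, x2, x3, x4, x5)
sim_data$y <- 0.5 * x1 + 0.5 * x3 + rnorm(n_obs)

# Run backward selection by AIC and return the chosen terms as one string
select_terms <- function(d) {
  fit <- step(lm(y ~ ., data = d), trace = 0)
  term_labels <- attr(terms(fit), "term.labels")
  if (length(term_labels) == 0) {
    "(intercept only)"
  } else {
    paste(sort(term_labels), collapse = " + ")
  }
}

n_boot   <- 100                      # assumed number of bootstrap resamples
selected <- replicate(
  n_boot,
  select_terms(sim_data[sample(n_obs, replace = TRUE), ])
)

# How many resamples chose each model
model_counts <- sort(table(selected), decreasing = TRUE)
model_counts
```

If the table shows many different models, each chosen in only a modest
share of the resamples, that is a sign the single model picked from the
original data should not be taken too seriously.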

Look at the lars, lasso2, and BMA packages for some more modern
alternatives to stepwise regression.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
(801) 408-8111

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of [hidden email]
Sent: Thursday, December 14, 2006 9:28 AM
To: [hidden email]
Subject: [R] Stepwise regression

Dear all,

I am wondering why the step() procedure in R has the description 'Select
a formula-based model by AIC'.

I have been using Stata and SPSS, and neither package makes any
reference to AIC in its stepwise procedure. I read in an earlier R-Help
post that step() is really the 'usual' way of doing stepwise regression
(R-Help post from Prof Ripley, Fri, 2 Apr 1999 05:06:03 +0100 (BST)).

My understanding of the 'usual' way of doing, say, forward regression is
that variables not yet in the model whose p-value falls below a
criterion (commonly 0.05) become candidates for inclusion, the one with
the lowest p-value among these is chosen, and the step is repeated until
all p-values of variables not in the model are above 0.05; cf. Hosmer
and Lemeshow (1989), Applied Logistic Regression. The procedure does not
require examination of the AIC.

I am not well enough acquainted with R to understand the code used in
step(), so can somebody tell me how step() works?

Thanks very much,

Tim

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
