# lm with a single X and step with several Xi-s: beta coefficients quite different

Hi,

(R version 2.15.0)

I am running a program with one response (Y, standardized beforehand) and 44 independent variables (Xi) from the same data frame, `a2`.

When I run `lm` on a single Xi at a time, the beta coefficient for, say, X1 is -0.097 (se = 0.03256).

But when I run the same Y against all 44 Xi-s with the `step` function (because I left the `direction` parameter empty, I assume a backward multiple regression is performed), 12 Xi-s remain in the final model, X1 among them, and the X1 beta coefficient becomes -0.43402 (se = 0.06847).

I did not expect such a drastic change (roughly 4.5 times larger in absolute value) in the beta coefficient from `lm` with X1 alone (bx1 = -0.097) to `step` with the final 12 Xi-s including X1 (bx1 = -0.43402). I understand that `step` produces partial regression coefficients, i.e. the effect of X1 when all other Xi-s are held constant, but is there any good reason why X1 can become so much more significant in a multiple regression (from px1 = 0.00296 ** with `lm` to px1 = 2.55e-10 *** with `step`)?

Some of the 44 Xi-s are correlated with each other, but I am hoping that the stepwise regression will drop some of those correlated ones. The Xi-s are variables coded numerically as 0, 1, 2 so that a linear regression can be applied to them. For example, the frequencies of X1 are:

```
x1
   0    1    2
3459  985   96
```

Output of lm(Y ~ X1):

```
> obj1 <- lm(y ~ x1, data = a2)
> summary(obj1)

Call:
lm(formula = y ~ x1, data = a2)

Residuals:
    Min      1Q  Median      3Q     Max
-3.3418 -0.7240 -0.0462  0.6577  4.2929

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.03635    0.01781   2.042  0.04124 *
x1          -0.09682    0.03256  -2.973  0.00296 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.024 on 4255 degrees of freedom
Multiple R-squared: 0.002074,  Adjusted R-squared: 0.001839
F-statistic: 8.842 on 1 and 4255 DF,  p-value: 0.002961
```

Output from the step function on the 44 Xi-s:

```
> a2 <- na.omit(ac16g761[, 3:(44+2+1)])
> lm.a2 <- lm(y ~ ., data = a2)
> lm.final <- step(lm.a2, trace = FALSE)
> summary(lm.final)

Call:
lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 +
    x10 + x11 + x12, data = a2)

Residuals:
    Min      1Q  Median      3Q     Max
-3.2955 -0.7210 -0.0611  0.6623  4.1064

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.01065    0.02637   0.404 0.686412
x1          -0.43402    0.06847  -6.339 2.55e-10 ***
x2          -0.17109    0.11370  -1.505 0.132464
x3           0.23552    0.11552   2.039 0.041533 *
x4          -0.19898    0.10133  -1.964 0.049625 *
x5           0.06653    0.03796   1.752 0.079769 .
x6           0.18319    0.08592   2.132 0.033070 *
x7          -0.17443    0.05095  -3.424 0.000624 ***
x8           0.24013    0.06516   3.685 0.000232 ***
x9           0.19202    0.08009   2.398 0.016543 *
x10         -0.17257    0.05576  -3.095 0.001983 **
x11         -0.23537    0.05704  -4.126 3.75e-05 ***
x12          0.25992    0.06260   4.152 3.35e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.02 on 4244 degrees of freedom
Multiple R-squared: 0.01353,  Adjusted R-squared: 0.01074
F-statistic: 4.851 on 12 and 4244 DF,  p-value: 5.466e-08
```

Thank you in advance,
Aldi

P.S. Sorry that I cannot distribute these data for a test.

--
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
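A small simulation can illustrate one common way a coefficient can grow like this between a one-predictor `lm` and a multiple regression: two positively correlated predictors whose true effects have opposite signs, so that the single-predictor slope of x1 absorbs part of x2's opposite effect. This is a sketch on made-up data, not a diagnosis of the poster's data (which are not available); the data frame `dat`, the predictors `x2`, `z1`, `z2`, `z3`, the 0.8 sharing rate, the frequency `p = 0.13` and the effect sizes are all hypothetical. The script also checks the "partial regression coefficient" reading from the post via the Frisch-Waugh-Lovell identity, and shows that `step()`, called as in the post with no `scope` or `direction`, returns an ordinary `lm` fit whose coefficients are partial coefficients for the terms it keeps.

```r
## Hypothetical simulation: NOT the poster's data. Two correlated predictors
## coded 0/1/2 (like x1 in the post) with true effects of opposite sign.
set.seed(1)
n <- 4257                        # same number of rows as the post's fits
p <- 0.13                        # hypothetical frequency: mostly 0s, few 2s
x1 <- rbinom(n, size = 2, prob = p)
## x2 copies x1 for about 80% of rows and is an independent draw otherwise,
## so cor(x1, x2) is about 0.8 and both share the same marginal distribution
x2 <- ifelse(runif(n) < 0.8, x1, rbinom(n, size = 2, prob = p))
## three pure-noise predictors for step() to consider dropping
z1 <- rbinom(n, 2, p); z2 <- rbinom(n, 2, p); z3 <- rbinom(n, 2, p)
y <- -0.4 * x1 + 0.4 * x2 + rnorm(n)   # true partial effect of x1 is -0.4
dat <- data.frame(y, x1, x2, z1, z2, z3)

## 1. Simple regression: x1 also carries part of x2's positive effect,
##    so its expected slope is -0.4 + 0.4 * 0.8 = -0.08
b_simple <- coef(lm(y ~ x1, data = dat))[["x1"]]

## 2. Multiple regression: holding x2 fixed recovers roughly -0.4
b_multiple <- coef(lm(y ~ x1 + x2, data = dat))[["x1"]]

## 3. Frisch-Waugh-Lovell: the partial coefficient equals the slope of the
##    y-residuals on the x1-residuals after removing x2 from both
r_y  <- resid(lm(y ~ x2, data = dat))
r_x1 <- resid(lm(x1 ~ x2, data = dat))
b_fwl <- coef(lm(r_y ~ r_x1))[["r_x1"]]

## 4. step() with only the starting model returns a plain lm fit; its x1
##    estimate is a partial coefficient given whichever terms were kept
fit_step <- step(lm(y ~ ., data = dat), trace = 0)

print(c(simple = b_simple, multiple = b_multiple, fwl = b_fwl))
print(coef(fit_step))
```

In this setup the simple slope is expected near -0.08 and the partial slope near -0.4, with the t statistic growing in absolute value as well, which mirrors the pattern in the post; the exact numbers depend on the random draws.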