7 messages

Dear Lister,

I have a question about variable selection for regression.

If an IV is not significantly related to the DV in the bivariate analysis, does it make sense to include this IV in the full model with multiple IVs?

Thank you so much!

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

## Re: Question about variable selection

That depends on whether the IV could have significant interactions with other IVs that were not considered in the bivariate analysis. E.g.,

    > iv <- expand.grid(-2:2, -2:2)
    > y <- 3 + iv[,1] * iv[,2] + rnorm(nrow(iv), sd=0.1)
    > summary(lm(y ~ iv[,1]))

    Call:
    lm(formula = y ~ iv[, 1])

    Residuals:
         Min       1Q   Median       3Q      Max
    -4.06259 -1.06048 -0.02377  1.05901  4.04315

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  3.01908    0.41482   7.278 2.09e-07 ***
    iv[, 1]      0.01417    0.29332   0.048    0.962
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 2.074 on 23 degrees of freedom
    Multiple R-Squared: 0.0001014,  Adjusted R-squared: -0.04337
    F-statistic: 0.002333 on 1 and 23 DF,  p-value: 0.9619

    > summary(lm(y ~ iv[,1] * iv[,2]))

    Call:
    lm(formula = y ~ iv[, 1] * iv[, 2])

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.22390 -0.08894 -0.01279  0.13525  0.17608

    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
    (Intercept)      3.019083   0.026330 114.665   <2e-16 ***
    iv[, 1]          0.014167   0.018618   0.761    0.455
    iv[, 2]         -0.005486   0.018618  -0.295    0.771
    iv[, 1]:iv[, 2]  0.992865   0.013165  75.418   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.1316 on 21 degrees of freedom
    Multiple R-Squared: 0.9963,     Adjusted R-squared: 0.9958
    F-statistic:  1896 on 3 and 21 DF,  p-value: < 2.2e-16

Andy
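For anyone who wants to rerun the example, here is a minimal reproducible sketch of the same idea. The seed, the data frame `d`, and the column names `x1`, `x2`, `y` are assumptions added for illustration; they are not part of the original post, so the numbers will differ from the output shown above.

```r
# Reproducible variant of the interaction example.
# Assumptions (not in the original post): the seed and the names d, x1, x2, y.
set.seed(42)

# Balanced 5 x 5 grid, so x1 is orthogonal to the product x1 * x2.
d <- expand.grid(x1 = -2:2, x2 = -2:2)

# The response depends on x1 only through its interaction with x2.
d$y <- 3 + d$x1 * d$x2 + rnorm(nrow(d), sd = 0.1)

fit_biv <- lm(y ~ x1, data = d)       # bivariate screen of x1 alone
fit_int <- lm(y ~ x1 * x2, data = d)  # main effects plus interaction

# Extract the p-values of interest from the coefficient tables.
p_biv <- coef(summary(fit_biv))["x1", "Pr(>|t|)"]
p_int <- coef(summary(fit_int))["x1:x2", "Pr(>|t|)"]

print(c(bivariate_x1 = p_biv, interaction = p_int))

# F test comparing the two nested models.
print(anova(fit_biv, fit_int))
```

Because the grid is balanced, `x1` on its own carries essentially no linear signal, so a bivariate screen would discard it even though it matters through `x1:x2`.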

## Re: Question about variable selection

Thank you so much for your reply, Andy. But what if I am only interested in main effects instead of interactions?

On 2/18/06, Liaw, Andy <[hidden email]> wrote:
> That depends on whether the IV could have some significant interactions
> with other IVs not considered in the bivariate analysis.
> [...]

## Re: Question about variable selection
