Question about variable selection

Question about variable selection

Wensui Liu
Dear Listers,

I have a question about variable selection for regression.

If an IV is not significantly related to the DV in a bivariate analysis, does
it make sense to include this IV in the full model with multiple IVs?

Thank you so much!


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Question about variable selection

Liaw, Andy
That depends on whether the IV could have significant interactions with
other IVs not considered in the bivariate analysis.  E.g.,

> iv <- expand.grid(-2:2, -2:2)
> y <- 3 + iv[,1] * iv[,2] + rnorm(nrow(iv), sd=0.1)
> summary(lm(y ~ iv[,1]))

Call:
lm(formula = y ~ iv[, 1])

Residuals:
     Min       1Q   Median       3Q      Max
-4.06259 -1.06048 -0.02377  1.05901  4.04315

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.01908    0.41482   7.278 2.09e-07 ***
iv[, 1]      0.01417    0.29332   0.048    0.962    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.074 on 23 degrees of freedom
Multiple R-Squared: 0.0001014,  Adjusted R-squared: -0.04337
F-statistic: 0.002333 on 1 and 23 DF,  p-value: 0.9619

> summary(lm(y ~ iv[,1] * iv[,2]))

Call:
lm(formula = y ~ iv[, 1] * iv[, 2])

Residuals:
     Min       1Q   Median       3Q      Max
-0.22390 -0.08894 -0.01279  0.13525  0.17608

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      3.019083   0.026330 114.665   <2e-16 ***
iv[, 1]          0.014167   0.018618   0.761    0.455    
iv[, 2]         -0.005486   0.018618  -0.295    0.771    
iv[, 1]:iv[, 2]  0.992865   0.013165  75.418   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1316 on 21 degrees of freedom
Multiple R-Squared: 0.9963,     Adjusted R-squared: 0.9958
F-statistic:  1896 on 3 and 21 DF,  p-value: < 2.2e-16




Andy



Re: Question about variable selection

Wensui Liu
Thank you so much for your reply, Andy.

But what if I am only interested in main effects instead of interactions?





Re: Question about variable selection

Fox, John
Dear Wensui and Andy,

When the explanatory variables are correlated, it's perfectly possible for
the marginal relationship between X and Y to be zero and the partial
relationship nonzero (even in the absence of interactions) -- this is simply
a reflection of the more general point that partial and marginal
relationships can differ.
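[Editorial note: a small numerical sketch of this point, not part of the original exchange; Python/NumPy is used instead of R, and all variable names are made up. Two correlated IVs are constructed so that X1 has exactly zero marginal correlation with Y while its partial coefficient is -1, with no interaction term anywhere.]

```python
import numpy as np

# Build data where the marginal slope of Y on X1 is exactly zero
# while the partial (multiple-regression) coefficient of X1 is -1.
rng = np.random.default_rng(0)
n = 200
z1 = rng.standard_normal(n)
z1 -= z1.mean()
z2 = rng.standard_normal(n)
z2 -= z2.mean()
z2 -= (z2 @ z1) / (z1 @ z1) * z1   # make z2 exactly orthogonal to z1

x1 = z1            # first explanatory variable
x2 = z1 + z2       # correlated with x1 through the shared z1 component
y = z2             # equals x2 - x1, yet uncorrelated with x1

marginal_slope = (x1 @ y) / (x1 @ x1)          # OLS slope of Y ~ X1

X = np.column_stack([np.ones(n), x1, x2])      # design matrix for Y ~ X1 + X2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(abs(marginal_slope) < 1e-10)   # True: no marginal relationship
print(np.round(beta[1:], 6))         # [-1.  1.]: clear partial relationship
```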

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------



Re: Question about variable selection

Wensui Liu
Dear John,

I fully understand your point that an IV might not be significantly
correlated with the DV in a bivariate setting but might be significantly
correlated with the DV in the presence of other IVs. But does this significant
partial relationship reflect the true relation between the IV and DV, and does
it really help to predict the DV?

From here, let's go one step further. If I do multiple resamplings from the
original dataset, build a bivariate LM between the IV and DV on the different
samples, and still can't get a significant result, do you think I should give
this IV a chance by looking at its partial relationship with the DV?

Thank you so much!


--
WenSui Liu
(http://statcompute.blogspot.com)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children's Hospital Medical Center



Re: Question about variable selection

Fox, John
Dear Wensui,

I don't think that it's possible to answer these questions mechanically,
especially if you're interested in the "true" relationship between the
response and a set of explanatory variables. If, however, you have a pure
prediction problem, then variable selection is a more reasonable approach,
as long as it's done carefully (in my opinion).

I don't see how resampling and repeatedly examining the marginal
relationship between Y and an X is relevant to the question of whether there
is a partial relationship in the absence of a marginal relationship. (This
is close to what Wittgenstein once called buying two copies of the same
newspaper to see whether what was said in the first one is true.) After all,
as I said (and as you understand), the partial and marginal relationship can
differ -- so evidence about the marginal relationship is not necessarily
relevant to inference about the partial relationship. (As well,
bootstrapping a linear least-squares regression likely isn't going to give
you much additional information anyway.)
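[Editorial note: a sketch of the newspaper point, not part of the original exchange; Python/NumPy on hypothetical data. When the data-generating process has a genuine partial effect of X1 but essentially no marginal one, every bootstrap resample of the bivariate fit repeats the same near-zero story, while a single multiple regression recovers the partial coefficient.]

```python
import numpy as np

# Hypothetical data with an exact partial effect of x1 (coefficient -1)
# but essentially no marginal relationship between x1 and y.
rng = np.random.default_rng(1)
n = 300
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x1, x2, y = z1, z1 + z2, z2        # y = x2 - x1 exactly

def slope(u, v):
    """OLS slope of v regressed on u."""
    u = u - u.mean()
    v = v - v.mean()
    return (u @ v) / (u @ u)

# Bootstrap the marginal (bivariate) slope: resampling just re-reads the
# same near-zero marginal relationship over and over.
boot = np.array([slope(x1[i], y[i])
                 for i in (rng.integers(0, n, n) for _ in range(500))])
print(abs(boot).max() < 0.5)       # True: all resampled marginal slopes are small

# The multiple regression recovers the partial effect in one pass.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta[1:], 6))       # [-1.  1.]
```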

Regards,
 John




Re: Question about variable selection

William Revelle
Dear Wensui,

What you are asking about is called, in psychology, a "suppressor"
variable: a predictor that is unrelated to the criterion but
correlated with the other predictors (X1 in the following example).
Although it has a zero relationship with the DV, it does "really"
help to predict the DV by removing extraneous variance from the other
IVs. (I am not going to touch the Wittgenstein issue of truth here.)
Should it be included in the predictor set? Yes. Is there any easy
way to find all possible suppressors? No.


Consider the following:

#demonstration of "suppressor effects"
library(mvtnorm)
sigma <- matrix(c(1,.5,0,.5,1,.5,0,.5,1),ncol=3)
my.data <- data.frame(rmvnorm(1000,sigma=sigma))
names(my.data) <- c("X1", "X2", "Y")
round(cor(my.data),2)
summary(lm(Y~ X1 + X2,data= my.data))

which produces
       X1   X2     Y
X1  1.00 0.45 -0.04
X2  0.45 1.00  0.51
Y  -0.04 0.51  1.00
>  summary(lm(Y~ X1 + X2,data= my.data))

Call:
lm(formula = Y ~ X1 + X2, data = my.data)

Residuals:
      Min       1Q   Median       3Q      Max
-2.09350 -0.58069  0.02280  0.53436  3.02017

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.02807    0.02557   1.098    0.273  
X1          -0.32849    0.02813 -11.680   <2e-16 ***
X2           0.65666    0.02861  22.951   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8081 on 997 degrees of freedom
Multiple R-Squared: 0.3465, Adjusted R-squared: 0.3452
F-statistic: 264.4 on 2 and 997 DF,  p-value: < 2.2e-16




--
William Revelle http://pmc.psych.northwestern.edu/revelle.html   
Professor http://personality-project.org/personality.html
Department of Psychology       http://www.wcas.northwestern.edu/psych/
Northwestern University http://www.northwestern.edu/
