[R] Profile confidence intervals and LR chi-square test


[R] Profile confidence intervals and LR chi-square test

Brant Inman

System: R 2.3.1 on Windows XP machine.

I am building a logistic regression model for a sample of 100 cases in
dataframe "d", in which there are 3 binary covariates: x1, x2 and x3.

----------------

> summary(d)
 y      x1     x2     x3    
 0:54   0:50   0:64   0:78  
 1:46   1:50   1:36   1:22  

> fit <- glm(y ~ x1 + x2 + x3, data=d, family=binomial(link=logit))

> summary(fit)

Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial(link = logit),
    data = d)

Deviance Residuals:
    Min       1Q   Median       3Q      Max  
-1.6503  -1.0220  -0.7284   0.9965   1.7069  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  -0.3772     0.3721  -1.014   0.3107  
x11          -0.8144     0.4422  -1.842   0.0655 .
x21           0.9226     0.4609   2.002   0.0453 *
x31           1.3347     0.5576   2.394   0.0167 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 137.99  on 99  degrees of freedom
Residual deviance: 120.65  on 96  degrees of freedom
AIC: 128.65

Number of Fisher Scoring iterations: 4

> exp(fit$coef)
(Intercept)         x11         x21         x31
  0.6858006   0.4429233   2.5157321   3.7989873
---------------

After reading the appropriate sections in MASS4 (7.2 and 8.4 in
particular), I decided to estimate the 95% confidence intervals for the
odds ratios using the profile method implemented in the "confint"
function. I then used the "anova" function to perform the deviance
chi-square tests for each covariate.

---------------
> ci <- confint(fit); exp(ci)
Waiting for profiling to be done...
                2.5 %    97.5 %
(Intercept) 0.3246680  1.413684
x11         0.1834819  1.048154
x21         1.0256096  6.314473
x31         1.3221533 12.129210

> anova(fit, test='Chisq')
Analysis of Deviance Table

Model: binomial, link: logit

Response: y

Terms added sequentially (first to last)


     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989          
x1    1    5.856        98    132.133     0.016
x2    1    5.271        97    126.862     0.022
x3    1    6.212        96    120.650     0.013
----------------

My question relates to the interpretation of the significance of
variable x1.  The OR for x1 is 0.443 and its profile confidence interval
is 0.183-1.048.  If a type I error rate of 5% is assumed, this result
would tend to suggest that x1 is NOT a significant predictor of y.
However, the deviance chi-square test has a P-value of 0.016, which
suggests that x1 is indeed a significant predictor of y. How do I
reconcile these two differing messages? I do recognize that the upper
bound of the confidence interval is pretty close to 1, but I am certain
that some journal reviewer will flag the results as inconsistent.

Brant Inman


Re: [R] Profile confidence intervals and LR chi-square test

Henric Nilsson
On 2006-11-14 00:41, Inman, Brant A. M.D. wrote:
> System: R 2.3.1 on Windows XP machine.

Time to upgrade!

>
> I am building a logistic regression model for a sample of 100 cases in
> dataframe "d", in which there are 3 binary covariates: x1, x2 and x3.

Please provide a reproducible example (as suggested by the posting guide).

>> ci <- confint(fit); exp(ci)
> Waiting for profiling to be done...
>                 2.5 %    97.5 %
> (Intercept) 0.3246680  1.413684
> x11         0.1834819  1.048154
> x21         1.0256096  6.314473
> x31         1.3221533 12.129210
>
>> anova(fit, test='Chisq')
> Analysis of Deviance Table
>
> Model: binomial, link: logit
>
> Response: y
>
> Terms added sequentially (first to last)
               ^^^^^^^^^^^^
Hence, your use of the `anova' function doesn't return tests
corresponding to the CIs computed above.

> My question relates to the interpretation of the significance of
> variable x1.  The OR for x1 is 0.443 and its profile confidence interval
> is 0.183-1.048.  If a type I error rate of 5% is assumed, this result
> would tend to suggest that x1 is NOT a significant predictor of y.

This is also suggested by the Wald test computed by the `summary' function.

> However, the deviance chi-square test has a P-value of 0.016, which
> suggests that x1 is indeed a significant predictor of y. How do I

That p-value corresponds to adding x1 to a model containing only the
intercept term.

> reconcile these two differing messages? I do recognize that the upper
> bound of the confidence interval is pretty close to 1, but I am certain
> that some journal reviewer will flag the results as inconsistent.

Generally, in order to compute the LR test of the null hypothesis that
some subset of the parameters equals zero, you need to explicitly fit
both the restricted and the unrestricted models and compare them using
the `anova' function.

Also, see FAQ 7.18.


HTH,
Henric





Re: [R] Profile confidence intervals and LR chi-square test

Prof Brian Ripley
In reply to this post by Brant Inman
Your problem is the interpretation of anova(): it is a sequential test and
x1 is the first term.  Using dropterm() would give you the correct LR
test.

However, you also have a Wald test given by the line

> x11          -0.8144     0.4422  -1.842   0.0655 .

which is not significant at the 5% level.  The correct LRT would be
expected to be more accurate, and your inversion of the profile likelihood
is just a way to compute the LRT.
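
For reference, a minimal sketch of that approach (dropterm() is in
package MASS; drop1() in base R is similar), assuming the `fit' object
from the original post:

> library(MASS)
> dropterm(fit, test = "Chisq")  ## single-term deletion LR tests vs. the full model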



--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] Profile confidence intervals and LR chi-square test

Brant Inman
In reply to this post by Brant Inman

Thank you to Prof Ripley and Henric Nilsson for pointing out that I was
using "anova" inappropriately for the question that I was trying to
answer. Note that the Wald statistics and confidence intervals could be
obtained from my previous email, but I preferred to use the more
accurate deviance statistics.  I will demonstrate my error for the
benefit of those new users of R (like me) who may not have appreciated
that the "anova" function works SEQUENTIALLY, and what SEQUENTIALLY
actually means in this context.

Since the "anova" function is a sequential test of the current model,
only the statistic for the last term in the model formula is a true
deviance chi-square statistic for (full model) .vs. (full model - term).
For instance, using the data upon which this question was based,
consider the following 2 models:

----------------------

> fit0 <- glm(y ~ 1, data=d, family=binomial(link=logit),
+             na.action=na.omit)
> fit1 <- update(fit0, . ~ . + x1 + x2 + x3)

----------------------

Here, fit0 is the null (intercept-only) model and fit1 is the full model
(without interactions because interactions are not biologically
plausible in this medical dataset). Now notice the result of the "anova"
function for the full model:

----------------------

> anova(fit1, test='Chisq')

...

     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989          
x1    1    8.267        98    129.721     0.004
x2    1    5.639        97    124.083     0.018
x3    1    3.433        96    120.650     0.064

-----------------------

It is incorrect to interpret the deviance chi-square test presented
above for x1 (P=0.004) as the statistic comparing (y~x1+x2+x3) vs.
(y~x2+x3); the statistic actually calculated compares (y~1) vs. (y~x1).
Similarly, the deviance chi-square statistic calculated for x2 (P=0.018)
is NOT for (y~x1+x2+x3) vs. (y~x1+x3) but for (y~x1) vs. (y~x1+x2).
Lastly, the deviance chi-square statistic for x3 (P=0.064) is probably
the most intuitive because it compares (y~x1+x2+x3) vs. (y~x1+x2), the
result we would typically want to present for x3 in the full model.  So
how do you get the correct deviance chi-square statistics for x1 and x2
in the full model?  There are a couple of ways of arriving at the same
answer, as I will demonstrate for the case of x1.
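
Before turning to the two options, a quick check of the first claim: a
minimal sketch (assuming fit0 from above; `fitA' is just an illustrative
name) showing that the x1 row is the (y~1) vs. (y~x1) comparison:

> fitA <- update(fit0, . ~ . + x1)
> anova(fit0, fitA, test='Chisq')  ## should reproduce the x1 row above (deviance 8.267)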

Option #1: Reorder the full model so that x1 is the last term in the
model formula

-----------------------

> fit2 <- glm(y ~ x2 + x3 + x1, data=d, family=binomial(link=logit),
+             na.action=na.omit)
> anova(fit2, test='Chisq')
...

     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989          
x2    1    7.305        98    130.683     0.007
x3    1    3.821        97    126.862     0.051
x1    1    6.212        96    120.650     0.013

-----------------------

Notice that the deviance chi-square statistics for all of the variables
have changed, despite fit2 being identical in content to fit1; only the
order of the terms in the model formula has changed from fit1 to fit2.
The deviance statistic for x1 is now the correct one for the full model,
that is, for the comparison (y~x1+x2+x3) vs. (y~x2+x3).

Option #2: Compare the full model to a reduced model that does not
include x1.

-----------------------

> fit3 <- update(fit1, . ~ . - x1)
> anova(fit1, fit3, test='Chisq')
...

Model 1: y ~ x1 + x2 + x3
Model 2: y ~ x2 + x3
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        96    120.650                      
2        97    126.862 -1   -6.212     0.013

-----------------------

Fit3 is the same model as fit1 except that it is missing the x1 term.
Therefore, any change in the deviance is due to the deletion of x1 from
the full model. Notice that the difference in residual deviances between
fit3 and fit1 (126.862 - 120.650 = 6.212) is the same as the difference
between the x3 and x1 rows in Option #1, i.e., the deviance reported
there for x1.
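
As a further check (not in the original exchange), the p-value reported
above can be recovered directly from that change in deviance:

> pchisq(126.862 - 120.650, df = 1, lower.tail = FALSE)  ## approximately 0.013, matching the table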


Brant


Re: [R] Profile confidence intervals and LR chi-square test

Prof Brian Ripley
I did mention dropterm (and drop1 can also be used).  This is more
efficient, simpler and less error-prone than setting up your own anova()
calls (especially if you want to do LRTs for dropping several terms).
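
For instance, a minimal sketch assuming the full model object `fit1'
from the previous message:

> drop1(fit1, test='Chisq')  ## LR tests for dropping each of x1, x2, x3 from the full model

This gives all three single-term deletion tests in one table, without
refitting the reduced models by hand.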


--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
