Thank you to Prof Ripley and Henric Nilsson for their observation that I

was using "anova" inappropriately for the question that I was trying to

answer. Note that the Wald statistics and confidence interval were

calculable in the previous email but that I prefered using the more

accurate deviance statistics. I will demonstrate my error for the

benefit of those new users of R (like me) that may not have appreciated

how the "anova" function is working SEQUENTIALLY, and what SEQUENTIALLY

actually means in this context.

Since the "anova" function is a sequential test of the current model,

only the statistic for the last term in the model formula is a true

deviance chi-square statistic for (full model) .vs. (full model - term).

For instance, using the data upon which this question was based,

consider the following 2 models:

----------------------

> fit0 <- glm(y ~ 1, data=d, family=binomial(link=logit),

na.action=na.omit)

> fit1 <- update(fit0, . ~ . + x1 + x2 + x3)

----------------------

Here, fit0 is the null (intercept-only) model and fit1 is the full model

(without interactions because interactions are not biologically

plausible in this medical dataset). Now notice the result of the "anova"

function for the full model:

----------------------

> anova(fit1, test='Chisq')

...

Df Deviance Resid. Df Resid. Dev P(>|Chi|)

NULL 99 137.989

x1 1 8.267 98 129.721 0.004

x2 1 5.639 97 124.083 0.018

x3 1 3.433 96 120.650 0.064

-----------------------

It is incorrect to interpret the deviance chi-square test presented

above for x1 (P=0.004) as the deviance chi-square statistic comparing

(y~x1+x2+x3) .vs. (y~x2+x3) as the statistic calculated is for (y~1)

.vs. (y~x1). Similarly, the deviance chi-square statistic calculated for

x2 (P=0.018) is NOT for (y~x1+x2+x3) .vs. (y~x1+x3) but for (y~x1) .vs.

(y~x1+x2). Lastly, the deviance chi-square statistic for x3(P=0.064) is

probably the most intuitive because it is for the comparison of

(y~x1+x2+x3) .vs. (y~x1+x2), the result we would typically want to

present for x3 in the full model. So how do you get the correct

deviance chi-square statistics for x1 and x2 in the full model? There

are a couple of ways of arriving at the same answer as I will

demonstrate for the case of x1.

Option#1: Reorder the full model so that x1 is the last term in the

model formula

-----------------------

> fit2 <- glm(y ~ x2 + x3 + x1, data=d, family=binomial(link=logit),

na.action=na.omit)

> anova(fit2, test='Chisq')

...

Df Deviance Resid. Df Resid. Dev P(>|Chi|)

NULL 99 137.989

x2 1 7.305 98 130.683 0.007

x3 1 3.821 97 126.862 0.051

x1 1 6.212 96 120.650 0.013

-----------------------

Notice that the deviance chi-square statistics for all of the variables

have changed, despite fit2 being identical in content to fit1. Just the

order of the terms in the model formula have changed from fit1 to fit2.

The deviance statistic for x1 is now the correct one for the full model,

that is for the comparison (y~x1+x2+x3) .vs. (y~x2+x3).

Option#2: Compare the full model to a reduced model that does not

include x1.

-----------------------

> fit3 <- update(fit1, . ~ . - x1)

> anova(fit1, fit3, test='Chisq')

...

Model 1: y ~ x1 + x2 + x3

Model 2: y ~ x2 + x3

Resid. Df Resid. Dev Df Deviance P(>|Chi|)

1 96 120.650

2 97 126.862 -1 -6.212 0.013

-----------------------

Fit3 is the same model as fit1 except that it is missing the x1 term.

Therefore, any change in the deviance chi-square statistic is due to the

deletion of x1 from full model. Notice that the difference in residual

deviances between fit3 and fit1 (126.862 - 120.650 = 6.212) is the same

the difference b/t x1 and x3 in option#1.

Brant

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide

http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.