

How does R estimate the intercept term \alpha in a loglinear
model with Poisson model and log link for a contingency table of counts?
(E.g., for a 2by2 table {n_{ij}) with \log(\mu) = \alpha + \beta_{i} +
\gamma_{j})
I fitted such a model and checked the calculations by hand. I agreed
with the main effect terms but not the intercept. Interestingly, I
agreed with the fitted value provided by R for the first cell {11} in
the table.
If my estimate of intercept = \hat{\alpha}, my estimate of the fitted
value for the first cell = exp(\hat{\alpha}) but R seems to be doing
something else for the estimate of the intercept.
However if I check the R $fitted_value for n_{11} it agrees with my
exp(\hat{\alpha}).
I would expect that with the cornerpoint parametrization, the
estimates for a 2 x 2 table would correspond to expected frequencies
exp(\alpha), exp(\alpha + \beta), exp(\alpha + \gamma), exp(\alpha +
\beta + \gamma). The MLE of \alpha appears to be log(n_{.1} *
n_{1.}/n_{..}), but this is not equal to the intercept given by R in the
example I tried.
With thanks in anticipation,
Colin Aitken

Professor Colin Aitken,
Professor of Forensic Statistics,
School of Mathematics, King’s Buildings, University of Edinburgh,
Mayfield Road, Edinburgh, EH9 3JZ.
Tel: 0131 650 4877
Email: [hidden email]
Fax : 0131 650 6553
http://www.maths.ed.ac.uk/~cggaThe University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On Nov 7, 2011, at 12:59 PM, Colin Aitken wrote:
> How does R estimate the intercept term \alpha in a loglinear
> model with Poisson model and log link for a contingency table of
> counts?
>
> (E.g., for a 2by2 table {n_{ij}) with \log(\mu) = \alpha +
> \beta_{i} + \gamma_{j})
>
> I fitted such a model and checked the calculations by hand. I
> agreed with the main effect terms but not the intercept.
> Interestingly, I agreed with the fitted value provided by R for the
> first cell {11} in the table.
>
> If my estimate of intercept = \hat{\alpha}, my estimate of the
> fitted value for the first cell = exp(\hat{\alpha}) but R seems to
> be doing something else for the estimate of the intercept.
>
> However if I check the R $fitted_value for n_{11} it agrees with my
> exp(\hat{\alpha}).
>
> I would expect that with the cornerpoint parametrization, the
> estimates for a 2 x 2 table would correspond to expected frequencies
> exp(\alpha), exp(\alpha + \beta), exp(\alpha + \gamma), exp(\alpha +
> \beta + \gamma). The MLE of \alpha appears to be log(n_{.1} * n_{1.}/
> n_{..}), but this is not equal to the intercept given by R in the
> example I tried.
>
> With thanks in anticipation,
>
> Colin Aitken
>
>
> 
> Professor Colin Aitken,
> Professor of Forensic Statistics,
Do you suppose you could provide a datacorpse for us to dissect?
Noting the tag line for every posting ....
> and provide commented, minimal, selfcontained, reproducible code.

David Winsemius, MD
West Hartford, CT
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On Nov 07, 2011 at 7:59pm Colin Aitken wrote:
> How does R estimate the intercept term \alpha in a loglinear
> model with Poisson model and log link for a contingency table of counts?
Colin,
If you fitted this using a GLM then the default in R is to use socalled treatment contrasts (i.e. Dunnett contrasts). See ?contr.treatment. Take the first example on the ?glm help page
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts < c(18,17,15,20,10,20,25,13,12)
outcome < gl(3,1,9)
treatment < gl(3,3)
print(d.AD < data.frame(treatment, outcome, counts))
glm.D93 < glm(counts ~ outcome + treatment, family=poisson())
anova(glm.D93)
summary(glm.D93)
< snip >
Coefficients:
Estimate Std. Error z value Pr(>z)
(Intercept) 3.045e+00 1.709e01 17.815 <2e16 ***
outcome2 4.543e01 2.022e01 2.247 0.0246 *
outcome3 2.930e01 1.927e01 1.520 0.1285
treatment2 1.338e15 2.000e01 0.000 1.0000
treatment3 1.421e15 2.000e01 0.000 1.0000
< snip >
> levels(outcome)
[1] "1" "2" "3"
> levels(treatment)
[1] "1" "2" "3"
So here the intercept represents the estimated counts at the first level of "outcome" (i.e. outcome = 1) and the first level of "treatment" (i.e. treatment = 1).
> predict(glm.D93, newdata=data.frame(outcome="1", treatment="1"))
1
3.044522
Regards, Mark.
Mark Difford (Ph.D.)
Research Associate
Botany Department
Nelson Mandela Metropolitan University
Port Elizabeth, South Africa


On Nov 07, 2011 at 9:04pm Mark Difford wrote:
> So here the intercept represents the estimated counts...
Perhaps I should have added (though surely unnecessary in your case) that exponentiation gives the predicted/estimated counts, viz 21 (compared to 18 for the saturated model).
##
> exp(3.044522)
[1] 20.99999
Regards, Mark.
Mark Difford (Ph.D.)
Research Associate
Botany Department
Nelson Mandela Metropolitan University
Port Elizabeth, South Africa


On 08/11/11 07:11, David Winsemius wrote:
(in response to
>> Professor Colin Aitken,
>> Professor of Forensic Statistics,
!!!)
<SNIP>
>
> Do you suppose you could provide a datacorpse for us to dissect?
Fortune nomination!!!
cheers,
Rolf Turner
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Nov 08, 2011; 4:58am Rolf Turner wrote:
>(in response to
>>> Professor Colin Aitken,
>>> Professor of Forensic Statistics,
>!!!)
><SNIP>
>>
>> Do you suppose you could provide a datacorpse for us to dissect?
>Fortune nomination!!!
I think Sherlock would have said, "But it's elementary, my dear Watson. Oftentimes a corpse is not necessary, as here."
Regards, Mark.
Mark Difford (Ph.D.)
Research Associate
Botany Department
Nelson Mandela Metropolitan University
Port Elizabeth, South Africa


Sorry about that. However I have solved the problem by declaring the
explanatory variables as factors.
An unresolved problem is: what does R do when the explanatory factors
are not defined as factors when it obtains a different value for the
intercept but the correct value for the fitted value?
A description of the data and the R code and output is attached for
anyone interested.
Best wishes,
Colin Aitken

David Winsemius wrote:
>
> On Nov 7, 2011, at 12:59 PM, Colin Aitken wrote:
>
>> How does R estimate the intercept term \alpha in a loglinear
>> model with Poisson model and log link for a contingency table of counts?
>>
>> (E.g., for a 2by2 table {n_{ij}) with \log(\mu) = \alpha + \beta_{i}
>> + \gamma_{j})
>>
>> I fitted such a model and checked the calculations by hand. I agreed
>> with the main effect terms but not the intercept. Interestingly, I
>> agreed with the fitted value provided by R for the first cell {11} in
>> the table.
>>
>> If my estimate of intercept = \hat{\alpha}, my estimate of the fitted
>> value for the first cell = exp(\hat{\alpha}) but R seems to be doing
>> something else for the estimate of the intercept.
>>
>> However if I check the R $fitted_value for n_{11} it agrees with my
>> exp(\hat{\alpha}).
>>
>> I would expect that with the cornerpoint parametrization, the
>> estimates for a 2 x 2 table would correspond to expected frequencies
>> exp(\alpha), exp(\alpha + \beta), exp(\alpha + \gamma), exp(\alpha +
>> \beta + \gamma). The MLE of \alpha appears to be log(n_{.1} *
>> n_{1.}/n_{..}), but this is not equal to the intercept given by R in
>> the example I tried.
>>
>> With thanks in anticipation,
>>
>> Colin Aitken
>>
>>
>> 
>> Professor Colin Aitken,
>> Professor of Forensic Statistics,
>
> Do you suppose you could provide a datacorpse for us to dissect?
>
> Noting the tag line for every posting ....
>> and provide commented, minimal, selfcontained, reproducible code.
>

Professor Colin Aitken,
Professor of Forensic Statistics,
School of Mathematics, King’s Buildings, University of Edinburgh,
Mayfield Road, Edinburgh, EH9 3JZ.
Tel: 0131 650 4877
Email: [hidden email]
Fax : 0131 650 6553
http://www.maths.ed.ac.uk/~cggaThe University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On Nov 08, 2011 at 11:16am Colin Aitken wrote:
> An unresolved problem is: what does R do when the explanatory factors
> are not defined as factors when it obtains a different value for the
> intercept but the correct value for the fitted value?
Colin,
I don't think that happens (that the fitted values are identical if predictors are cast as numerical), but the following could (really is answered by my initial answer). Once again, using the example I gave above, but using the second level of "outcome" as a reference level for a new fit, called glm.D93R. (For this part of the question a corpse would have been nice, though not really needed"yours" was unfortunately buried too deeply for me to find it,)
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts < c(18,17,15,20,10,20,25,13,12)
outcome < gl(3,1,9)
treatment < gl(3,3)
glm.D93 < glm(counts ~ outcome + treatment, family=poisson())
glm.D93R < glm(counts ~ C(outcome, base=2) + treatment, family=poisson())
## treat predictor as numeric
glm.D93N < glm(counts ~ as.numeric(as.character(outcome)) + as.numeric(as.character (treatment)), family=poisson())
> coef(glm.D93)
(Intercept) outcome2 outcome3 treatment2 treatment3
3.044522e+00 4.542553e01 2.929871e01 1.337909e15 1.421085e15
## Different value for the Intercept but same fitted values (see below) as the earlier fit (above)
##
summary(glm.D93R)
< snipped and edited for clarity>
Coefficients:
Estimate Std. Error z value Pr(>z)
(Intercept) 2.590e+00 1.958e01 13.230 <2e16 ***
outcome1 4.543e01 2.022e01 2.247 0.0246 *
outcome3 1.613e01 2.151e01 0.750 0.4535
treatment2 3.349e16 2.000e01 0.000 1.0000
treatment3 6.217e16 2.000e01 0.000 1.0000
< snip >
> fitted(glm.D93)
1 2 3 4 5 6 7 8 9
21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667
> fitted(glm.D93R)
1 2 3 4 5 6 7 8 9
21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667
## if predictors treated as numericcheck summary(glm.D93N) yourself
> fitted(glm.D93N)
1 2 3 4 5 6 7 8 9
19.40460 16.52414 14.07126 19.40460 16.52414 14.07126 19.40460 16.52414 14.07126
Regards, Mark.
Mark Difford (Ph.D.)
Research Associate
Botany Department
Nelson Mandela Metropolitan University
Port Elizabeth, South Africa

