|
How does R estimate the intercept term \alpha in a loglinear
model with Poisson model and log link for a contingency table of counts? (E.g., for a 2-by-2 table {n_{ij}) with \log(\mu) = \alpha + \beta_{i} + \gamma_{j}) I fitted such a model and checked the calculations by hand. I agreed with the main effect terms but not the intercept. Interestingly, I agreed with the fitted value provided by R for the first cell {11} in the table. If my estimate of intercept = \hat{\alpha}, my estimate of the fitted value for the first cell = exp(\hat{\alpha}) but R seems to be doing something else for the estimate of the intercept. However if I check the R $fitted_value for n_{11} it agrees with my exp(\hat{\alpha}). I would expect that with the corner-point parametrization, the estimates for a 2 x 2 table would correspond to expected frequencies exp(\alpha), exp(\alpha + \beta), exp(\alpha + \gamma), exp(\alpha + \beta + \gamma). The MLE of \alpha appears to be log(n_{.1} * n_{1.}/n_{..}), but this is not equal to the intercept given by R in the example I tried. With thanks in anticipation, Colin Aitken -- Professor Colin Aitken, Professor of Forensic Statistics, School of Mathematics, King’s Buildings, University of Edinburgh, Mayfield Road, Edinburgh, EH9 3JZ. Tel: 0131 650 4877 E-mail: [hidden email] Fax : 0131 650 6553 http://www.maths.ed.ac.uk/~cgga The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
On Nov 7, 2011, at 12:59 PM, Colin Aitken wrote: > How does R estimate the intercept term \alpha in a loglinear > model with Poisson model and log link for a contingency table of > counts? > > (E.g., for a 2-by-2 table {n_{ij}) with \log(\mu) = \alpha + > \beta_{i} + \gamma_{j}) > > I fitted such a model and checked the calculations by hand. I > agreed with the main effect terms but not the intercept. > Interestingly, I agreed with the fitted value provided by R for the > first cell {11} in the table. > > If my estimate of intercept = \hat{\alpha}, my estimate of the > fitted value for the first cell = exp(\hat{\alpha}) but R seems to > be doing something else for the estimate of the intercept. > > However if I check the R $fitted_value for n_{11} it agrees with my > exp(\hat{\alpha}). > > I would expect that with the corner-point parametrization, the > estimates for a 2 x 2 table would correspond to expected frequencies > exp(\alpha), exp(\alpha + \beta), exp(\alpha + \gamma), exp(\alpha + > \beta + \gamma). The MLE of \alpha appears to be log(n_{.1} * n_{1.}/ > n_{..}), but this is not equal to the intercept given by R in the > example I tried. > > With thanks in anticipation, > > Colin Aitken > > > -- > Professor Colin Aitken, > Professor of Forensic Statistics, Do you suppose you could provide a data-corpse for us to dissect? Noting the tag line for every posting .... > and provide commented, minimal, self-contained, reproducible code. -- David Winsemius, MD West Hartford, CT ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
In reply to this post by Colin Aitken
On Nov 07, 2011 at 7:59pm Colin Aitken wrote:
> How does R estimate the intercept term \alpha in a loglinear > model with Poisson model and log link for a contingency table of counts? Colin, If you fitted this using a GLM then the default in R is to use so-called treatment contrasts (i.e. Dunnett contrasts). See ?contr.treatment. Take the first example on the ?glm help page ## Dobson (1990) Page 93: Randomized Controlled Trial : counts <- c(18,17,15,20,10,20,25,13,12) outcome <- gl(3,1,9) treatment <- gl(3,3) print(d.AD <- data.frame(treatment, outcome, counts)) glm.D93 <- glm(counts ~ outcome + treatment, family=poisson()) anova(glm.D93) summary(glm.D93) < snip > Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.045e+00 1.709e-01 17.815 <2e-16 *** outcome2 -4.543e-01 2.022e-01 -2.247 0.0246 * outcome3 -2.930e-01 1.927e-01 -1.520 0.1285 treatment2 1.338e-15 2.000e-01 0.000 1.0000 treatment3 1.421e-15 2.000e-01 0.000 1.0000 < snip > > levels(outcome) [1] "1" "2" "3" > levels(treatment) [1] "1" "2" "3" So here the intercept represents the estimated counts at the first level of "outcome" (i.e. outcome = 1) and the first level of "treatment" (i.e. treatment = 1). > predict(glm.D93, newdata=data.frame(outcome="1", treatment="1")) 1 3.044522 Regards, Mark.
Mark Difford (Ph.D.)
Research Associate Botany Department Nelson Mandela Metropolitan University Port Elizabeth, South Africa |
|
On Nov 07, 2011 at 9:04pm Mark Difford wrote:
> So here the intercept represents the estimated counts... Perhaps I should have added (though surely unnecessary in your case) that exponentiation gives the predicted/estimated counts, viz 21 (compared to 18 for the saturated model). ## > exp(3.044522) [1] 20.99999 Regards, Mark.
Mark Difford (Ph.D.)
Research Associate Botany Department Nelson Mandela Metropolitan University Port Elizabeth, South Africa |
|
In reply to this post by David Winsemius
On 08/11/11 07:11, David Winsemius wrote: (in response to >> Professor Colin Aitken, >> Professor of Forensic Statistics, !!!) <SNIP> > > Do you suppose you could provide a data-corpse for us to dissect? Fortune nomination!!! cheers, Rolf Turner ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
Nov 08, 2011; 4:58am Rolf Turner wrote:
>(in response to >>> Professor Colin Aitken, >>> Professor of Forensic Statistics, >!!!) ><SNIP> >> >> Do you suppose you could provide a data-corpse for us to dissect? >Fortune nomination!!! I think Sherlock would have said, "But it's elementary, my dear Watson. Oftentimes a corpse is not necessary, as here." Regards, Mark.
Mark Difford (Ph.D.)
Research Associate Botany Department Nelson Mandela Metropolitan University Port Elizabeth, South Africa |
|
In reply to this post by David Winsemius
Sorry about that. However I have solved the problem by declaring the
explanatory variables as factors. An unresolved problem is: what does R do when the explanatory factors are not defined as factors when it obtains a different value for the intercept but the correct value for the fitted value? A description of the data and the R code and output is attached for anyone interested. Best wishes, Colin Aitken ------------------- David Winsemius wrote: > > On Nov 7, 2011, at 12:59 PM, Colin Aitken wrote: > >> How does R estimate the intercept term \alpha in a loglinear >> model with Poisson model and log link for a contingency table of counts? >> >> (E.g., for a 2-by-2 table {n_{ij}) with \log(\mu) = \alpha + \beta_{i} >> + \gamma_{j}) >> >> I fitted such a model and checked the calculations by hand. I agreed >> with the main effect terms but not the intercept. Interestingly, I >> agreed with the fitted value provided by R for the first cell {11} in >> the table. >> >> If my estimate of intercept = \hat{\alpha}, my estimate of the fitted >> value for the first cell = exp(\hat{\alpha}) but R seems to be doing >> something else for the estimate of the intercept. >> >> However if I check the R $fitted_value for n_{11} it agrees with my >> exp(\hat{\alpha}). >> >> I would expect that with the corner-point parametrization, the >> estimates for a 2 x 2 table would correspond to expected frequencies >> exp(\alpha), exp(\alpha + \beta), exp(\alpha + \gamma), exp(\alpha + >> \beta + \gamma). The MLE of \alpha appears to be log(n_{.1} * >> n_{1.}/n_{..}), but this is not equal to the intercept given by R in >> the example I tried. >> >> With thanks in anticipation, >> >> Colin Aitken >> >> >> -- >> Professor Colin Aitken, >> Professor of Forensic Statistics, > > Do you suppose you could provide a data-corpse for us to dissect? > > Noting the tag line for every posting .... >> and provide commented, minimal, self-contained, reproducible code. > Professor Colin Aitken, Professor of Forensic Statistics, School of Mathematics, King’s Buildings, University of Edinburgh, Mayfield Road, Edinburgh, EH9 3JZ. Tel: 0131 650 4877 E-mail: [hidden email] Fax : 0131 650 6553 http://www.maths.ed.ac.uk/~cgga The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
On Nov 08, 2011 at 11:16am Colin Aitken wrote:
> An unresolved problem is: what does R do when the explanatory factors > are not defined as factors when it obtains a different value for the > intercept but the correct value for the fitted value? Colin, I don't think that happens (that the fitted values are identical if predictors are cast as numerical), but the following could (really is answered by my initial answer). Once again, using the example I gave above, but using the second level of "outcome" as a reference level for a new fit, called glm.D93R. (For this part of the question a corpse would have been nice, though not really needed---"yours" was unfortunately buried too deeply for me to find it,) ## Dobson (1990) Page 93: Randomized Controlled Trial : counts <- c(18,17,15,20,10,20,25,13,12) outcome <- gl(3,1,9) treatment <- gl(3,3) glm.D93 <- glm(counts ~ outcome + treatment, family=poisson()) glm.D93R <- glm(counts ~ C(outcome, base=2) + treatment, family=poisson()) ## treat predictor as numeric glm.D93N <- glm(counts ~ as.numeric(as.character(outcome)) + as.numeric(as.character (treatment)), family=poisson()) > coef(glm.D93) (Intercept) outcome2 outcome3 treatment2 treatment3 3.044522e+00 -4.542553e-01 -2.929871e-01 1.337909e-15 1.421085e-15 ## Different value for the Intercept but same fitted values (see below) as the earlier fit (above) ## summary(glm.D93R) < snipped and edited for clarity> Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.590e+00 1.958e-01 13.230 <2e-16 *** outcome1 4.543e-01 2.022e-01 2.247 0.0246 * outcome3 1.613e-01 2.151e-01 0.750 0.4535 treatment2 -3.349e-16 2.000e-01 0.000 1.0000 treatment3 -6.217e-16 2.000e-01 0.000 1.0000 < snip > > fitted(glm.D93) 1 2 3 4 5 6 7 8 9 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 > fitted(glm.D93R) 1 2 3 4 5 6 7 8 9 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 ## if predictors treated as numeric---check summary(glm.D93N) yourself > fitted(glm.D93N) 1 2 3 4 5 6 7 8 9 19.40460 16.52414 14.07126 19.40460 16.52414 14.07126 19.40460 16.52414 14.07126 Regards, Mark.
Mark Difford (Ph.D.)
Research Associate Botany Department Nelson Mandela Metropolitan University Port Elizabeth, South Africa |
| Powered by Nabble | Edit this page |
