On Thu, 13 Aug 2009, hesicaia wrote:

>

> Hello,

>

> I have a question regarding estimation of the dispersion parameter (theta)

> for generalized linear models with the negative binomial error structure. As

The theta is different from the dispersion. In the usual GLM notation:

E[y] = mu

VAR[y] = dispersion * V(mu)

The function V() depends on the family and is

Poisson: V(mu) = mu

NB: V(mu) = mu + 1/theta * mu^2

For both models, dispersion is known to be 1 (from the likelihood).

However, a quasi-Poisson approach can be adopted where dispersion is

estimated (but does not correspond to a specific likelihood).

Thus, dispersion and theta are really different things although both of

them can be used to capture overdispersion.

> I understand, there are two main methods to fit glm's using the nb error

> structure in R: glm.nb() or glm() with the negative.binomial(theta) family.

> Both functions are implemented through the MASS library. Fitting the model

> using these two functions to the same data produces much different results

> for me in terms of estimated theta and the coefficients, and I am not sure

> why.

...because you tell them to do different things.

> the following model:

> mod<-glm.nb(count ~ year + season + time + depth,

> link="log",data=dat,control=glm.control(maxit=100,trace=T))

> estimates theta as 0.0109

This approach estimates theta (= 0.0109) and assumes that dispersion is

known to be 1. The underlying estimated variance function is:

VAR[y] = mu + 91.74312 * mu^2

> while the following model:

> mod2<-glm(count ~ year + season + time + depth,

> family=negative.binomial(theta=100),link="log",data=dat,control=glm.control(maxit=100,trace=T))

> will not accept 0.0109 as theta and instead estimates it as 1271 (these are

> fisheries catch data and so are very overdispersed).

This does not estimate theta at all but keeps it fixed (= 100). By

default, however, dispersion will be estimated by the summary() method,

presumably leading to the value of 1271 you report. The underlying

variance function would then be

VAR[y] = 1271 * mu + 12.71 * mu^2

> Fitting a quasipoisson model also yields a large dispersion parameter

> (1300). The models also produce different coefficients and P-values, which

> is disconcerting.

This implies yet another variance function, namely

VAR[y] = 1300 * mu

If you want to get essentially the same result as

summary(mod)

from using glm+negative.binomial you can do

mod0 <- glm(count ~ year + season + time + depth, data = dat,

family = negative.binomial(theta = mod$theta),

control = glm.control(maxit = 100))

summary(mod0, dispersion = 1)

(Note that link = "log" is not needed.)

> What am I doing wrong here? I've read through the help sections

> (?negative.binomial,?glm.nb, and ?glm) but did not find any answers.

I guess the authors of the "MASS" package would say that the software

accompanies a book which should be consulted...and they would be right.

Reading the corresponding sections in MASS (the book) will surely be

helpful. Consulting McCullagh & Nelder about some general GLM theory can't

hurt either. Finally, some extended modeling (including excess zeros) is

available in

http://www.jstatsoft.org/v27/i08/(Apologies for advertising this twice on the same day.)

hth,

Z

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide

http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.