glm poisson and quasipoisson

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

glm poisson and quasipoisson

Wim Kreinen
Hello,

I have a question about modelling via  glm. I have a dataset (see dput)
that looks like as if it where poisson distributed (actually I would
appreciate that) but it isnt because  mean unequals var.


> mean (x)
[1] 901.7827
> var (x)
[1] 132439.3


Anyway, I tried to model it via poisson and quasipoisson. Actually, just to
get an impression how glm works. But I dont know how to interprete the
data. Of course this is the case because my knowledge concerning logistic
regressions is rather limited. Hoping there is somebody with mercy I would
like to understand which parameters are important, e.g. which paramter
might give me a hint that a poisson model is a bad idea. For hints
concerning some tutorials  about reading glm-output I would appreciate as
well.

Thanks
Wim


> skn300.glmp <- glm (freq~n, data=skn300.tab, family=poisson)
> summary (skn300.glmp)

Call:
glm(formula = freq ~ n, family = poisson, data = skn300.tab)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-51.332   -9.383   -6.599   -3.959   55.111

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  7.2374375  0.0093285   775.8   <2e-16 ***
n           -0.0539424  0.0003699  -145.8   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 71731  on 96  degrees of freedom
Residual deviance: 37383  on 95  degrees of freedom
AIC: 37800

Number of Fisher Scoring iterations: 6

>
> skn300.glmq <- glm (freq~n, data=skn300.tab, family=quasipoisson)
> summary (skn300.glmq)

Call:
glm(formula = freq ~ n, family = quasipoisson, data = skn300.tab)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-51.332   -9.383   -6.599   -3.959   55.111

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.237438   0.186381  38.831  < 2e-16 ***
n           -0.053942   0.007391  -7.298  8.8e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 399.1874)

    Null deviance: 71731  on 96  degrees of freedom
Residual deviance: 37383  on 95  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 6


>  dput (skn300.tab)
structure(list(n = 1:97, freq = c(0L, 0L, 0L, 0L, 1L, 7L, 40L,
100L, 276L, 543L, 952L, 1414L, 1853L, 2199L, 2435L, 2270L, 2042L,
1679L, 1386L, 1108L, 922L, 792L, 642L, 597L, 453L, 424L, 370L,
297L, 278L, 218L, 208L, 172L, 174L, 149L, 124L, 98L, 98L, 67L,
78L, 67L, 46L, 34L, 31L, 42L, 34L, 21L, 28L, 18L, 18L, 18L, 10L,
19L, 6L, 9L, 10L, 6L, 6L, 5L, 3L, 9L, 4L, 3L, 4L, 5L, 2L, 6L,
4L, 2L, 2L, 3L, 3L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 1L),
    kum = c(0L, 0L, 0L, 0L, 1L, 8L, 48L, 148L, 424L, 967L, 1919L,
    3333L, 5186L, 7385L, 9820L, 12090L, 14132L, 15811L, 17197L,
    18305L, 19227L, 20019L, 20661L, 21258L, 21711L, 22135L, 22505L,
    22802L, 23080L, 23298L, 23506L, 23678L, 23852L, 24001L, 24125L,
    24223L, 24321L, 24388L, 24466L, 24533L, 24579L, 24613L, 24644L,
    24686L, 24720L, 24741L, 24769L, 24787L, 24805L, 24823L, 24833L,
    24852L, 24858L, 24867L, 24877L, 24883L, 24889L, 24894L, 24897L,
    24906L, 24910L, 24913L, 24917L, 24922L, 24924L, 24930L, 24934L,
    24936L, 24938L, 24941L, 24944L, 24944L, 24944L, 24944L, 24944L,
    24946L, 24947L, 24947L, 24947L, 24947L, 24947L, 24947L, 24948L,
    24948L, 24948L, 24949L, 24951L, 24952L, 24952L, 24952L, 24952L,
    24952L, 24954L, 24954L, 24954L, 24954L, 24955L)), .Names = c("n",
"freq", "kum"), row.names = c(NA, -97L), class = "data.frame")

        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: glm poisson and quasipoisson

Ivan-K
I am sure, that this is not a pure Poisson! Huge overdispersion!
You get inflated confidence intervals!
(although, the point estimates of the regression coefficients stay the same)
Try to look for the causes of overdispersion! It may be geteroscedastisity?
What is the nature of the response, is it the positive integers?
Perhaps in your model still missing something important predictors?
Or just you can try the Gamma or log-normal distr.

Ivan, IPAE UB RAS
Reply | Threaded
Open this post in threaded view
|

Re: glm poisson and quasipoisson

ilai-2
In reply to this post by Wim Kreinen
On Thu, Jan 31, 2013 at 2:13 PM, Wim Kreinen <[hidden email]> wrote:

> Hello,
>
> I have a question about modelling via  glm.


I think you are way off track. Either the data, glm, or both, are not what
you think they are.


> I have a dataset

skn300.tab <- structure(list(n = 1:97, freq = c(0L, 0L, 0L, 0L, 1L, 7L, 40L,
100L, 276L, 543L, 952L, 1414L, 1853L, 2199L, 2435L, 2270L, 2042L,
1679L, 1386L, 1108L, 922L, 792L, 642L, 597L, 453L, 424L, 370L,
297L, 278L, 218L, 208L, 172L, 174L, 149L, 124L, 98L, 98L, 67L,
78L, 67L, 46L, 34L, 31L, 42L, 34L, 21L, 28L, 18L, 18L, 18L, 10L,
19L, 6L, 9L, 10L, 6L, 6L, 5L, 3L, 9L, 4L, 3L, 4L, 5L, 2L, 6L,
4L, 2L, 2L, 3L, 3L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 1L),
    kum = c(0L, 0L, 0L, 0L, 1L, 8L, 48L, 148L, 424L, 967L, 1919L,
    3333L, 5186L, 7385L, 9820L, 12090L, 14132L, 15811L, 17197L,
    18305L, 19227L, 20019L, 20661L, 21258L, 21711L, 22135L, 22505L,
    22802L, 23080L, 23298L, 23506L, 23678L, 23852L, 24001L, 24125L,
    24223L, 24321L, 24388L, 24466L, 24533L, 24579L, 24613L, 24644L,
    24686L, 24720L, 24741L, 24769L, 24787L, 24805L, 24823L, 24833L,
    24852L, 24858L, 24867L, 24877L, 24883L, 24889L, 24894L, 24897L,
    24906L, 24910L, 24913L, 24917L, 24922L, 24924L, 24930L, 24934L,
    24936L, 24938L, 24941L, 24944L, 24944L, 24944L, 24944L, 24944L,
    24946L, 24947L, 24947L, 24947L, 24947L, 24947L, 24947L, 24948L,
    24948L, 24948L, 24949L, 24951L, 24952L, 24952L, 24952L, 24952L,
    24952L, 24954L, 24954L, 24954L, 24954L, 24955L)), .Names = c("n",
"freq", "kum"), row.names = c(NA, -97L), class = "data.frame")


> that looks like as if it where poisson distributed (actually I would
> appreciate that) but it isnt


plot(skn300.tab)

My guess, we are looking at the pdf and cdf (maybe even of a Poisson
process), but not at any "data" that lends itself to a (generalized) linear
model. Consult a statistician, post on stackexchange, read about
regression, or better define your actual R problem here, demonstrating this
is not homework - see the posting guide.

Cheers




> because  mean unequals var.
>
>
> > mean (x)
> [1] 901.7827
> > var (x)
> [1] 132439.3
>
>
> Anyway, I tried to model it via poisson and quasipoisson. Actually, just to
> get an impression how glm works. But I dont know how to interprete the
> data. Of course this is the case because my knowledge concerning logistic
> regressions is rather limited. Hoping there is somebody with mercy I would
> like to understand which parameters are important, e.g. which paramter
> might give me a hint that a poisson model is a bad idea. For hints
> concerning some tutorials  about reading glm-output I would appreciate as
> well.
>
> Thanks
> Wim
>
>
> > skn300.glmp <- glm (freq~n, data=skn300.tab, family=poisson)
> > summary (skn300.glmp)
>
> Call:
> glm(formula = freq ~ n, family = poisson, data = skn300.tab)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -51.332   -9.383   -6.599   -3.959   55.111
>
> Coefficients:
>               Estimate Std. Error z value Pr(>|z|)
> (Intercept)  7.2374375  0.0093285   775.8   <2e-16 ***
> n           -0.0539424  0.0003699  -145.8   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for poisson family taken to be 1)
>
>     Null deviance: 71731  on 96  degrees of freedom
> Residual deviance: 37383  on 95  degrees of freedom
> AIC: 37800
>
> Number of Fisher Scoring iterations: 6
>
> >
> > skn300.glmq <- glm (freq~n, data=skn300.tab, family=quasipoisson)
> > summary (skn300.glmq)
>
> Call:
> glm(formula = freq ~ n, family = quasipoisson, data = skn300.tab)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -51.332   -9.383   -6.599   -3.959   55.111
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  7.237438   0.186381  38.831  < 2e-16 ***
> n           -0.053942   0.007391  -7.298  8.8e-11 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for quasipoisson family taken to be 399.1874)
>
>     Null deviance: 71731  on 96  degrees of freedom
> Residual deviance: 37383  on 95  degrees of freedom
> AIC: NA
>
> Number of Fisher Scoring iterations: 6
>
>
> >  dput (skn300.tab)
> structure(list(n = 1:97, freq = c(0L, 0L, 0L, 0L, 1L, 7L, 40L,
> 100L, 276L, 543L, 952L, 1414L, 1853L, 2199L, 2435L, 2270L, 2042L,
> 1679L, 1386L, 1108L, 922L, 792L, 642L, 597L, 453L, 424L, 370L,
> 297L, 278L, 218L, 208L, 172L, 174L, 149L, 124L, 98L, 98L, 67L,
> 78L, 67L, 46L, 34L, 31L, 42L, 34L, 21L, 28L, 18L, 18L, 18L, 10L,
> 19L, 6L, 9L, 10L, 6L, 6L, 5L, 3L, 9L, 4L, 3L, 4L, 5L, 2L, 6L,
> 4L, 2L, 2L, 3L, 3L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L,
> 1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 1L),
>     kum = c(0L, 0L, 0L, 0L, 1L, 8L, 48L, 148L, 424L, 967L, 1919L,
>     3333L, 5186L, 7385L, 9820L, 12090L, 14132L, 15811L, 17197L,
>     18305L, 19227L, 20019L, 20661L, 21258L, 21711L, 22135L, 22505L,
>     22802L, 23080L, 23298L, 23506L, 23678L, 23852L, 24001L, 24125L,
>     24223L, 24321L, 24388L, 24466L, 24533L, 24579L, 24613L, 24644L,
>     24686L, 24720L, 24741L, 24769L, 24787L, 24805L, 24823L, 24833L,
>     24852L, 24858L, 24867L, 24877L, 24883L, 24889L, 24894L, 24897L,
>     24906L, 24910L, 24913L, 24917L, 24922L, 24924L, 24930L, 24934L,
>     24936L, 24938L, 24941L, 24944L, 24944L, 24944L, 24944L, 24944L,
>     24946L, 24947L, 24947L, 24947L, 24947L, 24947L, 24947L, 24948L,
>     24948L, 24948L, 24949L, 24951L, 24952L, 24952L, 24952L, 24952L,
>     24952L, 24954L, 24954L, 24954L, 24954L, 24955L)), .Names = c("n",
> "freq", "kum"), row.names = c(NA, -97L), class = "data.frame")
>
>         [[alternative HTML version deleted]]
>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.