# Binomial GLM, chisq.test, or?

6 messages
Open this post in threaded view
|
Report Content as Inappropriate

## Binomial GLM, chisq.test, or?

 Hi, I have a data set with 999 observations, for each of them I have data on four variables: site, colony, gender (quite a few NA values), and cohort. This is how the data set looks like: > str(dispersal) 'data.frame': 999 obs. of  4 variables:  \$ site  : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 2 2 ...  \$ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1 2 2 NA ...  \$ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2 2 2 2 2 2 ...  \$ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8 8 8 8 8 8 8 6 ... Now, I want to estimate if sites 1 and site 2 differ on some of the other variables. For instance there are relatively more males in site 1 with respect to site 2, more individuals of the main colony in site 2 with respect to site 1 and this sort of things. I thought I might do a binomial GLM considering as response variable the site, I tried to run the more general model to have a look to overdispersion but I believe there is something wrong even before worrying about overdispersion. I know (I did a chisq.test) that cohort2004 is very diversly represented between the two sites but it is not reflected in the results of GLM. Here there are the results of chisq.test and the GLM: 1) > age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1))) > dimnames(age_cohort)<-list(site=c("M","D"), +                        cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996)) > age_cohort     cohorts site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996    M  142   95   46   33   14   59   18   12    7    1    0    D  258  144   54   70   20   11    6    8    2    3    1 > (Xsqagec <- chisq.test(age_cohort))  # Prints test summary         Pearson's Chi-squared test data:  age_cohort X-squared = 82.6016, df = 10, p-value = 1.549e-13 Mensajes de aviso perdidos In chisq.test(age_cohort) : Chi-squared approximation may be incorrect > Xsqagec\$observed   # observed counts     cohorts site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996    M  142   95   46   33   14   59   18   12    7    1    0    D  258  144   54   70   20   11    6    8    2    3    1 > Xsqagec\$expected   # expected counts under the null     cohorts site     2010     2009     2008     2007     2006     2004     2003    M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717    D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283     cohorts site      2002     2001     2000      1996    M  8.505976 3.827689 1.701195 0.4252988    D 11.494024 5.172311 2.298805 0.5747012 > Xsqagec\$residuals  # Pearson residuals     cohorts site       2010       2009       2008       2007       2006    M -2.1559111 -0.6592367  0.5321050 -1.6326395 -0.1210101    D  1.8546283  0.5671101 -0.4577448  1.4044825  0.1040993     cohorts site       2004       2003       2002       2001       2000    M  5.3569686  2.4391720  1.1980192  1.6214643 -0.5376032    D -4.6083465 -2.0983042 -1.0305993 -1.3948690  0.4624746     cohorts site       1996    M -0.6521494    D  0.5610132 > Xsqagec\$stdres     # standardized residuals     cohorts site       2010       2009       2008       2007       2006    M -3.6665549 -0.9962228  0.7397057 -2.2733888 -0.1623984    D  3.6665549  0.9962228 -0.7397057  2.2733888  0.1623984     cohorts site       2004       2003       2002       2001       2000    M  7.3264142  3.2566808  1.5962909  2.1485311 -0.7105713    D -7.3264142 -3.2566808 -1.5962909 -2.1485311  0.7105713     cohorts site       1996    M -0.8606814    D  0.8606814 2) > model1<-glm(site~gender*colony*cohort,binomial) > summary(model1) Call: glm(formula = site ~ gender * colony * cohort, family = binomial) Deviance Residuals:      Min        1Q    Median        3Q       Max   -1.84648  -0.96954  -0.00036   1.11269   2.03933   Coefficients: (12 not defined because of singularities)                                  Estimate Std. Error z value (Intercept)                    -1.657e+01  2.400e+03  -0.007 gender1                        -2.231e-01  9.220e-01  -0.242 colonyother                     9.531e-02  8.006e-01   0.119 cohort2002                      1.717e-08  3.393e+03   0.000 cohort2003                      1.766e+01  2.400e+03   0.007 cohort2004                      1.807e+01  2.400e+03   0.008 cohort2006                      1.697e+01  2.400e+03   0.007 cohort2007                      1.726e+01  2.400e+03   0.007 cohort2008                      1.606e+01  2.400e+03   0.007 cohort2009                      1.657e+01  2.400e+03   0.007 cohort2010                      1.587e+01  2.400e+03   0.007 gender1:colonyother             9.163e-01  1.087e+00   0.843 gender1:cohort2002              1.719e+01  2.400e+03   0.007 gender1:cohort2003             -1.823e-01  1.713e+00  -0.106 gender1:cohort2004              2.231e-01  1.329e+00   0.168 gender1:cohort2006             -5.878e-01  1.586e+00  -0.371 gender1:cohort2007             -6.454e-02  1.784e+00  -0.036 gender1:cohort2008              8.881e-01  1.156e+00   0.768 gender1:cohort2009             -2.817e-02  1.199e+00  -0.023 gender1:cohort2010                     NA         NA      NA colonyother:cohort2002                 NA         NA      NA colonyother:cohort2003                 NA         NA      NA colonyother:cohort2004          1.497e+01  1.697e+03   0.009 colonyother:cohort2006         -1.707e+01  2.400e+03  -0.007 colonyother:cohort2007         -7.885e-01  1.772e+00  -0.445 colonyother:cohort2008                 NA         NA      NA colonyother:cohort2009                 NA         NA      NA colonyother:cohort2010                 NA         NA      NA gender1:colonyother:cohort2002         NA         NA      NA gender1:colonyother:cohort2003         NA         NA      NA gender1:colonyother:cohort2004 -9.163e-01  2.400e+03   0.000 gender1:colonyother:cohort2006         NA         NA      NA gender1:colonyother:cohort2007 -2.575e+00  2.379e+00  -1.082 gender1:colonyother:cohort2008         NA         NA      NA gender1:colonyother:cohort2009         NA         NA      NA gender1:colonyother:cohort2010         NA         NA      NA                                Pr(>|z|) (Intercept)                       0.994 gender1                           0.809 colonyother                       0.905 cohort2002                        1.000 cohort2003                        0.994 cohort2004                        0.994 cohort2006                        0.994 cohort2007                        0.994 cohort2008                        0.995 cohort2009                        0.994 cohort2010                        0.995 gender1:colonyother               0.399 gender1:cohort2002                0.994 gender1:cohort2003                0.915 gender1:cohort2004                0.867 gender1:cohort2006                0.711 gender1:cohort2007                0.971 gender1:cohort2008                0.442 gender1:cohort2009                0.981 gender1:cohort2010                   NA colonyother:cohort2002               NA colonyother:cohort2003               NA colonyother:cohort2004            0.993 colonyother:cohort2006            0.994 colonyother:cohort2007            0.656 colonyother:cohort2008               NA colonyother:cohort2009               NA colonyother:cohort2010               NA gender1:colonyother:cohort2002       NA gender1:colonyother:cohort2003       NA gender1:colonyother:cohort2004    1.000 gender1:colonyother:cohort2006       NA gender1:colonyother:cohort2007    0.279 gender1:colonyother:cohort2008       NA gender1:colonyother:cohort2009       NA gender1:colonyother:cohort2010       NA (Dispersion parameter for binomial family taken to be 1)     Null deviance: 311.91  on 224  degrees of freedom Residual deviance: 271.61  on 201  degrees of freedom   (774 observations deleted due to missingness) AIC: 319.61 Number of Fisher Scoring iterations: 15 I thought that perhaps keeping the gender as explanatory variable was reducing a lot the sample size and it was the matter (I removed it): > model1<-glm(site~colony*cohort,binomial) > summary(model1) Call: glm(formula = site ~ colony * cohort, family = binomial) Deviance Residuals:     Min       1Q   Median       3Q      Max   -1.8683  -0.9712  -0.8757   1.3468   1.7470   Coefficients: (6 not defined because of singularities)                         Estimate Std. Error z value Pr(>|z|) (Intercept)             -15.5661  1455.3976  -0.011    0.991 colonyother               0.2544     0.2167   1.174    0.240 cohort2000               14.4675  1455.3981   0.010    0.992 cohort2001               16.8188  1455.3978   0.012    0.991 cohort2002               15.9715  1455.3977   0.011    0.991 cohort2003               16.6647  1455.3977   0.011    0.991 cohort2004               17.1194  1455.3976   0.012    0.991 cohort2006               15.2607  1455.3976   0.010    0.992 cohort2007               14.9470  1455.3976   0.010    0.992 cohort2008               15.3837  1455.3976   0.011    0.992 cohort2009               15.1762  1455.3976   0.010    0.992 cohort2010               14.8053  1455.3976   0.010    0.992 colonyother:cohort2000        NA         NA      NA       NA colonyother:cohort2001        NA         NA      NA       NA colonyother:cohort2002        NA         NA      NA       NA colonyother:cohort2003        NA         NA      NA       NA colonyother:cohort2004   13.7583   550.0887   0.025    0.980 colonyother:cohort2006  -15.5151  1455.3976  -0.011    0.991 colonyother:cohort2007   -0.9163     0.5979  -1.533    0.125 colonyother:cohort2008        NA         NA      NA       NA colonyother:cohort2009   -0.6912     0.5214  -1.326    0.185 colonyother:cohort2010        NA         NA      NA       NA (Dispersion parameter for binomial family taken to be 1)     Null deviance: 1361.4  on 998  degrees of freedom Residual deviance: 1267.9  on 983  degrees of freedom AIC: 1299.9 Number of Fisher Scoring iterations: 14   Any comment/suggestion on this? Thanks for any help
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Binomial GLM, chisq.test, or?

 Without going deeply into your analysis, 2 comments: 1) Use the anova command to test two nested models using: anova(model1, model2, test="Chisq") 2) glm's are non-trivial models (at least to me), be sure to google for some tutorials in order to understand what you are looking at... Cheers, Tal ----------------Contact Details:------------------------------------------------------- Contact me: [hidden email] |  972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) ---------------------------------------------------------------------------------------------- On Fri, May 4, 2012 at 6:38 PM, lincoln <[hidden email]> wrote: > Hi, > > I have a data set with 999 observations, for each of them I have data on > four variables: > site, colony, gender (quite a few NA values), and cohort. > > This is how the data set looks like: > > str(dispersal) > 'data.frame':   999 obs. of  4 variables: >  \$ site  : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 2 2 ... >  \$ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1 2 2 NA ... >  \$ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2 2 2 2 2 2 ... >  \$ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8 8 8 8 8 8 8 6 ... > > Now, I want to estimate if sites 1 and site 2 differ on some of the other > variables. For instance there are relatively more males in site 1 with > respect to site 2, more individuals of the main colony in site 2 with > respect to site 1 and this sort of things. > > I thought I might do a binomial GLM considering as response variable the > site, I tried to run the more general model to have a look to > overdispersion > but I believe there is something wrong even before worrying about > overdispersion. I know (I did a chisq.test) that cohort2004 is very > diversly > represented between the two sites but it is not reflected in the results of > GLM. Here there are the results of chisq.test and the GLM: > > 1) > /> > > age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1))) > > dimnames(age_cohort)<-list(site=c("M","D"), > + > cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996)) > > age_cohort >    cohorts > site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 >   M  142   95   46   33   14   59   18   12    7    1    0 >   D  258  144   54   70   20   11    6    8    2    3    1 > > (Xsqagec <- chisq.test(age_cohort))  # Prints test summary > >        Pearson's Chi-squared test > > data:  age_cohort > X-squared = 82.6016, df = 10, p-value = 1.549e-13 > > Mensajes de aviso perdidos > In chisq.test(age_cohort) : Chi-squared approximation may be incorrect > > Xsqagec\$observed   # observed counts >    cohorts > site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 >   M  142   95   46   33   14   59   18   12    7    1    0 >   D  258  144   54   70   20   11    6    8    2    3    1 > > Xsqagec\$expected   # expected counts under the null >    cohorts > site     2010     2009     2008     2007     2006     2004     2003 >   M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717 >   D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283 >    cohorts > site      2002     2001     2000      1996 >   M  8.505976 3.827689 1.701195 0.4252988 >   D 11.494024 5.172311 2.298805 0.5747012 > > Xsqagec\$residuals  # Pearson residuals >    cohorts > site       2010       2009       2008       2007       2006 >   M -2.1559111 -0.6592367  0.5321050 -1.6326395 -0.1210101 >   D  1.8546283  0.5671101 -0.4577448  1.4044825  0.1040993 >    cohorts > site       2004       2003       2002       2001       2000 >   M  5.3569686  2.4391720  1.1980192  1.6214643 -0.5376032 >   D -4.6083465 -2.0983042 -1.0305993 -1.3948690  0.4624746 >    cohorts > site       1996 >   M -0.6521494 >   D  0.5610132 > > Xsqagec\$stdres     # standardized residuals >    cohorts > site       2010       2009       2008       2007       2006 >   M -3.6665549 -0.9962228  0.7397057 -2.2733888 -0.1623984 >   D  3.6665549  0.9962228 -0.7397057  2.2733888  0.1623984 >    cohorts > site       2004       2003       2002       2001       2000 >   M  7.3264142  3.2566808  1.5962909  2.1485311 -0.7105713 >   D -7.3264142 -3.2566808 -1.5962909 -2.1485311  0.7105713 >    cohorts > site       1996 >   M -0.8606814 >   D  0.8606814 > / > > > 2) > /> model1<-glm(site~gender*colony*cohort,binomial) > > summary(model1) > > Call: > glm(formula = site ~ gender * colony * cohort, family = binomial) > > Deviance Residuals: >     Min        1Q    Median        3Q       Max > -1.84648  -0.96954  -0.00036   1.11269   2.03933 > > Coefficients: (12 not defined because of singularities) >                                 Estimate Std. Error z value > (Intercept)                    -1.657e+01  2.400e+03  -0.007 > gender1                        -2.231e-01  9.220e-01  -0.242 > colonyother                     9.531e-02  8.006e-01   0.119 > cohort2002                      1.717e-08  3.393e+03   0.000 > cohort2003                      1.766e+01  2.400e+03   0.007 > cohort2004                      1.807e+01  2.400e+03   0.008 > cohort2006                      1.697e+01  2.400e+03   0.007 > cohort2007                      1.726e+01  2.400e+03   0.007 > cohort2008                      1.606e+01  2.400e+03   0.007 > cohort2009                      1.657e+01  2.400e+03   0.007 > cohort2010                      1.587e+01  2.400e+03   0.007 > gender1:colonyother             9.163e-01  1.087e+00   0.843 > gender1:cohort2002              1.719e+01  2.400e+03   0.007 > gender1:cohort2003             -1.823e-01  1.713e+00  -0.106 > gender1:cohort2004              2.231e-01  1.329e+00   0.168 > gender1:cohort2006             -5.878e-01  1.586e+00  -0.371 > gender1:cohort2007             -6.454e-02  1.784e+00  -0.036 > gender1:cohort2008              8.881e-01  1.156e+00   0.768 > gender1:cohort2009             -2.817e-02  1.199e+00  -0.023 > gender1:cohort2010                     NA         NA      NA > colonyother:cohort2002                 NA         NA      NA > colonyother:cohort2003                 NA         NA      NA > colonyother:cohort2004          1.497e+01  1.697e+03   0.009 > colonyother:cohort2006         -1.707e+01  2.400e+03  -0.007 > colonyother:cohort2007         -7.885e-01  1.772e+00  -0.445 > colonyother:cohort2008                 NA         NA      NA > colonyother:cohort2009                 NA         NA      NA > colonyother:cohort2010                 NA         NA      NA > gender1:colonyother:cohort2002         NA         NA      NA > gender1:colonyother:cohort2003         NA         NA      NA > gender1:colonyother:cohort2004 -9.163e-01  2.400e+03   0.000 > gender1:colonyother:cohort2006         NA         NA      NA > gender1:colonyother:cohort2007 -2.575e+00  2.379e+00  -1.082 > gender1:colonyother:cohort2008         NA         NA      NA > gender1:colonyother:cohort2009         NA         NA      NA > gender1:colonyother:cohort2010         NA         NA      NA >                               Pr(>|z|) > (Intercept)                       0.994 > gender1                           0.809 > colonyother                       0.905 > cohort2002                        1.000 > cohort2003                        0.994 > cohort2004                        0.994 > cohort2006                        0.994 > cohort2007                        0.994 > cohort2008                        0.995 > cohort2009                        0.994 > cohort2010                        0.995 > gender1:colonyother               0.399 > gender1:cohort2002                0.994 > gender1:cohort2003                0.915 > gender1:cohort2004                0.867 > gender1:cohort2006                0.711 > gender1:cohort2007                0.971 > gender1:cohort2008                0.442 > gender1:cohort2009                0.981 > gender1:cohort2010                   NA > colonyother:cohort2002               NA > colonyother:cohort2003               NA > colonyother:cohort2004            0.993 > colonyother:cohort2006            0.994 > colonyother:cohort2007            0.656 > colonyother:cohort2008               NA > colonyother:cohort2009               NA > colonyother:cohort2010               NA > gender1:colonyother:cohort2002       NA > gender1:colonyother:cohort2003       NA > gender1:colonyother:cohort2004    1.000 > gender1:colonyother:cohort2006       NA > gender1:colonyother:cohort2007    0.279 > gender1:colonyother:cohort2008       NA > gender1:colonyother:cohort2009       NA > gender1:colonyother:cohort2010       NA > > (Dispersion parameter for binomial family taken to be 1) > >    Null deviance: 311.91  on 224  degrees of freedom > Residual deviance: 271.61  on 201  degrees of freedom >  (774 observations deleted due to missingness) > AIC: 319.61 > > Number of Fisher Scoring iterations: 15 > / > > I thought that perhaps keeping the gender as explanatory variable was > reducing a lot the sample size and it was the matter (I removed it): > > > > /> model1<-glm(site~colony*cohort,binomial) > > summary(model1) > > Call: > glm(formula = site ~ colony * cohort, family = binomial) > > Deviance Residuals: >    Min       1Q   Median       3Q      Max > -1.8683  -0.9712  -0.8757   1.3468   1.7470 > > Coefficients: (6 not defined because of singularities) >                        Estimate Std. Error z value Pr(>|z|) > (Intercept)             -15.5661  1455.3976  -0.011    0.991 > colonyother               0.2544     0.2167   1.174    0.240 > cohort2000               14.4675  1455.3981   0.010    0.992 > cohort2001               16.8188  1455.3978   0.012    0.991 > cohort2002               15.9715  1455.3977   0.011    0.991 > cohort2003               16.6647  1455.3977   0.011    0.991 > cohort2004               17.1194  1455.3976   0.012    0.991 > cohort2006               15.2607  1455.3976   0.010    0.992 > cohort2007               14.9470  1455.3976   0.010    0.992 > cohort2008               15.3837  1455.3976   0.011    0.992 > cohort2009               15.1762  1455.3976   0.010    0.992 > cohort2010               14.8053  1455.3976   0.010    0.992 > colonyother:cohort2000        NA         NA      NA       NA > colonyother:cohort2001        NA         NA      NA       NA > colonyother:cohort2002        NA         NA      NA       NA > colonyother:cohort2003        NA         NA      NA       NA > colonyother:cohort2004   13.7583   550.0887   0.025    0.980 > colonyother:cohort2006  -15.5151  1455.3976  -0.011    0.991 > colonyother:cohort2007   -0.9163     0.5979  -1.533    0.125 > colonyother:cohort2008        NA         NA      NA       NA > colonyother:cohort2009   -0.6912     0.5214  -1.326    0.185 > colonyother:cohort2010        NA         NA      NA       NA > > (Dispersion parameter for binomial family taken to be 1) > >    Null deviance: 1361.4  on 998  degrees of freedom > Residual deviance: 1267.9  on 983  degrees of freedom > AIC: 1299.9 > > Number of Fisher Scoring iterations: 14 >  / > > Any comment/suggestion on this? > Thanks for any help > > -- > View this message in context: > http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941.html> Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. >         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Binomial GLM, chisq.test, or?

 Thanks Tal for answering, Anyway I still have no idea on why the binomial GLM is missing the relationship between the response variable and the explanatory variable "cohort". Is there anyone who might help me to understand this?
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Binomial GLM, chisq.test, or?

 Hi Lincoln, Some thoughts: 1) Did you intend to use  "cohort" as a factor and not as a numeric? (at least that is what it looks like in your output) 2) Is there a strong correlation between "cohort" and the other explanatory variables you are trying in your model? ----------------Contact Details:------------------------------------------------------- Contact me: [hidden email] |  972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) ---------------------------------------------------------------------------------------------- On Mon, May 7, 2012 at 10:28 AM, lincoln <[hidden email]> wrote: > Thanks Tal for answering, > > Anyway I still have no idea on why the binomial GLM is missing the > relationship between the response variable and the explanatory variable > "cohort". > > Is there anyone who might help me to understand this? > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941p4614184.html> Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. >         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|
Report Content as Inappropriate

## Gwet's AC1

 R has functions for computing kappa, fleiss's kappa, etc., but can it compute Gwet's AC1? Thanks, Matt.         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.