|
Hi,
I have a data set with 999 observations, for each of them I have data on four variables: site, colony, gender (quite a few NA values), and cohort. This is how the data set looks like: > str(dispersal) 'data.frame': 999 obs. of 4 variables: $ site : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 2 2 ... $ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1 2 2 NA ... $ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2 2 2 2 2 2 ... $ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8 8 8 8 8 8 8 6 ... Now, I want to estimate if sites 1 and site 2 differ on some of the other variables. For instance there are relatively more males in site 1 with respect to site 2, more individuals of the main colony in site 2 with respect to site 1 and this sort of things. I thought I might do a binomial GLM considering as response variable the site, I tried to run the more general model to have a look to overdispersion but I believe there is something wrong even before worrying about overdispersion. I know (I did a chisq.test) that cohort2004 is very diversly represented between the two sites but it is not reflected in the results of GLM. Here there are the results of chisq.test and the GLM: 1) > age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1))) > dimnames(age_cohort)<-list(site=c("M","D"), + cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996)) > age_cohort cohorts site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 M 142 95 46 33 14 59 18 12 7 1 0 D 258 144 54 70 20 11 6 8 2 3 1 > (Xsqagec <- chisq.test(age_cohort)) # Prints test summary Pearson's Chi-squared test data: age_cohort X-squared = 82.6016, df = 10, p-value = 1.549e-13 Mensajes de aviso perdidos In chisq.test(age_cohort) : Chi-squared approximation may be incorrect > Xsqagec$observed # observed counts cohorts site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 M 142 95 46 33 14 59 18 12 7 1 0 D 258 144 54 70 20 11 6 8 2 3 1 > Xsqagec$expected # expected counts under the null cohorts site 2010 2009 2008 2007 2006 2004 2003 M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717 D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283 cohorts site 2002 2001 2000 1996 M 8.505976 3.827689 1.701195 0.4252988 D 11.494024 5.172311 2.298805 0.5747012 > Xsqagec$residuals # Pearson residuals cohorts site 2010 2009 2008 2007 2006 M -2.1559111 -0.6592367 0.5321050 -1.6326395 -0.1210101 D 1.8546283 0.5671101 -0.4577448 1.4044825 0.1040993 cohorts site 2004 2003 2002 2001 2000 M 5.3569686 2.4391720 1.1980192 1.6214643 -0.5376032 D -4.6083465 -2.0983042 -1.0305993 -1.3948690 0.4624746 cohorts site 1996 M -0.6521494 D 0.5610132 > Xsqagec$stdres # standardized residuals cohorts site 2010 2009 2008 2007 2006 M -3.6665549 -0.9962228 0.7397057 -2.2733888 -0.1623984 D 3.6665549 0.9962228 -0.7397057 2.2733888 0.1623984 cohorts site 2004 2003 2002 2001 2000 M 7.3264142 3.2566808 1.5962909 2.1485311 -0.7105713 D -7.3264142 -3.2566808 -1.5962909 -2.1485311 0.7105713 cohorts site 1996 M -0.8606814 D 0.8606814 2) > model1<-glm(site~gender*colony*cohort,binomial) > summary(model1) Call: glm(formula = site ~ gender * colony * cohort, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.84648 -0.96954 -0.00036 1.11269 2.03933 Coefficients: (12 not defined because of singularities) Estimate Std. Error z value (Intercept) -1.657e+01 2.400e+03 -0.007 gender1 -2.231e-01 9.220e-01 -0.242 colonyother 9.531e-02 8.006e-01 0.119 cohort2002 1.717e-08 3.393e+03 0.000 cohort2003 1.766e+01 2.400e+03 0.007 cohort2004 1.807e+01 2.400e+03 0.008 cohort2006 1.697e+01 2.400e+03 0.007 cohort2007 1.726e+01 2.400e+03 0.007 cohort2008 1.606e+01 2.400e+03 0.007 cohort2009 1.657e+01 2.400e+03 0.007 cohort2010 1.587e+01 2.400e+03 0.007 gender1:colonyother 9.163e-01 1.087e+00 0.843 gender1:cohort2002 1.719e+01 2.400e+03 0.007 gender1:cohort2003 -1.823e-01 1.713e+00 -0.106 gender1:cohort2004 2.231e-01 1.329e+00 0.168 gender1:cohort2006 -5.878e-01 1.586e+00 -0.371 gender1:cohort2007 -6.454e-02 1.784e+00 -0.036 gender1:cohort2008 8.881e-01 1.156e+00 0.768 gender1:cohort2009 -2.817e-02 1.199e+00 -0.023 gender1:cohort2010 NA NA NA colonyother:cohort2002 NA NA NA colonyother:cohort2003 NA NA NA colonyother:cohort2004 1.497e+01 1.697e+03 0.009 colonyother:cohort2006 -1.707e+01 2.400e+03 -0.007 colonyother:cohort2007 -7.885e-01 1.772e+00 -0.445 colonyother:cohort2008 NA NA NA colonyother:cohort2009 NA NA NA colonyother:cohort2010 NA NA NA gender1:colonyother:cohort2002 NA NA NA gender1:colonyother:cohort2003 NA NA NA gender1:colonyother:cohort2004 -9.163e-01 2.400e+03 0.000 gender1:colonyother:cohort2006 NA NA NA gender1:colonyother:cohort2007 -2.575e+00 2.379e+00 -1.082 gender1:colonyother:cohort2008 NA NA NA gender1:colonyother:cohort2009 NA NA NA gender1:colonyother:cohort2010 NA NA NA Pr(>|z|) (Intercept) 0.994 gender1 0.809 colonyother 0.905 cohort2002 1.000 cohort2003 0.994 cohort2004 0.994 cohort2006 0.994 cohort2007 0.994 cohort2008 0.995 cohort2009 0.994 cohort2010 0.995 gender1:colonyother 0.399 gender1:cohort2002 0.994 gender1:cohort2003 0.915 gender1:cohort2004 0.867 gender1:cohort2006 0.711 gender1:cohort2007 0.971 gender1:cohort2008 0.442 gender1:cohort2009 0.981 gender1:cohort2010 NA colonyother:cohort2002 NA colonyother:cohort2003 NA colonyother:cohort2004 0.993 colonyother:cohort2006 0.994 colonyother:cohort2007 0.656 colonyother:cohort2008 NA colonyother:cohort2009 NA colonyother:cohort2010 NA gender1:colonyother:cohort2002 NA gender1:colonyother:cohort2003 NA gender1:colonyother:cohort2004 1.000 gender1:colonyother:cohort2006 NA gender1:colonyother:cohort2007 0.279 gender1:colonyother:cohort2008 NA gender1:colonyother:cohort2009 NA gender1:colonyother:cohort2010 NA (Dispersion parameter for binomial family taken to be 1) Null deviance: 311.91 on 224 degrees of freedom Residual deviance: 271.61 on 201 degrees of freedom (774 observations deleted due to missingness) AIC: 319.61 Number of Fisher Scoring iterations: 15 I thought that perhaps keeping the gender as explanatory variable was reducing a lot the sample size and it was the matter (I removed it): > model1<-glm(site~colony*cohort,binomial) > summary(model1) Call: glm(formula = site ~ colony * cohort, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.8683 -0.9712 -0.8757 1.3468 1.7470 Coefficients: (6 not defined because of singularities) Estimate Std. Error z value Pr(>|z|) (Intercept) -15.5661 1455.3976 -0.011 0.991 colonyother 0.2544 0.2167 1.174 0.240 cohort2000 14.4675 1455.3981 0.010 0.992 cohort2001 16.8188 1455.3978 0.012 0.991 cohort2002 15.9715 1455.3977 0.011 0.991 cohort2003 16.6647 1455.3977 0.011 0.991 cohort2004 17.1194 1455.3976 0.012 0.991 cohort2006 15.2607 1455.3976 0.010 0.992 cohort2007 14.9470 1455.3976 0.010 0.992 cohort2008 15.3837 1455.3976 0.011 0.992 cohort2009 15.1762 1455.3976 0.010 0.992 cohort2010 14.8053 1455.3976 0.010 0.992 colonyother:cohort2000 NA NA NA NA colonyother:cohort2001 NA NA NA NA colonyother:cohort2002 NA NA NA NA colonyother:cohort2003 NA NA NA NA colonyother:cohort2004 13.7583 550.0887 0.025 0.980 colonyother:cohort2006 -15.5151 1455.3976 -0.011 0.991 colonyother:cohort2007 -0.9163 0.5979 -1.533 0.125 colonyother:cohort2008 NA NA NA NA colonyother:cohort2009 -0.6912 0.5214 -1.326 0.185 colonyother:cohort2010 NA NA NA NA (Dispersion parameter for binomial family taken to be 1) Null deviance: 1361.4 on 998 degrees of freedom Residual deviance: 1267.9 on 983 degrees of freedom AIC: 1299.9 Number of Fisher Scoring iterations: 14 Any comment/suggestion on this? Thanks for any help |
|
Without going deeply into your analysis, 2 comments:
1) Use the anova command to test two nested models using: anova(model1, model2, test="Chisq") 2) glm's are non-trivial models (at least to me), be sure to google for some tutorials in order to understand what you are looking at... Cheers, Tal ----------------Contact Details:------------------------------------------------------- Contact me: [hidden email] | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) ---------------------------------------------------------------------------------------------- On Fri, May 4, 2012 at 6:38 PM, lincoln <[hidden email]> wrote: > Hi, > > I have a data set with 999 observations, for each of them I have data on > four variables: > site, colony, gender (quite a few NA values), and cohort. > > This is how the data set looks like: > > str(dispersal) > 'data.frame': 999 obs. of 4 variables: > $ site : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 2 2 ... > $ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1 2 2 NA ... > $ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2 2 2 2 2 2 ... > $ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8 8 8 8 8 8 8 6 ... > > Now, I want to estimate if sites 1 and site 2 differ on some of the other > variables. For instance there are relatively more males in site 1 with > respect to site 2, more individuals of the main colony in site 2 with > respect to site 1 and this sort of things. > > I thought I might do a binomial GLM considering as response variable the > site, I tried to run the more general model to have a look to > overdispersion > but I believe there is something wrong even before worrying about > overdispersion. I know (I did a chisq.test) that cohort2004 is very > diversly > represented between the two sites but it is not reflected in the results of > GLM. Here there are the results of chisq.test and the GLM: > > 1) > /> > > age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1))) > > dimnames(age_cohort)<-list(site=c("M","D"), > + > cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996)) > > age_cohort > cohorts > site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 > M 142 95 46 33 14 59 18 12 7 1 0 > D 258 144 54 70 20 11 6 8 2 3 1 > > (Xsqagec <- chisq.test(age_cohort)) # Prints test summary > > Pearson's Chi-squared test > > data: age_cohort > X-squared = 82.6016, df = 10, p-value = 1.549e-13 > > Mensajes de aviso perdidos > In chisq.test(age_cohort) : Chi-squared approximation may be incorrect > > Xsqagec$observed # observed counts > cohorts > site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 > M 142 95 46 33 14 59 18 12 7 1 0 > D 258 144 54 70 20 11 6 8 2 3 1 > > Xsqagec$expected # expected counts under the null > cohorts > site 2010 2009 2008 2007 2006 2004 2003 > M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717 > D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283 > cohorts > site 2002 2001 2000 1996 > M 8.505976 3.827689 1.701195 0.4252988 > D 11.494024 5.172311 2.298805 0.5747012 > > Xsqagec$residuals # Pearson residuals > cohorts > site 2010 2009 2008 2007 2006 > M -2.1559111 -0.6592367 0.5321050 -1.6326395 -0.1210101 > D 1.8546283 0.5671101 -0.4577448 1.4044825 0.1040993 > cohorts > site 2004 2003 2002 2001 2000 > M 5.3569686 2.4391720 1.1980192 1.6214643 -0.5376032 > D -4.6083465 -2.0983042 -1.0305993 -1.3948690 0.4624746 > cohorts > site 1996 > M -0.6521494 > D 0.5610132 > > Xsqagec$stdres # standardized residuals > cohorts > site 2010 2009 2008 2007 2006 > M -3.6665549 -0.9962228 0.7397057 -2.2733888 -0.1623984 > D 3.6665549 0.9962228 -0.7397057 2.2733888 0.1623984 > cohorts > site 2004 2003 2002 2001 2000 > M 7.3264142 3.2566808 1.5962909 2.1485311 -0.7105713 > D -7.3264142 -3.2566808 -1.5962909 -2.1485311 0.7105713 > cohorts > site 1996 > M -0.8606814 > D 0.8606814 > / > > > 2) > /> model1<-glm(site~gender*colony*cohort,binomial) > > summary(model1) > > Call: > glm(formula = site ~ gender * colony * cohort, family = binomial) > > Deviance Residuals: > Min 1Q Median 3Q Max > -1.84648 -0.96954 -0.00036 1.11269 2.03933 > > Coefficients: (12 not defined because of singularities) > Estimate Std. Error z value > (Intercept) -1.657e+01 2.400e+03 -0.007 > gender1 -2.231e-01 9.220e-01 -0.242 > colonyother 9.531e-02 8.006e-01 0.119 > cohort2002 1.717e-08 3.393e+03 0.000 > cohort2003 1.766e+01 2.400e+03 0.007 > cohort2004 1.807e+01 2.400e+03 0.008 > cohort2006 1.697e+01 2.400e+03 0.007 > cohort2007 1.726e+01 2.400e+03 0.007 > cohort2008 1.606e+01 2.400e+03 0.007 > cohort2009 1.657e+01 2.400e+03 0.007 > cohort2010 1.587e+01 2.400e+03 0.007 > gender1:colonyother 9.163e-01 1.087e+00 0.843 > gender1:cohort2002 1.719e+01 2.400e+03 0.007 > gender1:cohort2003 -1.823e-01 1.713e+00 -0.106 > gender1:cohort2004 2.231e-01 1.329e+00 0.168 > gender1:cohort2006 -5.878e-01 1.586e+00 -0.371 > gender1:cohort2007 -6.454e-02 1.784e+00 -0.036 > gender1:cohort2008 8.881e-01 1.156e+00 0.768 > gender1:cohort2009 -2.817e-02 1.199e+00 -0.023 > gender1:cohort2010 NA NA NA > colonyother:cohort2002 NA NA NA > colonyother:cohort2003 NA NA NA > colonyother:cohort2004 1.497e+01 1.697e+03 0.009 > colonyother:cohort2006 -1.707e+01 2.400e+03 -0.007 > colonyother:cohort2007 -7.885e-01 1.772e+00 -0.445 > colonyother:cohort2008 NA NA NA > colonyother:cohort2009 NA NA NA > colonyother:cohort2010 NA NA NA > gender1:colonyother:cohort2002 NA NA NA > gender1:colonyother:cohort2003 NA NA NA > gender1:colonyother:cohort2004 -9.163e-01 2.400e+03 0.000 > gender1:colonyother:cohort2006 NA NA NA > gender1:colonyother:cohort2007 -2.575e+00 2.379e+00 -1.082 > gender1:colonyother:cohort2008 NA NA NA > gender1:colonyother:cohort2009 NA NA NA > gender1:colonyother:cohort2010 NA NA NA > Pr(>|z|) > (Intercept) 0.994 > gender1 0.809 > colonyother 0.905 > cohort2002 1.000 > cohort2003 0.994 > cohort2004 0.994 > cohort2006 0.994 > cohort2007 0.994 > cohort2008 0.995 > cohort2009 0.994 > cohort2010 0.995 > gender1:colonyother 0.399 > gender1:cohort2002 0.994 > gender1:cohort2003 0.915 > gender1:cohort2004 0.867 > gender1:cohort2006 0.711 > gender1:cohort2007 0.971 > gender1:cohort2008 0.442 > gender1:cohort2009 0.981 > gender1:cohort2010 NA > colonyother:cohort2002 NA > colonyother:cohort2003 NA > colonyother:cohort2004 0.993 > colonyother:cohort2006 0.994 > colonyother:cohort2007 0.656 > colonyother:cohort2008 NA > colonyother:cohort2009 NA > colonyother:cohort2010 NA > gender1:colonyother:cohort2002 NA > gender1:colonyother:cohort2003 NA > gender1:colonyother:cohort2004 1.000 > gender1:colonyother:cohort2006 NA > gender1:colonyother:cohort2007 0.279 > gender1:colonyother:cohort2008 NA > gender1:colonyother:cohort2009 NA > gender1:colonyother:cohort2010 NA > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 311.91 on 224 degrees of freedom > Residual deviance: 271.61 on 201 degrees of freedom > (774 observations deleted due to missingness) > AIC: 319.61 > > Number of Fisher Scoring iterations: 15 > / > > I thought that perhaps keeping the gender as explanatory variable was > reducing a lot the sample size and it was the matter (I removed it): > > > > /> model1<-glm(site~colony*cohort,binomial) > > summary(model1) > > Call: > glm(formula = site ~ colony * cohort, family = binomial) > > Deviance Residuals: > Min 1Q Median 3Q Max > -1.8683 -0.9712 -0.8757 1.3468 1.7470 > > Coefficients: (6 not defined because of singularities) > Estimate Std. Error z value Pr(>|z|) > (Intercept) -15.5661 1455.3976 -0.011 0.991 > colonyother 0.2544 0.2167 1.174 0.240 > cohort2000 14.4675 1455.3981 0.010 0.992 > cohort2001 16.8188 1455.3978 0.012 0.991 > cohort2002 15.9715 1455.3977 0.011 0.991 > cohort2003 16.6647 1455.3977 0.011 0.991 > cohort2004 17.1194 1455.3976 0.012 0.991 > cohort2006 15.2607 1455.3976 0.010 0.992 > cohort2007 14.9470 1455.3976 0.010 0.992 > cohort2008 15.3837 1455.3976 0.011 0.992 > cohort2009 15.1762 1455.3976 0.010 0.992 > cohort2010 14.8053 1455.3976 0.010 0.992 > colonyother:cohort2000 NA NA NA NA > colonyother:cohort2001 NA NA NA NA > colonyother:cohort2002 NA NA NA NA > colonyother:cohort2003 NA NA NA NA > colonyother:cohort2004 13.7583 550.0887 0.025 0.980 > colonyother:cohort2006 -15.5151 1455.3976 -0.011 0.991 > colonyother:cohort2007 -0.9163 0.5979 -1.533 0.125 > colonyother:cohort2008 NA NA NA NA > colonyother:cohort2009 -0.6912 0.5214 -1.326 0.185 > colonyother:cohort2010 NA NA NA NA > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 1361.4 on 998 degrees of freedom > Residual deviance: 1267.9 on 983 degrees of freedom > AIC: 1299.9 > > Number of Fisher Scoring iterations: 14 > / > > Any comment/suggestion on this? > Thanks for any help > > -- > View this message in context: > http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
Thanks Tal for answering,
Anyway I still have no idea on why the binomial GLM is missing the relationship between the response variable and the explanatory variable "cohort". Is there anyone who might help me to understand this? |
|
Hi Lincoln,
Some thoughts: 1) Did you intend to use "cohort" as a factor and not as a numeric? (at least that is what it looks like in your output) 2) Is there a strong correlation between "cohort" and the other explanatory variables you are trying in your model? ----------------Contact Details:------------------------------------------------------- Contact me: [hidden email] | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) ---------------------------------------------------------------------------------------------- On Mon, May 7, 2012 at 10:28 AM, lincoln <[hidden email]> wrote: > Thanks Tal for answering, > > Anyway I still have no idea on why the binomial GLM is missing the > relationship between the response variable and the explanatory variable > "cohort". > > Is there anyone who might help me to understand this? > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941p4614184.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
R has functions for computing kappa, fleiss's kappa, etc., but can it compute Gwet's AC1?
Thanks, Matt. [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
In reply to this post by Tal Galili
Hi Tal,
Thanks for replying. (1) I am going to use cohort as a factor and (2) no, there are no strong correlation between "cohort" and the other predictors. I am using a binomial GLM and the lack of significance of "cohort" seems it was due to one of the 11 levels (the base level) of this factor to be very scarcely represented, in other words there was just one individual of cohort 1996. I did not realize it before someone made me point it out. |
| Powered by Nabble | Edit this page |
