|
Hi all,
I can't find the error in the binomial GLM I have done. I want to use that because there are more than one explanatory variables (all categorical) and a binary response variable. This is how my data set looks like: > str(data) 'data.frame': 1004 obs. of 5 variables: $ site : int 0 0 0 0 0 0 0 0 0 0 ... $ sex : Factor w/ 2 levels "0","1": NA NA NA NA 1 NA NA NA NA NA ... $ age : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ... $ cohort: Factor w/ 11 levels "1996","2000",..: 11 11 11 11 11 11 11 11 11 11 ... $ birth : Factor w/ 3 levels "5","6","7": 3 3 2 2 2 2 2 2 2 2 ... I know that, particularly for one level of variable "cohort" (2004 value), it should be a strong effect of variable "cohort" on variable "site" so I do a Chi square test that confirms the null hypothesis there is a difference in sites on the way "cohort" is distributed: > (chisq.test(data$site,data$cohort)) Pearson's Chi-squared test data: data$site and data$cohort X-squared = 82.6016, df = 10, p-value = 1.549e-13 Mensajes de aviso perdidos In chisq.test(data$site, data$cohort) : Chi-squared approximation may be incorrect After that, I have tried to use a binomial GLM with all the explanatory variables but I couldn't find any significance of any variable, neither cohort, and for this reason I tried to use only cohort as predictor and I get this: > BinomialGlm <- glm(site ~ cohort, data=data,binomial) > summary(BinomialGlm) Call: glm(formula = site ~ cohort, family = binomial, data = data) Deviance Residuals: Min 1Q Median 3Q Max -1.9239 -0.9365 -0.9365 1.3584 1.6651 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -12.57 324.74 -0.039 0.969 cohort2000 11.47 324.75 0.035 0.972 cohort2001 13.82 324.74 0.043 0.966 cohort2002 12.97 324.74 0.040 0.968 cohort2003 13.66 324.74 0.042 0.966 cohort2004 14.25 324.74 0.044 0.965 cohort2006 12.21 324.74 0.038 0.970 cohort2007 11.81 324.74 0.036 0.971 cohort2008 12.41 324.74 0.038 0.970 cohort2009 12.15 324.74 0.037 0.970 cohort2010 11.97 324.74 0.037 0.971 (Dispersion parameter for binomial family taken to be 1) Null deviance: 1369.3 on 1003 degrees of freedom Residual deviance: 1283.7 on 993 degrees of freedom AIC: 1305.7 Number of Fisher Scoring iterations: 11 I tired to use simple GLM (gaussian family) and I get results that are more logicals: > GaussGlm <- glm(site ~ cohort, data=data) > summary(GaussGlm) Call: glm(formula = site ~ cohort, data = data) Deviance Residuals: Min 1Q Median 3Q Max -0.8429 -0.3550 -0.3550 0.6025 0.7500 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.740e-14 4.762e-01 0.000 1.0000 cohort2000 2.500e-01 5.324e-01 0.470 0.6388 cohort2001 7.778e-01 5.020e-01 1.549 0.1216 cohort2002 6.000e-01 4.880e-01 1.230 0.2192 cohort2003 7.500e-01 4.861e-01 1.543 0.1231 cohort2004 8.429e-01 4.796e-01 1.757 0.0792 . cohort2006 4.118e-01 4.832e-01 0.852 0.3943 cohort2007 3.204e-01 4.785e-01 0.670 0.5033 cohort2008 4.600e-01 4.786e-01 0.961 0.3367 cohort2009 3.975e-01 4.772e-01 0.833 0.4051 cohort2010 3.550e-01 4.768e-01 0.745 0.4567 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for gaussian family taken to be 0.2267955) Null deviance: 245.40 on 1003 degrees of freedom Residual deviance: 225.21 on 993 degrees of freedom AIC: 1372.5 Number of Fisher Scoring iterations: 2 What is going on? Any suggestion/commentary? |
|
1. As this is a statistical, rather than an R issue, you would do
better posting on a statistical help site like stats.stackexchange.com (although some generous soul here may respond). 2. You would also probably do better consulting with a local statistical resource if available, as it is difficult to explain such issues remotely. Cheers, Bert On Mon, May 7, 2012 at 10:05 AM, lincoln <[hidden email]> wrote: > Hi all, > > I can't find the error in the binomial GLM I have done. I want to use that > because there are more than one explanatory variables (all categorical) and > a binary response variable. > This is how my data set looks like: >> str(data) > 'data.frame': 1004 obs. of 5 variables: > $ site : int 0 0 0 0 0 0 0 0 0 0 ... > $ sex : Factor w/ 2 levels "0","1": NA NA NA NA 1 NA NA NA NA NA ... > $ age : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ... > $ cohort: Factor w/ 11 levels "1996","2000",..: 11 11 11 11 11 11 11 11 11 > 11 ... > $ birth : Factor w/ 3 levels "5","6","7": 3 3 2 2 2 2 2 2 2 2 ... > > I know that, particularly for one level of variable "cohort" (2004 value), > it should be a strong effect of variable "cohort" on variable "site" so I do > a Chi square test that confirms the null hypothesis there is a difference in > sites on the way "cohort" is distributed: > >> (chisq.test(data$site,data$cohort)) > > Pearson's Chi-squared test > > data: data$site and data$cohort > X-squared = 82.6016, df = 10, *p-value = 1.549e-13* > > Mensajes de aviso perdidos > In chisq.test(data$site, data$cohort) : > Chi-squared approximation may be incorrect > > > > > After that, I have tried to use a binomial GLM with all the explanatory > variables but I couldn't find any significance of any variable, neither > cohort, and for this reason I tried to use only cohort as predictor and I > get this: > > >> BinomialGlm <- glm(site ~ cohort, data=data,binomial) >> summary(BinomialGlm) > > Call: > glm(formula = site ~ cohort, family = binomial, data = data) > > Deviance Residuals: > Min 1Q Median 3Q Max > -1.9239 -0.9365 -0.9365 1.3584 1.6651 > > Coefficients: > Estimate Std. Error z value Pr(>|z|) > (Intercept) -12.57 324.74 -0.039 0.969 > cohort2000 11.47 324.75 0.035 0.972 > cohort2001 13.82 324.74 0.043 0.966 > cohort2002 12.97 324.74 0.040 0.968 > cohort2003 13.66 324.74 0.042 0.966 > *cohort2004 14.25 324.74 0.044 0.965* > cohort2006 12.21 324.74 0.038 0.970 > cohort2007 11.81 324.74 0.036 0.971 > cohort2008 12.41 324.74 0.038 0.970 > cohort2009 12.15 324.74 0.037 0.970 > cohort2010 11.97 324.74 0.037 0.971 > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 1369.3 on 1003 degrees of freedom > Residual deviance: 1283.7 on 993 degrees of freedom > AIC: 1305.7 > > Number of Fisher Scoring iterations: 11 > > > > > I tired to use simple GLM (gaussian family) and I get results that are more > logicals: > >> GaussGlm <- glm(site ~ cohort, data=data) >> summary(GaussGlm) > > Call: > glm(formula = site ~ cohort, data = data) > > Deviance Residuals: > Min 1Q Median 3Q Max > -0.8429 -0.3550 -0.3550 0.6025 0.7500 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 5.740e-14 4.762e-01 0.000 1.0000 > cohort2000 2.500e-01 5.324e-01 0.470 0.6388 > cohort2001 7.778e-01 5.020e-01 1.549 0.1216 > cohort2002 6.000e-01 4.880e-01 1.230 0.2192 > cohort2003 7.500e-01 4.861e-01 1.543 0.1231 > *cohort2004 8.429e-01 4.796e-01 1.757 0.0792 .* > cohort2006 4.118e-01 4.832e-01 0.852 0.3943 > cohort2007 3.204e-01 4.785e-01 0.670 0.5033 > cohort2008 4.600e-01 4.786e-01 0.961 0.3367 > cohort2009 3.975e-01 4.772e-01 0.833 0.4051 > cohort2010 3.550e-01 4.768e-01 0.745 0.4567 > --- > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > (Dispersion parameter for gaussian family taken to be 0.2267955) > > Null deviance: 245.40 on 1003 degrees of freedom > Residual deviance: 225.21 on 993 degrees of freedom > AIC: 1372.5 > > Number of Fisher Scoring iterations: 2 > > > > What is going on? Any suggestion/commentary? > > -- > View this message in context: http://r.789695.n4.nabble.com/Can-t-find-the-error-in-a-Binomial-GLM-I-am-doing-please-help-tp4615340.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
Perhaps I haven't explained it that well as I would have liked to.
To me this was an R issue because I didn't understand why the binomial GLM is getting these results and I believed this was something due to the way I am implementing it in R, not to the binomial GLM itself. If I was wrong and this is something due to the binomial GLM itself and if there is no relationship with R, it would be nice if someone might warn me about that and, maybe, give me a track (there exist good souls in the world). I hope R forum is not uniquely for people with a very strong statistical background, because it would limit a lot the potential of this forum and of the software itself. Thanks for any help Cheers |
|
In reply to this post by Bert Gunter
On May 7, 2012, at 19:39 , Bert Gunter wrote: > 1. As this is a statistical, rather than an R issue, you would do > better posting on a statistical help site like stats.stackexchange.com > (although some generous soul here may respond). > > 2. You would also probably do better consulting with a local > statistical resource if available, as it is difficult to explain such > issues remotely. > > Cheers, > Bert Well, in this case it is pretty obvious that there are no positive outcomes in the 1996 cohort (log odds essentially -Inf); as this is the base level, the Wald tests are completely unreliable (Hauck-Donner effect). The LRT is 1369-1283=86 which is quite consistent with chisq.test(). -pd > > On Mon, May 7, 2012 at 10:05 AM, lincoln <[hidden email]> wrote: >> Hi all, >> >> I can't find the error in the binomial GLM I have done. I want to use that >> because there are more than one explanatory variables (all categorical) and >> a binary response variable. >> This is how my data set looks like: >>> str(data) >> 'data.frame': 1004 obs. of 5 variables: >> $ site : int 0 0 0 0 0 0 0 0 0 0 ... >> $ sex : Factor w/ 2 levels "0","1": NA NA NA NA 1 NA NA NA NA NA ... >> $ age : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ... >> $ cohort: Factor w/ 11 levels "1996","2000",..: 11 11 11 11 11 11 11 11 11 >> 11 ... >> $ birth : Factor w/ 3 levels "5","6","7": 3 3 2 2 2 2 2 2 2 2 ... >> >> I know that, particularly for one level of variable "cohort" (2004 value), >> it should be a strong effect of variable "cohort" on variable "site" so I do >> a Chi square test that confirms the null hypothesis there is a difference in >> sites on the way "cohort" is distributed: >> >>> (chisq.test(data$site,data$cohort)) >> >> Pearson's Chi-squared test >> >> data: data$site and data$cohort >> X-squared = 82.6016, df = 10, *p-value = 1.549e-13* >> >> Mensajes de aviso perdidos >> In chisq.test(data$site, data$cohort) : >> Chi-squared approximation may be incorrect >> >> >> >> >> After that, I have tried to use a binomial GLM with all the explanatory >> variables but I couldn't find any significance of any variable, neither >> cohort, and for this reason I tried to use only cohort as predictor and I >> get this: >> >> >>> BinomialGlm <- glm(site ~ cohort, data=data,binomial) >>> summary(BinomialGlm) >> >> Call: >> glm(formula = site ~ cohort, family = binomial, data = data) >> >> Deviance Residuals: >> Min 1Q Median 3Q Max >> -1.9239 -0.9365 -0.9365 1.3584 1.6651 >> >> Coefficients: >> Estimate Std. Error z value Pr(>|z|) >> (Intercept) -12.57 324.74 -0.039 0.969 >> cohort2000 11.47 324.75 0.035 0.972 >> cohort2001 13.82 324.74 0.043 0.966 >> cohort2002 12.97 324.74 0.040 0.968 >> cohort2003 13.66 324.74 0.042 0.966 >> *cohort2004 14.25 324.74 0.044 0.965* >> cohort2006 12.21 324.74 0.038 0.970 >> cohort2007 11.81 324.74 0.036 0.971 >> cohort2008 12.41 324.74 0.038 0.970 >> cohort2009 12.15 324.74 0.037 0.970 >> cohort2010 11.97 324.74 0.037 0.971 >> >> (Dispersion parameter for binomial family taken to be 1) >> >> Null deviance: 1369.3 on 1003 degrees of freedom >> Residual deviance: 1283.7 on 993 degrees of freedom >> AIC: 1305.7 >> >> Number of Fisher Scoring iterations: 11 >> >> >> >> >> I tired to use simple GLM (gaussian family) and I get results that are more >> logicals: >> >>> GaussGlm <- glm(site ~ cohort, data=data) >>> summary(GaussGlm) >> >> Call: >> glm(formula = site ~ cohort, data = data) >> >> Deviance Residuals: >> Min 1Q Median 3Q Max >> -0.8429 -0.3550 -0.3550 0.6025 0.7500 >> >> Coefficients: >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) 5.740e-14 4.762e-01 0.000 1.0000 >> cohort2000 2.500e-01 5.324e-01 0.470 0.6388 >> cohort2001 7.778e-01 5.020e-01 1.549 0.1216 >> cohort2002 6.000e-01 4.880e-01 1.230 0.2192 >> cohort2003 7.500e-01 4.861e-01 1.543 0.1231 >> *cohort2004 8.429e-01 4.796e-01 1.757 0.0792 .* >> cohort2006 4.118e-01 4.832e-01 0.852 0.3943 >> cohort2007 3.204e-01 4.785e-01 0.670 0.5033 >> cohort2008 4.600e-01 4.786e-01 0.961 0.3367 >> cohort2009 3.975e-01 4.772e-01 0.833 0.4051 >> cohort2010 3.550e-01 4.768e-01 0.745 0.4567 >> --- >> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 >> >> (Dispersion parameter for gaussian family taken to be 0.2267955) >> >> Null deviance: 245.40 on 1003 degrees of freedom >> Residual deviance: 225.21 on 993 degrees of freedom >> AIC: 1372.5 >> >> Number of Fisher Scoring iterations: 2 >> >> >> >> What is going on? Any suggestion/commentary? >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/Can-t-find-the-error-in-a-Binomial-GLM-I-am-doing-please-help-tp4615340.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > Internal Contact Info: > Phone: 467-7374 > Website: > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: [hidden email] Priv: [hidden email] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
Thank you Peter for showing me the error.
I did not realize it. Now I have removed that cohort (there was just one observation!) and checked the numbers for each of the other cohorts. I have re-run the model and now it seems to make much more sense to me. I am going to use one specific cohort, 2004, as the base level. I will try by just replacing the value "2004" in the row data with a number minor than the others in order to make read it to R as the base level. Hope it will work, on the contrary I have seen there are posts on this issue. Again, thanks |
|
At 08:40 08/05/2012, lincoln wrote:
>Thank you Peter for showing me the error. > >I did not realize it. Now I have removed that cohort (there was just one >observation!) and checked the numbers for each of the other cohorts. I have >re-run the model and now it seems to make much more sense to me. > >I am going to use one specific cohort, 2004, as the base level. I will try >by just replacing the value "2004" in the row data with a number minor than >the others in order to make read it to R as the base level. Hope it will >work, on the contrary I have seen there are posts on this issue. I think ?relevel might help you here >Again, thanks > >-- >View this message in context: >http://r.789695.n4.nabble.com/Can-t-find-the-error-in-a-Binomial-GLM-I-am-doing-please-help-tp4615340p4616786.html >Sent from the R help mailing list archive at Nabble.com. Michael Dewey [hidden email] http://www.aghmed.fsnet.co.uk/home.html ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
| Powered by Nabble | Edit this page |
