6 messages

Hi all,

I can't find the error in the binomial GLM I have fitted. I want to use a binomial GLM because there is more than one explanatory variable (all categorical) and a binary response variable.
This is what my data set looks like:

> str(data)
'data.frame':   1004 obs. of  5 variables:
 $ site  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ sex   : Factor w/ 2 levels "0","1": NA NA NA NA 1 NA NA NA NA NA ...
 $ age   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ cohort: Factor w/ 11 levels "1996","2000",..: 11 11 11 11 11 11 11 11 11 11 ...
 $ birth : Factor w/ 3 levels "5","6","7": 3 3 2 2 2 2 2 2 2 2 ...

I know that there should be a strong effect of "cohort" on "site", particularly for one level of "cohort" (2004), so I ran a chi-squared test. It rejects the null hypothesis of independence, which confirms that the distribution of "cohort" differs between sites:

> (chisq.test(data$site,data$cohort))

        Pearson's Chi-squared test

data:  data$site and data$cohort
X-squared = 82.6016, df = 10, p-value = 1.549e-13

Warning message:
In chisq.test(data$site, data$cohort) :
  Chi-squared approximation may be incorrect

After that, I tried a binomial GLM with all the explanatory variables, but no variable came out significant, not even cohort. For this reason I tried using only cohort as a predictor, and I get this:

> BinomialGlm <- glm(site ~  cohort, data=data,binomial)
> summary(BinomialGlm)

Call:
glm(formula = site ~ cohort, family = binomial, data = data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9239  -0.9365  -0.9365   1.3584   1.6651

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -12.57     324.74  -0.039    0.969
cohort2000     11.47     324.75   0.035    0.972
cohort2001     13.82     324.74   0.043    0.966
cohort2002     12.97     324.74   0.040    0.968
cohort2003     13.66     324.74   0.042    0.966
cohort2004     14.25     324.74   0.044    0.965
cohort2006     12.21     324.74   0.038    0.970
cohort2007     11.81     324.74   0.036    0.971
cohort2008     12.41     324.74   0.038    0.970
cohort2009     12.15     324.74   0.037    0.970
cohort2010     11.97     324.74   0.037    0.971

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1369.3  on 1003  degrees of freedom
Residual deviance: 1283.7  on  993  degrees of freedom
AIC: 1305.7

Number of Fisher Scoring iterations: 11

I then tried a simple GLM (gaussian family), and I get results that look more sensible:

> GaussGlm <- glm(site ~  cohort, data=data)
> summary(GaussGlm)

Call:
glm(formula = site ~ cohort, data = data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8429  -0.3550  -0.3550   0.6025   0.7500

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.740e-14  4.762e-01   0.000   1.0000
cohort2000  2.500e-01  5.324e-01   0.470   0.6388
cohort2001  7.778e-01  5.020e-01   1.549   0.1216
cohort2002  6.000e-01  4.880e-01   1.230   0.2192
cohort2003  7.500e-01  4.861e-01   1.543   0.1231
cohort2004  8.429e-01  4.796e-01   1.757   0.0792 .
cohort2006  4.118e-01  4.832e-01   0.852   0.3943
cohort2007  3.204e-01  4.785e-01   0.670   0.5033
cohort2008  4.600e-01  4.786e-01   0.961   0.3367
cohort2009  3.975e-01  4.772e-01   0.833   0.4051
cohort2010  3.550e-01  4.768e-01   0.745   0.4567
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.2267955)

    Null deviance: 245.40  on 1003  degrees of freedom
Residual deviance: 225.21  on  993  degrees of freedom
AIC: 1372.5

Number of Fisher Scoring iterations: 2

What is going on? Any suggestions or comments?
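
A hedged aside on the binomial summary above: the per-coefficient z tests are not the only test the output contains. The drop from the null deviance to the residual deviance can be read as a likelihood-ratio test of "cohort" as a whole. The sketch below uses only the four numbers printed by summary(BinomialGlm); the variable names are made up for illustration, and it assumes the usual chi-squared approximation is acceptable here.

```r
# Deviances and degrees of freedom copied from the summary(BinomialGlm) output
null_dev  <- 1369.3; null_df  <- 1003
resid_dev <- 1283.7; resid_df <- 993

lr_stat <- null_dev - resid_dev   # likelihood-ratio (deviance) statistic
lr_df   <- null_df - resid_df     # number of cohort coefficients tested
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = lr_df, p.value = p_value))
```

The statistic is 85.6 on 10 degrees of freedom, close to the chi-squared test's 82.6 on 10, so on this reading the binomial model does detect a cohort effect overall even though no single Wald z value is significant. With the fitted object at hand, anova(BinomialGlm, test = "Chisq") should report the same comparison.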

1. As this is a statistical, rather than an R issue, you would do better posting on a statistical help site like stats.stackexchange.com (although some generous soul here may respond).

2. You would also probably do better consulting with a local statistical resource if available, as it is difficult to explain such issues remotely.

Cheers,
Bert

On Mon, May 7, 2012 at 10:05 AM, lincoln <[hidden email]> wrote:
> Hi all,
>
> I can't find the error in the binomial GLM I have done. [...]
>
> What is going on? Any suggestion/commentary?
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Can-t-find-the-error-in-a-Binomial-GLM-I-am-doing-please-help-tp4615340.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Perhaps I did not explain this as well as I would have liked. To me this was an R issue because I did not understand why the binomial GLM gives these results, and I believed it was due to the way I am implementing it in R rather than to the binomial GLM itself. If I am wrong, and this is a property of the binomial GLM that has nothing to do with R, it would be nice if someone could tell me so and, maybe, point me in the right direction (there are good souls in the world). I hope the R forum is not only for people with a very strong statistical background, because that would greatly limit the potential of this forum and of the software itself.

Thanks for any help.

Cheers
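
One possible direction, offered as a sketch and not as a diagnosis of the actual data: in the posted gaussian fit the intercept is essentially 0 and its standard error (0.4762) equals the square root of the dispersion (0.2267955). That would be consistent with the reference cohort "1996" containing a single observation with site = 0. If so, R's default treatment contrasts compare every other cohort with a level whose estimated log-odds is heading towards minus infinity, which inflates all the Wald standard errors. The data below are simulated and hypothetical (the names sim, cohort2, fit_ref1996 and fit_ref2010 are made up); only glm(), relevel() and anova() are real R functions.

```r
set.seed(1)  # simulated, hypothetical data; not the poster's data set

# Three cohorts: "1996" has one observation with site = 0 (the assumption
# described above); the two other cohorts clearly differ from each other.
sim <- data.frame(
  cohort = factor(c("1996", rep("2004", 100), rep("2010", 100))),
  site   = c(0, rbinom(100, 1, 0.8), rbinom(100, 1, 0.3))
)

# Default treatment contrasts: every coefficient is a contrast with "1996"
fit_ref1996 <- suppressWarnings(
  glm(site ~ cohort, family = binomial, data = sim)
)
se_ref1996 <- coef(summary(fit_ref1996))[, "Std. Error"]

# Same model with a well-populated cohort as the reference level
sim$cohort2 <- relevel(sim$cohort, ref = "2010")
fit_ref2010 <- suppressWarnings(
  glm(site ~ cohort2, family = binomial, data = sim)
)
se_ref2010 <- coef(summary(fit_ref2010))[, "Std. Error"]

print(round(coef(summary(fit_ref1996)), 3))  # all standard errors are huge
print(round(coef(summary(fit_ref2010)), 3))  # only the "1996" contrast is

# Overall test of cohort; it does not depend on the reference level
lrt <- anova(fit_ref2010, test = "Chisq")
print(lrt)
```

In this toy example both fits describe the same model, so the deviance and the overall test agree; only the individual Wald tests change with the choice of reference level. Whether re-levelling, merging or dropping the sparse cohort is appropriate for the real data is a judgment the sketch cannot make.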