
Can't find the error in a Binomial GLM I am doing, please help


lincoln
Hi all,

I can't find the error in the binomial GLM I have fitted. I chose this model because I have more than one explanatory variable (all categorical) and a binary response variable.
This is what my data set looks like:
> str(data)
'data.frame': 1004 obs. of  5 variables:
 $ site  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ sex   : Factor w/ 2 levels "0","1": NA NA NA NA 1 NA NA NA NA NA ...
 $ age   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ cohort: Factor w/ 11 levels "1996","2000",..: 11 11 11 11 11 11 11 11 11 11 ...
 $ birth : Factor w/ 3 levels "5","6","7": 3 3 2 2 2 2 2 2 2 2 ...

I know that there should be a strong effect of the variable "cohort" on the variable "site", particularly for one level of "cohort" (2004), so I ran a chi-squared test. It rejects the null hypothesis of independence, i.e. it indicates that "cohort" is distributed differently across sites:

> (chisq.test(data$site,data$cohort))

        Pearson's Chi-squared test

data:  data$site and data$cohort
X-squared = 82.6016, df = 10, p-value = 1.549e-13

Warning message:
In chisq.test(data$site, data$cohort) :
  Chi-squared approximation may be incorrect
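The warning above usually means that some expected cell counts are small (below about 5), which a cohort with very few observations would cause. A minimal sketch on hypothetical toy data (the data frame `toy` and its contents are invented here, not the poster's data) showing how to inspect the expected counts and, as one option, get a Monte Carlo p-value that does not rely on the chi-squared approximation:

```r
# Hypothetical toy data (not the poster's): cohort "1996" has a single observation
set.seed(1)
toy <- data.frame(
  site   = c(0L, rbinom(99, size = 1, prob = 0.4)),
  cohort = factor(c("1996", sample(c("2000", "2004", "2010"), 99, replace = TRUE)))
)

tab <- table(toy$site, toy$cohort)

# Expected counts under independence (the approximation warning is suppressed on purpose)
ct <- suppressWarnings(chisq.test(tab))
print(round(ct$expected, 2))
small_cells <- any(ct$expected < 5)  # TRUE: the "1996" column has expected counts below 1

# One alternative: a simulated (Monte Carlo) p-value instead of the asymptotic one
ct_mc <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
print(ct_mc$p.value)
```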




After that, I tried a binomial GLM with all the explanatory variables, but none of them came out significant, not even cohort. For this reason I tried using cohort as the only predictor, and I got this:


> BinomialGlm <- glm(site ~  cohort, data=data,binomial)
> summary(BinomialGlm)

Call:
glm(formula = site ~ cohort, family = binomial, data = data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max  
-1.9239  -0.9365  -0.9365   1.3584   1.6651  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -12.57     324.74  -0.039    0.969
cohort2000     11.47     324.75   0.035    0.972
cohort2001     13.82     324.74   0.043    0.966
cohort2002     12.97     324.74   0.040    0.968
cohort2003     13.66     324.74   0.042    0.966
cohort2004     14.25     324.74   0.044    0.965
cohort2006     12.21     324.74   0.038    0.970
cohort2007     11.81     324.74   0.036    0.971
cohort2008     12.41     324.74   0.038    0.970
cohort2009     12.15     324.74   0.037    0.970
cohort2010     11.97     324.74   0.037    0.971

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1369.3  on 1003  degrees of freedom
Residual deviance: 1283.7  on  993  degrees of freedom
AIC: 1305.7

Number of Fisher Scoring iterations: 11




I tried a simple GLM (gaussian family) and got results that seem more logical:

> GaussGlm <- glm(site ~  cohort, data=data)
> summary(GaussGlm)

Call:
glm(formula = site ~ cohort, data = data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max  
-0.8429  -0.3550  -0.3550   0.6025   0.7500  

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) 5.740e-14  4.762e-01   0.000   1.0000  
cohort2000  2.500e-01  5.324e-01   0.470   0.6388  
cohort2001  7.778e-01  5.020e-01   1.549   0.1216  
cohort2002  6.000e-01  4.880e-01   1.230   0.2192  
cohort2003  7.500e-01  4.861e-01   1.543   0.1231  
cohort2004  8.429e-01  4.796e-01   1.757   0.0792 .
cohort2006  4.118e-01  4.832e-01   0.852   0.3943  
cohort2007  3.204e-01  4.785e-01   0.670   0.5033  
cohort2008  4.600e-01  4.786e-01   0.961   0.3367  
cohort2009  3.975e-01  4.772e-01   0.833   0.4051  
cohort2010  3.550e-01  4.768e-01   0.745   0.4567  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.2267955)

    Null deviance: 245.40  on 1003  degrees of freedom
Residual deviance: 225.21  on  993  degrees of freedom
AIC: 1372.5

Number of Fisher Scoring iterations: 2
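A quick consistency check between the two outputs, converting the printed coefficients back to proportions. The numbers are copied (rounded, as printed) from the two `summary()` outputs above; for the binomial model the fitted proportion is the inverse logit `plogis()` of the linear predictor, while a Gaussian GLM on a 0/1 response estimates group means, i.e. proportions. This is only a sketch on rounded values, not a model comparison.

```r
# Coefficients copied (as printed, i.e. rounded) from the two summary() outputs above
b_binom <- c(intercept = -12.57, cohort2004 = 14.25)
b_gauss <- c(intercept = 5.740e-14, cohort2004 = 8.429e-01)

# Binomial GLM: fitted proportion = inverse logit of the linear predictor
p_binom <- plogis(c(base  = b_binom[["intercept"]],
                    c2004 = b_binom[["intercept"]] + b_binom[["cohort2004"]]))

# Gaussian GLM on a 0/1 response: fitted value = group mean = observed proportion
p_gauss <- c(base  = b_gauss[["intercept"]],
             c2004 = b_gauss[["intercept"]] + b_gauss[["cohort2004"]])

# Both rows: about 0 for the base cohort (1996) and about 0.843 for cohort 2004
print(round(rbind(binomial = p_binom, gaussian = p_gauss), 3))
```

Both models agree that the base cohort has essentially no positive outcomes; the binomial model can only express that with an intercept heading toward -Inf, which is what inflates its standard errors.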
 


What is going on? Any suggestions or comments?

Re: Can't find the error in a Binomial GLM I am doing, please help

Bert Gunter
1. As this is a statistical, rather than an R issue, you would do
better posting on a statistical help site like stats.stackexchange.com
(although some generous soul here may respond).

2. You would also probably do better consulting with a local
statistical resource if available, as it is difficult to explain such
issues remotely.

Cheers,
Bert

On Mon, May 7, 2012 at 10:05 AM, lincoln <[hidden email]> wrote:

> Hi all,
>
> I can't find the error in the binomial GLM I have fitted. I chose this model
> because I have more than one explanatory variable (all categorical) and a
> binary response variable.
> --
> View this message in context: http://r.789695.n4.nabble.com/Can-t-find-the-error-in-a-Binomial-GLM-I-am-doing-please-help-tp4615340.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: Can't find the error in a Binomial GLM I am doing, please help

lincoln
Perhaps I haven't explained it as well as I would have liked to.
To me this was an R issue because I didn't understand why the binomial GLM was giving these results, and I believed this was due to the way I was implementing it in R, not to the binomial GLM itself.

If I was wrong, and this is due to the binomial GLM itself and has nothing to do with R, it would be nice if someone could tell me so and perhaps point me in the right direction (there are good souls in the world).

I hope the R forum is not only for people with a very strong statistical background, because that would greatly limit the potential of this forum and of the software itself.

Thanks for any help
Cheers

Re: Can't find the error in a Binomial GLM I am doing, please help

Peter Dalgaard-2
In reply to this post by Bert Gunter

On May 7, 2012, at 19:39 , Bert Gunter wrote:

> 1. As this is a statistical, rather than an R issue, you would do
> better posting on a statistical help site like stats.stackexchange.com
> (although some generous soul here may respond).
>
> 2. You would also probably do better consulting with a local
> statistical resource if available, as it is difficult to explain such
> issues remotely.
>
> Cheers,
> Bert

Well, in this case it is pretty obvious that there are no positive outcomes in the 1996 cohort (log odds essentially -Inf); as this is the base level, the Wald tests are completely unreliable (Hauck-Donner effect). The LRT is 1369-1283=86 which is quite consistent with chisq.test().
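The situation described here can be reproduced on hypothetical toy data (invented for illustration, with placeholder cohort levels "A", "B", "C"): when the base level has no positive outcomes, its log odds run off toward -Inf, the Wald z tests in `summary()` become uninformative, while the likelihood-ratio test from `anova(..., test = "Chisq")` still detects the cohort effect. A minimal sketch:

```r
# Hypothetical toy data: base level "A" has 10 observations, all with site == 0
toy <- data.frame(
  site   = c(rep(0L, 10),
             rep(c(0L, 1L), times = c(20, 30)),   # cohort B: 30 of 50 positive
             rep(c(0L, 1L), times = c(30, 20))),  # cohort C: 20 of 50 positive
  cohort = factor(rep(c("A", "B", "C"), times = c(10, 50, 50)))
)
print(table(toy$site, toy$cohort))  # the "A" column contains no 1s

fit <- glm(site ~ cohort, family = binomial, data = toy)

# Wald tests: huge standard errors, p-values close to 1
wald_p <- coef(summary(fit))[, "Pr(>|z|)"]
print(round(wald_p, 3))

# Likelihood-ratio test of the cohort term (null deviance minus residual deviance)
lrt <- anova(fit, test = "Chisq")
print(lrt)
lrt_p <- lrt["cohort", "Pr(>Chi)"]
```

`drop1(fit, test = "LRT")` is another way to get the same likelihood-ratio comparison.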

-pd



--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]


Re: Can't find the error in a Binomial GLM I am doing, please help

lincoln
Thank you Peter for showing me the error.

I did not realize it. Now I have removed that cohort (there was just one observation!) and checked the numbers for each of the other cohorts. I have re-run the model and now it seems to make much more sense to me.
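Checking the counts per cohort can be done before fitting. A minimal sketch on hypothetical toy data (not the poster's) that cross-tabulates cohort by response and flags any cohort whose outcomes are all 0 or all 1, the situation that produced the runaway coefficients:

```r
# Hypothetical toy data (not the poster's)
toy <- data.frame(
  site   = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L),
  cohort = factor(c("1996", "2000", "2000", "2004", "2004", "2004", "2010", "2010"))
)

# Cross-tabulate cohort by response; the columns are the response values "0" and "1"
counts <- xtabs(~ cohort + site, data = toy)
print(counts)

# A cohort is problematic for a logistic model if all of its outcomes are identical
flag <- counts[, "0"] == 0 | counts[, "1"] == 0
flagged <- names(flag)[flag]
print(flagged)  # "1996"
```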

I am going to use one specific cohort, 2004, as the base level. I will try simply replacing the value "2004" in the raw data with a number smaller than the others so that R reads it as the base level. I hope it works; otherwise, I have seen there are posts on this issue.

Again, thanks

Re: Can't find the error in a Binomial GLM I am doing, please help

Michael Dewey
At 08:40 08/05/2012, lincoln wrote:

>Thank you Peter for showing me the error.
>
>I did not realize it. Now I have removed that cohort (there was just one
>observation!) and checked the numbers for each of the other cohorts. I have
>re-run the model and now it seems to make much more sense to me.
>
>I am going to use one specific cohort, 2004, as the base level. I will try
>simply replacing the value "2004" in the raw data with a number smaller than
>the others so that R reads it as the base level. I hope it works; otherwise,
>I have seen there are posts on this issue.

I think
?relevel
might help you here
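A minimal sketch of the `relevel()` suggestion, assuming a factor like the poster's `cohort` (the values below are hypothetical): `relevel()` changes the reference level without editing the raw data, and `droplevels()` removes the level left empty after subsetting out the single-observation cohort.

```r
# Hypothetical cohort values using the thread's level labels
cohort <- factor(c("1996", "2000", "2004", "2004", "2010"))

# Remove the cohort with a single observation, then drop its now-empty level
cohort <- droplevels(cohort[cohort != "1996"])
print(levels(cohort))  # "2000" "2004" "2010"

# Make 2004 the reference (base) level; the other levels keep their order
cohort <- relevel(cohort, ref = "2004")
print(levels(cohort))  # "2004" "2000" "2010"
```

On a data frame such as the poster's `data`, the same idea would be `data <- droplevels(subset(data, cohort != "1996"))` followed by `data$cohort <- relevel(data$cohort, ref = "2004")` before calling `glm()`.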



Michael Dewey
[hidden email]
http://www.aghmed.fsnet.co.uk/home.html
