problems with glm

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

problems with glm

Roland R Regoes
Dear R users,

     I am having some problems with glm. The first is an error message "subscript out of bounds". The second is the fact that reasonable starting values are not accepted by the function.

     To be more specific, here is an example:

> success <- c(13,12,11,14,14,11,13,11,12)
> failure <- c(0,0,0,0,0,0,0,2,2)
> predictor <- c(0,80*5^(0:7))
> glm(cbind(success,failure) ~ predictor + 0,
+           family=binomial(link="log"),
+           control=glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
Deviance = 3.348039 Iterations - 1
Error: subscript out of bounds
In addition: Warning message:
step size truncated: out of bounds
>

        The model with intercept yields:

> glm(cbind(success,failure) ~ predictor ,
+           family=binomial(link="log"),
+           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))

Call:  glm(formula = cbind(success, failure) ~ predictor, family = binomial(link = "log"),      control = glm.control(epsilon = 1e-08, trace = FALSE, maxit = 50))

Coefficients:
(Intercept)    predictor  
 -5.830e-17   -4.000e-08  

Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
Null Deviance:    12.08
Residual Deviance: 2.889 AIC: 11.8
There were 33 warnings (use warnings() to see them)
> warnings()
1: step size truncated: out of bounds
...
31: step size truncated: out of bounds
32: algorithm stopped at boundary value in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,   ...
33: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,   ...
>

        Since the intercept in the above fit is fairly small I thought I could use -4e-8 as a reasonable starting value in a model without intercept. But to no avail:

> glm(cbind(success,failure) ~ predictor + 0, start=-4e-8,
+           family=binomial(link="log"),
+           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
Error in glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,  :
        cannot find valid starting values: please specify some
>

        I am stuck here. Am I doing something wrong when specifying the starting value? I would appreciate any help. (I could not find anything relevant in the documentation of glm and the mailing list archives, but I did not read the source code of glm yet.)

Roland


PS: I am using R Version 2.2.0 (R Cocoa GUI 1.13 (1915)) on MacOSX 10.4.4

--
-----------------------------------------------------------------------
Roland Regoes
Theoretical Biology
Universitaetsstr. 16
ETH Zentrum, CHN H76.1
CH-8092 Zurich, Switzerland

tel: +41-44-632-6935
fax: +41-44-632-1271

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: problems with glm

Bill.Venables
Most of your problems seem to come from 'link = "log"' whereas you
probably mean 'link = logit' (which is the default.  Hence:

##########################################
> success <- c(13,12,11,14,14,11,13,11,12)
> failure <- c(0,0,0,0,0,0,0,2,2)
> predictor <- c(0,80*5^(0:7))
> glm(cbind(success,failure) ~ predictor,
+           family = binomial, #(link="log"),
+           control = glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
Deviance = 7.621991 Iterations - 1
Deviance = 6.970934 Iterations - 2
Deviance = 6.941054 Iterations - 3
Deviance = 6.940945 Iterations - 4
Deviance = 6.940945 Iterations - 5

Call:  glm(formula = cbind(success, failure) ~ predictor, family =
binomial,      control = glm.control(epsilon = 1e-08, trace = TRUE,
maxit = 50))

Coefficients:
(Intercept)    predictor  
  4.180e+00   -4.106e-07  

Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
Null Deviance:      12.08
Residual Deviance: 6.941        AIC: 15.85
>

#######################################

Bill Venables,
CMIS, CSIRO Laboratories,
PO Box 120, Cleveland, Qld. 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):    +61 7 3826 7304
Mobile (rarely used):                +61 4 1963 4642
Home Phone:                          +61 7 3286 7700
mailto:[hidden email]
http://www.cmis.csiro.au/bill.venables/ 



-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Roland R Regoes
Sent: Sunday, 15 January 2006 11:04 PM
To: [hidden email]
Subject: [R] problems with glm


Dear R users,

     I am having some problems with glm. The first is an error message
"subscript out of bounds". The second is the fact that reasonable
starting values are not accepted by the function.

     To be more specific, here is an example:

> success <- c(13,12,11,14,14,11,13,11,12)
> failure <- c(0,0,0,0,0,0,0,2,2)
> predictor <- c(0,80*5^(0:7))
> glm(cbind(success,failure) ~ predictor + 0,
+           family=binomial(link="log"),
+           control=glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
Deviance = 3.348039 Iterations - 1
Error: subscript out of bounds
In addition: Warning message:
step size truncated: out of bounds
>

        The model with intercept yields:

> glm(cbind(success,failure) ~ predictor ,
+           family=binomial(link="log"),
+           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))

Call:  glm(formula = cbind(success, failure) ~ predictor, family =
binomial(link = "log"),      control = glm.control(epsilon = 1e-08,
trace = FALSE, maxit = 50))

Coefficients:
(Intercept)    predictor  
 -5.830e-17   -4.000e-08  

Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
Null Deviance:    12.08
Residual Deviance: 2.889 AIC: 11.8
There were 33 warnings (use warnings() to see them)
> warnings()
1: step size truncated: out of bounds
...
31: step size truncated: out of bounds
32: algorithm stopped at boundary value in: glm.fit(x = X, y = Y,
weights = weights, start = start, etastart = etastart,   ...
33: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X,
y = Y, weights = weights, start = start, etastart = etastart,   ...
>

        Since the intercept in the above fit is fairly small I thought I
could use -4e-8 as a reasonable starting value in a model without
intercept. But to no avail:

> glm(cbind(success,failure) ~ predictor + 0, start=-4e-8,
+           family=binomial(link="log"),
+           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
Error in glm.fit(x = X, y = Y, weights = weights, start = start,
etastart = etastart,  :
        cannot find valid starting values: please specify some
>

        I am stuck here. Am I doing something wrong when specifying the
starting value? I would appreciate any help. (I could not find anything
relevant in the documentation of glm and the mailing list archives, but
I did not read the source code of glm yet.)

Roland


PS: I am using R Version 2.2.0 (R Cocoa GUI 1.13 (1915)) on MacOSX
10.4.4

--
-----------------------------------------------------------------------
Roland Regoes
Theoretical Biology
Universitaetsstr. 16
ETH Zentrum, CHN H76.1
CH-8092 Zurich, Switzerland

tel: +41-44-632-6935
fax: +41-44-632-1271

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: problems with glm

Roland R Regoes
        I am sorry, I should have emphasized that: I really mean 'family =
binomial(link="log")'.

        I am fitting infection data. Failure and success correspond to being infected or not. The probability of success (ie, not being infected in my context) is derived from a population model describing the interaction between hosts and parasites:
        P(not infected) = exp(-InfectionRate*ParasiteDose)
'ParasiteDose' corresponds to 'predictor'. I want to estimate the  infection rate. Hence I need 'binomial(link="log")' and must set intercept=0.

Roland



On Mon, Jan 16, 2006 at 12:16:09AM +1100, [hidden email] wrote:

> Most of your problems seem to come from 'link = "log"' whereas you
> probably mean 'link = logit' (which is the default.  Hence:
>
> ##########################################
> > success <- c(13,12,11,14,14,11,13,11,12)
> > failure <- c(0,0,0,0,0,0,0,2,2)
> > predictor <- c(0,80*5^(0:7))
> > glm(cbind(success,failure) ~ predictor,
> +           family = binomial, #(link="log"),
> +           control = glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
> Deviance = 7.621991 Iterations - 1
> Deviance = 6.970934 Iterations - 2
> Deviance = 6.941054 Iterations - 3
> Deviance = 6.940945 Iterations - 4
> Deviance = 6.940945 Iterations - 5
>
> Call:  glm(formula = cbind(success, failure) ~ predictor, family =
> binomial,      control = glm.control(epsilon = 1e-08, trace = TRUE,
> maxit = 50))
>
> Coefficients:
> (Intercept)    predictor  
>   4.180e+00   -4.106e-07  
>
> Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
> Null Deviance:      12.08
> Residual Deviance: 6.941        AIC: 15.85
> >
>
> #######################################
>
> Bill Venables,
> CMIS, CSIRO Laboratories,
> PO Box 120, Cleveland, Qld. 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary):    +61 7 3826 7304
> Mobile (rarely used):                +61 4 1963 4642
> Home Phone:                          +61 7 3286 7700
> mailto:[hidden email]
> http://www.cmis.csiro.au/bill.venables/ 
>
>
>
> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Roland R Regoes
> Sent: Sunday, 15 January 2006 11:04 PM
> To: [hidden email]
> Subject: [R] problems with glm
>
>
> Dear R users,
>
>      I am having some problems with glm. The first is an error message
> "subscript out of bounds". The second is the fact that reasonable
> starting values are not accepted by the function.
>
>      To be more specific, here is an example:
>
> > success <- c(13,12,11,14,14,11,13,11,12)
> > failure <- c(0,0,0,0,0,0,0,2,2)
> > predictor <- c(0,80*5^(0:7))
> > glm(cbind(success,failure) ~ predictor + 0,
> +           family=binomial(link="log"),
> +           control=glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
> Deviance = 3.348039 Iterations - 1
> Error: subscript out of bounds
> In addition: Warning message:
> step size truncated: out of bounds
> >
>
> The model with intercept yields:
>
> > glm(cbind(success,failure) ~ predictor ,
> +           family=binomial(link="log"),
> +           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
>
> Call:  glm(formula = cbind(success, failure) ~ predictor, family =
> binomial(link = "log"),      control = glm.control(epsilon = 1e-08,
> trace = FALSE, maxit = 50))
>
> Coefficients:
> (Intercept)    predictor  
>  -5.830e-17   -4.000e-08  
>
> Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
> Null Deviance:    12.08
> Residual Deviance: 2.889 AIC: 11.8
> There were 33 warnings (use warnings() to see them)
> > warnings()
> 1: step size truncated: out of bounds
> ...
> 31: step size truncated: out of bounds
> 32: algorithm stopped at boundary value in: glm.fit(x = X, y = Y,
> weights = weights, start = start, etastart = etastart,   ...
> 33: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X,
> y = Y, weights = weights, start = start, etastart = etastart,   ...
> >
>
> Since the intercept in the above fit is fairly small I thought I
> could use -4e-8 as a reasonable starting value in a model without
> intercept. But to no avail:
>
> > glm(cbind(success,failure) ~ predictor + 0, start=-4e-8,
> +           family=binomial(link="log"),
> +           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
> Error in glm.fit(x = X, y = Y, weights = weights, start = start,
> etastart = etastart,  :
> cannot find valid starting values: please specify some
> >
>
> I am stuck here. Am I doing something wrong when specifying the
> starting value? I would appreciate any help. (I could not find anything
> relevant in the documentation of glm and the mailing list archives, but
> I did not read the source code of glm yet.)
>
> Roland
>
>
> PS: I am using R Version 2.2.0 (R Cocoa GUI 1.13 (1915)) on MacOSX
> 10.4.4
>
> --
> -----------------------------------------------------------------------
> Roland Regoes
> Theoretical Biology
> Universitaetsstr. 16
> ETH Zentrum, CHN H76.1
> CH-8092 Zurich, Switzerland
>
> tel: +41-44-632-6935
> fax: +41-44-632-1271
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

--
-----------------------------------------------------------------------
Roland Regoes
Theoretical Biology
Universitaetsstr. 16
ETH Zentrum, CHN H76.1
CH-8092 Zurich, Switzerland

tel: +41-44-632-6935
fax: +41-44-632-1271

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Reply | Threaded
Open this post in threaded view
|

Re: problems with glm

Prof Brian Ripley
On Sun, 15 Jan 2006, Roland R Regoes wrote:

> I am sorry, I should have emphasized that: I really mean 'family =
> binomial(link="log")'.
>
> I am fitting infection data. Failure and success correspond to
> being infected or not. The probability of success (ie, not being
> infected in my context) is derived from a population model describing
> the interaction between hosts and parasites:
> P(not infected) = exp(-InfectionRate*ParasiteDose)

> 'ParasiteDose' corresponds to 'predictor'. I want to estimate the
> infection rate. Hence I need 'binomial(link="log")' and must set
> intercept=0.

Then your model is not appropriate for your data.  The only real evidence
is coming from the last two points which have approximately equal failure
rate yet differ by a factor of 5 in the predictor.  Also, your model
without intercept predicts probablilty one for the first case, and the
log link in R does not allow probability zero.  If we drop that case
(which contributes nothing to the model) we get

success <- c(12,11,14,14,11,13,11,12)
failure <- c(0,0,0,0,0,0,2,2)
predictor <- 80*5^(0:7)
glm(cbind(success,failure) ~ predictor+0,
            family = binomial(link="log"),
            control = glm.control(epsilon=1e-8,trace=TRUE,maxit=50))

which works, albeit with an infinite null deviance (as the null model is
inappropriate).


> On Mon, Jan 16, 2006 at 12:16:09AM +1100, [hidden email] wrote:
>> Most of your problems seem to come from 'link = "log"' whereas you
>> probably mean 'link = logit' (which is the default.  Hence:
>>
>> ##########################################
>>> success <- c(13,12,11,14,14,11,13,11,12)
>>> failure <- c(0,0,0,0,0,0,0,2,2)
>>> predictor <- c(0,80*5^(0:7))
>>> glm(cbind(success,failure) ~ predictor,
>> +           family = binomial, #(link="log"),
>> +           control = glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
>> Deviance = 7.621991 Iterations - 1
>> Deviance = 6.970934 Iterations - 2
>> Deviance = 6.941054 Iterations - 3
>> Deviance = 6.940945 Iterations - 4
>> Deviance = 6.940945 Iterations - 5
>>
>> Call:  glm(formula = cbind(success, failure) ~ predictor, family =
>> binomial,      control = glm.control(epsilon = 1e-08, trace = TRUE,
>> maxit = 50))
>>
>> Coefficients:
>> (Intercept)    predictor
>>   4.180e+00   -4.106e-07
>>
>> Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
>> Null Deviance:      12.08
>> Residual Deviance: 6.941        AIC: 15.85

>> -----Original Message-----
>> From: [hidden email]
>> [mailto:[hidden email]] On Behalf Of Roland R Regoes
>> Sent: Sunday, 15 January 2006 11:04 PM
>> To: [hidden email]
>> Subject: [R] problems with glm
>>
>>
>> Dear R users,
>>
>>      I am having some problems with glm. The first is an error message
>> "subscript out of bounds". The second is the fact that reasonable
>> starting values are not accepted by the function.
>>
>>      To be more specific, here is an example:
>>
>>> success <- c(13,12,11,14,14,11,13,11,12)
>>> failure <- c(0,0,0,0,0,0,0,2,2)
>>> predictor <- c(0,80*5^(0:7))
>>> glm(cbind(success,failure) ~ predictor + 0,
>> +           family=binomial(link="log"),
>> +           control=glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
>> Deviance = 3.348039 Iterations - 1
>> Error: subscript out of bounds
>> In addition: Warning message:
>> step size truncated: out of bounds
>>>
>>
>> The model with intercept yields:
>>
>>> glm(cbind(success,failure) ~ predictor ,
>> +           family=binomial(link="log"),
>> +           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
>>
>> Call:  glm(formula = cbind(success, failure) ~ predictor, family =
>> binomial(link = "log"),      control = glm.control(epsilon = 1e-08,
>> trace = FALSE, maxit = 50))
>>
>> Coefficients:
>> (Intercept)    predictor
>>  -5.830e-17   -4.000e-08
>>
>> Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
>> Null Deviance:    12.08
>> Residual Deviance: 2.889 AIC: 11.8
>> There were 33 warnings (use warnings() to see them)
>>> warnings()
>> 1: step size truncated: out of bounds
>> ...
>> 31: step size truncated: out of bounds
>> 32: algorithm stopped at boundary value in: glm.fit(x = X, y = Y,
>> weights = weights, start = start, etastart = etastart,   ...
>> 33: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X,
>> y = Y, weights = weights, start = start, etastart = etastart,   ...
>>>
>>
>> Since the intercept in the above fit is fairly small I thought I
>> could use -4e-8 as a reasonable starting value in a model without
>> intercept. But to no avail:
>>
>>> glm(cbind(success,failure) ~ predictor + 0, start=-4e-8,
>> +           family=binomial(link="log"),
>> +           control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
>> Error in glm.fit(x = X, y = Y, weights = weights, start = start,
>> etastart = etastart,  :
>> cannot find valid starting values: please specify some
>>>
>>
>> I am stuck here. Am I doing something wrong when specifying the
>> starting value? I would appreciate any help. (I could not find anything
>> relevant in the documentation of glm and the mailing list archives, but
>> I did not read the source code of glm yet.)
>>
>> Roland
>>
>>
>> PS: I am using R Version 2.2.0 (R Cocoa GUI 1.13 (1915)) on MacOSX
>> 10.4.4
>>
>> --
>> -----------------------------------------------------------------------
>> Roland Regoes
>> Theoretical Biology
>> Universitaetsstr. 16
>> ETH Zentrum, CHN H76.1
>> CH-8092 Zurich, Switzerland
>>
>> tel: +41-44-632-6935
>> fax: +41-44-632-1271
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>
> --
> -----------------------------------------------------------------------
> Roland Regoes
> Theoretical Biology
> Universitaetsstr. 16
> ETH Zentrum, CHN H76.1
> CH-8092 Zurich, Switzerland
>
> tel: +41-44-632-6935
> fax: +41-44-632-1271
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html