gaussian family change suggestion

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

gaussian family change suggestion

Simon Wood-3
Hi,

Currently the `gaussian' family's initialization code signals an error if
any response data are zero or negative and a log link is used. Given that
zero or negative response data are perfectly legitimate under the GLM
fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
it be worth changing it?

The current offending code from `gaussian' is:

initialize = expression({
            n <- rep.int(1, nobs)
            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
                ((family$link == "inverse" && any(y == 0)) ||
                  (family$link == "log" && any(y <= 0)))) stop(
              "cannot find valid starting values: please specify some")
            mustart <- y
        })

A possible replacement would be ...

initialize = expression({
            n <- rep.int(1, nobs)
            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
                ((family$link == "inverse" && all(y == 0)) ||
                  (family$link == "log" && all(y <= 0)))) stop(
      "cannot find valid starting values: please specify some")
        mustart <- y
      if (family$link=="log") {
        iy <- y<=0
        if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
      } else if (family$link=="inverse") {
        iy <- y==0
        if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
      }
        })

best,
Simon


>- Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
>-             +44 (0)1225 386603         www.maths.bath.ac.uk/~sw283/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: gaussian family change suggestion

Brian Ripley
On Tue, 11 Apr 2006, Simon Wood wrote:

> Hi,
>
> Currently the `gaussian' family's initialization code signals an error if
> any response data are zero or negative and a log link is used. Given that
> zero or negative response data are perfectly legitimate under the GLM
> fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
> it be worth changing it?

Well, that's not really true: it says it is unable to find starting
values, and asks you to provide some.  I don't really see why your choices
are satisfactory: what happens if all the data are negative for example?
(You get +Inf in the log case, I believe.  The algorithm will probably get
stuck from a finite starting point, but it will produce a mysterious error
from an infinite one.)

We could try even harder, but code that is almost never used tends to get
broken whilst no one is testing it.  So if you want to pursue it I think
we need a comprehensive test suite.

> The current offending code from `gaussian' is:
>
> initialize = expression({
>            n <- rep.int(1, nobs)
>            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
>                ((family$link == "inverse" && any(y == 0)) ||
>                  (family$link == "log" && any(y <= 0)))) stop(
>              "cannot find valid starting values: please specify some")
>            mustart <- y
>        })
>
> A possible replacement would be ...
>
> initialize = expression({
>            n <- rep.int(1, nobs)
>            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
>                ((family$link == "inverse" && all(y == 0)) ||
>                  (family$link == "log" && all(y <= 0)))) stop(
>      "cannot find valid starting values: please specify some")
>        mustart <- y
>      if (family$link=="log") {
>        iy <- y<=0
>        if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
>      } else if (family$link=="inverse") {
>        iy <- y==0
>        if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
>      }
>        })
>
> best,
> Simon
>
>
>> - Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
>> -             +44 (0)1225 386603         www.maths.bath.ac.uk/~sw283/
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: gaussian family change suggestion

Simon Wood-3
> > Currently the `gaussian' family's initialization code signals an error if
> > any response data are zero or negative and a log link is used. Given that
> > zero or negative response data are perfectly legitimate under the GLM
> > fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
> > it be worth changing it?
>
> Well, that's not really true: it says it is unable to find starting
> values, and asks you to provide some.  I don't really see why your choices
> are satisfactory: what happens if all the data are negative for example?

- then the suggested code traps it.

>
> We could try even harder, but code that is almost never used tends to get
> broken whilst no one is testing it.  So if you want to pursue it I think
> we need a comprehensive test suite.

- well, I guess, but I wouldn't have thought that zeros in data modelled
using `gaussian("log")' is such a rare occurance is it? [Or did you mean
that `gaussian("log")' is almost never used, and should hence be kept
simple].

I suppose there are arguments both ways...

best,
Simon

>
> > The current offending code from `gaussian' is:
> >
> > initialize = expression({
> >            n <- rep.int(1, nobs)
> >            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
> >                ((family$link == "inverse" && any(y == 0)) ||
> >                  (family$link == "log" && any(y <= 0)))) stop(
> >              "cannot find valid starting values: please specify some")
> >            mustart <- y
> >        })
> >
> > A possible replacement would be ...
> >
> > initialize = expression({
> >            n <- rep.int(1, nobs)
> >            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
> >                ((family$link == "inverse" && all(y == 0)) ||
> >                  (family$link == "log" && all(y <= 0)))) stop(
> >      "cannot find valid starting values: please specify some")
> >        mustart <- y
> >      if (family$link=="log") {
> >        iy <- y<=0
> >        if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
> >      } else if (family$link=="inverse") {
> >        iy <- y==0
> >        if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
> >      }
> >        })
> >
> > best,
> > Simon
> >
> >
> >> - Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
> >> -             +44 (0)1225 386603         www.maths.bath.ac.uk/~sw283/
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
>
> --
> Brian D. Ripley,                  [hidden email]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: gaussian family change suggestion

Thomas Lumley
In reply to this post by Brian Ripley
On Tue, 11 Apr 2006, Prof Brian Ripley wrote:

> On Tue, 11 Apr 2006, Simon Wood wrote:
>
>> Hi,
>>
>> Currently the `gaussian' family's initialization code signals an error if
>> any response data are zero or negative and a log link is used. Given that
>> zero or negative response data are perfectly legitimate under the GLM
>> fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
>> it be worth changing it?
>
> Well, that's not really true: it says it is unable to find starting
> values, and asks you to provide some.  I don't really see why your choices
> are satisfactory: what happens if all the data are negative for example?
> (You get +Inf in the log case, I believe.  The algorithm will probably get
> stuck from a finite starting point, but it will produce a mysterious error
> from an infinite one.)
>
> We could try even harder, but code that is almost never used tends to get
> broken whilst no one is testing it.  So if you want to pursue it I think
> we need a comprehensive test suite.

I have actually been working with log-link glms quite a bit recently and
was planning to change the defaults.

  -thomas



>
>> The current offending code from `gaussian' is:
>>
>> initialize = expression({
>>            n <- rep.int(1, nobs)
>>            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
>>                ((family$link == "inverse" && any(y == 0)) ||
>>                  (family$link == "log" && any(y <= 0)))) stop(
>>              "cannot find valid starting values: please specify some")
>>            mustart <- y
>>        })
>>
>> A possible replacement would be ...
>>
>> initialize = expression({
>>            n <- rep.int(1, nobs)
>>            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
>>                ((family$link == "inverse" && all(y == 0)) ||
>>                  (family$link == "log" && all(y <= 0)))) stop(
>>      "cannot find valid starting values: please specify some")
>>        mustart <- y
>>      if (family$link=="log") {
>>        iy <- y<=0
>>        if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
>>      } else if (family$link=="inverse") {
>>        iy <- y==0
>>        if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
>>      }
>>        })
>>
>> best,
>> Simon
>>
>>
>>> - Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
>>> -             +44 (0)1225 386603         www.maths.bath.ac.uk/~sw283/
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>
> --
> Brian D. Ripley,                  [hidden email]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: gaussian family change suggestion

Brian Ripley
In reply to this post by Simon Wood-3
On Tue, 11 Apr 2006, Simon Wood wrote:

>>> Currently the `gaussian' family's initialization code signals an error if
>>> any response data are zero or negative and a log link is used. Given that
>>> zero or negative response data are perfectly legitimate under the GLM
>>> fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
>>> it be worth changing it?
>>
>> Well, that's not really true: it says it is unable to find starting
>> values, and asks you to provide some.  I don't really see why your choices
>> are satisfactory: what happens if all the data are negative for example?
>
> - then the suggested code traps it.

Yes, but why (it can happen too)?

>> We could try even harder, but code that is almost never used tends to get
>> broken whilst no one is testing it.  So if you want to pursue it I think
>> we need a comprehensive test suite.
>
> - well, I guess, but I wouldn't have thought that zeros in data modelled
> using `gaussian("log")' is such a rare occurance is it? [Or did you mean
> that `gaussian("log")' is almost never used, and should hence be kept
> simple].

I meant initialization code for this case would rarely get used, as I
suppose those people who do this are used to providing starting values
(or there are few or no such people).

>
> I suppose there are arguments both ways...
>
> best,
> Simon
>
>>
>>> The current offending code from `gaussian' is:
>>>
>>> initialize = expression({
>>>            n <- rep.int(1, nobs)
>>>            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
>>>                ((family$link == "inverse" && any(y == 0)) ||
>>>                  (family$link == "log" && any(y <= 0)))) stop(
>>>              "cannot find valid starting values: please specify some")
>>>            mustart <- y
>>>        })
>>>
>>> A possible replacement would be ...
>>>
>>> initialize = expression({
>>>            n <- rep.int(1, nobs)
>>>            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
>>>                ((family$link == "inverse" && all(y == 0)) ||
>>>                  (family$link == "log" && all(y <= 0)))) stop(
>>>      "cannot find valid starting values: please specify some")
>>>        mustart <- y
>>>      if (family$link=="log") {
>>>        iy <- y<=0
>>>        if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
>>>      } else if (family$link=="inverse") {
>>>        iy <- y==0
>>>        if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
>>>      }
>>>        })
>>>
>>> best,
>>> Simon
>>>
>>>
>>>> - Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
>>>> -             +44 (0)1225 386603         www.maths.bath.ac.uk/~sw283/
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
>>
>> --
>> Brian D. Ripley,                  [hidden email]
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>
>
>

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: gaussian family change suggestion

Simon Wood-3
> >>> Currently the `gaussian' family's initialization code signals an error if
> >>> any response data are zero or negative and a log link is used. Given that
> >>> zero or negative response data are perfectly legitimate under the GLM
> >>> fitted using `gaussian("log")', this seems a bit unsatisfactory. Might
> >>> it be worth changing it?
> >>
> >> Well, that's not really true: it says it is unable to find starting
> >> values, and asks you to provide some.  I don't really see why your choices
> >> are satisfactory: what happens if all the data are negative for example?
> >
> > - then the suggested code traps it.
>
> Yes, but why (it can happen too)?

- isn't this (i.e. all response <=0 ) a case where the MLE's of the
linear predictor must tend to minus infinity? So there's probably  an
argument for handling it differently [although my proposed error message
is not the most informative, since modifying the starting values won't help].

best,
Simon

>
> >> We could try even harder, but code that is almost never used tends to get
> >> broken whilst no one is testing it.  So if you want to pursue it I think
> >> we need a comprehensive test suite.
> >
> > - well, I guess, but I wouldn't have thought that zeros in data modelled
> > using `gaussian("log")' is such a rare occurance is it? [Or did you mean
> > that `gaussian("log")' is almost never used, and should hence be kept
> > simple].
>
> I meant initialization code for this case would rarely get used, as I
> suppose those people who do this are used to providing starting values
> (or there are few or no such people).
>
> >
> > I suppose there are arguments both ways...
> >
> > best,
> > Simon
> >
> >>
> >>> The current offending code from `gaussian' is:
> >>>
> >>> initialize = expression({
> >>>            n <- rep.int(1, nobs)
> >>>            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
> >>>                ((family$link == "inverse" && any(y == 0)) ||
> >>>                  (family$link == "log" && any(y <= 0)))) stop(
> >>>              "cannot find valid starting values: please specify some")
> >>>            mustart <- y
> >>>        })
> >>>
> >>> A possible replacement would be ...
> >>>
> >>> initialize = expression({
> >>>            n <- rep.int(1, nobs)
> >>>            if (is.null(etastart) && is.null(start) && is.null(mustart) &&
> >>>                ((family$link == "inverse" && all(y == 0)) ||
> >>>                  (family$link == "log" && all(y <= 0)))) stop(
> >>>      "cannot find valid starting values: please specify some")
> >>>        mustart <- y
> >>>      if (family$link=="log") {
> >>>        iy <- y<=0
> >>>        if (sum(iy)) mustart[iy] <- min(y[!iy])*.5
> >>>      } else if (family$link=="inverse") {
> >>>        iy <- y==0
> >>>        if (sum(iy)) mustart[iy] <- min(abs(y[!iy]))*.5
> >>>      }
> >>>        })
> >>>
> >>> best,
> >>> Simon
> >>>
> >>>
> >>>> - Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
> >>>> -             +44 (0)1225 386603         www.maths.bath.ac.uk/~sw283/
> >>>
> >>> ______________________________________________
> >>> [hidden email] mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>>
> >>
> >> --
> >> Brian D. Ripley,                  [hidden email]
> >> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >> University of Oxford,             Tel:  +44 1865 272861 (self)
> >> 1 South Parks Road,                     +44 1865 272866 (PA)
> >> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >>
> >
> >
>
> --
> Brian D. Ripley,                  [hidden email]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel