proposed pbirthday fix

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

proposed pbirthday fix

knoblauch
Actually, since NaN's are also detected in na.action operations, a
simpler fix
might just be to use the na.rm = TRUE option of min

upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)


> Recent news articles concerning an article from The Lancet with
> fabricated
> data indicate
> that in the sample containing some 900 or so patients, more than 200
> had
> the same
> birthday.  I was curious and tried out the p and q birthday functions
> but
> pbirthday
> could not handle 250 coincidences with n = 1000.  The calculation of
> upper
> prior
> to using uniroot produces NaN,
>
> upper<-min(n^k/(c^(k-1)),1)
>
> I was able to get it to work by using logs, however, as in the
> following
> version
>
> function(n, classes = 365, coincident = 2){
>     k <- coincident
>     c <- classes
>     if (coincident < 2) return(1)
>     if (coincident > n) return(0)
>     if (n > classes * (coincident - 1)) return(1)
>     eps <- 1e-14
>     if (qbirthday(1 - eps, classes, coincident) <= n)
>         return(1 - eps)
>     f <- function(p) qbirthday(p, c, k) - n
>     lower <- 0
>     upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
>     nmin <- uniroot(f, c(lower, upper), tol = eps)
>     nmin$root
> }
        [[alternative text/enriched version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: proposed pbirthday fix

Martin Maechler
>>>>> "ken" == ken knoblauch <[hidden email]>
>>>>>     on Mon, 23 Jan 2006 09:43:28 +0100 writes:

    ken> Actually, since NaN's are also detected in na.action
    ken> operations, a simpler fix might just be to use the
    ken> na.rm = TRUE option of min

    ken> upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)

Well, I liked your first fix better -- thank you for it! --
since it's always good practice to formulate such as to avoid
overflow when possible.
All things considered, I think I'd go for

   upper <- min( exp(k * log(n) - (k-1) * log(c)), 1, na.rm = TRUE)

Martin

    Ken> Recent news articles concerning an article from The
    Ken> Lancet with fabricated data indicate that in the sample
    Ken> containing some 900 or so patients, more than 200 had the
    Ken> same birthday.  I was curious and tried out the p and q
    Ken> birthday functions but pbirthday could not handle 250
    Ken> coincidences with n = 1000.  The calculation of upper
    Ken> prior to using uniroot produces NaN,

    Ken> upper<-min(n^k/(c^(k-1)),1)

    Ken> I was able to get it to work by using logs, however, as
    Ken> in the following version

    >> function(n, classes = 365, coincident = 2){
    >>     k <- coincident
    >>     c <- classes
    >>     if (coincident < 2) return(1)
    >>     if (coincident > n) return(0)
    >>     if (n > classes * (coincident - 1)) return(1)
    >>     eps <- 1e-14
    >>     if (qbirthday(1 - eps, classes, coincident) <= n)
    >>     return(1 - eps)
    >>     f <- function(p) qbirthday(p, c, k) - n
    >>     lower <- 0
    >>     upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
    >>     nmin <- uniroot(f, c(lower, upper), tol = eps)
    >>     nmin$root
    >> }

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: proposed pbirthday fix

Martin Maechler
>>>>> "MM" == Martin Maechler <[hidden email]>
>>>>>     on Mon, 23 Jan 2006 11:52:55 +0100 writes:

>>>>> "ken" == ken knoblauch <[hidden email]>
>>>>>     on Mon, 23 Jan 2006 09:43:28 +0100 writes:

    ken> Actually, since NaN's are also detected in na.action
    ken> operations, a simpler fix might just be to use the
    ken> na.rm = TRUE option of min

    ken> upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)

    MM> Well, I liked your first fix better -- thank you for it! --
    MM> since it's always good practice to formulate such as to avoid
    MM> overflow when possible.
    MM> All things considered, I think I'd go for

    MM> upper <- min( exp(k * log(n) - (k-1) * log(c)), 1, na.rm = TRUE)

    MM> Martin

    Ken> Recent news articles concerning an article from The
    Ken> Lancet with fabricated data indicate that in the sample
    Ken> containing some 900 or so patients, more than 200 had the
    Ken> same birthday.  I was curious and tried out the p and q
    Ken> birthday functions but pbirthday could not handle 250
    Ken> coincidences with n = 1000.  The calculation of upper
    Ken> prior to using uniroot produces NaN,

    Ken> upper<-min(n^k/(c^(k-1)),1)

    Ken> I was able to get it to work by using logs, however, as
    Ken> in the following version

    >>> function(n, classes = 365, coincident = 2){
        ..................

    >>> upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
    >>> nmin <- uniroot(f, c(lower, upper), tol = eps)
    >>> nmin$root
    >>> }

Well, now after inspection, I think ``get it to work''
is a bit of an exaggeration, at least for a purist like me
(some famous fortune teller once guessed it may be because I'm ... Swiss)
who doesn't like to lose precision in probability computations
unnecessarily. One can do much better:

The version of [pq]birthday() I've just committed to R-devel *) now gives

> sapply(c(20,50,100,200), function(k) pbirthday(1000, coincident= k))
[1]  8.596245e-08  9.252349e-41 2.395639e-112 1.758236e-285

whereas the 'na.rm=TRUE' fix  would simply give

[1] 8.596245e-08 0.000000e+00 0.000000e+00 0.000000e+00

--
Martin Maechler, ETH Zurich

*) peek at https://svn.r-project.org/R/trunk/src/library/stats/R/pbirthday.R

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel