# Re: [R] choose(n, k) as n approaches k

7 messages
Open this post in threaded view
|

## Re: [R] choose(n, k) as n approaches k

 On 14/01/2020 10:07 a.m., peter dalgaard wrote: > Yep, that looks wrong (probably want to continue discussion over on R-devel) > > I think the culprit is here (in src/nmath/choose.c) >   >     if (k < k_small_max) { >          int j; >          if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >          if (k <  0) return 0.; >          if (k == 0) return 1.; >          /* else: k >= 1 */ > > if n is a near-integer, then k can become non-integer and negative. In your case, > > n == 4 - 1e-7 > k == 4 > n - k == -1e-7 < 4 > n >= 0 > R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) > > so k gets set to > > n - k == -1e-7 > > which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. > >> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 > [1] -9.289025e-11 > > I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true. Duncan Murdoch > > -pd > > > >> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote: >> >> This struck me as incorrect: >> >>> choose(3.999999, 4) >> [1] 0.9999979 >>> choose(3.9999999, 4) >> [1] 0 >>> choose(4, 4) >> [1] 1 >>> choose(4.0000001, 4) >> [1] 4 >>> choose(4.000001, 4) >> [1] 1.000002 >> >> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >> >> Thanks, >> Erik >> >> *** >> sessionInfo() >> R version 3.6.0 beta (2019-04-15 r76395) >> Platform: x86_64-apple-darwin15.6.0 (64-bit) >> Running under: macOS High Sierra 10.13.6 >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html>> and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: [R] choose(n, k) as n approaches k

 OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the         if (k <  0) return 0.;         if (k == 0) return 1.;         /* else: k >= 1 */ part, because at that point k is sure to be integer, possibly after rounding. It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the         return R_IS_INT(n) ? R_forceint(r) : r; part. -pd > On 14 Jan 2020, at 17:02 , Duncan Murdoch <[hidden email]> wrote: > > On 14/01/2020 10:50 a.m., peter dalgaard wrote: >>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <[hidden email]> wrote: >>> >>> On 14/01/2020 10:07 a.m., peter dalgaard wrote: >>>> Yep, that looks wrong (probably want to continue discussion over on R-devel) >>>> I think the culprit is here (in src/nmath/choose.c) >>>>      if (k < k_small_max) { >>>>         int j; >>>>         if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >>>>         if (k <  0) return 0.; >>>>         if (k == 0) return 1.; >>>>         /* else: k >= 1 */ >>>> if n is a near-integer, then k can become non-integer and negative. In your case, >>>> n == 4 - 1e-7 >>>> k == 4 >>>> n - k == -1e-7 < 4 >>>> n >= 0 >>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) >>>> so k gets set to >>>> n - k == -1e-7 >>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. >>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 >>>> [1] -9.289025e-11 >>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. >>> >>> I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true. >>> >> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n. > > That's only true if there's a big difference.  I'd be worried about cases where n and k are close to integers (within 1e-7).  In those cases, k is silently rounded to integer.  As I read your suggestion, n would only be rounded to integer if k > n-k.  I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k. > > I believe that lchoose(n, k) already does this. > > Duncan Murdoch > >>     double r, k0 = k; >>     k = R_forceint(k); >> ... >>     if (fabs(k - k0) > 1e-7) >>         MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k); >>   >>> Duncan Murdoch >>> >>>> -pd >>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote: >>>>> >>>>> This struck me as incorrect: >>>>> >>>>>> choose(3.999999, 4) >>>>> [1] 0.9999979 >>>>>> choose(3.9999999, 4) >>>>> [1] 0 >>>>>> choose(4, 4) >>>>> [1] 1 >>>>>> choose(4.0000001, 4) >>>>> [1] 4 >>>>>> choose(4.000001, 4) >>>>> [1] 1.000002 >>>>> >>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >>>>> >>>>> Thanks, >>>>> Erik >>>>> >>>>> *** >>>>> sessionInfo() >>>>> R version 3.6.0 beta (2019-04-15 r76395) >>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit) >>>>> Running under: macOS High Sierra 10.13.6 >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________________________ >>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html>>>>> and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: [hidden email]  Priv: [hidden email] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: [R] choose(n, k) as n approaches k

 At the risk of throwing oil on a fire.  If we are talking about fractional values of choose() doesn't it make sense to look to the gamma function for the correct analytic continuation?  In particular k<0 may not imply the function should evaluate to zero until we get k<=-1. Example: ``` r choose(5, 4) #> [1] 5 gchoose <- function(n, k) {   gamma(n+1)/(gamma(n+1-k) * gamma(k+1)) } gchoose(5, 4) #> [1] 5 gchoose(5, 0) #> [1] 1 gchoose(5, -0.5) #> [1] 0.2351727 ``` > On Jan 14, 2020, at 10:20 AM, peter dalgaard <[hidden email]> wrote: > > OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the > >        if (k <  0) return 0.; >        if (k == 0) return 1.; >        /* else: k >= 1 */ > > part, because at that point k is sure to be integer, possibly after rounding. > > It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the > >        return R_IS_INT(n) ? R_forceint(r) : r; > > part. > > -pd > > > >> On 14 Jan 2020, at 17:02 , Duncan Murdoch <[hidden email]> wrote: >> >> On 14/01/2020 10:50 a.m., peter dalgaard wrote: >>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <[hidden email]> wrote: >>>> >>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote: >>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel) >>>>> I think the culprit is here (in src/nmath/choose.c) >>>>>     if (k < k_small_max) { >>>>>        int j; >>>>>        if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >>>>>        if (k <  0) return 0.; >>>>>        if (k == 0) return 1.; >>>>>        /* else: k >= 1 */ >>>>> if n is a near-integer, then k can become non-integer and negative. In your case, >>>>> n == 4 - 1e-7 >>>>> k == 4 >>>>> n - k == -1e-7 < 4 >>>>> n >= 0 >>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) >>>>> so k gets set to >>>>> n - k == -1e-7 >>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. >>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 >>>>> [1] -9.289025e-11 >>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. >>>> >>>> I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true. >>>> >>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n. >> >> That's only true if there's a big difference.  I'd be worried about cases where n and k are close to integers (within 1e-7).  In those cases, k is silently rounded to integer.  As I read your suggestion, n would only be rounded to integer if k > n-k.  I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k. >> >> I believe that lchoose(n, k) already does this. >> >> Duncan Murdoch >> >>>    double r, k0 = k; >>>    k = R_forceint(k); >>> ... >>>    if (fabs(k - k0) > 1e-7) >>>        MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k); >>> >>>> Duncan Murdoch >>>> >>>>> -pd >>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote: >>>>>> >>>>>> This struck me as incorrect: >>>>>> >>>>>>> choose(3.999999, 4) >>>>>> [1] 0.9999979 >>>>>>> choose(3.9999999, 4) >>>>>> [1] 0 >>>>>>> choose(4, 4) >>>>>> [1] 1 >>>>>>> choose(4.0000001, 4) >>>>>> [1] 4 >>>>>>> choose(4.000001, 4) >>>>>> [1] 1.000002 >>>>>> >>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >>>>>> >>>>>> Thanks, >>>>>> Erik >>>>>> >>>>>> *** >>>>>> sessionInfo() >>>>>> R version 3.6.0 beta (2019-04-15 r76395) >>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit) >>>>>> Running under: macOS High Sierra 10.13.6 >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> ______________________________________________ >>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html>>>>>> and provide commented, minimal, self-contained, reproducible code. > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: [hidden email]  Priv: [hidden email] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel--------------- John Mount http://www.win-vector.com/  Our book: Practical Data Science with R http://practicaldatascience.com          [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: [R] choose(n, k) as n approaches k

Open this post in threaded view
|