Re: [R] choose(n, k) as n approaches k

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: [R] choose(n, k) as n approaches k

Duncan Murdoch-2
On 14/01/2020 10:07 a.m., peter dalgaard wrote:

> Yep, that looks wrong (probably want to continue discussion over on R-devel)
>
> I think the culprit is here (in src/nmath/choose.c)
>  
>     if (k < k_small_max) {
>          int j;
>          if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>          if (k <  0) return 0.;
>          if (k == 0) return 1.;
>          /* else: k >= 1 */
>
> if n is a near-integer, then k can become non-integer and negative. In your case,
>
> n == 4 - 1e-7
> k == 4
> n - k == -1e-7 < 4
> n >= 0
> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>
> so k gets set to
>
> n - k == -1e-7
>
> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.
>
>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
> [1] -9.289025e-11
>
> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.

I think that would break symmetry:  you want choose(n, k) to equal
choose(n, n-k) when n is very close to an integer.  So I'd suggest the
replacement whenever R_IS_INT(n) is true.

Duncan Murdoch

>
> -pd
>
>
>
>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote:
>>
>> This struck me as incorrect:
>>
>>> choose(3.999999, 4)
>> [1] 0.9999979
>>> choose(3.9999999, 4)
>> [1] 0
>>> choose(4, 4)
>> [1] 1
>>> choose(4.0000001, 4)
>> [1] 4
>>> choose(4.000001, 4)
>> [1] 1.000002
>>
>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>
>> Thanks,
>> Erik
>>
>> ***
>> sessionInfo()
>> R version 3.6.0 beta (2019-04-15 r76395)
>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>> Running under: macOS High Sierra 10.13.6
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [R] choose(n, k) as n approaches k

Peter Dalgaard-2
OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the

        if (k <  0) return 0.;
        if (k == 0) return 1.;
        /* else: k >= 1 */

part, because at that point k is sure to be integer, possibly after rounding.

It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the

        return R_IS_INT(n) ? R_forceint(r) : r;

part.

-pd



> On 14 Jan 2020, at 17:02 , Duncan Murdoch <[hidden email]> wrote:
>
> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <[hidden email]> wrote:
>>>
>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel)
>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>      if (k < k_small_max) {
>>>>         int j;
>>>>         if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>         if (k <  0) return 0.;
>>>>         if (k == 0) return 1.;
>>>>         /* else: k >= 1 */
>>>> if n is a near-integer, then k can become non-integer and negative. In your case,
>>>> n == 4 - 1e-7
>>>> k == 4
>>>> n - k == -1e-7 < 4
>>>> n >= 0
>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>> so k gets set to
>>>> n - k == -1e-7
>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>> [1] -9.289025e-11
>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.
>>>
>>> I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>
>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n.
>
> That's only true if there's a big difference.  I'd be worried about cases where n and k are close to integers (within 1e-7).  In those cases, k is silently rounded to integer.  As I read your suggestion, n would only be rounded to integer if k > n-k.  I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k.
>
> I believe that lchoose(n, k) already does this.
>
> Duncan Murdoch
>
>>     double r, k0 = k;
>>     k = R_forceint(k);
>> ...
>>     if (fabs(k - k0) > 1e-7)
>>         MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>  
>>> Duncan Murdoch
>>>
>>>> -pd
>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote:
>>>>>
>>>>> This struck me as incorrect:
>>>>>
>>>>>> choose(3.999999, 4)
>>>>> [1] 0.9999979
>>>>>> choose(3.9999999, 4)
>>>>> [1] 0
>>>>>> choose(4, 4)
>>>>> [1] 1
>>>>>> choose(4.0000001, 4)
>>>>> [1] 4
>>>>>> choose(4.000001, 4)
>>>>> [1] 1.000002
>>>>>
>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>
>>>>> Thanks,
>>>>> Erik
>>>>>
>>>>> ***
>>>>> sessionInfo()
>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>> Running under: macOS High Sierra 10.13.6
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [R] choose(n, k) as n approaches k

John Mount

At the risk of throwing oil on a fire.  If we are talking about fractional values of choose() doesn't it make sense to look to the gamma function for the correct analytic continuation?  In particular k<0 may not imply the function should evaluate to zero until we get k<=-1.

Example:

``` r
choose(5, 4)
#> [1] 5

gchoose <- function(n, k) {
  gamma(n+1)/(gamma(n+1-k) * gamma(k+1))
}

gchoose(5, 4)
#> [1] 5
gchoose(5, 0)
#> [1] 1
gchoose(5, -0.5)
#> [1] 0.2351727
```

> On Jan 14, 2020, at 10:20 AM, peter dalgaard <[hidden email]> wrote:
>
> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>
>        if (k <  0) return 0.;
>        if (k == 0) return 1.;
>        /* else: k >= 1 */
>
> part, because at that point k is sure to be integer, possibly after rounding.
>
> It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the
>
>        return R_IS_INT(n) ? R_forceint(r) : r;
>
> part.
>
> -pd
>
>
>
>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <[hidden email]> wrote:
>>
>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <[hidden email]> wrote:
>>>>
>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel)
>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>     if (k < k_small_max) {
>>>>>        int j;
>>>>>        if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>>        if (k <  0) return 0.;
>>>>>        if (k == 0) return 1.;
>>>>>        /* else: k >= 1 */
>>>>> if n is a near-integer, then k can become non-integer and negative. In your case,
>>>>> n == 4 - 1e-7
>>>>> k == 4
>>>>> n - k == -1e-7 < 4
>>>>> n >= 0
>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>>> so k gets set to
>>>>> n - k == -1e-7
>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>>> [1] -9.289025e-11
>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.
>>>>
>>>> I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>
>>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n.
>>
>> That's only true if there's a big difference.  I'd be worried about cases where n and k are close to integers (within 1e-7).  In those cases, k is silently rounded to integer.  As I read your suggestion, n would only be rounded to integer if k > n-k.  I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k.
>>
>> I believe that lchoose(n, k) already does this.
>>
>> Duncan Murdoch
>>
>>>    double r, k0 = k;
>>>    k = R_forceint(k);
>>> ...
>>>    if (fabs(k - k0) > 1e-7)
>>>        MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>
>>>> Duncan Murdoch
>>>>
>>>>> -pd
>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote:
>>>>>>
>>>>>> This struck me as incorrect:
>>>>>>
>>>>>>> choose(3.999999, 4)
>>>>>> [1] 0.9999979
>>>>>>> choose(3.9999999, 4)
>>>>>> [1] 0
>>>>>>> choose(4, 4)
>>>>>> [1] 1
>>>>>>> choose(4.0000001, 4)
>>>>>> [1] 4
>>>>>>> choose(4.000001, 4)
>>>>>> [1] 1.000002
>>>>>>
>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>
>>>>>> Thanks,
>>>>>> Erik
>>>>>>
>>>>>> ***
>>>>>> sessionInfo()
>>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [hidden email]  Priv: [hidden email]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

---------------
John Mount
http://www.win-vector.com/ <http://www.win-vector.com/>
Our book: Practical Data Science with R
http://practicaldatascience.com <http://practicaldatascience.com/>






        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [R] choose(n, k) as n approaches k

Peter Dalgaard-2
That crossed my mind too, but presumably someone designed choose() to handle the near-integer cases specially. Otherwise, we already have beta() -- you just need to remember what the connection is ;-).

I would expect that it has to do with the binomial and negative binomial distributions, but I can't offhand picture a calculation that leads to integer k, n plus/minus a tiny numerical error of the sort that one may encounter with, say, seq().

-pd

;-) choose(a,b) = 1/(beta(a-b+1,b+1)*(a+1)) or thereabouts

> On 14 Jan 2020, at 19:36 , John Mount <[hidden email]> wrote:
>
>
> At the risk of throwing oil on a fire.  If we are talking about fractional values of choose() doesn't it make sense to look to the gamma function for the correct analytic continuation?  In particular k<0 may not imply the function should evaluate to zero until we get k<=-1.
>
> Example:
>
> ``` r
> choose(5, 4)
> #> [1] 5
>
> gchoose <- function(n, k) {
>   gamma(n+1)/(gamma(n+1-k) * gamma(k+1))
> }
>
> gchoose(5, 4)
> #> [1] 5
> gchoose(5, 0)
> #> [1] 1
> gchoose(5, -0.5)
> #> [1] 0.2351727
> ```
>
>> On Jan 14, 2020, at 10:20 AM, peter dalgaard <[hidden email]> wrote:
>>
>> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>>
>>        if (k <  0) return 0.;
>>        if (k == 0) return 1.;
>>        /* else: k >= 1 */
>>
>> part, because at that point k is sure to be integer, possibly after rounding.
>>
>> It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the
>>
>>        return R_IS_INT(n) ? R_forceint(r) : r;
>>
>> part.
>>
>> -pd
>>
>>
>>
>>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <[hidden email]> wrote:
>>>
>>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <[hidden email]> wrote:
>>>>>
>>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel)
>>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>>     if (k < k_small_max) {
>>>>>>        int j;
>>>>>>        if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>>>        if (k <  0) return 0.;
>>>>>>        if (k == 0) return 1.;
>>>>>>        /* else: k >= 1 */
>>>>>> if n is a near-integer, then k can become non-integer and negative. In your case,
>>>>>> n == 4 - 1e-7
>>>>>> k == 4
>>>>>> n - k == -1e-7 < 4
>>>>>> n >= 0
>>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>>>> so k gets set to
>>>>>> n - k == -1e-7
>>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>>>> [1] -9.289025e-11
>>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.
>>>>>
>>>>> I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>>
>>>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n.
>>>
>>> That's only true if there's a big difference.  I'd be worried about cases where n and k are close to integers (within 1e-7).  In those cases, k is silently rounded to integer.  As I read your suggestion, n would only be rounded to integer if k > n-k.  I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k.
>>>
>>> I believe that lchoose(n, k) already does this.
>>>
>>> Duncan Murdoch
>>>
>>>>    double r, k0 = k;
>>>>    k = R_forceint(k);
>>>> ...
>>>>    if (fabs(k - k0) > 1e-7)
>>>>        MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>> -pd
>>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote:
>>>>>>>
>>>>>>> This struck me as incorrect:
>>>>>>>
>>>>>>>> choose(3.999999, 4)
>>>>>>> [1] 0.9999979
>>>>>>>> choose(3.9999999, 4)
>>>>>>> [1] 0
>>>>>>>> choose(4, 4)
>>>>>>> [1] 1
>>>>>>>> choose(4.0000001, 4)
>>>>>>> [1] 4
>>>>>>>> choose(4.000001, 4)
>>>>>>> [1] 1.000002
>>>>>>>
>>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Erik
>>>>>>>
>>>>>>> ***
>>>>>>> sessionInfo()
>>>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: [hidden email]  Priv: [hidden email]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ---------------
> John Mount
> http://www.win-vector.com/ 
> Our book: Practical Data Science with R
> http://practicaldatascience.com 
>
>
>
>
>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [R] choose(n, k) as n approaches k

Peter Dalgaard-2
In reply to this post by Peter Dalgaard-2
Um, n(n-1)(n-2)...(n-k+1)/factorial(k), of course, but I think the argument still holds.

Also, note that there are overflow conditions involved with computing

n(n-1)(n-2)...(n-k+1)/factorial(k)
 
as written, so it is computed in a loop with alternating multiply and divide steps. This introduces FP errors even if it is known that the result should be integer. I.e., we cannot remove the final "R_IS_INT(n) ? R_forceint(r) : r" if we want choose(n, k) to return an integer if n and k are integers.

-pd

> On 14 Jan 2020, at 19:20 , peter dalgaard <[hidden email]> wrote:
>
> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>
>        if (k <  0) return 0.;
>        if (k == 0) return 1.;
>        /* else: k >= 1 */
>
> part, because at that point k is sure to be integer, possibly after rounding.
>
> It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the
>
>        return R_IS_INT(n) ? R_forceint(r) : r;
>
> part.
>
> -pd
>
>
>
>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <[hidden email]> wrote:
>>
>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <[hidden email]> wrote:
>>>>
>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel)
>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>     if (k < k_small_max) {
>>>>>        int j;
>>>>>        if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>>        if (k <  0) return 0.;
>>>>>        if (k == 0) return 1.;
>>>>>        /* else: k >= 1 */
>>>>> if n is a near-integer, then k can become non-integer and negative. In your case,
>>>>> n == 4 - 1e-7
>>>>> k == 4
>>>>> n - k == -1e-7 < 4
>>>>> n >= 0
>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>>> so k gets set to
>>>>> n - k == -1e-7
>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>>> [1] -9.289025e-11
>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.
>>>>
>>>> I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>
>>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n.
>>
>> That's only true if there's a big difference.  I'd be worried about cases where n and k are close to integers (within 1e-7).  In those cases, k is silently rounded to integer.  As I read your suggestion, n would only be rounded to integer if k > n-k.  I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k.
>>
>> I believe that lchoose(n, k) already does this.
>>
>> Duncan Murdoch
>>
>>>    double r, k0 = k;
>>>    k = R_forceint(k);
>>> ...
>>>    if (fabs(k - k0) > 1e-7)
>>>        MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>
>>>> Duncan Murdoch
>>>>
>>>>> -pd
>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote:
>>>>>>
>>>>>> This struck me as incorrect:
>>>>>>
>>>>>>> choose(3.999999, 4)
>>>>>> [1] 0.9999979
>>>>>>> choose(3.9999999, 4)
>>>>>> [1] 0
>>>>>>> choose(4, 4)
>>>>>> [1] 1
>>>>>>> choose(4.0000001, 4)
>>>>>> [1] 4
>>>>>>> choose(4.000001, 4)
>>>>>> [1] 1.000002
>>>>>>
>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>
>>>>>> Thanks,
>>>>>> Erik
>>>>>>
>>>>>> ***
>>>>>> sessionInfo()
>>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [hidden email]  Priv: [hidden email]
>
>
>
>
>
>
>
>
>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [R] choose(n, k) as n approaches k

Peter Dalgaard-2
In reply to this post by Peter Dalgaard-2

More importantly, the gamma function is not nice for negative integer arguments, so gchoose() blows up for integer k and n=-1,-2,...

> gchoose(-2,4)
[1] NaN
Warning messages:
1: In gamma(n + 1) : NaNs produced
2: In gamma(n + 1 - k) : NaNs produced
> choose(-2,4)
[1] 5

and so does the formula with beta().

Of course, choose(-r, k) is exactly what you need for the traditional formulation of the negative binomial distribution as the number of successes to get r failures.

-pd

> On 15 Jan 2020, at 01:25 , peter dalgaard <[hidden email]> wrote:
>
> That crossed my mind too, but presumably someone designed choose() to handle the near-integer cases specially. Otherwise, we already have beta() -- you just need to remember what the connection is ;-).
>
> I would expect that it has to do with the binomial and negative binomial distributions, but I can't offhand picture a calculation that leads to integer k, n plus/minus a tiny numerical error of the sort that one may encounter with, say, seq().
>
> -pd
>
> ;-) choose(a,b) = 1/(beta(a-b+1,b+1)*(a+1)) or thereabouts
>
>> On 14 Jan 2020, at 19:36 , John Mount <[hidden email]> wrote:
>>
>>
>> At the risk of throwing oil on a fire.  If we are talking about fractional values of choose() doesn't it make sense to look to the gamma function for the correct analytic continuation?  In particular k<0 may not imply the function should evaluate to zero until we get k<=-1.
>>
>> Example:
>>
>> ``` r
>> choose(5, 4)
>> #> [1] 5
>>
>> gchoose <- function(n, k) {
>>  gamma(n+1)/(gamma(n+1-k) * gamma(k+1))
>> }
>>
>> gchoose(5, 4)
>> #> [1] 5
>> gchoose(5, 0)
>> #> [1] 1
>> gchoose(5, -0.5)
>> #> [1] 0.2351727
>> ```
>>
>>> On Jan 14, 2020, at 10:20 AM, peter dalgaard <[hidden email]> wrote:
>>>
>>> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>>>
>>>       if (k <  0) return 0.;
>>>       if (k == 0) return 1.;
>>>       /* else: k >= 1 */
>>>
>>> part, because at that point k is sure to be integer, possibly after rounding.
>>>
>>> It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the
>>>
>>>       return R_IS_INT(n) ? R_forceint(r) : r;
>>>
>>> part.
>>>
>>> -pd
>>>
>>>
>>>
>>>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <[hidden email]> wrote:
>>>>
>>>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <[hidden email]> wrote:
>>>>>>
>>>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel)
>>>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>>>    if (k < k_small_max) {
>>>>>>>       int j;
>>>>>>>       if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>>>>       if (k <  0) return 0.;
>>>>>>>       if (k == 0) return 1.;
>>>>>>>       /* else: k >= 1 */
>>>>>>> if n is a near-integer, then k can become non-integer and negative. In your case,
>>>>>>> n == 4 - 1e-7
>>>>>>> k == 4
>>>>>>> n - k == -1e-7 < 4
>>>>>>> n >= 0
>>>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>>>>> so k gets set to
>>>>>>> n - k == -1e-7
>>>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>>>>> [1] -9.289025e-11
>>>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.
>>>>>>
>>>>>> I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>>>
>>>>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n.
>>>>
>>>> That's only true if there's a big difference.  I'd be worried about cases where n and k are close to integers (within 1e-7).  In those cases, k is silently rounded to integer.  As I read your suggestion, n would only be rounded to integer if k > n-k.  I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k.
>>>>
>>>> I believe that lchoose(n, k) already does this.
>>>>
>>>> Duncan Murdoch
>>>>
>>>>>   double r, k0 = k;
>>>>>   k = R_forceint(k);
>>>>> ...
>>>>>   if (fabs(k - k0) > 1e-7)
>>>>>       MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>>> -pd
>>>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> This struck me as incorrect:
>>>>>>>>
>>>>>>>>> choose(3.999999, 4)
>>>>>>>> [1] 0.9999979
>>>>>>>>> choose(3.9999999, 4)
>>>>>>>> [1] 0
>>>>>>>>> choose(4, 4)
>>>>>>>> [1] 1
>>>>>>>>> choose(4.0000001, 4)
>>>>>>>> [1] 4
>>>>>>>>> choose(4.000001, 4)
>>>>>>>> [1] 1.000002
>>>>>>>>
>>>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Erik
>>>>>>>>
>>>>>>>> ***
>>>>>>>> sessionInfo()
>>>>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> --
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: [hidden email]  Priv: [hidden email]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> ---------------
>> John Mount
>> http://www.win-vector.com/ 
>> Our book: Practical Data Science with R
>> http://practicaldatascience.com 
>>
>>
>>
>>
>>
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [hidden email]  Priv: [hidden email]
>
>
>
>
>
>
>
>
>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [R] choose(n, k) as n approaches k

Peter Dalgaard-2
In reply to this post by Peter Dalgaard-2

Fix committed to R-devel w/regression test. I settled for just doing

k = R_forceint(n - k)

inside

if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */

so that k stays integer.

In principle, you also could prefix this code

        r = n;
        for(j = 2; j <= k; j++)
            r *= (n-j+1)/j;
        return R_IS_INT(n) ? R_forceint(r) : r;
        /* might have got rounding errors */
with

        if(R_IS_INT(n)) n = R_forceint(n);

but as I said, I believe that is really a no-op because of the rounding at the end.

-pd



> On 14 Jan 2020, at 19:20 , peter dalgaard <[hidden email]> wrote:
>
> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>
>        if (k <  0) return 0.;
>        if (k == 0) return 1.;
>        /* else: k >= 1 */
>
> part, because at that point k is sure to be integer, possibly after rounding.
>
> It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the
>
>        return R_IS_INT(n) ? R_forceint(r) : r;
>
> part.
>
> -pd
>
>
>
>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <[hidden email]> wrote:
>>
>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <[hidden email]> wrote:
>>>>
>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel)
>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>     if (k < k_small_max) {
>>>>>        int j;
>>>>>        if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>>        if (k <  0) return 0.;
>>>>>        if (k == 0) return 1.;
>>>>>        /* else: k >= 1 */
>>>>> if n is a near-integer, then k can become non-integer and negative. In your case,
>>>>> n == 4 - 1e-7
>>>>> k == 4
>>>>> n - k == -1e-7 < 4
>>>>> n >= 0
>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>>> so k gets set to
>>>>> n - k == -1e-7
>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>>> [1] -9.289025e-11
>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.
>>>>
>>>> I think that would break symmetry:  you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer.  So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>
>>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n.
>>
>> That's only true if there's a big difference.  I'd be worried about cases where n and k are close to integers (within 1e-7).  In those cases, k is silently rounded to integer.  As I read your suggestion, n would only be rounded to integer if k > n-k.  I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k.
>>
>> I believe that lchoose(n, k) already does this.
>>
>> Duncan Murdoch
>>
>>>    double r, k0 = k;
>>>    k = R_forceint(k);
>>> ...
>>>    if (fabs(k - k0) > 1e-7)
>>>        MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>
>>>> Duncan Murdoch
>>>>
>>>>> -pd
>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <[hidden email]> wrote:
>>>>>>
>>>>>> This struck me as incorrect:
>>>>>>
>>>>>>> choose(3.999999, 4)
>>>>>> [1] 0.9999979
>>>>>>> choose(3.9999999, 4)
>>>>>> [1] 0
>>>>>>> choose(4, 4)
>>>>>> [1] 1
>>>>>>> choose(4.0000001, 4)
>>>>>> [1] 4
>>>>>>> choose(4.000001, 4)
>>>>>> [1] 1.000002
>>>>>>
>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>
>>>>>> Thanks,
>>>>>> Erik
>>>>>>
>>>>>> ***
>>>>>> sessionInfo()
>>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [hidden email]  Priv: [hidden email]
>
>
>
>
>
>
>
>
>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel