

On 14/01/2020 10:07 a.m., peter dalgaard wrote:
> Yep, that looks wrong (probably want to continue discussion over on Rdevel)
>
> I think the culprit is here (in src/nmath/choose.c)
>
> if (k < k_small_max) {
> int j;
> if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
> if (k < 0) return 0.;
> if (k == 0) return 1.;
> /* else: k >= 1 */
>
> if n is a nearinteger, then k can become noninteger and negative. In your case,
>
> n == 4  1e7
> k == 4
> n  k == 1e7 < 4
> n >= 0
> R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
>
> so k gets set to
>
> n  k == 1e7
>
> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
>
>> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
> [1] 9.289025e11
>
> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
I think that would break symmetry: you want choose(n, k) to equal
choose(n, nk) when n is very close to an integer. So I'd suggest the
replacement whenever R_IS_INT(n) is true.
Duncan Murdoch
>
> pd
>
>
>
>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>>
>> This struck me as incorrect:
>>
>>> choose(3.999999, 4)
>> [1] 0.9999979
>>> choose(3.9999999, 4)
>> [1] 0
>>> choose(4, 4)
>> [1] 1
>>> choose(4.0000001, 4)
>> [1] 4
>>> choose(4.000001, 4)
>> [1] 1.000002
>>
>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>
>> Thanks,
>> Erik
>>
>> ***
>> sessionInfo()
>> R version 3.6.0 beta (20190415 r76395)
>> Platform: x86_64appledarwin15.6.0 (64bit)
>> Running under: macOS High Sierra 10.13.6
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/rhelp>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>> and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
if (k < 0) return 0.;
if (k == 0) return 1.;
/* else: k >= 1 */
part, because at that point k is sure to be integer, possibly after rounding.
It is when nk is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n1)(n2)... product has at least one factor). In the other cases, we get 1 or n(n1)(n2)...(nk+1) which if n is nearinteger gets rounded to produce an integer, due to the
return R_IS_INT(n) ? R_forceint(r) : r;
part.
pd
> On 14 Jan 2020, at 17:02 , Duncan Murdoch < [hidden email]> wrote:
>
> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch < [hidden email]> wrote:
>>>
>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>> Yep, that looks wrong (probably want to continue discussion over on Rdevel)
>>>> I think the culprit is here (in src/nmath/choose.c)
>>>> if (k < k_small_max) {
>>>> int j;
>>>> if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
>>>> if (k < 0) return 0.;
>>>> if (k == 0) return 1.;
>>>> /* else: k >= 1 */
>>>> if n is a nearinteger, then k can become noninteger and negative. In your case,
>>>> n == 4  1e7
>>>> k == 4
>>>> n  k == 1e7 < 4
>>>> n >= 0
>>>> R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
>>>> so k gets set to
>>>> n  k == 1e7
>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
>>>>> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
>>>> [1] 9.289025e11
>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
>>>
>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, nk) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>
>> But choose() very deliberately ensures that k is integer, so choose(n, nk) is illdefined for noninteger n.
>
> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > nk. I think both n and k should be rounded to integer in this nearinteger situation, regardless of the value of k.
>
> I believe that lchoose(n, k) already does this.
>
> Duncan Murdoch
>
>> double r, k0 = k;
>> k = R_forceint(k);
>> ...
>> if (fabs(k  k0) > 1e7)
>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>
>>> Duncan Murdoch
>>>
>>>> pd
>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>>>>>
>>>>> This struck me as incorrect:
>>>>>
>>>>>> choose(3.999999, 4)
>>>>> [1] 0.9999979
>>>>>> choose(3.9999999, 4)
>>>>> [1] 0
>>>>>> choose(4, 4)
>>>>> [1] 1
>>>>>> choose(4.0000001, 4)
>>>>> [1] 4
>>>>>> choose(4.000001, 4)
>>>>> [1] 1.000002
>>>>>
>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>
>>>>> Thanks,
>>>>> Erik
>>>>>
>>>>> ***
>>>>> sessionInfo()
>>>>> R version 3.6.0 beta (20190415 r76395)
>>>>> Platform: x86_64appledarwin15.6.0 (64bit)
>>>>> Running under: macOS High Sierra 10.13.6
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>>>>> and provide commented, minimal, selfcontained, reproducible code.

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


At the risk of throwing oil on a fire. If we are talking about fractional values of choose() doesn't it make sense to look to the gamma function for the correct analytic continuation? In particular k<0 may not imply the function should evaluate to zero until we get k<=1.
Example:
``` r
choose(5, 4)
#> [1] 5
gchoose < function(n, k) {
gamma(n+1)/(gamma(n+1k) * gamma(k+1))
}
gchoose(5, 4)
#> [1] 5
gchoose(5, 0)
#> [1] 1
gchoose(5, 0.5)
#> [1] 0.2351727
```
> On Jan 14, 2020, at 10:20 AM, peter dalgaard < [hidden email]> wrote:
>
> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>
> if (k < 0) return 0.;
> if (k == 0) return 1.;
> /* else: k >= 1 */
>
> part, because at that point k is sure to be integer, possibly after rounding.
>
> It is when nk is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n1)(n2)... product has at least one factor). In the other cases, we get 1 or n(n1)(n2)...(nk+1) which if n is nearinteger gets rounded to produce an integer, due to the
>
> return R_IS_INT(n) ? R_forceint(r) : r;
>
> part.
>
> pd
>
>
>
>> On 14 Jan 2020, at 17:02 , Duncan Murdoch < [hidden email]> wrote:
>>
>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch < [hidden email]> wrote:
>>>>
>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>> Yep, that looks wrong (probably want to continue discussion over on Rdevel)
>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>> if (k < k_small_max) {
>>>>> int j;
>>>>> if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
>>>>> if (k < 0) return 0.;
>>>>> if (k == 0) return 1.;
>>>>> /* else: k >= 1 */
>>>>> if n is a nearinteger, then k can become noninteger and negative. In your case,
>>>>> n == 4  1e7
>>>>> k == 4
>>>>> n  k == 1e7 < 4
>>>>> n >= 0
>>>>> R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
>>>>> so k gets set to
>>>>> n  k == 1e7
>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
>>>>>> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
>>>>> [1] 9.289025e11
>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
>>>>
>>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, nk) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>
>>> But choose() very deliberately ensures that k is integer, so choose(n, nk) is illdefined for noninteger n.
>>
>> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > nk. I think both n and k should be rounded to integer in this nearinteger situation, regardless of the value of k.
>>
>> I believe that lchoose(n, k) already does this.
>>
>> Duncan Murdoch
>>
>>> double r, k0 = k;
>>> k = R_forceint(k);
>>> ...
>>> if (fabs(k  k0) > 1e7)
>>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>
>>>> Duncan Murdoch
>>>>
>>>>> pd
>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>>>>>>
>>>>>> This struck me as incorrect:
>>>>>>
>>>>>>> choose(3.999999, 4)
>>>>>> [1] 0.9999979
>>>>>>> choose(3.9999999, 4)
>>>>>> [1] 0
>>>>>>> choose(4, 4)
>>>>>> [1] 1
>>>>>>> choose(4.0000001, 4)
>>>>>> [1] 4
>>>>>>> choose(4.000001, 4)
>>>>>> [1] 1.000002
>>>>>>
>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>
>>>>>> Thanks,
>>>>>> Erik
>>>>>>
>>>>>> ***
>>>>>> sessionInfo()
>>>>>> R version 3.6.0 beta (20190415 r76395)
>>>>>> Platform: x86_64appledarwin15.6.0 (64bit)
>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>>>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>>>>>> and provide commented, minimal, selfcontained, reproducible code.
>
> 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [hidden email] Priv: [hidden email]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel
John Mount
http://www.winvector.com/ < http://www.winvector.com/>
Our book: Practical Data Science with R
http://practicaldatascience.com < http://practicaldatascience.com/>
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


That crossed my mind too, but presumably someone designed choose() to handle the nearinteger cases specially. Otherwise, we already have beta()  you just need to remember what the connection is ;).
I would expect that it has to do with the binomial and negative binomial distributions, but I can't offhand picture a calculation that leads to integer k, n plus/minus a tiny numerical error of the sort that one may encounter with, say, seq().
pd
;) choose(a,b) = 1/(beta(ab+1,b+1)*(a+1)) or thereabouts
> On 14 Jan 2020, at 19:36 , John Mount < [hidden email]> wrote:
>
>
> At the risk of throwing oil on a fire. If we are talking about fractional values of choose() doesn't it make sense to look to the gamma function for the correct analytic continuation? In particular k<0 may not imply the function should evaluate to zero until we get k<=1.
>
> Example:
>
> ``` r
> choose(5, 4)
> #> [1] 5
>
> gchoose < function(n, k) {
> gamma(n+1)/(gamma(n+1k) * gamma(k+1))
> }
>
> gchoose(5, 4)
> #> [1] 5
> gchoose(5, 0)
> #> [1] 1
> gchoose(5, 0.5)
> #> [1] 0.2351727
> ```
>
>> On Jan 14, 2020, at 10:20 AM, peter dalgaard < [hidden email]> wrote:
>>
>> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>>
>> if (k < 0) return 0.;
>> if (k == 0) return 1.;
>> /* else: k >= 1 */
>>
>> part, because at that point k is sure to be integer, possibly after rounding.
>>
>> It is when nk is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n1)(n2)... product has at least one factor). In the other cases, we get 1 or n(n1)(n2)...(nk+1) which if n is nearinteger gets rounded to produce an integer, due to the
>>
>> return R_IS_INT(n) ? R_forceint(r) : r;
>>
>> part.
>>
>> pd
>>
>>
>>
>>> On 14 Jan 2020, at 17:02 , Duncan Murdoch < [hidden email]> wrote:
>>>
>>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch < [hidden email]> wrote:
>>>>>
>>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>>> Yep, that looks wrong (probably want to continue discussion over on Rdevel)
>>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>> if (k < k_small_max) {
>>>>>> int j;
>>>>>> if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
>>>>>> if (k < 0) return 0.;
>>>>>> if (k == 0) return 1.;
>>>>>> /* else: k >= 1 */
>>>>>> if n is a nearinteger, then k can become noninteger and negative. In your case,
>>>>>> n == 4  1e7
>>>>>> k == 4
>>>>>> n  k == 1e7 < 4
>>>>>> n >= 0
>>>>>> R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
>>>>>> so k gets set to
>>>>>> n  k == 1e7
>>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
>>>>>>> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
>>>>>> [1] 9.289025e11
>>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
>>>>>
>>>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, nk) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>>
>>>> But choose() very deliberately ensures that k is integer, so choose(n, nk) is illdefined for noninteger n.
>>>
>>> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > nk. I think both n and k should be rounded to integer in this nearinteger situation, regardless of the value of k.
>>>
>>> I believe that lchoose(n, k) already does this.
>>>
>>> Duncan Murdoch
>>>
>>>> double r, k0 = k;
>>>> k = R_forceint(k);
>>>> ...
>>>> if (fabs(k  k0) > 1e7)
>>>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>> pd
>>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>>>>>>>
>>>>>>> This struck me as incorrect:
>>>>>>>
>>>>>>>> choose(3.999999, 4)
>>>>>>> [1] 0.9999979
>>>>>>>> choose(3.9999999, 4)
>>>>>>> [1] 0
>>>>>>>> choose(4, 4)
>>>>>>> [1] 1
>>>>>>>> choose(4.0000001, 4)
>>>>>>> [1] 4
>>>>>>>> choose(4.000001, 4)
>>>>>>> [1] 1.000002
>>>>>>>
>>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Erik
>>>>>>>
>>>>>>> ***
>>>>>>> sessionInfo()
>>>>>>> R version 3.6.0 beta (20190415 r76395)
>>>>>>> Platform: x86_64appledarwin15.6.0 (64bit)
>>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>>>>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>>>>>>> and provide commented, minimal, selfcontained, reproducible code.
>>
>> 
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: [hidden email] Priv: [hidden email]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rdevel>
> 
> John Mount
> http://www.winvector.com/
> Our book: Practical Data Science with R
> http://practicaldatascience.com
>
>
>
>
>

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Um, n(n1)(n2)...(nk+1)/factorial(k), of course, but I think the argument still holds.
Also, note that there are overflow conditions involved with computing
n(n1)(n2)...(nk+1)/factorial(k)
as written, so it is computed in a loop with alternating multiply and divide steps. This introduces FP errors even if it is known that the result should be integer. I.e., we cannot remove the final "R_IS_INT(n) ? R_forceint(r) : r" if we want choose(n, k) to return an integer if n and k are integers.
pd
> On 14 Jan 2020, at 19:20 , peter dalgaard < [hidden email]> wrote:
>
> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>
> if (k < 0) return 0.;
> if (k == 0) return 1.;
> /* else: k >= 1 */
>
> part, because at that point k is sure to be integer, possibly after rounding.
>
> It is when nk is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n1)(n2)... product has at least one factor). In the other cases, we get 1 or n(n1)(n2)...(nk+1) which if n is nearinteger gets rounded to produce an integer, due to the
>
> return R_IS_INT(n) ? R_forceint(r) : r;
>
> part.
>
> pd
>
>
>
>> On 14 Jan 2020, at 17:02 , Duncan Murdoch < [hidden email]> wrote:
>>
>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch < [hidden email]> wrote:
>>>>
>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>> Yep, that looks wrong (probably want to continue discussion over on Rdevel)
>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>> if (k < k_small_max) {
>>>>> int j;
>>>>> if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
>>>>> if (k < 0) return 0.;
>>>>> if (k == 0) return 1.;
>>>>> /* else: k >= 1 */
>>>>> if n is a nearinteger, then k can become noninteger and negative. In your case,
>>>>> n == 4  1e7
>>>>> k == 4
>>>>> n  k == 1e7 < 4
>>>>> n >= 0
>>>>> R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
>>>>> so k gets set to
>>>>> n  k == 1e7
>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
>>>>>> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
>>>>> [1] 9.289025e11
>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
>>>>
>>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, nk) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>
>>> But choose() very deliberately ensures that k is integer, so choose(n, nk) is illdefined for noninteger n.
>>
>> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > nk. I think both n and k should be rounded to integer in this nearinteger situation, regardless of the value of k.
>>
>> I believe that lchoose(n, k) already does this.
>>
>> Duncan Murdoch
>>
>>> double r, k0 = k;
>>> k = R_forceint(k);
>>> ...
>>> if (fabs(k  k0) > 1e7)
>>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>
>>>> Duncan Murdoch
>>>>
>>>>> pd
>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>>>>>>
>>>>>> This struck me as incorrect:
>>>>>>
>>>>>>> choose(3.999999, 4)
>>>>>> [1] 0.9999979
>>>>>>> choose(3.9999999, 4)
>>>>>> [1] 0
>>>>>>> choose(4, 4)
>>>>>> [1] 1
>>>>>>> choose(4.0000001, 4)
>>>>>> [1] 4
>>>>>>> choose(4.000001, 4)
>>>>>> [1] 1.000002
>>>>>>
>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>
>>>>>> Thanks,
>>>>>> Erik
>>>>>>
>>>>>> ***
>>>>>> sessionInfo()
>>>>>> R version 3.6.0 beta (20190415 r76395)
>>>>>> Platform: x86_64appledarwin15.6.0 (64bit)
>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>>>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>>>>>> and provide commented, minimal, selfcontained, reproducible code.
>
> 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [hidden email] Priv: [hidden email]
>
>
>
>
>
>
>
>
>

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


More importantly, the gamma function is not nice for negative integer arguments, so gchoose() blows up for integer k and n=1,2,...
> gchoose(2,4)
[1] NaN
Warning messages:
1: In gamma(n + 1) : NaNs produced
2: In gamma(n + 1  k) : NaNs produced
> choose(2,4)
[1] 5
and so does the formula with beta().
Of course, choose(r, k) is exactly what you need for the traditional formulation of the negative binomial distribution as the number of successes to get r failures.
pd
> On 15 Jan 2020, at 01:25 , peter dalgaard < [hidden email]> wrote:
>
> That crossed my mind too, but presumably someone designed choose() to handle the nearinteger cases specially. Otherwise, we already have beta()  you just need to remember what the connection is ;).
>
> I would expect that it has to do with the binomial and negative binomial distributions, but I can't offhand picture a calculation that leads to integer k, n plus/minus a tiny numerical error of the sort that one may encounter with, say, seq().
>
> pd
>
> ;) choose(a,b) = 1/(beta(ab+1,b+1)*(a+1)) or thereabouts
>
>> On 14 Jan 2020, at 19:36 , John Mount < [hidden email]> wrote:
>>
>>
>> At the risk of throwing oil on a fire. If we are talking about fractional values of choose() doesn't it make sense to look to the gamma function for the correct analytic continuation? In particular k<0 may not imply the function should evaluate to zero until we get k<=1.
>>
>> Example:
>>
>> ``` r
>> choose(5, 4)
>> #> [1] 5
>>
>> gchoose < function(n, k) {
>> gamma(n+1)/(gamma(n+1k) * gamma(k+1))
>> }
>>
>> gchoose(5, 4)
>> #> [1] 5
>> gchoose(5, 0)
>> #> [1] 1
>> gchoose(5, 0.5)
>> #> [1] 0.2351727
>> ```
>>
>>> On Jan 14, 2020, at 10:20 AM, peter dalgaard < [hidden email]> wrote:
>>>
>>> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>>>
>>> if (k < 0) return 0.;
>>> if (k == 0) return 1.;
>>> /* else: k >= 1 */
>>>
>>> part, because at that point k is sure to be integer, possibly after rounding.
>>>
>>> It is when nk is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n1)(n2)... product has at least one factor). In the other cases, we get 1 or n(n1)(n2)...(nk+1) which if n is nearinteger gets rounded to produce an integer, due to the
>>>
>>> return R_IS_INT(n) ? R_forceint(r) : r;
>>>
>>> part.
>>>
>>> pd
>>>
>>>
>>>
>>>> On 14 Jan 2020, at 17:02 , Duncan Murdoch < [hidden email]> wrote:
>>>>
>>>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch < [hidden email]> wrote:
>>>>>>
>>>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>>>> Yep, that looks wrong (probably want to continue discussion over on Rdevel)
>>>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>>>> if (k < k_small_max) {
>>>>>>> int j;
>>>>>>> if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
>>>>>>> if (k < 0) return 0.;
>>>>>>> if (k == 0) return 1.;
>>>>>>> /* else: k >= 1 */
>>>>>>> if n is a nearinteger, then k can become noninteger and negative. In your case,
>>>>>>> n == 4  1e7
>>>>>>> k == 4
>>>>>>> n  k == 1e7 < 4
>>>>>>> n >= 0
>>>>>>> R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
>>>>>>> so k gets set to
>>>>>>> n  k == 1e7
>>>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
>>>>>>>> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
>>>>>>> [1] 9.289025e11
>>>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
>>>>>>
>>>>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, nk) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>>>
>>>>> But choose() very deliberately ensures that k is integer, so choose(n, nk) is illdefined for noninteger n.
>>>>
>>>> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > nk. I think both n and k should be rounded to integer in this nearinteger situation, regardless of the value of k.
>>>>
>>>> I believe that lchoose(n, k) already does this.
>>>>
>>>> Duncan Murdoch
>>>>
>>>>> double r, k0 = k;
>>>>> k = R_forceint(k);
>>>>> ...
>>>>> if (fabs(k  k0) > 1e7)
>>>>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>>> pd
>>>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>>>>>>>>
>>>>>>>> This struck me as incorrect:
>>>>>>>>
>>>>>>>>> choose(3.999999, 4)
>>>>>>>> [1] 0.9999979
>>>>>>>>> choose(3.9999999, 4)
>>>>>>>> [1] 0
>>>>>>>>> choose(4, 4)
>>>>>>>> [1] 1
>>>>>>>>> choose(4.0000001, 4)
>>>>>>>> [1] 4
>>>>>>>>> choose(4.000001, 4)
>>>>>>>> [1] 1.000002
>>>>>>>>
>>>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Erik
>>>>>>>>
>>>>>>>> ***
>>>>>>>> sessionInfo()
>>>>>>>> R version 3.6.0 beta (20190415 r76395)
>>>>>>>> Platform: x86_64appledarwin15.6.0 (64bit)
>>>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>>>>>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>>>>>>>> and provide commented, minimal, selfcontained, reproducible code.
>>>
>>> 
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: [hidden email] Priv: [hidden email]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/rdevel>>
>> 
>> John Mount
>> http://www.winvector.com/
>> Our book: Practical Data Science with R
>> http://practicaldatascience.com
>>
>>
>>
>>
>>
>
> 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [hidden email] Priv: [hidden email]
>
>
>
>
>
>
>
>
>

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Fix committed to Rdevel w/regression test. I settled for just doing
k = R_forceint(n  k)
inside
if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
so that k stays integer.
In principle, you also could prefix this code
r = n;
for(j = 2; j <= k; j++)
r *= (nj+1)/j;
return R_IS_INT(n) ? R_forceint(r) : r;
/* might have got rounding errors */
with
if(R_IS_INT(n)) n = R_forceint(n);
but as I said, I believe that is really a noop because of the rounding at the end.
pd
> On 14 Jan 2020, at 19:20 , peter dalgaard < [hidden email]> wrote:
>
> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the
>
> if (k < 0) return 0.;
> if (k == 0) return 1.;
> /* else: k >= 1 */
>
> part, because at that point k is sure to be integer, possibly after rounding.
>
> It is when nk is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n1)(n2)... product has at least one factor). In the other cases, we get 1 or n(n1)(n2)...(nk+1) which if n is nearinteger gets rounded to produce an integer, due to the
>
> return R_IS_INT(n) ? R_forceint(r) : r;
>
> part.
>
> pd
>
>
>
>> On 14 Jan 2020, at 17:02 , Duncan Murdoch < [hidden email]> wrote:
>>
>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch < [hidden email]> wrote:
>>>>
>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>> Yep, that looks wrong (probably want to continue discussion over on Rdevel)
>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>> if (k < k_small_max) {
>>>>> int j;
>>>>> if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
>>>>> if (k < 0) return 0.;
>>>>> if (k == 0) return 1.;
>>>>> /* else: k >= 1 */
>>>>> if n is a nearinteger, then k can become noninteger and negative. In your case,
>>>>> n == 4  1e7
>>>>> k == 4
>>>>> n  k == 1e7 < 4
>>>>> n >= 0
>>>>> R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
>>>>> so k gets set to
>>>>> n  k == 1e7
>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
>>>>>> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
>>>>> [1] 9.289025e11
>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
>>>>
>>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, nk) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>>>
>>> But choose() very deliberately ensures that k is integer, so choose(n, nk) is illdefined for noninteger n.
>>
>> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > nk. I think both n and k should be rounded to integer in this nearinteger situation, regardless of the value of k.
>>
>> I believe that lchoose(n, k) already does this.
>>
>> Duncan Murdoch
>>
>>> double r, k0 = k;
>>> k = R_forceint(k);
>>> ...
>>> if (fabs(k  k0) > 1e7)
>>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>>>
>>>> Duncan Murdoch
>>>>
>>>>> pd
>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>>>>>>
>>>>>> This struck me as incorrect:
>>>>>>
>>>>>>> choose(3.999999, 4)
>>>>>> [1] 0.9999979
>>>>>>> choose(3.9999999, 4)
>>>>>> [1] 0
>>>>>>> choose(4, 4)
>>>>>> [1] 1
>>>>>>> choose(4.0000001, 4)
>>>>>> [1] 4
>>>>>>> choose(4.000001, 4)
>>>>>> [1] 1.000002
>>>>>>
>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>>>
>>>>>> Thanks,
>>>>>> Erik
>>>>>>
>>>>>> ***
>>>>>> sessionInfo()
>>>>>> R version 3.6.0 beta (20190415 r76395)
>>>>>> Platform: x86_64appledarwin15.6.0 (64bit)
>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>>>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>>>>>> and provide commented, minimal, selfcontained, reproducible code.
>
> 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: [hidden email] Priv: [hidden email]
>
>
>
>
>
>
>
>
>

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel

