

This struck me as incorrect:
> choose(3.999999, 4)
[1] 0.9999979
> choose(3.9999999, 4)
[1] 0
> choose(4, 4)
[1] 1
> choose(4.0000001, 4)
[1] 4
> choose(4.000001, 4)
[1] 1.000002
Should base::choose(n, k) check whether n is within machine precision of k and return 1?
Thanks,
Erik
***
sessionInfo()
R version 3.6.0 beta (20190415 r76395)
Platform: x86_64appledarwin15.6.0 (64bit)
Running under: macOS High Sierra 10.13.6
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On 13/01/2020 6:33 p.m., Wright, Erik Scott wrote:
> This struck me as incorrect:
>
>> choose(3.999999, 4)
> [1] 0.9999979
>> choose(3.9999999, 4)
> [1] 0
>> choose(4, 4)
> [1] 1
>> choose(4.0000001, 4)
> [1] 4
>> choose(4.000001, 4)
> [1] 1.000002
>
> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
I don't think that's the solution. The current code checks whether n is
within 1e7 of an integer; if it is and nk is smaller than k, it
computes choose(n, nk) instead. The problem in your second example is
that nk < 0 which implies the answer should be zero. In the 4th
example nk > 0 but it is not an integer; the code rounds k to an
integer, but the transformation to nk happens after that, so the code
ends up working with a noninteger.
I think a solution would be to force n to be an integer if it is very
close to one.
I note that the source to lchoose() seems to already do this: it
handles your examples nicely.
Duncan Murdoch
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Yep, that looks wrong (probably want to continue discussion over on Rdevel)
I think the culprit is here (in src/nmath/choose.c)
if (k < k_small_max) {
int j;
if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
if (k < 0) return 0.;
if (k == 0) return 1.;
/* else: k >= 1 */
if n is a nearinteger, then k can become noninteger and negative. In your case,
n == 4  1e7
k == 4
n  k == 1e7 < 4
n >= 0
R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
so k gets set to
n  k == 1e7
which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
[1] 9.289025e11
I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
pd
> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>
> This struck me as incorrect:
>
>> choose(3.999999, 4)
> [1] 0.9999979
>> choose(3.9999999, 4)
> [1] 0
>> choose(4, 4)
> [1] 1
>> choose(4.0000001, 4)
> [1] 4
>> choose(4.000001, 4)
> [1] 1.000002
>
> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>
> Thanks,
> Erik
>
> ***
> sessionInfo()
> R version 3.6.0 beta (20190415 r76395)
> Platform: x86_64appledarwin15.6.0 (64bit)
> Running under: macOS High Sierra 10.13.6
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


> On 14 Jan 2020, at 16:21 , Duncan Murdoch < [hidden email]> wrote:
>
> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>> Yep, that looks wrong (probably want to continue discussion over on Rdevel)
>> I think the culprit is here (in src/nmath/choose.c)
>> if (k < k_small_max) {
>> int j;
>> if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
>> if (k < 0) return 0.;
>> if (k == 0) return 1.;
>> /* else: k >= 1 */
>> if n is a nearinteger, then k can become noninteger and negative. In your case,
>> n == 4  1e7
>> k == 4
>> n  k == 1e7 < 4
>> n >= 0
>> R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
>> so k gets set to
>> n  k == 1e7
>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
>>> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
>> [1] 9.289025e11
>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
>
> I think that would break symmetry: you want choose(n, k) to equal choose(n, nk) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true.
>
But choose() very deliberately ensures that k is integer, so choose(n, nk) is illdefined for noninteger n.
double r, k0 = k;
k = R_forceint(k);
...
if (fabs(k  k0) > 1e7)
MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
> Duncan Murdoch
>
>> pd
>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>>>
>>> This struck me as incorrect:
>>>
>>>> choose(3.999999, 4)
>>> [1] 0.9999979
>>>> choose(3.9999999, 4)
>>> [1] 0
>>>> choose(4, 4)
>>> [1] 1
>>>> choose(4.0000001, 4)
>>> [1] 4
>>>> choose(4.000001, 4)
>>> [1] 1.000002
>>>
>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>
>>> Thanks,
>>> Erik
>>>
>>> ***
>>> sessionInfo()
>>> R version 3.6.0 beta (20190415 r76395)
>>> Platform: x86_64appledarwin15.6.0 (64bit)
>>> Running under: macOS High Sierra 10.13.6
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>>> and provide commented, minimal, selfcontained, reproducible code.

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>
>
>> On 14 Jan 2020, at 16:21 , Duncan Murdoch < [hidden email]> wrote:
>>
>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>> Yep, that looks wrong (probably want to continue discussion over on Rdevel)
>>> I think the culprit is here (in src/nmath/choose.c)
>>> if (k < k_small_max) {
>>> int j;
>>> if(nk < k && n >= 0 && R_IS_INT(n)) k = nk; /* < Symmetry */
>>> if (k < 0) return 0.;
>>> if (k == 0) return 1.;
>>> /* else: k >= 1 */
>>> if n is a nearinteger, then k can become noninteger and negative. In your case,
>>> n == 4  1e7
>>> k == 4
>>> n  k == 1e7 < 4
>>> n >= 0
>>> R_IS_INT(n) = TRUE (relative diff < 1e7 is allowed)
>>> so k gets set to
>>> n  k == 1e7
>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n > 4, e.g.
>>>> factorial(4  1e10)/factorial(1e10)/factorial(4) 1
>>> [1] 9.289025e11
>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n  k step.
>>
>> I think that would break symmetry: you want choose(n, k) to equal choose(n, nk) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true.
>>
>
> But choose() very deliberately ensures that k is integer, so choose(n, nk) is illdefined for noninteger n.
That's only true if there's a big difference. I'd be worried about
cases where n and k are close to integers (within 1e7). In those
cases, k is silently rounded to integer. As I read your suggestion, n
would only be rounded to integer if k > nk. I think both n and k
should be rounded to integer in this nearinteger situation, regardless
of the value of k.
I believe that lchoose(n, k) already does this.
Duncan Murdoch
>
> double r, k0 = k;
> k = R_forceint(k);
> ...
> if (fabs(k  k0) > 1e7)
> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);
>
>
>> Duncan Murdoch
>>
>>> pd
>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott < [hidden email]> wrote:
>>>>
>>>> This struck me as incorrect:
>>>>
>>>>> choose(3.999999, 4)
>>>> [1] 0.9999979
>>>>> choose(3.9999999, 4)
>>>> [1] 0
>>>>> choose(4, 4)
>>>> [1] 1
>>>>> choose(4.0000001, 4)
>>>> [1] 4
>>>>> choose(4.000001, 4)
>>>> [1] 1.000002
>>>>
>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1?
>>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>> ***
>>>> sessionInfo()
>>>> R version 3.6.0 beta (20190415 r76395)
>>>> Platform: x86_64appledarwin15.6.0 (64bit)
>>>> Running under: macOS High Sierra 10.13.6
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>>>> and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

