A different error in sample()

Duncan Murdoch-2
This may be a doc error or a coding bug.

The help page for sample says:

"Non-integer positive numerical values of n or x will be truncated to
the next smallest integer, which has to be no larger than
.Machine$integer.max."

This is not true:

 > table(sample(2.5, 1000000, replace = TRUE))

      1      2      3
399933 399716 200351

We shouldn't have those 3's if truncation of x had occurred.
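
For comparison, a minimal sketch of what the documented behaviour would look like if x is truncated explicitly with trunc() before the call. The seed and the sample size are arbitrary choices for illustration and are not part of the original report.

```r
# Truncate x explicitly, then sample: only 1 and 2 can occur.
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- 2.5
draws <- sample(trunc(x), 1e5, replace = TRUE)
print(table(draws))
```

With the truncation done up front, the table has no column for 3, which is what the help page wording leads one to expect.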

Duncan Murdoch

 > sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: A different error in sample()

Joris FA Meys
I believe the word "truncated" is causing the confusion. 3 is "the next
smallest integer" following 2.5. But it is not the truncation done by
trunc(). Rewording to "rounded to the next smallest integer" would get rid of
that confusion, imho.

Cheers
Joris

--
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)

Re: A different error in sample()

R devel mailing list
In reply to this post by Duncan Murdoch-2
Although it seems pretty weird to pass a non-integer numeric vector of length one as the first argument to sample(), the results do not match what is documented in the manual. In addition, the results below do not support using "round" rather than "truncate" in the documentation. Consider the code below.

The first sentence in the Details section says: "If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x."

In the console:

> 1:2.001
[1] 1 2
> 1:2.9
[1] 1 2

Truncation:

> trunc(2.9)
[1] 2

So this seems to support the quote from previous emails: "Non-integer positive numerical values of n or x will be truncated to the next smallest integer, which has to be no larger than .Machine$integer.max."

However, again in the console:

> set.seed(123)
> table(sample(2.001, 10000, replace=TRUE))

   1    2    3
5052 4941    7

So, neither rounding nor truncation is occurring. Next, define a sequence.

> x <- seq(2.001, 2.51, length.out=20)

Now, grab all of the threes from sample()-ing this sequence.

> set.seed(123)
> threes <- sapply(x, function(y) table(sample(y, 10000, replace=TRUE))[3])

Check for NAs (I cheated here and found a nice seed).

> any(is.na(threes))
[1] FALSE

Now, the (to me) disturbing result.

> is.unsorted(threes)
[1] FALSE

or equivalently

> all(diff(threes) > 0)
[1] TRUE

So the number of threes grows monotonically as x moves from 2.001 to 2.51. As I hinted above, the monotonic growth is not assured. My guess is that the growth is stochastic and relates to some "probability weighting" based on how close the element of x is to 3. Perhaps this has been brought up before, but it seems relevant to the current discussion.

A potential aid for this issue would be something like

if(length(x) == 1 && !isTRUE(all.equal(x, as.integer(x)))) warning("It is a bad idea to use vectors of length 1 in the x argument that are not integers.")
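
A minimal sketch of how such a check might look as a stand-alone wrapper around sample(). The names checked_sample and warned are hypothetical and not part of base R, and the exact comparison x != trunc(x) is used here instead of a tolerance-based test, which is a design choice for this illustration only.

```r
# Hypothetical wrapper (not part of base R): warn when x is a single
# non-integer number, then defer to sample() unchanged.
checked_sample <- function(x, ...) {
  if (length(x) == 1L && is.numeric(x) && is.finite(x) &&
      x >= 1 && x != trunc(x)) {
    warning("'x' is a single non-integer number; sampling from 1:x ",
            "may not do what you expect")
  }
  sample(x, ...)
}

# Hypothetical helper: TRUE if evaluating 'expr' raised a warning.
warned <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

warned(checked_sample(2.5, 10, replace = TRUE))  # TRUE
warned(checked_sample(3, 10, replace = TRUE))    # FALSE
```

An exact comparison flags values such as 2.001 that a tolerance-based test could also catch, but it will not treat 3 + 1e-12 as an integer; which of the two is preferable depends on how the values of x are produced.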
Hope that helps,
luke


Re: A different error in sample()

Dario Strbenac-2
In reply to this post by Joris FA Meys
Good day,

The use of "rounding" also doesn't make sense. If the number is halfway between two integers, it is rounded to the nearest even integer.

> round(2.5)
[1] 2
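
To make the distinction concrete, here is a small sketch comparing round(), trunc() and ceiling() on exact halves. The choice of input values is only for illustration; they are used because halves are exactly representable in binary floating point.

```r
# Exact halves, so round() shows round-half-to-even behaviour.
halves <- c(0.5, 1.5, 2.5, 3.5)
data.frame(x       = halves,
           round   = round(halves),    # 0 2 2 4
           trunc   = trunc(halves),    # 0 1 2 3
           ceiling = ceiling(halves))  # 1 2 3 4
```

Only ceiling() maps 2.5 to 3, which is the value that shows up in the sample() output earlier in the thread.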

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

Re: A different error in sample()

Wolfgang Huber-3
Besides wording of the documentation re truncating vs rounding, there
is something peculiar going on with the fractional part of n:

 > table(sample.int(2.5, 1e6, replace = TRUE))

      1      2      3
399051 401035 199914

 > table(sample.int(3, 1e6, replace = TRUE))

      1      2      3
332956 332561 334483

 > table(sample.int(2.01, 1e6, replace = TRUE))

      1      2      3
497173 497866   4961
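
These counts are close to what one would see if each draw were computed as floor(u * n) + 1 with u uniform on [0, 1), so that 1 and 2 each have probability 1/n and 3 has probability (n - 2)/n. That model is an assumption made here for illustration, and the helper name expected_counts is hypothetical.

```r
# Hypothetical helper: expected counts for N draws of floor(u * n) + 1,
# u ~ Uniform[0, 1), for a non-integer n strictly between 2 and 3.
expected_counts <- function(n, N = 1e6) {
  stopifnot(n > 2, n < 3)
  p <- c(1, 1, n - 2) / n            # P(1), P(2), P(3)
  setNames(N * p, 1:3)
}

round(expected_counts(2.5))    # 400000 400000 200000
round(expected_counts(2.01))   # 497512 497512   4975
```

Under this assumed model the fractional part of n acts as the width of a third, narrower bin rather than being discarded.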

 > sessionInfo()
R Under development (unstable) (2018-09-17 r75319)
Platform: x86_64-apple-darwin17.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Users/whuber/R/lib/libRblas.dylib
LAPACK: /Users/whuber/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] fortunes_1.5-4

loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0



--
With thanks in advance-

Wolfgang

-------
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany

[hidden email]
http://www.huber.embl.de

My book with Susan Holmes: http://www.huber.embl.de/msmb


Re: A different error in sample()

Wolfgang Huber-3
FWIW, I suspect this is related to the function R_unif_index that was
introduced in src/main/RNG.c around revision 72356, or the way this
function is used in do_sample in src/main/random.c.


Re: A different error in sample()

Martin Maechler
>>>>> Wolfgang Huber
>>>>>     on Thu, 20 Sep 2018 08:47:47 +0200 writes:

    > FWIW, I suspect this is related to the function
    > R_unif_index that was introduced in src/main/RNG.c around
    > revision 72356, or the way this function is used in
    > do_sample in src/main/random.c.

Yes, it is just the use of 'dn' instead of 'n'
- a one letter thinko I'd say.

But *no*, it's much older than revision 72356; e.g., it's already in

    R version 3.0.0 (2013-04-03) -- "Masked Marvel"

but not yet in

    R version 2.15.3 (2013-03-01) -- "Security Blanket"

----

Here, I clearly think we see a regression bug, and hopefully not
one that should trigger often in practice...
and  -- without any statistics about the consequences out in
package space --
I do think we should fix this in code and let the documentation
become "great again" ;-)

Martin






Re: A different error in sample()

Kim Seonghyun
In reply to this post by Dario Strbenac-2
Hi,

I have not checked the source code, but I think it is because of banker's rounding.

https://en.wikipedia.org/wiki/Rounding#Round_half_to_even

Best regards,
Kim


Re: A different error in sample()

Emil
In reply to this post by R devel mailing list
But do we treat this as an error in what sample() does, or in how the documentation is worded?
I think what is done now would be best described as "ceilinged", i.e. what ceiling() does. But is there an English word to describe this?
Or just use "converted to the next smallest integer"?

But then again, what happens is that the answer is ceilinged, not the input.
I guess the rationale is that multiplying by any integer and then dividing should give the same results:
ceiling(sample(n * x, size=1e6, replace = TRUE) / x) gives the same distribution of results as sample(n, size=1e6, replace = TRUE) for any integer n and x; it's nice that this also holds for non-integer n.
The most important question is why people would use sample with a non-integer x; I don’t see many use cases.
So I agree with Luke that a warning would be best, regardless of what the docs say.
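
The integer case of the scaling argument above can be checked with a short sketch. The values n = 3 and x = 4, the seed and the sample size are arbitrary choices for illustration; the non-integer case is left out because its outcome depends on the R version in use.

```r
# Compare sample(n) with ceiling(sample(n * x) / x) for integer n and x.
set.seed(42)                       # arbitrary seed, for reproducibility only
n <- 3L
x <- 4L
direct <- sample(n, 1e5, replace = TRUE)
scaled <- ceiling(sample(n * x, 1e5, replace = TRUE) / x)

# Both rows should be close to 1/3 of the draws in each column.
print(rbind(direct = table(direct), scaled = table(scaled)))
```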

Best regards,
Emil Bode


Re: A different error in sample()

Joris FA Meys
In reply to this post by R devel mailing list
To be more clear: I do NOT state that the function "round" is used. I read
the documentation as "non integer positive numerical values will be
replaced by the next smallest integer", the important part being the NEXT
smallest integer, i.e. how ceiling() does it. So that's exactly what I
would expect. If "replaced by" causes less confusion than "rounded to" or
"truncated to", then use that.

I do agree that this wording would still indicate that this happens prior
to the sampling, whereas the output indicates that this is done after the
sampling. I can reproduce the sample() outcome using runif() as follows:

> table(ceiling(runif(10000,0,2.1)))
   1    2    3
4774 4756  470

> table(ceiling(runif(10000,0,3)))
   1    2    3
3273 3440 3287

I don't know if that's the intended behaviour, but there is some logic in
it. It's up to the R core team to decide if this is OK and rephrase the
help page so it becomes more clear what actually happens, or simply add
something like

if( (x%%1) != 0) x <- ceiling(x)

prior to the sampling algorithm.

Cheers
Joris


Re: A different error in sample()

Peter Dalgaard-2
In reply to this post by R devel mailing list
Yup, that is a bug, at least in the documentation. Probably a clearer example is

x <- seq(2.001, 2.999, length.out=999)
threes <- sapply(x, function(y) table(sample(y, 10000, replace=TRUE))[3])
plot(threes, type="l")
curve(10000*(x-2)/x, add=TRUE, col="red")

which is entirely consistent with what you'd expect from floor(runif(10000, 0, y)) + 1, and as far as I can tell from the source, that is what is happening internally.
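
The floor(runif(...)) + 1 form and the ceiling(runif(...)) form used earlier in the thread describe the same thing except when a draw lands exactly on an integer. A minimal sketch of that comparison follows; y = 2.1, the seed and the sample size are arbitrary choices for illustration.

```r
# floor(u) + 1 and ceiling(u) agree unless u is exactly an integer.
set.seed(1)                        # arbitrary seed, for reproducibility only
y <- 2.1
u <- runif(1e5, 0, y)
same <- (floor(u) + 1) == ceiling(u)

mean(same)                         # proportion of draws where both forms agree
mean(ceiling(u) == 3)              # observed share of threes
(y - 2) / y                        # share of threes expected from the red curve
```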

(Strict monotonicity is a bit of a red herring; it is just a matter of having spaced the y so far apart that the probability of an order reversal becomes negligible.)

So either we should do what the documentation says we do, or the documentation should not say that we do what we do not actually do...

The suspect code is this snippet from do_sample:

            int n = (int) dn;
            .....

            if (replace || k < 2) {
                for (int i = 0; i < k; i++) iy[i] = (int)(R_unif_index(dn) + 1);
            } else {
                int *x = (int *)R_alloc(n, sizeof(int));
                for (int i = 0; i < n; i++) x[i] = i;
                for (int i = 0; i < k; i++) {
                    int j = (int)(R_unif_index(n));
                    iy[i] = x[j] + 1;
                    x[j] = x[--n];
                }
            }

(notice arguments to R_unif_index)
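
The effect of passing dn rather than n can be emulated in R. This is a sketch under the assumption that R_unif_index(d) behaves like floor(d * u) with u uniform on (0, 1), in line with the floor(runif(...)) + 1 description above; the function name unif_index and the variable names are hypothetical and this is not the actual C code.

```r
# Hypothetical R stand-in for R_unif_index(): floor(d * u), u ~ Uniform(0, 1).
unif_index <- function(d, k) floor(d * runif(k))

set.seed(1)                          # arbitrary seed, for reproducibility only
dn <- 2.5
n  <- as.integer(dn)                 # like (int) dn in C: truncates to 2
k  <- 1e5

with_dn <- unif_index(dn, k) + 1     # what the replace branch passes, per the snippet
with_n  <- unif_index(n, k) + 1      # the same branch if n were passed instead

print(table(with_dn))                # 3 appears, with share near (dn - 2) / dn
print(table(with_n))                 # only 1 and 2 appear
```

In this emulation, the only difference between the two calls is whether the fractional part of the argument survives into the scaling of the uniform draw.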

-pd


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]


Re: A different error in sample()

Martin Maechler
In reply to this post by Martin Maechler

We have agreed that this is simply a regression and should be
fixed without a change to the documentation.

Consequently, ~ 5 minutes ago

$ svn log -v -c75338
------------------------------------------------------------------------
r75338 | maechler | 2018-09-20 17:38:46 +0200 (Thu, 20 Sep 2018) | 1 line
Changed paths:
   M /trunk/doc/NEWS.Rd
   M /trunk/src/main/random.c
   M /trunk/tests/reg-tests-1d.R

revert sample.int(<non-integer>, k, replace=TRUE) to sane pre_R-3.0.0 behaviour
------------------------------------------------------------------------

This is now back to "correct" behaviour  in  "R-devel (>= 75338)"

(and, as Duncan Murdoch also said by choosing this thread's
 Subject, this is really a different issue than the  "Bias in R's....")

Martin
