bug: sample( x, size, replace = TRUE, prob= skewed.probs) produces uniform sample

Berry, Charles
When `length(skewed.probs) > 200`, R-devel generates uniform samples.

R-3.5.1 behaves as expected.

`epsilon` can be much larger than in the example below, and the uniform distribution is still produced.
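For context: the `?sample` help page states that, with `replace = TRUE`, Walker's alias method is used when there are more than 200 reasonably probable values, which matches the threshold in this report. The following is a minimal R sketch of that method, written for illustration and not taken from R's C sources. `build_alias_table()`, `alias_sample()` and `broken_alias_sample()` are hypothetical names. The "broken" variant is an assumption about one way the reported symptom could arise (the fractional part of the uniform draw being lost), not a diagnosis of the actual R-devel change.

```r
## Minimal sketch of Walker's alias method (Vose's table construction).
## Hypothetical helpers for illustration; not R's internal implementation.
build_alias_table <- function(p) {
  n <- length(p)
  q <- p * n / sum(p)          # scaled probabilities; they average to 1
  a <- integer(n)              # alias index for each column (0 = unused)
  small <- which(q < 1)
  large <- which(q >= 1)
  while (length(small) > 0 && length(large) > 0) {
    s <- small[1]; small <- small[-1]
    l <- large[1]
    a[s] <- l                  # column s: keep s with prob q[s], else give l
    q[l] <- q[l] - (1 - q[s])  # l donates the mass that fills column s
    if (q[l] < 1) {            # l is now under-full: move it to 'small'
      large <- large[-1]
      small <- c(small, l)
    }
  }
  q[c(small, large)] <- 1      # leftovers (round-off) keep their own column
  list(q = q, a = a)
}

## Each draw needs a column k AND an independent uniform 'frac' for the
## keep-or-alias decision. Here one runif() value supplies both: its integer
## part picks k and its fractional part is 'frac'.
alias_sample <- function(tab, size) {
  n <- length(tab$q)
  u <- runif(size) * n
  k <- floor(u) + 1            # 1-based column index
  frac <- u - floor(u)
  ifelse(frac < tab$q[k], k, tab$a[k])
}

## Hypothetical faulty variant: the column is drawn as an integer, so no
## fractional part survives. With frac fixed at 0 and every q[k] > 0, the
## test always keeps column k, so the output is uniform on 1..n.
broken_alias_sample <- function(tab, size) {
  n <- length(tab$q)
  k <- sample.int(n, size, replace = TRUE)
  frac <- 0
  ifelse(frac < tab$q[k], k, tab$a[k])
}

set.seed(123)
p201 <- prop.table(rep(c(1, 1e-10), c(201, 999 - 201)))
tab <- build_alias_table(p201)
mean(alias_sample(tab, 10000) > 201)         # tail share; should be about 0
mean(broken_alias_sample(tab, 10000) > 201)  # tail share; near 798/999
```

The single-uniform trick in `alias_sample()` is why the sketch is sensitive to how the column index is produced: if the draw is already an integer, the accept test degenerates and every column is kept.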


Chuck

> set.seed(123)
>
> epsilon <- 1e-10
>
> ## uniform to 200 then small
> p200 <- prop.table( rep( c(1, epsilon), c(200, 999-200)))
> ## uniform to 201 then small
> p201 <- prop.table( rep( c(1, epsilon), c(201, 999-201)))
>
> brks  <- c(0,99,199,200,201,Inf)
> tab200 <- sample( length(p200), 10000, prob=p200, replace=TRUE)
> tab201 <- sample( length(p201), 10000, prob=p201, replace=TRUE)
>
> cbind(
+   s200=table(cut(tab200, brks)),
+   p200=round(xtabs(p200 ~ cut( seq_along(p200), brks)) * 10000 ,1),
+   s201=table(cut(tab201, brks )),
+   p201=round(xtabs(p201 ~ cut( seq_along(p201), brks)) * 10000 ,1))
          s200 p200 s201   p201
(0,99]    5017 4950  984 4925.4
(99,199]  4925 5000  959 4975.1
(199,200]   58   50    9   49.8
(200,201]    0    0    6   49.8
(201,Inf]    0    0 8042    0.0
>
> sessionInfo()
R Under development (unstable) (2019-03-02 r76189)
Platform: x86_64-apple-darwin18.2.0 (64-bit)
Running under: macOS Mojave 10.14.3

Matrix products: default
BLAS: /Users/cberry/projects/R/R-devel/lib/libRblas.dylib
LAPACK: /Users/cberry/projects/R/R-devel/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

loaded via a namespace (and not attached):
[1] compiler_3.6.0
>
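The reproduction above can be wrapped into a small regression check. This is a sketch, not part of R's own test suite; `tail_share()` is a hypothetical name, and the 0.01 cut-off is an arbitrary tolerance chosen for illustration.

```r
## Hypothetical helper: share of draws that land in the negligible tail.
tail_share <- function(n_big = 201, n_total = 999, eps = 1e-10, size = 10000) {
  ## n_big equally likely values, followed by (n_total - n_big) values whose
  ## weight is eps relative to each of the n_big values
  p <- prop.table(rep(c(1, eps), c(n_big, n_total - n_big)))
  draws <- sample(length(p), size, replace = TRUE, prob = p)
  mean(draws > n_big)
}

set.seed(123)
shares <- sapply(c(200, 201), function(nb) tail_share(n_big = nb))
## A correct sampler puts essentially no draws in the tail for either size;
## the R-devel r76189 run above put 8042 of 10000 draws there for n_big = 201.
stopifnot(all(shares < 0.01))
```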

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: bug: sample( x, size, replace = TRUE, prob= skewed.probs) produces uniform sample

Tierney, Luke
Thanks. We'll need to look into how best to address this.

Best,

luke

On Sun, 3 Mar 2019, Berry, Charles wrote:


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


Re: bug: sample( x, size, replace = TRUE, prob= skewed.probs) produces uniform sample

Tierney, Luke
In reply to this post by Berry, Charles
This is now fixed in R-devel.

Best,

luke

On Sun, 3 Mar 2019, Berry, Charles wrote:
