Parallel number stream: clusterSetRNGStream

Colin Gillespie
Dear All,

Is the following expected behaviour?

set.seed(1)
library(parallel)
cl = makeCluster(5)
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# 7  4  2 10 10
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# 7  4  2 10 10
stopCluster(cl)

The documentation could be read either way, e.g.

 * iseed: An integer to be supplied to set.seed, or NULL not to set
reproducible seeds.

From Details

.... optionally setting the seed of the streams by set.seed(iseed)
(otherwise they are set from the current seed of the master process:
after selecting the L'Ecuyer generator).

As may be guessed, this caught me out, since I was expecting the same
behaviour as set.seed(NULL).

Thanks

Colin

----------

R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Parallel number stream: clusterSetRNGStream

Henrik Bengtsson-5
Yes, I would think this behavior is intentional, but obviously I
don't know for sure.  Looking at the code:

> parallel::clusterSetRNGStream
function (cl = NULL, iseed = NULL)
{
    cl <- defaultCluster(cl)
    oldseed <- if (exists(".Random.seed", envir = .GlobalEnv,
        inherits = FALSE))
        get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    else NULL
    RNGkind("L'Ecuyer-CMRG")
    if (!is.null(iseed))
        set.seed(iseed)
    nc <- length(cl)
    seeds <- vector("list", nc)
    seeds[[1L]] <- .Random.seed

You'll find that:

1. The stream of RNG seeds originates from .Random.seed.
2a. 'iseed' is only applied if non-NULL, which changes the starting .Random.seed.
2b. If iseed = NULL, then .Random.seed is whatever it was when you
called the function.
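
To make points 1 and 2b concrete, here is a minimal sketch that mimics, on the master only, how the per-worker seeds are derived. The helper name worker_seeds() is hypothetical (it is not part of parallel), and it assumes the RNG has already been initialized, e.g. via set.seed(). It uses parallel::nextRNGStream() to step from one stream to the next and restores the master state afterwards, so no cluster is needed to see the effect.

```r
library(parallel)

# Hypothetical helper (not part of 'parallel'): derive n L'Ecuyer-CMRG stream
# seeds from the current master RNG state, then restore that state.
# Assumes .Random.seed already exists, e.g. after set.seed().
worker_seeds <- function(n) {
  oldseed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  on.exit(assign(".Random.seed", oldseed, envir = .GlobalEnv))
  RNGkind("L'Ecuyer-CMRG")  # resulting seed depends on the state at this point
  seeds <- vector("list", n)
  seeds[[1L]] <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  for (i in seq_len(n - 1L)) seeds[[i + 1L]] <- nextRNGStream(seeds[[i]])
  seeds
}

set.seed(1)
a <- worker_seeds(5)
b <- worker_seeds(5)   # master state untouched in between
identical(a, b)        # same starting state, hence the same streams
invisible(runif(1))    # advance the master RNG by one draw
c2 <- worker_seeds(5)
identical(a, c2)       # the streams now start from a different seed
```

The first comparison should be TRUE and the second FALSE, which matches the repeated "7 4 2 10 10" seen when the master state is left unchanged between calls.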

If you use iseed = NULL, then you need to advance the RNG state
(= .Random.seed) yourself between calls.  Here's an example:

set.seed(1)
library(parallel)
cl <- parallel::makeCluster(5)

str(.Random.seed)
# int [1:626] 10403 624 -169270483 -442010614 -603558397 -222347416 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1]  7  4  2 10 10

str(.Random.seed)
# int [1:626] 10403 624 -169270483 -442010614 -603558397 -222347416 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1]  7  4  2 10 10

## Forward RNG state
sample.int(1)
# [1] 1

str(.Random.seed)
# int [1:626] 10403 1 1654269195 -1877109783 -961256264 1403523942 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1] 8 6 1 7 5
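
If the goal is for each call to hand the workers fresh streams while the whole script stays reproducible from a single set.seed(), one option is to draw 'iseed' from the master RNG, which advances the master state as a side effect. This is only a sketch: the helper names next_iseed() and reseed_cluster() are hypothetical and not part of parallel. Note that the seed is drawn before clusterSetRNGStream() is called, since the function saves the master state on entry and arguments are evaluated lazily.

```r
library(parallel)

# Hypothetical helper: draw an integer seed from the master RNG.
# The draw itself advances the master's .Random.seed.
next_iseed <- function() sample.int(.Machine$integer.max, 1L)

# Hypothetical wrapper: seed the workers of cluster 'cl' from a freshly
# drawn integer instead of the unchanged master state.
reseed_cluster <- function(cl) {
  # Evaluate the seed first; passing next_iseed() directly as the argument
  # would delay the draw until after the master state has been saved.
  iseed <- next_iseed()
  clusterSetRNGStream(cl, iseed = iseed)
}

set.seed(1)
s1 <- next_iseed()
s2 <- next_iseed()
s1 != s2   # successive seeds differ, barring a chance collision
```

With a cluster cl, calling reseed_cluster(cl) before each parSapply() should then give different draws per call, yet the same sequence on every rerun of the script. For non-reproducible streams, closer to what set.seed(NULL) does, calling set.seed(NULL) on the master before clusterSetRNGStream(cl, iseed = NULL) should have a similar effect.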


FYI, you see a similar behavior with parallel::mclapply():

set.seed(1)
library(parallel)
RNGkind("L'Ecuyer-CMRG")
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -1.2673735  0.9045952  1.9502072
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -1.2673735  0.9045952  1.9502072
## Forward RNG state
sample.int(1)
# [1] 1
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -0.09117479 -1.07803714  0.13924063

I can see pros and cons with this behavior, but I think the default is
risky.  For instance, it's not hard to imagine an implementation of a
resampling algorithm where you have the option to run it via lapply()
or via parallel::mclapply() - there is a non-zero probability that
such an implementation produces identical samples.

Proper parallel RNG can be tricky.

/Henrik

On Fri, Jun 7, 2019 at 7:09 AM Colin Gillespie <[hidden email]> wrote: