Seeding non-R RNG with numbers from R's RNG stream

Tommy Jones
Hi,

I am constructing a function that does sampling in C++ using a non-R RNG
stream for thread safety reasons. This C++ function is wrapped by an R
function, which is user facing. The R wrapper does some sampling itself to
initialize some variables before passing them off to C++. So that my users
do not have to manage two mechanisms to set random seeds, I've constructed
a solution (shown below) that allows both RNGs to be seeded with set.seed
and respond to the state of R's RNG stream.

I believe the below works. However, I am hoping to get feedback from more
experienced useRs as to whether or not the below approach is unsafe in ways
that may affect reproducibility, modify global variables in bad ways, or
have other unintended consequences I have not anticipated.

Could I trouble one or more folks on this list to weigh in on the safety
(or perceived wisdom) of using R's internal RNG stream to seed an RNG
external to R? Many thanks in advance.

This relates to a Stack Overflow question:
https://stackoverflow.com/questions/63165955/is-there-a-best-practice-for-using-non-r-rngs-in-rcpp-code

Pseudocode of a trivial facsimile of my current approach is below.

--Tommy

sample_wrapper <- function() {
  # initialize a variable to pass to C++
  init_var <- runif(1)

  # get current state of R's RNG stream (this layout is specific to the
  # default "Mersenne-Twister" kind):
  # first entry of .Random.seed is an integer code for the RNG kinds in use
  # second entry is the current position in the generator's state vector
  # the remaining 624 entries are internal state words, not output values
  seed_pos <- .Random.seed[2]

  # with this layout, this is the state word consumed by the last draw
  seed <- .Random.seed[seed_pos + 2]

  # sample_cpp() is the C++ sampler (not shown)
  out <- sample_cpp(init_var = init_var, seed = seed)

  # move R's position in the RNG stream forward by 1 with a throwaway sample
  runif(1)

  # return the output
  out
}
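The comments in the code above describe `.Random.seed` as laid out by the default "Mersenne-Twister" generator; other RNG kinds use a different layout. A quick check at top level (a sketch: it switches RNG kinds and then restores the caller's kinds, but not the caller's previous seed) shows 626 entries with a position index in slot 2 under Mersenne-Twister, versus 7 entries and no position slot at all under "L'Ecuyer-CMRG". So `.Random.seed[2]` is not a portable way to find "the current position":

```r
# Compare the layout of .Random.seed under two RNG kinds.
old_kinds <- RNGkind()            # caller's kind, normal.kind, sample.kind

RNGkind("Mersenne-Twister")
set.seed(1)
mt_len <- length(.Random.seed)    # kind code + position + 624 state words
mt_pos <- .Random.seed[2]         # position right after seeding
u <- runif(1)                     # consume one uniform
mt_pos_after <- .Random.seed[2]   # the position has moved

RNGkind("L'Ecuyer-CMRG")
set.seed(1)
lec_len <- length(.Random.seed)   # kind code + 6 state words, no position slot

# restore the caller's RNG kinds (this re-seeds, so the old seed is not kept)
do.call(RNGkind, as.list(old_kinds))

c(mt_len = mt_len, mt_pos = mt_pos, mt_pos_after = mt_pos_after,
  lec_len = lec_len)
```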

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Seeding non-R RNG with numbers from R's RNG stream

Duncan Murdoch
I wouldn't trust the C++ generator to be as good if you seed it this way
as if you just seeded it once with your phone number (or any other fixed
value) and let it run, because it's probably never been tested to be
good when run this way.  Is it good enough for the way you plan to use
it?  Maybe.

Duncan Murdoch

On 30/07/2020 3:05 p.m., Tommy Jones wrote:

> Hi,
>
> I am constructing a function that does sampling in C++ using a non-R RNG
> stream for thread safety reasons. This C++ function is wrapped by an R
> function, which is user facing. The R wrapper does some sampling itself to
> initialize some variables before passing them off to C++. So that my users
> do not have to manage two mechanisms to set random seeds, I've constructed
> a solution (shown below) that allows both RNGs to be seeded with set.seed
> and respond to the state of R's RNG stream.
>
> I believe the below works. However, I am hoping to get feedback from more
> experienced useRs as to whether or not the below approach is unsafe in ways
> that may affect reproducibility, modify global variables in bad ways, or
> have other unintended consequences I have not anticipated.
>
> Could I trouble one or more folks on this list to weigh in on the safety
> (or perceived wisdom) of using R's internal RNG stream to seed an RNG
> external to R? Many thanks in advance.
>
> This relates to a Stackoverflow question here:
> https://stackoverflow.com/questions/63165955/is-there-a-best-practice-for-using-non-r-rngs-in-rcpp-code
>
> Pseudocode of a trivial facsimile of my current approach is below.
>
> --Tommy
>
> sample_wrapper <- function() {
>    # initialize a variable to pass to C++
>    init_var <- runif(1)
>
>    # get current state of RNG stream
>    # first entry of .Random.seed is an integer representing the algorithm used
>    # second entry is current position in RNG stream
>    # subsequent entries are pseudorandom numbers
>    seed_pos <- .Random.seed[2]
>
>    seed <- .Random.seed[seed_pos + 2]
>
>    out <- sample_cpp(init_var = init_var, seed = seed)
>
>    # move R's position in the RNG stream forward by 1 with a throw away sample
>    runif(1)
>
>    # return the output
>    out}
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Seeding non-R RNG with numbers from R's RNG stream

Tommy Jones
Thank you for this. I'd like to be sure I understand the
intuition correctly. Is the following true from what you said?

I can just fix the seed at the C++ level and the results will still be
(pseudo) random because the initialization at the R level is (pseudo)
random.

On Thu, Jul 30, 2020 at 3:36 PM Duncan Murdoch <[hidden email]>
wrote:

Re: Seeding non-R RNG with numbers from R's RNG stream

Gabriel Becker
Tommy,

I'm not Duncan (and am neither an RNG expert nor claim to be one), but I believe RNG
streams are designed, and thus tested, to be used as streams. Repeatedly
setting the seed after small numbers of samples from them does not fit the
designed use case (and also doesn't match the test criteria by which they
are evaluated/validated, which is what I believe Duncan was saying).

(Anything Duncan or another RNG expert says that contradicts the above
should be taken as correct instead of what I said.)

Best,
~G

On Thu, Jul 30, 2020 at 1:30 PM Tommy Jones <[hidden email]> wrote:

Re: Seeding non-R RNG with numbers from R's RNG stream

Duncan Murdoch
On 30/07/2020 4:30 p.m., Tommy Jones wrote:
> Thank you for this. I'd like to be sure I understand the
> intuition correctly. Is the following true from what you said?
>
> I can just fix the seed at the C++ level and the results will still be
> (pseudo) random because the initialization at the R level is (pseudo)
> random.

No, that's not quite right.  Let me try again:

You can fix the seed at the C++ level and the results will be
pseudo-random because you have chosen to use a good pseudo-random
generator.

  - R has nothing to do with it.
  - If you haven't actually chosen a good generator, then seeding from R
won't necessarily help.
  - If you re-seed too frequently, you might break even a good generator.

For an example of the latter:  consider re-seeding with the current time
(to the nearest second) with every draw.  If you draw more than once per
second, you'll get exact repeats.
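The clock example is easy to reproduce with R's own generator. In this sketch the request times are made up (no real clock is read), and the whole-second part of each time is used as the seed before every draw. Like any use of `set.seed()`, it overwrites the global RNG state:

```r
# Made-up times (in seconds) at which four draws are requested.
draw_times <- c(10.2, 10.5, 10.9, 11.4)

# Re-seed with the time truncated to whole seconds before every draw.
draws <- vapply(draw_times, function(t) {
  set.seed(floor(t))
  runif(1)
}, numeric(1))

draws  # the first three values are identical; only the fourth differs
```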

The scheme you chose won't be so obviously wrong, but there could still
be interactions between the R generator and the C++ generator.  For
example, maybe the C++ generator is based on a similar algorithm to the
R generator.  If you re-seed it every tenth draw, and only draw one
value from R, it might happen that you effectively take 9 steps back
with each re-seeding, so again you'll get exact repeats.
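The "9 steps back" scenario can be made concrete with a deliberately simple setup. In this sketch both the "R-side" and the "C++-side" generator are the same toy Lehmer generator (the Park-Miller "minimal standard", x -> 16807 x mod (2^31 - 1)), chosen only because it is easy to follow; neither R's default Mersenne-Twister nor Xoshiro256+ works this way. Each call takes one draw from the R side, uses it to re-seed the C++ side, and then draws 10 values:

```r
# Toy Lehmer generator: x -> 16807 * x mod (2^31 - 1).
# All products stay below 2^53, so double arithmetic is exact.
lehmer_next <- function(x) (16807 * x) %% 2147483647

# n successive outputs of the toy generator, starting from state `seed`
lehmer_draws <- function(seed, n) {
  out <- numeric(n)
  x <- seed
  for (i in seq_len(n)) {
    x <- lehmer_next(x)
    out[i] <- x
  }
  out
}

r_state <- 42                                # state of the "R-side" generator
batches <- vector("list", 2)
for (k in 1:2) {
  r_state <- lehmer_next(r_state)            # one draw on the R side per call
  batches[[k]] <- lehmer_draws(r_state, 10)  # re-seed the "C++ side", draw 10
}

# How many values in the second batch repeat values from the first batch
n_repeats <- length(intersect(batches[[1]], batches[[2]]))
n_repeats
```

Nine of the ten "new" draws are exact repeats, because re-seeding with the next state of the same recurrence only shifts the window by one step. With unrelated algorithms any interaction would be far less visible, which is the point about subtle effects.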

The real effect, if there is one, is likely to be much more subtle and
hard to detect.  In fact, it might be so hard to detect that there
really isn't a problem!  The practical issue is that by effectively
inventing your own algorithm, you can't rely on the accumulated
experience of everyone else to know whether the generator is good.

Duncan Murdoch

Re: Seeding non-R RNG with numbers from R's RNG stream

Tommy Jones
Thank you Duncan and Gabriel.

I think that my trivial example was a little too trivial and is causing
some confusion. What's happening in the real function I'm writing is...

1. In R: Draw tens of thousands of times from a handful of Gamma RVs with
different parameters to initialize some variables. (Technically, I'm
calling gtools::rdirichlet, which calls stats::rgamma.)
2. Transfer the initialized variables to a function in C++.
3. In C++: Draw millions of times from a Categorical(p) distribution, where
"p" is recalculated after each draw based on the current state of the RVs
in my system. (The heart of this is actually a Uniform(0,1) draw from the
Xoshiro256+ generator as provided in the dqrng package.)
4. In R: Post-process the results from the transformed space back to the
space of the parameters I'm estimating.
5. Still in R: Call stats::runif to change the position in R's RNG stream
so that if the user calls the function twice in a row without setting the
seed, they'll still get pseudorandom results, because the C++ RNG receives
a different seed.
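Step 1 relies on the standard construction of a Dirichlet draw: independent Gamma(alpha_j, 1) draws normalized to sum to 1, which is consistent with rdirichlet calling stats::rgamma. A base-R sketch of that construction (`rdirichlet_sketch` is a hypothetical name, not the gtools function) makes it visible that this step consumes R's stream, so a single set.seed() call covers it:

```r
# Hypothetical base-R helper: n Dirichlet(alpha) draws, one per row.
rdirichlet_sketch <- function(n, alpha) {
  k <- length(alpha)
  # n * k independent Gamma(alpha_j, 1) draws; rgamma recycles `shape`,
  # and byrow = TRUE puts the draws for alpha[j] in column j
  g <- matrix(rgamma(n * k, shape = alpha), ncol = k, byrow = TRUE)
  g / rowSums(g)  # normalize each row to sum to 1
}

set.seed(2020)
x1 <- rdirichlet_sketch(3, c(0.5, 1, 2))
set.seed(2020)
x2 <- rdirichlet_sketch(3, c(0.5, 1, 2))
identical(x1, x2)  # same seed, same draws
```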

So, a single call to the user-facing function results in many, many draws
from both RNG streams.

The true "problem" behind my question is that I'd like my users to be
able to reproduce their results, and calling set.seed() once seems more
"user friendly" than having them control two seeds, one with set.seed and
one with a seed argument. But I acknowledge that making the user
set both is the "safest" option.
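One way to offer both options is a `seed` argument that defaults to `NULL` and, when left unset, is filled by a draw from R's stream with `sample.int()`. This is a sketch under assumptions: `fit_model` and `sample_cpp_standin` are hypothetical names, and the stand-in only returns the seed it receives where the real code would call the C++ sampler. Drawing the seed through `sample.int()` advances R's stream, so successive calls get different seeds without a throwaway `runif(1)`, and it does not depend on the layout of `.Random.seed`:

```r
# Hypothetical stand-in for the C++ sampler: it only reports the seed it
# would have used to initialize the external generator.
sample_cpp_standin <- function(init_var, seed) {
  list(init_var = init_var, seed = seed)
}

# Hypothetical user-facing wrapper.
fit_model <- function(seed = NULL) {
  init_var <- runif(1)  # R-side initialization
  if (is.null(seed)) {
    # No explicit seed: take one from R's stream, so set.seed() alone
    # makes the whole call reproducible.
    seed <- sample.int(.Machine$integer.max, 1L)
  }
  sample_cpp_standin(init_var = init_var, seed = as.integer(seed))
}

set.seed(42)
a <- fit_model()
set.seed(42)
b <- fit_model()               # identical to `a`
c_explicit <- fit_model(seed = 7)  # C++ seed fixed explicitly
```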

My instinct is that the effects of this are so subtle as to not really be a
problem as you suggest, Duncan. But I am now thinking I'll need to
explicitly run some experiments to validate that.

I'm 100% in agreement about not reinventing the wheel, but instead relying
on the accumulated experience of the folks that are writing these RNGs.

Knowing more about the bigger use case, does this still strike you as obviously
problematic?

Best,
Tommy

On Thu, Jul 30, 2020 at 4:49 PM Duncan Murdoch <[hidden email]>
wrote:

Re: Seeding non-R RNG with numbers from R's RNG stream

Abby Spurdle
> 3. In C++: Draw millions of times from a Categorical(p) distribution, where
> "p" is recalculated after each draw

I don't see the need here.
It should be possible to generate all the random numbers *in R*, and
in *one line* of R code.
Easy...

Then standard inversion sampling can be used to transform the random
numbers, as necessary.
This may (?) benefit from a C/C++ implementation, but that can be kept
separate from the random number generation.
That is, the C++ function takes a vector of random numbers from a uniform
distribution, then computes "draws" (from the desired distribution)
iteratively.
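This suggestion can be sketched in base R. The urn model below (a Pólya urn, where each drawn category's count grows by one, so "p" changes after every draw) is only a hypothetical stand-in for the real state update. The point is the split: one `runif()` call uses R's stream, and a deterministic loop turns each uniform into a categorical draw by inverting the cumulative probabilities. That loop is the part that could move to C++, where it would need no RNG of its own:

```r
# Inversion sampling for one categorical draw: return category k such that
# cumsum(p)[k - 1] <= u < cumsum(p)[k], for u in [0, 1).
draw_categorical <- function(u, p) {
  cp <- cumsum(p) / sum(p)
  findInterval(u, cp[-length(cp)]) + 1L
}

# Hypothetical state update: a Polya urn whose counts define p.
run_urn <- function(u, counts) {
  draws <- integer(length(u))
  for (i in seq_along(u)) {
    k <- draw_categorical(u[i], counts)  # p recomputed from current counts
    counts[k] <- counts[k] + 1           # update the state after each draw
    draws[i] <- k
  }
  list(draws = draws, counts = counts)
}

set.seed(1)
u <- runif(1e4)                        # all uniforms, one line, R's stream
res <- run_urn(u, counts = c(1, 1, 1))
```

Because the only randomness is `u`, `set.seed()` alone reproduces the whole run, and the loop can be tested with hand-picked uniforms: `run_urn(c(0.1, 0.9), c(1, 1))$draws` gives categories 1 and then 2.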

Re: Seeding non-R RNG with numbers from R's RNG stream

Tommy Jones
Abby, that is a fantastic suggestion! It seems obvious now that you've said
it. Why didn't I think of that?

Thank you,
Tommy

On Fri, Jul 31, 2020 at 12:01 AM Abby Spurdle <[hidden email]> wrote:
