set.seed() in a package

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

set.seed() in a package

Marvin Wright
Hi all,

I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem.

I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?

Best,

Marvin
        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: set.seed() in a package

Duncan Murdoch-2
On 30/10/2019 3:28 a.m., Marvin Wright wrote:
> Hi all,
>
> I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem.
>
> I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?

set.seed() writes .Random.seed in the user's global environment, which
violates this policy:

- Packages should not modify the global environment (user’s workspace).

However, every call to a random number generator creates or modifies
.Random.seed as well, and most of those are expected and shouldn't be
flagged.  And interplot() is documented to do random simulations, so it
would be expected to change the seed:  the issue is that given the same
inputs it always changes it to the same thing.  I think that would be
quite hard for a test to detect.

Should it be a policy with no test?  Maybe, because I agree with you
that interplot()'s set.seed(324) is bad practice.

Duncan Murdoch

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: set.seed() in a package

Peter Dalgaard-2
We commit a similar sin in the help pages, e.g.

example(set.seed) ; runif(2)
example(set.seed) ; runif(2)

gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)

You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons?

-pd


> On 30 Oct 2019, at 13:46 , Duncan Murdoch <[hidden email]> wrote:
>
> On 30/10/2019 3:28 a.m., Marvin Wright wrote:
>> Hi all,
>> I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem.
>> I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
>
> set.seed() writes .Random.seed in the user's global environment, which violates this policy:
>
> - Packages should not modify the global environment (user’s workspace).
>
> However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged.  And interplot() is documented to do random simulations, so it would be expected to change the seed:  the issue is that given the same inputs it always changes it to the same thing.  I think that would be quite hard for a test to detect.
>
> Should it be a policy with no test?  Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice.
>
> Duncan Murdoch
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: set.seed() in a package

Duncan Murdoch-2
On 30/10/2019 9:08 a.m., peter dalgaard wrote:
> We commit a similar sin in the help pages, e.g.
>
> example(set.seed) ; runif(2)
> example(set.seed) ; runif(2)
>
> gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)

I think it's pretty common in example code, and that's justifiable.  But
it could be avoided by using withr::with_seed() or something equivalent.

Duncan Murdoch

>
> You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons?
>
> -pd
>
>
>> On 30 Oct 2019, at 13:46 , Duncan Murdoch <[hidden email]> wrote:
>>
>> On 30/10/2019 3:28 a.m., Marvin Wright wrote:
>>> Hi all,
>>> I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem.
>>> I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
>>
>> set.seed() writes .Random.seed in the user's global environment, which violates this policy:
>>
>> - Packages should not modify the global environment (user’s workspace).
>>
>> However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged.  And interplot() is documented to do random simulations, so it would be expected to change the seed:  the issue is that given the same inputs it always changes it to the same thing.  I think that would be quite hard for a test to detect.
>>
>> Should it be a policy with no test?  Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice.
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: set.seed() in a package

Henrik Bengtsson-5
> On 30/10/2019 9:08 a.m., peter dalgaard wrote:
> > You can fairly easily work around that by saving and restoring .Random.seed.

This is actually quite tedious to get correct; it requires you to
under how and when .Random.seed is set, and what are valid values on
.Random.seed.   For instance, a common mistake (me too) is to reset to
.GlobalEnv$.Random.seed <- NULL in a fresh R session but this will
produce a warning on: ".Random.seed' is not an integer vector but of
type 'NULL', so ignored".  You end up having to do things such as:

  oseed <- .GlobalEnv$.Random.seed
  on.exit({
    if (is.null(oseed)) {
      rm(list=".Random.seed", envir= .GlobalEnv)
    } else {
      assign(".Random.seed", value=oseed, envir= .GlobalEnv)
    }
  })

to avoid that warning.  So, having support functions for this in base
R would be helpful, e.g.

  oseed <- base::getRandomSeed()
  on.exit(base::setRandomSeed(oseed))

Back to Marvin's point/question.  I think it would be useful if the
CRAN Policies would explicitly say that functions must not change the
random seed to a fixed one, without undoing it, unless the user
specifies it via an argument.  If not, there's a great risk it will
mess up statistical analysis.  Also, if there would a way to test
against such practices, which I think is really hard, I would be the
first one backing it up.

On a related note; there are packages that forward the .Random.seed
when loaded.  This is also an unfortunate behavior because you will
give different RNGs depending on that package was already loaded
before you called a function that depends on it or not.  For example,
if pkgA forwards the .Random.seed when loaded, the following is *not*
reproducible:

# User might or might not have loaded pkgA already
if (runif(1) < 0.5) loadNamespace(pkgA)

set.seed(0xBEEF)
loadNamespace(pkgA)
y <- runif(1)

I ran into this problem when doing some strict testing.  I argue this
also falls into the set of "bad" practices to be avoided.

/Henrik

On Wed, Oct 30, 2019 at 6:23 AM Duncan Murdoch <[hidden email]> wrote:

>
> On 30/10/2019 9:08 a.m., peter dalgaard wrote:
> > We commit a similar sin in the help pages, e.g.
> >
> > example(set.seed) ; runif(2)
> > example(set.seed) ; runif(2)
> >
> > gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)
>
> I think it's pretty common in example code, and that's justifiable.  But
> it could be avoided by using withr::with_seed() or something equivalent.
>
> Duncan Murdoch
>
> >
> > You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons?
> >
> > -pd
> >
> >
> >> On 30 Oct 2019, at 13:46 , Duncan Murdoch <[hidden email]> wrote:
> >>
> >> On 30/10/2019 3:28 a.m., Marvin Wright wrote:
> >>> Hi all,
> >>> I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem.
> >>> I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
> >>
> >> set.seed() writes .Random.seed in the user's global environment, which violates this policy:
> >>
> >> - Packages should not modify the global environment (user’s workspace).
> >>
> >> However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged.  And interplot() is documented to do random simulations, so it would be expected to change the seed:  the issue is that given the same inputs it always changes it to the same thing.  I think that would be quite hard for a test to detect.
> >>
> >> Should it be a policy with no test?  Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice.
> >>
> >> Duncan Murdoch
> >>
> >> ______________________________________________
> >> [hidden email] mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: set.seed() in a package

Henrik Bengtsson-5
Forgot to say: For,

  oseed <- base::getRandomSeed()
  on.exit(base::setRandomSeed(oseed))

one could upgrade set.seed() to take this role, e.g.

  oseed <- set.seed(0xBEEF)
  on.exit(set.seed(oseed))

Current, set.seed() always return NULL.

BTW, and my memory might be bad, I think I mentioned this in the past
but was told that you cannot reset the RNG state for all types of RNG
kinds.  That might complicate things, but on the other hand, that
could be checked for at run-time by the above functions.

/Henrik

On Wed, Oct 30, 2019 at 9:50 AM Henrik Bengtsson
<[hidden email]> wrote:

>
> > On 30/10/2019 9:08 a.m., peter dalgaard wrote:
> > > You can fairly easily work around that by saving and restoring .Random.seed.
>
> This is actually quite tedious to get correct; it requires you to
> under how and when .Random.seed is set, and what are valid values on
> .Random.seed.   For instance, a common mistake (me too) is to reset to
> .GlobalEnv$.Random.seed <- NULL in a fresh R session but this will
> produce a warning on: ".Random.seed' is not an integer vector but of
> type 'NULL', so ignored".  You end up having to do things such as:
>
>   oseed <- .GlobalEnv$.Random.seed
>   on.exit({
>     if (is.null(oseed)) {
>       rm(list=".Random.seed", envir= .GlobalEnv)
>     } else {
>       assign(".Random.seed", value=oseed, envir= .GlobalEnv)
>     }
>   })
>
> to avoid that warning.  So, having support functions for this in base
> R would be helpful, e.g.
>
>   oseed <- base::getRandomSeed()
>   on.exit(base::setRandomSeed(oseed))
>
> Back to Marvin's point/question.  I think it would be useful if the
> CRAN Policies would explicitly say that functions must not change the
> random seed to a fixed one, without undoing it, unless the user
> specifies it via an argument.  If not, there's a great risk it will
> mess up statistical analysis.  Also, if there would a way to test
> against such practices, which I think is really hard, I would be the
> first one backing it up.
>
> On a related note; there are packages that forward the .Random.seed
> when loaded.  This is also an unfortunate behavior because you will
> give different RNGs depending on that package was already loaded
> before you called a function that depends on it or not.  For example,
> if pkgA forwards the .Random.seed when loaded, the following is *not*
> reproducible:
>
> # User might or might not have loaded pkgA already
> if (runif(1) < 0.5) loadNamespace(pkgA)
>
> set.seed(0xBEEF)
> loadNamespace(pkgA)
> y <- runif(1)
>
> I ran into this problem when doing some strict testing.  I argue this
> also falls into the set of "bad" practices to be avoided.
>
> /Henrik
>
> On Wed, Oct 30, 2019 at 6:23 AM Duncan Murdoch <[hidden email]> wrote:
> >
> > On 30/10/2019 9:08 a.m., peter dalgaard wrote:
> > > We commit a similar sin in the help pages, e.g.
> > >
> > > example(set.seed) ; runif(2)
> > > example(set.seed) ; runif(2)
> > >
> > > gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)
> >
> > I think it's pretty common in example code, and that's justifiable.  But
> > it could be avoided by using withr::with_seed() or something equivalent.
> >
> > Duncan Murdoch
> >
> > >
> > > You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons?
> > >
> > > -pd
> > >
> > >
> > >> On 30 Oct 2019, at 13:46 , Duncan Murdoch <[hidden email]> wrote:
> > >>
> > >> On 30/10/2019 3:28 a.m., Marvin Wright wrote:
> > >>> Hi all,
> > >>> I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem.
> > >>> I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
> > >>
> > >> set.seed() writes .Random.seed in the user's global environment, which violates this policy:
> > >>
> > >> - Packages should not modify the global environment (user’s workspace).
> > >>
> > >> However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged.  And interplot() is documented to do random simulations, so it would be expected to change the seed:  the issue is that given the same inputs it always changes it to the same thing.  I think that would be quite hard for a test to detect.
> > >>
> > >> Should it be a policy with no test?  Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice.
> > >>
> > >> Duncan Murdoch
> > >>
> > >> ______________________________________________
> > >> [hidden email] mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: set.seed() in a package

Marvin Wright
Thanks for your answers. I agree with Duncan and Henrik that we should have such a policy, even if we cannot test it.

> On 30. Oct 2019, at 17:59, Henrik Bengtsson <[hidden email]> wrote:
>
> Forgot to say: For,
>
>  oseed <- base::getRandomSeed()
>  on.exit(base::setRandomSeed(oseed))
>
> one could upgrade set.seed() to take this role, e.g.
>
>  oseed <- set.seed(0xBEEF)
>  on.exit(set.seed(oseed))
>
> Current, set.seed() always return NULL.
>
> BTW, and my memory might be bad, I think I mentioned this in the past
> but was told that you cannot reset the RNG state for all types of RNG
> kinds.  That might complicate things, but on the other hand, that
> could be checked for at run-time by the above functions.
>
> /Henrik
>
> On Wed, Oct 30, 2019 at 9:50 AM Henrik Bengtsson
> <[hidden email]> wrote:
>>
>>> On 30/10/2019 9:08 a.m., peter dalgaard wrote:
>>>> You can fairly easily work around that by saving and restoring .Random.seed.
>>
>> This is actually quite tedious to get correct; it requires you to
>> under how and when .Random.seed is set, and what are valid values on
>> .Random.seed.   For instance, a common mistake (me too) is to reset to
>> .GlobalEnv$.Random.seed <- NULL in a fresh R session but this will
>> produce a warning on: ".Random.seed' is not an integer vector but of
>> type 'NULL', so ignored".  You end up having to do things such as:
>>
>>  oseed <- .GlobalEnv$.Random.seed
>>  on.exit({
>>    if (is.null(oseed)) {
>>      rm(list=".Random.seed", envir= .GlobalEnv)
>>    } else {
>>      assign(".Random.seed", value=oseed, envir= .GlobalEnv)
>>    }
>>  })
>>
>> to avoid that warning.  So, having support functions for this in base
>> R would be helpful, e.g.
>>
>>  oseed <- base::getRandomSeed()
>>  on.exit(base::setRandomSeed(oseed))
>>
>> Back to Marvin's point/question.  I think it would be useful if the
>> CRAN Policies would explicitly say that functions must not change the
>> random seed to a fixed one, without undoing it, unless the user
>> specifies it via an argument.  If not, there's a great risk it will
>> mess up statistical analysis.  Also, if there would a way to test
>> against such practices, which I think is really hard, I would be the
>> first one backing it up.
>>
>> On a related note; there are packages that forward the .Random.seed
>> when loaded.  This is also an unfortunate behavior because you will
>> give different RNGs depending on that package was already loaded
>> before you called a function that depends on it or not.  For example,
>> if pkgA forwards the .Random.seed when loaded, the following is *not*
>> reproducible:
>>
>> # User might or might not have loaded pkgA already
>> if (runif(1) < 0.5) loadNamespace(pkgA)
>>
>> set.seed(0xBEEF)
>> loadNamespace(pkgA)
>> y <- runif(1)
>>
>> I ran into this problem when doing some strict testing.  I argue this
>> also falls into the set of "bad" practices to be avoided.
>>
>> /Henrik
>>
>> On Wed, Oct 30, 2019 at 6:23 AM Duncan Murdoch <[hidden email]> wrote:
>>>
>>> On 30/10/2019 9:08 a.m., peter dalgaard wrote:
>>>> We commit a similar sin in the help pages, e.g.
>>>>
>>>> example(set.seed) ; runif(2)
>>>> example(set.seed) ; runif(2)
>>>>
>>>> gives you the same random uniforms both times. (Of course it isn't that much of an issue, since you would rarely be running examples before any serious simulations.)
>>>
>>> I think it's pretty common in example code, and that's justifiable.  But
>>> it could be avoided by using withr::with_seed() or something equivalent.
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> You can fairly easily work around that by saving and restoring .Random.seed. I wonder if that isn't also true of the cases using set.seed() for other reasons?
>>>>
>>>> -pd
>>>>
>>>>
>>>>> On 30 Oct 2019, at 13:46 , Duncan Murdoch <[hidden email]> wrote:
>>>>>
>>>>> On 30/10/2019 3:28 a.m., Marvin Wright wrote:
>>>>>> Hi all,
>>>>>> I recently found several calls of set.seed() in a CRAN package. These calls are in a plot function, which could lead to unexpected behaviour. See https://github.com/sammo3182/interplot/issues/33 <https://github.com/sammo3182/interplot/issues/33> for a description of the problem.
>>>>>> I checked the CRAN repository policies and could not find anything about this. I would have expected a policy against setting fixed seeds somewhere in a package. Am I missing something?
>>>>>
>>>>> set.seed() writes .Random.seed in the user's global environment, which violates this policy:
>>>>>
>>>>> - Packages should not modify the global environment (user’s workspace).
>>>>>
>>>>> However, every call to a random number generator creates or modifies .Random.seed as well, and most of those are expected and shouldn't be flagged.  And interplot() is documented to do random simulations, so it would be expected to change the seed:  the issue is that given the same inputs it always changes it to the same thing.  I think that would be quite hard for a test to detect.
>>>>>
>>>>> Should it be a policy with no test?  Maybe, because I agree with you that interplot()'s set.seed(324) is bad practice.
>>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel