A bug in the R Mersenne Twister (RNG) code?

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

A bug in the R Mersenne Twister (RNG) code?

Mark Roberts
Whomever,

I recently sent the "bug report" below [hidden email] and have
just been asked to instead submit it to you.

Although I am basically not an R user, I have installed version 3.3.1
and am also the author of a statistics program written in Visual Basic
that contains a component which correctly implements the Mersenne
Twister (MT) algorithm.  I believe that it is not possible to generate
the correct stream of pseudorandom numbers using the MT default random
number generator in R, and am not the first person to notice this.  Here
is a posted 2013 entry
(www.r-bloggers.com/reproducibility-and-randomness/) on an R website
that asserts that the SAS computer program implementation of the MT
algorithm produces different numbers than R does when using the same
starting seed number.  The author of this post didn’t get anyone to
respond to his query about the reason for this SAS vs. R discrepancy.

There are two ways of initializing the original MT computer program
(written in C) so that an identical stream of numbers can be repeatedly
generated:  1) with a particular integer seed number, and 2) with a
particular array of integers.   In the 'compilation and usage' section
of this webpage (https://github.com/cslarsen/mersenne-twister) there is
a listing of the first 200 random numbers the MT algorithm should
produce for seed number = 1.  The inventors of the Mersenne Twister
random number generator provided two different sets of the first 1000
numbers produced by a correctly coded 32-bit implementation of the MT
algorithm when initializing it with a particular array of integers at:
www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
[There is a link to this output at:
www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]

My statistics program obtains exactly those 200 numbers from the first
site mentioned in the previous paragraph and also obtains those same
numbers from the second website (though I didn't check all 2000 values).
   Assuming that the MT code within R uses the 32-bit MT algorithm, I
suspect that the current version of R can't do that.  If you (i.e.,
anyone who might knowledgeably respond to this report) is able to
duplicate those reference test-values, then please send me the R code to
initialize the MT code within R to successfully do that, and I apologize
for having wasted your time. If you (collectively) can't do that, then R
is very likely using incorrectly implemented MT code.  And if this
latter possibility is true, it seems to me that this is something that
should be fixed.

Mark Roberts, Ph.D.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: A bug in the R Mersenne Twister (RNG) code?

R devel mailing list
Try comparing the streams for when the 625-integer versions of the seeds
are identical.  (R's seed is 626 integers: omit the first value, which
indicates which random number generator the seed is for.).  I find the the
MKL Mersenne Twister results match R's (with occassional differences in the
last bit) when the 625-integer seeds the same.

I believe R fiddles with the single-integer seed to spread it out a bit.
S's seed was taken modulo 1024 so old users tended not use use single-seeds
bigger than 1023.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Aug 30, 2016 at 2:45 PM, Mark Roberts <[hidden email]> wrote:

> Whomever,
>
> I recently sent the "bug report" below [hidden email] and have
> just been asked to instead submit it to you.
>
> Although I am basically not an R user, I have installed version 3.3.1
> and am also the author of a statistics program written in Visual Basic
> that contains a component which correctly implements the Mersenne
> Twister (MT) algorithm.  I believe that it is not possible to generate
> the correct stream of pseudorandom numbers using the MT default random
> number generator in R, and am not the first person to notice this.  Here
> is a posted 2013 entry
> (www.r-bloggers.com/reproducibility-and-randomness/) on an R website
> that asserts that the SAS computer program implementation of the MT
> algorithm produces different numbers than R does when using the same
> starting seed number.  The author of this post didn’t get anyone to
> respond to his query about the reason for this SAS vs. R discrepancy.
>
> There are two ways of initializing the original MT computer program
> (written in C) so that an identical stream of numbers can be repeatedly
> generated:  1) with a particular integer seed number, and 2) with a
> particular array of integers.   In the 'compilation and usage' section
> of this webpage (https://github.com/cslarsen/mersenne-twister) there is
> a listing of the first 200 random numbers the MT algorithm should
> produce for seed number = 1.  The inventors of the Mersenne Twister
> random number generator provided two different sets of the first 1000
> numbers produced by a correctly coded 32-bit implementation of the MT
> algorithm when initializing it with a particular array of integers at:
> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
> [There is a link to this output at:
> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]
>
> My statistics program obtains exactly those 200 numbers from the first
> site mentioned in the previous paragraph and also obtains those same
> numbers from the second website (though I didn't check all 2000 values).
>    Assuming that the MT code within R uses the 32-bit MT algorithm, I
> suspect that the current version of R can't do that.  If you (i.e.,
> anyone who might knowledgeably respond to this report) is able to
> duplicate those reference test-values, then please send me the R code to
> initialize the MT code within R to successfully do that, and I apologize
> for having wasted your time. If you (collectively) can't do that, then R
> is very likely using incorrectly implemented MT code.  And if this
> latter possibility is true, it seems to me that this is something that
> should be fixed.
>
> Mark Roberts, Ph.D.
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: A bug in the R Mersenne Twister (RNG) code?

Duncan Murdoch-2
In reply to this post by Mark Roberts
I don't see evidence of a bug.  There have been several versions of the
MT; we may be using a different version than you are.  Ours is the
1999/10/28 version; the web page you cite uses one from 2002.

Perhaps the newer version fixes some problems, and then it would be
worth considering a change.  But changing the default RNG definitely
introduces problems in reproducibility, so it's not obvious that we
would do it.

Duncan Murdoch


On 30/08/2016 5:45 PM, Mark Roberts wrote:

> Whomever,
>
> I recently sent the "bug report" below [hidden email] and have
> just been asked to instead submit it to you.
>
> Although I am basically not an R user, I have installed version 3.3.1
> and am also the author of a statistics program written in Visual Basic
> that contains a component which correctly implements the Mersenne
> Twister (MT) algorithm.  I believe that it is not possible to generate
> the correct stream of pseudorandom numbers using the MT default random
> number generator in R, and am not the first person to notice this.  Here
> is a posted 2013 entry
> (www.r-bloggers.com/reproducibility-and-randomness/) on an R website
> that asserts that the SAS computer program implementation of the MT
> algorithm produces different numbers than R does when using the same
> starting seed number.  The author of this post didn’t get anyone to
> respond to his query about the reason for this SAS vs. R discrepancy.
>
> There are two ways of initializing the original MT computer program
> (written in C) so that an identical stream of numbers can be repeatedly
> generated:  1) with a particular integer seed number, and 2) with a
> particular array of integers.   In the 'compilation and usage' section
> of this webpage (https://github.com/cslarsen/mersenne-twister) there is
> a listing of the first 200 random numbers the MT algorithm should
> produce for seed number = 1.  The inventors of the Mersenne Twister
> random number generator provided two different sets of the first 1000
> numbers produced by a correctly coded 32-bit implementation of the MT
> algorithm when initializing it with a particular array of integers at:
> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
> [There is a link to this output at:
> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]
>
> My statistics program obtains exactly those 200 numbers from the first
> site mentioned in the previous paragraph and also obtains those same
> numbers from the second website (though I didn't check all 2000 values).
>    Assuming that the MT code within R uses the 32-bit MT algorithm, I
> suspect that the current version of R can't do that.  If you (i.e.,
> anyone who might knowledgeably respond to this report) is able to
> duplicate those reference test-values, then please send me the R code to
> initialize the MT code within R to successfully do that, and I apologize
> for having wasted your time. If you (collectively) can't do that, then R
> is very likely using incorrectly implemented MT code.  And if this
> latter possibility is true, it seems to me that this is something that
> should be fixed.
>
> Mark Roberts, Ph.D.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: A bug in the R Mersenne Twister (RNG) code?

Dirk Eddelbuettel

On 30 August 2016 at 18:29, Duncan Murdoch wrote:
| I don't see evidence of a bug.  There have been several versions of the
| MT; we may be using a different version than you are.  Ours is the
| 1999/10/28 version; the web page you cite uses one from 2002.
|
| Perhaps the newer version fixes some problems, and then it would be
| worth considering a change.  But changing the default RNG definitely
| introduces problems in reproducibility, so it's not obvious that we
| would do it.

Yep. FWIW the GNU GSL adopted the 2002 version a while ago too. Quoting from
https://www.gnu.org/software/gsl/manual/html_node/Random-number-generator-algorithms.html

Generator: gsl_rng_mt19937

   The MT19937 generator of Makoto Matsumoto and Takuji Nishimura is a
   variant of the twisted generalized feedback shift-register algorithm, and
   is known as the “Mersenne Twister” generator. It has a Mersenne prime
   period of 2^19937 - 1 (about 10^6000) and is equi-distributed in 623
   dimensions. It has passed the DIEHARD statistical tests. It uses 624 words
   of state per generator and is comparable in speed to the other
   generators. The original generator used a default seed of 4357 and
   choosing s equal to zero in gsl_rng_set reproduces this. Later versions
   switched to 5489 as the default seed, you can choose this explicitly via
   gsl_rng_set instead if you require it.

   For more information see,

      Makoto Matsumoto and Takuji Nishimura, “Mersenne Twister: A
      623-dimensionally equidistributed uniform pseudorandom number
      generator”. ACM Transactions on Modeling and Computer Simulation,
      Vol. 8, No. 1 (Jan. 1998), Pages 3–30 The generator gsl_rng_mt19937
      uses the second revision of the seeding procedure published by the two
      authors above in 2002. The original seeding procedures could cause
      spurious artifacts for some seed values. They are still available
      through the alternative generators gsl_rng_mt19937_1999 and
      gsl_rng_mt19937_1998.

Note the last sentence here.

This is all somewhat technical code, so when I noticed the above I could
never figure what exactly R was doing in its implementation.  But "innocent
until proven guilty" -- a sufficient number of people ought to have looked at
this -- so I saw no need to pursue this further.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: A bug in the R Mersenne Twister (RNG) code?

Paul Gilbert-2
In reply to this post by Duncan Murdoch-2


On 08/30/2016 06:29 PM, Duncan Murdoch wrote:
> I don't see evidence of a bug.  There have been several versions of the
> MT; we may be using a different version than you are.  Ours is the
> 1999/10/28 version; the web page you cite uses one from 2002.
>
> Perhaps the newer version fixes some problems, and then it would be
> worth considering a change.  But changing the default RNG definitely
> introduces problems in reproducibility,

Well "problems in reproducibility" is a bit vague. Results would always
be reproducible by specifying kind="Mersenne-Twister" or kind="Buggy
Kinderman-Ramage" for older results, so there is no problem reproducing
results. The only problem is that users expecting to reproduce results
twenty years later will need to know what random generator they used.
(BTW, they may also need to record information about the normal or other
generator, as well as the seed.) Of course, these changes are recorded
pretty well for R, so the history of "default" can always be found.

I think it is a mistake to encourage users into thinking they do not
need to keep track of some information if they want reproducibility.
Perhaps the default should be changed more often in order to encourage
better user habits.

More seriously, I think "default" should continue to be something that
is currently considered to be good. So, if there really is a known
problem, then I think "default" should be changed.

(And, no I did not get burned by the R 1.7.0 change in the default
generator. I got burned by a much earlier, unadvertised, and more subtle
change in the Splus generator.)

Paul Gilbert

so it's not obvious that we

> would do it.
>
> Duncan Murdoch
>
>
> On 30/08/2016 5:45 PM, Mark Roberts wrote:
>> Whomever,
>>
>> I recently sent the "bug report" below [hidden email] and have
>> just been asked to instead submit it to you.
>>
>> Although I am basically not an R user, I have installed version 3.3.1
>> and am also the author of a statistics program written in Visual Basic
>> that contains a component which correctly implements the Mersenne
>> Twister (MT) algorithm.  I believe that it is not possible to generate
>> the correct stream of pseudorandom numbers using the MT default random
>> number generator in R, and am not the first person to notice this.  Here
>> is a posted 2013 entry
>> (www.r-bloggers.com/reproducibility-and-randomness/) on an R website
>> that asserts that the SAS computer program implementation of the MT
>> algorithm produces different numbers than R does when using the same
>> starting seed number.  The author of this post didn’t get anyone to
>> respond to his query about the reason for this SAS vs. R discrepancy.
>>
>> There are two ways of initializing the original MT computer program
>> (written in C) so that an identical stream of numbers can be repeatedly
>> generated:  1) with a particular integer seed number, and 2) with a
>> particular array of integers.   In the 'compilation and usage' section
>> of this webpage (https://github.com/cslarsen/mersenne-twister) there is
>> a listing of the first 200 random numbers the MT algorithm should
>> produce for seed number = 1.  The inventors of the Mersenne Twister
>> random number generator provided two different sets of the first 1000
>> numbers produced by a correctly coded 32-bit implementation of the MT
>> algorithm when initializing it with a particular array of integers at:
>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
>> [There is a link to this output at:
>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]
>>
>> My statistics program obtains exactly those 200 numbers from the first
>> site mentioned in the previous paragraph and also obtains those same
>> numbers from the second website (though I didn't check all 2000 values).
>>    Assuming that the MT code within R uses the 32-bit MT algorithm, I
>> suspect that the current version of R can't do that.  If you (i.e.,
>> anyone who might knowledgeably respond to this report) is able to
>> duplicate those reference test-values, then please send me the R code to
>> initialize the MT code within R to successfully do that, and I apologize
>> for having wasted your time. If you (collectively) can't do that, then R
>> is very likely using incorrectly implemented MT code.  And if this
>> latter possibility is true, it seems to me that this is something that
>> should be fixed.
>>
>> Mark Roberts, Ph.D.
>>
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: A bug in the R Mersenne Twister (RNG) code?

Gabriel Becker
I wonder how useful a (set of?) "time machine" functions which look up
/infer things like this based on a date would be. Could ease the pain of
changes generally, though not remove it completely.

~G

On Wed, Aug 31, 2016 at 5:45 PM, Paul Gilbert <[hidden email]> wrote:

>
>
> On 08/30/2016 06:29 PM, Duncan Murdoch wrote:
>
>> I don't see evidence of a bug.  There have been several versions of the
>> MT; we may be using a different version than you are.  Ours is the
>> 1999/10/28 version; the web page you cite uses one from 2002.
>>
>> Perhaps the newer version fixes some problems, and then it would be
>> worth considering a change.  But changing the default RNG definitely
>> introduces problems in reproducibility,
>>
>
> Well "problems in reproducibility" is a bit vague. Results would always be
> reproducible by specifying kind="Mersenne-Twister" or kind="Buggy
> Kinderman-Ramage" for older results, so there is no problem reproducing
> results. The only problem is that users expecting to reproduce results
> twenty years later will need to know what random generator they used. (BTW,
> they may also need to record information about the normal or other
> generator, as well as the seed.) Of course, these changes are recorded
> pretty well for R, so the history of "default" can always be found.
>
> I think it is a mistake to encourage users into thinking they do not need
> to keep track of some information if they want reproducibility. Perhaps the
> default should be changed more often in order to encourage better user
> habits.
>
> More seriously, I think "default" should continue to be something that is
> currently considered to be good. So, if there really is a known problem,
> then I think "default" should be changed.
>
> (And, no I did not get burned by the R 1.7.0 change in the default
> generator. I got burned by a much earlier, unadvertised, and more subtle
> change in the Splus generator.)
>
> Paul Gilbert
>
>
> so it's not obvious that we
>
>> would do it.
>>
>> Duncan Murdoch
>>
>>
>> On 30/08/2016 5:45 PM, Mark Roberts wrote:
>>
>>> Whomever,
>>>
>>> I recently sent the "bug report" below [hidden email] and have
>>> just been asked to instead submit it to you.
>>>
>>> Although I am basically not an R user, I have installed version 3.3.1
>>> and am also the author of a statistics program written in Visual Basic
>>> that contains a component which correctly implements the Mersenne
>>> Twister (MT) algorithm.  I believe that it is not possible to generate
>>> the correct stream of pseudorandom numbers using the MT default random
>>> number generator in R, and am not the first person to notice this.  Here
>>> is a posted 2013 entry
>>> (www.r-bloggers.com/reproducibility-and-randomness/) on an R website
>>> that asserts that the SAS computer program implementation of the MT
>>> algorithm produces different numbers than R does when using the same
>>> starting seed number.  The author of this post didn’t get anyone to
>>> respond to his query about the reason for this SAS vs. R discrepancy.
>>>
>>> There are two ways of initializing the original MT computer program
>>> (written in C) so that an identical stream of numbers can be repeatedly
>>> generated:  1) with a particular integer seed number, and 2) with a
>>> particular array of integers.   In the 'compilation and usage' section
>>> of this webpage (https://github.com/cslarsen/mersenne-twister) there is
>>> a listing of the first 200 random numbers the MT algorithm should
>>> produce for seed number = 1.  The inventors of the Mersenne Twister
>>> random number generator provided two different sets of the first 1000
>>> numbers produced by a correctly coded 32-bit implementation of the MT
>>> algorithm when initializing it with a particular array of integers at:
>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
>>> [There is a link to this output at:
>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]
>>>
>>> My statistics program obtains exactly those 200 numbers from the first
>>> site mentioned in the previous paragraph and also obtains those same
>>> numbers from the second website (though I didn't check all 2000 values).
>>>    Assuming that the MT code within R uses the 32-bit MT algorithm, I
>>> suspect that the current version of R can't do that.  If you (i.e.,
>>> anyone who might knowledgeably respond to this report) is able to
>>> duplicate those reference test-values, then please send me the R code to
>>> initialize the MT code within R to successfully do that, and I apologize
>>> for having wasted your time. If you (collectively) can't do that, then R
>>> is very likely using incorrectly implemented MT code.  And if this
>>> latter possibility is true, it seems to me that this is something that
>>> should be fixed.
>>>
>>> Mark Roberts, Ph.D.
>>>
>>>     [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: A bug in the R Mersenne Twister (RNG) code?

Martin Maechler
>>>>> Gabriel Becker <[hidden email]>
>>>>>     on Thu, 1 Sep 2016 08:34:31 -0700 writes:

    > I wonder how useful a (set of?) "time machine" functions
    > which look up /infer things like this based on a date
    > would be. Could ease the pain of changes generally, though
    > not remove it completely.

Such a set (possibly of size one) may be quite useful, notably
if it got an intuitive interface.
I'd recommend to partly follow options() here, i.e., the
  oc <- compatibilityR("2000-02-29")

would set random number generators (and other changeable
defaults) to those that were in effect when R 1.0.0 was released,
*and* a later call

  compatibilityR (oc)  # reset to previous state

would do what the comment says.


    > On Wed, Aug 31, 2016 at 5:45 PM, Paul Gilbert
    > <[hidden email]> wrote:

    >>
    >>
    >> On 08/30/2016 06:29 PM, Duncan Murdoch wrote:
    >>
    >>> I don't see evidence of a bug.  There have been several
    >>> versions of the MT; we may be using a different version
    >>> than you are.  Ours is the 1999/10/28 version; the web
    >>> page you cite uses one from 2002.
    >>>
    >>> Perhaps the newer version fixes some problems, and then
    >>> it would be worth considering a change.  But changing
    >>> the default RNG definitely introduces problems in
    >>> reproducibility,
    >>>
    >>
    >> Well "problems in reproducibility" is a bit
    >> vague. Results would always be reproducible by specifying
    >> kind="Mersenne-Twister" or kind="Buggy Kinderman-Ramage"
    >> for older results, so there is no problem reproducing
    >> results. The only problem is that users expecting to
    >> reproduce results twenty years later will need to know
    >> what random generator they used. (BTW, they may also need
    >> to record information about the normal or other
    >> generator, as well as the seed.) Of course, these changes
    >> are recorded pretty well for R, so the history of
    >> "default" can always be found.
    >>
    >> I think it is a mistake to encourage users into thinking
    >> they do not need to keep track of some information if
    >> they want reproducibility. Perhaps the default should be
    >> changed more often in order to encourage better user
    >> habits.
    >>
    >> More seriously, I think "default" should continue to be
    >> something that is currently considered to be good. So, if
    >> there really is a known problem, then I think "default"
    >> should be changed.
    >>
    >> (And, no I did not get burned by the R 1.7.0 change in
    >> the default generator. I got burned by a much earlier,
    >> unadvertised, and more subtle change in the Splus
    >> generator.)
    >>
    >> Paul Gilbert
    >>
    >>
    >> so it's not obvious that we
    >>
    >>> would do it.
    >>>
    >>> Duncan Murdoch
    >>>
    >>>
    >>> On 30/08/2016 5:45 PM, Mark Roberts wrote:
    >>>
    >>>> Whomever,
    >>>>
    >>>> I recently sent the "bug report" below
    >>>> [hidden email] and have just been asked to
    >>>> instead submit it to you.
    >>>>
    >>>> Although I am basically not an R user, I have installed
    >>>> version 3.3.1 and am also the author of a statistics
    >>>> program written in Visual Basic that contains a
    >>>> component which correctly implements the Mersenne
    >>>> Twister (MT) algorithm.  I believe that it is not
    >>>> possible to generate the correct stream of pseudorandom
    >>>> numbers using the MT default random number generator in
    >>>> R, and am not the first person to notice this.  Here is
    >>>> a posted 2013 entry
    >>>> (www.r-bloggers.com/reproducibility-and-randomness/) on
    >>>> an R website that asserts that the SAS computer program
    >>>> implementation of the MT algorithm produces different
    >>>> numbers than R does when using the same starting seed
    >>>> number.  The author of this post didn’t get anyone to
    >>>> respond to his query about the reason for this SAS
    >>>> vs. R discrepancy.
    >>>>
    >>>> There are two ways of initializing the original MT
    >>>> computer program (written in C) so that an identical
    >>>> stream of numbers can be repeatedly generated: 1) with
    >>>> a particular integer seed number, and 2) with a
    >>>> particular array of integers.  In the 'compilation and
    >>>> usage' section of this webpage
    >>>> (https://github.com/cslarsen/mersenne-twister) there is
    >>>> a listing of the first 200 random numbers the MT
    >>>> algorithm should produce for seed number = 1.  The
    >>>> inventors of the Mersenne Twister random number
    >>>> generator provided two different sets of the first 1000
    >>>> numbers produced by a correctly coded 32-bit
    >>>> implementation of the MT algorithm when initializing it
    >>>> with a particular array of integers at:
    >>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
    >>>> [There is a link to this output at:
    >>>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]
    >>>>
    >>>> My statistics program obtains exactly those 200 numbers
    >>>> from the first site mentioned in the previous paragraph
    >>>> and also obtains those same numbers from the second
    >>>> website (though I didn't check all 2000 values).
    >>>> Assuming that the MT code within R uses the 32-bit MT
    >>>> algorithm, I suspect that the current version of R
    >>>> can't do that.  If you (i.e., anyone who might
    >>>> knowledgeably respond to this report) is able to
    >>>> duplicate those reference test-values, then please send
    >>>> me the R code to initialize the MT code within R to
    >>>> successfully do that, and I apologize for having wasted
    >>>> your time. If you (collectively) can't do that, then R
    >>>> is very likely using incorrectly implemented MT code.
    >>>> And if this latter possibility is true, it seems to me
    >>>> that this is something that should be fixed.
    >>>>
    >>>> Mark Roberts, Ph.D.
    >>>>
    >>>> [[alternative HTML version deleted]]
    >>>>
    >>>> ______________________________________________
    >>>> [hidden email] mailing list
    >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>>
    >>>>
    >>> ______________________________________________
    >>> [hidden email] mailing list
    >>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>
    >>
    >> ______________________________________________
    >> [hidden email] mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>



    > --
    > Gabriel Becker, PhD Associate Scientist (Bioinformatics)
    > Genentech Research

    > [[alternative HTML version deleted]]

    > ______________________________________________
    > [hidden email] mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: A bug in the R Mersenne Twister (RNG) code?

Martin Maechler
In reply to this post by Dirk Eddelbuettel
>>>>> Dirk Eddelbuettel <[hidden email]>
>>>>>     on Wed, 31 Aug 2016 10:30:07 -0500 writes:

    > On 30 August 2016 at 18:29, Duncan Murdoch wrote:
    > I don't see evidence of a bug.  There have been several
    > versions of the  MT; we may be using a different version
    > than you are.  Ours is the  1999/10/28 version; the web
    > page you cite uses one from 2002.
    >  
    >  Perhaps the newer version fixes some problems, and then
    > it would be  worth considering a change.  But changing
    > the default RNG definitely  introduces problems in
    > reproducibility, so it's not obvious that we would do it.

    > Yep. FWIW the GNU GSL adopted the 2002 version a while ago
    > too. Quoting from
    > https://www.gnu.org/software/gsl/manual/html_node/Random-number-generator-algorithms.html

    > Generator: gsl_rng_mt19937

    >    The MT19937 generator of Makoto Matsumoto and Takuji
    > Nishimura is a variant of the twisted generalized feedback
    > shift-register algorithm, and is known as the “Mersenne
    > Twister” generator. It has a Mersenne prime period of
    > 2^19937 - 1 (about 10^6000) and is equi-distributed in 623
    > dimensions. It has passed the DIEHARD statistical
    > tests. It uses 624 words of state per generator and is
    > comparable in speed to the other generators. The original
    > generator used a default seed of 4357 and choosing s equal
    > to zero in gsl_rng_set reproduces this. Later versions
    > switched to 5489 as the default seed, you can choose this
    > explicitly via gsl_rng_set instead if you require it.

    >    For more information see,

    >       Makoto Matsumoto and Takuji Nishimura, “Mersenne
    > Twister: A 623-dimensionally equidistributed uniform
    > pseudorandom number generator”. ACM Transactions on
    > Modeling and Computer Simulation, Vol. 8, No. 1
    > (Jan. 1998), Pages 3–30 The generator gsl_rng_mt19937 uses
    > the second revision of the seeding procedure published by
    > the two authors above in 2002. The original seeding
    > procedures could cause spurious artifacts for some seed
    > values. They are still available through the alternative
    > generators gsl_rng_mt19937_1999 and gsl_rng_mt19937_1998.

    > Note the last sentence here.

    > This is all somewhat technical code, so when I noticed the
    > above I could never figure what exactly R was doing in its
    > implementation.  But "innocent until proven guilty" -- a
    > sufficient number of people ought to have looked at this
    > -- so I saw no need to pursue this further.

    > Dirk

This interesting thread went to sleep a bit early.

Let me summarize my view (another R-core after Duncan's) on this:

a) R's reference documentation, easily accessed with one of
     ?Random, ?RNG, ?.Random.seed  (and more)
  clearly indicates the reference for our  "Mersenne-Twister" (MT)
  as the  ' Matsumoto, M. and Nishimura, T. (1998) ... ' publication.
  and we are providing that.
  As Duncan said, there is no bug.

b) The extra information by Mark and notably Dirk indicates that
   "the litterature" showed that the 1998 version of MT has rare
   problems and the GSL, notably uses the 2002 version of MT.

c) I think it would be nice if we could provide that as well as
   another RNGkind().  I for one would be willing to integrate a
   well written patch proposal (what others call PR for "pull
   request", and I'd also call "code donation") into the R sources.

   Well-written for me would include to re-use / share code with
   the 1998 version as much as possible.

d) If we had both kinds, say "Mersenne-Twister" and "Mersenne-Twister_2002",
   we (the R community, not R core necessarily) could conduct
   experiments about the differences... and can contempltate the
   pros and cons of a potential switch of default.

e) A bit tangential to this:  Nowadays, there also exist faster
   gamma variate RNGs .. giving different random values - for
   rgamma() but also several other derived RNGs, notably
   rchisq(), and even more in other CRAN packages.

   With code donation there, we could introduce a new argument
        'gamma.kind'
   to the
                set.seed(see, kind = NULL, normal.kind = NULL)
                RNGkind (     kind = NULL, normal.kind = NULL)
   functions.
   (There, currently I'd guess we would not change the default).

---

Note that the above includes my guess that  R-core would not take
action unless we get patch proposals.
(But then, I'm happy if my guess is wrong here ...)

Martin Maechler,
ETH Zurich

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel