Reproducibility Between Local and Remote Computer with R

Reproducibility Between Local and Remote Computer with R

Kevin Egan
I posted this question:

I am currently using R, RStudio, and a remote computer (via an R script) to run the same code. I call set.seed(123) at the start of all three versions of the code and then use glmnet to assess a matrix. However, I am having trouble reproducing my local results on the remote computer. I am using R version 4.0.2 locally and R version 3.6.0 remotely.

After running several tests, I'm wondering if there is a difference between the two versions in R which may lead to slightly different coefficients. If anyone has any insight I would appreciate it.

Thanks.

and found that there were slight differences in the output of rnorm() between R-4.0.2 and R-3.6.0, but no differences for runif() between the two systems. My original code uses rnorm(), so I was wondering whether this is the reason I am finding slight differences in the glmnet and lars coefficients between my local computer (R-4.0.2) and my remote computer (R-3.6.0). I run my code locally on macOS and remotely on what I believe is an HPC cluster.
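
A comparison like this can be made by printing a few draws at full precision on each machine and checking them against each other. The following is only a minimal sketch, assuming the default generators are in use; the helper name `draw_fingerprint` and the output file name are made up for illustration.

```r
# Hypothetical helper: draw a few values after a fixed seed and format them
# with 17 significant digits so that small differences become visible.
draw_fingerprint <- function(seed = 123, n = 5) {
  set.seed(seed)
  u <- runif(n)
  set.seed(seed)
  z <- rnorm(n)
  list(
    r_version = R.version.string,
    rng_kind  = RNGkind(),
    runif     = sprintf("%.17g", u),
    rnorm     = sprintf("%.17g", z)
  )
}

fp <- draw_fingerprint()
str(fp)

# Save the result so it can be copied to the other machine and compared there.
saveRDS(fp, file.path(tempdir(), "draw_fingerprint.rds"))
```

Running the same script on both machines and comparing the `runif` and `rnorm` entries shows whether the two generators agree to the last digit.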

Thanks.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Reproducibility Between Local and Remote Computer with R

Jeff Newmiller
Compare the sessionInfo outputs for the different environments.
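
One possible way to do this is to write the session details to a text file on each machine and then look at the lines that differ. This is a sketch only; `snapshot_env`, `compare_snapshots`, and the file names are illustrative and not part of base R.

```r
# Write sessionInfo() plus the RNG settings to a plain-text file.
snapshot_env <- function(path) {
  lines <- c(
    capture.output(print(sessionInfo())),
    "",
    paste("RNGkind:", paste(RNGkind(), collapse = " / "))
  )
  writeLines(lines, path)
  invisible(lines)
}

local_file <- file.path(tempdir(), "sessionInfo-local.txt")
snapshot_env(local_file)

# Once the file written on the other machine has been copied over,
# list the lines that appear in only one of the two snapshots.
compare_snapshots <- function(path_a, path_b) {
  a <- readLines(path_a)
  b <- readLines(path_b)
  list(only_in_a = setdiff(a, b), only_in_b = setdiff(b, a))
}
```

A call such as `compare_snapshots(local_file, "sessionInfo-remote.txt")` (the second file name is hypothetical) then reports the R version, platform, BLAS/LAPACK, and package lines that do not match.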

On August 7, 2020 1:24:55 PM PDT, Kevin Egan <[hidden email]> wrote:


--
Sent from my phone. Please excuse my brevity.


Re: Reproducibility Between Local and Remote Computer with R

R help mailing list-2
In reply to this post by Kevin Egan
Hi,

I initially suspected that the change in the RNG might be the source. However, that change was made in 3.6.0, so both of your R versions use it by default, and it affects sample() rather than rnorm():

"sample.kind can be "Rounding" or "Rejection", or partial matches to these. The former was the default in versions prior to 3.6.0: it made sample noticeably non-uniform on large populations, and should only be used for reproduction of old results. See PR#17494 for a discussion."

Three other possibilities:

1. Read news() for your local 4.0.2 installation, as there are some changes that were made, including some changes to round() that could be applicable here.

2. Check to see if the version of glmnet is the same on both machines. There have been changes to that package that might be relevant here and you might read the README and NEWS files for the package on CRAN to see if there is any relevant information there.

3. Different hardware and OS versions can always lead to small numerical differences, especially far out in the decimal places, and these could alter results. If you, or an admin on your behalf, can update the remote machine (both R and installed packages), that would help to reduce the number of variables at play here.
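
Points 2 and 3 can be checked with a few lines of base R. This is a rough sketch: `pkg_versions` is an illustrative helper name, and the coefficient vectors are made-up numbers that differ only in the last digits.

```r
# Report the installed version of each package, or NA if it is not installed.
pkg_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    tryCatch(as.character(packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))
}

print(pkg_versions(c("glmnet", "lars")))

# Compare two coefficient vectors exactly and within a numeric tolerance.
coef_local  <- c(0.5, -1.25, 2)
coef_remote <- coef_local + c(0, 1e-12, -1e-12)

exact_match    <- identical(coef_local, coef_remote)
tolerant_match <- isTRUE(all.equal(coef_local, coef_remote, tolerance = 1e-8))
print(c(exact = exact_match, tolerant = tolerant_match))
```

If the two machines agree under `all.equal()` but not under `identical()`, the difference is probably floating-point noise from the platform or numerical libraries rather than a different random number stream. For point 1, a query such as `news(grepl("round", Text))` should list the R NEWS entries that mention round().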

Regards,

Marc Schwartz



Re: Reproducibility Between Local and Remote Computer with R

Kevin Egan
In reply to this post by Jeff Newmiller
Local:
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

loaded via a namespace (and not attached):
 [1] crayon_1.3.4     dplyr_1.0.0      R6_2.4.1         lifecycle_0.2.0  magrittr_1.5     pillar_1.4.3    
 [7] rlang_0.4.7      rstudioapi_0.11  vctrs_0.3.1      generics_0.0.2   ellipsis_0.3.0   tools_4.0.2    
[13] glue_1.4.1       purrr_0.3.4      yaml_2.2.1       compiler_4.0.2   pkgconfig_2.0.3  tidyselect_1.1.0
[19] tibble_3.0.1


Remote:
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /ddn/apps/Cluster-Apps/intel/2019.5/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

loaded via a namespace (and not attached):
[1] compiler_3.6.3

> On 8 Aug 2020, at 08:17, Jeff Newmiller <[hidden email]> wrote:
>
> Compare the sessionInfo outputs for the different environments.

Re: Reproducibility Between Local and Remote Computer with R

Jeff Newmiller
You did not load the corresponding packages in both environments.
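
In other words, sessionInfo() only reports the versions of packages that are loaded at the time it is called. A minimal sketch of the intended workflow follows, assuming the analysis uses glmnet and lars as mentioned in the thread; `attached_versions` is an illustrative helper name.

```r
# Attach the packages the analysis uses (when installed) before asking
# sessionInfo() for details.
pkgs <- c("glmnet", "lars")
for (p in pkgs) {
  if (requireNamespace(p, quietly = TRUE)) {
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  } else {
    message("Package not installed here: ", p)
  }
}

# Versions of the attached non-base packages as reported by sessionInfo().
attached_versions <- function() {
  info <- sessionInfo()
  vapply(info$otherPkgs, function(d) d$Version, character(1))
}

print(attached_versions())
```

Running this in both environments gives two short, directly comparable lists of package versions.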

Also, please post in plain text format, per the Posting Guide mentioned in the footer of every post.

On August 8, 2020 7:15:16 AM PDT, Kevin Egan <[hidden email]> wrote:


Re: Reproducibility Between Local and Remote Computer with R

aBBy Spurdle, ⍺XY
In reply to this post by Kevin Egan
Hi Kevin,

Intuitively, the first step would be to ensure that all versions of R,
and all the R packages, are the same.

However, you mention HPC.
And the glmnet package imports the foreach package, which appears
(after a quick glance) to support multi-core and parallel computing.

If your code uses parallel computing, you may need to look at how
random numbers, and the results that depend on them, are handled...
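
One common way to handle random numbers in parallel code is to give every task its own stream from the L'Ecuyer-CMRG generator in the base parallel package. The following is a sketch under that assumption, not a description of what glmnet does internally; `streams` and `task` are illustrative names, and the loops run sequentially here to keep the example self-contained.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")   # generator designed for multiple streams
set.seed(123)

# Derive one independent stream per task from the single seed above.
n_tasks <- 4
streams <- vector("list", n_tasks)
s <- .Random.seed
for (i in seq_len(n_tasks)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)
}

# Each task installs its own stream before drawing, so its result does not
# depend on which worker runs it or on the order of execution.
task <- function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  rnorm(3)
}

res_forward  <- lapply(streams, task)
res_backward <- rev(lapply(rev(streams), task))
print(identical(res_forward, res_backward))
```

The same `streams` list could be passed to `parLapply()` on a cluster. Note that `RNGkind("L'Ecuyer-CMRG")` changes the generator for the rest of the session.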



Re: Reproducibility Between Local and Remote Computer with R

Duncan Murdoch-2
In reply to this post by R help mailing list-2
On 08/08/2020 9:34 a.m., Marc Schwartz via R-help wrote:
> Hi,
>
> I initially suspected that the change in the RNG might be the source. However, that change was made in 3.6.0, so both of your R versions use it by default, and it affects sample() rather than rnorm():
>
> "sample.kind can be "Rounding" or "Rejection", or partial matches to these. The former was the default in versions prior to 3.6.0: it made sample noticeably non-uniform on large populations, and should only be used for reproduction of old results. See PR#17494 for a discussion."
>

That may still be an issue.  If a user saves a workspace in an old
version and reloads it in a newer version, I believe they get the old
version of the RNG.

You need to check that the output of RNGkind() matches on all machines
to know that they're using the same RNGs.
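
A check along these lines might look as follows. It is a sketch that assumes R >= 3.6.0 on every machine (the `sample.kind` argument does not exist in older versions); the `expected` vector and the helper `seed_with_defaults` are illustrative.

```r
# Print the RNG settings in use; run this on every machine and compare.
kinds <- RNGkind()
print(kinds)

# Defaults of a fresh R >= 3.6.0 session; treat this vector as an assumption
# to verify against your own installations.
expected <- c("Mersenne-Twister", "Inversion", "Rejection")

if (!identical(kinds, expected)) {
  message("Non-default RNG settings: ", paste(kinds, collapse = " / "))
}

# Hypothetical helper: set the generators explicitly before seeding, so that
# a reloaded workspace cannot carry over older settings.
seed_with_defaults <- function(seed) {
  RNGkind(kind = "Mersenne-Twister", normal.kind = "Inversion",
          sample.kind = "Rejection")
  set.seed(seed)
  invisible(RNGkind())
}

seed_with_defaults(123)
x1 <- rnorm(3)
seed_with_defaults(123)
x2 <- rnorm(3)
print(identical(x1, x2))
```

Calling such a helper at the top of a script on both machines removes the RNG settings as a variable, whatever was stored in a saved workspace.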

Duncan Murdoch


Re: Reproducibility Between Local and Remote Computer with R

ssefick
In reply to this post by aBBy Spurdle, ⍺XY
Caveat: I have only skimmed this email thread, so please forgive me if I
have missed something.

Are you able to use renv, packrat, Docker, or Anaconda? Your compute
environments are very different.

Kindest regards,

Stephen Sefick

On Sat, Aug 8, 2020, 19:05 Abby Spurdle <[hidden email]> wrote:


Re: Reproducibility Between Local and Remote Computer with R

Kevin Egan
In reply to this post by aBBy Spurdle, ⍺XY
Hi Abby,

After running a few tests on my local and remote versions of R, this seems
to be the most plausible answer to the problem. I called set.seed(123)
at several points in my code and got the same results, but I would rather
not have to do that if possible.



Re: Reproducibility Between Local and Remote Computer with R

ssefick
In reply to this post by ssefick
Hi Kevin,

I think Abby has suggested something close to what I believe the problem
is: environment setup.

Some possible solutions:
The renv and packrat packages are a way to version your packages to help
with reproducibility. Anaconda might be a solution for the R version and
package version problem, if it is installed on your HPC. Docker could work as
well (maybe the best option, if installed). There are other workarounds, but
I would have to know how your particular HPC/compute environment is set up
to comment further.
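
For the renv option, the basic workflow can be sketched as two thin wrappers. The wrapper names `freeze_env` and `thaw_env` are illustrative; nothing runs until they are called, and renv must be installed on both machines.

```r
# On the machine whose results you want to reproduce: record the package
# versions used by the project in its renv.lock file.
freeze_env <- function(project = getwd()) {
  renv::snapshot(project = project, prompt = FALSE)
}

# On the other machine, inside a copy of the project that contains renv.lock:
# install the recorded package versions into the project library.
thaw_env <- function(project = getwd()) {
  renv::restore(project = project, prompt = FALSE)
}
```

In a batch job, `thaw_env()` could be called once at the start of the script. Note that renv records the R version in the lockfile but does not install R itself, so the R versions still have to be matched by other means, such as Docker or Anaconda.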

Brass tacks:
I think you need to ensure all your package versions (R and add-on
packages) are the same.

Fwiw,

Stephen

On Sun, Aug 9, 2020, 08:26 Kevin Egan <[hidden email]> wrote:

> Hi Stephen,
>
> I believe I am using Renv, but on my remote computer I am running batch
> files.
>
> Thanks,
>
> Kevin
>
> On 8 Aug 2020, at 18:18, stephen sefick <[hidden email]> wrote:
>
> Caveat, I have only skimmed this email thread, so please forgive me if I
> have missed something.
>
> Are you able to use Renv, packrat, docker, or anaconda? Your compute
> environments are very different.
> Kindest regards,
>
> Stephen Sefick

Re: Reproducibility Between Local and Remote Computer with R

Duncan Murdoch-2
In reply to this post by Kevin Egan
On 09/08/2020 8:33 a.m., Kevin Egan wrote:
> Hi Abby,
>
> After running a few tests on my local and remote versions of R, this seems
> to be the most plausible answer to the problem. I put set.seed(123)
> several times within my code and produced the same results but would rather
> not have to do that if possible.

You should look at the doRNG package, which addresses exactly this
problem.  See its vignette, vignette("doRNG", package="doRNG").
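
As a rough illustration of what the package offers, the sketch below runs the same foreach loop twice with the same seed and checks that the results match. It assumes doRNG is installed (the function returns NULL otherwise) and uses the sequential backend to stay self-contained; `run_dorng_demo` is an illustrative name.

```r
run_dorng_demo <- function(seed = 123) {
  if (!requireNamespace("doRNG", quietly = TRUE)) {
    message("doRNG is not installed; skipping the demo")
    return(NULL)
  }
  suppressPackageStartupMessages(library(doRNG))  # also attaches foreach
  registerDoSEQ()  # sequential backend; a parallel backend can be used instead

  # The seed is passed per loop, so no set.seed() calls are needed inside.
  r1 <- foreach(i = 1:4, .options.RNG = seed) %dorng% rnorm(2)
  r2 <- foreach(i = 1:4, .options.RNG = seed) %dorng% rnorm(2)
  list(first = r1, second = r2, identical = identical(r1, r2))
}

dorng_result <- run_dorng_demo()
if (!is.null(dorng_result)) print(dorng_result$identical)
```

With this approach a single seed per loop replaces the repeated set.seed(123) calls scattered through the code.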

Duncan Murdoch

>
>
> On Sat, Aug 8, 2020 at 6:05 PM Abby Spurdle <[hidden email]> wrote:
>
>> Hi Kevin,
>>
>> Intuitively, the first step would be to ensure that all versions of R,
>> and all the R packages, are the same.
>>
>> However, you mention HPC.
>> And the glmnet package imports the foreach package, which appears
>> (after a quick glance) to support multi-core and parallel computing.
>>
>> If your code uses parallel computing (?), you may need to look at how
>> random numbers, and related results, are handled...


Re: Reproducibility Between Local and Remote Computer with R

Kevin Egan
In reply to this post by ssefick
Hi Stephen,

Thanks. I’m now trying to use R 3.6.3 on the HPC, and I was able to run a few
tests remotely and get reproducible results. The batches have not yet run,
but I’m hoping they will give reproducible results when they do.

Thanks,

Kevin
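
One way to check whether the two machines now match, along the lines of the sessionInfo() comparison suggested earlier in the thread. This is a rough sketch using base R only; env_summary is a hypothetical helper name, and the package list simply reflects the packages mentioned in this thread. Run it on both machines and compare the printed output:

```r
# Hypothetical helper: collect the settings that most often differ
# between two R installations.
env_summary <- function(pkgs = c("glmnet", "lars", "foreach")) {
  # Keep only packages that are installed, without loading them.
  installed <- pkgs[vapply(pkgs,
                           function(p) nzchar(system.file(package = p)),
                           logical(1))]
  list(
    r_version = R.version.string,
    rng_kind  = RNGkind(),  # generator, normal method, sample method
    packages  = vapply(installed,
                       function(p) as.character(packageVersion(p)),
                       character(1))
  )
}

info <- env_summary()
str(info)

# Print a few seeded draws at full precision so they can be compared by eye.
set.seed(123)
draws <- rnorm(3)
print(format(draws, digits = 17))
```

If the R version, the RNGkind() settings, and the package versions all agree but the seeded draws still differ, the cause is more likely the platform or the numerical libraries than the R code itself.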

On Sun, Aug 9, 2020 at 08:42 stephen sefick <[hidden email]> wrote:

> Hi Kevin,
>
> I think Abby has suggested something similar to what I think the problem
> is related to - environment setup.
>
> Some possible solutions:
> The renv and packrat packages are a way to version your packages to help
> with reproducibility. Anaconda might be a solution for the R version and
> package version problem, if installed on your hpc. Docker could work as
> well (maybe the best option if installed). There are other workarounds, but
> I would have to know how your particular hpc/compute environment is set up
> to comment further.
>
> Brass tacks:
> I think you need to ensure all your package versions (R and add-on
> packages) are the same.
>
> Fwiw,
>
> Stephen
>
> On Sun, Aug 9, 2020, 08:26 Kevin Egan <[hidden email]> wrote:
>
>> Hi Stephen,
>>
>> I believe I am using Renv, but on my remote computer I am running batch
>> files.
>>
>> Thanks,
>>
>> Kevin
>>
>> On 8 Aug 2020, at 18:18, stephen sefick <[hidden email]> wrote:
>>
>> Caveat, I have only skimmed this email thread, so please forgive me if I
>> have missed something.
>>
>> Are you able to use Renv, packrat, docker, or anaconda? Your compute
>> environments are very different.
>> Kindest regards,
>>
>> Stephen Sefick
>>
