memory consumption of nested (un)serialize of sys.frames()

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

memory consumption of nested (un)serialize of sys.frames()

Andreas Kersting
Hi,

please consider the following minimal reproducible example:

Create a new R package which just contains the following two (exported) objects:


crash_dumps <- new.env()

f <- function() {
  x <- runif(1e5)
  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
  assign("last.dump", dump, crash_dumps)
}


WARNING: the following will probably eat all your RAM!

Attach this package and run:

for (i in 1:100) {
  print(i)
  f()
}

You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes.

I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks!

Regards,
Andreas


Background:
In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory.

Some more observations:
- If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
- If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!?
- If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env.
- If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround:

crash_dumps <- list()

f <- function() {
  x <- runif(1e5)
  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
  crash_dumps[["last.dump"]] <- dump
  assignInMyNamespace("crash_dumps", crash_dumps)
}

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [External] memory consumption of nested (un)serialize of sys.frames()

luke-tierney

On Wed, 7 Apr 2021, Andreas Kersting wrote:

> Hi,
>
> please consider the following minimal reproducible example:
>
> Create a new R package which just contains the following two (exported) objects:

I would not expect this behavior and I don't see it when I make such a
package (in R 4.0.3 or R-devel on Ubuntu).  You will need to provide a
more complete reproducible example if you want help with what you are
trying to do; also sessionInfo() would help.

Best,

luke

>
>
> crash_dumps <- new.env()
>
> f <- function() {
>  x <- runif(1e5)
>  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>  assign("last.dump", dump, crash_dumps)
> }
>
>
> WARNING: the following will probably eat all your RAM!
>
> Attach this package and run:
>
> for (i in 1:100) {
>  print(i)
>  f()
> }
>
> You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes.
>
> I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks!
>
> Regards,
> Andreas
>
>
> Background:
> In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory.
>
> Some more observations:
> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!?
> - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env.
> - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround:
>
> crash_dumps <- list()
>
> f <- function() {
>  x <- runif(1e5)
>  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>  crash_dumps[["last.dump"]] <- dump
>  assignInMyNamespace("crash_dumps", crash_dumps)
> }
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [External] memory consumption of nested (un)serialize of sys.frames()

Andreas Kersting
Hi Luke,

Please see https://github.com/akersting/dumpTest for the package.

Here a session showing my issue:

> library(dumpTest)
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] dumpTest_0.1.0

loaded via a namespace (and not attached):
[1] compiler_4.0.5
> for (i in 1:100) {
+   print(i)
+   print(system.time(f()))
+ }
[1] 1
   user  system elapsed
  0.028   0.004   0.034
[1] 2
   user  system elapsed
  0.067   0.008   0.075
[1] 3
   user  system elapsed
  0.176   0.000   0.176
[1] 4
   user  system elapsed
  0.335   0.012   0.349
[1] 5
   user  system elapsed
  0.745   0.023   0.770
[1] 6
   user  system elapsed
  1.495   0.060   1.572
[1] 7
   user  system elapsed
  2.902   0.136   3.040
[1] 8
   user  system elapsed
  5.753   0.272   6.034
[1] 9
   user  system elapsed
 11.807   0.708  12.597
[1] 10
^C
Timing stopped at: 6.638 0.549 7.214

I had to interrupt in iteration 10 because I was running low on RAM.

Regards,
Andreas

2021-04-07 15:28 GMT+02:00 [hidden email]:

> On Wed, 7 Apr 2021, Andreas Kersting wrote:
>
>> Hi,
>>
>> please consider the following minimal reproducible example:
>>
>> Create a new R package which just contains the following two (exported) objects:
>
> I would not expect this behavior and I don't see it when I make such a
> package (in R 4.0.3 or R-devel on Ubuntu).  You will need to provide a
> more complete reproducible example if you want help with what you are
> trying to do; also sessionInfo() would help.
>
> Best,
>
> luke
>
>>
>>
>> crash_dumps <- new.env()
>>
>> f <- function() {
>>  x <- runif(1e5)
>>  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>>  assign("last.dump", dump, crash_dumps)
>> }
>>
>>
>> WARNING: the following will probably eat all your RAM!
>>
>> Attach this package and run:
>>
>> for (i in 1:100) {
>>  print(i)
>>  f()
>> }
>>
>> You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes.
>>
>> I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks!
>>
>> Regards,
>> Andreas
>>
>>
>> Background:
>> In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory.
>>
>> Some more observations:
>> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
>> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!?
>> - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env.
>> - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround:
>>
>> crash_dumps <- list()
>>
>> f <- function() {
>>  x <- runif(1e5)
>>  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>>  crash_dumps[["last.dump"]] <- dump
>>  assignInMyNamespace("crash_dumps", crash_dumps)
>> }
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:   [hidden email]
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [External] memory consumption of nested (un)serialize of sys.frames()

Dirk Eddelbuettel

On 7 April 2021 at 16:06, Andreas Kersting wrote:
| Hi Luke,
|
| Please see https://github.com/akersting/dumpTest for the package.
|
| Here a session showing my issue:
|
| > library(dumpTest)
| > sessionInfo()
| R version 4.0.5 (2021-03-31)
| Platform: x86_64-pc-linux-gnu (64-bit)
| Running under: Debian GNU/Linux 10 (buster)
|
| Matrix products: default
| BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
| LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
|
| locale:
|  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
|  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
|  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
|  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
|  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
| [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      
|
| attached base packages:
| [1] stats     graphics  grDevices utils     datasets  methods   base    
|
| other attached packages:
| [1] dumpTest_0.1.0
|
| loaded via a namespace (and not attached):
| [1] compiler_4.0.5
| > for (i in 1:100) {
| +   print(i)
| +   print(system.time(f()))
| + }
| [1] 1
|    user  system elapsed
|   0.028   0.004   0.034
| [1] 2
|    user  system elapsed
|   0.067   0.008   0.075
| [1] 3
|    user  system elapsed
|   0.176   0.000   0.176
| [1] 4
|    user  system elapsed
|   0.335   0.012   0.349
| [1] 5
|    user  system elapsed
|   0.745   0.023   0.770
| [1] 6
|    user  system elapsed
|   1.495   0.060   1.572
| [1] 7
|    user  system elapsed
|   2.902   0.136   3.040
| [1] 8
|    user  system elapsed
|   5.753   0.272   6.034
| [1] 9
|    user  system elapsed
|  11.807   0.708  12.597
| [1] 10
| ^C
| Timing stopped at: 6.638 0.549 7.214
|
| I had to interrupt in iteration 10 because I was running low on RAM.

No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
off my Debian package, hence instrumentation as in the Debian package.

edd@rob:~$ installGithub.r akersting/dumpTest
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo akersting/dumpTest@HEAD
✔  checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ...
─  preparing ‘dumpTest’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘dumpTest_0.1.0.tar.gz’
   
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
* installing *source* package ‘dumpTest’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
No man pages found in package  ‘dumpTest’
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (dumpTest)
edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
   user  system elapsed
  0.481   0.019   0.500
edd@rob:~$

(I also ran the variant you showed with the dual print statements, it just
consumes more screen real estate and ends on

[...]
[1] 97                                      
   user  system elapsed    
  0.004   0.000   0.005
[1] 98                                                                                                  
   user  system elapsed
  0.004   0.000   0.005                  
[1] 99                    
   user  system elapsed    
  0.004   0.000   0.004                                                                                  
[1] 100                                                                                                  
   user  system elapsed                                                                                  
  0.005   0.000   0.005
edd@rob:~$ )

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [External] memory consumption of nested (un)serialize of sys.frames()

luke-tierney
In reply to this post by Andreas Kersting
No issues here with that either. Looks like something is different on
your end.

Best,

luke

On Wed, 7 Apr 2021, Andreas Kersting wrote:

> Hi Luke,
>
> Please see https://github.com/akersting/dumpTest for the package.
>
> Here a session showing my issue:
>
>> library(dumpTest)
>> sessionInfo()
> R version 4.0.5 (2021-03-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Debian GNU/Linux 10 (buster)
>
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] dumpTest_0.1.0
>
> loaded via a namespace (and not attached):
> [1] compiler_4.0.5
>> for (i in 1:100) {
> +   print(i)
> +   print(system.time(f()))
> + }
> [1] 1
>   user  system elapsed
>  0.028   0.004   0.034
> [1] 2
>   user  system elapsed
>  0.067   0.008   0.075
> [1] 3
>   user  system elapsed
>  0.176   0.000   0.176
> [1] 4
>   user  system elapsed
>  0.335   0.012   0.349
> [1] 5
>   user  system elapsed
>  0.745   0.023   0.770
> [1] 6
>   user  system elapsed
>  1.495   0.060   1.572
> [1] 7
>   user  system elapsed
>  2.902   0.136   3.040
> [1] 8
>   user  system elapsed
>  5.753   0.272   6.034
> [1] 9
>   user  system elapsed
> 11.807   0.708  12.597
> [1] 10
> ^C
> Timing stopped at: 6.638 0.549 7.214
>
> I had to interrupt in iteration 10 because I was running low on RAM.
>
> Regards,
> Andreas
>
> 2021-04-07 15:28 GMT+02:00 [hidden email]:
>> On Wed, 7 Apr 2021, Andreas Kersting wrote:
>>
>>> Hi,
>>>
>>> please consider the following minimal reproducible example:
>>>
>>> Create a new R package which just contains the following two (exported) objects:
>>
>> I would not expect this behavior and I don't see it when I make such a
>> package (in R 4.0.3 or R-devel on Ubuntu).  You will need to provide a
>> more complete reproducible example if you want help with what you are
>> trying to do; also sessionInfo() would help.
>>
>> Best,
>>
>> luke
>>
>>>
>>>
>>> crash_dumps <- new.env()
>>>
>>> f <- function() {
>>>  x <- runif(1e5)
>>>  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>>>  assign("last.dump", dump, crash_dumps)
>>> }
>>>
>>>
>>> WARNING: the following will probably eat all your RAM!
>>>
>>> Attach this package and run:
>>>
>>> for (i in 1:100) {
>>>  print(i)
>>>  f()
>>> }
>>>
>>> You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes.
>>>
>>> I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks!
>>>
>>> Regards,
>>> Andreas
>>>
>>>
>>> Background:
>>> In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory.
>>>
>>> Some more observations:
>>> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
>>> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!?
>>> - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env.
>>> - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround:
>>>
>>> crash_dumps <- list()
>>>
>>> f <- function() {
>>>  x <- runif(1e5)
>>>  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>>>  crash_dumps[["last.dump"]] <- dump
>>>  assignInMyNamespace("crash_dumps", crash_dumps)
>>> }
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa                  Phone:             319-335-3386
>> Department of Statistics and        Fax:               319-335-3017
>>    Actuarial Science
>> 241 Schaeffer Hall                  email:   [hidden email]
>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [External] memory consumption of nested (un)serialize of sys.frames()

Andreas Kersting
In reply to this post by Dirk Eddelbuettel
Hi Dirk, hi Luke,

Thanks for checking!

I could narrow it down further. I have the issue only if I install --with-keep.source, i.e.

R CMD INSTALL --with-keep.source dumpTest

Since this is the default in RStudio when clicking "Install and Restart", I was always having the issue - also from base R. If I install using e.g. devtools::install_github() directly it is also fine for me.

Could you please confirm? Thanks!

Regards,
Andreas

2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" <[hidden email]>:

>
> On 7 April 2021 at 16:06, Andreas Kersting wrote:
> | Hi Luke,
> |
> | Please see https://github.com/akersting/dumpTest for the package.
> |
> | Here a session showing my issue:
> |
> | > library(dumpTest)
> | > sessionInfo()
> | R version 4.0.5 (2021-03-31)
> | Platform: x86_64-pc-linux-gnu (64-bit)
> | Running under: Debian GNU/Linux 10 (buster)
> |
> | Matrix products: default
> | BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
> | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
> |
> | locale:
> |  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
> |  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
> |  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
> |  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
> |  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      
> |
> | attached base packages:
> | [1] stats     graphics  grDevices utils     datasets  methods   base    
> |
> | other attached packages:
> | [1] dumpTest_0.1.0
> |
> | loaded via a namespace (and not attached):
> | [1] compiler_4.0.5
> | > for (i in 1:100) {
> | +   print(i)
> | +   print(system.time(f()))
> | + }
> | [1] 1
> |    user  system elapsed
> |   0.028   0.004   0.034
> | [1] 2
> |    user  system elapsed
> |   0.067   0.008   0.075
> | [1] 3
> |    user  system elapsed
> |   0.176   0.000   0.176
> | [1] 4
> |    user  system elapsed
> |   0.335   0.012   0.349
> | [1] 5
> |    user  system elapsed
> |   0.745   0.023   0.770
> | [1] 6
> |    user  system elapsed
> |   1.495   0.060   1.572
> | [1] 7
> |    user  system elapsed
> |   2.902   0.136   3.040
> | [1] 8
> |    user  system elapsed
> |   5.753   0.272   6.034
> | [1] 9
> |    user  system elapsed
> |  11.807   0.708  12.597
> | [1] 10
> | ^C
> | Timing stopped at: 6.638 0.549 7.214
> |
> | I had to interrupt in iteration 10 because I was running low on RAM.
>
> No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
> off my Debian package, hence instrumentation as in the Debian package.
>
> edd@rob:~$ installGithub.r akersting/dumpTest
> Using github PAT from envvar GITHUB_PAT
> Downloading GitHub repo akersting/dumpTest@HEAD
> ✔  checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ...
> ─  preparing ‘dumpTest’:
> ✔  checking DESCRIPTION meta-information ...
> ─  checking for LF line-endings in source and make files and shell scripts
> ─  checking for empty or unneeded directories
> ─  building ‘dumpTest_0.1.0.tar.gz’
>    
> Installing package into ‘/usr/local/lib/R/site-library’
> (as ‘lib’ is unspecified)
> * installing *source* package ‘dumpTest’ ...
> ** using staged installation
> ** R
> ** byte-compile and prepare package for lazy loading
> ** help
> No man pages found in package  ‘dumpTest’
> *** installing help indices
> ** building package indices
> ** testing if installed package can be loaded from temporary location
> ** testing if installed package can be loaded from final location
> ** testing if installed package keeps a record of temporary installation path
> * DONE (dumpTest)
> edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
>    user  system elapsed
>   0.481   0.019   0.500
> edd@rob:~$
>
> (I also ran the variant you showed with the dual print statements, it just
> consumes more screen real estate and ends on
>
> [...]
> [1] 97                                      
>    user  system elapsed    
>   0.004   0.000   0.005
> [1] 98                                                                                                  
>    user  system elapsed
>   0.004   0.000   0.005                  
> [1] 99                    
>    user  system elapsed    
>   0.004   0.000   0.004                                                                                  
> [1] 100                                                                                                  
>    user  system elapsed                                                                                  
>   0.005   0.000   0.005
> edd@rob:~$ )
>
> Dirk
>
> --
> https://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [External] memory consumption of nested (un)serialize of sys.frames()

Andreas Kersting
Hi,

For (hopefully) full reproducibility:

docker run rocker/tidyverse:4.0.5 Rscript -e 'devtools::install_github("akersting/dumpTest", INSTALL_opts = "--with-keep.source"); library(dumpTest); for (i in 1:100) {print(i); print(system.time(f()))}'

Regards,
Andreas

2021-04-07 17:09 GMT+02:00 "Andreas Kersting" <[hidden email]>:

> Hi Dirk, hi Luke,
>
> Thanks for checking!
>
> I could narrow it down further. I have the issue only if I install --with-keep.source, i.e.
>
> R CMD INSTALL --with-keep.source dumpTest
>
> Since this is the default in RStudio when clicking "Install and Restart", I was always having the issue - also from base R. If I install using e.g. devtools::install_github() directly it is also fine for me.
>
> Could you please confirm? Thanks!
>
> Regards,
> Andreas
>
> 2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" <[hidden email]>:
>>
>> On 7 April 2021 at 16:06, Andreas Kersting wrote:
>> | Hi Luke,
>> |
>> | Please see https://github.com/akersting/dumpTest for the package.
>> |
>> | Here a session showing my issue:
>> |
>> | > library(dumpTest)
>> | > sessionInfo()
>> | R version 4.0.5 (2021-03-31)
>> | Platform: x86_64-pc-linux-gnu (64-bit)
>> | Running under: Debian GNU/Linux 10 (buster)
>> |
>> | Matrix products: default
>> | BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
>> | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
>> |
>> | locale:
>> |  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>> |  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>> |  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8  
>> |  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
>> |  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
>> | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      
>> |
>> | attached base packages:
>> | [1] stats     graphics  grDevices utils     datasets  methods   base    
>> |
>> | other attached packages:
>> | [1] dumpTest_0.1.0
>> |
>> | loaded via a namespace (and not attached):
>> | [1] compiler_4.0.5
>> | > for (i in 1:100) {
>> | +   print(i)
>> | +   print(system.time(f()))
>> | + }
>> | [1] 1
>> |    user  system elapsed
>> |   0.028   0.004   0.034
>> | [1] 2
>> |    user  system elapsed
>> |   0.067   0.008   0.075
>> | [1] 3
>> |    user  system elapsed
>> |   0.176   0.000   0.176
>> | [1] 4
>> |    user  system elapsed
>> |   0.335   0.012   0.349
>> | [1] 5
>> |    user  system elapsed
>> |   0.745   0.023   0.770
>> | [1] 6
>> |    user  system elapsed
>> |   1.495   0.060   1.572
>> | [1] 7
>> |    user  system elapsed
>> |   2.902   0.136   3.040
>> | [1] 8
>> |    user  system elapsed
>> |   5.753   0.272   6.034
>> | [1] 9
>> |    user  system elapsed
>> |  11.807   0.708  12.597
>> | [1] 10
>> | ^C
>> | Timing stopped at: 6.638 0.549 7.214
>> |
>> | I had to interrupt in iteration 10 because I was running low on RAM.
>>
>> No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
>> off my Debian package, hence instrumentation as in the Debian package.
>>
>> edd@rob:~$ installGithub.r akersting/dumpTest
>> Using github PAT from envvar GITHUB_PAT
>> Downloading GitHub repo akersting/dumpTest@HEAD
>> ✔  checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ...
>> ─  preparing ‘dumpTest’:
>> ✔  checking DESCRIPTION meta-information ...
>> ─  checking for LF line-endings in source and make files and shell scripts
>> ─  checking for empty or unneeded directories
>> ─  building ‘dumpTest_0.1.0.tar.gz’
>>    
>> Installing package into ‘/usr/local/lib/R/site-library’
>> (as ‘lib’ is unspecified)
>> * installing *source* package ‘dumpTest’ ...
>> ** using staged installation
>> ** R
>> ** byte-compile and prepare package for lazy loading
>> ** help
>> No man pages found in package  ‘dumpTest’
>> *** installing help indices
>> ** building package indices
>> ** testing if installed package can be loaded from temporary location
>> ** testing if installed package can be loaded from final location
>> ** testing if installed package keeps a record of temporary installation path
>> * DONE (dumpTest)
>> edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
>>    user  system elapsed
>>   0.481   0.019   0.500
>> edd@rob:~$
>>
>> (I also ran the variant you showed with the dual print statements, it just
>> consumes more screen real estate and ends on
>>
>> [...]
>> [1] 97                                      
>>    user  system elapsed    
>>   0.004   0.000   0.005
>> [1] 98                                                                                                  
>>    user  system elapsed
>>   0.004   0.000   0.005                  
>> [1] 99                    
>>    user  system elapsed    
>>   0.004   0.000   0.004                                                                                  
>> [1] 100                                                                                                  
>>    user  system elapsed                                                                                  
>>   0.005   0.000   0.005
>> edd@rob:~$ )
>>
>> Dirk
>>
>> --
>> https://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]
>>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [External] memory consumption of nested (un)serialize of sys.frames()

luke-tierney
In reply to this post by Andreas Kersting
I see that now also. Not sure yet what is going on.

One work-around that may work for you is to create a fresh crash dump
in a .onLoad function; somehting like

crash_dumps <- NULL
.onLoad <- function(...) crash_dumps <<- new.env()

Best,

luke

On Wed, 7 Apr 2021, Andreas Kersting wrote:

> Hi Dirk, hi Luke,
>
> Thanks for checking!
>
> I could narrow it down further. I have the issue only if I install --with-keep.source, i.e.
>
> R CMD INSTALL --with-keep.source dumpTest
>
> Since this is the default in RStudio when clicking "Install and Restart", I was always having the issue - also from base R. If I install using e.g. devtools::install_github() directly it is also fine for me.
>
> Could you please confirm? Thanks!
>
> Regards,
> Andreas
>
> 2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" <[hidden email]>:
>>
>> On 7 April 2021 at 16:06, Andreas Kersting wrote:
>> | Hi Luke,
>> |
>> | Please see https://github.com/akersting/dumpTest for the package.
>> |
>> | Here a session showing my issue:
>> |
>> | > library(dumpTest)
>> | > sessionInfo()
>> | R version 4.0.5 (2021-03-31)
>> | Platform: x86_64-pc-linux-gnu (64-bit)
>> | Running under: Debian GNU/Linux 10 (buster)
>> |
>> | Matrix products: default
>> | BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
>> | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
>> |
>> | locale:
>> |  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> |  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> |  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>> |  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> |  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> |
>> | attached base packages:
>> | [1] stats     graphics  grDevices utils     datasets  methods   base
>> |
>> | other attached packages:
>> | [1] dumpTest_0.1.0
>> |
>> | loaded via a namespace (and not attached):
>> | [1] compiler_4.0.5
>> | > for (i in 1:100) {
>> | +   print(i)
>> | +   print(system.time(f()))
>> | + }
>> | [1] 1
>> |    user  system elapsed
>> |   0.028   0.004   0.034
>> | [1] 2
>> |    user  system elapsed
>> |   0.067   0.008   0.075
>> | [1] 3
>> |    user  system elapsed
>> |   0.176   0.000   0.176
>> | [1] 4
>> |    user  system elapsed
>> |   0.335   0.012   0.349
>> | [1] 5
>> |    user  system elapsed
>> |   0.745   0.023   0.770
>> | [1] 6
>> |    user  system elapsed
>> |   1.495   0.060   1.572
>> | [1] 7
>> |    user  system elapsed
>> |   2.902   0.136   3.040
>> | [1] 8
>> |    user  system elapsed
>> |   5.753   0.272   6.034
>> | [1] 9
>> |    user  system elapsed
>> |  11.807   0.708  12.597
>> | [1] 10
>> | ^C
>> | Timing stopped at: 6.638 0.549 7.214
>> |
>> | I had to interrupt in iteration 10 because I was running low on RAM.
>>
>> No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
>> off my Debian package, hence instrumentation as in the Debian package.
>>
>> edd@rob:~$ installGithub.r akersting/dumpTest
>> Using github PAT from envvar GITHUB_PAT
>> Downloading GitHub repo akersting/dumpTest@HEAD
>> ✔  checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ...
>> ─  preparing ‘dumpTest’:
>> ✔  checking DESCRIPTION meta-information ...
>> ─  checking for LF line-endings in source and make files and shell scripts
>> ─  checking for empty or unneeded directories
>> ─  building ‘dumpTest_0.1.0.tar.gz’
>>
>> Installing package into ‘/usr/local/lib/R/site-library’
>> (as ‘lib’ is unspecified)
>> * installing *source* package ‘dumpTest’ ...
>> ** using staged installation
>> ** R
>> ** byte-compile and prepare package for lazy loading
>> ** help
>> No man pages found in package  ‘dumpTest’
>> *** installing help indices
>> ** building package indices
>> ** testing if installed package can be loaded from temporary location
>> ** testing if installed package can be loaded from final location
>> ** testing if installed package keeps a record of temporary installation path
>> * DONE (dumpTest)
>> edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
>>    user  system elapsed
>>   0.481   0.019   0.500
>> edd@rob:~$
>>
>> (I also ran the variant you showed with the dual print statements, it just
>> consumes more screen real estate and ends on
>>
>> [...]
>> [1] 97
>>    user  system elapsed
>>   0.004   0.000   0.005
>> [1] 98
>>    user  system elapsed
>>   0.004   0.000   0.005
>> [1] 99
>>    user  system elapsed
>>   0.004   0.000   0.004
>> [1] 100
>>    user  system elapsed
>>   0.005   0.000   0.005
>> edd@rob:~$ )
>>
>> Dirk
>>
>> --
>> https://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]
>>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [External] memory consumption of nested (un)serialize of sys.frames()

luke-tierney
Looks like this is an unavoidable interaction between the way source
references and lazy loading are implemented. The link back to the
crash_dumps environment comes though source references on an
unevaluated argument promise. Creating a fresh environment is
.onLoad() avoids this and is probably your best bet.

Having an option to serialize without source references might be nice
but would probably not be high enough on anyone's priority list to get
done anytime soon.

Best,

luke

On Thu, 8 Apr 2021, [hidden email] wrote:

> I see that now also. Not sure yet what is going on.
>
> One work-around that may work for you is to create a fresh crash dump
> in a .onLoad function; somehting like
>
> crash_dumps <- NULL
> .onLoad <- function(...) crash_dumps <<- new.env()
>
> Best,
>
> luke
>
> On Wed, 7 Apr 2021, Andreas Kersting wrote:
>
>> Hi Dirk, hi Luke,
>>
>> Thanks for checking!
>>
>> I could narrow it down further. I have the issue only if I install
>> --with-keep.source, i.e.
>>
>> R CMD INSTALL --with-keep.source dumpTest
>>
>> Since this is the default in RStudio when clicking "Install and Restart", I
>> was always having the issue - also from base R. If I install using e.g.
>> devtools::install_github() directly it is also fine for me.
>>
>> Could you please confirm? Thanks!
>>
>> Regards,
>> Andreas
>>
>> 2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" <[hidden email]>:
>>>
>>> On 7 April 2021 at 16:06, Andreas Kersting wrote:
>>> | Hi Luke,
>>> |
>>> | Please see https://github.com/akersting/dumpTest for the package.
>>> |
>>> | Here a session showing my issue:
>>> |
>>> | > library(dumpTest)
>>> | > sessionInfo()
>>> | R version 4.0.5 (2021-03-31)
>>> | Platform: x86_64-pc-linux-gnu (64-bit)
>>> | Running under: Debian GNU/Linux 10 (buster)
>>> |
>>> | Matrix products: default
>>> | BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
>>> | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
>>> |
>>> | locale:
>>> |  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>> |  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>> |  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>> |  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>> |  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>> |
>>> | attached base packages:
>>> | [1] stats     graphics  grDevices utils     datasets  methods   base
>>> |
>>> | other attached packages:
>>> | [1] dumpTest_0.1.0
>>> |
>>> | loaded via a namespace (and not attached):
>>> | [1] compiler_4.0.5
>>> | > for (i in 1:100) {
>>> | +   print(i)
>>> | +   print(system.time(f()))
>>> | + }
>>> | [1] 1
>>> |    user  system elapsed
>>> |   0.028   0.004   0.034
>>> | [1] 2
>>> |    user  system elapsed
>>> |   0.067   0.008   0.075
>>> | [1] 3
>>> |    user  system elapsed
>>> |   0.176   0.000   0.176
>>> | [1] 4
>>> |    user  system elapsed
>>> |   0.335   0.012   0.349
>>> | [1] 5
>>> |    user  system elapsed
>>> |   0.745   0.023   0.770
>>> | [1] 6
>>> |    user  system elapsed
>>> |   1.495   0.060   1.572
>>> | [1] 7
>>> |    user  system elapsed
>>> |   2.902   0.136   3.040
>>> | [1] 8
>>> |    user  system elapsed
>>> |   5.753   0.272   6.034
>>> | [1] 9
>>> |    user  system elapsed
>>> |  11.807   0.708  12.597
>>> | [1] 10
>>> | ^C
>>> | Timing stopped at: 6.638 0.549 7.214
>>> |
>>> | I had to interrupt in iteration 10 because I was running low on RAM.
>>>
>>> No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
>>> off my Debian package, hence instrumentation as in the Debian package.
>>>
>>> edd@rob:~$ installGithub.r akersting/dumpTest
>>> Using github PAT from envvar GITHUB_PAT
>>> Downloading GitHub repo akersting/dumpTest@HEAD
>>> ✔  checking for file
>>> ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ...
>>> ─  preparing ‘dumpTest’:
>>> ✔  checking DESCRIPTION meta-information ...
>>> ─  checking for LF line-endings in source and make files and shell scripts
>>> ─  checking for empty or unneeded directories
>>> ─  building ‘dumpTest_0.1.0.tar.gz’
>>>
>>> Installing package into ‘/usr/local/lib/R/site-library’
>>> (as ‘lib’ is unspecified)
>>> * installing *source* package ‘dumpTest’ ...
>>> ** using staged installation
>>> ** R
>>> ** byte-compile and prepare package for lazy loading
>>> ** help
>>> No man pages found in package  ‘dumpTest’
>>> *** installing help indices
>>> ** building package indices
>>> ** testing if installed package can be loaded from temporary location
>>> ** testing if installed package can be loaded from final location
>>> ** testing if installed package keeps a record of temporary installation
>>> path
>>> * DONE (dumpTest)
>>> edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
>>>    user  system elapsed
>>>   0.481   0.019   0.500
>>> edd@rob:~$
>>>
>>> (I also ran the variant you showed with the dual print statements, it just
>>> consumes more screen real estate and ends on
>>>
>>> [...]
>>> [1] 97
>>>    user  system elapsed
>>>   0.004   0.000   0.005
>>> [1] 98
>>>    user  system elapsed
>>>   0.004   0.000   0.005
>>> [1] 99
>>>    user  system elapsed
>>>   0.004   0.000   0.004
>>> [1] 100
>>>    user  system elapsed
>>>   0.005   0.000   0.005
>>> edd@rob:~$ )
>>>
>>> Dirk
>>>
>>> --
>>> https://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]
>>>
>
>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel