memory footprint of readRDS()

memory footprint of readRDS()

Joris FA Meys
Dear all,

I tried to read in a 3.8Gb RDS file on a computer with 16Gb available
memory. To my astonishment, the memory footprint of R rises quickly to over
13Gb and the attempt ends with an error that says "cannot allocate vector
of size 5.8Gb".

I would expect that 3 times the memory would be enough to read in that
file, but apparently I was wrong. I checked the memory.limit() and that one
gave me a value of more than 13Gb. So I wondered if this was to be
expected, or if there could be an underlying reason why this file doesn't
want to open.

Thank you in advance
Joris

--
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: memory footprint of readRDS()

braverock
Your RDS file is likely compressed, and could have compression of 10x
or more depending on the composition of the data that is in it and the
compression method used. 'gzip' compression is used by default.
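
As a rough illustration of this point (a sketch, not a diagnosis of the file in question), the same object can be written with each `compress` setting that saveRDS() documents, and the file sizes compared with object.size(). The test vector `x` and the helper `rds_size()` are made up for this example.

```r
# Hypothetical test data: highly repetitive, so it compresses very well.
x <- rep(c(1.5, 2.5, 3.5), length.out = 1e6)

# Write obj with the given compress setting and return the file size in bytes.
rds_size <- function(obj, compress) {
  tf <- tempfile(fileext = ".rds")
  on.exit(unlink(tf))  # remove the temporary file when the function returns
  saveRDS(obj, file = tf, compress = compress)
  file.size(tf)
}

sizes <- c(
  none  = rds_size(x, FALSE),
  gzip  = rds_size(x, "gzip"),   # same as the default, compress = TRUE
  bzip2 = rds_size(x, "bzip2"),
  xz    = rds_size(x, "xz")
)

in_memory <- as.numeric(object.size(x))

# Ratio of in-memory size to on-disk size for each setting.
round(in_memory / sizes, 1)
```

The ratio for `none` should come out close to 1, while the compressed variants depend entirely on how repetitive the data are, so a small file on disk says little about how much memory the object will need.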

--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock

Re: memory footprint of readRDS()

William Dunlap
In reply to this post by Joris FA Meys
The ratio of object size to rds file size depends on the object.  Some
variation is due to how header information is stored in memory and in the
file but I suspect most is due to how compression works (e.g., a vector of
repeated values can be compressed into a smaller file than a bunch of
random bytes).

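f <- function (data, ...)  {
    force(data)                  # evaluate the argument before saving
    tf <- tempfile()
    on.exit(unlink(tf))          # remove the temporary file on exit
    save(data, file = tf, ...)   # forward ... (e.g. compress=FALSE) to save()
    # ratio of in-memory object size to on-disk file size
    c(`obj/file size` = as.numeric(object.size(data)/file.size(tf)))
}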

> f(rep(0,1e6))
obj/file size
     1021.456
> f(rep(0,1e6), compress=FALSE)
obj/file size
    0.9999986
> f(rep(89.7,1e6))
obj/file size
     682.6555
> f(log(1:1e6))
obj/file size
     1.309126
> f(vector("list",1e6))
obj/file size
     2021.744
> f(as.list(log(1:1e6)))
obj/file size
     8.907579
> f(sample(as.raw(0:255),size=8e6,replace=TRUE))
obj/file size
    0.9998433
> f(rep(as.raw(0:255),length=8e6))
obj/file size
     254.5595
> f(as.character(1:1e6))
obj/file size
      23.5567
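
A related sketch, under the assumption that the uncompressed serialized size is a usable first estimate of what the object will need in memory: the decompressed stream can be counted in chunks without unserializing it. The helper name `rds_uncompressed_size()` and the chunk size are hypothetical choices for this example. The estimate is only approximate, since (as the ratios above suggest) lists and character vectors can take noticeably more room in memory than in the file, even without compression.

```r
# Count the bytes of the decompressed serialization stream without
# unserializing it, reading in fixed-size chunks so memory use stays small.
rds_uncompressed_size <- function(path, chunk = 2^24) {
  # gzfile() can also read bzip2-, xz-compressed and uncompressed files
  con <- gzfile(path, open = "rb")
  on.exit(close(con))
  total <- 0
  repeat {
    n <- length(readBin(con, what = "raw", n = chunk))
    if (n == 0L) break  # end of stream
    total <- total + n
  }
  total
}

# Usage on a small made-up object:
tf <- tempfile(fileext = ".rds")
x <- rep(0, 1e6)
saveRDS(x, tf)
res <- c(file         = file.size(tf),
         uncompressed = rds_uncompressed_size(tf),
         in_memory    = as.numeric(object.size(x)))
unlink(tf)
res
```

For a plain numeric vector like this one, the `uncompressed` and `in_memory` figures should nearly coincide, while `file` is far smaller. Running the helper on a large file before calling readRDS() gives a lower bound on the memory to budget for.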



Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: memory footprint of readRDS()

Joris FA Meys
Thanks, William and Brian, for your swift and very insightful responses.
I'll have to hunt for more memory.

Cheers
Joris
