Object.size() should not visit every element for alt-rep strings, or there should be an altstring_objectsize_method


Travers Ching

Below is a toy ALTREP string example that generates N random strings:

https://gist.github.com/traversc/a48a504eb062554f2d6ff8043ca16f9c

example:
`x <- altrandomStrings(1e8)`
`head(x)`
[1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ...
`object.size(x)`

object.size will call the Elt method (the one registered with
`R_set_altstring_Elt_method`) for every single element, slowly
materializing every element of the vector.  This is mostly a problem
in RStudio, where object.size is called automatically, defeating the
purpose of ALTREP entirely.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Object.size() should not visit every element for alt-rep strings, or there should be an altstring_objectsize_method

Tierney, Luke
You should really take this up with RStudio. Calling object.size on
every top-level assignment, as they appear to do, is a bad idea even
without ALTREP. object.size is only a cheap operation for simple
atomic vectors. For anything with recursive structure it needs to walk
the object, so the effort is proportional to object size:

> x <- rep("A", 1e8)
> system.time(object.size(x))
    user  system elapsed
   1.222   0.624   1.850
> x <- rep(list(1), 1e8)
> system.time(object.size(x))
    user  system elapsed
   1.247   0.022   1.273
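
For contrast, a minimal sketch that adds a plain numeric vector as a
baseline next to the two cases timed above. The length n = 1e6 is an
arbitrary choice to keep the run short, and the names num, chr and lst
are only illustrative; timings will vary by machine, so none are shown.

```r
n <- 1e6  # much smaller than the 1e8 used above, so this runs quickly

num <- numeric(n)        # simple atomic vector: size follows from the length
chr <- rep("A", n)       # character vector: each element is a CHARSXP to visit
lst <- rep(list(1), n)   # list: each element is itself an R object to walk

system.time(object.size(num))
system.time(object.size(chr))
system.time(object.size(lst))
```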

The current help for object.size says

      Provides an estimate of the memory that is being used to store an
      R object.

If this is interpreted as the current memory use, which could change
in the ALTREP context (or for environments, though there the changes
are ignored), then we could define object.size for ALTREP objects to
avoid any ALTREP-specific computation. I'm not convinced yet that this
is a good idea, but even if we do change this at the R level,
RStudio would still be well-advised to have another look at what they
are doing.
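
One way a caller could avoid the expensive cases is sketched below.
cheap_size() is a hypothetical helper, not part of R or RStudio, and
its rule (report a size only for non-character atomic vectors) is an
assumption made for illustration. It also ignores attributes, which
object.size would still walk.

```r
# Hypothetical helper: only call object.size() where it is expected to be
# cheap, i.e. for atomic vectors that are not character vectors.
cheap_size <- function(x) {
  if (is.atomic(x) && !is.character(x)) {
    as.numeric(object.size(x))   # size in bytes
  } else {
    NA_real_                     # would require visiting every element
  }
}

cheap_size(numeric(1e6))    # a number of bytes
cheap_size(rep("A", 10))    # NA: character vector
cheap_size(list(1, 2))      # NA: recursive object
```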

Best,

luke

On Tue, 15 Jan 2019, Travers Ching wrote:

> [...]

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Re: Object.size() should not visit every element for alt-rep strings, or there should be an altstring_objectsize_method

Travers Ching
Hi Luke,

Thanks for the response.  Oddly, this is a duplicate of a post I sent
weeks ago that is only showing up now.  I initially thought it had
been filtered out as spam because of the GitHub link, so I rewrote the
email (several times, in fact); you can see the other thread.  Very
weird.

Also, the good people at RStudio seem to have fixed the issue!

Thanks
Travers

On Thu, Jan 31, 2019 at 5:35 AM Tierney, Luke <[hidden email]> wrote:

> [...]