length(unclass(x)) without unclass(x)?


Henrik Bengtsson-5
I'm looking for a way to get the length of an object 'x' as given by
base data type without dispatching on class.  Something analogous to
how .subset()/.subset2() bypass method dispatch for subsetting, e.g.
a .length() function.  I know that I can do length(unclass(x)), but
that triggers the creation of a new object unclass(x), which I want
to avoid because 'x' might be very large.

Here's a dummy example illustrating what I'm trying to get to:

> x <- structure(double(1e6), class = c("foo", "numeric"))
> length.foo <- function(x) 1L
> length(x)
[1] 1
> length(unclass(x))
[1] 1000000

but the latter call will cause an internal memory allocation:

> profmem::profmem(length(unclass(x)))
Rprofmem memory profiling of:
length(unclass(x))

Memory allocations:
        bytes      calls
1     8000040 <internal>
total 8000040

In my use case, I have control over neither the class of 'x' (it can
be any class from any package) nor the implementation of length() for
the class.  I'm not sure, but in the "old old days", I think I could
have called base::length.default(x) to achieve this.  Does anyone know
of a way to infer length(unclass(x)) without going via unclass(x)?  I
would prefer to do this with the existing R API rather than having to
implement it in native code.
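
For reference, the workaround described above can be wrapped as a small
helper.  This is only a sketch: base_length() is a hypothetical name, not
part of base R, and it does not remove the copy.  It relies on the fact
that length() only dispatches when 'x' is an object (has a class
attribute), so unclass() can at least be skipped for plain vectors.

```r
# Hypothetical helper (not part of base R): length of 'x' ignoring any
# class-specific length() method.
base_length <- function(x) {
  # length() dispatches only when 'x' is an object, i.e. has a class
  # attribute, so plain vectors can skip unclass() altogether.
  if (is.object(x)) length(unclass(x)) else length(x)
}

x <- structure(double(1e6), class = c("foo", "numeric"))
length.foo <- function(x) 1L

length(x)        # dispatches to length.foo()
base_length(x)   # ignores length.foo(), but unclass(x) still allocates
```

For classed objects this still goes through unclass(x), so the memory
allocation shown by profmem above remains.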

Thanks,

Henrik

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: length(unclass(x)) without unclass(x)?

Radford Neal
> Henrik Bengtsson:
>
> I'm looking for a way to get the length of an object 'x' as given by
> base data type without dispatching on class.


The performance improvement you're looking for is implemented in the
latest version of pqR (pqR-2016-10-24, see pqR-project.org), along
with corresponding improvements in several other circumstances where
unclass(x) does not create a copy of x.

Here are some examples (starting with yours), using pqR's Rprofmemt
function to get convenient traces of memory allocations:

  > Rprofmemt(nelem=1000)  # trace allocations of vectors with >= 1000 elements
  >
  > x <- structure(double(1e6), class = c("foo", "numeric"))
  RPROFMEM: 8000040 (double 1000000):"double" "structure"
  RPROFMEM: 8000040 (double 1000000):"structure"
  > length.foo <- function(x) 1L
  > length(x)
  [1] 1
  > length(unclass(x))
  [1] 1000000
  >
  > `+.foo` <- function (e1, e2) (unclass(e1) + unclass(e2)) %% 100
  > z <- x + x
  RPROFMEM: 8000040 (double 1000000):"+.foo"
  >
  > `<.foo` <- function (e1, e2) any(unclass(e1)<unclass(e2))
  > x<x
  [1] FALSE
  >
  > y <- unclass(x)
  RPROFMEM: 8000040 (double 1000000):

There is no large allocation with length(unclass(x)), and only the
one obviously necessary allocation in +.foo (not two additional
allocations for unclass(e1) and unclass(e2)).  For <.foo, there is no
large allocation at all, because not only are allocations avoided for
unclass(e1) and unclass(e2), but 'any' also avoids an allocation for
the result of the comparison.  Unfortunately, assigning unclass(x) to
a variable does result in a copy being made (this might often be
avoided in future).

These performance improvements are implemented using pqR's "variant
result" mechanism, which also allows many other optimizations.  See

https://radfordneal.wordpress.com/2013/06/30/how-pqr-makes-programs-faster-by-not-doing-things/

for some explanation.  There is no particular reason this mechanism
couldn't be incorporated into R Core's implementation of R.

   Radford Neal
