I'm looking for a way to get the length of an object 'x' as given by
its base data type, without dispatching on class. Something analogous
to how .subset()/.subset2() work, e.g. a .length() function. I know
that I can do length(unclass(x)), but that will trigger the creation of
a new object unclass(x), which I want to avoid because 'x' might be very
large.

Here's a dummy example illustrating what I'm trying to get to:
1 8000040 <internal>
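
For readers following along, the situation can be sketched roughly as below. This is only an illustration, not the original example: it borrows the class name "foo" and the length.foo() method from the reply further down, uses base R only, and does not attempt to measure the allocation itself.

```r
# Illustrative class "foo" whose length() method hides the true size
# of the underlying double vector.
x <- structure(double(1e6), class = c("foo", "numeric"))
length.foo <- function(x) 1L

# length() is an internal generic, so it dispatches to length.foo().
len_dispatched <- length(x)

# Stripping the class first gives the length of the base data type,
# at the cost of going via the intermediate object unclass(x).
len_base <- length(unclass(x))

print(len_dispatched)
print(len_base)
```

The second call returns the answer wanted, but only by way of unclass(x), which is the step the question is trying to avoid.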
In my use case, I have control over neither the class of 'x' (it can
be any class from any package) nor the implementation of length() for
the class. I'm not sure, but in the "old old days", I think I could
have called base::length.default(x) to achieve this. Does anyone know
of a way to infer length(unclass(x)) without going via unclass(x)? I'd
prefer to do this with the existing R API rather than having to
implement it in native code.
> Henrik Bengtsson:
> I'm looking for a way to get the length of an object 'x' as given by
> base data type without dispatching on class.
The performance improvement you're looking for is implemented in the
latest version of pqR (pqR-2016-10-24, see pqR-project.org), along
with corresponding improvements in several other circumstances where
unclass(x) does not create a copy of x.
Here are some examples (starting with yours), using pqR's Rprofmemt
function to get convenient traces of memory allocations:
> Rprofmemt(nelem=1000) # trace allocations of vectors with >= 1000 elements
> x <- structure(double(1e6), class = c("foo", "numeric"))
RPROFMEM: 8000040 (double 1000000):"double" "structure"
RPROFMEM: 8000040 (double 1000000):"structure"
> length.foo <- function(x) 1L
> `+.foo` <- function (e1, e2) (unclass(e1) + unclass(e2)) %% 100
> z <- x + x
RPROFMEM: 8000040 (double 1000000):"+.foo"
> `<.foo` <- function (e1, e2) any(unclass(e1)<unclass(e2))
> y <- unclass(x)
RPROFMEM: 8000040 (double 1000000):
There is no large allocation with length(unclass(x)), and only the
obviously necessary single allocation in +.foo (not two additional
allocations for unclass(e1) and unclass(e2)). For <.foo, there is no
large allocation at all, because not only are allocations avoided for
unclass(e1) and unclass(e2), but 'any' also avoids an allocation for
the result of the comparison. Unfortunately, assigning unclass(x) to
a variable does result in a copy being made (this might often be
avoided in future).
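
The four cases discussed above can be collected into one small script that runs in any R implementation. This is a sketch for checking the values only: Rprofmemt() is pqR-specific and is left out, so the script says nothing about where allocations happen, and the variable names n and lt are introduced here purely for illustration.

```r
# Same definitions as in the transcript above.
x <- structure(double(1e6), class = c("foo", "numeric"))
length.foo <- function(x) 1L
`+.foo` <- function(e1, e2) (unclass(e1) + unclass(e2)) %% 100
`<.foo` <- function(e1, e2) any(unclass(e1) < unclass(e2))

n  <- length(unclass(x))  # length of the base vector, bypassing length.foo()
z  <- x + x               # dispatches to `+.foo`; result is a plain double vector
lt <- x < x               # dispatches to `<.foo`; a single logical value
y  <- unclass(x)          # the case where a copy is still made in pqR

print(n)
print(lt)
```

Whether the intermediate unclass() results are materialised depends on the interpreter; the values computed are the same either way.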
These performance improvements are implemented using pqR's "variant
result" mechanism, which also allows many other optimizations. See