True length - length(unclass(x)) - without having to call unclass()?

Henrik Bengtsson-5
Is there a low-level function that returns the length of an object 'x'
- the length that for instance .subset(x) and .subset2(x) see? An
obvious candidate would be to use:

.length <- function(x) length(unclass(x))

However, I'm concerned that calling unclass(x) may trigger an
expensive copy internally in some cases.  Is that concern unfounded?
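
To make the question concrete, here is a minimal sketch. The class name "Foo" and its length method are made up for illustration; only structure(), length(), unclass() and .subset2() come from base R.

```r
# Hypothetical S3 class whose length() method reports something other
# than the number of elements in the underlying list.
length.Foo <- function(x) 0L

x <- structure(list(a = 1, b = 2, c = 3), class = "Foo")

# S3 dispatch picks up length.Foo()
length(x)

# The candidate from the post: drop the class first, then ask for the length
.length <- function(x) length(unclass(x))
.length(x)

# .subset2() ignores the class and still reaches the third element
.subset2(x, 3L)
```

The gap between length(x) and .length(x) is the "true length" the subject line refers to.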

Thxs,

Henrik

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: True length - length(unclass(x)) - without having to call unclass()?

Dénes Tóth-2
The solution below introduces a dependency on data.table, but otherwise
it does what you need:

---

# special method for Foo objects
length.Foo <- function(x) {
   length(unlist(x, recursive = TRUE, use.names = FALSE))
}

# an instance of a Foo object
x <- structure(list(a = 1, b = list(b1 = 1, b2 = 2)), class = "Foo")

# its length
stopifnot(length(x) == 3L)

# get its length as if it were a standard list
.length <- function(x) {
   cls <- class(x)
   # setattr() does not make a copy, but modifies by reference
   data.table::setattr(x, "class", NULL)
   # get the length
   len <- base::length(x)
   # re-set original classes
   data.table::setattr(x, "class", cls)
   # return the unclassed length
   len
}

# to check that we do not make unwanted changes
orig_class <- class(x)

# check that the address in RAM does not change
a1 <- data.table::address(x)

# 'unclassed' length
stopifnot(.length(x) == 2L)

# check that address is the same
stopifnot(a1 == data.table::address(x))

# check against original class
stopifnot(identical(orig_class, class(x)))

---



Re: True length - length(unclass(x)) - without having to call unclass()?

hadley wickham
In reply to this post by Henrik Bengtsson-5
For the new vctrs::records class, I implemented length, names, [[, and
[[<- myself in https://github.com/r-lib/vctrs/blob/master/src/fields.c.
That lets me override the default S3 methods while still being able to
access the underlying data that I'm interested in.

Another option that avoids the copy (and that you should never discuss
in public 😉) is temporarily setting the object bit to FALSE.

In the long run, I think an ALTREP vector that exposes the underlying
data of an S3 object (i.e. sans attributes apart from names) is
probably the way forward.

Hadley

--
http://hadley.nz


Re: True length - length(unclass(x)) - without having to call unclass()?

Tomas Kalibera
In reply to this post by Dénes Tóth-2
Please don't do this to get the underlying vector length (or to achieve
anything else). Setting/deleting attributes of an R object without
checking the reference count violates R semantics, which in turn can
have unpredictable results on R programs (essentially undebuggable
segfaults now or more likely later when new optimizations or features
are added to the language). Setting attributes on objects with reference
count (currently NAMED value) greater than 0 (in some special cases 1 is
ok) is cheating - please see Writing R Extensions - and getting speedups
via cheating leads to fragile, unmaintainable and buggy code. Doing so
in packages is particularly unhelpful to the whole community - packages
should only use the public API as documented.

Similarly, getting a physical address of an object to hack around
whether R has copied it or not should certainly not be done in packages,
and R code should never be working with or even obtaining the physical
address of an object. This is also why one cannot obtain such an address
using base R (apart from in textual form in certain diagnostic messages,
where it can indeed be useful for low-level debugging).
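
The value semantics at stake can be illustrated with base R alone. In this sketch (object and class names are illustrative), removing the class from what looks like a second copy leaves the original untouched; modifying a shared object in place behind R's back would break exactly this guarantee.

```r
x <- structure(list(a = 1, b = 2), class = "Foo")

# 'y' initially shares its data with 'x'
y <- x

# Replacement functions respect value semantics: R modifies a copy of the
# shared object, so only 'y' loses its class.
attr(y, "class") <- NULL

class(x)  # still "Foo"
class(y)  # "list"
```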

Tomas

On 09/02/2018 01:19 AM, Dénes Tóth wrote:

> The solution below introduces a dependency on data.table, but
> otherwise it does what you need:
> [...]

Re: True length - length(unclass(x)) - without having to call unclass()?

Radford Neal
In reply to this post by Henrik Bengtsson-5
Regarding the discussion of getting length(unclass(x)) without an
unclassed version of x being created...

There are already no copies done for length(unclass(x)) in pqR
(current version of 2017-06-09 at pqR-project.org, as well as the
soon-to-be-released new version).  This is part of a more general
facility for avoiding copies from unclass in other circumstances as
well, e.g., unclass(a)+unclass(b).

It's implemented using pqR's internal "variant result" mechanism.
Primitives such as "length" and "+" can ask for their arguments to be
evaluated in such a way that an "unclassed" result is possibly
returned with its class attribute still there, but with a flag set
(not in the object) to indicate that it should be ignored.

The variant result mechanism is also central to many other pqR
improvements, including deferred evaluation to enable automatic use of
multiple cores, and optimizations that allow fast evaluation of things
like any(x<0), any(is.na(x)), or all(is.na(x)) without creation of
intermediate results and with early termination when the result is
determined.

It is much better to use such a general mechanism that speeds up
existing code than to implement more and more special-case functions
like anyNA or some special function to allow length(unclass(x)) to be
done quickly.

The variant result mechanism has extremely low overhead, and is not
hard to implement.

   Radford Neal


Re: True length - length(unclass(x)) - without having to call unclass()?

Dénes Tóth-2
In reply to this post by Tomas Kalibera
Hi Tomas,

On 09/03/2018 11:49 AM, Tomas Kalibera wrote:
> Please don't do this to get the underlying vector length (or to achieve
> anything else). Setting/deleting attributes of an R object without
> checking the reference count violates R semantics, which in turn can
> have unpredictable results on R programs (essentially undebuggable
> segfaults now or more likely later when new optimizations or features
> are added to the language). Setting attributes on objects with reference
> count (currently NAMED value) greater than 0 (in some special cases 1 is
> ok) is cheating - please see Writing R Extensions - and getting speedups
> via cheating leads to fragile, unmaintainable and buggy code.

Please note that data.table::setattr is an exported function of a widely
used package (available from CRAN), which also has a description in
?data.table::setattr why it might be useful.

Of course one has to use set* functions from data.table with extreme
care, but if one does it in the right way, they can help a lot. For
example, there is no real danger in using them in internal functions
where one can control what gets passed to the function or created
within the function (so when one knows that the refcount==0 condition is
true).

(Notwithstanding the above, but also supporting your argument, it
took me hours to debug a particular problem in one of my internal
packages, see https://github.com/Rdatatable/data.table/issues/1281)

In the present case, an important and unanswered question is (cited from
Henrik):
 >>> However, I'm concerned that calling unclass(x) may trigger an
 >>> expensive copy internally in some cases.  Is that concern unfounded?

If no copy is made, length(unclass(x)) beats length(setattr(..)) in all
scenarios.


> Doing so
> in packages is particularly unhelpful to the whole community - packages
> should only use the public API as documented.
>
> Similarly, getting a physical address of an object to hack around
> whether R has copied it or not should certainly not be done in packages
> and R code should never be working with or even obtaining physical
> address of an object. This is also why one cannot obtain such address
> using base R (apart in textual form from certain diagnostic messages
> where it can indeed be useful for low-level debugging).

Getting the physical address of the object was done exclusively for
demonstration purposes. I totally agree that it should not be used for
the purpose you described and I have never ever done so.

Regards,
Denes


Re: True length - length(unclass(x)) - without having to call unclass()?

Tomas Kalibera
On 09/03/2018 03:59 PM, Dénes Tóth wrote:

> Hi Tomas,
>
> On 09/03/2018 11:49 AM, Tomas Kalibera wrote:
>> Please don't do this to get the underlying vector length (or to
>> achieve anything else). Setting/deleting attributes of an R object
>> without checking the reference count violates R semantics, which in
>> turn can have unpredictable results on R programs (essentially
>> undebuggable segfaults now or more likely later when new
>> optimizations or features are added to the language). Setting
>> attributes on objects with reference count (currently NAMED value)
>> greater than 0 (in some special cases 1 is ok) is cheating - please
>> see Writing R Extensions - and getting speedups via cheating leads to
>> fragile, unmaintainable and buggy code.
>
Hi Denes,

> Please note that data.table::setattr is an exported function of a
> widely used package (available from CRAN), which also has a
> description in ?data.table::setattr why it might be useful.

Indeed, and it is not your fault, but the function is cheating, and the
fact that it is in a widely used package, and even exported from it,
does not make it any safer. The related optimization in base R (shallow
copying) mentioned in the documentation of data.table::setattr is, on
the other hand, sound: it does not break the semantics.

> Of course one has to use set* functions from data.table with extreme
> care, but if one does it in the right way, they can help a lot. For
> example there is no real danger of using them in internal functions
> where one can control what gets passed to the function or created
> within the function (so when one knows that the refcount==0 condition
> is true).

Extreme care is not enough, as the internals can and do change (and,
within the limits given by the documentation, they are likely to change
soon with respect to NAMED/reference counting), not to mention that they
are very complicated. The approach of "modify in place because we know
the reference count is 0" is particularly error prone and unnecessary.
It is unnecessary because there is a documented C API for legitimate use
in packages to find out whether an object may be referenced/shared (it
indirectly checks the reference count). If not, it can be modified in
place without cheating, and some packages do it. It is error prone
because the reference count can change due to many things package
developers cannot be expected to know (and again, these things change):
in set* functions, for example, it will never be 0 (!); these functions,
with their current API, can never be implemented in current R without
breaking the semantics.

In principle one can do similar things legitimately by wrapping objects
in an environment, passing that environment around (environments can
legitimately be modified in place), checking that the contained objects
have a reference count of 1 (not shared), and if so, modifying them in
place. But indeed, as soon as such objects become shared, there is no
way out: one has to copy (in the current R).
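
A minimal sketch of the environment-wrapping idea in plain R follows. The names make_box() and box_append() are hypothetical. The environment is legitimately modified in place; whether R can also update the contained vector without copying depends on its reference count, which this R-level code does not check.

```r
# Wrap a value in an environment; environments have reference semantics.
make_box <- function(value) {
  box <- new.env(parent = emptyenv())
  box$value <- value
  box
}

# Update the wrapped value. The caller's 'box' sees the change because the
# environment itself is modified in place.
box_append <- function(box, item) {
  box$value[[length(box$value) + 1L]] <- item
  invisible(box)
}

b <- make_box(list(1, 2))
box_append(b, 3)
length(b$value)
```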

Best
Tomas


Re: True length - length(unclass(x)) - without having to call unclass()?

Tomas Kalibera
In reply to this post by Henrik Bengtsson-5
On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
> Is there a low-level function that returns the length of an object 'x'
> - the length that for instance .subset(x) and .subset2(x) see? An
> obvious candidate would be to use:
>
> .length <- function(x) length(unclass(x))
>
> However, I'm concerned that calling unclass(x) may trigger an
> expensive copy internally in some cases.  Is that concern unfounded?
unclass() will always copy when "x" is really a variable, because the
value in "x" will be referenced; whether that is prohibitively expensive
depends only on the workload. If "x" is a very long list and this
function is called often, then it could be, but at least to me this
sounds unlikely. Unless you have a strong reason to believe that is the
case, I would just use length(unclass(x)).
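
One way to observe this on your own build is tracemem(), which reports when a traced object is duplicated. This is only a sketch: tracemem() works only if R was compiled with memory profiling (checked below via capabilities("profmem")), and what gets reported may differ between R versions. The class "Foo" is illustrative.

```r
x <- structure(as.list(seq_len(1e5)), class = "Foo")

if (capabilities("profmem")) {
  tracemem(x)                # start reporting duplications of 'x'
  n <- length(unclass(x))    # a tracemem line here indicates a copy
  untracemem(x)
} else {
  n <- length(unclass(x))
}
n
```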

If the copying is really a problem, I would think about why the
underlying vector length is needed at R level: whether you really need
to know the length without needing the unclassed vector for something
else anyway, that is, whether you are not paying for the copy in any
case. Or, from the other end, if you need to do more without copying,
and it is possible without breaking the value semantics, then you might
need to switch to C anyway, and for a bigger piece of code.

If it were still just .length() you needed and it were performance
critical, you could just switch to C and call Rf_length. That does not
violate the semantics, although it is indeed not elegant, as you are
switching to C.

If you stick to R and can live with the overhead of length(unclass(x))
then there is a chance the overhead will decrease as R is optimized
internally. This is possible in principle when the runtime knows that
the unclassed vector is only needed to compute something that does not
modify the vector. The current R cannot optimize this out, but it should
be possible with ALTREP at some point (and as Radford mentioned pqR does
it differently). Even with such internal optimizations, it is indeed
often necessary to make guesses about realistic workloads, so if you
have a realistic workload where, say, length(unclass(x)) is critical,
you are more than welcome to donate it as a benchmark.

Obviously, if you use a C version calling Rf_length, after such an R
optimization your code would be unnecessarily inelegant, but it would
still work, and probably without overhead, because R can't do much less
than Rf_length. In more complicated cases, though, hand-optimized C code
implementing, say, 2 operations in sequence could be slower than what a
better optimizing runtime could do by joining the effects of possibly
more operations, which is in principle another danger of switching from
R to C. But as long as the semantics are followed, there is no other
danger.

The temptation should be small anyway in this case, where Rf_length()
would be the simplest option, but as I made more than clear in the
previous email, one should never violate the value semantics by
temporarily modifying the object (temporarily removing the class
attribute or temporarily removing the object bit). Violating the
semantics causes bugs, if not with the present version then with future
versions of R (where a version may be an svn revision). A concrete
recent example: modifying objects in place in violation of the semantics
caused a lot of bugs when unification of constants was introduced in the
byte-code compiler.

Best
Tomas


Re: True length - length(unclass(x)) - without having to call unclass()?

Iñaki Ucar
The bottom line here is that one can always call a base method,
inexpensively and without modifying the object, in, let's say,
*formal* OOP languages. In R, this is not possible in general. It
would be possible if there were always a foo.default, but primitives
use internal dispatch.

I was wondering whether it would be possible to provide a super(x, n)
function which simply causes the dispatching system to avoid "n"
classes in the hierarchy, so that:

> x <- structure(list(), class=c("foo", "bar"))
> length(super(x, 0)) # looks for a length.foo
> length(super(x, 1)) # looks for a length.bar
> length(super(x, 2)) # calls the default
> length(super(x, Inf)) # calls the default
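
No such super() exists in base R. Purely to illustrate the proposed semantics, here is a sketch of a hypothetical super() that drops the first n classes before dispatch. Unlike the proposal, it modifies a copy of x, so it does not avoid the cost discussed in this thread; the length methods are made up for the example.

```r
# Hypothetical helper, not part of R: drop the first 'n' classes of 'x'.
super <- function(x, n) {
  cls <- class(x)
  keep <- cls[seq_along(cls) > n]
  # Removing all classes falls back to the internal default methods
  class(x) <- if (length(keep) > 0L) keep else NULL
  x
}

length.foo <- function(x) 100L
length.bar <- function(x) 200L

x <- structure(list(), class = c("foo", "bar"))
length(super(x, 0))    # dispatches to length.foo
length(super(x, 1))    # dispatches to length.bar
length(super(x, 2))    # internal default
length(super(x, Inf))  # internal default
```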

Iñaki

On Wed, 5 Sept 2018 at 10:09, Tomas Kalibera
(<[hidden email]>) wrote:

> Unclass() will always copy when "x" is really a variable, because the
> value in "x" will be referenced; whether it is prohibitively expensive
> or not depends only on the workload [...]

--
Iñaki Ucar


Re: True length - length(unclass(x)) - without having to call unclass()?

Kevin Ushey
More generally, I think one of the issues is that R is not yet able to
decrement a reference count (or mark a 'shared' data object as
'unshared' after it knows only one binding to it exists). This means
passing variables to R closures will mark that object as shared:

    x <- list()
    .Internal(inspect(x))  # NAM(1)
    identity(x)
    .Internal(inspect(x))  # NAM(3)

I think for this reason users often resort to 'hacks' that involve
directly setting attributes on the object, since they 'know' only one
reference to a particular object exists. I'm not sure if this really
is 'safe', though -- likely not given potential future optimizations
to R, as Tomas has alluded to.

I think true reference counting has been implemented in the R sources,
but the switch has not yet been flipped to enable that by default.
Hopefully having that will make cases like the above work as expected?

Thanks,
Kevin

On Wed, Sep 5, 2018 at 2:19 AM Iñaki Ucar <[hidden email]> wrote:

> The bottom line here is that one can always call a base method,
> inexpensively and without modifying the object, in, let's say,
> *formal* OOP languages. In R, this is not possible in general.
> [...]

Re: True length - length(unclass(x)) - without having to call unclass()?

luke-tierney
On Wed, 5 Sep 2018, Kevin Ushey wrote:

> More generally, I think one of the issues is that R is not yet able to
> decrement a reference count (or mark a 'shared' data object as
> 'unshared' after it knows only one binding to it exists). This means
> passing variables to R closures will mark that object as shared:
>
>    x <- list()
>    .Internal(inspect(x))  # NAM(1)
>    identity(x)
>    .Internal(inspect(x))  # NAM(3)
>
> I think for this reason users often resort to 'hacks' that involve
> directly setting attributes on the object, since they 'know' only one
> reference to a particular object exists. I'm not sure if this really
> is 'safe', though -- likely not given potential future optimizations
> to R, as Tomas has alluded to.
>
> I think true reference counting has been implemented in the R sources,
> but the switch has not yet been flipped to enable that by default.
> Hopefully having that will make cases like the above work as expected?

Current R-devel built with reference counting by setting

CFLAGS="-O3 -g -Wall -pedantic -DSWITCH_TO_REFCNT"

gives


x <- list()
.Internal(inspect(x))
## @55ad788e3b28 19 VECSXP g0c0 [REF(1)] (len=0, tl=0)
identity(x)
## list()
.Internal(inspect(x))
## @55ad788e3b28 19 VECSXP g0c0 [REF(1)] (len=0, tl=0)

I'm moderately hopeful we'll be able to switch to this for 3.6.0, but
that depends on finding enough time to sort out some loose ends.
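
Which of the two schemes a given build uses can be read off the same inspect() output. Below is a small sketch, assuming the flag field keeps the NAM(k) / REF(k) form shown in this thread. Note that .Internal(inspect()) is undocumented and its output format may change, and that counting_scheme() is a name made up here, not an R function.

```r
# Hypothetical helper: report whether this R build prints NAM() or REF()
# flags, i.e. whether it uses the NAMED mechanism or reference counting.
counting_scheme <- function() {
  x <- list()
  out <- utils::capture.output(.Internal(inspect(x)))
  if (length(out) == 0L) return(NA_character_)
  # Look for "NAM(" or "REF(" in the first line of the inspect output.
  m <- regmatches(out[1L], regexpr("(NAM|REF)\\(", out[1L]))
  if (length(m) == 1L) sub("\\($", "", m) else NA_character_
}

counting_scheme()
```

The function returns NA when neither flag is printed, so it should be read as a hint about the build rather than a guarantee.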

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Re: True length - length(unclass(x)) - without having to call unclass()?

Tomas Kalibera
In reply to this post by Iñaki Ucar
On 09/05/2018 11:18 AM, Iñaki Ucar wrote:

> The bottom line here is that one can always call a base method,
> inexpensively and without modifying the object, in, let's say,
> *formal* OOP languages. In R, this is not possible in general. It
> would be possible if there were always a foo.default, but primitives
> use internal dispatch.
>
> I was wondering whether it would be possible to provide a super(x, n)
> function which simply causes the dispatching system to avoid "n"
> classes in the hierarchy, so that:
>
>> x <- structure(list(), class=c("foo", "bar"))
>> length(super(x, 0)) # looks for a length.foo
>> length(super(x, 1)) # looks for a length.bar
>> length(super(x, 2)) # calls the default
>> length(super(x, Inf)) # calls the default

I think that a cast should always be to a specific class, identified by
the name of the class. Identifying classes by their inheritance index
might be unnecessarily brittle: it would break if someone introduced a
new ancestor class. Syntax aside, supporting fast casts for S3
dispatch in the current implementation would be quite a bit of work and
probably not worth it; it would also probably slow down the internal
dispatch in primitives. But a partial solution could be implemented at
some point with ALTREP wrappers, once one can create, without copying, a
wrapper object with a modified class attribute.

Tomas

> Iñaki
>
> On Wed, Sep 5, 2018 at 10:09, Tomas Kalibera
> (<[hidden email]>) wrote:
>> On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
>>> Is there a low-level function that returns the length of an object 'x'
>>> - the length that for instance .subset(x) and .subset2(x) see? An
>>> obvious candidate would be to use:
>>>
>>> .length <- function(x) length(unclass(x))
>>>
>>> However, I'm concerned that calling unclass(x) may trigger an
>>> expensive copy internally in some cases.  Is that concern unfounded?
>> unclass() will always copy when "x" is really a variable, because the
>> value in "x" will be referenced; whether that is prohibitively expensive
>> depends only on the workload. If "x" is a very long list and
>> this function is called often, then it could be, but at least to me this
>> sounds unlikely. Unless you have a strong reason to believe it is the
>> case, I would just use length(unclass(x)).
>>
>> If the copying is really a problem, I would think about why the
>> underlying vector length is needed at R level - whether you really need
>> to know the length without actually having the unclassed vector anyway
>> for something else, so whether you are not paying for the copy anyway.
>> Or, from the other end, if you need to do more without copying, and it
>> is possible without breaking the value semantics, then you might need to
>> switch to C anyway and for a bigger piece of code.
>>
>> If it were still just .length() you needed and it were performance
>> critical, you could just switch to C and call Rf_length. That does not
>> violate the semantics; it is just not elegant, as you are
>> switching to C.
>>
>> If you stick to R and can live with the overhead of length(unclass(x))
>> then there is a chance the overhead will decrease as R is optimized
>> internally. This is possible in principle when the runtime knows that
>> the unclassed vector is only needed to compute something that does not
>> modify the vector. The current R cannot optimize this out, but it should
>> be possible with ALTREP at some point (and as Radford mentioned pqR does
>> it differently). Even with such internal optimizations indeed it is
>> often necessary to make guesses about realistic workloads, so if you
>> have a realistic workload where say length(unclass(x)) is critical, you
>> are more than welcome to donate it as benchmark.
>>
>> Obviously, if you use a C version calling Rf_length, after such an R
>> optimization your code would be unnecessarily inelegant, but it would
>> still work and probably without overhead, because R can't do much less
>> than Rf_length. In more complicated cases, though, hand-optimized C code
>> to implement say 2 operations in sequence could be slower than what a
>> better optimizing runtime could do by joining the effect of possibly
>> more operations, which is in principle another danger of switching from
>> R to C. But as long as the semantics are followed, there is no other danger.
>>
>> The temptation should be small anyway in this case, when Rf_length()
>> would be the simplest, but as I made more than clear in the previous
>> email, one should never violate the value semantics by temporarily
>> modifying the object (temporarily removing the class attribute or
>> temporarily removing the object bit). Violating semantics causes bugs, if
>> not with the present then with future versions of R (where a version may
>> be an svn revision). A concrete recent example: modifying objects in
>> place in violation of the semantics caused a lot of bugs with the
>> introduction of unification of constants in the byte-code compiler.
>>
>> Best
>> Tomas
>>
>>> Thxs,
>>>
>>> Henrik
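
For reference, the baseline recommended in the quoted message above can be sketched as follows. The class Foo and its length() method are made up for the illustration; the point is only that length(unclass(x)) bypasses the method while leaving the caller's object untouched, which is the value semantics the message insists on.

```r
# Illustrative class whose length() method hides the length of the
# underlying list.
length.Foo <- function(x) 99L

x <- structure(list(a = 1, b = 2, c = 3), class = "Foo")

# The approach from the original post: drop the class on a copy,
# then ask for the length of the plain list.
.length <- function(x) length(unclass(x))

length(x)    # 99, via length.Foo
.length(x)   # 3, the length of the underlying list
class(x)     # still "Foo": the caller's object is not modified
```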

Re: True length - length(unclass(x)) - without having to call unclass()?

Iñaki Ucar
On Mon, Sep 10, 2018 at 14:18, Tomas Kalibera
(<[hidden email]>) wrote:

>
> On 09/05/2018 11:18 AM, Iñaki Ucar wrote:
> > The bottom line here is that one can always call a base method,
> > inexpensively and without modifying the object, in, let's say,
> > *formal* OOP languages. In R, this is not possible in general. It
> > would be possible if there were always a foo.default, but primitives
> > use internal dispatch.
> >
> > I was wondering whether it would be possible to provide a super(x, n)
> > function which simply causes the dispatching system to avoid "n"
> > classes in the hierarchy, so that:
> >
> >> x <- structure(list(), class=c("foo", "bar"))
> >> length(super(x, 0)) # looks for a length.foo
> >> length(super(x, 1)) # looks for a length.bar
> >> length(super(x, 2)) # calls the default
> >> length(super(x, Inf)) # calls the default
> I think that a cast should always be to a specific class, identified by
> the name of the class. Identifying classes by their inheritance index
> might be unnecessarily brittle: it would break if someone introduced a
> new ancestor class.

Agreed. I just wanted to point out that, in that case, something like
super(x, "default") should always work as a way to reach the default
method, even if the method is internal and no foo.default is defined.
Otherwise, we would have the same problem.
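
A name-based variant could look like the sketch below. It is hypothetical: cast_to() is a name made up here, not an R function, and it works on a copy, so it is not the fast cast discussed above. It only shows the intended dispatch, including the "default" case, and that it keeps working when a new ancestor class is inserted.

```r
# Hypothetical helper, not part of R: view x as class `class_name`,
# or as a plain object when class_name = "default".
cast_to <- function(x, class_name) {
  if (identical(class_name, "default")) return(unclass(x))
  cls <- oldClass(x)
  pos <- match(class_name, cls)
  if (is.na(pos)) stop("'x' does not inherit from class '", class_name, "'")
  oldClass(x) <- cls[pos:length(cls)]   # modifies a local copy only
  x
}

# Illustrative methods that return recognisable values.
length.foo <- function(x) 10L
length.bar <- function(x) 20L

x <- structure(list(), class = c("foo", "bar"))
length(cast_to(x, "foo"))      # dispatches to length.foo
length(cast_to(x, "bar"))      # dispatches to length.bar
length(cast_to(x, "default"))  # internal default

# Inserting a new ancestor does not change what "bar" refers to.
y <- structure(list(), class = c("foo", "mid", "bar"))
length(cast_to(y, "bar"))      # still dispatches to length.bar
```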

Iñaki

