Puzzled about a new method for "[".

Rolf Turner

I recently tried to write a new method for "[", to be applied to data
frames, so that the object returned would retain (all) attributes of the
columns, including attributes that my code had created.

I thrashed around for quite a while, and then got some help from Rui
Barradas who showed me how to do it, in the following manner:

`[.myclass` <- function(x, i, j, drop = if (missing(i)) TRUE else
                                       length(cols) == 1) {
    # Save the attributes of every column before subsetting.
    SaveAt <- lapply(x, attributes)
    x <- NextMethod()
    # Put the saved attributes back on each surviving column.
    lX <- lapply(names(x), function(nm, x, Sat) {
      attributes(x[[nm]]) <- Sat[[nm]]
      x[[nm]]}, x = x, Sat = SaveAt)
    names(lX) <- names(x)
    x <- as.data.frame(lX)
    x
}

If I set class(X) <- c("myclass",class(X)) and apply "[" to X (e.g.
something like X[1:42,]) the attributes are retained as desired.

OK.  All good.  Now we finally come to my question!  I want to put this
new method into a package that I am building.  When I build the package
and run R CMD check I get a complaint:

... no visible binding for global variable ‘cols’

And indeed, there is no such variable.  At first I thought that maybe
the code should be

`[.myclass` <- function(x, i, j, drop = if (missing(i)) TRUE else
                                       length(j) == 1) {

But I looked at "[.data.frame" and it has "cols" too; not "j".

So why doesn't "[.data.frame" throw a warning when R gets built?

Can someone please explain to me what's going on here?

cheers,

Rolf

P. S. I amended the code for my method, replacing "cols" by "j", and it
*seems* to run, and deliver the desired results.  (And the package
checks, without complaint.) I am nervous, however, that there may be
some Trap for Young Players that I don't perceive, lurking about and
waiting to cause problems for me.

R.

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Puzzled about a new method for "[".

Iñaki Ucar
On Sun, 3 Nov 2019 at 22:12, Rolf Turner <[hidden email]> wrote:

> But I looked at "[.data.frame" and it has "cols" too; not "j".
>
> So why doesn't "[.data.frame" throw a warning when R gets built?
>
> Can someone please explain to me what's going on here?

The thing is...

test <- function(x = y * 2) {
  y <- 1
  x
}

test()
# 2

Lazy evaluation magic.

Iñaki


Re: Puzzled about a new method for "[".

Duncan Murdoch-2
In reply to this post by Rolf Turner
On 03/11/2019 4:11 p.m., Rolf Turner wrote:

> But I looked at "[.data.frame" and it has "cols" too; not "j".
>
> So why doesn't "[.data.frame" throw a warning when R gets built?
>
> Can someone please explain to me what's going on here?

Defaults for parameters are evaluated in the evaluation frame of the
function, at the time the parameter is first used.

If you look at the source for "[.data.frame", you should see that "cols"
is defined there as a local variable.  The "drop" argument shouldn't be
used until it is.  (There's a call to "missing(drop)" early in the
source that doesn't count:  it doesn't evaluate "drop", it just checks
whether it is specified by the caller.)
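
A minimal sketch of that rule, not taken from the R sources: the
hypothetical function f below mimics the pattern in "[.data.frame" (a
default for "drop" that mentions a local variable "cols").  The names
f, res and was_missing are made up for illustration.

```r
# Hypothetical function: the default for 'drop' refers to 'cols',
# which only exists as a local variable inside the body.
f <- function(x, drop = length(cols) == 1) {
  was_missing <- missing(drop)  # checks the call only; does not evaluate 'drop'
  cols <- "a"                   # local variable is created here
  # 'drop' is first used on the next line, so its default is evaluated
  # now, in f's own frame, where 'cols' already exists.
  list(was_missing = was_missing, drop = drop)
}

res <- f(1)
```

A function whose body never assigns "cols" has nothing local for the
default to find, which is the situation the R CMD check note is
pointing at.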

Duncan Murdoch


Re: Puzzled about a new method for "[".

Rolf Turner

On 4/11/19 10:31 AM, Duncan Murdoch wrote:

> Defaults for parameters are evaluated in the evaluation frame of the
> function, at the time the parameter is first used.
>
> If you look at the source for "[.data.frame", you should see that "cols"
> is defined there as a local variable.  The "drop" argument shouldn't be
> used until it is.  (There's a call to "missing(drop)" early in the
> source that doesn't count:  it doesn't evaluate "drop", it just checks
> whether it is specified by the caller.)


OK.  As I understand what you're saying, the reason there isn't a
"no visible binding" problem in [.data.frame is that "cols" *is* defined
in the body of the function.  Whereas, in my method, "cols" does not get
defined anywhere in the function, and thus triggers the warning.

I guess that a workaround would be to do a dummy assignment, like unto
cols <- 42 at the start of the code for my method.

(a) Are there perils involved with this strategy?

(b) Is there anything wrong with my current strategy of replacing

    drop = if (missing(i)) TRUE else length(cols) == 1)

by

    drop = if (missing(i)) TRUE else length(j) == 1)

???

As I said, this *seems* to work OK, but I cannot work through what the
implications might be.

Can anyone reassure me?

cheers,

Rolf


Re: Puzzled about a new method for "[".

Duncan Murdoch-2
On 03/11/2019 6:43 p.m., Rolf Turner wrote:

> OK.  As I understand what you're saying, the reason there isn't a
> "no visible binding" problem in [.data.frame is that "cols" *is* defined
> in the body of the function.  Whereas, in my method, "cols" does not get
> defined anywhere in the function, and thus triggers the warning.
>
> I guess that a workaround would be to do a dummy assignment, like unto
> cols <- 42 at the start of the code for my method.
>
> (a) Are there perils involved with this strategy?

Only that 42 might not be the right value.

>
> (b) Is there anything wrong with my current strategy of replacing
>
>      drop = if (missing(i)) TRUE else length(cols) == 1)
>
> by
>
>      drop = if (missing(i)) TRUE else length(j) == 1)

[.data.frame is pretty complicated, and I haven't read it closely enough
to know if this is equivalent.  I would suggest you consider not
including "drop" at all, just implicitly including it in "..." .
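
One way to read that suggestion, as a sketch rather than a tested
recommendation: leave "drop" out of the signature, so that any value
the caller supplies travels through "..." to "[.data.frame" via
NextMethod().  The data frame X and the results r1, r2, r3 are made up
for illustration.

```r
# Sketch: no 'i', 'j' or 'drop' in the signature, so there is no default
# mentioning 'cols' for R CMD check to complain about.
`[.myclass` <- function(x, ...) {
  out <- NextMethod()   # [.data.frame sees the caller's original arguments
  # (any post-processing of 'out' would go here)
  out
}

X <- data.frame(a = 1:3, b = 4:6)
class(X) <- c("myclass", class(X))

r1 <- X[1:2, ]                 # row subset, still a data frame
r2 <- X[, "a"]                 # [.data.frame applies its own default for 'drop'
r3 <- X[, "a", drop = FALSE]   # explicit 'drop' is passed along in '...'
```

The point of the design is that "[.data.frame" keeps sole responsibility
for working out "drop"; the method never has to duplicate that logic.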

Duncan Murdoch


Re: Puzzled about a new method for "[".

Rolf Turner

On 4/11/19 1:06 PM, Duncan Murdoch wrote:

> [.data.frame is pretty complicated, and I haven't read it closely enough
> to know if this is equivalent.  I would suggest you consider not
> including "drop" at all, just implicitly including it in "..." .

OK.  I'll try that!

Thanks.

cheers,

Rolf


Re: Puzzled about a new method for "[".

hadley wickham
In reply to this post by Rolf Turner
For what it's worth, I don't think this strategy can work in general,
because a class might have attributes that depend on its data/contents
(e.g. https://vctrs.r-lib.org/articles/s3-vector.html#cached-sum). I
don't think these are particularly common in practice, but it's
dangerous to assume that you can restore a class simply by restoring
its attributes after subsetting.
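
A small illustration of the kind of attribute meant here, assuming a
hypothetical "total" attribute that caches the sum of a plain numeric
vector (this is not the vctrs example itself):

```r
# Hypothetical vector that caches its own sum in an attribute.
v <- structure(c(1, 2, 3), total = 6)

saved <- attributes(v)   # what a "restore everything" method would save
w <- v[1:2]              # ordinary subsetting drops the 'total' attribute
attributes(w) <- saved   # blindly restoring it brings back the old value

stale <- attr(w, "total")   # cached value copied from the full vector
actual <- sum(w)            # true sum of the subset
```

After the restore, the cached value still describes the original three
elements, not the two that remain.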

Hadley

--
http://hadley.nz


Re: Puzzled about a new method for "[".

Rolf Turner

On 5/11/19 3:41 AM, Hadley Wickham wrote:

> For what it's worth, I don't think this strategy can work in general,
> because a class might have attributes that depend on its data/contents
> (e.g. https://vctrs.r-lib.org/articles/s3-vector.html#cached-sum). I
> don't think these are particularly common in practice, but it's
> dangerous to assume that you can restore a class simply by restoring
> its attributes after subsetting.


You're probably right that there are lurking perils in general, but I am
not trying to "restore a class".  I simply want to *retain* attributes
of columns in a data frame.

* I have a data frame X
* I attach attributes to certain of its columns;
      attr(X$melvin,"clyde") <- 42
   (I *don't* change the class of X$melvin.)
* I form a subset of X:
     Y <- X[1:100,3:10]
* given that "melvin" is amongst columns 3 through 10 of X,
     I want Y$melvin to retain the attribute "clyde", i.e. I
     want attr(Y$melvin,"clyde") to return 42

There is almost surely a better approach than the one that I've chosen
(isn't there always?) but it seems to work, and the perils certainly are
not immediately apparent to me.

cheers,

Rolf


Re: Puzzled about a new method for "[".

Pages, Herve
Hi Rolf,

On 11/4/19 12:28, Rolf Turner wrote:

> You're probably right that there are lurking perils in general, but I am
> not trying to "restore a class".  I simply want to *retain* attributes
> of columns in a data frame.
>
> * I have a data frame X
> * I attach attributes to certain of its columns;
>       attr(X$melvin,"clyde") <- 42
>    (I *don't* change the class of X$melvin.)
> * I form a subset of X:
>      Y <- X[1:100,3:10]
> * given that "melvin" is amongst columns 3 through 10 of X,
>      I want Y$melvin to retain the attribute "clyde", i.e. I
>      want attr(Y$melvin,"clyde") to return 42
>
> There is almost surely a better approach than the one that I've chosen
> (isn't there always?) but it seems to work, and the perils certainly are
> not immediately apparent to me.

Maybe you've solved the problem for the columns that contain your
objects but now you've introduced a potential problem for columns that
contain objects with attributes whose values depend on content.

Hadley is right that restoring the original attributes of a vector (list
or atomic) after subsetting is unsafe.

Best,
H.


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Re: Puzzled about a new method for "[".

R devel mailing list
In reply to this post by Rolf Turner
> the perils certainly are not immediately apparent to me.

Here is a concrete example of a peril
 `[.myclass` <- function(x, i, j, drop = if (missing(i)) TRUE else
length(cols) == 1)
   {
       SaveAt <- lapply(x, attributes)
       x <- NextMethod()
       lX <- lapply(names(x),function(nm, x, Sat){
         attributes(x[[nm]]) <- Sat[[nm]]
         x[[nm]]}, x = x, Sat = SaveAt)
       names(lX) <- names(x)
       x <- as.data.frame(lX)
       x
   }

 x <- data.frame(Mat=I(matrix(101:106,ncol=2)), Vec=201:203)
 xmc <- structure(x, class=c("myclass", class(x)))
 xmc[1:2,]
Error in attributes(x[[nm]]) <- Sat[[nm]] :
  dims [product 6] do not match the length of object [4]
 x[1:2,]
  Mat.1 Mat.2 Vec
1   101   104 201
2   102   105 202

I would be surprised if extracting a column from some rows of a data.frame
gave a different result than extracting some rows from a column of a
data.frame.  The row-selecting method used by [.data.frame depends on the
class of the column.

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: Puzzled about a new method for "[".

Duncan Murdoch-2
In reply to this post by Pages, Herve
On 04/11/2019 4:40 p.m., Pages, Herve wrote:

> Maybe you've solved the problem for the columns that contain your
> objects but now you've introduced a potential problem for columns that
> contain objects with attributes whose values depend on content.
>
> Hadley is right that restoring the original attributes of a vector (list
> or atomic) after subsetting is unsafe.

Right, so Rolf should only restore attributes that are ones he added in
the first place.  Unknown attributes should be left alone.
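
A minimal sketch of that idea, assuming the attributes added by the
package can be listed by name.  The vector own_attrs and the example
data are hypothetical; "clyde" is the attribute name used earlier in
the thread.

```r
# Hypothetical: names of the attributes that my own code attaches to columns.
own_attrs <- c("clyde")

`[.myclass` <- function(x, ...) {
  cols_before <- unclass(x)   # plain list of the original columns
  out <- NextMethod()         # let [.data.frame do the subsetting
  if (is.data.frame(out)) {
    for (nm in names(out)) {
      for (a in own_attrs) {
        val <- attr(cols_before[[nm]], a, exact = TRUE)
        # Restore only attributes we know about; dim, class, names and
        # anything else are left exactly as [.data.frame produced them.
        if (!is.null(val)) attr(out[[nm]], a) <- val
      }
    }
  }
  out
}

X <- data.frame(melvin = 1:5, Mat = I(matrix(101:110, ncol = 2)))
attr(X$melvin, "clyde") <- 42
class(X) <- c("myclass", class(X))

Y <- X[1:2, ]   # 'clyde' survives; the matrix column keeps consistent dims
```

Because "dim" is not in own_attrs, a matrix column like the one in Bill
Dunlap's example is not touched by the restore step.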

Duncan Murdoch


Re: Puzzled about a new method for "[".

Rolf Turner
On 5/11/19 10:54 AM, Duncan Murdoch wrote:

> Right, so Rolf should only restore attributes that are ones he added in
> the first place.  Unknown attributes should be left alone.

Fair point.  And that gets fiddly.  I guess I'm going to have to rethink
my strategy.

cheers,

Rolf


Re: Puzzled about a new method for "[".

Pages, Herve
In reply to this post by Duncan Murdoch-2


On 11/4/19 13:54, Duncan Murdoch wrote:

> On 04/11/2019 4:40 p.m., Pages, Herve wrote:
>> Hi Rolf,
>>
>> On 11/4/19 12:28, Rolf Turner wrote:
>>>
>>> On 5/11/19 3:41 AM, Hadley Wickham wrote:
>>>
>>>> For what it's worth, I don't think this strategy can work in general,
>>>> because a class might have attributes that depend on its data/contents
>>>> (e.g.
>>>> https://vctrs.r-lib.org/articles/s3-vector.html#cached-sum
>>>> ). I
>>>> don't think these are particularly common in practice, but it's
>>>> dangerous to assume that you can restore a class simply by restoring
>>>> its attributes after subsetting.
>>>
>>>
>>> You're probably right that there are lurking perils in general, but I am
>>> not trying to "restore a class".  I simply want to *retain* attributes
>>> of columns in a data frame.
>>>
>>> * I have a data frame X
>>> * I attach attributes to certain of its columns;
>>>        attr(X$melvin,"clyde") <- 42
>>>     (I *don't* change the class of X$melvin.)
>>> * I form a subset of X:
>>>       Y <- X[1:100,3:10]
>>> * given that "melvin" is amongst columns 3 through 10 of X,
>>>       I want Y$melvin to retain the attribute "clyde", i.e. I
>>>       want attr(Y$melvin,"clyde") to return 42
>>>
>>> There is almost surely a better approach than the one that I've chosen
>>> (isn't there always?) but it seems to work, and the perils certainly are
>>> not immediately apparent to me.
>>
>> Maybe you've solved the problem for the columns that contain your
>> objects but now you've introduced a potential problem for columns that
>> contain objects with attributes whose values depend on content.
>>
>> Hadley is right that restoring the original attributes of a vector (list
>> or atomic) after subsetting is unsafe.
>
> Right, so Rolf should only restore attributes that are ones he added in
> the first place.  Unknown attributes should be left alone.

Exactly. More precisely the problem needs to be tackled at the level of
his objects (i.e. define a [ method for his objects that preserves the
attributes) and not at the level of the [ method for data frames. The [
method for data frames will call his [ method when needed.
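
A rough sketch of that idea, under the assumption that the column is given
a class of its own (which Rolf said he had not done). The class name
"tagged" and its constructor are hypothetical and used only for
illustration:

```r
# Hypothetical constructor: a vector carrying a content-independent
# attribute "clyde" plus a class that the `[` method can dispatch on.
tagged <- function(x, clyde) {
  structure(x, clyde = clyde, class = c("tagged", oldClass(x)))
}

# `[` method for the column objects: subset as usual, then put back the
# class and the one attribute that this class owns.
`[.tagged` <- function(x, i, ...) {
  out <- NextMethod()
  attr(out, "clyde") <- attr(x, "clyde")
  class(out) <- oldClass(x)
  out
}

X <- data.frame(a = 1:200, b = seq(0, 1, length.out = 200))
X$melvin <- tagged(seq_len(200), clyde = 42)

# No special data frame class is needed: the ordinary data frame method
# subsets each column with `[`, which dispatches to `[.tagged`.
Y <- X[1:100, c("a", "melvin")]
attr(Y$melvin, "clyde")
```

Columns of other classes are untouched, because only objects of class
"tagged" go through this method.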

H.

>
> Duncan Murdoch

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Re: Puzzled about a new method for "[".

Iñaki Ucar
In reply to this post by Rolf Turner
For testing, you can try a column of class 'errors', from the package
'errors'. Its attributes depend on the content in the way Hadley pointed
out.
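
The same failure mode can be sketched in base R without installing
anything. The attribute name "cached_sum" below is made up for
illustration (it mirrors the vctrs example Hadley linked) and is not part
of the 'errors' package:

```r
# A vector that caches a summary of its own contents in an attribute.
x <- structure(c(1, 2, 3, 4), cached_sum = 10)

# Plain subsetting drops the attribute ...
y <- x[1:2]

# ... and blindly restoring it from the original leaves a stale value.
attr(y, "cached_sum") <- attr(x, "cached_sum")

sum(y)                 # 3
attr(y, "cached_sum")  # 10, which no longer matches the contents
```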

Iñaki

On Mon, 4 Nov 2019 at 23:19, Rolf Turner <[hidden email]> wrote:

> On 5/11/19 10:54 AM, Duncan Murdoch wrote:
> > On 04/11/2019 4:40 p.m., Pages, Herve wrote:
> >> Hi Rolf,
> >>
> >> On 11/4/19 12:28, Rolf Turner wrote:
> >>>
> >>> On 5/11/19 3:41 AM, Hadley Wickham wrote:
> >>>
> >>>> For what it's worth, I don't think this strategy can work in general,
> >>>> because a class might have attributes that depend on its data/contents
> >>>> (e.g.
> >>>> https://vctrs.r-lib.org/articles/s3-vector.html#cached-sum
> >>>> ). I
> >>>> don't think these are particularly common in practice, but it's
> >>>> dangerous to assume that you can restore a class simply by restoring
> >>>> its attributes after subsetting.
> >>>
> >>>
> >>> You're probably right that there are lurking perils in general, but I am
> >>> not trying to "restore a class".  I simply want to *retain* attributes
> >>> of columns in a data frame.
> >>>
> >>> * I have a data frame X
> >>> * I attach attributes to certain of its columns;
> >>>        attr(X$melvin,"clyde") <- 42
> >>>     (I *don't* change the class of X$melvin.)
> >>> * I form a subset of X:
> >>>       Y <- X[1:100,3:10]
> >>> * given that "melvin" is amongst columns 3 through 10 of X,
> >>>       I want Y$melvin to retain the attribute "clyde", i.e. I
> >>>       want attr(Y$melvin,"clyde") to return 42
> >>>
> >>> There is almost surely a better approach than the one that I've chosen
> >>> (isn't there always?) but it seems to work, and the perils certainly are
> >>> not immediately apparent to me.
> >>
> >> Maybe you've solved the problem for the columns that contain your
> >> objects but now you've introduced a potential problem for columns that
> >> contain objects with attributes whose values depend on content.
> >>
> >> Hadley is right that restoring the original attributes of a vector (list
> >> or atomic) after subsetting is unsafe.
> >
> > Right, so Rolf should only restore attributes that are ones he added in
> > the first place.  Unknown attributes should be left alone.
>
> Fair point.  And that gets fiddly.  I guess I'm going to have to rethink
> my strategy.
>
> cheers,
>
> Rolf


Re: Puzzled about a new method for "[".

Rolf Turner
On 5/11/19 9:37 PM, Iñaki Ucar wrote:
> For testing, you can try a column of class 'errors', from the package
> 'errors'. Its attributes depend on the content in the way Hadley pointed
> out.

Thanks, but it turns out to be much simpler than that.  There is a very
easy way to accomplish what I want --- simply give an attribute to the
data frame, rather than to a certain column of that data frame.  I don't
know why the hell I didn't do that in the first place!  Duh!!!
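
In outline, and only as a sketch: the helper `subset_keep` below is a
hypothetical name, and it copies the attribute across explicitly rather
than assuming anything about how `[.data.frame` treats extra attributes
on the data frame.

```r
X <- data.frame(a = 1:200, melvin = seq_len(200))
attr(X, "clyde") <- 42   # attribute on the data frame, not on a column

# Hypothetical helper: subset, then copy the named attributes from the
# original data frame to the result.
subset_keep <- function(x, i, j, keep = "clyde") {
  out <- x[i, j, drop = FALSE]
  for (a in keep) attr(out, a) <- attr(x, a)
  out
}

Y <- subset_keep(X, 1:100, "melvin")
attr(Y, "clyde")
```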

Sorry for all the noise that this issue has generated.

cheers,

Rolf

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
