backquotes and term.labels

R devel mailing list
A user reported a problem with the survdiff function when a variable name contains
a space.  Here is a simple example.  The same issue occurs in survfit for the same reason.

library(survival)
lung2 <- lung
names(lung2)[1] <- "in st"   # old name is inst
survdiff(Surv(time, status) ~ `in st`, data=lung2)
Error in `[.data.frame`(m, ll) : undefined columns selected

In the body of the code the program wants to pass all of the right-hand side variables
forward to the strata() function.  The code looks more or less like this, where m is the
model frame:

   Terms <- terms(m)
   index <- attr(Terms, "term.labels")
   if (length(index) == 0) {
       X <- rep(1L, n)        # no covariates
   } else {
       X <- strata(m[index])  # fails when a label carries backquotes
   }

For the variable with a space in its name the term label is "`in st`", and the subscript
fails.

Is this intended behaviour or a bug?  The issue is that the name of this column in the
model frame does not have the backticks, while the terms structure does have them.

Terry T.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: backquotes and term.labels

R devel mailing list
I believe this has to do with terms() making "term.labels" (hence the dimnames
of "factors") with deparse(), so that the backquotes are included for
non-syntactic names.  The backquotes are not in the column names of the input
data.frame (nor the model frame), so you get a mismatch when subscripting the
data.frame or model.frame with elements of terms()$term.labels.
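
The quoting described here can be seen directly in base R. The following is a minimal sketch, with variable names chosen only for illustration; it shows that a non-syntactic name is backquoted when deparsed and that the term labels carry the same quoting.

```r
# A name that is not syntactically valid is backquoted when deparsed
# with backtick = TRUE; a syntactic name is left alone.
deparse(as.name("in st"), backtick = TRUE)
deparse(as.name("inst"), backtick = TRUE)

# The labels built by terms() show the same quoting.  No data are needed
# to build a terms object from a formula.
tl <- attr(terms(y ~ `in st` + inst), "term.labels")
tl
```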

I think you can avoid the problem by adding right after
    ll <- attr(Terms, "term.labels")
the line
    ll <- gsub("^`|`$", "", ll)

E.g.,

> d <- data.frame(check.names=FALSE, y=1/(1:5), `b$a$d`=sin(1:5)+2, `x y z`=cos(1:5)+2)
> Terms <- terms( y ~ log(`b$a$d`) + `x y z` )
> m <- model.frame(Terms, data=d)
> colnames(m)
[1] "y"            "log(`b$a$d`)" "x y z"
> attr(Terms, "term.labels")
[1] "log(`b$a$d`)" "`x y z`"
>   ll <- attr(Terms, "term.labels")
> gsub("^`|`$", "", ll)
[1] "log(`b$a$d`)" "x y z"

It is a bit of a mess.
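
The gsub() pattern above removes only a leading and a trailing backquote, so a label made of two backquoted names, such as "`a b`:`c d`", would be left with backquotes in the middle. One possible alternative, sketched below, is to parse each label and strip the quoting only when the label is a single variable name. The helper name strip_bq is hypothetical and is not part of R or the survival package.

```r
# Hypothetical helper (not part of R or survival): remove backquotes only
# from labels that consist of a single variable name.
strip_bq <- function(labels) {
    vapply(labels, function(lab) {
        expr <- parse(text = lab, keep.source = FALSE)[[1]]
        if (is.name(expr)) as.character(expr) else lab
    }, character(1), USE.NAMES = FALSE)
}

d <- data.frame(check.names = FALSE, y = 1/(1:5),
                `b$a$d` = sin(1:5) + 2, `x y z` = cos(1:5) + 2)
Terms <- terms(y ~ log(`b$a$d`) + `x y z`)
m <- model.frame(Terms, data = d)

ll <- strip_bq(attr(Terms, "term.labels"))
all(ll %in% colnames(m))   # TRUE when every label can index the model frame
```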


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Mar 5, 2018 at 12:55 PM, Therneau, Terry M., Ph.D. via R-devel
<[hidden email]> wrote:

> Is this intended behaviour or a bug?  The issue is that the name of this
> column in the model frame does not have the backticks, while the terms
> structure does have them.
>
> Terry T.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: backquotes and term.labels

R devel mailing list
Thanks to Bill Dunlap for the clarification.  On follow-up it turns out that this will be
an issue for many if not most of the routines in the survival package: a lot of them look
at the terms structure and make use of the dimnames of attr(terms, 'factors'), which also
keeps the unneeded backquotes.  Others use the term.labels attribute.  To dodge this I
will need to create a fixterms() routine which I call at the top of every single routine
in the library.

Is there a chance for a fix at a higher level?

Terry T.
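
A fixterms()-style helper of the kind described above might look roughly like the following. This is a hypothetical sketch and not the survival package's code: it applies the gsub() pattern suggested earlier in the thread to both the term.labels attribute and the dimnames of the factors attribute. Code that later rebuilds a formula from the labels may still need the backquotes, so treat it as illustrative only.

```r
# Hypothetical sketch of a fixterms() helper; the survival package may
# solve the problem quite differently.
fixterms <- function(Terms) {
    unquote <- function(x) gsub("^`|`$", "", x)   # pattern from this thread
    attr(Terms, "term.labels") <- unquote(attr(Terms, "term.labels"))
    fac <- attr(Terms, "factors")
    if (is.matrix(fac)) {   # 'factors' is integer(0) when there are no terms
        dimnames(fac) <- lapply(dimnames(fac), unquote)
        attr(Terms, "factors") <- fac
    }
    Terms
}

d <- data.frame(check.names = FALSE, y = 1/(1:5),
                `b$a$d` = sin(1:5) + 2, `x y z` = cos(1:5) + 2)
form <- y ~ log(`b$a$d`) + `x y z`
m <- model.frame(form, data = d)
Terms <- fixterms(terms(form))
X <- m[attr(Terms, "term.labels")]   # subscripting by label now succeeds
```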



On 03/05/2018 03:55 PM, William Dunlap wrote:

> I believe this has to do with terms() making "term.labels" (hence the dimnames of "factors")
> with deparse(), so that the backquotes are included for non-syntactic names.  The backquotes
> are not in the column names of the input data.frame (nor the model frame), so you get a mismatch
> when subscripting the data.frame or model.frame with elements of terms()$term.labels.

Re: backquotes and term.labels

bbolker
I knew I had seen this before but couldn't previously remember where:
https://github.com/lme4/lme4/issues/441 . I initially fixed it with
gsub(), but (pushed by Martin Maechler to do better) I eventually
fixed it by storing the original names of the model frame (without
backticks) as an attribute for later retrieval:
https://github.com/lme4/lme4/commit/56416fc8b3b5153df7df5547082835c5d5725e89
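
The idea of keeping the model frame's own names for later lookup can be sketched as follows. The attribute name "orig.names" is an assumption made for this illustration and is not necessarily what lme4 uses; see the commit linked above for the actual change.

```r
# Illustrative only: "orig.names" is a made-up attribute name.
d <- data.frame(check.names = FALSE, y = 1/(1:5), `x y z` = cos(1:5) + 2)
m <- model.frame(y ~ `x y z`, data = d)

Terms <- attr(m, "terms")
attr(Terms, "orig.names") <- names(m)   # names exactly as in the model frame

# Later code looks columns up by the stored names instead of term.labels.
# This assumes the formula has a response, which is then the first column.
rhs <- attr(Terms, "orig.names")[-1]
X <- m[rhs]
names(X)
```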


On Wed, Mar 7, 2018 at 8:22 AM, Therneau, Terry M., Ph.D. via R-devel
<[hidden email]> wrote:

> Is there a chance for a fix at a higher level?