# Error in [.terms

2 messages
 Martin,    There are a couple of issues with [.terms that have bitten my survival code.  At the useR conference I promised you a detailed (readable) explanation, and have been lax in getting it to you. The error was first pointed out in a bugzilla note from 2016, by the way.  The current survival code works around these. Consider the following formula: <>= library(survival)  # only to get access to the lung data set test <- Surv(time, status) ~  age + offset(ph.ecog) + strata(inst) tform <- terms(test, specials="strata") mf <- model.frame(tform, data=lung) mterm <- terms(mf) @ The strata term is handled in a special way by coxph, and then needs to be removed from the model formula before calling model.matrix. To do this the code uses essentially the following, which fails for the formula above. <>= strata <- attr(mterm, "specials")\$strata - attr(mterm, "response") X <- model.matrix(mterm[-strata], mf) @ The root problem is the need for multiple subscripts. \begin{itemize}    \item The formula itself has length 5, with ~' as the first element    \item The variables and predvars attributes are call objects, each a list() with 4 elments: the response and all 3 predictors    \item The term.labels attribute omits the resonse and the offset, so has  length 2    \item The factors attribute has 4 rows and 2 columns    \item The dataClasses attribute is a character vector of length 4 \end{itemize} So the ideal result of  mterm[remove the specials] would use subscript of \begin{itemize}    \item [-5] on the formula itself, variables and predvars attributes    \item [-2] for term.labels    \item [-4 , -2, drop=FALSE] for factor attribute    \item [-2] for order attribute    \item [-4] for the dataClasses attribute \end{itemize} That will recreate the formula that would have been'' had there been no strata term.  Now look at the first portion of the code in models.R <<>>= [.terms <- function (termobj, i) {      resp <- if (attr(termobj, "response")) termobj[[2L]]      newformula <- attr(termobj, "term.labels")[i]      if (length(newformula) == 0L) newformula <- "1"      newformula <- reformulate(newformula, resp, attr(termobj, "intercept"), environment(termobj))      result <- terms(newformula, specials = names(attr(termobj, "specials")))      # Edit the optional attributes } @ The use of reformulate() is a nice trick.  However, the index reported in the specials attribute is generated with reference to the variables attribute, or equivalently the row names of the factors attribute, not with respect to the term.labels attribute. For consistency the second line should instead be <<>>= newformula <- row.names(attr(termobj, "factors"))[i] @ Of course, this will break code for anyone else who has used [.terms and, like me, has been adjusting for the `response is counted in specials but not in term.labels'' feature.  R core will have to discuss/decide what is the right thing to do, and I'll adapt. The reformulate trick breaks in another way, one that only appeared on my radar this week via a formula like the following. <>= Surv(time, status) ~ age + (sex=='male') + strata(inst) @ In both the term.labels attribute and the row/col names of the factors attribute the parentheses disappear, and the result of the reformulate call is not a proper formula.  The + binds tighter than == leading to an error message that will confuse most users. We can argue, and I probably would, that the user should have used I(sex=='male').  But they didn't, and without the I() it is a legal formula, or at least one that currently works.  Fixing this issue is a lot harder. An offset term causes issues in the 'Edit the optional attributes' part of the routine as well.  If you and/or R core will tell me what you think the code should do, I'll create a patch.  My vote would be to use rownames per the above and ignore the () edge case. The same basic code appears in drop.terms, by the way. Terry T.         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel