# named arguments in formula and terms


## named arguments in formula and terms

Hi,

we came across the following unexpected (for us) behavior in terms.formula: When determining whether a term is duplicated, only the order of the arguments in function calls seems to be checked but not their names. Thus the terms f(x, a = z) and f(x, b = z) are deemed to be duplicated and one of the terms is dropped.

    R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
    [1] "f(x, a = z)"

However, changing the arguments or the order of arguments keeps both terms:

    R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
    [1] "f(x, a = z)"  "f(x, b = zz)"
    R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
    [1] "f(x, a = z)" "f(b = z, x)"

Is this intended behavior or needed for certain terms?

We came across this problem when setting up certain smooth regressors with different kinds of patterns. As a trivial simplified example we can generate the same kind of problem with rep(). Consider the two dummy variables rep(x = 0:1, each = 4) and rep(x = 0:1, times = 4). With the response y = 1:8 I get:

    R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

    Call:
    lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

    Coefficients:
               (Intercept)  rep(x = 0:1, each = 4)
                       2.5                     4.0

So while the model is identified because the two regressors are not the same, terms.formula does not recognize this and drops the second regressor. What I would have wanted can be obtained by switching the arguments:

    R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

    Call:
    lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

    Coefficients:
                (Intercept)   rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
                          2                        4                        1

Of course, here I could avoid the problem by setting up proper factors etc. But to me this looks like a potential bug in terms.formula...
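The workaround hinted at in the last paragraph (building the regressors before they enter the formula) might look like the following. This is only a minimal sketch: the data frame `d` and the column names `x_each` and `x_times` are made up for illustration and are not part of the original report. Because the formula then contains plain variable names rather than function calls, there are no argument names for terms.formula to overlook.

```r
# Hypothetical names (d, x_each, x_times) chosen for this sketch only.
d <- data.frame(
  y       = 1:8,
  x_each  = rep(x = 0:1, each = 4),   # 0 0 0 0 1 1 1 1
  x_times = rep(x = 0:1, times = 4)   # 0 1 0 1 0 1 0 1
)

# The two dummy variables are genuinely different vectors.
regressors_differ <- !identical(d$x_each, d$x_times)

# With plain variable names both terms appear in the term labels.
labels <- attr(terms(y ~ x_each + x_times), "term.labels")

# Fit the model with both regressors and show the coefficients.
fit <- lm(y ~ x_each + x_times, data = d)
coef(fit)
```

The fitted coefficients should agree with the second lm() call above (intercept 2, slopes 4 and 1), since the same two regressors enter the model.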
Thanks in advance for any insights,
Z

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

## Re: named arguments in formula and terms

Dear Achim,

>>>>> Achim Zeileis <[hidden email]>
>>>>>     on Fri, 10 Mar 2017 15:02:38 +0100 writes:

> Hi, we came across the following unexpected (for us) behavior in terms.formula: When determining whether a term is duplicated, only the order of the arguments in function calls seems to be checked but not their names. Thus the terms f(x, a = z) and f(x, b = z) are deemed to be duplicated and one of the terms is dropped.
>
>     R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
>     [1] "f(x, a = z)"
>
> However, changing the arguments or the order of arguments keeps both terms:
>
>     R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
>     [1] "f(x, a = z)"  "f(x, b = zz)"
>     R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
>     [1] "f(x, a = z)" "f(b = z, x)"
>
> Is this intended behavior or needed for certain terms?
>
> We came across this problem when setting up certain smooth regressors with different kinds of patterns. As a trivial simplified example we can generate the same kind of problem with rep(). Consider the two dummy variables rep(x = 0:1, each = 4) and rep(x = 0:1, times = 4). With the response y = 1:8 I get:
>
>     R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))
>
>     Call:
>     lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))
>
>     Coefficients:
>                (Intercept)  rep(x = 0:1, each = 4)
>                        2.5                     4.0
>
> So while the model is identified because the two regressors are not the same, terms.formula does not recognize this and drops the second regressor. What I would have wanted can be obtained by switching the arguments:
>
>     R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))
>
>     Call:
>     lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))
>
>     Coefficients:
>                 (Intercept)   rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
>                           2                        4                        1
>
> Of course, here I could avoid the problem by setting up proper factors etc. But to me this looks like a potential bug in terms.formula...

I agree that there is a bug. According to https://www.r-project.org/bugs.html I have generated an R bugzilla account for you so you can report it there (for "book keeping", posterity, etc).

> Thanks in advance for any insights, Z

and thank *you* (and Nikolaus?) for the report!

Best regards,
Martin