named arguments in formula and terms

named arguments in formula and terms

Achim Zeileis-2
Hi, we came across the following unexpected (for us) behavior in
terms.formula: When determining whether a term is duplicated, only the
order of the arguments in function calls seems to be checked but not their
names. Thus the terms f(x, a = z) and f(x, b = z) are deemed to be
duplicated and one of the terms is thus dropped.

R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
[1] "f(x, a = z)"

However, changing the arguments or the order of arguments keeps both
terms:

R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
[1] "f(x, a = z)"  "f(x, b = zz)"
R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
[1] "f(x, a = z)" "f(b = z, x)"

Is this intended behavior or needed for certain terms?
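
For anyone wanting to check this on their own installation, here is a minimal sketch. The helper n_terms() is a hypothetical name introduced only for this illustration; it counts the term labels that terms() retains. Only the last call depends on how duplicates are detected, so no result is stated for it.

```r
# Hypothetical helper: number of term labels that terms() keeps for a formula.
n_terms <- function(fml) length(attr(terms(fml), "term.labels"))

# Deparsed, the two calls are different strings, because the argument
# names are part of the call.
t1 <- deparse(quote(f(x, a = z)))
t2 <- deparse(quote(f(x, b = z)))
t1 == t2

# Terms that differ in an argument value are kept as separate terms.
n_terms(y ~ f(x, a = z) + f(x, b = zz))

# Terms that differ only in an argument name: a result of 1 means that
# one of the two terms was dropped as a duplicate.
n_terms(y ~ f(x, a = z) + f(x, b = z))
```

If the last call returns 1 while the deparsed labels differ, the installation shows the behavior described above.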

We came across this problem when setting up certain smooth regressors with
different kinds of patterns. As a trivial simplified example we can
generate the same kind of problem with rep(). Consider the two dummy
variables rep(x = 0:1, each = 4) and rep(x = 0:1, times = 4). With the
response y = 1:8 I get:

R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

Call:
lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

Coefficients:
            (Intercept)  rep(x = 0:1, each = 4)
                    2.5                     4.0

So while the model is identified because the two regressors are not the
same, terms.formula does not recognize this and drops the second regressor.
What I would have wanted can be obtained by switching the arguments:

R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

Call:
lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

Coefficients:
             (Intercept)   rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
                       2                        4                        1

Of course, here I could avoid the problem by setting up proper factors
etc. But to me this looks like a potential bug in terms.formula...
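
A minimal sketch of that kind of workaround, assuming the regressors are computed before the formula is built so that each term is a plain variable name. The data frame d and the column names x_each and x_times are illustrative and not part of the original example.

```r
# Hypothetical data frame with precomputed regressors (names are illustrative).
d <- data.frame(
  y       = 1:8,
  x_each  = rep(0:1, each = 4),   # 0 0 0 0 1 1 1 1
  x_times = rep(0:1, times = 4)   # 0 1 0 1 0 1 0 1
)

# Both terms are now distinct symbols, so neither can be mistaken
# for a duplicate of the other.
fit <- lm(y ~ x_each + x_times, data = d)
coef(fit)
```

The coefficients should agree with those from the call with switched arguments above (2, 4 and 1).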

Thanks in advance for any insights,
Z

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: named arguments in formula and terms

Martin Maechler
Dear Achim,

>>>>> Achim Zeileis <[hidden email]>
>>>>>     on Fri, 10 Mar 2017 15:02:38 +0100 writes:

    > Of course, here I could avoid the problem by setting up
    > proper factors etc. But to me this looks like a potential bug
    > in terms.formula...

I agree that there is a bug.
According to https://www.r-project.org/bugs.html
I have generated an R bugzilla account for you so you can report
it there (for "book keeping", posterity, etc.).

    > Thanks in advance for any insights, Z

and thank *you* (and Nikolaus ?) for the report!

Best regards,
Martin


Re: named arguments in formula and terms

Achim Zeileis-4
Martin, thanks for the follow-up!

On Mon, 13 Mar 2017, Martin Maechler wrote:

> Dear Achim,
>
>>>>>> Achim Zeileis <[hidden email]>
>>>>>>     on Fri, 10 Mar 2017 15:02:38 +0100 writes:
>
>    > Of course, here I could avoid the problem by setting up
>    > proper factors etc. But to me this looks like a potential bug
>    > in terms.formula...
>
> I agree that there is a bug.

OK, good. I just wasn't sure whether I had missed some documentation
somewhere that this is intended behavior.

> According to https://www.r-project.org/bugs.html
> I have generated an R bugzilla account for you so you can report
> it there (for "book keeping", posterity, etc.).

Thanks, I had already looked at that but waited for feedback on this list
first.

>    > Thanks in advance for any insights, Z
>
> and thank *you* (and Nikolaus ?) for the report!

No problem. Niki found the problem and I came up with the simplified
example. In any case, I just posted a slightly modified version of my
e-mail as #17235 on Bugzilla:

https://bugs.R-project.org/bugzilla/show_bug.cgi?id=17235

Thanks & best wishes,
Z


> Best regards,
> Martin
>
>
