Give update.formula() an option not to simplify or reorder the result -- request for comments

Pavel Krivitsky
Dear All,

Martin Maechler has asked me to send this to R-devel for discussion
after I submitted it as an enhancement request (
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17563).

At this time, the update.formula() method always performs a number of
transformations on the results, eliminating redundant variables and
reordering interactions to be after the main effects. This is not
always the desired behaviour, because formulas are increasingly used
for purposes other than specifying linear models.

The proposal is to add an option simplify= (defaulting to TRUE,
for backwards compatibility) that, if FALSE, will skip the simplification
step.

That is,

> update(a~b:c+b, .~.+b) # default: simplify=TRUE
a ~ b + b:c
> update(a~b:c+b, .~.+b, simplify=FALSE) # results are a mock-up
a ~ b:c + b + b

From what I can tell, this can be accomplished by skipping the second
line of the implementation of update.formula() ("out <-
formula(terms.formula(tmp, simplify = TRUE))").
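
A minimal sketch of what such a switch could look like, written in plain R so it can be tried without patching stats. The function name update_formula_sketch() is hypothetical, and the "." expansion is done here with substitute() rather than with the internal C routine that update.formula() actually calls, so this only approximates the real behaviour (it handles a two-sided old formula only).

```r
# Hypothetical helper, not part of the stats package: a pure-R stand-in for
# update.formula() with the proposed `simplify` switch.
update_formula_sketch <- function(old, new, simplify = TRUE) {
  old <- as.formula(old)
  new <- as.formula(new)
  stopifnot(length(old) == 3L)  # this sketch only handles a two-sided `old`

  # Replace every `.` in `expr` by `value`, working on the parse tree.
  expand_dot <- function(expr, value) {
    do.call("substitute", list(expr, list(. = value)))
  }

  # A one-sided `new` (e.g. ~ . + x) keeps the old response.
  new_lhs <- if (length(new) == 3L) new[[2L]] else quote(.)
  new_rhs <- new[[length(new)]]

  lhs <- expand_dot(new_lhs, old[[2L]])
  rhs <- expand_dot(new_rhs, old[[3L]])

  # Evaluating the `~` call builds a formula in the environment of `old`.
  out <- eval(call("~", lhs, rhs), environment(old))

  # The simplification step is the part the proposal makes optional.
  if (simplify) formula(terms.formula(out, simplify = TRUE)) else out
}

f_default <- update_formula_sketch(a ~ b:c + b, . ~ . + b)
f_raw     <- update_formula_sketch(a ~ b:c + b, . ~ . + b, simplify = FALSE)
print(f_default)
print(f_raw)
```

With simplify = FALSE the duplicated term and the original ordering are kept, which is the behaviour the mock-up above asks for.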

Any thoughts? One particular question that Martin raised is whether the
UI should be just a single logical argument, or something else.

                        Best Regards,
                        Pavel

--
Pavel Krivitsky
Lecturer in Statistics
National Institute of Applied Statistics Research Australia (NIASRA)
School of Mathematics and Applied Statistics | Building 39C Room 154
University of Wollongong NSW 2522 Australia
T +61 2 4221 3713
Web (NIASRA): http://niasra.uow.edu.au/index.html
Web (Personal): http://www.krivitsky.net/research
ORCID: 0000-0002-9101-3362

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Give update.formula() an option not to simplify or reorder the result -- request for comments

Abby Spurdle
Hi Pavel
(Back On List)

And my two cents...

> At this time, the update.formula() method always performs a number of
> transformations on the results, eliminating redundant variables and
> reordering interactions to be after the main effects.
> This the proposal is to add an option simplify= (defaulting to TRUE,
> for backwards compatibility) that if FALSE will skip the simplification
> step.
> Any thoughts? One particular question that Martin raised is whether the
> UI should be just a single logical argument, or something else.

Firstly, note that the constructor for formula objects behaves differently
to the update method, so I think any changes should be consistent between
the two functions.
> #constructor - doesn't simplify
> y ~ x + x
y ~ x + x
> #update method - does simplify
> update (y ~ x, ~. + x)
y ~ x

Interestingly, this doesn't simplify.
> update (y ~ I (x), ~. + x)
y ~ I(x) + x

I think that simplification could mean different things.
So, there could be something like:
> update (y ~ x, ~. + x, strip=FALSE)
y ~ I (2 * x)

I don't know how easy that would be to implement.
(Symbolic computation on par with computer algebra systems is a discussion
in itself...).
And you could have one argument (say, method="simplify") rather than two or
more logical arguments.

It would also be possible to allow partial forms of simplification by
specifying which terms should be collapsed; however, I doubt that any
usefulness of this would justify the complexity.
However, feel free to disagree.

You made an interesting comment.

> This is not
> always the desired behavior, because formulas are increasingly used
> for purposes other than specifying linear models.

Can I ask what these purposes are?


kind regards
Abs

Re: Give update.formula() an option not to simplify or reorder the result -- request for comments

Danny Smith
Hi Abs,

Re: your last point:

> You made an interesting comment.
>
> > This is not
> > always the desired behavior, because formulas are increasingly used
> > for purposes other than specifying linear models.
>
> Can I ask what these purposes are?

Not sure how relevant these are/what Pavel was referring to specifically,
but there are a few alternative uses that I'm familiar with in the
tidyverse packages.

Since formulas store both an expression and an environment, they're really
useful for complex evaluation. rlang's "quosures" are a subclass of formula
<https://adv-r.hadley.nz/evaluation.html#quosure-impl>.
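
A small base-R sketch of the "expression plus environment" point, with no tidyverse involved. The names make_formula and scale are made up for the illustration.

```r
# A formula remembers the environment it was created in, so the expression
# it carries can be evaluated later with access to variables from there.
make_formula <- function() {
  scale <- 10      # local variable, reachable only via the formula's environment
  ~ x * scale      # one-sided formula, returned unevaluated
}

f    <- make_formula()
expr <- f[[2L]]          # the stored expression: x * scale
env  <- environment(f)   # the environment captured at creation time

# `x` comes from the supplied list, `scale` from the captured environment.
result <- eval(expr, list(x = 1:3), env)
print(result)
```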

Otherwise, the main tidyverse use is as a shorthand for specifying anonymous
functions (this is used extensively, particularly in purrr). From
?dplyr::mutate_at:
# You can also pass formulas to create functions on the spot, purrr-style:
starwars %>% mutate_at(c("height", "mass"), ~scale2(., na.rm = TRUE))

Also see ?dplyr::case_when:
x <- 1:50
case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
  TRUE ~ as.character(x)
)

And in base R, formulas are used in the plotting functions, e.g.:
## boxplot on a formula:
boxplot(count ~ spray, data = InsectSprays, col = "lightgray")

Cheers,
Danny

Re: Give update.formula() an option not to simplify or reorder the result -- request for comments

Abby Spurdle
In reply to this post by Pavel Krivitsky
> Martin Maechler has asked me to send this to R-devel for discussion
> after I submitted it as an enhancement request (
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17563).

I think R needs to provide more support for CAS-style symbolic computation.
That is, support by either the R language itself or the standard packages,
or both.
(And certainly not by interfacing with another interpreted language).

Obviously, I don't speak for R Core.
However, this is how I would like to see R move in the future.
...improved symbolic and symbolic-numeric computation...

I think any changes to formula objects or their methods should be
congruent with these symbolic improvements.

Re: Give update.formula() an option not to simplify or reorder the result -- request for comments

Thomas Mailund
With a bit of metaprogramming that manipulates expressions, I don't think this would be difficult to implement in a package. Well, as difficult as it is to implement a CAS, but not harder. I wrote some code for symbolic differentiation (I don't remember where I put it) and that was easy. But that is because differentiation is just a handful of rules and then the chain rule. I don't have the skills for handling more complex symbolic manipulation, but anyone who could add it to the language could also easily add it as a package, I think.
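
For a sense of how far "a handful of rules and then the chain rule" goes: the stats package already ships D(), which differentiates simple expressions symbolically. The snippet below is only an illustration using that existing function, not the code referred to above.

```r
# Symbolic differentiation with stats::D(), which rewrites the expression
# tree using the usual rules (power rule, product rule, chain rule).
d_square  <- D(quote(x^2), "x")
d_product <- D(quote(sin(x) * x), "x")

print(d_square)
print(d_product)

# The results are ordinary R expressions and can be evaluated numerically.
slope_at_3 <- eval(d_square, list(x = 3))
print(slope_at_3)
```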

Whether in a standard package or not, I have no preference whatsoever.

Cheers
Thomas



On 25 May 2019 at 00.59.44, Abby Spurdle ([hidden email]) wrote:

> > Martin Maechler has asked me to send this to R-devel for discussion
> > after I submitted it as an enhancement request (
> > https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17563).
>
> I think R needs to provide more support for CAS-style symbolic computation.
> That is, support by either the R language itself or the standard packages,
> or both.
> (And certainly not by interfacing with another interpreted language).
>
> Obviously, I don't speak for R Core.
> However, this is how I would like to see R move in the future.
> ...improved symbolic and symbolic-numeric computation...
>
> I think any changes to formula objects or their methods, should be
> congruent with these symbolic improvements.

Re: Give update.formula() an option not to simplify or reorder the result -- request for comments

Martin Maechler
In reply to this post by Abby Spurdle
Trying to revive, possibly conclude a forgotten thread ...

>>>>> Abby Spurdle
>>>>>     on Mon, 20 May 2019 14:11:47 +1200 writes:

    > Hi Pavel
    > (Back On List)

    > And my two cents...

    >> At this time, the update.formula() method always performs a number of
    >> transformations on the results, eliminating redundant variables and
    >> reordering interactions to be after the main effects.
    >> This the proposal is to add an option simplify= (defaulting to TRUE,
    >> for backwards compatibility) that if FALSE will skip the simplification
    >> step.
    >> Any thoughts? One particular question that Martin raised is whether the
    >> UI should be just a single logical argument, or something else.

    > Firstly, note that the constructor for formula objects behaves differently
    > to the update method, so I think any changes should be consistent between
    > the two functions.

Not so easily: the ` ~ ` constructor cannot sensibly (in my
opinion) take optional arguments,
whereas Pavel was suggesting a new *optional* argument to update.formula().

    >> #constructor - doesn't simplify
    >> y ~ x + x
    > y ~ x + x
    >> #update method - does simplify
    >> update (y ~ x, ~. + x)
    > y ~ x

    > Interestingly, this doesn't simplify.
    >> update (y ~ I (x), ~. + x)
    > y ~ I(x) + x

well, I hope so:
The whole point of  I(.) is to  *not* be identical (but close) to its argument.

    > I think that simplification could mean different things.

Good point; I tend to agree with the above

(whereas I'm less happy with this example):

    > So, there could be something like:
    >> update (y ~ x, ~. + x, strip=FALSE)
    > y ~ I (2 * x)

    > I don't know how easy that would be to implement.
    > (Symbolic computation on par with computer algebra systems is a discussion
    > in itself...).
    > And you could have one argument (say, method="simplify") rather than two or
    > more logical arguments.

Yes exactly; given what we've heard till now, I'd also go for a
new argument (possibly 'method') which should be a string
(and keep the current behavior as default), ideally here with a
match.arg() setup.
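
A rough sketch of how such a string argument with match.arg() might look. Everything here is an assumption for illustration: the function name update_formula_method(), the argument name 'method', and its two values "simplify" and "none" are not an agreed interface, and the "." expansion uses substitute() instead of the internal C code, for a two-sided old formula only.

```r
# Hypothetical interface: a string-valued `method` argument validated by
# match.arg(), with the current behaviour ("simplify") as the default.
update_formula_method <- function(old, new, method = c("simplify", "none")) {
  method <- match.arg(method)   # errors on anything not listed above
  old <- as.formula(old)
  new <- as.formula(new)
  stopifnot(length(old) == 3L)  # sketch: two-sided `old` only

  # Replace `.` in the new formula by the matching side of the old one.
  expand_dot <- function(expr, value) {
    do.call("substitute", list(expr, list(. = value)))
  }
  new_lhs <- if (length(new) == 3L) new[[2L]] else quote(.)
  lhs <- expand_dot(new_lhs, old[[2L]])
  rhs <- expand_dot(new[[length(new)]], old[[3L]])
  out <- eval(call("~", lhs, rhs), environment(old))

  switch(method,
         simplify = formula(terms.formula(out, simplify = TRUE)),
         none     = out)
}

print(update_formula_method(y ~ x, ~ . + x))                   # current default
print(update_formula_method(y ~ x, ~ . + x, method = "none"))  # no simplification
```

A string argument leaves room for further values later (for instance some algebraic variant) without adding more logical flags.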

    > It would also be possible to allow partial forms of simplification, by
    > specifying which terms should be collapsed, however, I doubt any possible
    > usefulness of this, would justify the complexity.
    > However, feel free to disagree.

    > You made an interesting comment.

    >> This is not
    >> always the desired behavior, because formulas are increasingly used
    >> for purposes other than specifying linear models.

    > Can I ask what these purposes are?

    > kind regards
    > Abs
