should base R have a piping operator ?


Re: should base R have a piping operator ?

Lionel Henry
Hi Gabe,

> There is another way the pipe could go into base R that could not be
> done in package space and has the potential to mitigate some pretty
> serious downsides to the pipes relating to debugging

I assume you're thinking about the large stack trace of the magrittr
pipe? You don't need a parser transformation to solve this problem
though, the pipe could be implemented as a regular function with a
very limited impact on the stack. And if implemented as a SPECIALSXP,
it would be completely invisible. We've been planning to rewrite %>%
to fix the performance and the stack print, it's just low priority.
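
As a rough sketch (this is not magrittr's implementation, and `%|>%` is just
a placeholder name for illustration), such a function-based pipe could look
like this:

```
## `lhs %|>% f(args)` is rebuilt as `f(lhs, args)` and evaluated in the
## caller's environment.
`%|>%` <- function(lhs, rhs) {
  rhs <- substitute(rhs)
  stopifnot(is.call(rhs))  # the right-hand side must be a call, e.g. head(2)
  call <- as.call(c(list(rhs[[1L]], substitute(lhs)), as.list(rhs)[-1L]))
  eval(call, envir = parent.frame())
}

mtcars %|>% head(2) %|>% nrow()
#> [1] 2
```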

About the semantics of local evaluation that were proposed in this
thread, I think that wouldn't be right. A native pipe should be
consistent with other control flow constructs like `if` and `for` and
evaluate in the current environment. In that case, the `.` binding, if
any, would be restored to its original value in `on.exit()` (or through
unwind-protection if implemented in C).
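
Purely as a sketch of those semantics (again with a made-up operator name,
not a concrete proposal), that could look like:

```
## `%.>%` binds `.` in the calling environment, evaluates the right-hand
## side there, and restores (or removes) the previous `.` on exit.
`%.>%` <- function(lhs, rhs) {
  env <- parent.frame()
  had_dot <- exists(".", envir = env, inherits = FALSE)
  if (had_dot) old_dot <- get(".", envir = env, inherits = FALSE)
  assign(".", lhs, envir = env)
  on.exit(
    if (had_dot) assign(".", old_dot, envir = env)
    else rm(".", envir = env)
  )
  eval(substitute(rhs), env)
}

letters %.>% toupper(.) %.>% head(., 3)
#> [1] "A" "B" "C"
```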

Best,
Lionel


> On 6 Oct 2019, at 01:50, Gabriel Becker <[hidden email]> wrote:
>
> Hi all,
>
> I think there's some nuance here that makes me agree partially with
> each "side".
>
> The pipe is inarguably extremely popular. Many probably think of it as a
> core feature of R, along with the tidyverse that (as was pointed out)
> largely surrounds it and drives its popularity. Whether it's a good or bad
> thing that they think that doesn't change the fact that, by my estimation,
> Ant is correct that they do. BUT, I don't agree with him that that, by
> itself, is a reason to put it in base R in the form that it exists now. For
> the current form, there aren't really any major downsides that I see to
> having people just use the package version.
>
> Sure, it may be a little weird, but it doesn't ever really stop
> people from using it or present a significant barrier. Another major point
> is that many (most?) base R functions are not necessarily tooled to be
> endomorphic, which in my personal opinion is *largely* the only place that
> the pipes are really compelling.
>
> That was for pipes as they exist in package space, though. There is another
> way the pipe could go into base R that could not be done in package space
> and has the potential to mitigate some pretty serious downsides to the
> pipes relating to debugging, which would be to implement them in the parser.
>
> If
>
> iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
> filter(mean_sl > 5)
>
>
> were *parsed* as, for example, into
>
> local({
>            . = group_by(iris, Species)
>
>            . = summarize(., mean_sl = mean(Sepal.Length))
>
>            filter(., mean_sl > 5)
>       })
>
>
>
>
> Then debugging (once you knew that) would be much easier but behavior
> would be the same as it is now. There could even be some sort of
> step-through-pipe debugger at that point added as well for additional
> convenience.
>
> There is some minor precedent for that type of transformative parsing:
>
>> expr = parse(text = "5 -> x")
>
>> expr
>
> expression(5 -> x)
>
>> expr[[1]]
>
> x <- 5
>
>
> Though that's a much more minor transformation.
>
> All of that said, I believe Jim Hester (cc'ed) suggested something along
> these lines at the RSummit a couple of years ago, and thus far R-core has
> not shown much appetite for changing things in the parser.
>
> Without that changing, I'd have to say that my vote, for whatever it's
> worth, comes down on the side of pipes being fine in packages. A summary of
> my reasoning being that it only makes sense for them to go into R itself if
> doing so fixes an issue that can't be fixed with them in package space.
>
> Best,
> ~G
>
>
>
> On Sun, Oct 6, 2019 at 5:26 AM Ant F <[hidden email]> wrote:
>
>> Yes, but this exaggeration precisely misses the point.
>>
>> Concerning your examples:
>>
>> * I love fread, but I think it makes a lot of subjective choices that are
>> best associated with a package. It has changed a lot over time and can
>> still change, and we have great developers willing to maintain it and be
>> responsive to feature requests and bug reports.
>>
>> * group_by() adds a class that works only (or mostly) with tidyverse verbs,
>> so it's easy to dismiss as a candidate for inclusion in base R.
>>
>> * summarize is an alternative to aggregate; it would be very confusing to
>> have both.
>>
>> Now, to be fair to your argument, we could think of other functions, such as
>> data.table::rleid(), which I believe base R sorely misses,
>> and there is nothing wrong with packaged functions making their way into base
>> R.
>>
>> Maybe there's an existing list of criteria for inclusion in base R, but if
>> not I can make one up for the sake of this discussion :) :
>> * 1) the functionality should not already exist
>> * 2) the function should be general enough
>> * 3) the function should have a large number of potential users
>> * 4) the function should be robust, and not require extensive maintenance
>> * 5) the function should be stable; we shouldn't expect new features every 2
>> months
>> * 6) the function should have an intuitive interface in the context of the
>> rest of base R
>>
>> I guess 1 and 6 could be held against my proposal, because:
>> (1) everything can be done without pipes
>> (6) they are somewhat surprising (though with explicit dots not that much,
>> and not more surprising than, say, `bquote()`)
>>
>> In my opinion the pros outweigh the cons.
>>
>> I wouldn't advise taking magrittr's pipe (provided the license allows it),
>> for instance, because it makes a lot of design choices and has complex
>> behavior; what I propose is two lines of code very unlikely to evolve or
>> require maintenance.
>>
>> Antoine
>>
>> PS: I only receive the digest once a day, so if you don't "reply all" I can
>> only react later.
>>
>> On Sat, Oct 5, 2019 at 7:54 PM, Hugh Marera <[hidden email]> wrote:
>>
>>> I exaggerated the comparison for effect. However, it is not very
>> difficult
>>> to find functions in dplyr or data.table or indeed other packages that
>> one
>>> may wish to be in base R. Examples, for me, could include
>>> data.table::fread, dplyr::group_by & dplyr::summari[sZ]e combo, etc.
>> Also,
>>> the "popularity" of magrittr::`%>%` is mostly attributable to the
>> tidyverse
>>> (an advanced superset of R). Many R users don't even know that they are
>>> installing the magrittr package.
>>>
>>> On Sat, Oct 5, 2019 at 6:30 PM Iñaki Ucar <[hidden email]>
>> wrote:
>>>
>>>> On Sat, 5 Oct 2019 at 17:15, Hugh Marera <[hidden email]> wrote:
>>>>>
>>>>> How is your argument different to, say,  "Should dplyr or data.table
>> be
>>>>> part of base R as they are the most popular data science packages and
>>>> they
>>>>> are used by a large number of users?"
>>>>
>>>> Two packages with many features, dozens of functions and under heavy
>>>> development to fix bugs, add new features and improve performance, vs.
>>>> a single operator with a limited and well-defined functionality, and a
>>>> reference implementation that hasn't changed in years (but certainly
>>>> hackish in a way that probably could only be improved from R itself).
>>>>
>>>> Can't you really spot the difference?
>>>>
>>>> Iñaki
>>>>
>>>


Re: should base R have a piping operator ?

Duncan Murdoch
On 07/10/2019 4:22 a.m., Lionel Henry wrote:

> Hi Gabe,
>
>> There is another way the pipe could go into base R that could not be
>> done in package space and has the potential to mitigate some pretty
>> serious downsides to the pipes relating to debugging
>
> I assume you're thinking about the large stack trace of the magrittr
> pipe? You don't need a parser transformation to solve this problem
> though, the pipe could be implemented as a regular function with a
> very limited impact on the stack. And if implemented as a SPECIALSXP,
> it would be completely invisible. We've been planning to rewrite %>%
> to fix the performance and the stack print, it's just low priority.

I don't know what Gabe had in mind, but the downside to pipes that I see
is that they are single statements.  I'd like the debugger to be able to
single step through one stage at a time.  I'd like to be able to set a
breakpoint on line 3 in

   a %>%
   b %>%
   c %>%
   d

and be able to examine the intermediate result of evaluating b before
piping it into c.  (Or maybe that's off by one:  maybe I'd prefer to
examine the inputs to d if I put a breakpoint there.  I'd have to try it
to find out which feels more natural.)
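
For comparison, the closest thing available today is to break on one of the
stage functions rather than on a line of the pipeline, e.g. (a rough
illustration, assuming magrittr is loaded):

```
library(magrittr)

## Stop when head() is called and inspect the value piped into it,
## which arrives as head()'s first argument:
debugonce(head)
mtcars %>% subset(cyl == 4) %>% head(2)
```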


> About the semantics of local evaluation that were proposed in this
> thread, I think that wouldn't be right. A native pipe should be
> consistent with other control flow constructs like `if` and `for` and
> evaluate in the current environment. In that case, the `.` binding, if
> any, would be restored to its original value in `on.exit()` (or through
> unwind-protection if implemented in C).

That makes sense.

Duncan Murdoch



Re: should base R have a piping operator ?

Lionel Henry
>
> On 7 Oct 2019, at 13:47, Duncan Murdoch <[hidden email]> wrote:
>
> On 07/10/2019 4:22 a.m., Lionel Henry wrote:
>> Hi Gabe,
>>> There is another way the pipe could go into base R that could not be
>>> done in package space and has the potential to mitigate some pretty
>>> serious downsides to the pipes relating to debugging
>> I assume you're thinking about the large stack trace of the magrittr
>> pipe? You don't need a parser transformation to solve this problem
>> though, the pipe could be implemented as a regular function with a
>> very limited impact on the stack. And if implemented as a SPECIALSXP,
>> it would be completely invisible. We've been planning to rewrite %>%
>> to fix the performance and the stack print, it's just low priority.
>
> I don't know what Gabe had in mind, but the downside to pipes that I see is that they are single statements.  I'd like the debugger to be able to single step through one stage at a time.  I'd like to be able to set a breakpoint on line 3 in
>
>  a %>%
>  b %>%
>  c %>%
>  d
>
> and be able to examine the intermediate result of evaluating b before piping it into c.  (Or maybe that's off by one:  maybe I'd prefer to examine the inputs to d if I put a breakpoint there.  I'd have to try it to find out which feels more natural.)

In order to place a breakpoint on line 3, I think you'll need to wrap
`c()` in curly braces and insert a `browser()` call. And at that point
you're changing the semantics of `c()` and you'll need to manually
write the placeholder for the input:

a() |>
  b() |>
  { browser(); c(.) } |>
  d()

I don't see any way around this. I guess it could be done behind the
scenes by the IDE when a breakpoint is set though. Note that this
doesn't require any changes to the parser and already works with the
magrittr pipe.

Then there's the issue of continuing to step-debug through the
pipeline. This could be achieved by parsing `a |> b()` as `{a} |>
{b()}`, so that each sub-expression carries source references. In
general, there are metaprogramming patterns that would be made easier
if calls to `function` or `if` always had a body wrapped in `{`. It is
too late to change historical operators but maybe it makes sense for
newer ones?
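
A quick illustration of why the braces matter (just an example, not part of
any proposal): a braced expression parsed with source references keeps one
srcref per sub-expression, which is exactly what line-level stepping needs.

```
e <- parse(text = "{\n  a\n  b()\n}", keep.source = TRUE)[[1]]
class(e)
#> [1] "{"
attr(e, "srcref")
#> a list of srcrefs: one for `{`, one for `a`, one for `b()`
```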

Lionel



Re: should base R have a piping operator ?

Duncan Murdoch
On 07/10/2019 8:38 a.m., Lionel Henry wrote:

>>
>> On 7 Oct 2019, at 13:47, Duncan Murdoch <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> On 07/10/2019 4:22 a.m., Lionel Henry wrote:
>>> Hi Gabe,
>>>> There is another way the pipe could go into base R that could not be
>>>> done in package space and has the potential to mitigate some pretty
>>>> serious downsides to the pipes relating to debugging
>>> I assume you're thinking about the large stack trace of the magrittr
>>> pipe? You don't need a parser transformation to solve this problem
>>> though, the pipe could be implemented as a regular function with a
>>> very limited impact on the stack. And if implemented as a SPECIALSXP,
>>> it would be completely invisible. We've been planning to rewrite %>%
>>> to fix the performance and the stack print, it's just low priority.
>>
>> I don't know what Gabe had in mind, but the downside to pipes that I
>> see is that they are single statements.  I'd like the debugger to be
>> able to single step through one stage at a time.  I'd like to be able
>> to set a breakpoint on line 3 in
>>
>>  a %>%
>>  b %>%
>>  c %>%
>>  d
>>
>> and be able to examine the intermediate result of evaluating b before
>> piping it into c.  (Or maybe that's off by one:  maybe I'd prefer to
>> examine the inputs to d if I put a breakpoint there.  I'd have to try
>> it to find out which feels more natural.)
>
> In order to place a breakpoint on line 3, I think you'll need to wrap
> `c()` in curly braces and insert a `browser()` call. And at that point
> you're changing the semantics of `c()` and you'll need to manually
> write the placeholder for the input:
>
> a() |>
>    b() |>
>    { browser(); c(.) } |>
>    d()
>
> I don't see any way around this. I guess it could be done behind the
> scenes by the IDE when a breakpoint is set though. Note that this
> doesn't require any changes to the parser and already works with the
> magrittr pipe.

Yes, I was hoping this would happen behind the scenes.  I agree that the
parser doesn't need to be changed, but the IDE would need to break up
the statement into 3 or more equivalent statements for this to work with
no changes to core R.  I think that could be done after parsing at
run-time, as described in my earlier message.
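
For instance, the stages of a captured pipeline call can be pulled apart at
run time along these lines (a rough sketch only; a real tool would also need
to keep track of source references):

```
## Turn `a %>% b() %>% c() %>% d()` into a list of stage expressions.
split_pipeline <- function(expr) {
  expr <- substitute(expr)
  stages <- list()
  while (is.call(expr) && identical(expr[[1L]], as.name("%>%"))) {
    stages <- c(list(expr[[3L]]), stages)  # right-hand side of this stage
    expr <- expr[[2L]]                     # keep walking down the left side
  }
  c(list(expr), stages)
}

split_pipeline(a %>% b() %>% c() %>% d())
#> a list of the four stage expressions: a, b(), c(), d()
```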

Duncan Murdoch

P.S.  Were you just using |> to save typing, or is there a proposal to
add a new operator to the language?  That would need parser changes.


>
> Then there's the issue of continuing to step-debug through the
> pipeline. This could be achieved by parsing `a |> b()` as `{a} |>
> {b()}`. so that each sub-expression carries source references. In
> general, there are metaprogramming patterns that would be made easier
> if calls to `function` or `if` always had a body wrapped in `{`. It is
> too late to change historical operators but maybe it makes sense for
> newer ones?
>
> Lionel
>


Re: should base R have a piping operator ?

Lionel Henry

> On 7 Oct 2019, at 15:36, Duncan Murdoch <[hidden email]> wrote:
>
>  I think that could be done after parsing at run-time, as described in my earlier message.

Good point.


> P.S.  Were you just using |> to save typing, or is there a proposal to add a new operator to the language?  That would need parser changes.

Just a hypothetical native pipe for which the parser would automatically
wrap the arguments in srcref-carrying braces. Then we get step-debugging
of pipelines in all editors.


Best,
Lionel

Re: [External] Re: should base R have a piping operator ?

Tierney, Luke
In reply to this post by Lionel Henry
On Mon, 7 Oct 2019, Lionel Henry wrote:

> Hi Gabe,
>
>> There is another way the pipe could go into base R that could not be
>> done in package space and has the potential to mitigate some pretty
>> serious downsides to the pipes relating to debugging
>
> I assume you're thinking about the large stack trace of the magrittr
> pipe? You don't need a parser transformation to solve this problem
> though, the pipe could be implemented as a regular function with a
> very limited impact on the stack. And if implemented as a SPECIALSXP,
> it would be completely invisible. We've been planning to rewrite %>%
> to fix the performance and the stack print, it's just low priority.
>
> About the semantics of local evaluation that were proposed in this
> thread, I think that wouldn't be right. A native pipe should be
> consistent with other control flow constructs like `if` and `for` and
> evaluate in the current environment. In that case, the `.` binding, if
> any, would be restored to its original value in `on.exit()` (or through
> unwind-protection if implemented in C).
>

Sorry to be blunt but adding/removing a variable from a caller's
environment is a horrible design idea. Think about what happens if an
argument in a pipe stage contains a pipe. (Not completely
unreasonable, e.g. for a left_join). We already have such a design
lurking in (at least) one place in base code and it keeps biting. It's
pretty high on my list to be expunged.

If a variable is to be used it needs to be in its own
scope/environment.  There is another option, which is to rewrite the
pipe as a nested call and evaluate that in the parent frame. Not likely
to be much worse for debugging and might even be better.  Some
tinkering with these ideas is at

https://gitlab.com/luke-tierney/pipes
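
A rough sketch of the nested-call idea, for readers following along
(illustrative only, not the code at the link above):

```
## Fold `x %>% f(y) %>% g(z)` into `g(f(x, y), z)` and evaluate that single
## nested call in the caller's frame.
pipe_nested <- function(expr) {
  expr <- substitute(expr)
  fold <- function(e) {
    if (is.call(e) && identical(e[[1L]], as.name("%>%"))) {
      lhs <- fold(e[[2L]])
      rhs <- e[[3L]]
      as.call(c(list(rhs[[1L]], lhs), as.list(rhs)[-1L]))
    } else {
      e
    }
  }
  eval(fold(expr), envir = parent.frame())
}

pipe_nested(1:10 %>% head(3) %>% sum())
#> [1] 6
```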

All that said, there is nothing that can be done with pipes that can't
be done without them. They may be the most visible aspect of the
tidyverse but they are also the least essential. I don't find them
useful, mostly because they make debugging harder and add to the
cognitive load of figuring out what is actually going on in the
evaluation process. So I don't use them in my work or my teaching (I
do mention them in teaching so students can understand them when they
see them). Many people clearly like them, and that's fine. But they
are not in any way, shape, or form essential.

I can't speak for all of R core on this, but this is how I look at the
question of inclusion in base: R core developer time is a (very)
scarce resource. Any part of that resource that is used to incorporate
and maintain in base something that can be implemented reasonably well
in a package is then not available for improving and maintaining parts
of R that have to be in base. There would need to be extremely strong
reasons for reallocating resources in this way and I just don't see
how that case can be made here.

It is certainly possible that thinking about pipes might suggest some
useful low-level primitives to add that would have to live in base and
might be useful in other contexts. Those might be worth considering.
[Some kind of 'exec()' or 'tailcall()' primitive to reuse a call frame,
for example.]

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Re: [External] Re: should base R have a piping operator ?

Lionel Henry


> On 7 Oct 2019, at 17:04, Tierney, Luke <[hidden email]> wrote:
>
>  Think about what happens if an
> argument in a pipe stage contains a pipe. (Not completely
> unreasonable, e.g. for a left_join).

It should work exactly as it does in a local environment.

```
`%foo%` <- function(x, y) {
  env <- parent.frame()

  # Use `:=` to avoid partial matching on .env/.frame
  rlang::scoped_bindings(. := x, .env = env)

  eval(substitute(y), env)
}

"A" %foo% {
  print(.)
  "B" %foo% print(.)
  print(.)
}
#> [1] "A"
#> [1] "B"
#> [1] "A"

print(.)
#> Error in print(.) : object '.' not found

```

The advantage is that side effects (such as assigning variables or calling
`return()`) will occur in the expected environment. I don't see it causing
problems except in artificial cases. Am I missing something?

I agree that restricting the pipe to a single placeholder (to avoid
double evaluation) would be a good design too.

I can't access https://gitlab.com/luke-tierney/pipes, it appears to be 404.

Best,
Lionel


Re: [External] Re: should base R have a piping operator ?

Tierney, Luke
On Mon, 7 Oct 2019, Lionel Henry wrote:

>
>
>> On 7 Oct 2019, at 17:04, Tierney, Luke <[hidden email]> wrote:
>>
>>  Think about what happens if an
>> argument in a pipe stage contains a pipe. (Not completely
>> unreasonable, e.g. for a left_join).
>
> It should work exactly as it does in a local environment.
>
> ```
> `%foo%` <- function(x, y) {
>  env <- parent.frame()
>
>  # Use `:=` to avoid partial matching on .env/.frame
>  rlang::scoped_bindings(. := x, .env = env)
>
>  eval(substitute(y), env)
> }
>
> "A" %foo% {
>  print(.)
>  "B" %foo% print(.)
>  print(.)
> }
> #> [1] "A"
> #> [1] "B"
> #> [1] "A"
>
> print(.)
> #> Error in print(.) : object '.' not found
>
> ```
>
> The advantage is that side effects (such as assigning variables or calling
> `return()`) will occur in the expected environment.

You get the assignment behavior with the nested call approach. (Not
that doing this is necessarily a good idea).

> I don't see it causing
> problems except in artificial cases. Am I missing something?

Here is a stylized example:

f <- function(x, y) {
     assign("xx", x, parent.frame())
     on.exit(rm(xx, envir = parent.frame()))
     y
     get("xx") + 1
}

## This is fine:
> f(1, 2)
[1] 2

## This is not:
> f(1, f(1, 2))
Error in get("xx") : object 'xx' not found

If you play these games whether you get the result you want, or an
obvious error, or just the wrong answer depends on argument evaluation
order and the like. You really don't want to go there. Not to mention
that you would be telling users they are not allowed to use '.' as a
variable name for their own purposes or you would be polluting their
environment with some other artificial symbol that they would see in
debugging. Just don't.

Anything going in base needs to worry even about artificial cases.
Yes, there are things in base that don't meet that standard. No, that
is not a reason to add more.

> I agree that restraining the pipe to a single placeholder (to avoid
> double evaluation) would be a good design too.
>
> I can't access https://gitlab.com/luke-tierney/pipes, it appears to be 404.

Should be able to get there now. Needed to change the visibility ---
still learning my way around gitlab.

Best,

luke

> Best,
> Lionel
>
>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


Re: [External] Re: should base R have a piping operator ?

Kevin Ushey
IMHO, if base R were to include a pipe operator, I think it should be much
simpler than the magrittr pipe. It should satisfy the property that:

    x |> f(...)   is equivalent to   f(x, ...)

Except, perhaps, in terms of when the promise for 'x' gets forced. We
shouldn't need to mess with bindings in environments to make that work.

My understanding is that the '.' placeholder is used so that the magrittr
pipe can be adapted to functions that aren't endomorphic or otherwise
easily pipeable. I would argue that:

1. Users could just create their own pipeable wrapper functions if so
required, or
2. Users could use magrittr to get some of the 'extensions' to the pipe
operator (with the noted caveats).
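
A sketch of point 1 (the wrapper name is made up for illustration): gsub()
takes the data as its third argument, so a simple pipe with the f(x, ...)
property can't target it directly, but a one-line wrapper fixes that.

```
sub_all <- function(x, pattern, replacement) gsub(pattern, replacement, x)

## Equivalent to what `"a-b-c" |> sub_all("-", "_")` would mean under the
## property above:
sub_all("a-b-c", "-", "_")
#> [1] "a_b_c"
```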

Best,
Kevin


Re: [External] Re: should base R have a piping operator ?

Lionel Henry
Hi Kevin,

> On 7 Oct 2019, at 18:42, Kevin Ushey <[hidden email]> wrote:
>
> My understanding is that the '.' placeholder is used so that the magrittr pipe can be adapted to functions that aren't endomorphic or otherwise easily pipeable. I would argue that:
>
> 1. Users could just create their own pipable wrapper functions if so required, or
> 2. Users could use magrittr to get some of the 'extensions' to the pipe operator (with the noted caveats).


Another advantage of the placeholder is that it represents an obvious
binding to inspect while debugging. It would be useful to be able to
inspect all intermediate values in a pipeline by stepping with
sequences of `n` and `.` commands.

Lionel



Re: [External] Re: should base R have a piping operator ?

Lionel Henry
In reply to this post by Tierney, Luke
On 7 Oct 2019, at 18:17, Tierney, Luke <[hidden email]> wrote:

> Here is a stylized example:

The previous value of the binding should only be restored if it
existed:

g <- function(x, y) {
  rlang::scoped_bindings(xx = x, .env = parent.frame())
  y
  get("xx") + 10
}

# Good
g(1, 2)
#> [1] 11

# Still good?
g(1, g(1, 2))
#> [1] 11


> If you play these games whether you get the result you want, or an
> obvious error, or just the wrong answer depends on argument evaluation
> order and the like.

I think the surprises are limited because the pattern has stack-like
semantics. We enter a new context where `.` gains a new meaning, and
when we exit, the previous meaning is restored.

One example where this could lead to unexpected behaviour is trying to
capture the value of the placeholder in a closure:

f <- function(x) {
  x %>% {
    identity(function() .)
  }
}

# This makes sense:
f("A")()
#> Error: object '.' not found

# This doesn't:
"B" %>% { f("A")() }
#> [1] "B"


> Not to mention that you would be telling users they are not allowed
> to use '.' as a variable name for their own purposes or you would be
> polluting their environment with some other artificial symbol that
> they would see in debugging.

That's a good point. Debugging allows moving up the call stack before
the context is exited, so you'd see the last value of `.` in examples
of nested pipes like `foo %>% bar( f %>% g() )`. That could be confusing.


> Anything going in base needs to worry even about artificial cases.
> Yes, there are things in base that don't meet that standard. No, that
> is not a reason to add more.

Agreed. What I meant by artificial cases is functions making
questionable assumptions after peeking into foreign contexts etc.

I'm worried about what happens with important language constructs like
`<-` and `return()` when code is evaluated in a local context. That
said, I think binding pipe values to `.` is more important than these
particular semantics because the placeholder is an obvious binding to
inspect while debug-stepping through a pipeline. So evaluating in a
child is probably preferable to giving up the placeholder altogether.

Best,
Lionel


Re: [External] Re: should base R have a piping operator ?

Tierney, Luke
Yes, you can make my little example work by implementing dynamic
scope with a stack for saving/restoring binding values. Given R's
reflection capabilities and rm() with an envir argument, that has its
own issues. If you want to try to get this right and maintain it in
your own packages, that is up to you. I can't see the cost/benefit
calculation justifying having it in base.

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


Re: [External] Re: should base R have a piping operator ?

Tierney, Luke
In reply to this post by Tierney, Luke
Just for the record, and not that using return() calls like this is
necessarily a good idea, it is possible to make a nested-call-based
pipe that handles return() calls the way you want using delayedAssign.
I've added it to the end of the file on gitlab.

Time to move on to the stuff I've been avoiding ...

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
