the pipe |> and line breaks in pipelines

Timothy Goodman
Hi,

I'm a data scientist who routinely uses R in my day-to-day work, for tasks
such as cleaning and transforming data, exploratory data analysis, etc.
This includes frequent use of the pipe operator from the magrittr and dplyr
libraries, %>%.  So, I was pleased to hear about the recent work on a
native pipe operator, |>.

This seems like a good time to bring up the main pain point I encounter
when using pipes in R, and some suggestions on what could be done about
it.  The issue is that the pipe operator can't be placed at the start of a
line of code (except in parentheses).  That's no different than any binary
operator in R, but I find it's a source of difficulty for the pipe because
of how pipes are often used.

[I'm assuming here that my usage is fairly typical of a lot of users; at
any rate, I don't think I'm *too* unusual.]

=== Why this is a problem ===

It's very common (for me, and I suspect for many users of dplyr) to write
multi-step pipelines and put each step on its own line for readability.
Something like this:

  ### Example 1 ###
  my_data_frame_1 %>%
    filter(some_conditions_1) %>%
    inner_join(my_data_frame_2, by = some_columns_1) %>%
    group_by(some_columns_2) %>%
    summarize(some_aggregate_functions_1) %>%
    filter(some_conditions_2) %>%
    left_join(my_data_frame_3, by = some_columns_3) %>%
    group_by(some_columns_4) %>%
    summarize(some_aggregate_functions_2) %>%
    arrange(some_columns_5)

[I guess some might consider this an overly long pipeline; for me it's
pretty typical.  I *could* split it up by assigning intermediate results to
variables, but much of the value I get from the pipe is that it lets my
code communicate which results are temporary, and which will be used again
later.  Assigning variables for single-use results would remove that
expressiveness.]

I would prefer (for reasons I'll explain) to be able to write the above
example like this, which isn't valid R:

  ### Example 2 (not valid R) ###
  my_data_frame_1
    %>% filter(some_conditions_1)
    %>% inner_join(my_data_frame_2, by = some_columns_1)
    %>% group_by(some_columns_2)
    %>% summarize(some_aggregate_functions_1)
    %>% filter(some_conditions_2)
    %>% left_join(my_data_frame_3, by = some_columns_3)
    %>% group_by(some_columns_4)
    %>% summarize(some_aggregate_functions_2)
    %>% arrange(some_columns_5)

One (minor) advantage is obvious: It lets you easily line up the pipes,
which means that you can see at a glance that the whole block is a single
pipeline, and you'd immediately notice if you inadvertently omitted a pipe,
which otherwise can lead to confusing output.  [It's also aesthetically
pleasing, especially when %>% is replaced with |>, but that's subjective.]

But the bigger issue happens when I want to re-run just *part* of the
pipeline.  I do this often when debugging: if the output of the pipeline
seems wrong, I re-run the first few steps and check the output, then
include a little more and re-run again, etc., until I locate my mistake.
Working in an interactive notebook environment, this involves using the
cursor to select just the part of the code I want to re-run.

It's fast and easy to select *entire* lines of code, but unfortunately with
the pipes placed at the end of the line I must instead select everything
*except* the last three characters of the line (the last two characters for
the new pipe).  Then when I want to re-run the same partial pipeline with
the next line of code included, I can't just press SHIFT+Down to select it
as I otherwise would, but instead must move the cursor horizontally to a
position three characters before the end of *that* line (which is generally
different due to varying line lengths).  And so forth each time I want to
include an additional line.

Moreover, with the staggered positions of the pipes at the end of each
line, it's very easy to accidentally select the final pipe on a line, and
then sit there for a moment wondering if the environment has stopped
responding before realizing it's just waiting for further input (i.e., for
the right-hand side).  These small delays and disruptions add up over the
course of a day.

This desire to select and re-run the first part of a pipeline is also the
reason why it doesn't suffice to achieve syntax like my "Example 2" by
wrapping the entire pipeline in parentheses.  That's of no use if I want to
re-run a selection that doesn't include the final close-paren.
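
To make the difference concrete, here is a minimal sketch using only base R's parse(), which checks syntax without evaluating anything (so %>% does not need to be defined). The helper name try_parse is made up for this illustration.

```r
# Try to parse a snippet. Returns the number of complete expressions found,
# or the parser's error message if the text is not valid R.
# (try_parse is a hypothetical helper, not part of any package.)
try_parse <- function(code) {
  tryCatch(
    length(parse(text = code, keep.source = FALSE)),
    error = function(e) conditionMessage(e)
  )
}

trailing <- try_parse("x %>%\n  f()")  # pipe ends the first line
leading  <- try_parse("x\n  %>% f()")  # pipe starts the second line
dangling <- try_parse("x %>%")         # selection that ends on a pipe

print(trailing)  # one complete expression
print(leading)   # a syntax error message
print(dangling)  # an "incomplete input" style error message
```

The last case is the one that makes an interactive console appear to hang: the text is incomplete rather than wrong, so the console waits for a right-hand side.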

=== Possible Solutions ===

I can think of two, but maybe there are others.  The first would make
"Example 2" into valid code, and the second would allow you to run a
selection that included a trailing pipe.

  Solution 1: Add a special case to how R is parsed, so that if the first
  (non-whitespace) token after a line break is a pipe, that pipe gets moved
  to before the line break.
    - Argument for: This lets you write code like Example 2, which
      addresses the pain point around re-running part of a pipeline, and
      has advantages for readability.  Also, since starting a line with a
      pipe operator is currently invalid, the change wouldn't break any
      working code.
    - Argument against: It would make the behavior of %>% inconsistent
      with that of other binary operators in R.  (However, this objection
      might not apply to the new pipe, |>, which I understand is being
      implemented as a syntax transformation rather than a binary
      operator.)

  Solution 2: Ignore the pipe operator if it occurs as the final token of
  the code being executed.
    - Argument for: This would mean the user could select and re-run the
      first few lines of a longer pipeline (selecting *entire* lines),
      avoiding the difficulties described above.
    - Argument against: This means that %>% would be valid even if it
      occurred without a right-hand side, which is inconsistent with other
      operators in R.  (But, as above, this objection might not apply to
      |>.)  Also, this solution still doesn't enable the syntax of
      "Example 2", with its readability benefit.
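
As a rough illustration of what Solution 2 amounts to, the same effect can be approximated outside the parser by cleaning up the selected text before it is run. This is only a sketch: strip_trailing_pipe is a hypothetical helper, and the regular expression assumes the pipe is the very last token (it does not handle a trailing comment).

```r
# Remove a single trailing %>% or |> (and surrounding whitespace) from the
# end of a code selection. Hypothetical helper for illustration only.
strip_trailing_pipe <- function(code) {
  sub("\\s*(%>%|\\|>)\\s*$", "", code)
}

# A selection of two *entire* lines from a longer pipeline.
selection <- "my_data %>%\n  head(3) %>%"
runnable  <- strip_trailing_pipe(selection)

cat(runnable, "\n")

# The cleaned-up text now parses as one complete expression.
n_expr <- length(parse(text = runnable, keep.source = FALSE))
```

Something like eval(parse(text = runnable)) could then run the partial pipeline, provided the pipe and the functions it calls are available in the session.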

Thanks for reading this and considering it.

- Tim Goodman

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: the pipe |> and line breaks in pipelines

Stefan Evert-3
I'm not a pipe user, so I may be overlooking some issue, but wouldn't simply putting identity() on the last line solve your main problem?

### Example 1 ###
 my_data_frame_1 %>%
   filter(some_conditions_1) %>%
   inner_join(my_data_frame_2, by = some_columns_1) %>%
   group_by(some_columns_2) %>%
   summarize(some_aggregate_functions_1) %>%
   filter(some_conditions_2) %>%
   left_join(my_data_frame_3, by = some_columns_3) %>%
   group_by(some_columns_4) %>%
   summarize(some_aggregate_functions_2) %>%
   arrange(some_columns_5) %>%
   identity()

I agree that it would be visually more pleasing to have the pipe symbols lined up at the start of each line, but I don't think it's worth breaking R's principle of evaluating any line with a complete expression.

With your solution 1, R couldn't execute a complete command as soon as it is entered, because it would first have to wait and see whether the next line happens to start with %>%.

With your solution 2,
 
  my_data_frame_1 %>%

would be a complete expression (because an extra trailing %>% is allowed on the last line of a pipe) and hence execute immediately rather than wait for the next line.

Best,
Stefan


> On 9 Dec 2020, at 06:45, Timothy Goodman <[hidden email]> wrote:
> [...]

Re: the pipe |> and line breaks in pipelines

Duncan Murdoch-2
In reply to this post by Timothy Goodman
The requirement for operators at the end of the line comes from the
interactive nature of R.  If you type

     my_data_frame_1

how could R know that you are not done, and are planning to type the
rest of the expression

       %>% filter(some_conditions_1)
       ...

before it should consider the expression complete?  The way languages
like C do this is by requiring a statement terminator at the end.  You
can also do it by wrapping the entire thing in parentheses ().

However, be careful: don't use braces; they don't work.  And parens
have the side effect of removing invisibility from the result (which is
a design flaw or bonus, depending on your point of view).  So I actually
wouldn't advise this workaround.
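
Both points can be checked with base R alone. The sketch below only parses the pipeline text (so %>% need not be defined) and uses withVisible() to inspect the visibility flag; the helper name parses is made up for this illustration.

```r
# TRUE if the text is syntactically valid R, FALSE otherwise.
# (parses is a hypothetical helper for this illustration.)
parses <- function(code) {
  result <- try(parse(text = code, keep.source = FALSE), silent = TRUE)
  !inherits(result, "try-error")
}

# Inside parentheses a line break does not end the expression,
# so a pipe may start a line. Inside braces it still ends it.
in_parens <- parses("(\n  x\n  %>% f()\n)")
in_braces <- parses("{\n  x\n  %>% f()\n}")

# Parentheses make an otherwise invisible result visible.
bare    <- withVisible(invisible(42))$visible
wrapped <- withVisible((invisible(42)))$visible

print(c(in_parens = in_parens, in_braces = in_braces,
        bare = bare, wrapped = wrapped))
```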

Duncan Murdoch


On 09/12/2020 12:45 a.m., Timothy Goodman wrote:

> [...]

Re: the pipe |> and line breaks in pipelines

Gabor Grothendieck
In reply to this post by Timothy Goodman
On Wed, Dec 9, 2020 at 4:03 AM Timothy Goodman <[hidden email]> wrote:
> But the bigger issue happens when I want to re-run just *part* of the
> pipeline.

Insert one of the following into the pipeline. It does not require that you
edit any lines.   It only involves inserting a line.

  print %>%
  { str(.); . } %>%
  { . ->> .save } %>%

Re: the pipe |> and line breaks in pipelines

Timothy Goodman
In reply to this post by Duncan Murdoch-2
If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
command in the Notebook environment I'm using) I certainly *would* expect R
to treat it as a complete statement.

But what I'm talking about is a different case, where I highlight a
multi-line statement in my notebook:

    my_data_frame_1
        |> filter(some_conditions_1)

and then press Ctrl+Enter.  Or, I suppose the equivalent would be to run an
R script containing those two lines of code, or to run a multi-line
statement like that from the console (which in RStudio I can do by pressing
Shift+Enter between the lines).

In those cases, R could either (1) Give an error message [the current
behavior], or (2) understand that the first line is meant to be piped to
the second.  The second option would be significantly more useful, and is
almost certainly what the user intended.

(For what it's worth, there are some languages, such as JavaScript, that
consider the first token of the next line when determining if the previous
line was complete.  JavaScript's rules around this are overly complicated,
but a rule like "a pipe following a line break is treated as continuing the
previous line" would be much simpler.  And while it might be objectionable
to treat the operator %>% differently from other operators, the addition of
|>, which isn't truly an operator at all, seems like the right time to
consider it.)

-Tim

On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <[hidden email]>
wrote:

> [...]

Re: the pipe |> and line breaks in pipelines

Kevin Ushey
I agree with Duncan that the right solution is to wrap the pipe
expression with parentheses. Having the parser treat newlines
differently based on whether the session is interactive, or on what
type of operator happens to follow a newline, feels like a pretty big
can of worms.

I think this (or something similar) would accomplish what you want
while still retaining the nice aesthetics of the pipe expression, with
a minimal amount of syntax "noise":

result <- (
  data
    |> op1()
    |> op2()
)
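
A runnable version of the same layout, as a minimal sketch assuming R 4.1.0
or later (where |> is available). The input vector and the calls to sqrt()
and sum() are stand-ins chosen for illustration; they take the place of
data, op1() and op2() above:

```r
# Inside parentheses a newline cannot end the expression, so each step
# may start with the pipe.
result <- (
  c(1, 4, 9)
    |> sqrt()
    |> sum()
)

# The same steps written with trailing pipes, for comparison.
same <- c(1, 4, 9) |>
  sqrt() |>
  sum()
```

Both forms should give the same value; only the placement of the pipe
differs.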

For interactive sessions where you wanted to execute only parts of the
pipeline at a time, I could see that being accomplished by the editor
-- it could transform the expression so that it could be handled by R,
either by hoisting the pipe operator(s) up a line, or by wrapping the
to-be-executed expression in parentheses for you. If such a style of
coding became popular enough, I'm sure the developers of such editors
would be interested and willing to support this ...

Perhaps more importantly, it would be much easier to accomplish than a
change to the behavior of the R parser, and it would be work that
wouldn't have to be maintained by the R Core team.
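
The "hoisting" idea could look roughly like the sketch below. It is only an
illustration of what an editor might do before sending a selection to R: the
name hoist_leading_pipes() is hypothetical, and the function works on plain
text, so it would be fooled by a string or comment that happens to start a
line with a pipe.

```r
# Hypothetical helper: move a pipe that starts a line to the end of the
# previous line, so that the text is in the form R's parser accepts.
hoist_leading_pipes <- function(code) {
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
  # Groups: (1) indentation, (2) the pipe, (3) the rest of the line.
  pattern <- "^([[:space:]]*)([|]>|%>%)[[:space:]]*(.*)$"
  for (i in seq_along(lines)[-1]) {
    if (grepl(pattern, lines[i])) {
      pipe <- sub(pattern, "\\2", lines[i])
      lines[i] <- sub(pattern, "\\1\\3", lines[i])   # keep the indentation
      lines[i - 1] <- paste(lines[i - 1], pipe)      # append pipe above
    }
  }
  paste(lines, collapse = "\n")
}

# A selection in the leading-pipe style (example input, not from the thread).
selection <- "c(1, 4, 9)\n  |> sqrt()\n  |> sum()"
runnable <- hoist_leading_pipes(selection)
```

Because the rewrite only touches lines that begin with a pipe, selecting
just the first two lines of the pipeline yields text that is complete on its
own, which is the partial re-run case described earlier in the thread.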

Best,
Kevin

Re: the pipe |> and line breaks in pipelines

Duncan Murdoch-2
In reply to this post by Timothy Goodman
On 09/12/2020 2:33 p.m., Timothy Goodman wrote:

> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> command in the Notebook environment I'm using) I certainly *would*
> expect R to treat it as a complete statement.
>
> But what I'm talking about is a different case, where I highlight a
> multi-line statement in my notebook:
>
>      my_data_frame_1
>          |> filter(some_conditions_1)
>
> and then press Ctrl+Enter.

I don't think I'd like it if parsing changed between passing one line at
a time and passing a block of lines.  I'd like to be able to highlight a
few lines and pass those, then type one, then highlight some more and
pass those:  and have it act as though I just passed the whole combined
block, or typed everything one line at a time.
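
For reference, the current rule can be checked with parse(), which only
reads the text and so does not need magrittr to be loaded. This is a small
sketch; the variable names are made up for the illustration:

```r
# A complete first line ends the expression, so the next line starts a
# new one.  With a leading minus this silently gives two expressions.
n_leading_minus <- length(parse(text = "x\n- 1"))

# A line cannot begin with %>%, so the leading-pipe layout is rejected.
leading_pipe_ok <- tryCatch(
  {
    parse(text = "x\n%>% sqrt()")
    TRUE
  },
  error = function(e) FALSE
)

# A trailing pipe leaves the expression open, so the next line joins it.
n_trailing_pipe <- length(parse(text = "x %>%\n  sqrt()"))

# Inside parentheses a newline does not end the expression.
n_parenthesized <- length(parse(text = "(\n  x\n  %>% sqrt()\n)"))
```

Here n_leading_minus is 2, leading_pipe_ok is FALSE, and the last two
counts are both 1. The result is the same whether the text arrives as one
block or line by line, which is the property I would like to keep.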


> Or, I suppose the equivalent would be to run an R script containing
> those two lines of code, or to run a multi-line statement like that
> from the console (which in RStudio I can do by pressing Shift+Enter
> between the lines.)
>
> In those cases, R could either (1) Give an error message [the current
> behavior], or (2) understand that the first line is meant to be piped to
> the second.  The second option would be significantly more useful, and
> is almost certainly what the user intended.
>
> (For what it's worth, there are some languages, such as JavaScript, that
> consider the first token of the next line when determining if the
> previous line was complete.  JavaScript's rules around this are overly
> complicated, but a rule like "a pipe following a line break is treated
> as continuing the previous line" would be much simpler.  And while it
> might be objectionable to treat the operator %>% differently from other
> operators, the addition of |>, which isn't truly an operator at all,
> seems like the right time to consider it.)

I think this would be hard to implement with R's current parser, but
possible.  I think it could be done by distinguishing between EOL
markers within a block of text and "end of block" marks.  If it applied
only to the |> operator it would be *really* ugly.

My strongest objection to it is the one at the top, though.  If I have a
block of lines sitting in my editor that I just finished executing, with
the cursor pointing at the next line, I'd like to know that it didn't
matter whether the lines were passed one at a time, as a block, or some
combination of those.

Duncan Murdoch

Re: the pipe |> and line breaks in pipelines

bbolker
In reply to this post by Kevin Ushey
   FWIW there is previous discussion of this in a Twitter thread from May:

https://twitter.com/bolkerb/status/1258542150620332039

At the end I suggested defining something like .__END <- identity as a
pipe-ender.
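
A minimal sketch of how that might look, assuming R 4.1.0 or later for |>
and assuming the intent is to bind the identity function itself (calling
identity() with no argument would be an error). The input vector and the
steps are placeholders for illustration:

```r
# Pipe-ender: returns its input unchanged.
.__END <- identity

# Every real step ends with a pipe, so steps can be commented out or
# reordered without touching the neighbouring lines.
total <- c(1, 4, 9) |>
  sqrt() |>
  sum() |>
  .__END()
```

The native pipe needs the call written with parentheses, .__END(); with
magrittr's %>% the bare name .__END would also work.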

On 12/9/20 2:58 PM, Kevin Ushey wrote:

> I agree with Duncan that the right solution is to wrap the pipe
> expression with parentheses. Having the parser treat newlines
> differently based on whether the session is interactive, or on what
> type of operator happens to follow a newline, feels like a pretty big
> can of worms.
>
> I think this (or something similar) would accomplish what you want
> while still retaining the nice aesthetics of the pipe expression, with
> a minimal amount of syntax "noise":
>
> result <- (
>    data
>      |> op1()
>      |> op2()
> )
>
> For interactive sessions where you wanted to execute only parts of the
> pipeline at a time, I could see that being accomplished by the editor
> -- it could transform the expression so that it could be handled by R,
> either by hoisting the pipe operator(s) up a line, or by wrapping the
> to-be-executed expression in parentheses for you. If such a style of
> coding became popular enough, I'm sure the developers of such editors
> would be interested and willing to support this ...
>
> Perhaps more importantly, it would be much easier to accomplish than a
> change to the behavior of the R parser, and it would be work that
> wouldn't have to be maintained by the R Core team.
>
> Best,
> Kevin
>
> On Wed, Dec 9, 2020 at 11:34 AM Timothy Goodman <[hidden email]> wrote:
>>
>> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
>> command in the Notebook environment I'm using) I certainly *would* expect R
>> to treat it as a complete statement.
>>
>> But what I'm talking about is a different case, where I highlight a
>> multi-line statement in my notebook:
>>
>>      my_data_frame1
>>          |> filter(some_conditions_1)
>>
>> and then press Ctrl+Enter.  Or, I suppose the equivalent would be to run an
>> R script containing those two lines of code, or to run a multi-line
>> statement like that from the console (which in RStudio I can do by pressing
>> Shift+Enter between the lines.)
>>
>> In those cases, R could either (1) Give an error message [the current
>> behavior], or (2) understand that the first line is meant to be piped to
>> the second.  The second option would be significantly more useful, and is
>> almost certainly what the user intended.
>>
>> (For what it's worth, there are some languages, such as Javascript, that
>> consider the first token of the next line when determining if the previous
>> line was complete.  JavaScript's rules around this are overly complicated,
>> but a rule like "a pipe following a line break is treated as continuing the
>> previous line" would be much simpler.  And while it might be objectionable
>> to treat the operator %>% different from other operators, the addition of
>> |>, which isn't truly an operator at all, seems like the right time to
>> consider it.)
>>
>> -Tim
>>
>> On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <[hidden email]>
>> wrote:
>>
>>> The requirement for operators at the end of the line comes from the
>>> interactive nature of R.  If you type
>>>
>>>       my_data_frame_1
>>>
>>> how could R know that you are not done, and are planning to type the
>>> rest of the expression
>>>
>>>         %>% filter(some_conditions_1)
>>>         ...
>>>
>>> before it should consider the expression complete?  The way languages
>>> like C do this is by requiring a statement terminator at the end.  You
>>> can also do it by wrapping the entire thing in parentheses ().
>>>
>>> However, be careful: Don't use braces:  they don't work.  And parens
>>> have the side effect of removing invisibility from the result (which is
>>> a design flaw or bonus, depending on your point of view).  So I actually
>>> wouldn't advise this workaround.
>>>
>>> Duncan Murdoch
>>>
>>>
>>> On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
>>>> Hi,
>>>>
>>>> I'm a data scientist who routinely uses R in my day-to-day work, for
>>> tasks
>>>> such as cleaning and transforming data, exploratory data analysis, etc.
>>>> This includes frequent use of the pipe operator from the magrittr and
>>> dplyr
>>>> libraries, %>%.  So, I was pleased to hear about the recent work on a
>>>> native pipe operator, |>.
>>>>
>>>> This seems like a good time to bring up the main pain point I encounter
>>>> when using pipes in R, and some suggestions on what could be done about
>>>> it.  The issue is that the pipe operator can't be placed at the start of
>>> a
>>>> line of code (except in parentheses).  That's no different than any
>>> binary
>>>> operator in R, but I find it's a source of difficulty for the pipe
>>> because
>>>> of how pipes are often used.
>>>>
>>>> [I'm assuming here that my usage is fairly typical of a lot of users; at
>>>> any rate, I don't think I'm *too* unusual.]
>>>>
>>>> === Why this is a problem ===
>>>>
>>>> It's very common (for me, and I suspect for many users of dplyr) to write
>>>> multi-step pipelines and put each step on its own line for readability.
>>>> Something like this:
>>>>
>>>>     ### Example 1 ###
>>>>     my_data_frame_1 %>%
>>>>       filter(some_conditions_1) %>%
>>>>       inner_join(my_data_frame_2, by = some_columns_1) %>%
>>>>       group_by(some_columns_2) %>%
>>>>       summarize(some_aggregate_functions_1) %>%
>>>>       filter(some_conditions_2) %>%
>>>>       left_join(my_data_frame_3, by = some_columns_3) %>%
>>>>       group_by(some_columns_4) %>%
>>>>       summarize(some_aggregate_functions_2) %>%
>>>>       arrange(some_columns_5)
>>>>
>>>> [I guess some might consider this an overly long pipeline; for me it's
>>>> pretty typical.  I *could* split it up by assigning intermediate results
>>> to
>>>> variables, but much of the value I get from the pipe is that it lets my
>>>> code communicate which results are temporary, and which will be used
>>> again
>>>> later.  Assigning variables for single-use results would remove that
>>>> expressiveness.]
>>>>
>>>> I would prefer (for reasons I'll explain) to be able to write the above
>>>> example like this, which isn't valid R:
>>>>
>>>>     ### Example 2 (not valid R) ###
>>>>     my_data_frame_1
>>>>       %>% filter(some_conditions_1)
>>>>       %>% inner_join(my_data_frame_2, by = some_columns_1)
>>>>       %>% group_by(some_columns_2)
>>>>       %>% summarize(some_aggregate_functions_1)
>>>>       %>% filter(some_conditions_2)
>>>>       %>% left_join(my_data_frame_3, by = some_columns_3)
>>>>       %>% group_by(some_columns_4)
>>>>       %>% summarize(some_aggregate_functions_2)
>>>>       %>% arrange(some_columns_5)
>>>>
>>>> One (minor) advantage is obvious: It lets you easily line up the pipes,
>>>> which means that you can see at a glance that the whole block is a single
>>>> pipeline, and you'd immediately notice if you inadvertently omitted a
>>> pipe,
>>>> which otherwise can lead to confusing output.  [It's also aesthetically
>>>> pleasing, especially when %>% is replaced with |>, but that's
>>> subjective.]
>>>>
>>>> But the bigger issue happens when I want to re-run just *part* of the
>>>> pipeline.  I do this often when debugging: if the output of the pipeline
>>>> seems wrong, I re-run the first few steps and check the output, then
>>>> include a little more and re-run again, etc., until I locate my mistake.
>>>> Working in an interactive notebook environment, this involves using the
>>>> cursor to select just the part of the code I want to re-run.
>>>>
>>>> It's fast and easy to select *entire* lines of code, but unfortunately
>>> with
>>>> the pipes placed at the end of the line I must instead select everything
>>>> *except* the last three characters of the line (the last two characters
>>> for
>>>> the new pipe).  Then when I want to re-run the same partial pipeline with
>>>> the next line of code included, I can't just press SHIFT+Down to select
>>> it
>>>> as I otherwise would, but instead must move the cursor horizontally to a
>>>> position three characters before the end of *that* line (which is
>>> generally
>>>> different due to varying line lengths).  And so forth each time I want to
>>>> include an additional line.
>>>>
>>>> Moreover, with the staggered positions of the pipes at the end of each
>>>> line, it's very easy to accidentally select the final pipe on a line, and
>>>> then sit there for a moment wondering if the environment has stopped
>>>> responding before realizing it's just waiting for further input (i.e.,
>>> for
>>>> the right-hand side).  These small delays and disruptions add up over the
>>>> course of a day.
>>>>
>>>> This desire to select and re-run the first part of a pipeline is also the
>>>> reason why it doesn't suffice to achieve syntax like my "Example 2" by
>>>> wrapping the entire pipeline in parentheses.  That's of no use if I want
>>> to
>>>> re-run a selection that doesn't include the final close-paren.
>>>>
>>>> === Possible Solutions ===
>>>>
>>>> I can think of two, but maybe there are others.  The first would make
>>>> "Example 2" into valid code, and the second would allow you to run a
>>>> selection that included a trailing pipe.
>>>>
>>>>     Solution 1: Add a special case to how R is parsed, so if the first
>>>> (non-whitespace) token after an end-line is a pipe, that pipe gets moved to
>>>> before the end-line.
>>>>       - Argument for: This lets you write code like example 2, which
>>>> addresses the pain point around re-running part of a pipeline, and has
>>>> advantages for readability.  Also, since starting a line with a pipe
>>>> operator is currently invalid, the change wouldn't break any working code.
>>>>       - Argument against: It would make the behavior of %>% inconsistent with
>>>> that of other binary operators in R.  (However, this objection might not
>>>> apply to the new pipe, |>, which I understand is being implemented as a
>>>> syntax transformation rather than a binary operator.)
>>>>
>>>>     Solution 2: Ignore the pipe operator if it occurs as the final token of
>>>> the code being executed.
>>>>       - Argument for: This would mean the user could select and re-run the
>>>> first few lines of a longer pipeline (selecting *entire* lines), avoiding
>>>> the difficulties described above.
>>>>       - Argument against: This means that %>% would be valid even if it
>>>> occurred without a right-hand side, which is inconsistent with other
>>>> operators in R.  (But, as above, this objection might not apply to |>.)
>>>> Also, this solution still doesn't enable the syntax of "Example 2", with
>>>> its readability benefit.
>>>>
>>>> Thanks for reading this and considering it.
>>>>
>>>> - Tim Goodman
>>>>

Re: the pipe |> and line breaks in pipelines

Timothy Goodman
In reply to this post by Duncan Murdoch-2
Regarding special treatment for |>, isn't it getting special treatment
anyway, because it's implemented as a syntax transformation from x |> f(y)
to f(x, y), rather than as an operator?

That said, the point about wanting a block of code submitted line-by-line
to work the same as a block of code submitted all at once is a fair one.
Maybe the better solution would be if there were a way to say "Submit the
selected code as a single expression, ignoring line-breaks".  Then I could
run any number of lines with pipes at the start and no special character at
the end, and have it treated as a single pipeline.  I suppose that'd need
to be a feature offered by the environment (RStudio's RNotebooks in my
case).  I could wrap my pipelines in parentheses (to make the "pipes at
start of line" syntax valid R code), and then could use the hypothetical
"submit selected code ignoring line-breaks" feature when running just the
first part of the pipeline -- i.e., selecting full lines, but starting
after the opening paren so as not to need to insert a closing paren.
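
[A rough sketch of what such a "submit ignoring line-breaks" feature might do, written in plain R.  The function name eval_ignoring_line_breaks is hypothetical and is not an RStudio feature; it assumes the selection is a single pipeline with no `#` comments, and the example needs R >= 4.1.0 for the native pipe.]

```r
# Hypothetical helper: treat the selected lines as ONE expression by
# replacing the line breaks with spaces before parsing.
# Assumption: the selection contains no `#` comments (a comment would
# swallow everything after it once the lines are joined).
eval_ignoring_line_breaks <- function(lines, envir = parent.frame()) {
  code <- paste(trimws(lines), collapse = " ")
  exprs <- parse(text = code)
  # The whole selection must collapse to exactly one expression.
  stopifnot(length(exprs) == 1L)
  eval(exprs[[1L]], envir)
}

# A partial pipeline selected as full lines, pipes at the start of each line.
selection <- c(
  "c(3, 1, 2)",
  "  |> sort()",
  "  |> rev()"
)
result <- eval_ignoring_line_breaks(selection)
print(result)
```

[Submitted line by line, the same selection would stop after the first line; joined first, it is parsed as one pipeline.]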

- Tim

On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch <[hidden email]>
wrote:

> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
> > If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> > command in the Notebook environment I'm using) I certainly *would*
> > expect R to treat it as a complete statement.
> >
> > But what I'm talking about is a different case, where I highlight a
> > multi-line statement in my notebook:
> >
> >      my_data_frame1
> >          |> filter(some_conditions_1)
> >
> > and then press Ctrl+Enter.
>
> I don't think I'd like it if parsing changed between passing one line at
> a time and passing a block of lines.  I'd like to be able to highlight a
> few lines and pass those, then type one, then highlight some more and
> pass those:  and have it act as though I just passed the whole combined
> block, or typed everything one line at a time.
>
>
> > Or, I suppose the equivalent would be to run
> > an R script containing those two lines of code, or to run a multi-line
> > statement like that from the console (which in RStudio I can do by
> > pressing Shift+Enter between the lines.)
> >
> > In those cases, R could either (1) Give an error message [the current
> > behavior], or (2) understand that the first line is meant to be piped to
> > the second.  The second option would be significantly more useful, and
> > is almost certainly what the user intended.
> >
> > (For what it's worth, there are some languages, such as Javascript, that
> > consider the first token of the next line when determining if the
> > previous line was complete.  JavaScript's rules around this are overly
> > complicated, but a rule like "a pipe following a line break is treated
> > as continuing the previous line" would be much simpler.  And while it
> > might be objectionable to treat the operator %>% different from other
> > operators, the addition of |>, which isn't truly an operator at all,
> > seems like the right time to consider it.)
>
> I think this would be hard to implement with R's current parser, but
> possible.  I think it could be done by distinguishing between EOL
> markers within a block of text and "end of block" marks.  If it applied
> only to the |> operator it would be *really* ugly.
>
> My strongest objection to it is the one at the top, though.  If I have a
> block of lines sitting in my editor that I just finished executing, with
> the cursor pointing at the next line, I'd like to know that it didn't
> matter whether the lines were passed one at a time, as a block, or some
> combination of those.
>
> Duncan Murdoch

Re: the pipe |> and line breaks in pipelines

bbolker
   Definitely support the idea that if this kind of trickery is going to
happen that it be confined to some particular IDE/environment or some
particular submission protocol. I don't want it to happen in my ESS
session please ... I'd rather deal with the parentheses.
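
For reference, the parentheses workaround can be checked with the parser alone. The snippet below is only an illustration (the variable names are made up here); it uses %>% purely as a token, so magrittr does not need to be installed. It also shows the two caveats raised elsewhere in this thread: braces do not work, and parentheses make an invisible result visible.

```r
# Inside parentheses the parser skips line breaks, so a leading pipe is fine.
paren_code <- "(\n  x\n  %>% f()\n)"
n_paren <- length(parse(text = paren_code))

# Inside braces each line break can end a statement, so the leading pipe
# is a syntax error.
brace_code <- "{\n  x\n  %>% f()\n}"
brace_ok <- tryCatch({
  parse(text = brace_code)
  TRUE
}, error = function(e) FALSE)

# Side effect of the parentheses: an invisible result becomes visible.
plain_visible <- withVisible(invisible(1))$visible
paren_visible <- withVisible((invisible(1)))$visible

print(c(n_paren = n_paren, brace_ok = brace_ok,
        plain_visible = plain_visible, paren_visible = paren_visible))
```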

On 12/9/20 3:45 PM, Timothy Goodman wrote:

> Regarding special treatment for |>, isn't it getting special treatment
> anyway, because it's implemented as a syntax transformation from x |> f(y)
> to f(x, y), rather than as an operator?
>
> That said, the point about wanting a block of code submitted line-by-line
> to work the same as a block of code submitted all at once is a fair one.
> Maybe the better solution would be if there were a way to say "Submit the
> selected code as a single expression, ignoring line-breaks".  Then I could
> run any number of lines with pipes at the start and no special character at
> the end, and have it treated as a single pipeline.  I suppose that'd need
> to be a feature offered by the environment (RStudio's RNotebooks in my
> case).  I could wrap my pipelines in parentheses (to make the "pipes at
> start of line" syntax valid R code), and then could use the hypothetical
> "submit selected code ignoring line-breaks" feature when running just the
> first part of the pipeline -- i.e., selecting full lines, but starting
> after the opening paren so as not to need to insert a closing paren.
>
> - Tim

Re: the pipe |> and line breaks in pipelines

Duncan Murdoch-2
In reply to this post by Timothy Goodman
On 09/12/2020 3:45 p.m., Timothy Goodman wrote:
> Regarding special treatment for |>, isn't it getting special treatment
> anyway, because it's implemented as a syntax transformation from x |>
> f(y) to f(x, y), rather than as an operator?

That's different.  Currently |> is parsed just like any other binary
operator; it's the code emitted after parsing that is different from
most other cases.  I think your suggestion would need changes in the
parsing itself.
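
A small illustration of that distinction, assuming R >= 4.1.0 (the first release with the native pipe; the development version discussed here may differ in detail). The names x, f and y are placeholders and are never evaluated:

```r
# The native pipe is consumed by the parser: what comes out is the plain call.
native <- quote(x |> f(y))
print(native)
same_as_call <- identical(native, quote(f(x, y)))

# A %any% operator such as %>% stays in the parse tree as a call to the
# operator itself; magrittr only rewrites it when the call is evaluated.
magrittr_style <- quote(x %>% f(y))
head_of_call <- as.character(magrittr_style[[1]])

print(same_as_call)
print(head_of_call)
```

In both cases the tokens are read as an infix operator between two operands, which is why a leading |> on a new line fails the same way a leading %>% does.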

It's a few years since I worked with Bison (the parser generator that R
uses), but I recall that handling inconsistencies was always tricky.


> That said, the point about wanting a block of code submitted
> line-by-line to work the same as a block of code submitted all at once
> is a fair one.  Maybe the better solution would be if there were a way
> to say "Submit the selected code as a single expression, ignoring
> line-breaks".

The way to do that is to replace some of the line breaks with
semicolons, which act as statement separators.  The tricky bit is to
figure out which ones to replace.  So if your block is

   x +
   y
   z

you'd glue it together as "x + y; z".  RStudio appears to know enough
about R parsing to do that, and presumably if it was allowed to look at
the start of the next line could handle things like

   x
   |> f()
   z

and rewrite them as "x |> f(); z".  It would mess up debugging a little
(z is now on line 1, not line 3), but maybe it could undo the
transformation if R told it there was a problem at line 1, column 11.
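
A minimal sketch of a front-end rewrite along these lines. Note that it is a variation, not the semicolon gluing described above: it moves a leading pipe to the end of the previous line, so the line count is unchanged and z stays on line 3. The function name move_leading_pipes is hypothetical (it is not an RStudio API), and the sketch assumes no trailing comments and no multi-line strings in the block.

```r
# Hypothetical helper: if a line starts with a pipe, move that pipe to the
# end of the previous line.  The result parses identically whether it is
# submitted line by line or as one block.
move_leading_pipes <- function(lines) {
  pipe_at_start <- "^\\s*(\\|>|%>%)\\s*"
  out <- character(0)
  for (line in lines) {
    m <- regmatches(line, regexpr(pipe_at_start, line))
    if (length(m) == 1L && length(out) > 0L) {
      # Append the pipe token to the previous line ...
      out[length(out)] <- paste(out[length(out)], trimws(m))
      # ... and drop it (with its indentation) from the current line.
      line <- sub(pipe_at_start, "", line)
    }
    out <- c(out, line)
  }
  out
}

block <- c("x", "  %>% f()", "z")
rewritten <- move_leading_pipes(block)
print(rewritten)

# The rewritten block holds two expressions: the pipeline and z.
n_exprs <- length(parse(text = rewritten))
print(n_exprs)

# The glued form from the text parses to two expressions as well.
n_glued <- length(parse(text = "x + y; z"))
```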


> Then I could run any number of lines with pipes at the
> start and no special character at the end, and have it treated as a
> single pipeline.  I suppose that'd need to be a feature offered by the
> environment (RStudio's RNotebooks in my case).  I could wrap my
> pipelines in parentheses (to make the "pipes at start of line" syntax
> valid R code), and then could use the hypothetical "submit selected code
> ignoring line-breaks" feature when running just the first part of the
> pipeline -- i.e., selecting full lines, but starting after the opening
> paren so as not to need to insert a closing paren.

I think I don't understand your workflow enough to comment on this.

Duncan


>
> - Tim
>
> On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
>      > If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
>      > command in the Notebook environment I'm using) I certainly *would*
>      > expect R to treat it as a complete statement.
>      >
>      > But what I'm talking about is a different case, where I highlight a
>      > multi-line statement in my notebook:
>      >
>      >      my_data_frame1
>      >          |> filter(some_conditions_1)
>      >
>      > and then press Ctrl+Enter.
>
>     I don't think I'd like it if parsing changed between passing one line at
>     a time and passing a block of lines.  I'd like to be able to highlight a
>     few lines and pass those, then type one, then highlight some more and
>     pass those:  and have it act as though I just passed the whole combined
>     block, or typed everything one line at a time.
>
>
>      > Or, I suppose the equivalent would be to run
>      > an R script containing those two lines of code, or to run a multi-line
>      > statement like that from the console (which in RStudio I can do by
>      > pressing Shift+Enter between the lines.)
>      >
>      > In those cases, R could either (1) Give an error message [the current
>      > behavior], or (2) understand that the first line is meant to be piped to
>      > the second.  The second option would be significantly more useful, and
>      > is almost certainly what the user intended.
>      >
>      > (For what it's worth, there are some languages, such as Javascript, that
>      > consider the first token of the next line when determining if the
>      > previous line was complete.  JavaScript's rules around this are overly
>      > complicated, but a rule like "a pipe following a line break is treated
>      > as continuing the previous line" would be much simpler.  And while it
>      > might be objectionable to treat the operator %>% different from other
>      > operators, the addition of |>, which isn't truly an operator at all,
>      > seems like the right time to consider it.)
>
>     I think this would be hard to implement with R's current parser, but
>     possible.  I think it could be done by distinguishing between EOL
>     markers within a block of text and "end of block" marks.  If it applied
>     only to the |> operator it would be *really* ugly.
>
>     My strongest objection to it is the one at the top, though.  If I
>     have a
>     block of lines sitting in my editor that I just finished executing,
>     with
>     the cursor pointing at the next line, I'd like to know that it didn't
>     matter whether the lines were passed one at a time, as a block, or some
>     combination of those.
>
>     Duncan Murdoch
>
>      >
>      > -Tim
>      >
>      > On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch
>     <[hidden email] <mailto:[hidden email]>
>      > <mailto:[hidden email]
>     <mailto:[hidden email]>>> wrote:
>      >
>      >     The requirement for operators at the end of the line comes
>     from the
>      >     interactive nature of R.  If you type
>      >
>      >           my_data_frame_1
>      >
>      >     how could R know that you are not done, and are planning to
>     type the
>      >     rest of the expression
>      >
>      >             %>% filter(some_conditions_1)
>      >             ...
>      >
>      >     before it should consider the expression complete?  The way
>     languages
>      >     like C do this is by requiring a statement terminator at the
>     end.  You
>      >     can also do it by wrapping the entire thing in parentheses ().
>      >
>      >     However, be careful: Don't use braces:  they don't work.  And
>     parens
>      >     have the side effect of removing invisibility from the result
>     (which is
>      >     a design flaw or bonus, depending on your point of view).  So I
>      >     actually
>      >     wouldn't advise this workaround.
>      >
>      >     Duncan Murdoch
>      >
>      >
>      >     On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
>      >      > Hi,
>      >      >
>      >      > I'm a data scientist who routinely uses R in my day-to-day
>     work,
>      >     for tasks
>      >      > such as cleaning and transforming data, exploratory data
>      >     analysis, etc.
>      >      > This includes frequent use of the pipe operator from the
>     magrittr
>      >     and dplyr
>      >      > libraries, %>%.  So, I was pleased to hear about the
>     recent work on a
>      >      > native pipe operator, |>.
>      >      >
>      >      > This seems like a good time to bring up the main pain point I
>      >     encounter
>      >      > when using pipes in R, and some suggestions on what could
>     be done
>      >     about
>      >      > it.  The issue is that the pipe operator can't be placed
>     at the
>      >     start of a
>      >      > line of code (except in parentheses).  That's no different
>     than
>      >     any binary
>      >      > operator in R, but I find it's a source of difficulty for the
>      >     pipe because
>      >      > of how pipes are often used.
>      >      >
>      >      > [I'm assuming here that my usage is fairly typical of a lot of
>      >     users; at
>      >      > any rate, I don't think I'm *too* unusual.]
>      >      >
>      >      > === Why this is a problem ===
>      >      >
>      >      > It's very common (for me, and I suspect for many users of
>     dplyr)
>      >     to write
>      >      > multi-step pipelines and put each step on its own line for
>      >     readability.
>      >      > Something like this:
>      >      >
>      >      >    ### Example 1 ###
>      >      >    my_data_frame_1 %>%
>      >      >      filter(some_conditions_1) %>%
>      >      >      inner_join(my_data_frame_2, by = some_columns_1) %>%
>      >      >      group_by(some_columns_2) %>%
>      >      >      summarize(some_aggregate_functions_1) %>%
>      >      >      filter(some_conditions_2) %>%
>      >      >      left_join(my_data_frame_3, by = some_columns_3) %>%
>      >      >      group_by(some_columns_4) %>%
>      >      >      summarize(some_aggregate_functions_2) %>%
>      >      >      arrange(some_columns_5)
>      >      >
>      >      > [I guess some might consider this an overly long pipeline;
>     for me
>      >     it's
>      >      > pretty typical.  I *could* split it up by assigning
>     intermediate
>      >     results to
>      >      > variables, but much of the value I get from the pipe is
>     that it
>      >     lets my
>      >      > code communicate which results are temporary, and which
>     will be
>      >     used again
>      >      > later.  Assigning variables for single-use results would
>     remove that
>      >      > expressiveness.]
>      >      >
>      >      > I would prefer (for reasons I'll explain) to be able to
>     write the
>      >     above
>      >      > example like this, which isn't valid R:
>      >      >
>      >      >    ### Example 2 (not valid R) ###
>      >      >    my_data_frame_1
>      >      >      %>% filter(some_conditions_1)
>      >      >      %>% inner_join(my_data_frame_2, by = some_columns_1)
>      >      >      %>% group_by(some_columns_2)
>      >      >      %>% summarize(some_aggregate_functions_1)
>      >      >      %>% filter(some_conditions_2)
>      >      >      %>% left_join(my_data_frame_3, by = some_columns_3)
>      >      >      %>% group_by(some_columns_4)
>      >      >      %>% summarize(some_aggregate_functions_2)
>      >      >      %>% arrange(some_columns_5)
>      >      >
>      >      > One (minor) advantage is obvious: It lets you easily line
>     up the
>      >     pipes,
>      >      > which means that you can see at a glance that the whole
>     block is
>      >     a single
>      >      > pipeline, and you'd immediately notice if you inadvertently
>      >     omitted a pipe,
>      >      > which otherwise can lead to confusing output.  [It's also
>      >     aesthetically
>      >      > pleasing, especially when %>% is replaced with |>, but that's
>      >     subjective.]
>      >      >
>      >      > But the bigger issue happens when I want to re-run just
>     *part* of the
>      >      > pipeline.  I do this often when debugging: if the output
>     of the
>      >     pipeline
>      >      > seems wrong, I re-run the first few steps and check the
>     output, then
>      >      > include a little more and re-run again, etc., until I
>     locate my
>      >     mistake.
>      >      > Working in an interactive notebook environment, this involves
>      >     using the
>      >      > cursor to select just the part of the code I want to re-run.
>      >      >
>      >      > It's fast and easy to select *entire* lines of code, but
>      >     unfortunately with
>      >      > the pipes placed at the end of the line I must instead select
>      >     everything
>      >      > *except* the last three characters of the line (the last two
>      >     characters for
>      >      > the new pipe).  Then when I want to re-run the same partial
>      >     pipeline with
>      >      > the next line of code included, I can't just press
>     SHIFT+Down to
>      >     select it
>      >      > as I otherwise would, but instead must move the cursor
>      >     horizontally to a
>      >      > position three characters before the end of *that* line
>     (which is
>      >     generally
>      >      > different due to varying line lengths).  And so forth each
>     time I
>      >     want to
>      >      > include an additional line.
>      >      >
>      >      > Moreover, with the staggered positions of the pipes at the
>     end of
>      >     each
>      >      > line, it's very easy to accidentally select the final pipe
>     on a
>      >     line, and
>      >      > then sit there for a moment wondering if the environment
>     has stopped
>      >      > responding before realizing it's just waiting for further
>     input
>      >     (i.e., for
>      >      > the right-hand side).  These small delays and disruptions
>     add up
>      >     over the
>      >      > course of a day.
>      >      >
>      >      > This desire to select and re-run the first part of a
>     pipeline is
>      >     also the
>      >      > reason why it doesn't suffice to achieve syntax like my
>     "Example
>      >     2" by
>      >      > wrapping the entire pipeline in parentheses.  That's of no
>     use if
>      >     I want to
>      >      > re-run a selection that doesn't include the final close-paren.
>      >      >
>      >      > === Possible Solutions ===
>      >      >
>      >      > I can think of two, but maybe there are others.  The first
>     would make
>      >      > "Example 2" into valid code, and the second would allow
>     you to run a
>      >      > selection that included a trailing pipe.
>      >      >
>      >      >    Solution 1: Add a special case to how R is parsed, so
>     if the first
>      >      > (non-whitespace) token after an end-line is a pipe, that pipe
>      >     gets moved to
>      >      > before the end-line.
>      >      >      - Argument for: This lets you write code like example
>     2, which
>      >      > addresses the pain point around re-running part of a pipeline,
>      >     and has
>      >      > advantages for readability.  Also, since starting a line
>     with a pipe
>      >      > operator is currently invalid, the change wouldn't break any
>      >     working code.
>      >      >      - Argument against: It would make the behavior of %>%
>      >     inconsistent with
>      >      > that of other binary operators in R.  (However, this objection
>      >     might not
>      >      > apply to the new pipe, |>, which I understand is being
>      >     implemented as a
>      >      > syntax transformation rather than a binary operator.)
>      >      >
>      >      >    Solution 2: Ignore the pipe operator if it occurs as
>     the final
>      >     token of
>      >      > the code being executed.
>      >      >      - Argument for: This would mean the user could select and
>      >     re-run the
>      >      > first few lines of a longer pipeline (selecting *entire*
>     lines),
>      >     avoiding
>      >      > the difficulties described above.
>      >      >      - Argument against: This means that %>% would be
>     valid even
>      >     if it
>      >      > occurred without a right-hand side, which is inconsistent
>     with other
>      >      > operators in R.  (But, as above, this objection might not
>     apply
>      >     to |>.)
>      >      > Also, this solution still doesn't enable the syntax of
>     "Example
>      >     2", with
>      >      > its readability benefit.
>      >      >
>      >      > Thanks for reading this and considering it.
>      >      >
>      >      > - Tim Goodman
>      >      >
>      >      >       [[alternative HTML version deleted]]
>      >      >
>      >      > ______________________________________________
>      >      > [hidden email] <mailto:[hidden email]>
>     <mailto:[hidden email] <mailto:[hidden email]>>
>     mailing list
>      >      > https://stat.ethz.ch/mailman/listinfo/r-devel
>     <https://stat.ethz.ch/mailman/listinfo/r-devel>
>      >     <https://stat.ethz.ch/mailman/listinfo/r-devel
>     <https://stat.ethz.ch/mailman/listinfo/r-devel>>
>      >      >
>      >
>


Re: the pipe |> and line breaks in pipelines

Gregory Warnes-2
In reply to this post by bbolker
Many languages treat a final backslash (“\”) character as a line
continuation, so that an expression can span multiple lines, and I’ve often
wished for this in R, particularly so that I could put `else` on a separate
line at the top level. It would also allow infix operators like the new pipe
operator `|>` to be aligned at the start of a line, which I would heartily
endorse.
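
For readers unfamiliar with the `else` issue mentioned here, a small sketch using only base R shows the behaviour. The strings are illustrative and stand in for lines typed at top level.

```r
# At top level the parser considers `if (TRUE) "yes"` complete at the
# newline, so an `else` on the next line is a syntax error.
top_level <- 'if (TRUE) "yes"\nelse "no"'
res <- try(parse(text = top_level), silent = TRUE)
inherits(res, "try-error")       # TRUE

# Inside braces the parser keeps reading, so the same layout is accepted.
in_braces <- '{\n  if (TRUE) "yes"\n  else "no"\n}'
eval(parse(text = in_braces))    # "yes"
```

This is the same end-of-line rule that prevents a pipe from starting a line.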



Re: the pipe |> and line breaks in pipelines

Timothy Goodman
In reply to this post by Duncan Murdoch-2
On Wed, Dec 9, 2020 at 1:03 PM Duncan Murdoch <[hidden email]> wrote:

> > Then I could run any number of lines with pipes at the
> > start and no special character at the end, and have it treated as a
> > single pipeline.  I suppose that'd need to be a feature offered by the
> > environment (RStudio's RNotebooks in my case).  I could wrap my
> > pipelines in parentheses (to make the "pipes at start of line" syntax
> > valid R code), and then could use the hypothetical "submit selected code
> > ignoring line-breaks" feature when running just the first part of the
> > pipeline -- i.e., selecting full lines, but starting after the opening
> > paren so as not to need to insert a closing paren.
>
> I think I don't understand your workflow enough to comment on this.
>
> Duncan
>
>
>
What I mean is, I could add parentheses as suggested to let me put the
pipes at the start of the line, like this:

    (                                  # Line 1
        my_data_frame                  # Line 2
        |> filter(some_condition)      # Line 3
        |> group_by(some_column)       # Line 4
        |> summarize(some_functions)   # Line 5
    )                                  # Line 6

If this gives me an unexpected result, I might want to re-run just up
through line 3 and check the output, to see if something is wrong with the
"filter" (e.g., my condition matched less data than expected).  Ideally, I
could do this without changing the code, by just selecting lines 2 and 3
and pressing Ctrl+Enter (my environment's shortcut for "run selected
code").  But it wouldn't work, because without including the parentheses
these lines would be treated as two separate expressions, the second of
which is invalid since it starts with a pipe.  Alternatively, I could
include line 1 in my selection (along with lines 2 and 3), but it wouldn't
work without having to type a new closing parenthesis after line 3, and
then delete it afterwards.  Or, I could select and comment out lines 4 and
5, and then select and run all 6 lines.  But none of those are as
convenient as just being able to select and run lines 2 and 3 (which is
what I'm used to being able to do in several other languages which support
pipelines).  And though it may seem a minor annoyance, when I'm working a
lot with dplyr code I find myself wanting to do something like this many
times per day.
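
The behaviour described above can be reproduced with `parse()`. This is only a sketch: it assumes R >= 4.1 for `|>`, and base R functions (`sort`, `rev`) stand in for the dplyr verbs so that it runs without extra packages.

```r
# Wrapped in parentheses, newlines are ignored and the whole block is
# parsed as one pipeline.
full <- "(\n  c(3, 1, 2)\n  |> sort()\n  |> rev()\n)"
eval(parse(text = full))         # 3 2 1

# Selecting only the first two inner lines (no parentheses) fails to parse:
# the first line is a complete expression, so the second starts with `|>`.
partial <- "c(3, 1, 2)\n|> sort()"
res <- try(parse(text = partial), silent = TRUE)
inherits(res, "try-error")       # TRUE
```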

What *would* work well would be if I could write the code as above, but
then when I want to select and re-run just lines 2 and 3, I would use some
keyboard shortcut that meant "pass this code to the parser as a single
line, with line breaks (and comments) removed".  Then it would be run like
    my_data_frame |> filter(some_condition)
instead of producing an error.  That'd require the environment I'm using --
RStudio -- to support this feature, but wouldn't require any change to how
R is parsed.  From the replies here, I'm coming around to thinking that'd
be the better option.
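
For illustration, the transformation described above could be sketched in R roughly as follows. This is only a sketch under stated assumptions, not an RStudio feature: run_as_single_line() is a hypothetical helper name, it assumes R 4.1 or later (for the native pipe), and its comment stripping is naive (it assumes no '#' appears inside a string literal).

```r
# Hypothetical helper: collapse the selected lines into one line and run it.
# Assumes R >= 4.1 and that no '#' occurs inside a string literal.
run_as_single_line <- function(lines, envir = parent.frame()) {
  no_comments <- sub("#.*$", "", lines)   # naive comment stripping
  pieces <- trimws(no_comments)           # drop indentation and trailing blanks
  pieces <- pieces[nzchar(pieces)]        # drop lines that are now empty
  code <- paste(pieces, collapse = " ")   # line breaks become single spaces
  eval(parse(text = code), envir = envir)
}

# "Lines 2 and 3" of a parenthesized pipeline, selected without the parens.
selection <- c(
  "c(3, 1, 2)        # Line 2",
  "    |> sort()     # Line 3"
)
run_as_single_line(selection)
```

Because the joined text is an ordinary one-line pipeline, the parser never sees a line that starts with a pipe.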

- Tim

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: the pipe |> and line breaks in pipelines

Timothy Goodman
In reply to this post by Kevin Ushey
I'm thrilled to hear it!  Thank you!

- Tim

P.S. I re-added the r-devel list, since Kevin's reply was sent just to me,
but I thought there might be others interested in knowing about those work
items.  (I hope that's OK, email-etiquette-wise.)

On Wed, Dec 9, 2020 at 1:10 PM Kevin Ushey <[hidden email]> wrote:

> You might be surprised to learn that the RStudio IDE engineers might
> be receptive to such a feature request. :-)
>
> https://github.com/rstudio/rstudio/issues/8589
> https://github.com/rstudio/rstudio/issues/8590
>
> (Spoiler alert: I am one of the RStudio IDE engineers, and I think
> this would be worth doing.)
>
> Best,
> Kevin
>
> On Wed, Dec 9, 2020 at 12:16 PM Timothy Goodman <[hidden email]>
> wrote:
> >
> > Since my larger concern is being able to conveniently select and re-run
> > part of a multiline pipeline, I don't think wrapping in parentheses will
> > help.  I'd have to add a closing paren at the end of the selection, which
> > is no more convenient than having to highlight all but the last pipe.
> > (Admittedly, wrapping in parens would allow my preferred syntax of having
> > pipes at the start of the line, but I don't think that's worth the cost of
> > having to constantly move the trailing paren around.)
> >
> > My back-up plan if I fail to persuade you all is indeed to beg the
> > developers of RStudio to add an option to do the transformation I would
> > want when executing notebook code, but I'm anticipating the objection of "R
> > Notebooks shouldn't transform invalid R code into valid R code."  I was
> > hoping "Let's make this new pipe |> work differently in a case that's
> > currently an error" would be an easier sell.
> >
> > Also, just to reiterate: Only one of my two suggestions really requires
> > caring about newlines.  (That's my preferred solution, but I understand
> > it'd be the bigger change.)  The other suggestion just amounts to ignoring
> > a final |> when code is submitted for execution.
> >
> >  -Tim
> >
> > On Wed, Dec 9, 2020 at 11:58 AM Kevin Ushey <[hidden email]>
> > wrote:
> >>
> >> I agree with Duncan that the right solution is to wrap the pipe
> >> expression with parentheses. Having the parser treat newlines
> >> differently based on whether the session is interactive, or on what
> >> type of operator happens to follow a newline, feels like a pretty big
> >> can of worms.
> >>
> >> I think this (or something similar) would accomplish what you want
> >> while still retaining the nice aesthetics of the pipe expression, with
> >> a minimal amount of syntax "noise":
> >>
> >> result <- (
> >>   data
> >>     |> op1()
> >>     |> op2()
> >> )
> >>
> >> For interactive sessions where you wanted to execute only parts of the
> >> pipeline at a time, I could see that being accomplished by the editor
> >> -- it could transform the expression so that it could be handled by R,
> >> either by hoisting the pipe operator(s) up a line, or by wrapping the
> >> to-be-executed expression in parentheses for you. If such a style of
> >> coding became popular enough, I'm sure the developers of such editors
> >> would be interested and willing to support this ...
> >>
> >> Perhaps more importantly, it would be much easier to accomplish than a
> >> change to the behavior of the R parser, and it would be work that
> >> wouldn't have to be maintained by the R Core team.
> >>
> >> Best,
> >> Kevin
> >>
> >> On Wed, Dec 9, 2020 at 11:34 AM Timothy Goodman <[hidden email]>
> >> wrote:
> >> >
> >> > If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> >> > command in the Notebook environment I'm using) I certainly *would* expect
> >> > R to treat it as a complete statement.
> >> >
> >> > But what I'm talking about is a different case, where I highlight a
> >> > multi-line statement in my notebook:
> >> >
> >> >     my_data_frame_1
> >> >         |> filter(some_conditions_1)
> >> >
> >> > and then press Ctrl+Enter.  Or, I suppose the equivalent would be to run
> >> > an R script containing those two lines of code, or to run a multi-line
> >> > statement like that from the console (which in RStudio I can do by
> >> > pressing Shift+Enter between the lines.)
> >> >
> >> > In those cases, R could either (1) Give an error message [the current
> >> > behavior], or (2) understand that the first line is meant to be piped to
> >> > the second.  The second option would be significantly more useful, and is
> >> > almost certainly what the user intended.
> >> >
> >> > (For what it's worth, there are some languages, such as JavaScript, that
> >> > consider the first token of the next line when determining if the
> >> > previous line was complete.  JavaScript's rules around this are overly
> >> > complicated, but a rule like "a pipe following a line break is treated as
> >> > continuing the previous line" would be much simpler.  And while it might
> >> > be objectionable to treat the operator %>% differently from other
> >> > operators, the addition of |>, which isn't truly an operator at all,
> >> > seems like the right time to consider it.)
> >> >
> >> > -Tim
> >> >
> >> > On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <[hidden email]>
> >> > wrote:
> >> >
> >> > > The requirement for operators at the end of the line comes from the
> >> > > interactive nature of R.  If you type
> >> > >
> >> > >      my_data_frame_1
> >> > >
> >> > > how could R know that you are not done, and are planning to type the
> >> > > rest of the expression
> >> > >
> >> > >        %>% filter(some_conditions_1)
> >> > >        ...
> >> > >
> >> > > before it should consider the expression complete?  The way languages
> >> > > like C do this is by requiring a statement terminator at the end.  You
> >> > > can also do it by wrapping the entire thing in parentheses ().
> >> > >
> >> > > However, be careful: Don't use braces:  they don't work.  And parens
> >> > > have the side effect of removing invisibility from the result (which is
> >> > > a design flaw or bonus, depending on your point of view).  So I actually
> >> > > wouldn't advise this workaround.
> >> > >
> >> > > Duncan Murdoch


Re: the pipe |> and line breaks in pipelines

Bill Dunlap-2
In reply to this post by Timothy Goodman
When I am debugging a function with code like
    x <- f1(x)
    x <- f2(x)
    result <- f3(x)
I will often slip a line like '.GlobalEnv$tmp1 <- x' between the first two
lines and '.GlobalEnv$tmp2 <- x' between the last two lines and look at the
intermediate results, 'tmp1' and 'tmp2' in the global environment, later to
see what is going on.

The equivalent expression using pipes is
    x |>
        f1() |>
        f2() |>
        f3() -> result
You can slip lines like 'print() |>' between the pipe parts because
print(x) returns x, but it is more tedious to add assignment lines.  One
could define a function like
    pipe_save <- function(x, name, envir=.GlobalEnv) {
        envir[[name]] <- x
        x
    }
and then put lines like 'pipe_save("tmp1") |>' into the pipe sequence to
save intermediate results.
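
As a rough usage sketch (assuming R 4.1 or later for the native pipe), the helper can be dropped between any two steps. The definition is repeated so the snippet runs on its own; the 'scratch' environment and the toy data are assumptions made for this example, used only to avoid writing into the global environment.

```r
# Save an intermediate pipeline value under 'name' and pass the data through.
pipe_save <- function(x, name, envir = .GlobalEnv) {
  envir[[name]] <- x
  x
}

scratch <- new.env()                      # hypothetical holding area for temporaries

result <- c(4, 9, 16) |>
  sqrt() |>
  pipe_save("tmp1", envir = scratch) |>   # snapshot after sqrt()
  sum()

scratch$tmp1   # the intermediate value: 2 3 4
result         # the final value: 9
```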

A function like
    pipe_eval <- function(x, expr) {
        eval(substitute(expr), list(x=x))
        x
    }
would make it easy to call plot() or summary(), etc., on the piped data
with lines like
    'pipe_eval(print(summary(x))) |>'
inserted into the pipe sequence.

E.g.,

> (1/(1:10)) |>
+    pipe_eval(print(summary(x))) |>
+    range() |>
+    pipe_eval(print(x)) |>
+    sum()
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1000  0.1295  0.1833  0.2929  0.3125  1.0000
[1] 0.1 1.0
[1] 1.1

You could even add if(isTRUE(getOption("debug"))) before the eval() or
assignment so that these functions do nothing unless debugging is enabled,
which makes it easy to turn debugging on and off with
options(debug=TRUE/FALSE).
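
A minimal sketch of that toggle, assuming R 4.1 or later; pipe_eval_debug() is a hypothetical name chosen here, and "debug" is simply the option name suggested above:

```r
# Like pipe_eval(), but evaluates 'expr' only when options(debug = TRUE) is set.
pipe_eval_debug <- function(x, expr) {
  if (isTRUE(getOption("debug"))) {
    eval(substitute(expr), list(x = x))   # 'x' inside expr refers to the piped data
  }
  x                                       # always pass the data through unchanged
}

options(debug = TRUE)                     # inspection steps are active
total <- (1:4) |>
  pipe_eval_debug(print(range(x))) |>     # prints the range while debugging
  sum()

options(debug = FALSE)                    # the same pipeline now runs silently
total <- (1:4) |>
  pipe_eval_debug(print(range(x))) |>
  sum()
```

The input is parenthesized because the pipe binds more tightly than ':' and the arithmetic operators.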

-Bill


On Wed, Dec 9, 2020 at 1:58 PM Timothy Goodman <[hidden email]>
wrote:
>
> On Wed, Dec 9, 2020 at 1:03 PM Duncan Murdoch <[hidden email]>
> wrote:
>
> > > Then I could run any number of lines with pipes at the
> > > start and no special character at the end, and have it treated as a
> > > single pipeline.  I suppose that'd need to be a feature offered by the
> > > environment (RStudio's RNotebooks in my case).  I could wrap my
> > > pipelines in parentheses (to make the "pipes at start of line" syntax
> > > valid R code), and then could use the hypothetical "submit selected code
> > > ignoring line-breaks" feature when running just the first part of the
> > > pipeline -- i.e., selecting full lines, but starting after the opening
> > > paren so as not to need to insert a closing paren.
> >
> > I think I don't understand your workflow enough to comment on this.
> >
> > Duncan
> >
> >
> >
> What I mean is, I could add parentheses as suggested to let me put the
> pipes at the start of the line, like this:
>
>     (                                  # Line 1
>         my_data_frame                  # Line 2
>         |> filter(some_condition)      # Line 3
>         |> group_by(some_column)       # Line 4
>         |> summarize(some_functions)   # Line 5
>     )                                  # Line 6
>
> If this gives me an unexpected result, I might want to re-run just up
> through line 3 and check the output, to see if something is wrong with the
> "filter" (e.g., my condition matched less data than expected).  Ideally, I
> could do this without changing the code, by just selecting lines 2 and 3
> and pressing Ctrl+Enter (my environment's shortcut for "run selected
> code").  But it wouldn't work, because without including the parentheses
> these lines would be treated as two separate expressions, the second of
> which is invalid since it starts with a pipe.  Alternatively, I could
> include line 1 in my selection (along with lines 2 and 3), but it wouldn't
> work without having to type a new closing parenthesis after line 3, and
> then delete it afterwards.  Or, I could select and comment out lines 4 and
> 5, and then select and run all 6 lines.  But none of those are as
> convenient as just being able to select and run lines 2 and 3 (which is
> what I'm used to being able to do in several other languages which support
> pipelines).  And though it may seem a minor annoyance, when I'm working a
> lot with dplyr code I find myself wanting to do something like this many
> times per day.
>
> What *would* work well would be if I could write the code as above, but
> then when I want to select and re-run just lines 2 and 3, I would use some
> keyboard shortcut that meant "pass this code to the parser as a single
> line, with line breaks (and comments) removed".  Then it would be run like
>     my_data_frame |> filter(some_condition)
> instead of producing an error.  That'd require the environment I'm using --
> RStudio -- to support this feature, but wouldn't require any change to how
> R is parsed.  From the replies here, I'm coming around to thinking that'd
> be the better option.
>
> - Tim
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
