

It turns out that allowing a bare function expression on the
right-hand side (RHS) of a pipe creates opportunities for confusion
and mistakes that are too risky. So we will be dropping support for
this from the pipe operator.
The case of a RHS call that wants to receive the LHS result in an
argument other than the first can be handled with just implicit first
argument passing along the lines of
mtcars |> subset(cyl == 4) |> (\(d) lm(mpg ~ disp, data = d))()
It was hoped that allowing a bare function expression would make this
more convenient, but it has issues as outlined below. We are exploring
some alternatives, and will hopefully settle on one soon after the
holidays.
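As a rough check of the workaround above, the following sketch (assuming R 4.1 or later, where `|>` and the `\(d)` shorthand are available) compares the piped form with the equivalent nested call. The names `fit_piped` and `fit_nested` are only illustrative.

```r
# Piped form: the parenthesized anonymous function is called with the
# LHS value as its first (and only) argument.
fit_piped <- mtcars |>
    subset(cyl == 4) |>
    (\(d) lm(mpg ~ disp, data = d))()

# Equivalent nested call written without the pipe.
fit_nested <- lm(mpg ~ disp, data = subset(mtcars, cyl == 4))

# Both fits use the same rows, so the coefficients should agree.
all.equal(coef(fit_piped), coef(fit_nested))
```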
The basic problem, pointed out in a comment on Twitter, is that in
expressions of the form
1 |> \(x) x + 1 -> y
1 |> \(x) x + 1 |> \(y) x + y
everything after the \(x) is parsed as part of the body of the
function. So these are parsed along the lines of
1 |> \(x) { x + 1 -> y }
1 |> \(x) { x + 1 |> \(y) x + y }
In the first case the result is assigned to a (useless) local
variable. Someone writing this is more likely to have intended to
assign the result to a global variable, as this would:
(1 |> \(x) x + 1) -> y
In the second case the 'x' in 'x + y' refers to the local variable 'x'
in the first RHS function. Someone writing this is more likely to have
meant
(1 |> \(x) x + 1) |> \(y) x + y
with 'x' in 'x + y' now referring to a global variable:
> x <- 2
> 1 |> \(x) x + 1 |> \(y) x + y
[1] 3
> (1 |> \(x) x + 1) |> \(y) x + y
[1] 4
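The greedy parsing of a function body can be seen in any R version with ordinary `function` syntax, without relying on the pipe at all. The sketch below inspects the parsed body and then emulates the two readings of the second example using plain calls; `reading_a` and `reading_b` are names made up for this illustration.

```r
# The body of a function expression extends as far right as possible,
# so the rightward assignment ends up inside the function.
e <- quote(function(x) x + 1 -> y)
body_expr <- e[[3]]          # third element of the call is the body
print(body_expr)             # y <- x + 1

x <- 2                       # outer x, playing the role of the global

# Reading A (what the parser does): the second function is created inside
# the first, so 'x' in 'x + y' is the first function's argument.
reading_a <- (function(x) (function(y) x + y)(x + 1))(1)

# Reading B (what was probably intended): the first call is completed
# before the second function sees its result, so 'x' is the outer x.
reading_b <- (function(y) x + y)((function(x) x + 1)(1))

c(reading_a, reading_b)      # 3 4
```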
These issues arise with any approach in R that allows a bare function
expression on the RHS of a pipe operation. They also arise in other
languages with pipe operators. For example, here is the last example
in Julia:
julia> x = 2
2
julia> 1 |> x -> x + 1 |> y -> x + y
3
julia> ( 1 |> x -> x + 1 ) |> y -> x + y
4
Even though proper use of parentheses can work around these issues,
the likelihood of making mistakes that are hard to track down is too
high. So we will disallow the use of bare function expressions on the
right-hand side of a pipe.
Best,
luke

After some discussions we've settled on a syntax of the form
mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)
to handle cases where the pipe lhs needs to be passed to an argument
other than the first of the function called on the rhs. This seems
to be a reasonable balance between making these nonstandard cases
easy to see but still easy to write. This is now committed to R-devel.
Best,
luke
Bill
On Tue, Jan 12, 2021 at 12:01 PM Dirk Eddelbuettel
On 12/01/2021 3:52 p.m., Bill Dunlap wrote:
> '=>' can be defined as a function. E.g., it could be the logical "implies"
> function:
> > `=>` <- function(x, y) !x || y
> > TRUE => FALSE
> [1] FALSE
> > FALSE => TRUE
> [1] TRUE
> It might be nice then to have deparse() display it as an infix operator
> instead of the current prefix:
> > deparse(quote(p => q))
> [1] "`=>`(p, q)"
> There was a user who recently wrote asking for an infix operator like -> or
> => that would deparse nicely for use in some sort of model specification.
The precedence of it as an operator is determined by what makes sense in
the pipe construction. Currently precedence appears to be
:: :::             access variables in a namespace
$ @                component / slot extraction
[ [[               indexing
^                  exponentiation (right to left)
- +                unary minus and plus
:                  sequence operator
%any%              special operators (including %% and %/%)
* /                multiply, divide
+ -                (binary) add, subtract
< > <= >= == !=    ordering and comparison
!                  negation
& &&               and
| ||               or
=>                 PIPE BIND
|>                 PIPE
~                  as in formulae
-> ->>             rightwards assignment
<- <<-             assignment (right to left)
=                  assignment (right to left)
?                  help (unary and binary)
(Most of this is taken from ?Syntax, but I added the new operators in
based on the gram.y file). So
A & B => C & D
would appear to be parsed as
(A & B) => (C & D)
I think this also makes sense; do you?
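The grouping can be checked for operators that released parsers already accept. Since `=>` needs the development build, this sketch uses `|` as a stand-in for an operator that binds more loosely than `&`; it only illustrates how to inspect the parse, not the behaviour of `=>` itself.

```r
# Parse an expression mixing '&' with a lower-precedence operator.
e <- quote(A & B | C & D)

# The top-level call is the lower-precedence operator...
top <- as.character(e[[1]])      # "|"

# ...and each operand is a complete '&' expression.
lhs <- deparse(e[[2]])           # "A & B"
rhs <- deparse(e[[3]])           # "C & D"

cat(sprintf("(%s) %s (%s)\n", lhs, top, rhs))
```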
Duncan Murdoch
>
> When used with |>, the parser will turn the |> and => into an ordinary
> looking function call so deparsing is irrelevant.
> > deparse(quote(x |> tmp => f(7,arg2=tmp)))
> [1] "f(7, arg2 = x)"
>
> Bill
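The same parser-level rewriting can be observed for the plain pipe in released versions of R (4.1 or later is assumed here). The symbols `x`, `f` and `g` are never evaluated; `quote()` only captures what the parser produced.

```r
# The pipe is resolved at parse time: the quoted expression is already
# an ordinary call with the LHS inserted as the first argument.
e <- quote(x |> f(7))
deparse(e)                       # "f(x, 7)"

# A chain of pipes nests the calls from the inside out.
e2 <- quote(x |> f(7) |> g(arg2 = 1))
deparse(e2)                      # "g(f(x, 7), arg2 = 1)"
```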
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


I agree that the precedence looks reasonable. E.g.,
> str.language(quote(A > 0 & A<=B & B <= C => A <= C & 0 < C))
language: `=>`(A > 0 & A <= B & B <= C, A <= C ...
 symbol: =>
 language: A > 0 & A <= B & B <= C
  symbol: &
  language: A > 0 & A <= B
   symbol: &
   language: A > 0
    symbol: >
    symbol: A
    double: 0
   language: A <= B
    symbol: <=
    symbol: A
    symbol: B
  language: B <= C
   symbol: <=
   symbol: B
   symbol: C
 language: A <= C & 0 < C
  symbol: &
  language: A <= C
   symbol: <=
   symbol: A
   symbol: C
  language: 0 < C
   symbol: <
   double: 0
   symbol: C
> str.language(quote(data |> tmp1 => f1(x, arg1=tmp1) |> f2(y) |> tmp3 =>
f3(z, arg3=tmp3)))
language: f3(z, arg3 = f2(f1(x, arg1 = data), y))
 symbol: f3
 symbol: z
 language: arg3 = f2(f1(x, arg1 = data), y)
  symbol: f2
  language: f1(x, arg1 = data)
   symbol: f1
   symbol: x
   symbol: arg1 = data
  symbol: y
Where str.language is
str.language <- function(expr, name = "", indent = 0)
{
    # Shorten long deparsed text, marking the cut with " ..."
    trim... <- function(string, width.cutoff) {
        if (nchar(string) > width.cutoff) {
            string <- sprintf("%.*s ...", width.cutoff-4, string)
        }
        string
    }
    # One line per node: indentation, type, optional argument name, code
    cat(sep="", rep(" ", indent), typeof(expr), ": ",
        if(length(name)==1 && nzchar(name)) { paste0(name, " = ") },
        trim...(deparse1(expr, width.cutoff=40), width.cutoff=40),
        "\n")
    # Recurse into the components of calls and other recursive objects
    if (is.recursive(expr)) {
        if (!is.list(expr)) {
            expr <- as.list(expr)
        }
        nms <- names(expr)
        for (i in seq_along(expr)) {
            str.language(expr[[i]], name=nms[[i]], indent = indent + 1)
        }
    }
    invisible(expr)
}


These are documented but still seem like serious deficiencies:
> f <- function(x, y) x + 10*y
> 3 |> x => f(x, x)
Error in f(x, x) : pipe placeholder may only appear once
> 3 |> x => f(1+x, 1)
Error in f(1 + x, 1) :
pipe placeholder must only appear as a top-level argument in the RHS call
Also note:
?"=>"
No documentation for ‘=>’ in specified packages and libraries:
you could try ‘??=>’
Gabor,
Although it might be nice if all imagined cases worked, there are many ways to work around the restrictions and get the results you want.
You may want to consider that it is easier to recognize the symbol you use (x in the examples) if it stands alone, is used exactly once, and appears in the list of function arguments. If you want x used multiple times, you can write a function that accepts x once and then invokes another function, reusing x as often as needed. The same goes for 1+x.
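A minimal sketch of that workaround, using only the plain pipe from R 4.1 or later rather than the `=>` form: `f` is the function from the earlier message, and `f_both` is a hypothetical wrapper introduced here.

```r
f <- function(x, y) x + 10 * y

# Hypothetical wrapper: takes the piped value once and reuses it.
f_both <- function(x) f(x, x)
r1 <- 3 |> f_both()                    # same as f(3, 3)

# An anonymous function handles the '1 + x' case in the same way.
r2 <- 3 |> (\(x) f(1 + x, 1))()        # same as f(4, 1)

c(r1, r2)                              # 33 14
```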
I do not know whether this restriction was chosen to make the substitution easier and faster to implement, or to avoid possible bad edge cases. Have you tested other ideas like:
3 |> x => f(x=5)
Or
3 |> x => f(x, y=x)
I mean cases where a default is supplied, though that may not make much sense here.
I am thinking of the concept of substitution as is often done for text or symbols. Often the substitution is done for the first instance found unless you specify you want a global change. In your examples, if only the first use of x would be replaced, the second naked x being left alone would be an error. If all instances were changed, what anomalies might happen? Giving a vector of length 1 containing the number 3 seems harmless enough to duplicate. But the pipeline can send all kinds of interesting data structures through including data.frames and arbitrary objects.
If
3 |> x => f(x, y=x)
were allowed then I think that
runif(1) |> x => f(x, y=x)
would be parsed as
f(runif(1), y=runif(1))
so runif(1) would be evaluated twice, leading to incorrect results from f().
Bill
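The concern can be illustrated without the `=>` syntax. This sketch contrasts a call in which the LHS expression has been textually duplicated with one that binds the value once through an anonymous function; the seed value and the names `dup` and `once` are arbitrary choices for the illustration.

```r
f <- function(x, y) x + 10 * y

# Textual substitution: runif(1) appears twice, so two different
# random numbers are drawn.
set.seed(1)
dup <- f(runif(1), y = runif(1))

# Binding once: the value is drawn a single time and reused.
set.seed(1)
once <- (function(x) f(x, y = x))(runif(1))

# 'once' equals 11 times the first draw; 'dup' does not match it.
set.seed(1)
u <- runif(1)
isTRUE(all.equal(once, 11 * u))        # TRUE
isTRUE(all.equal(dup, once))           # FALSE
```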
