"+" operator on characters revisited

Vitalie S.

Hello everyone!

Motivated by the recent post on Stack Overflow
http://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r

I wonder what the current state of the argument is on making "+"
concatenate character vectors. The "+" method is still sealed for
signature("character", "character") in the current version of R.
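The sealing can be checked directly. The sketch below assumes a current R session with the methods package. It tries to register an S4 "+" method for two character vectors; the concatenating body is only a placeholder, and the call is expected to fail with a "sealed" error rather than define anything.

```r
library(methods)

# Try to register an S4 method for "+" on two character vectors.
# "+" is a primitive and "character" is a basic class, so the methods
# package treats this signature as sealed and setMethod() stops with an error.
msg <- tryCatch({
  setMethod("+", signature("character", "character"),
            function(e1, e2) paste(e1, e2, sep = ""))  # placeholder body
  "method was defined"
}, error = function(e) conditionMessage(e))

print(msg)  # the error message, which says the method is sealed
```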

The four-year-old R-devel thread
https://www.stat.math.ethz.ch/pipermail/r-devel/2006-August/038991.html
on the same topic stopped without reaching any definite conclusion.

The only definite argument raised in that thread against a "+" operator
was its lack of commutativity (as if one had to prove algebraic
theorems in R).

Another useful suggestion, introducing cat0() and paste0() for the
common use of cat and paste with sep="", was not adopted by core R
either.
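For reference, paste0() was later added to base R (in version 2.15.0) as a shorthand for paste(..., sep = ""). The cat0() below is a hypothetical one-line wrapper written only for illustration; it is not a base R function.

```r
x <- "abc"

# paste0() (base R >= 2.15.0) is paste() with sep = "".
paste0(x, 1:2)              # "abc1" "abc2"
paste(x, 1:2, sep = "")     # same result

# Hypothetical cat0(): not part of base R, just cat() with sep = "".
cat0 <- function(...) cat(..., sep = "")
cat0("pi = ", pi, "\n")     # prints: pi = 3.141593
```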

Thanks,
Vitalie

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: "+" operator on characters revisited

Gabor Grothendieck
On Sat, Jan 22, 2011 at 3:08 PM, Vitalie S. <[hidden email]> wrote:

> Another useful suggestion, introducing cat0() and paste0() for the
> common use of cat and paste with sep="", was not adopted by core R
> either.

The gsubfn package has always had a paste0 function and I would be
happy to remove it if the core adds it.

The gsubfn package also supports quasi-Perl-style string interpolation,
which can sometimes be used to avoid paste in the first place.
Just prefix the function in question with fn$, like this:

library(gsubfn)
fn$cat("pi = $pi\n")
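The interpolation idea can be sketched in a few lines of base R. interp() is a hypothetical helper, not gsubfn's fn$ implementation; it assumes a single template string and only handles plain $name references, each of which must name a variable holding a single value.

```r
# Hypothetical helper (not gsubfn's fn$): replace each $name in a single
# string with the formatted value of the variable `name`, looked up from
# the caller's environment.
interp <- function(template, env = parent.frame()) {
  force(env)
  m <- gregexpr("\\$[A-Za-z.][A-Za-z0-9._]*", template)
  keys <- regmatches(template, m)[[1]]
  vals <- vapply(keys, function(k) {
    format(get(substring(k, 2), envir = env))  # drop the leading "$"
  }, character(1), USE.NAMES = FALSE)
  regmatches(template, m) <- list(vals)
  template
}

cat(interp("pi = $pi\n"))   # pi = 3.141593
```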

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Re: "+" operator on characters revisited

Vitalie S.
Gabor Grothendieck <[hidden email]> writes:

> The gsubfn package also supports quasi-Perl-style string interpolation,
> which can sometimes be used to avoid paste in the first place.
> Just prefix the function in question with fn$, like this:
>
> library(gsubfn)
> fn$cat("pi = $pi\n")

Thanks for the tip. Not bad indeed.
Almost as readable as

cat("pi = " + pi + "\n")

Re: "+" operator on characters revisited

Gabor Grothendieck
On Sun, Jan 23, 2011 at 6:56 AM, Vitalie S. <[hidden email]> wrote:

> Thanks for the tip. Not bad indeed.
> Almost as readable as
>
> cat("pi = " + pi + "\n")

To me the + can be substantially less readable.  The need to
repeatedly quote everything makes it just as bad as paste.  Compare
the following and try to figure out whether there is a quoting error in
the + and paste solutions.  Distinguishing the single and double
quotes is pretty difficult there but simple in the fn$ and sprintf
solutions.  Even without quoting errors, the constant need to
interpose quotes makes it hard to read.

library(sqldf) # also pulls in gsubfn, which has fn$ and paste0
plant <- "Qn1"
treatment <- "nonchilled"

# using + (does not work in base R; shown for comparison only)
# sqldf("select * from CO2 where Plant = '" + plant + "' and Treatment = '" + treatment + "' limit 10")

# using paste0, also from gsubfn
sqldf(paste0("select * from CO2 where Plant = '", plant,
             "' and Treatment = '", treatment, "' limit 10"))

# using paste, almost the same as the last one
sqldf(paste("select * from CO2 where Plant = '", plant,
            "' and Treatment = '", treatment, "' limit 10", sep = ""))

# With the Perl-like interpolation you don't need the repeated quoting
# in the first place, so it's much clearer.

# using Perl-like interpolation from gsubfn
fn$sqldf("select * from CO2 where Plant = '$plant' and Treatment = '$treatment' limit 10")

# sprintf is nearly as good as the Perl-like interpolation, except that you
# have to match up % codes and arguments, which is a bit of a nuisance,
# and there are more parentheses.  On the other hand, it does offer
# fancier formatting codes (though this example does not illustrate that):

# using sprintf
sqldf(sprintf("select * from CO2 where Plant = '%s' and Treatment = '%s' limit 10",
              plant, treatment))
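The string-building styles in the comparison above can be checked without sqldf, since only the query text matters. This base R sketch (paste0 is in base R from 2.15.0 onwards) confirms that the concatenation and sprintf versions produce exactly the same SQL.

```r
plant <- "Qn1"
treatment <- "nonchilled"

# Concatenation: each literal piece is quoted separately, and the SQL
# single quotes sit right next to R's double quotes.
q_paste <- paste0("select * from CO2 where Plant = '", plant,
                  "' and Treatment = '", treatment, "' limit 10")

# Format string: one literal with two %s slots filled in order.
q_sprintf <- sprintf("select * from CO2 where Plant = '%s' and Treatment = '%s' limit 10",
                     plant, treatment)

identical(q_paste, q_sprintf)   # TRUE
```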

Re: "+" operator on characters revisited

Hadley Wickham
In reply to this post by Vitalie S.
> Another useful suggestion, introducing cat0() and paste0() for the
> common use of cat and paste with sep="", was not adopted by core R
> either.

stringr has str_c which is a replacement for paste with sep = "" and
automatic removal of length 0 inputs.
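A base R sketch of what removing length-0 inputs changes. paste() treats a zero-length argument as "" and keeps the separators around it; paste_drop0() is a hypothetical helper (not stringr's str_c) that filters such arguments out before pasting.

```r
# Base paste(): a zero-length argument is treated as "", so both
# separators around it survive.
paste("a", character(0), "b", sep = "-")        # "a--b"

# Hypothetical helper (not stringr::str_c): drop zero-length arguments
# first, then paste the rest with the given separator.
paste_drop0 <- function(..., sep = "", collapse = NULL) {
  args <- Filter(function(x) length(x) > 0, list(...))
  do.call(paste, c(args, list(sep = sep, collapse = collapse)))
}

paste_drop0("a", character(0), "b", sep = "-")  # "a-b"
```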

Hadley


--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Re: "+" operator on characters revisited

Peter Dalgaard
In reply to this post by Vitalie S.

On Jan 22, 2011, at 21:08 , Vitalie S. wrote:

> The only definite argument raised in that thread against a "+" operator
> was its lack of commutativity (as if one had to prove algebraic
> theorems in R).

I think the real killer was associativity, combined with coercion rules:

Is "x"+1+2 supposed to be equal to "x12" or "x3"?
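To make the ambiguity concrete: R's parser groups a chain of binary + from the left, so "x"+1+2 would be evaluated as ("x"+1)+2. The sketch below uses a hypothetical permissive cplus() (paste when either side is character, add otherwise) to show what each grouping would give; this is not how base R behaves.

```r
# The parse tree: the inner (first-evaluated) call is "x" + 1.
e <- quote("x" + 1 + 2)
e[[2]]                    # "x" + 1

# Hypothetical permissive rule, for illustration only.
cplus <- function(e1, e2) {
  if (is.character(e1) || is.character(e2)) paste0(e1, e2) else e1 + e2
}

cplus(cplus("x", 1), 2)   # "x12": the left-to-right grouping R's parser uses
cplus("x", cplus(1, 2))   # "x3":  the grouping a reader might expect
cplus(cplus(1, 2), "x")   # "3x":  leading numbers are summed before pasting
```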

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]

Re: "+" operator on characters revisited

Spencer Graves
In reply to this post by Hadley Wickham
Consider the following from Python 2.6.5:


 >>> 'abc'+ 2

Traceback (most recent call last):
   File "<pyshell#0>", line 1, in <module>
     'abc'+ 2
TypeError: cannot concatenate 'str' and 'int' objects
 >>> 'abc'+'2'
'abc2'
 >>>


       Spencer


On 1/23/2011 8:09 AM, Hadley Wickham wrote:
> stringr has str_c which is a replacement for paste with sep = "" and
> automatic removal of length 0 inputs.
>
> Hadley
>
>


--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567

Re: "+" operator on characters revisited

Spencer Graves
In reply to this post by Peter Dalgaard
On 1/23/2011 8:50 AM, peter dalgaard wrote:
> I think the real killer was associativity, combined with coercion rules:
>
> Is "x"+1+2 supposed to be equal to "x12" or "x3"?
>
       Excellent:  This seems like a good reason to follow Python:  
Allow "a+b" with a character vector "a" only if "b" is also a character
vector (or factor?).


       This example raises another question:  If we allow "a+b" for "a"
and "b" both character vectors (and give an error if one is numeric),
what do we do with factors?  If "a" is a factor, return a factor?
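One possible answer to both questions, sketched as a hypothetical strict_plus() function rather than an actual method for "+": concatenate only when both sides are character, treat factors as their labels and return a plain character vector (not a factor), and signal an error for mixed character/numeric input, as Python does. Returning a factor would be an equally defensible design choice.

```r
# Hypothetical strict rule in the spirit of Python's str + str.
strict_plus <- function(e1, e2) {
  as_chr <- function(x) if (is.factor(x)) as.character(x) else x
  a <- as_chr(e1)
  b <- as_chr(e2)
  if (is.character(a) && is.character(b)) return(paste0(a, b))
  if (is.character(a) || is.character(b))
    stop("cannot concatenate character and non-character arguments")
  e1 + e2                                   # ordinary numeric addition
}

strict_plus("abc", "2")                     # "abc2"
strict_plus("a", factor(c("x", "y")))       # "ax" "ay" (character, not factor)
try(strict_plus("abc", 2))                  # error, like the Python session
```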


       Spencer

Re: "+" operator on characters revisited

Duncan Murdoch
In reply to this post by Peter Dalgaard
On 23/01/2011 11:50 AM, peter dalgaard wrote:

> I think the real killer was associativity, combined with coercion rules:
>
> Is "x"+1+2 supposed to be equal to "x12" or "x3"?
>

As I pointed out at the time, we don't even have associativity for
integer addition.  For example in

-1L + .Machine$integer.max + 1L

the two possibilities

(-1L + .Machine$integer.max) + 1L

and

-1L + (.Machine$integer.max + 1L)

give different results.  When I try it now without parentheses, I get
the same answer as the first one, but I don't believe we guarantee that
that will always be so.
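The two groupings can be reproduced as follows, assuming R's usual 32-bit integer type. Integer overflow in R yields NA with a warning rather than wrapping around, which is why the second grouping differs.

```r
big <- .Machine$integer.max    # 2147483647, the largest R integer

# Grouping 1: the subtraction happens first, so every intermediate
# value stays within the integer range.
(-1L + big) + 1L               # 2147483647

# Grouping 2: big + 1L overflows; R returns NA with a warning
# ("NAs produced by integer overflow") and the NA propagates.
-1L + (big + 1L)               # NA

# Without parentheses the chain is parsed from the left, like grouping 1.
-1L + big + 1L                 # 2147483647
```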

Duncan Murdoch

Re: "+" operator on characters revisited

Vitalie S.
In reply to this post by Spencer Graves
Spencer Graves <[hidden email]> writes:

> On 1/23/2011 8:50 AM, peter dalgaard wrote:
>> I think the real killer was associativity, combined with coercion rules:
>>
>> Is "x"+1+2 supposed to be equal to "x12" or "x3"?
>>
>       Excellent:  This seems like a good reason to follow Python:  Allow "a+b" with a character vector "a" only if
> "b" is also a character vector (or factor?).
>
>       This example raises another question:  If we allow "a+b" for "a" and "b" both character vectors (and give an
> error if one is numeric), what do we do with factors?  If "a" is a factor,
> return a factor?

If we define a custom %+% as:

    `%+%` <- function(a, b){
        if(is.character(a) || is.character(b))
            paste(as.character(a), as.character(b), sep="")
        else
            a + b
    }

then, because %any% operators have higher precedence than binary +, we have:

    "a" %+% 1 %+% 2
    ## [1] "a12"

and

    str("a" %+% factor(1:2))
    ## chr [1:2] "a1" "a2"

So if + on characters behaved "as if" it had slightly higher precedence
than other + operators, that might solve the problem reasonably well.
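The precedence point can be checked with the %+% definition above, repeated here so the block runs on its own. Because %any% operators bind more tightly than binary +, mixing %+% with ordinary numeric addition needs parentheses.

```r
`%+%` <- function(a, b) {
  if (is.character(a) || is.character(b))
    paste(as.character(a), as.character(b), sep = "")
  else
    a + b
}

# Parenthesised: the numeric sum is computed first, then pasted.
"a" %+% (1 + 2)          # "a3"

# Unparenthesised: parsed as ("a" %+% 1) + 2, i.e. "a1" + 2,
# which base "+" rejects.
try("a" %+% 1 + 2)       # error: non-numeric argument to binary operator
```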

Vitalie.

Re: "+" operator on characters revisited

Spencer Graves

On 1/23/2011 12:15 PM, Vitalie S. wrote:

>      "a" %+% 1 %+% 2
>      ## [1] "a12"

No:  'a' %+% (1 %+% 2)  != ('a' %+% 1) %+% 2, as Peter Dalgaard noted:
'a3' != 'a12'.
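Both groupings can be evaluated with the same %+% definition, again repeated so the block is self-contained. The unparenthesised "a" %+% 1 %+% 2 is parsed from the left and therefore matches the first line.

```r
`%+%` <- function(a, b) {
  if (is.character(a) || is.character(b))
    paste(as.character(a), as.character(b), sep = "")
  else
    a + b
}

("a" %+% 1) %+% 2        # "a12": "a1" is then pasted with 2
"a" %+% (1 %+% 2)        # "a3":  1 %+% 2 is numeric addition
```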


Re: "+" operator on characters revisited

Davor Cubranic
In reply to this post by Gabor Grothendieck
On 2011-01-23, at 4:34 AM, Gabor Grothendieck wrote:

> On Sun, Jan 23, 2011 at 6:56 AM, Vitalie S. <[hidden email]> wrote:
>> Thanks for the tip. Not bad indeed.
>> Almost as readable as
>>
>> cat("pi = " + pi + "\n")
>
> To me the + can be substantially less readable.  The need to
> repeatedly quote everything makes it just as bad as paste.  Compare
> the following and try to figure out whether there is a quoting error in
> the + and paste solutions.  Distinguishing the single and double
> quotes is pretty difficult there but simple in the fn$ and sprintf
> solutions.  Even without quoting errors, the constant need to
> interpose quotes makes it hard to read.

That may be a matter of taste, but FWIW it seems that shell-style string interpolation (using the dollar prefix) has been going out of style in recent scripting languages. Ruby uses the expression substitution construct ("#{expr}"), while Python has "str.format"; both allow arbitrary expressions.

And most editors have syntax highlighting that distinguishes strings from other program elements. This makes quoting errors pretty obvious.

Davor

Re: "+" operator on characters revisited

Gabor Grothendieck
On Mon, Jan 24, 2011 at 2:15 PM, Davor Cubranic <[hidden email]> wrote:

> That may be a matter of taste, but FWIW it seems that shell-style string interpolation (using the dollar prefix) has been going out of style in recent scripting languages. Ruby uses the expression substitution construct ("#{expr}"), while Python has "str.format"; both allow arbitrary expressions.
>

fn$ supports that too, by evaluating any expression enclosed in backticks, `...`:

> library(sqldf)
> fn$sqldf("select * from BOD where demand > `mean(BOD$demand)` limit 2")
  Time demand
1    3     19
2    4     16
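A base R sketch of the backtick form, extending the same interpolation idea to arbitrary expressions. interp_expr() is hypothetical and not gsubfn's implementation; it assumes a single template string in which each backtick expression evaluates to a single value. Because it evaluates text as code, the template must come from a trusted source.

```r
# Hypothetical helper (not gsubfn's implementation): replace each
# `expression` in a single string with the formatted value of that
# expression, evaluated in the caller's environment.
interp_expr <- function(template, env = parent.frame()) {
  force(env)
  m <- gregexpr("`[^`]*`", template)
  exprs <- regmatches(template, m)[[1]]
  vals <- vapply(exprs, function(e) {
    code <- substring(e, 2, nchar(e) - 1)          # strip the backticks
    format(eval(parse(text = code), envir = env))  # must yield one value
  }, character(1), USE.NAMES = FALSE)
  regmatches(template, m) <- list(vals)
  template
}

demand <- c(8, 10, 12)
interp_expr("select * from BOD where demand > `mean(demand)` limit 2")
# "select * from BOD where demand > 10 limit 2"
```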


> And most editors have syntax highlighting that distinguishes strings from other program elements. This makes quoting errors pretty obvious.
>

That only makes it slightly easier to handle the mess.  It's better to
get rid of the quotes in the first place.
