regexec() bug in R 3.4.0

Weeks, Nathan
Hi,

In R 3.4.0, the "Pattern Matching and Replacement" documentation that describes regexec(), gregexpr(), etc. states that the "text" argument to regexec is a character vector, "or an object which can be coerced by as.character to a character vector":

     regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
             fixed = FALSE, useBytes = FALSE)

     x, text: a character vector where matches are sought, or an object
         which can be coerced by as.character to a character vector.
         Long vectors are supported.

However, in R 3.4.0, this coercion doesn't seem to automatically occur for the text argument of regexec(), whereas it does for gregexpr(), regexpr(), etc:

============================================================
$ R --vanilla

R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

...
> text <- as.factor("foobar")
> regexec("foo", text)
Error in regexec("foo", text) : invalid 'text' argument
> regexec("foo", as.character(text))
[[1]]
[1] 1
attr(,"match.length")
[1] 3
attr(,"useBytes")
[1] TRUE

> gregexpr("foo", text)
[[1]]
[1] 1
attr(,"match.length")
[1] 3
attr(,"useBytes")
[1] TRUE
============================================================

Is this a documentation issue, a bug in regexec(), or am I misunderstanding how it's supposed to behave?

Thanks,

--
Nathan Weeks
IT Specialist
USDA-ARS Corn Insects and Crop Genetics Research Unit
Crop Genome Informatics Laboratory
Iowa State University







This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: regexec() bug in R 3.4.0

Martin Maechler
>>>>> Weeks, Nathan <[hidden email]>
>>>>>     on Wed, 28 Jun 2017 17:11:01 +0000 writes:

> Hi,
>
> In R 3.4.0, the "Pattern Matching and Replacement" documentation that describes regexec(), gregexpr(), etc. states that the "text" argument to regexec is a character vector, "or an object which can be coerced by as.character to a character vector":
>
>      regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
>              fixed = FALSE, useBytes = FALSE)
>
>      x, text: a character vector where matches are sought, or an object
>          which can be coerced by as.character to a character vector.
>          Long vectors are supported.
>
> However, in R 3.4.0, this coercion doesn't seem to automatically occur for the text argument of regexec(), whereas it does for gregexpr(), regexpr(), etc:
>
> ============================================================
> $ R --vanilla
>
> R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
> Copyright (C) 2017 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> ...
> > text <- as.factor("foobar")
> > regexec("foo", text)
> Error in regexec("foo", text) : invalid 'text' argument

[...........]

I agree this is an inconsistency of documentation and behaviour,
and hence an (easy to work around) bug.
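
The workaround is the explicit coercion already shown in the original report. As a minimal sketch, reusing the reporter's factor example and assuming nothing beyond base R:

```r
# Sketch of the workaround: coerce the factor explicitly before calling regexec().
text <- as.factor("foobar")
m <- regexec("foo", as.character(text))

# Pass the same character vector to regmatches() to extract the matched substring.
matched <- regmatches(as.character(text), m)
```

Here `m[[1]]` holds the match position and its "match.length" attribute, as in the transcript above.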

I propose to fix the code (for consistency) rather than the
documentation and will do so if there's no dissent.

We have become wary and cautious with last-minute changes, so
this won't be in R 3.4.1 (due tomorrow, Friday) but probably
in "R 3.4.1 patched" later, and then in future versions.

Martin Maechler,
ETH Zurich
