Question about PERL lookahead construct in regex's

Question about PERL lookahead construct in regex's

Bert Gunter-2
Folks:

Consider:
> y <- "xx wt"

> grep(" +(?=t)",y, perl = TRUE)
integer(0)
## Unexpected. Lookahead construct does not find "t" after space
## But
> grep(" +(?=.+t)",y, perl = TRUE)
[1] 1
## Expected. Given pattern for **exact** match, lookahead finds it

My concern is:
?regexp says this:
"Patterns (?=...) and (?!...) are zero-width positive and negative lookahead
 *assertions*: they match if an attempt to match the ... forward from the
current position would succeed (or not), but use up no characters in the
string being processed."

But this appears to be imprecise (it confused me, anyway). The usual sense
of "matching" in regex's is "match the pattern somewhere in the string
going forward." But in the perl lookahead construct it apparently must
**exactly** match *everything* in the string that follows.

Questions:
Am I correct about this? If not, what do I misunderstand?
If I am correct, should the regex help be slightly modified to something
like:

"Patterns (?=...) and (?!...) are zero-width positive and negative lookahead
 *assertions*: they match if an attempt to **exactly** match all of ... forward
from the current position would succeed (or not), but use up no characters
in the string being processed."

Thanks.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Question about PERL lookahead construct in regex's

glsnow
I think that the current documentation is correct, but that does not
mean that it cannot be improved.

The key phrase for me is "from the current position"  which says to me
that the match needs to happen right there, not just somewhere in the
rest of the string.

If you used the expression " +t", you would expect it to match only
if the t came immediately after the last space, not somewhere later in
the string. It is the same with the look-ahead.
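
To make the comparison concrete, here is a minimal sketch using the same string y from the question. The variable names other than y are made up for illustration, and the comments describe what I would expect from grepl() and regexpr() with perl = TRUE:

```r
y <- "xx wt"

# Ordinary pattern: "t" must come directly after the run of spaces.
plain_t <- grepl(" +t", y)                    # FALSE: the next character is "w"

# Lookahead: the same adjacency requirement applies, but the
# looked-ahead text is not consumed by the match.
ahead_t <- grepl(" +(?=t)", y, perl = TRUE)   # FALSE
ahead_w <- grepl(" +(?=w)", y, perl = TRUE)   # TRUE

# Zero-width: the reported match is only the space, not " w".
m <- regexpr(" +(?=w)", y, perl = TRUE)
matched <- regmatches(y, m)                   # " "
```

The last two lines show the "use up no characters" part of the help text: the match starts at the space and has length 1, even though the "w" had to be there for the assertion to succeed.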


--
Gregory (Greg) L. Snow Ph.D.
[hidden email]

Re: Question about PERL lookahead construct in regex's

Stefan Evert-3

> On 10 Aug 2020, at 18:36, Bert Gunter <[hidden email]> wrote:
>
> But this appears to be imprecise (it confused me, anyway). The usual sense
> of "matching" in regex's is "match the pattern somewhere in the string
> going forward." But in the perl lookahead construct it apparently must
> **exactly** match *everything* in the string that follows.
>
> Questions:
> Am I correct about this? If not, what do I misunderstand?

I think you're confused about the terminology.  To _match_ a regular expression is to find a substring described by the regexp at a given starting point; what you have in mind is to _search_ a string for matches of a regular expression.

Python uses this terminology in its regexp functions (re.match vs. re.search), and, judging from the documentation you cited, so do Perl and PCRE in their docs.
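
In R terms, a rough sketch of the distinction, again with the string y from the question. R's grepl() always searches, so "^" is used here only as a stand-in for "match at one fixed position", and the variable names are made up for illustration:

```r
y <- "xx wt"

# Searching: the engine may start its attempt at any position in the string.
search_t <- grepl("t", y, perl = TRUE)            # TRUE

# Matching at a fixed position: "^" pins the attempt to the start of the string.
match_t_at_start <- grepl("^t", y, perl = TRUE)   # FALSE
match_x_at_start <- grepl("^x", y, perl = TRUE)   # TRUE

# Inside (?=...) the attempt is pinned to the current position, i.e. the
# character right after the space. Prefixing ".*" lets the lookahead skip
# ahead, which gives search-like behaviour.
pinned   <- grepl(" (?=t)", y, perl = TRUE)       # FALSE
searched <- grepl(" (?=.*t)", y, perl = TRUE)     # TRUE
```

So the lookahead does not have to match everything that follows; it only has to match starting exactly where the engine currently stands.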

Best,
Stefan
Re: Question about PERL lookahead construct in regex's

Bert Gunter-2
Thank you.
That indeed dispels my brain fog!

Best,
Bert


