reg expr that retains only bracketed text from strings

Nevil Amos
Hi

I am trying to extract only the text contained in brackets from a vector of
strings. Not all of the strings contain closed bracketed text; those should
return an empty string or NA.

This is what I have at the moment:


mystrings<-c("ABC","A(B)C","AB(C)")

substring(mystrings, regexpr("\\(|\\)", mystrings))


# this returns the whole string if there are no brackets:
[1] "ABC"  "(B)C" "(C)"


# my desired output:
#    [1]  ""  "(B)" "(C)"

many thanks for any suggestions
Nevil Amos
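
A note for later readers on why the attempt above returns "ABC" unchanged: regexpr() reports -1 when there is no match, and substring() treats a start position below 1 as 1, so the whole string comes back. The following is only a minimal sketch of guarding against that, assuming the goal is the first "(...)" group in each string; the names m and out are illustrative and not from the original post.

```r
mystrings <- c("ABC", "A(B)C", "AB(C)")

# regexpr() gives the start of the first match, or -1 when nothing matches
m <- regexpr("\\([^)]*\\)", mystrings)

# start from empty strings, then fill in only the elements that matched;
# regmatches() with a regexpr() result returns the matched substrings only
out <- character(length(mystrings))
out[m > 0] <- regmatches(mystrings, m)
out
# [1] ""    "(B)" "(C)"
```

Replacing character(length(mystrings)) with rep(NA_character_, length(mystrings)) would give NA instead of "" for the strings without brackets.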


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: reg expr that retains only bracketed text from strings

Eric Berger
Hi Nevil,
Here's one way to do it. (No doubt some regular-expression-gurus will have
more concise ways to get the job done.)

a1 <- sub(".*\\(","\\(",mystrings)
a2 <- sub("\\).*","\\)",a1)
a2[grep("\\(",a2,invert=TRUE)] <- ""
a2
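
For readers following along, the three steps above can be wrapped in a function so the intermediate logic is visible. This is a sketch only: the function name extract_last_bracketed is hypothetical, and the comments describe what each sub() call does under the assumption that each string holds at most one bracketed group of interest.

```r
# Hypothetical helper wrapping the three steps; the name is illustrative only.
extract_last_bracketed <- function(x) {
  # greedy ".*" runs up to the LAST "(", so any earlier groups are dropped
  a1 <- sub(".*\\(", "(", x)
  # drop everything after the first ")" that remains
  a2 <- sub("\\).*", ")", a1)
  # strings without "(" were left untouched above; blank them out
  a2[!grepl("(", a2, fixed = TRUE)] <- ""
  a2
}

extract_last_bracketed(c("ABC", "A(B)C", "AB(C)"))
# [1] ""    "(B)" "(C)"
extract_last_bracketed("a(b)c(d)e")
# [1] "(d)"
```

One caveat of this approach: a string with an opening bracket but no closing one, such as "AB(C", still contains "(" after the substitutions and so is returned from the bracket onwards rather than as "".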

HTH,
Eric




Re: reg expr that retains only bracketed text from strings

Ivan Krylov
On Wed, 12 Jun 2019 15:45:04 +1000
nevil amos <[hidden email]> wrote:

> # my desired output:
> #    [1]  ""  "(B)" "(C)"

(function(s) regmatches(
        s,
        gregexpr('\\([^)]+\\)', s)
))(c("ABC","A(B)C","AB(C)"))
# [[1]]
# character(0)
#
# [[2]]
# [1] "(B)"
#
# [[3]]
# [1] "(C)"

This matches all substrings that start with a (, are followed by a non-zero
number of non-) characters, and are then terminated by a ). If there are
multiple such substrings, all are returned.
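
Since regmatches() with gregexpr() returns a list, a character vector like the one requested needs one more step. A possible sketch, assuming either the first match or all matches pasted together is wanted; the names matches, first_only and all_joined, and the fourth test string, are illustrative additions:

```r
s <- c("ABC", "A(B)C", "AB(C)", "A(B)C(D)")
matches <- regmatches(s, gregexpr("\\([^)]+\\)", s))

# keep the first match per string, "" when there is none
first_only <- vapply(matches,
                     function(m) if (length(m)) m[1] else "",
                     character(1))
first_only
# [1] ""    "(B)" "(C)" "(B)"

# or paste all matches of a string together; character(0) collapses to ""
all_joined <- vapply(matches, paste, character(1), collapse = "")
all_joined
# [1] ""       "(B)"    "(C)"    "(B)(D)"
```

vapply() is used rather than sapply() so that the result is guaranteed to be a character vector with one element per input string.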

--
Best regards,
Ivan

Re: reg expr that retains only bracketed text from strings

R help mailing list-2
strcapture() can help here.

> mystrings<-c("ABC","A(B)C","AB(C)")
> strcapture("^[^{]*(\\([^(]*\\)).*$", mystrings,
proto=data.frame(InParen=""))
  InParen
1    <NA>
2     (B)
3     (C)

Classic regular expressions don't do so well with nested parentheses.
Perhaps a perl-style RE could do that.
> strcapture("^[^{]*(\\([^(]*\\)).*$", proto=data.frame(InParen=""),
x=c("()", "a(s(d)f)g"))
  InParen
1      ()
2   (d)f)
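
As a sketch of the perl-style idea mentioned above (not code from this post): with perl = TRUE the pattern can refer to itself through (?R), which lets it match balanced, nested parentheses. The pattern and the names pat, res and outer below are assumptions made for illustration.

```r
x <- c("ABC", "()", "a(s(d)f)g")

# "(" then any mix of non-parenthesis runs or a nested balanced group, then ")";
# (?R) recurses into the whole pattern, so inner pairs must balance too
pat <- "\\((?:[^()]++|(?R))*\\)"

res <- regmatches(x, gregexpr(pat, x, perl = TRUE))

# first (outermost) group per string, NA when there is none
outer <- vapply(res,
                function(m) if (length(m)) m[1] else NA_character_,
                character(1))
outer
# [1] NA        "()"      "(s(d)f)"
```

Here the nested string yields the whole outer group "(s(d)f)" instead of the partial "(d)f)" shown above.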

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: reg expr that retains only bracketed text from strings

Jim Lemon-4
Hi Nevil,
In case you are still having trouble with this, I wrote something in R
that should do what you want:

mystrings<-c("ABC","A(B)C","AB[C]","<A>BC","{AB}C")

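# For each delimiter pair, take the text between the first opening and the
# first closing delimiter (delimiters dropped); "" when no pair is found.
# If a string contains several pair types, the last pair in left/right wins.
get_enclosed<-function(x,left=c("(","[","<","{"),right=c(")","]",">","}")) {
 newx<-rep("",length(x))
 for(li in seq_along(left)) {
  for(xi in seq_along(x)) {
   # fixed=TRUE treats the delimiters as literal characters, not regex syntax
   lp<-regexpr(left[li],x[xi],fixed=TRUE)
   rp<-regexpr(right[li],x[xi],fixed=TRUE)
   # only accept a closing delimiter that comes after the opening one
   if(lp > 0 && rp > lp)
    newx[xi]<-substr(x[xi],lp+1,rp-1)
  }
 }
 return(newx)
}
get_enclosed(mystrings)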
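
For comparison, a loop-free sketch of the same idea using a single regular expression. This is a different technique from the function above, not a rewrite of it: it takes the first bracketed group of any type in each string. The pattern and the names pat, m, hit and enclosed are illustrative assumptions.

```r
mystrings <- c("ABC", "A(B)C", "AB[C]", "<A>BC", "{AB}C")

# one alternative per delimiter pair; bracket expressions such as [(]
# keep the delimiters literal
pat <- "[(][^)]*[)]|\\[[^]]*\\]|<[^>]*>|[{][^}]*[}]"

m <- regexpr(pat, mystrings)
hit <- regmatches(mystrings, m)

# strip the first and last character (the delimiters) from each match
enclosed <- character(length(mystrings))
enclosed[m > 0] <- substr(hit, 2, nchar(hit) - 1)
enclosed
# [1] ""   "B"  "C"  "A"  "AB"
```

Dropping the substr() step and assigning hit directly would keep the delimiters, as in the originally requested output.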

Jim
