Pull Stock Symbol Out of String

Pull Stock Symbol Out of String

Sparks, John James
Dear R Helpers,

My regex skills are beginner to intermediate and banging around the web
has not resulted in a solution to the problem below so I hope that one of
you who has mad skills can help me out.

I want to extract the stock ticker--AMT-- out of the string

American Tower Corporation (REIT) (AMT)

The presence of the other parenthetical text (REIT) makes this difficult.
Please note that the string may or may not have an interfering set of
characters such as the (REIT) so the solution needs to be generalizable to
the last set of characters that are contained in parentheses in the larger
string.  So an example of a string without the interfering (REIT) would be

Aetna Inc. (AET)


Your assistance would be very much appreciated.

--John Sparks

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Pull Stock Symbol Out of String

William Dunlap
The following gets the last parenthesized sequence of non-parenthesis
characters, parentheses included. Strings that contain no such sequence
are returned unchanged:
  > sub(".*(\\([^()]+\\))([^()]*)$", "\\1",
          c("Aetna(AET)",
             "American Tower Corp(REIT)(ATC)",
             "No Parens",
             "Qwerty Corp (ASD)(ZXC)(123) extra stuff"))
  [1] "(AET)"     "(ATC)"     "No Parens" "(123)"
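
Since the call above keeps the parentheses and hands back non-matching strings unchanged, a small wrapper may be handy when only the bare symbol is wanted. The following is a minimal sketch, not part of the original reply: last_paren is a hypothetical name, and returning NA for strings without parentheses is an assumption about what the caller would prefer.

```r
# Hypothetical helper (not from the thread): return the contents of the
# last parenthesized group, without the parentheses, or NA if there is none.
last_paren <- function(x) {
  # Same idea as the pattern above, but the capture group sits inside
  # the literal parentheses so they are dropped from the result.
  pattern <- ".*\\(([^()]+)\\)[^()]*$"
  out <- sub(pattern, "\\1", x)
  # sub() leaves non-matching strings untouched, so flag them explicitly.
  out[!grepl(pattern, x)] <- NA_character_
  out
}

last_paren(c("Aetna(AET)",
             "American Tower Corp(REIT)(ATC)",
             "No Parens",
             "Qwerty Corp (ASD)(ZXC)(123) extra stuff"))
# returns "AET", "ATC", NA, "123"
```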

Bill Dunlap
TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf
> Of Sparks, John James
> Sent: Tuesday, April 08, 2014 11:29 AM
> To: [hidden email]
> Subject: [R] Pull Stock Symbol Out of String

Re: Pull Stock Symbol Out of String

Boris Steipe
In reply to this post by Sparks, John James
You could try:

# Use ?regexec and ?regmatches to return a list of grouped matches.
# Use \\(  and \\) to match literal parentheses.
# Use ... to match three characters.
# Use $ to match at end of string.

s1 <- "American Tower Corporation (REIT) (AMT)"
s2 <- "Aetna Inc. (AET)"
getSym <- function(s) {regmatches(s, regexec("\\((...)\\)$", s))[[1]][2]}

getSym(s1) # [1] "AMT"
getSym(s2) # [1] "AET"
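
The `...` in the pattern above assumes a three-character symbol, and getSym() looks at one string at a time. If symbols of other lengths or a whole vector of names are expected, something along these lines might work. This is only a sketch: getSymAny is a hypothetical name, "Example Holdings (EXMPL)" is a made-up input, and returning NA when nothing matches is an assumption.

```r
# Hypothetical variant of getSym() (not from the thread): accepts a
# character vector and symbols of any length.
getSymAny <- function(s) {
  # [^()]+ matches one or more non-parenthesis characters;
  # [[:space:]]*$ tolerates trailing whitespace before the end of string.
  m <- regmatches(s, regexec("\\(([^()]+)\\)[[:space:]]*$", s))
  # Each element is c(full match, group 1), or character(0) on no match.
  vapply(m,
         function(x) if (length(x) >= 2) x[2] else NA_character_,
         character(1))
}

getSymAny(c("American Tower Corporation (REIT) (AMT)",
            "Aetna Inc. (AET)",
            "Example Holdings (EXMPL)",
            "No ticker here"))
# returns "AMT", "AET", "EXMPL", NA
```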

Cheers,
B.





Re: Pull Stock Symbol Out of String

arun kirshna
In reply to this post by Sparks, John James
Hi,
You may try:
library(qdap)
str1 <- c("American Tower Corporation (REIT) (AMT)", "Aetna Inc. (AET)")
unlist(lapply(bracketXtract(str1, "round"), tail, 1), use.names = FALSE)
#[1] "AMT" "AET"
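
If installing qdap is not an option, roughly the same idea (collect every parenthesized piece, then keep the last one) can be sketched in base R. This is an illustration under assumptions rather than a drop-in replacement for bracketXtract(): the names groups and last_group are hypothetical, and NA is returned for strings that contain no parentheses.

```r
str1 <- c("American Tower Corporation (REIT) (AMT)", "Aetna Inc. (AET)")

# All parenthesized pieces per string, parentheses still attached.
groups <- regmatches(str1, gregexpr("\\([^()]*\\)", str1))

# Keep the last piece of each string and strip its parentheses;
# strings without any parenthesized piece give NA.
last_group <- vapply(groups, function(g) {
  if (length(g) == 0) return(NA_character_)
  gsub("[()]", "", g[length(g)])
}, character(1))

last_group
# returns "AMT", "AET"
```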

A.K.

