Extract Element of String with R's Regex

foolishbrat
Hi,

I have this string, from which I want to extract some of its elements:

> x <- "Best-K Gene 11340 211952_at RANBP5  Noc= 3 - 2  LL= -963.669 -965.35"

yielding this array

[1] "211952_at"  "RANBP5" "2"



In Perl we would do it this way:

__BEGIN__
my @needed = ();
my $str = "Best-K Gene 11340 211952_at RANBP5  Noc= 3 - 2  LL= -963.669 -965.35";
# \s+ tolerates the runs of spaces before "Noc=" and "LL="
$str =~ /Best-K Gene \d+ (\w+) (\w+)\s+Noc= \d - (\d)\s+LL= (.*)/;
push @needed, ($1, $2, $3);
__END__

How can we achieve this with R?

 - E.W.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Extract Element of String with R's Regex

Simon Blomberg-4
How about:

unlist(strsplit(x, split=" "))[c(4:5,10)]

That perl script looks like a good reason to avoid perl.

Simon.
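
A note on the one-liner above: position 10 is correct only because the string as posted contains double spaces, and splitting on a single space turns each of those into an empty token. A minimal sketch that splits on runs of whitespace instead, so the positions no longer depend on the exact spacing (the names tokens and needed are illustrative only):

```r
x <- "Best-K Gene 11340 211952_at RANBP5  Noc= 3 - 2  LL= -963.669 -965.35"

# Split on one or more whitespace characters, so double spaces
# do not produce empty tokens.
tokens <- strsplit(x, "[[:space:]]+")[[1]]

# With empty tokens gone, the wanted fields sit at positions 4, 5 and 9.
needed <- tokens[c(4, 5, 9)]
print(needed)
```

This still assumes every input line has the same number of fields in the same order.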

On Fri, 2008-08-01 at 15:13 +0900, Edward Wijaya wrote:

> How can we achieve this with R?
--
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
Room 320 Goddard Building (8)
T: +61 7 3365 2506
http://www.uq.edu.au/~uqsblomb
email: S.Blomberg1_at_uq.edu.au

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer can
be extracted from a given body of data. - John Tukey.

Re: Extract Element of String with R's Regex

Stephen Tucker
In the example below, a straight application of strsplit() is probably the simplest solution. In a more general case where it may be desirable to match patterns, a combination of sub() or gsub() with strsplit() might do the trick:

> x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35"
> patt <- "Best-K Gene \\d+ (\\w+) (\\w+) Noc= \\d - (\\d) LL= (.*)"

> unlist(strsplit(gsub(patt,"\\1,\\2,\\3",x,perl=TRUE),","))
[1] "211952_at" "RANBP5"    "2"  
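
One caveat with the sub()/gsub() route: when the pattern does not match, the input comes back unchanged, so the strsplit() step returns the whole string rather than the three fields. A minimal sketch of a guard, assuming a hypothetical helper named extract_fields (not from this thread or from any package):

```r
# Hypothetical helper: returns the first three capture groups, or an
# empty character vector when the pattern does not match at all.
extract_fields <- function(s, pattern) {
  if (!grepl(pattern, s, perl = TRUE)) {
    return(character(0))
  }
  # The pattern covers the rest of the string, so one substitution suffices.
  joined <- sub(pattern, "\\1,\\2,\\3", s, perl = TRUE)
  strsplit(joined, ",", fixed = TRUE)[[1]]
}

patt <- "Best-K Gene \\d+ (\\w+) (\\w+) Noc= \\d - (\\d) LL= (.*)"
x_single <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35"
x_double <- "Best-K Gene 11340 211952_at RANBP5  Noc= 3 - 2  LL= -963.669 -965.35"

fields_single <- extract_fields(x_single, patt)  # the three fields
fields_double <- extract_fields(x_double, patt)  # no match: double spaces
print(fields_single)
print(length(fields_double))
```

Joining on a comma assumes none of the captured fields can contain a comma, which holds for this input but is an assumption in general.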

Alternatively, you may want to take a look at the gsubfn package - it is quite useful. Still learning to use it myself...

> library(gsubfn)
> unlist(strapply(x,patt,function(x1,x2,x3) c(x1,x2,x3),backref=-3,perl=TRUE))
[1] "211952_at" "RANBP5"    "2"  
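
For readers who want to stay in base R, the closest counterpart to Perl's $1, $2, $3 is probably regexec() combined with regmatches(), both available in current base R. The following is a sketch of that idea, not something proposed in the thread; the names m and needed are illustrative:

```r
x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35"
patt <- "Best-K Gene \\d+ (\\w+) (\\w+) Noc= \\d - (\\d) LL= (.*)"

# regexec() records the position of the full match and of every
# parenthesised group; regmatches() turns those positions into strings.
m <- regmatches(x, regexec(patt, x))[[1]]

# m[1] is the full match, m[2] onward are the capture groups.
needed <- m[2:4]
print(needed)
```

If the pattern does not match, m has length zero and m[2:4] is all NA, so a length check on m is a reasonable guard before indexing.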





----- Original Message ----
From: Simon Blomberg <[hidden email]>
To: Edward Wijaya <[hidden email]>
Cc: [hidden email]
Sent: Thursday, July 31, 2008 11:48:23 PM
Subject: Re: [R] Extract Element of String with R's Regex

How about:

unlist(strsplit(x, split=" "))[c(4:5,10)]

That perl script looks like a good reason to avoid perl.

Simon.

Re: Extract Element of String with R's Regex

Gabor Grothendieck
On Fri, Aug 1, 2008 at 7:31 AM, Stephen Tucker <[hidden email]> wrote:

> Alternatively, you may want to take a look at the gsubfn package - it is quite useful. Still learning to use it myself...
>
>> library(gsubfn)
>> unlist(strapply(x,patt,function(x1,x2,x3) c(x1,x2,x3),backref=-3,perl=TRUE))
> [1] "211952_at" "RANBP5"    "2"
>

This last one can be slightly simplified:

> strapply(x, patt, c, backref = -3, perl = TRUE)[[1]]
[1] "211952_at" "RANBP5"    "2"
