Better use with gsub

Doran, Harold
I have done an embarrassingly bad job using a mixture of gsub and strsplit to solve a problem. Below is sample code showing what I start with (the vector xx). I want to end up with two vectors, x and y, that contain only the digits found in xx.

Advice from any regex users is most welcome.

Harold

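xx <- c("S24:57", "S24:86", "S24:119", "S24:129", "S24:138", "S24:163")
yy <- gsub("S", "", xx)                          # remove the letter S
a1 <- gsub(":", " ", yy)                         # turn the colon into a space
a2 <- sapply(a1, function(x) strsplit(x, ' '))   # split each string on the space
x <- as.numeric(sapply(a2, function(x) x[1]))    # digits before the colon
y <- as.numeric(sapply(a2, function(x) x[2]))    # digits after the colon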

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Better use with gsub

Gabor Grothendieck
On Fri, Aug 1, 2014 at 10:46 AM, Doran, Harold <[hidden email]> wrote:

> I have done an embarrassingly bad job using a mixture of gsub and strsplit to solve a problem. Below is sample code showing what I have to start with (the vector xx) and I want to end up with two vectors x and y that contain only the digits found in xx.

library(gsubfn)
strapply(xx, "\\d+", as.numeric, simplify = TRUE)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   24   24   24   24   24   24
[2,]   57   86  119  129  138  163
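
For readers without gsubfn, a similar matrix can be built in base R and then split into the two vectors originally asked for. This is only a sketch, not part of the reply above: it assumes every element of xx contains exactly two runs of digits, and the name m is purely illustrative.

```r
xx <- c("S24:57", "S24:86", "S24:119", "S24:129", "S24:138", "S24:163")

# Extract every run of digits from each string, then convert to numeric.
# With two runs per string, sapply() simplifies the result to a 2 x 6 matrix.
m <- sapply(regmatches(xx, gregexpr("[0-9]+", xx)), as.numeric)

x <- m[1, ]  # digits before the colon
y <- m[2, ]  # digits after the colon
```

Row indexing keeps the matrix available for inspection while still producing the separate x and y vectors.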


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: Better use with gsub

arun kirshna
In reply to this post by Doran, Harold


You could try:
library(stringr)
 
simplify2array(str_extract_all(xx, perl('(?<=[A-Z]|\\:)\\d+')))
     [,1] [,2] [,3]  [,4]  [,5]  [,6]
[1,] "24" "24" "24"  "24"  "24"  "24"
[2,] "57" "86" "119" "129" "138" "163"
A.K.


Re: Better use with gsub

arun kirshna
I forgot the as.numeric conversion:

sapply(str_extract_all(xx, perl('(?<=[A-Z]|\\:)\\d+')), as.numeric)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   24   24   24   24   24   24
[2,]   57   86  119  129  138  163







Re: Better use with gsub

Marc Schwartz-3
In reply to this post by Doran, Harold
On Aug 1, 2014, at 9:46 AM, Doran, Harold <[hidden email]> wrote:

> I have done an embarrassingly bad job using a mixture of gsub and strsplit to solve a problem. Below is sample code showing what I have to start with (the vector xx) and I want to end up with two vectors x and y that contain only the digits found in xx.

If a matrix is a satisfactory result, rather than two separate vectors:

sapply(strsplit(gsub("S", "", xx), split = ":"), as.numeric)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   24   24   24   24   24   24
[2,]   57   86  119  129  138  163
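
If a data frame is acceptable, the same strip-then-split idea can also be handed to read.table, which does the splitting and the numeric conversion in one step. This is a hedged sketch rather than part of the reply above: the column names x and y and the object name d are assumptions chosen to match the original question, and it presumes every string has the form S<digits>:<digits>.

```r
xx <- c("S24:57", "S24:86", "S24:119", "S24:129", "S24:138", "S24:163")

# Drop the leading "S", then let read.table split each string on ":".
# Each element of the character vector is read as one line of input.
d <- read.table(text = sub("^S", "", xx), sep = ":", col.names = c("x", "y"))

x <- d$x  # first number of each pair
y <- d$y  # second number of each pair
```

Here read.table infers the column types itself, so no explicit as.numeric call is needed.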


Regards,

Marc Schwartz


Re: Better use with gsub

Marek Szatkowski
In reply to this post by Doran, Harold
How about:

x <- as.numeric(sub("^S([0-9]+):([0-9]+)$", "\\1", xx))
y <- as.numeric(sub("^S([0-9]+):([0-9]+)$", "\\2", xx))
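
The two sub() calls above apply the same pattern twice. As a sketch of a single-pass variant, base R's regexec() together with regmatches() returns both capture groups at once. The names pattern and parts are illustrative, and the code assumes every element of xx matches the pattern; strings that do not match are not handled here.

```r
xx <- c("S24:57", "S24:86", "S24:119", "S24:129", "S24:138", "S24:163")
pattern <- "^S([0-9]+):([0-9]+)$"

# regexec() records the full match plus each capture group;
# regmatches() returns them as one character vector per element of xx.
# rbind stacks those vectors into a matrix: full match, group 1, group 2.
parts <- do.call(rbind, regmatches(xx, regexec(pattern, xx)))

x <- as.numeric(parts[, 2])  # first capture group
y <- as.numeric(parts[, 3])  # second capture group
```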


