string handling

karena
I have a data.frame like the following:
var1        var2
9G/G09    abd89C/T90
10A/T9    32C/C
90G/G      A/A
.             .
.             .
.             .
10T/C      00G/G90

What I want is to get the letters immediately to the left and right of '/'. For example, for "9G/G09" I only want "G" and "G", and for "abd89C/T90" I only want "C" and "T". How can I get these?

thank you,

karena

Re: string handling

jholtman
Try this:

> x <- "1234C/Tasdf"
> y <- strsplit(sub("^.*(.)/(.).*", "\\1 \\2", x), " ")[[1]]
> y
[1] "C" "T"
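The example above handles a single string. Since sub() is vectorized, the same pattern can be run over a whole column. The following is only a sketch under assumptions: the data frame name df and the result name alleles are made up for illustration, and the columns are assumed to be character vectors.

```r
# Hypothetical data frame built from the rows shown in the question
df <- data.frame(var1 = c("9G/G09", "10A/T9", "90G/G", "10T/C"),
                 var2 = c("abd89C/T90", "32C/C", "A/A", "00G/G90"),
                 stringsAsFactors = FALSE)

# Reduce each string to "<left> <right>", the two characters flanking "/"
pairs <- sub("^.*(.)/(.).*$", "\\1 \\2", df$var1)

# Split each pair on the space and stack the results into a two-column matrix
alleles <- do.call(rbind, strsplit(pairs, " ", fixed = TRUE))
alleles
```

Each row of alleles corresponds to one row of df; the same two steps can be repeated for df$var2.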


On Thu, Jun 3, 2010 at 2:18 PM, karena <[hidden email]> wrote:

> What I want is to get the letters which are on the left and right of '/'.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: string handling

Wu Gong
In reply to this post by karena
Hope it helps.

text <- "var1        var2
9G/G09    abd89C/T90
10A/T9    32C/C
90G/G      A/A"

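x <- read.table(textConnection(text), header = TRUE)

# Character just before the "/" (the greedy .* consumes everything up to it)
x$var1.1 <- sub(".*(.)/.*", "\\1", x$var1)
# Character just after the "/"
x$var1.2 <- sub(".*/(.).*", "\\1", x$var1)
x$var2.1 <- sub(".*(.)/.*", "\\1", x$var2)
x$var2.2 <- sub(".*/(.).*", "\\1", x$var2)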

Re: string handling

Hadley Wickham-2
On Thu, Jun 3, 2010 at 4:06 PM, Wu Gong <[hidden email]> wrote:
>
> Hope it helps.
>
> text <- "var1        var2
> 9G/G09    abd89C/T90
> 10A/T9    32C/C
> 90G/G      A/A"
>
> x <- read.table(textConnection(text), header = T)

Or with the stringr package:

library(stringr)
str_match(x$var1, "(.)/(.)")
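
str_match returns a character matrix whose first column is the whole match and whose remaining columns are the capture groups, so the two letters are in columns 2 and 3. For readers without stringr, a rough base R sketch that produces the same layout is shown here; the names m and alleles are only illustrative, and x is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained
x <- data.frame(var1 = c("9G/G09", "10A/T9", "90G/G"),
                var2 = c("abd89C/T90", "32C/C", "A/A"),
                stringsAsFactors = FALSE)

# regexec() records the positions of the full match and of each capture group;
# regmatches() extracts them as one character vector per input string
m <- do.call(rbind, regmatches(x$var1, regexec("(.)/(.)", x$var1)))

# Column 1 is the whole match (e.g. "G/G"); columns 2 and 3 are the two letters
alleles <- m[, 2:3, drop = FALSE]
alleles
```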

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: string handling

Gabor Grothendieck
In reply to this post by karena
This solution, using strapply from the gsubfn package, is along the same
lines as the stringr solution.  First we read in the data using
as.is = TRUE so that we get character rather than factor columns.  If
your data is already in factor columns, just replace strapply(x, ...)
with strapply(as.character(x), ...) below.  Then we lapply over the
columns of DF, calling strapply on each one.  See the home page at
http://gsubfn.googlecode.com for more.

> Lines <- "var1        var2
+ 9G/G09    abd89C/T90
+ 10A/T9    32C/C
+ 90G/G      A/A"
>
> library(gsubfn)
> DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE)
> lapply(DF, function(x) strapply(x, "(.)/(.)", c, simplify = rbind))
$var1
     [,1] [,2]
[1,] "G"  "G"
[2,] "A"  "T"
[3,] "G"  "G"

$var2
     [,1] [,2]
[1,] "C"  "T"
[2,] "C"  "C"
[3,] "A"  "A"


A slight simplification is also possible using gsubfn's ability to
represent a one-line function as a formula.  We just preface lapply
with fn$, and formulas appearing in the arguments (subject to certain
rules) are then interpreted as functions.  Here, the formula in the
second argument to lapply is interpreted as the anonymous function we
used above:

> fn$lapply(DF, x ~ strapply(x, "(.)/(.)", c, simplify = rbind))
$var1
     [,1] [,2]
[1,] "G"  "G"
[2,] "A"  "T"
[3,] "G"  "G"

$var2
     [,1] [,2]
[1,] "C"  "T"
[2,] "C"  "C"
[3,] "A"  "A"


Re: string handling

karena
In reply to this post by jholtman
Thank you guys very much, these help!!

Re: string handling

Gabor Grothendieck
In reply to this post by Gabor Grothendieck
Here is a slightly simpler variant of the strapply solution:

> lapply(DF, strapply, "(.)/(.)", c, simplify = rbind)
$var1
     [,1] [,2]
[1,] "G"  "G"
[2,] "A"  "T"
[3,] "G"  "G"

$var2
     [,1] [,2]
[1,] "C"  "T"
[2,] "C"  "C"
[3,] "A"  "A"
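
The variant above works because lapply forwards any extra arguments to the function it applies, so no anonymous wrapper is needed. The same mechanism can be sketched with base R's sub in place of strapply. This is an illustration under assumptions, not part of the gsubfn solution: the names left and right are made up, and DF is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data with character columns
DF <- data.frame(var1 = c("9G/G09", "10A/T9", "90G/G"),
                 var2 = c("abd89C/T90", "32C/C", "A/A"),
                 stringsAsFactors = FALSE)

# pattern and replacement are matched by name, so each column of DF
# fills the remaining argument of sub(), namely x
left  <- lapply(DF, sub, pattern = ".*(.)/.*", replacement = "\\1")
right <- lapply(DF, sub, pattern = ".*/(.).*", replacement = "\\1")

left
right
```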


On Fri, Jun 4, 2010 at 8:08 AM, Gabor Grothendieck
<[hidden email]> wrote:

> This solution using strapply in gsubfn is along the same lines as the
> stringr solution.
> [...]