How to extract the word before /// in a data frame containing many thousands of rows

Stephen HK Wong
Dear All,

I would appreciate it if you could help me with this. I have a data frame containing many thousands of rows, some of which contain the /// symbol, as shown in row 2. I want to extract the word before ///, which in this case is CDH23. Many thanks.
   Probe.Set.ID            Gene.Symbol
1  1552301_a_at                  CORO6
2  1552436_a_at CDH23 /// LOC100653137
3  1552477_a_at                   IRF6
4  1552685_a_at                  GRHL1
5    1552742_at                  KCNH8
6  1552752_a_at                  CADM2
7    1552799_at                TSNARE1
8  1552897_a_at                  KCNG3
9  1552902_a_at                  FOXP2
10   1552903_at               B4GALNT2


structure(list(Probe.Set.ID = c("1552301_a_at", "1552436_a_at",
"1552477_a_at", "1552685_a_at", "1552742_at", "1552752_a_at",
"1552799_at", "1552897_a_at", "1552902_a_at", "1552903_at"),
    Gene.Symbol = c("CORO6", "CDH23 /// LOC100653137", "IRF6",
    "GRHL1", "KCNH8", "CADM2", "TSNARE1", "KCNG3", "FOXP2", "B4GALNT2"
    )), .Names = c("Probe.Set.ID", "Gene.Symbol"), row.names = c(NA,
10L), class = "data.frame")
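Before extracting anything, it may help to see how many of the thousands of rows actually contain ///. A minimal base-R sketch, assuming the data frame above is stored as `dat` (the name used in the replies) and reproducing only its first three rows:

```r
# First three rows of the posted data; assumes the full data is stored as `dat`
dat <- data.frame(
  Probe.Set.ID = c("1552301_a_at", "1552436_a_at", "1552477_a_at"),
  Gene.Symbol  = c("CORO6", "CDH23 /// LOC100653137", "IRF6"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "///" as a literal string rather than a regular expression
has_sep <- grepl("///", dat$Gene.Symbol, fixed = TRUE)

which(has_sep)  # row numbers containing "///"
# [1] 2
sum(has_sep)    # how many such rows there are
# [1] 1
```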


Stephen HK Wong

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to extract the word before /// in a data frame containing many thousands of rows

arun kirshna
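Try the following, assuming dat is your data frame:


   library(stringr)
   # Since stringr 1.0.0 the default (ICU) regex engine supports lookahead,
   # so the deprecated perl() wrapper is not needed
   res <- str_extract(dat$Gene.Symbol, "[[:alnum:]]+(?= ///)")
   res[!is.na(res)]
   #[1] "CDH23"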
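The `str_extract()` call returns NA for rows without ///, and `res[!is.na(res)]` then drops those rows, so the result no longer lines up with the rows of the data frame. A minimal base-R sketch of the same lookahead idea, using `regexpr()` with `perl = TRUE`, keeps one value per row instead. The column name `First.Symbol` is hypothetical, and only the first three rows of the posted data are used:

```r
dat <- data.frame(
  Probe.Set.ID = c("1552301_a_at", "1552436_a_at", "1552477_a_at"),
  Gene.Symbol  = c("CORO6", "CDH23 /// LOC100653137", "IRF6"),
  stringsAsFactors = FALSE
)

# Start position of the word directly followed by " ///" (-1 where there is none)
m <- regexpr("[[:alnum:]]+(?= ///)", dat$Gene.Symbol, perl = TRUE)

# Rows without "///" get NA, so the new column stays aligned with the rows
dat$First.Symbol <- NA_character_
dat$First.Symbol[m != -1] <- regmatches(dat$Gene.Symbol, m)

dat$First.Symbol
```

Here `regmatches()` returns only the matched substrings, which is why they are assigned back to the positions where `m != -1`.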

A.K.




On Thursday, July 31, 2014 9:54 PM, Stephen HK Wong <[hidden email]> wrote:

Re: How to extract the word before /// in a data frame containing many thousands of rows

Uwe Ligges


On 01.08.2014 07:28, arun wrote:
> Try:
> If dat is the dataset.
>
>
>     library(stringr)
>      res <- str_extract(dat$Gene.Symbol, perl('[[:alnum:]]+(?= \\/)'))
>   res[!is.na(res)]
>   #[1] "CDH23"


Or, without additional packages, and keeping an entry for every row of your
data.frame (rows without /// are returned unchanged):

gsub(" *///.*$", "", dat$Gene.Symbol)
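Because `gsub()` returns one element per row, its result can be stored directly as a new column. The sketch below assumes the first three rows of the posted data and a hypothetical column name `Gene.Symbol.First`. It also shows a `strsplit()`-based alternative that splits on the literal " /// " and keeps the first piece; on this data both give the same result, although the `gsub()` pattern is more forgiving about missing spaces around ///.

```r
dat <- data.frame(
  Probe.Set.ID = c("1552301_a_at", "1552436_a_at", "1552477_a_at"),
  Gene.Symbol  = c("CORO6", "CDH23 /// LOC100653137", "IRF6"),
  stringsAsFactors = FALSE
)

# Remove optional spaces, the "///" separator and everything after it;
# strings without "///" are left as they are
dat$Gene.Symbol.First <- gsub(" *///.*$", "", dat$Gene.Symbol)
dat$Gene.Symbol.First
# [1] "CORO6" "CDH23" "IRF6"

# Alternative: split on the literal separator and keep the first piece
parts <- strsplit(dat$Gene.Symbol, " /// ", fixed = TRUE)
alt <- vapply(parts, function(p) p[1], character(1))
identical(alt, dat$Gene.Symbol.First)
# [1] TRUE
```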

Best,
Uwe Ligges

