Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Paul Miller
Hello All,

I started out a while ago trying to select columns in a data frame whose names contain some variation of the word "mutant", using code like:

names(KRASyn)[grep("muta", names(KRASyn))]

The idea then would be to add together the various columns using code like:

KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])

What I discovered, though, is that this selects columns like "nonmutated" and "unmutated" as well as columns like "mutated", "mutation", and "mutational".
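
A small reproduction of the problem may help here. The data frame below is a made-up stand-in (its column names and values are assumptions, not the real KRASyn data); it only shows that a plain substring match on "muta" also returns the "non" and "un" columns.

```r
# Hypothetical stand-in for KRASyn; names and values are invented.
KRASyn <- data.frame(
  mutated    = c(1, 0, 1),
  mutation   = c(0, 0, 1),
  nonmutated = c(0, 1, 0),
  unmutated  = c(1, 1, 0)
)

# A plain substring match picks up every name containing "muta",
# including the negated ones.
matched <- names(KRASyn)[grep("muta", names(KRASyn))]
print(matched)
```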

So I'd like to know how to select columns that have some variation of the word "mutant" without the "non" or the "un". I've been looking around for an example of how to do that but haven't found anything yet.

Can anyone show me how to select the columns I need?

Thanks,

Paul

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

David Winsemius

On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:

> Hello All,
>
> Started out awhile ago trying to select columns in a dataframe whose  
> names contain some variation of the word "mutant" using code like:
>
> names(KRASyn)[grep("muta", names(KRASyn))]
>
> The idea then would be to add together the various columns using  
> code like:
>
> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>
> What I discovered though, is that this selects columns like  
> "nonmutated" and "unmutated" as well as columns like "mutated",  
> "mutation", and "mutational".
>
> So I'd like to know how to select columns that have some variation  
> of the word "mutant" without the "non" or the "un". I've been  
> looking around for an example of how to do that but haven't found  
> anything yet.
>
> Can anyone show me how to select the columns I need?

If you want only columns whose names _begin_ with "muta" then add the  
"^" character at the beginning of your pattern:

names(KRASyn)[grep("^muta", names(KRASyn))]

(This should be explained on the ?regex page.)
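
As a quick check of what the anchored pattern keeps and drops, here is a minimal sketch. The sample names are assumptions chosen for illustration, not the actual column names.

```r
# Illustrative names only.
x <- c("mutated", "mutation", "nonmutated", "unmutated", "krasmutated")

# "^" anchors the match at the start of each string.
anchored <- grep("^muta", x, value = TRUE)
print(anchored)
```

Note that anchoring drops every name with any prefix before "muta", not only the "non" and "un" ones.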

--

David Winsemius, MD
West Hartford, CT


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Paul Miller
Hello Dr. Winsemius,

Unfortunately, I also have terms like "krasmutated". So simply selecting words that start with "muta" won't work in this case.

Thanks,

Paul



Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Bert Gunter
In reply to this post by Paul Miller
Below.

-- Bert

On Mon, Apr 23, 2012 at 9:10 AM, Paul Miller <[hidden email]> wrote:

> Hello All,
>
> Started out awhile ago trying to select columns in a dataframe whose names contain some variation of the word "mutant" using code like:
>
> names(KRASyn)[grep("muta", names(KRASyn))]
>
> The idea then would be to add together the various columns using code like:
>
> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])
>
> What I discovered though, is that this selects columns like "nonmutated" and "unmutated" as well as columns like "mutated", "mutation", and "mutational".
>
> So I'd like to know how to select columns that have some variation of the word "mutant" without the "non" or the "un". I've been looking around for an example of how to do that but haven't found anything yet.

You can't, because you have not provided a full specification of what
can be selected and what can't. Software can only do what you tell it
to -- it cannot read minds. Once you have provided a complete and
accurate specification of inclusion/exclusion criteria, it should be
easy to write a regex procedure.

"The fault, dear Brutus, lies not in the stars but in ourselves."

-- Bert








--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

David Winsemius
In reply to this post by Paul Miller

On Apr 23, 2012, at 12:25 PM, Paul Miller wrote:

> Hello Dr. Winsemius,
>
> Unfortunately, I also have terms like "krasmutated". So simply  
> selecting words that start with "muta" won't work in this case.

You are aware that negative indexing can be used with grep, aren't you?

--
David.
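
One way to read the negative-indexing hint, sketched on an assumed vector of names (the values are illustrative): first keep everything containing "muta", then drop the positions that match the unwanted prefixes. The guard is there because indexing with an empty negative index returns an empty vector rather than the full one. The `invert = TRUE` argument of `grep()` is an alternative that avoids that pitfall.

```r
# Illustrative names only.
x <- c("nonmutated", "unmutated", "mutation", "mutated", "krasmutated")

has_muta <- grep("muta", x, value = TRUE)

# Positions (within has_muta) of the names to exclude.
drop <- grep("nonmuta|unmuta", has_muta)

# Guard against length(drop) == 0: has_muta[-integer(0)] is empty.
selected <- if (length(drop) > 0) has_muta[-drop] else has_muta
print(selected)

# Same result without negative indexing.
selected_inv <- grep("nonmuta|unmuta", has_muta, invert = TRUE, value = TRUE)
```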


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Bert Gunter
In reply to this post by Paul Miller
But maybe ... (see below)
-- Bert

On Mon, Apr 23, 2012 at 9:25 AM, Paul Miller <[hidden email]> wrote:

> Hello Dr. Winsemius,
>
> Unfortunately, I also have terms like "krasmutated". So simply selecting words that start with "muta" won't work in this case.
>
> Thanks,
>
> Paul
>
>
> --- On Mon, 4/23/12, David Winsemius <[hidden email]> wrote:
>
>> From: David Winsemius <[hidden email]>
>> Subject: Re: [R] Selecting columns whose names contain "mutated" except when they also contain "non" or "un"
>> To: "Paul Miller" <[hidden email]>
>> Cc: [hidden email]
>> Received: Monday, April 23, 2012, 11:16 AM
>>
>> On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:
>>
>> > Hello All,
>> >
>> > Started out awhile ago trying to select columns in a
>> dataframe whose names contain some variation of the word
>> "mutant" using code like:
>> >
>> > names(KRASyn)[grep("muta", names(KRASyn))]
>> >
>> > The idea then would be to add together the various
>> columns using code like:
>> >
>> > KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta",
>> names(KRASyn))])
>> >
>> > What I discovered though, is that this selects columns
>> like "nonmutated" and "unmutated" as well as columns like
>> "mutated", "mutation", and "mutational".
>> >
>> > So I'd like to know how to select columns that have
>> some variation of the word "mutant" without the "non" or the
>> "un". I've been looking around for an example of how to do
>> that but haven't found anything yet.

If this **is** a complete specification then wouldn't simply:

x <- names(yourdataframe)
grepl("muta",x) & !grepl("nonmuta|unmuta",x)

do it?

e.g.
> x <- c("nonmutated","unmutated","mutation","mutated","krasmutated")
> grepl("muta",x) & !grepl("nonmuta|unmuta",x)
[1] FALSE FALSE  TRUE  TRUE  TRUE
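
To connect this back to the original goal of summing the selected columns, here is a minimal sketch. The data frame is a hypothetical stand-in for KRASyn with invented values; the logical vector is used directly as a column index.

```r
# Hypothetical stand-in for KRASyn; names and values are invented.
KRASyn <- data.frame(
  mutated     = c(1, 0, 1),
  krasmutated = c(0, 1, 1),
  nonmutated  = c(0, 1, 0),
  unmutated   = c(1, 0, 0)
)

x <- names(KRASyn)

# TRUE for names with "muta" that are not "nonmuta..." or "unmuta...".
keep <- grepl("muta", x) & !grepl("nonmuta|unmuta", x)

# A logical vector selects columns just like integer positions do.
KRASyn$Mutant_comb <- rowSums(KRASyn[keep])
print(KRASyn$Mutant_comb)
```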


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Paul Miller
Hi Bert,

Yes, code like:

x <- names(yourdataframe)
grepl("muta",x) & !grepl("nonmuta|unmuta",x)

works perfectly.

Thanks very much for your help.

Paul





Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

glsnow
In reply to this post by Paul Miller
Here is a method that uses negative look behind:

> tmp <- c('mutation','nonmutated','unmutated','verymutated','other')
> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
[1] 1 4

It looks for muta that is not immediately preceded by un or non (but
it would match "unusually mutated" since the un is not immediately
before the muta).
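
The edge case mentioned above can be checked directly. This is a small sketch with an extended sample vector (the extra entry is added for illustration); `perl = TRUE` is required because lookbehind is a Perl-style feature.

```r
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated",
         "unusually mutated")

# Each (?<!...) is a negative lookbehind: the text just before "muta"
# must not be "un" and must not be "non".
hits <- grep("(?<!un)(?<!non)muta", tmp, perl = TRUE, value = TRUE)
print(hits)
```

The last entry is kept because the characters just before "muta" are "y ", not "un".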

Hope this helps,




--
Gregory (Greg) L. Snow Ph.D.
[hidden email]


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Paul Miller
Hi Greg,

This is quite helpful. I'm not so good yet with regular expressions in general, or with Perl-like regular expressions. I found the help page, though, and think I was able to determine how the code works, as well as how I would select only instances where "muta" is preceded by either "non" or "un".

> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
[1] "mutation"    "nonmutated"  "unmutated"   "verymutated" "other"      

> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
[1] 1 4

> grep("(?!muta)non|un", tmp, perl=TRUE)
[1] 2 3

Did I get the second grep right?

If so, do you have any sense of why it seems to fail when I apply it to my data?

> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl=TRUE)])

Error in rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl = TRUE)]) :
  'x' must be numeric

Thanks,

Paul


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

David Winsemius

On Apr 24, 2012, at 9:40 AM, Paul Miller wrote:

> Hi Greg,
>
> This is quite helpful. Not so good yet with regular expressions in  
> general or Perl-like regular expressions. Found the help page  
> though, and think I was able to determine how the code works as well  
> as how I would select only instances where "muta" is preceded by
> either "non" or "un".
>
>> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
> [1] "mutation"    "nonmutated"  "unmutated"   "verymutated" "other"
>
>> grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
> [1] 1 4
>
>> grep("(?!muta)non|un", tmp, perl=TRUE)
> [1] 2 3
>
> Did I get the second grep right?
>
> If so, do you have any sense of why it seems to fail when I apply it  
> to my data?
>
>> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un",  
>> names(KRASyn), perl=TRUE)])
>
> Error in rowSums() :
>  'x' must be numeric

The error message strongly suggests at least one non-numeric column.  
What does this return:

lapply( KRASyn[grep("(?!muta)non|un", names(KRASyn), perl=TRUE)],
               is.numeric)
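
A sketch of how that check might play out, on a hypothetical data frame (the "unknown" column is an invented example of a name that the loose pattern also matches). `sapply()` is used instead of `lapply()` only to get a compact logical vector.

```r
# Hypothetical stand-in for KRASyn; "unknown" is an invented text column.
KRASyn <- data.frame(
  nonmutated = c(1, 0, 1),
  unmutated  = c(0, 1, 1),
  unknown    = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# The loose pattern matches any name containing "non" or "un".
loose <- grep("(?!muta)non|un", names(KRASyn), perl = TRUE, value = TRUE)

# Which of the selected columns are numeric?
numeric_ok <- sapply(KRASyn[loose], is.numeric)
print(numeric_ok)

# Summing only the numeric ones avoids the "'x' must be numeric" error.
KRASyn$NonMutant_comb <- rowSums(KRASyn[loose[numeric_ok]])
```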

--

David Winsemius, MD
West Hartford, CT


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Rui Barradas
In reply to this post by glsnow
Hello,


Has anyone realized that both 'non' and 'un' end with the same letter? The only one we really need to check?

(tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))

i1 <- grepl("muta", tmp)
i2 <- grepl("nmuta", tmp)

tmp[i1 & !i2]


Now, not an answer to Greg's post, just convoluted.


(tmp <- c(tmp, 'permutation', 'commutation'))

cols <- list()
cols[[1]] <- grep("muta", tmp)
cols[[2]] <- grep("nmuta", tmp)
cols[[3]] <- grep("(per|com)muta", tmp)

Reduce(setdiff, cols)

Rui Barradas

Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Paul Miller
In reply to this post by David Winsemius
Hello Dr. Winsemius,

There was a non-numeric column. Thanks for helping me to see the obvious.

Paul


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Peter Dalgaard-2
In reply to this post by Rui Barradas

On Apr 24, 2012, at 19:15 , Rui Barradas wrote:

>
> Has anyone realized that both 'non' and 'un' end with the same letter? The
> only one we really need to check?
>
> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
>
> i1 <- grepl("muta", tmp)
> i2 <- grepl("nmuta", tmp)
>
> tmp[i1 & !i2]
>


Yes, I was wondering why people were avoiding the obvious use of grepl(). I'm not too happy about the "nmuta" technique though: What about "deletionmutation" and such? Might as well do the safe(r) thing:

i2 <- grepl("unmuta", tmp) | grepl("nonmuta", tmp)
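
The difference between the two exclusion rules can be seen on a small assumed vector that includes the "deletionmutation" case raised above:

```r
tmp <- c("mutation", "nonmutated", "unmutated", "deletionmutation")

i1 <- grepl("muta", tmp)

# Shortcut: exclude anything with "n" right before "muta".
short <- tmp[i1 & !grepl("nmuta", tmp)]

# Safer: exclude only the explicit "unmuta" and "nonmuta" prefixes.
i2 <- grepl("unmuta", tmp) | grepl("nonmuta", tmp)
safer <- tmp[i1 & !i2]

print(short)
print(safer)
```

The shortcut loses "deletionmutation" because "deletion" happens to end in "n"; the explicit version keeps it.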

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

glsnow
In reply to this post by Paul Miller
Sorry I took so long getting back to this, but the paying job needs to
take priority.

The regular expression "(?<!un)(?<!non)muta" matches "muta" only where
it is not immediately preceded by "un" or "non".  More specifically,
the regular expression engine steps through the string and tries the
match at each position.  At a given position it first checks whether
"un" comes just before that position; if it does, this position cannot
match and the engine moves on.  If it is not "un", the engine moves to
the next negative lookbehind and checks whether "non" comes just
before the position.  If neither "un" nor "non" is just before the
position, it starts matching the characters after the position to see
if they match "muta".

The next pattern is "(?!muta)non|un".  The (?!muta) is a negative
lookahead, which starts at the current position and checks that the
next characters are not "muta" (without including them in the match).
In this case it is a no-op: you are asking for a position where the
next characters are not "muta" but are "non", and since the next
characters cannot be both, this is the same as just matching "non".
You also need to be aware of operator precedence: in that pattern the
(?!muta) part applies only to the "non", not to the "un".
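
A small sketch of that point, using an assumed vector with two extra words that contain "non" and "un" but not "muta". Under the reading above, the lookahead version should select exactly what the plain alternation selects.

```r
tmp <- c("nonmutated", "unmutated", "nonsense", "unrelated", "mutation")

# Parsed as: ((?!muta)non) OR (un)
loose <- grep("(?!muta)non|un", tmp, perl = TRUE, value = TRUE)

# The plain alternation, with no lookahead at all.
plain <- grep("non|un", tmp, value = TRUE)

print(loose)
print(identical(loose, plain))
```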

To match "nonmuta" or "unmuta", a simple pattern would be
"(non|un)muta" or "(no|u)nmuta".  You could use a positive lookbehind
(you would still need an "or"), but it would be overkill for a grep
command.  Positive lookahead/lookbehind matters more when replacing,
where the lookahead/lookbehind text is needed for the match to happen
but is not captured as part of the match to be replaced.
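
The following sketch illustrates both halves of that paragraph on the sample vector from earlier in the thread. The replacement string "MUTA" is an arbitrary marker chosen for illustration, and the lookbehind is written as two separate fixed-length alternatives as one conservative way of expressing the "or".

```r
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated", "other")

# Selecting: a plain group is enough.
neg <- grep("(non|un)muta", tmp, value = TRUE)

# Replacing with a plain group: the prefix is part of the match,
# so it is replaced along with "muta".
consumed <- sub("(non|un)muta", "MUTA", tmp)

# Replacing with positive lookbehind: the prefix must be present,
# but only "muta" itself is replaced.
kept <- sub("(?:(?<=non)|(?<=un))muta", "MUTA", tmp, perl = TRUE)

print(neg)
print(consumed)
print(kept)
```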




Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Martin Maechler
In reply to this post by David Winsemius
>>>>> David Winsemius <[hidden email]>
>>>>>     on Mon, 23 Apr 2012 12:16:39 -0400 writes:

    > On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:

    >> Hello All,
    >>
    >> Started out awhile ago trying to select columns in a
    >> dataframe whose names contain some variation of the word
    >> "mutant" using code like:
    >>
    >> names(KRASyn)[grep("muta", names(KRASyn))]
    >>
    >> The idea then would be to add together the various
    >> columns using code like:
    >>
    >> KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta",
    >> names(KRASyn))])
    >>
    >> What I discovered though, is that this selects columns
    >> like "nonmutated" and "unmutated" as well as columns like
    >> "mutated", "mutation", and "mutational".
    >>
    >> So I'd like to know how to select columns that have some
    >> variation of the word "mutant" without the "non" or the
    >> "un". I've been looking around for an example of how to
    >> do that but haven't found anything yet.
    >>
    >> Can anyone show me how to select the columns I need?

    > If you want only columns whose names _begin_ with "muta"
    > then add the "^" character at the beginning of your
    > pattern:

    > names(KRASyn)[grep("^muta", names(KRASyn))]

    > (This should be explained on the ?regex page.)

It *is*! Search for "beginning" and you're there.
Martin

    > David Winsemius, MD West Hartford, CT


Re: Selecting columns whose names contain "mutated" except when they also contain "non" or "un"

Paul Miller
In reply to this post by glsnow
Hi Greg,

This is very helpful. Thanks for explaining it. I'm clearly going to need to improve my understanding of regular expressions. Currently busy trying to figure out Sweave and knitr though.

Paul
