|
Hello All,
Started out a while ago trying to select columns in a data frame whose names contain some variation of the word "mutant", using code like:

    names(KRASyn)[grep("muta", names(KRASyn))]

The idea then would be to add together the various columns using code like:

    KRASyn$Mutant_comb <- rowSums(KRASyn[grep("muta", names(KRASyn))])

What I discovered, though, is that this selects columns like "nonmutated" and "unmutated" as well as columns like "mutated", "mutation", and "mutational".

So I'd like to know how to select columns that have some variation of the word "mutant" without the "non" or the "un". I've been looking around for an example of how to do that but haven't found anything yet.

Can anyone show me how to select the columns I need?

Thanks,

Paul

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
|
On Apr 23, 2012, at 12:10 PM, Paul Miller wrote:

> So I'd like to know how to select columns that have some variation
> of the word "mutant" without the "non" or the "un".
>
> Can anyone show me how to select the columns I need?

If you want only columns whose names _begin_ with "muta", then add the "^" character at the beginning of your pattern:

    names(KRASyn)[grep("^muta", names(KRASyn))]

(This should be explained on the ?regex page.)

--
David Winsemius, MD
West Hartford, CT |
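A minimal sketch of what the "^" anchor changes, for readers following along. The vector x below is made up for illustration; the real names(KRASyn) are never shown in the thread.

```r
# Hypothetical column names (an assumption; the real names(KRASyn) are not shown).
x <- c("mutated", "mutation", "nonmutated", "unmutated", "krasmutated")

# Unanchored: matches "muta" anywhere in the name.
anywhere <- grep("muta", x, value = TRUE)

# Anchored with "^": only names that begin with "muta".
at_start <- grep("^muta", x, value = TRUE)

print(anywhere)   # all five names
print(at_start)   # "mutated" "mutation"
```

The anchored form drops "nonmutated" and "unmutated", but it also drops any wanted name that merely contains "muta" later on, such as "krasmutated" here.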
|
Hello Dr. Winsemius,
Unfortunately, I also have terms like "krasmutated". So simply selecting words that start with "muta" won't work in this case.

Thanks,

Paul |
|
In reply to this post by Paul Miller
Below.
-- Bert

On Mon, Apr 23, 2012 at 9:10 AM, Paul Miller <[hidden email]> wrote:
> So I'd like to know how to select columns that have some variation of the word "mutant" without the "non" or the "un". I've been looking around for an example of how to do that but haven't found anything yet.

You can't, because you have not provided a full specification of what can be selected and what can't. Software can only do what you tell it to -- it cannot read minds. Once you have provided a complete and accurate specification of inclusion/exclusion criteria, it should be easy to write a regex procedure.

"The fault, dear Brutus, lies not in the stars but in ourselves."

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm |
|
In reply to this post by Paul Miller
On Apr 23, 2012, at 12:25 PM, Paul Miller wrote:

> Hello Dr. Winsemius,
>
> Unfortunately, I also have terms like "krasmutated". So simply
> selecting words that start with "muta" won't work in this case.

You are aware that negative indexing can be used with grep, aren't you?

-- David.

David Winsemius, MD
West Hartford, CT |
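One possible reading of the negative-indexing hint, sketched on made-up names (the thread does not spell the code out, so the two-step approach and the variable names hits and drop are assumptions): first keep everything that contains "muta", then remove the unwanted matches by negative index.

```r
# Hypothetical column names, as used elsewhere in the thread's examples.
x <- c("nonmutated", "unmutated", "mutation", "mutated", "krasmutated")

# Step 1: every name that contains "muta".
hits <- grep("muta", x, value = TRUE)

# Step 2: positions (within hits) of the names to exclude.
drop <- grep("nonmuta|unmuta", hits)

# Guard: hits[-integer(0)] would select nothing at all, so only
# apply the negative index when there is something to drop.
if (length(drop) > 0) hits <- hits[-drop]

print(hits)   # "mutation" "mutated" "krasmutated"
```

The length check matters because indexing with an empty negative vector returns an empty result rather than the full vector.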
|
In reply to this post by Paul Miller
But maybe ... (see below)
-- Bert

On Mon, Apr 23, 2012 at 9:25 AM, Paul Miller <[hidden email]> wrote:
> Unfortunately, I also have terms like "krasmutated". So simply selecting words that start with "muta" won't work in this case.
>
>> > So I'd like to know how to select columns that have
>> > some variation of the word "mutant" without the "non" or the
>> > "un".

If this **is** a complete specification then wouldn't simply:

    x <- names(yourdataframe)
    grepl("muta",x) & !grepl("nonmuta|unmuta",x)

do it?

e.g.

    > x <- c("nonmutated","unmutated","mutation","mutated","krasmutated")
    > grepl("muta",x) & !grepl("nonmuta|unmuta",x)
    [1] FALSE FALSE  TRUE  TRUE  TRUE

--
Bert Gunter
Genentech Nonclinical Biostatistics |
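A sketch of how the logical vector from the two grepl() calls could feed the rowSums() step from the original question. The toy data frame below is invented for illustration; only the name KRASyn and the column Mutant_comb come from the thread.

```r
# Toy stand-in for KRASyn (an assumption; the real data are not shown).
KRASyn <- data.frame(
  mutated     = c(1, 0, 1),
  krasmutated = c(0, 1, 1),
  nonmutated  = c(1, 1, 0),
  unmutated   = c(0, 0, 1)
)

x <- names(KRASyn)

# TRUE for names containing "muta" but not "nonmuta" or "unmuta".
keep <- grepl("muta", x) & !grepl("nonmuta|unmuta", x)

# A logical vector can index data frame columns directly.
KRASyn$Mutant_comb <- rowSums(KRASyn[keep])

print(names(KRASyn)[keep])   # "mutated" "krasmutated"
print(KRASyn$Mutant_comb)    # row sums 1, 1, 2 for the toy data
```

Computing keep before adding the new column avoids any question of whether "Mutant_comb" itself would be matched on a later run; with a case-sensitive "muta" it would not be, but the order keeps that obvious.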
|
Hi Bert,
Yes, code like:

    x <- names(yourdataframe)
    grepl("muta",x) & !grepl("nonmuta|unmuta",x)

works perfectly. Thanks very much for your help.

Paul |
|
In reply to this post by Paul Miller
Here is a method that uses negative look behind:
    > tmp <- c('mutation','nonmutated','unmutated','verymutated','other')
    > grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
    [1] 1 4

It looks for "muta" that is not immediately preceded by "un" or "non" (but it would match "unusually mutated", since the "un" is not immediately before the "muta").

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
[hidden email] |
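The look-behind behaviour described above can be checked with a small sketch. The extra element "unusually mutated" is added here only to illustrate the caveat; value = TRUE is used so that the matching strings, rather than their positions, are returned.

```r
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated",
         "other", "unusually mutated")

# (?<!un) and (?<!non) are negative look-behinds: each one only
# inspects the characters directly before "muta". They need perl = TRUE.
kept <- grep("(?<!un)(?<!non)muta", tmp, perl = TRUE, value = TRUE)

print(kept)   # "mutation" "verymutated" "unusually mutated"
```

"unusually mutated" is kept because the characters directly before "muta" are "y " and not "un" or "non".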
|
Hi Greg,
This is quite helpful. Not so good yet with regular expressions in general or Perl-like regular expressions. Found the help page though, and think I was able to determine how the code works as well as how I would select only instances where "muta" is preceded by either "non" or "un".

    > (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
    [1] "mutation"    "nonmutated"  "unmutated"   "verymutated" "other"

    > grep("(?<!un)(?<!non)muta", tmp, perl=TRUE)
    [1] 1 4

    > grep("(?!muta)non|un", tmp, perl=TRUE)
    [1] 2 3

Did I get the second grep right?

If so, do you have any sense of why it seems to fail when I apply it to my data?

    > KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl=TRUE)])
    Error in rowSums(KRASyn[grep("(?!muta)non|un", names(KRASyn), perl = TRUE)]) :
      'x' must be numeric

Thanks,

Paul |
|
On Apr 24, 2012, at 9:40 AM, Paul Miller wrote:

> If so, do you have any sense of why it seems to fail when I apply it
> to my data?
>
>> KRASyn$NonMutant_comb <- rowSums(KRASyn[grep("(?!muta)non|un",
>> names(KRASyn), perl=TRUE)])
>
> Error in rowSums() :
>   'x' must be numeric

The error message strongly suggests at least one non-numeric column. What does this return:

    lapply( KRASyn[grep("(?!muta)non|un", names(KRASyn), perl=TRUE)], is.numeric)

--
David Winsemius, MD
West Hartford, CT |
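A sketch of the diagnostic suggested above, run on an invented data frame. The column "country" is hypothetical and is included only because it contains "un", which is enough for the pattern "(?!muta)non|un" to select it.

```r
# Toy stand-in for KRASyn (an assumption; the real data are not shown).
KRASyn <- data.frame(
  nonmutated = c(1, 0),
  unmutated  = c(0, 1),
  country    = c("DK", "US"),
  stringsAsFactors = FALSE
)

# Columns selected by the pattern from the thread.
sel <- grep("(?!muta)non|un", names(KRASyn), perl = TRUE)

# One TRUE/FALSE per selected column; sapply() gives a named vector.
is_num <- sapply(KRASyn[sel], is.numeric)

# Names of the selected columns that rowSums() would reject.
offenders <- names(is_num)[!is_num]
print(offenders)   # "country"
```

Restricting the selection to the numeric columns, for example with KRASyn[sel][is_num], is one way to let rowSums() run, though tightening the pattern so that it only matches the intended names is probably the better fix.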
|
In reply to this post by glsnow
Hello,
Has anyone realized that both 'non' and 'un' end with the same letter? The only one we really need to check?

    (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))

    i1 <- grepl("muta", tmp)
    i2 <- grepl("nmuta", tmp)

    tmp[i1 & !i2]

Now, not an answer to Greg's post, just convoluted.

    (tmp <- c(tmp, 'permutation', 'commutation'))

    cols <- list()
    cols[[1]] <- grep("muta", tmp)
    cols[[2]] <- grep("nmuta", tmp)
    cols[[3]] <- grep("(per|com)muta", tmp)

    Reduce(setdiff, cols)

Rui Barradas |
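A sketch of the Reduce(setdiff, ...) idea with the exclusion patterns written out. The name "mutations_per_site" is made up to show why the grouping of the alternation matters: written as "(per)|(com)muta", the pattern means "per" anywhere, or "commuta".

```r
# Hypothetical names; "mutations_per_site" is added to expose the grouping issue.
tmp <- c("mutation", "nonmutated", "unmutated",
         "permutation", "commutation", "mutations_per_site")

# Start from all "muta" matches, then subtract each exclusion set in turn.
grouped <- list(grep("muta", tmp),
                grep("nonmuta|unmuta", tmp),
                grep("(per|com)muta", tmp))
kept <- tmp[Reduce(setdiff, grouped)]

# Same steps, but with the alternation left ungrouped.
ungrouped <- list(grep("muta", tmp),
                  grep("nonmuta|unmuta", tmp),
                  grep("(per)|(com)muta", tmp))
kept_ungrouped <- tmp[Reduce(setdiff, ungrouped)]

print(kept)             # "mutation" "mutations_per_site"
print(kept_ungrouped)   # "mutation"
```

Reduce(setdiff, list(a, b, c)) evaluates setdiff(setdiff(a, b), c), so the first element is the candidate set and every later element removes indices from it.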
|
In reply to this post by David Winsemius
Hello Dr. Winsemius,
There was a non-numeric column. Thanks for helping me to see the obvious.

Paul |
|
In reply to this post by Rui Barradas
On Apr 24, 2012, at 19:15 , Rui Barradas wrote:

> Has anyone realized that both 'non' and 'un' end with the same letter? The
> only one we really need to check?
>
> (tmp <- c('mutation','nonmutated','unmutated','verymutated','other'))
>
> i1 <- grepl("muta", tmp)
> i2 <- grepl("nmuta", tmp)
>
> tmp[i1 & !i2]

Yes, I was wondering why people were avoiding the obvious use of grepl(). I'm not too happy about the "nmuta" technique though: What about "deletionmutation" and such? Might as well do the safe(r) thing:

    i2 <- grepl("unmuta", tmp) | grepl("nonmuta", tmp)

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email] Priv: [hidden email] |
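The objection to the "nmuta" shortcut can be seen in a short sketch. The vector below is made up, with "deletionmutation" taken from the message above.

```r
tmp <- c("mutation", "nonmutated", "unmutated", "deletionmutation")

i1 <- grepl("muta", tmp)

# Shortcut: exclude anything with an "n" directly before "muta".
shortcut <- tmp[i1 & !grepl("nmuta", tmp)]

# Safer: exclude only the two prefixes that were actually specified.
safer <- tmp[i1 & !(grepl("unmuta", tmp) | grepl("nonmuta", tmp))]

print(shortcut)   # "mutation"
print(safer)      # "mutation" "deletionmutation"
```

The shortcut silently drops "deletionmutation" because "deletion" happens to end in "n".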
|
In reply to this post by Paul Miller
Sorry I took so long getting back to this, but the paying job needs to take priority.

The regular expression "(?<!un)(?<!non)muta" looks for a string that matches "muta", then looks at the characters immediately before it to see if they match either "un" or "non", in which case it treats that point as a non-match. More specifically, the regular expression engine steps through the string and tries the match at each point. At a given point it first checks whether "un" is just before that point; if it is, this point can't match and the engine moves on to the next point. If it is not "un", the engine moves to the next negative look-behind and checks whether "non" is just before the point. If neither "un" nor "non" is just before the point, it starts matching the characters after the point to see if they match "muta".

The next pattern is "(?!muta)non|un". The (?!muta) is a negative look-ahead, which starts at the point and checks forward to see that the next characters are not "muta" (but does not include them in the match). In this case it is a no-op: you are saying that you want to match at a point where the next characters are not "muta" but are "non", and since the next set of characters cannot be both, this is the same as just matching "non". You also need to be aware of operator precedence: in that pattern the (?!muta) part applies only to the "non", not to the "un".

To match "nonmuta" or "unmuta", a simple pattern would just be "(non|un)muta" or "(no|u)nmuta". You could use a positive look-behind (you would still need an "or"), but it would be overkill for a grep command. Positive look-ahead/look-behind matters more for replacing, where the look-ahead/look-behind is needed for the match to happen but is not captured as part of the match to be replaced.

--
Gregory (Greg) L. Snow Ph.D.
[hidden email] |
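The points about the no-op look-ahead, the simpler "(non|un)muta" pattern, and look-behind in replacements can be checked with a short sketch. The names "count" and "nonsense" are made up to show what the loose pattern also picks up.

```r
tmp <- c("mutation", "nonmutated", "unmutated", "verymutated",
         "count", "nonsense")

# The look-ahead adds nothing here: both patterns select the same names.
with_lookahead <- grep("(?!muta)non|un", tmp, perl = TRUE, value = TRUE)
plain          <- grep("non|un", tmp, value = TRUE)

# Grouping the alternation and requiring "muta" after it is specific.
strict <- grep("(non|un)muta", tmp, value = TRUE)

print(with_lookahead)   # "nonmutated" "unmutated" "count" "nonsense"
print(strict)           # "nonmutated" "unmutated"

# In a replacement, a positive look-behind requires "non" to be present
# but leaves it out of the text that gets replaced.
sub("(?<=non)muta", "MUTA", "nonmutated", perl = TRUE)   # "nonMUTAted"
sub("nonmuta", "MUTA", "nonmutated")                     # "MUTAted"
```

For selecting columns, only whether a name matches is used, which is why the plain grouped pattern is enough and the look-behind form is unnecessary.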
|
In reply to this post by David Winsemius
>>>>> David Winsemius <[hidden email]>
>>>>>     on Mon, 23 Apr 2012 12:16:39 -0400 writes:

    > If you want only columns whose names _begin_ with "muta"
    > then add the "^" character at the beginning of your
    > pattern:

    > names(KRASyn)[grep("^muta", names(KRASyn))]

    > (This should be explained on the ?regex page.)

It *is* ! Search for "beginning" and you're there.

Martin |
|
In reply to this post by glsnow
Hi Greg,
This is very helpful. Thanks for explaining it. I'm clearly going to need to improve my understanding of regular expressions. Currently busy trying to figure out Sweave and knitr though.

Paul |
