|
I would like to remove rows from the following data frame (df) if only two
specific elements are found in the df$ch character string (I want to remove
rows containing only "0" and "D", or only "0" and "d"). Alternatively, I would
like to remove rows if the first non-zero element is "D" or "d".

   ch                                               count
1  0000000000D0000000000000000000000000000000000000 0.007368;
2  0000000000d0000000000000000000000000000000000000 0.002456;
3  000000000T00000000000000000000000000000000000000 0.007368;
4  000000000TD0000000000000000000000000000000000000 0.007368;
5  000000000T00000000000000000000000000000000000000 0.002456;
6  000000000Td0000000000000000000000000000000000000 0.002456;
7  00000000T000000000000000000000000000000000000000 0.007368;
8  00000000T0D0000000000000000000000000000000000000 0.007368;
9  00000000T000000000000000000000000000000000000000 0.002456;
10 00000000T0d0000000000000000000000000000000000000 0.002456;

I tried the following, but it doesn't work when there is more than one
character per string:

> df <- df[!df$ch %in% c("0","D"),]
> df <- df[!df$ch %in% c("0","d"),]

Any help greatly appreciated,
Claudia

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
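The sketch below shows why the %in% attempt fails, and gives one regex-free alternative that is not taken from the thread. %in% compares whole strings, so a 48-character value is never equal to "0" or "D". Splitting each string into single characters with strsplit() lets %in% work character by character. The helper only_zero_and() and the shortened test strings are hypothetical.

```r
# %in% tests whole-string equality, so a long string is never equal
# to the single characters "0" or "D":
"0000000000D0000" %in% c("0", "D")   # FALSE

# Hypothetical helper: TRUE when every character of a string is
# either "0" or the given letter. An all-zero string is also flagged.
only_zero_and <- function(x, letter) {
  vapply(strsplit(x, ""), function(chars) all(chars %in% c("0", letter)),
         logical(1))
}

# Shortened test strings (assumed; the real ones have 48 characters)
ch <- c("0000000000D0000", "0000000000d0000", "000000000TD0000")

keep <- !(only_zero_and(ch, "D") | only_zero_and(ch, "d"))
ch[keep]   # "000000000TD0000"
```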
|
Hello,
Try regular expressions instead.
In this data.frame, I've changed row no. 4 so that it has 'D' as the first
non-zero character.

dd <- read.table(text="
ch count
1 0000000000D0000000000000000000000000000000000000 0.007368
2 0000000000d0000000000000000000000000000000000000 0.002456
3 000000000T00000000000000000000000000000000000000 0.007368
4 000000000DT0000000000000000000000000000000000000 0.007368
5 000000000T00000000000000000000000000000000000000 0.002456
6 000000000Td0000000000000000000000000000000000000 0.002456
7 00000000T000000000000000000000000000000000000000 0.007368
8 00000000T0D0000000000000000000000000000000000000 0.007368
9 00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456
", header=TRUE)
dd

i1 <- grepl("^([0D]|[0d])*$", dd$ch)  # only 0, D and d characters
i2 <- grepl("^0*[Dd]", dd$ch)         # first non-zero character is D or d

dd[!i1, ]
dd[!i2, ]
dd[!(i1 | i2), ]

Hope this helps,

Rui Barradas

On 02-07-2012 23:48, Claudia Penaloza wrote:
> [...]
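Here is a compact check of Rui's two patterns. The strings are shortened versions of his data, shortened only to keep the example readable, and the fourth has 'D' before 'T'. 'i1' flags strings made only of 0, D and d. 'i2' flags strings whose first non-zero character is d or D.

```r
# Shortened versions of the test strings; the 4th has 'D' before 'T'
ch <- c("0000D0000", "0000d0000", "000T00000",
        "000DT0000", "000Td0000", "00T0D0000")

i1 <- grepl("^([0D]|[0d])*$", ch)  # nothing but 0, D and d
i2 <- grepl("^0*[Dd]", ch)         # first non-zero character is D or d

i1               # TRUE  TRUE FALSE FALSE FALSE FALSE
i2               # TRUE  TRUE FALSE  TRUE FALSE FALSE
ch[!(i1 | i2)]   # "000T00000" "000Td0000" "00T0D0000"
```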
|
In reply to this post by seapen
Hi,
Try this:

dat1<-read.table(text="
1 0000000000D0000000000000000000000000000000000000 0.007368;
2 0000000000d0000000000000000000000000000000000000 0.002456;
3 000000000T00000000000000000000000000000000000000 0.007368;
4 000000000TD0000000000000000000000000000000000000 0.007368;
5 000000000T00000000000000000000000000000000000000 0.002456;
6 000000000Td0000000000000000000000000000000000000 0.002456;
7 00000000T000000000000000000000000000000000000000 0.007368;
8 00000000T0D0000000000000000000000000000000000000 0.007368;
9 00000000T000000000000000000000000000000000000000 0.002456;
10 00000000T0d0000000000000000000000000000000000000 0.002456;
",sep="",header=FALSE)
colnames(dat1)<-c("num","Ch", "count")
#I guess this is what you wanted.
dat1[grepl("TD|Td|T",dat1$Ch),]   # same rows as grepl("T", dat1$Ch)
   num                                               Ch     count
3    3 000000000T00000000000000000000000000000000000000 0.007368;
4    4 000000000TD0000000000000000000000000000000000000 0.007368;
5    5 000000000T00000000000000000000000000000000000000 0.002456;
6    6 000000000Td0000000000000000000000000000000000000 0.002456;
7    7 00000000T000000000000000000000000000000000000000 0.007368;
8    8 00000000T0D0000000000000000000000000000000000000 0.007368;
9    9 00000000T000000000000000000000000000000000000000 0.002456;
10  10 00000000T0d0000000000000000000000000000000000000 0.002456;
#If you want to remove D or d rows
dat1[!grepl("D|d",dat1$Ch),]   # same as !grepl("[Dd]", dat1$Ch)
   num                                               Ch     count
3    3 000000000T00000000000000000000000000000000000000 0.007368;
5    5 000000000T00000000000000000000000000000000000000 0.002456;
7    7 00000000T000000000000000000000000000000000000000 0.007368;
9    9 00000000T000000000000000000000000000000000000000 0.002456;

A.K.

----- Original Message -----
From: Claudia Penaloza <[hidden email]>
To: [hidden email]
Sent: Monday, July 2, 2012 6:48 PM
Subject: [R] Removing rows if certain elements are found in character string

[...]
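The following sketch makes two points about the "TD|Td|T" pattern, using shortened strings chosen for the example. First, the last branch is plain "T", so the alternation matches exactly the strings that contain a T. Second, containing a T is not the same as having T as the first non-zero character, which is what the question asks about. Rui's "^0*[Dd]" pattern is reused here for comparison.

```r
# Shortened, made-up strings; the 4th has 'D' before 'T'
ch <- c("000T0000", "000TD000", "000Td000", "000DT000", "0000D000")

# The last branch "T" already matches whenever "TD" or "Td" would,
# so the alternation selects the same strings as plain "T".
identical(grepl("TD|Td|T", ch), grepl("T", ch))   # TRUE

# "000DT000" contains a T, yet its first non-zero character is D.
grepl("T", ch)         # TRUE  TRUE  TRUE  TRUE FALSE
grepl("^0*[Dd]", ch)   # FALSE FALSE FALSE  TRUE  TRUE
```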
|
In reply to this post by Rui Barradas
You will have to change the 'i1' expression as follows:
> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
> i1 # matches strings with d & D in them
 [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> # second string had 'd' & 'D' in it so it was TRUE above and FALSE below
> i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
> i1new
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>

I put a 'd' and a 'D' in the second string. The original regular expression
is equivalent to

grepl("^[0dD]*$", dd$ch)

which matches strings made of any mix of 0, d and D. If you only want 'd' or
'D' (but not both), then you will have to use the one in 'i1new'.

On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas <[hidden email]> wrote:
> [...]

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
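Here is a small sketch of the difference Jim describes, with hypothetical test strings. The third string mixes 'D' and 'd', like his modified second row. When the alternation sits inside the repeated group, each repetition may take either branch. When it sits outside, the whole string must fit one branch.

```r
# Made-up strings; the 3rd mixes 'D' and 'd'
ch <- c("0000D0000", "0000d0000", "00D00d000", "000T00000")

# Alternation inside the repetition: 0, D and d may be mixed freely,
# which is the same as a single character class.
i1 <- grepl("^([0D]|[0d])*$", ch)
identical(i1, grepl("^[0dD]*$", ch))   # TRUE

# Alternation outside the repetition: the whole string must be
# 0/D only, or 0/d only.
i1new <- grepl("^([0D]*$|[0d]*$)", ch)
i1      # TRUE TRUE  TRUE FALSE
i1new   # TRUE TRUE FALSE FALSE
```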
|
In reply to this post by Rui Barradas
Hi,
I didn't think about the situation where D comes before T. I changed my code a
little to accommodate that.

dat2<-read.table(text="
1 0000000000D0000000000000000000000000000000000000 0.007368;
2 0000000000d0000000000000000000000000000000000000 0.002456;
3 000000000T00000000000000000000000000000000000000 0.007368;
4 000000000DT0000000000000000000000000000000000000 0.007368;
5 000000000T00000000000000000000000000000000000000 0.002456;
6 000000000Td0000000000000000000000000000000000000 0.002456;
7 00000000T000000000000000000000000000000000000000 0.007368;
8 00000000T0D0000000000000000000000000000000000000 0.007368;
9 00000000T000000000000000000000000000000000000000 0.002456;
10 00000000T0d0000000000000000000000000000000000000 0.002456;
",sep="",header=FALSE)
colnames(dat2)<-c("num","Ch", "count")
dat2[grepl("0T|0Td|0TD",dat2$Ch),]   # same rows as grepl("0T", dat2$Ch)
   num                                               Ch     count
3    3 000000000T00000000000000000000000000000000000000 0.007368;
5    5 000000000T00000000000000000000000000000000000000 0.002456;
6    6 000000000Td0000000000000000000000000000000000000 0.002456;
7    7 00000000T000000000000000000000000000000000000000 0.007368;
8    8 00000000T0D0000000000000000000000000000000000000 0.007368;
9    9 00000000T000000000000000000000000000000000000000 0.002456;
10  10 00000000T0d0000000000000000000000000000000000000 0.002456;

A.K.

----- Original Message -----
From: Rui Barradas <[hidden email]>
To: Claudia Penaloza <[hidden email]>
Cc: [hidden email]
Sent: Monday, July 2, 2012 7:24 PM
Subject: Re: [R] Removing rows if certain elements are found in character string

[...]
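This sketch shows what "0T|0Td|0TD" actually tests, using made-up shortened strings. The longer branches are redundant. Because the pattern is not anchored, it can still accept a string in which a D comes before the T, as in the third string below. The anchored "^0*T" shown last is one possible variant, not something proposed in the thread. It assumes T, D and d are the only non-zero codes.

```r
# Made-up strings; the 3rd has a D before a later "0T"
ch <- c("000T0000", "000Td000", "00D00T00", "0000D000")

# "0T" already matches wherever "0Td" or "0TD" would.
identical(grepl("0T|0Td|0TD", ch), grepl("0T", ch))   # TRUE

# Unanchored: "0T" is found anywhere, even after a D.
grepl("0T", ch)     # TRUE TRUE  TRUE FALSE
# Assumed variant: only zeros may precede the T.
grepl("^0*T", ch)   # TRUE TRUE FALSE FALSE
```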
|
In reply to this post by seapen
On Jul 2, 2012, at 6:48 PM, Claudia Penaloza wrote:

> [...]
>
> I tried the following but it doesn't work if there is more than one
> character per string:
>
>> df <- df[!df$ch %in% c("0","D"),]
>> df <- df[!df$ch %in% c("0","d"),]

You seem to be missing test cases for the second set of conditions, but this
works for the first set (and might for the second):

> dat[ grepl("[^0dD]", dat$ch) & ! grepl("^0+d|^0+D", dat$ch) , ]
                                                  ch    count
3  000000000T00000000000000000000000000000000000000 0.007368
4  000000000TD0000000000000000000000000000000000000 0.007368
5  000000000T00000000000000000000000000000000000000 0.002456
6  000000000Td0000000000000000000000000000000000000 0.002456
7  00000000T000000000000000000000000000000000000000 0.007368
8  00000000T0D0000000000000000000000000000000000000 0.007368
9  00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456
>

--
David Winsemius, MD
West Hartford, CT
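Here is David's filter applied to a small hypothetical data frame. The sketch assumes the intended second test is "first non-zero character is d or D", i.e. "^0+d|^0+D", written here more compactly as "^0+[dD]". Note that "0+" requires at least one leading zero, whereas "^0*[dD]" would also catch a string that starts directly with d or D.

```r
# Hypothetical data; the 4th row has 'D' before 'T'
dat <- data.frame(
  ch = c("0000D0000", "0000d0000", "000T00000", "000DT0000", "000TD0000"),
  count = c(0.007368, 0.002456, 0.007368, 0.007368, 0.007368),
  stringsAsFactors = FALSE
)

# TRUE when the string has some character other than 0, d or D
has_other <- grepl("[^0dD]", dat$ch)
# Assumed intent: first non-zero character is d or D ("0+" needs a leading 0)
starts_d <- grepl("^0+[dD]", dat$ch)

dat[has_other & !starts_d, ]   # keeps rows 3 and 5
```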
|
In reply to this post by jholtman
Hello,
Inline.

On 03-07-2012 01:15, jim holtman wrote:
> You will have to change the 'i1' expression as follows:
>
>> i1 <- grepl("^([0D]|[0d])*$", dd$ch)
>> i1 # matches strings with d & D in them
>  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>> # second string had 'd' & 'D' in it so it was TRUE above and FALSE below
>> i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
>> i1new
>  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>

Right. Apparently I forgot that the '*' repeats the whole group, so each
repetition can take a different branch of the alternation. Also, the test
cases were not complete.

> I put a 'd' and 'D' in the second string and the original regular
> expression is equivalent to
>
> grepl("^[0dD]*$", dd$ch)

This only covers the first request. It does not handle strings that contain
characters other than '0', 'd' or 'D' but whose first non-zero character is
'd' or 'D'. That is the case of my 4th row, changed from the OP's data example.

My regexpr for 'i2' is equivalent to this one, which I believe is more
readable:

i2b <- grepl("^0{0,}[Dd]", dd$ch)

First a zero, which may occur zero or more times, then a 'd' or 'D'. Whatever
follows, up to the end of the string, is irrelevant.

> which will match strings containing d, D and 0. If you only want 'd'
> or 'D' (and not both), then you will have to use the one in 'i1new'.

To the OP: bottom line, use Jim's 'i1new' and my 'i2' or 'i2b'.

Rui Barradas
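The following sketch puts Rui's bottom line together on a hypothetical data frame. It drops rows flagged by Jim's 'i1new' or by 'i2'. It also checks that '0*' and '0{0,}' behave the same, since both mean "zero or more zeros".

```r
# Hypothetical, shortened data
dd <- data.frame(
  ch = c("0000D0000", "0000d0000", "00D00d000", "000T00000",
         "000DT0000", "000Td0000"),
  stringsAsFactors = FALSE
)

i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)  # only 0s plus one of D or d
i2    <- grepl("^0*[Dd]", dd$ch)           # first non-zero is D or d
i2b   <- grepl("^0{0,}[Dd]", dd$ch)        # same test, written with {0,}

identical(i2, i2b)   # TRUE

dd[!(i1new | i2), , drop = FALSE]   # keeps "000T00000" and "000Td0000"
```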
|
Thank you Rui and Jim. Both 'i1' and 'i1new' worked perfectly because there
are no instances of 'Dd' or 'dD' in the data set (that I would or would not
want to include/exclude)... but I understand that 'i1new' targets precisely
what I want.

Why isn't a leading run of zeros required in either 'i1' or 'i1new', as in
this?

i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", dd$ch)

Thank you again,
Claudia

On Tue, Jul 3, 2012 at 2:06 AM, Rui Barradas <[hidden email]> wrote:
> [...]
|
Hello,
I'm glad it helped. See answer inline.

Em 03-07-2012 17:09, Claudia Penaloza escreveu:
> Thank you Rui and Jim, both 'i1' and 'i1new' worked perfectly
> because there are no instances of 'Dd' or 'dD' in the data set (that I
> would/not want to include/exclude)... but I understand that 'i1new'
> targets precisely what I want.
> Why isn't a leader of zeros required for either 'i1' or 'i1new', as so?
> i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", dd$ch)

Because both 'i1' and 'i1new' test the whole string, from beginning (^) to end ($), allowing only '0' and 'd' or 'D' (and, in the case of 'i1new', only one of 'd' or 'D', not both). Since '0' is part of the allowed characters, zeros are accepted anywhere, including at the start.

So, there's no need to explicitly test for a string that begins with '0'.

Rui Barradas

> Thank you again,
> Claudia
>
> On Tue, Jul 3, 2012 at 2:06 AM, Rui Barradas <[hidden email]> wrote:
> > Hello,
> >
> > Inline.
> >
> > Em 03-07-2012 01:15, jim holtman escreveu:
> > > You will have to change the 'i1' expression as follows:
> > >
> > > i1 <- grepl("^([0D]|[0d])*$", dd$ch)
> > > i1 # matches strings with d & D in them
> > > [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> > >
> > > # second string had 'd' & 'D' in it so it was TRUE above and FALSE below
> > > i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
> > > i1new
> > > [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> >
> > Right, apparently I forgot that grep is greedy, and the test cases
> > were not complete.
> >
> > > I put a 'd' and 'D' in the second string and the original regular
> > > expression is equivalent to
> > >
> > > grepl("^[0dD]*$", dd$ch)
> >
> > This is only for the first request, and does not solve cases where
> > there are characters other than '0', 'd' or 'D', but 'd' or 'D' is
> > the first non-zero. This is the case of my 4th row, changed from the
> > OP's data example.
> >
> > My regexp for 'i2' is equivalent to this one, which I believe is
> > more readable:
> >
> > i2b <- grepl("^0{0,}[Dd]", dd$ch)
> >
> > First a zero, which may occur zero or more times, then a 'd' or
> > 'D'; whatever follows, up to the end of the string, is irrelevant.
> >
> > > which will match strings containing d, D and 0. If you only want 'd'
> > > or 'D' (and not both), then you will have to use the one in 'i1new'.
> >
> > To the OP: bottom line, use Jim's 'i1new' and my 'i2' or 'i2b'.
> >
> > Rui Barradas
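A minimal, self-contained check of the patterns discussed in this exchange, using made-up test strings (they are not the OP's data; the names `x` and `m` are only for illustration). It suggests why 'i1' accepts a string that mixes 'D' and 'd': the alternation sits inside the repeated group, so each character may match either [0D] or [0d] independently. It also shows that Claudia's 'i1newer' rejects a string in which zeros follow a 'D', because after the D's nothing else is allowed before the end of the string.

```r
# Made-up test strings (not the OP's data), one per case of interest
x <- c(only_D   = "000D000",  # only '0' and 'D'
       mixed    = "00D0d00",  # '0', 'D' and 'd' together
       trailing = "0000DD",   # zeros, then D's, nothing after
       other    = "00TD000")  # a character other than '0', 'd', 'D'

# i1: each repetition of the group chooses [0D] or [0d] on its own,
# so 'D' and 'd' may be mixed in the same string
i1 <- grepl("^([0D]|[0d])*$", x)

# i1new: the whole string must be [0D]* or [0d]*, never a mix
i1new <- grepl("^([0D]*$|[0d]*$)", x)

# i1newer: zeros first, then only D's (or only d's) up to the end
i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", x)

# i2 and i2b: leading zeros (if any), then 'd' or 'D'; the rest is ignored
i2  <- grepl("^0*[Dd]", x)
i2b <- grepl("^0{0,}[Dd]", x)

# One row per pattern, one column per test string
m <- rbind(i1, i1new, i1newer, i2, i2b)
colnames(m) <- names(x)
print(m)
```

With the thread's data frame, `dd[!(i1new | i2), ]` would then remove the rows that meet either condition.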
|
Got it! Thank you Rui!
cp
|
In reply to this post by seapen
Perhaps I've missed something, but if it's really true that the goal is to
remove rows if the first non-zero element is "D" or "d", then how about this:

tmp <- gsub('0','',df$ch)          # remove every '0' from each string
first <- substr(tmp,1,1)           # first remaining, i.e. first non-zero, character
subset(df, tolower(first) != 'd')  # keep rows where that character is not 'd' or 'D'

And of course it could be rolled up into a single expression, but I wrote it in several steps to make it easy to follow. No need to wrap one's brain around regular expressions (which is hard for me!)

-Don

--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
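Don mentions that the three steps could be rolled into a single expression. A minimal sketch of that, cross-checked against Rui's regular expression 'i2', on a few rows copied from the example data in this thread (row 4 as modified by Rui). The names `first_is_d` and `kept` are only for illustration, and `stringsAsFactors = FALSE` is an assumption that keeps `ch` as character.

```r
# A few rows from the example data in this thread (row 4 as modified by Rui)
df <- data.frame(
  ch = c("0000000000D0000000000000000000000000000000000000",
         "0000000000d0000000000000000000000000000000000000",
         "000000000DT0000000000000000000000000000000000000",
         "000000000Td0000000000000000000000000000000000000",
         "00000000T0D0000000000000000000000000000000000000"),
  count = c(0.007368, 0.002456, 0.007368, 0.002456, 0.007368),
  stringsAsFactors = FALSE
)

# Don's three steps as one expression: drop every '0', take the first
# remaining character, and compare it case-insensitively with 'd'
first_is_d <- tolower(substr(gsub("0", "", df$ch), 1, 1)) == "d"

# Rui's regular expression for the same condition
i2 <- grepl("^0*[Dd]", df$ch)

# Both approaches flag the same rows here
stopifnot(identical(first_is_d, i2))

# Remove rows whose first non-zero character is 'd' or 'D'
kept <- df[!first_is_d, ]
print(kept)
```

An all-zero string has no first non-zero character, so `substr()` returns "" and the row is kept; 'i2' does not match such a string either, so the two approaches agree there as well.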
