Hi,
I have two columns with data (both identifiers - it's an affiliation list) and I would like to delete the rows in which the observations in the second column have a frequency < 5 in the entire second column. Example:

1 a
1 b
1 c
2 a
2 b
2 d

Let's say, I would like to delete the rows in which the observation in the second column has a frequency < 2 in the entire second column. This would result in:

1 a
1 b
2 a
2 b

How can I do this? Thanks in advance!

Mathijs
Suppose this is your data frame:
> df = data.frame(x=c(1,1,1,2,2,2),y=c('a','b','c','a','b','d'))
> df
  x y
1 1 a
2 1 b
3 1 c
4 2 a
5 2 b
6 2 d
> df[!table(df$y)[df$y] < 2,]
  x y
1 1 a
2 1 b
4 2 a
5 2 b

Here table(df$y) counts how often each value of y occurs, and indexing that table by df$y looks up the count for every row. Note that this will only work properly if y is a factor or character variable. If y were numeric, the values would be treated as positions in the table rather than as names, so you would need

    df[!table(df$y)[as.character(df$y)] < 2,]

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley

On Thu, 9 Dec 2010, mathijsdevaan wrote:

> Let's say, I would like to delete the rows in which the observation in the
> second column has a frequency < 2 in the entire second column.
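To make the numeric case concrete, here is a minimal sketch. The data frame `df_num`, its numeric column `y`, and the `min_count` threshold are made up for illustration and are not from the original post; the idea is only to show the lookup by name via as.character().

```r
# Hypothetical example data: y is numeric rather than character/factor.
df_num <- data.frame(x = c(1, 1, 1, 2, 2, 2),
                     y = c(10, 20, 30, 10, 20, 40))

min_count <- 2                      # assumed threshold, as in the example above

counts <- table(df_num$y)           # frequency of each distinct y value

# Indexing counts by df_num$y directly would use 10, 20, ... as positions,
# which do not exist in a 4-element table. Converting to character makes
# the lookup go by name instead.
n <- as.vector(counts[as.character(df_num$y)])

kept <- df_num[n >= min_count, ]    # keep rows whose y value is frequent enough
print(kept)
```

Setting `min_count` to 5 would give the filter asked for in the first sentence of the question.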
mathijsdevaan wrote on 12/09/2010 04:21:54 PM:
> I have two columns with data (both identifiers - it's an affiliation list)
> and I would like to delete the rows in which the observations in the second
> column have a frequency < 5 in the entire second column.
> [...]
> Let's say, I would like to delete the rows in which the observation in the
> second column has a frequency < 2 in the entire second column.

It's not clear whether you want to delete rows where the value in the second column occurs fewer than 5 times or fewer than 2 times. I'll assume the latter.

    foo <- data.frame(k=rep(1:2, each=3), x=letters[c(1,2,3,1,2,4)])
    bar <- subset(foo, x %in% names(table(foo$x))[table(foo$x)>=2])

The second line keeps only the rows whose x value is among the names of the table entries with a count of at least 2. No doubt others can write this more succinctly.

--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
541/754-4638
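One possibly more compact variant, offered as a sketch rather than as the poster's method, uses base R's ave() to attach the group size to every row and then filters on it. The names `foo` and `bar` follow the reply above; `min_count` and `n` are illustrative names introduced here.

```r
# Same example data as in the reply above.
foo <- data.frame(k = rep(1:2, each = 3),
                  x = letters[c(1, 2, 3, 1, 2, 4)])

min_count <- 2   # assumed threshold; use 5 for the other reading of the question

# ave() applies FUN within each group of foo$x and returns a vector aligned
# with the rows, so n[i] is the number of rows sharing row i's x value.
n <- ave(seq_along(foo$x), foo$x, FUN = length)

bar <- foo[n >= min_count, ]
print(bar)
```

Because the counts are aligned with the rows, this does not depend on whether x is character, factor, or numeric.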