# How to group by then count?


## How to group by then count?

Hi all,

I thought this was a very naive problem, but I have not found any solution that is idiomatic to R.

The problem is this. Assume we have a vector of strings:

    x = c("1", "1", "2", "1", "5", "2")

We want to count the number of appearances of each string. In vector x, string "1" appears 3 times, "2" appears twice, and "5" appears once. Then I want to know which string is the majority; in this case, it is "1".

For imperative languages like C, C++, Java, and Python, I would use a hash table to count the strings, where the keys are the strings and the values are the numbers of appearances. For functional languages like Clojure, there are higher-order functions like group-by.

However, for R, I can hardly find a good solution to this simple problem. I found a hash package, which implements a hash table. However, installing a package simply for a hash table is really annoying to me. I did find aggregate and other functions that operate on data frames. But in my case, it is a simple vector. Converting it to a data frame may not be desirable. (Or is it?)

Could anyone suggest an idiomatic way of doing such a job in R? I would appreciate your help!

-Monnand

[[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: How to group by then count?

Dear Monnand,

one possible way would be to use as.factor(); summary() then gives you the count for every level. Like this:

    x = c("1", "1", "2", "1", "5", "2")
    summary(as.factor(x))

Cheers,
Christian
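As a rough illustration of the suggestion above, here is a minimal sketch of what `summary(as.factor(x))` returns for the example vector and how the majority string could be read off it. The variable names `counts` and `majority` are only illustrative. One caveat worth checking in `?summary.factor`: its `maxsum` argument defaults to 100, so with more than 100 distinct strings the rarest levels are lumped into an "(Other)" entry.

```r
x <- c("1", "1", "2", "1", "5", "2")

# summary() on a factor returns a named integer vector: one count per level
counts <- summary(as.factor(x))
print(counts)

# which.max() gives the position of the first maximum; names() maps it back
# to the string. Note that ties are resolved in favour of the first level.
majority <- names(counts)[which.max(counts)]
print(majority)
```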

## Re: How to group by then count?

> On 04-01-2015, at 10:02, Monnand <[hidden email]> wrote:
>
> Could anyone suggest an idiomatic way of doing such a job in R?

Have a look at table:

    ?table

Berend
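For readers following the pointer to `?table`, a minimal sketch of how it might be applied to the vector from the original question. The names `counts`, `ordered`, and `majority` are illustrative assumptions, not part of the thread.

```r
x <- c("1", "1", "2", "1", "5", "2")

# table() cross-tabulates the values; for a single vector this is a
# one-dimensional table of counts keyed by the distinct strings
counts <- table(x)
print(counts)

# Look up one count by its string key, much like a hash-table lookup
n_ones <- counts[["1"]]

# Sort by frequency, most common first
ordered <- sort(counts, decreasing = TRUE)

# The majority string (the first one, if several are tied)
majority <- names(which.max(counts))
print(majority)
```

This works directly on the character vector, so no conversion to a data frame is needed.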

## Re: How to group by then count?

This seems to me to be a case where thinking in terms of computer programming concepts is getting in the way a bit. Approach it as a data analysis task; the S language (upon which R is based) is designed in part for data analysis, so there is a function that does most of the job for you.

(I changed your vector of strings to make the result more easily interpreted.)

    > x = c("1", "1", "2", "1", "5", "2", '3', '5', '5', '2', '2')
    > tmp <- table(x)      ## counts the number of appearances of each element
    > tmp[tmp==max(tmp)]   ## finds which one occurs most often
    2 
    4 

Meaning that the element '2' appears 4 times.

The table() function should be fast even with long vectors. Here's an example with a vector of length 1 million:

    foo <- table( sample(letters, 1e6, replace=TRUE) )

One of the seminal books on the S language is John M. Chambers' Programming with Data, and I would emphasize the "with Data" part of that title.

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 1/4/15, 1:02 AM, "Monnand" <[hidden email]> wrote:

> Could anyone suggest an idiomatic way of doing such a job in R?
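To connect the table() approach back to the "group by, then count" framing of the original question, here is a hedged sketch using the 11-element vector from the reply above. It shows an explicit group-by with `split()`, a way to keep every string tied for the maximum, and the optional conversion to a data frame. The names `by_value`, `grouped`, `modes`, and `df` are illustrative only.

```r
x <- c("1", "1", "2", "1", "5", "2", "3", "5", "5", "2", "2")

# Explicit "group by, then count": split() groups equal strings into a list,
# and sapply(..., length) counts the members of each group
by_value <- split(x, x)
grouped  <- sapply(by_value, length)

# table() produces the same counts in one step
counts <- table(x)

# Keep every string that reaches the maximum count, so ties are not lost
modes <- names(counts)[counts == max(counts)]
print(modes)

# If a data frame is wanted after all, a table converts directly:
# one column with the strings (named after the variable) and one named Freq
df <- as.data.frame(counts, stringsAsFactors = FALSE)
print(df)
```

Comparing against `max(counts)` rather than calling `which.max()` is a deliberate choice here: it returns all tied strings instead of only the first.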