Counting duplicates in a dataframe

asafwe
Hello,

I am looking at a two-way ANOVA dataset, and would like to count the rows in the dataframe that share the same level of the first factor ("Gender") and of the second factor ("Dosage"). In other words, I am interested in the number of observations in each "cell" of a (not necessarily balanced) two-way layout.

What is the simplest way to do this?

Thanks a lot,

Asaf

   Observation Gender Dosage Alertness
1            1      m      a         8
2            2      m      a        12
3            3      m      a        13
4            4      m      a        12
5            5      m      b         6
6            6      m      b         7
7            7      m      b        23
8            8      m      b        14
9            9      f      a        15
10          10      f      a        12
11          11      f      a        22
12          12      f      a        14
13          13      f      b        15
14          14      f      b        12
15          15      f      b        18
16          16      f      b        22

Re: Counting duplicates in a dataframe

PIKAL Petr
Hi

> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-project.org] On Behalf Of asafwe
> Sent: Monday, October 22, 2012 4:02 AM
> To: [hidden email]
> Subject: [R] Counting duplicates in a dataframe
>
> Hello,
>
> I am looking at a two-way ANOVA dataset, and would like to count the
> rows in the dataframe with the same level of the first factor
> ("Gender") and the second factor ("Dosage"). In other words, I am
> interested in the number of observations per each "cell" in a (not
> necessarily balanced) two-way layout.

How is this related to duplicates?

Do you want something like this?
xtabs(~Gender+Dosage, data=some.data.frame)

Regards
Petr
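
For readers who want to try the xtabs() suggestion above on the posted data, here is a minimal sketch. The data frame name `alert` is a placeholder (the thread only says some.data.frame), and the values are retyped from the original post.

```r
# Rebuild the posted data; `alert` is a placeholder name, not from the thread.
alert <- data.frame(
  Observation = 1:16,
  Gender      = rep(c("m", "f"), each = 8),
  Dosage      = rep(rep(c("a", "b"), each = 4), times = 2),
  Alertness   = c(8, 12, 13, 12, 6, 7, 23, 14, 15, 12, 22, 14, 15, 12, 18, 22)
)

# One count per Gender x Dosage cell (every cell holds 4 rows in this data)
cell_counts <- xtabs(~ Gender + Dosage, data = alert)
print(cell_counts)

# The same table with row and column totals appended
print(addmargins(cell_counts))
```

The formula interface keeps the factor names as dimension labels, which makes the printed table easier to read than an unlabeled one.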

> --
> View this message in context: http://r.789695.n4.nabble.com/Counting-duplicates-in-a-dataframe-tp4646954.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: Counting duplicates in a dataframe

arun kirshna
In reply to this post by asafwe
Hi,
try this:
dat1<-read.table(text="
Observation Gender Dosage Alertness
1            1      m      a        8
2            2      m      a        12
3            3      m      a        13
4            4      m      a        12
5            5      m      b        6
6            6      m      b        7
7            7      m      b        23
8            8      m      b        14
9            9      f      a        15
10          10      f      a        12
11          11      f      a        22
12          12      f      a        14
13          13      f      b        15
14          14      f      b        12
15          15      f      b        18
16          16      f      b        22
",sep="",header=TRUE,stringsAsFactors=FALSE)

# Count the rows in each Gender x Dosage cell (needs the reshape package)
library(reshape)
cast(dat1,Gender~Dosage,length)
#  Gender a b
#1      f 4 4
#2      m 4 4
A.K.

Re: Counting duplicates in a dataframe

arun kirshna
In reply to this post by asafwe
Hi,
Another way:
dat1<-read.table(text="
Observation Gender Dosage Alertness
1            1      m      a        8
2            2      m      a        12
3            3      m      a        13
4            4      m      a        12
5            5      m      b        6
6            6      m      b        7
7            7      m      b        23
8            8      m      b        14
9            9      f      a        15
10          10      f      a        12
11          11      f      a        22
12          12      f      a        14
13          13      f      b        15
14          14      f      b        12
15          15      f      b        18
16          16      f      b        22
",sep="",header=TRUE,stringsAsFactors=FALSE)


tapply(dat1$Observation,list(dat1$Gender,dat1$Dosage),length)
#  a b
#f 4 4
#m 4 4
A.K.

Re: Counting duplicates in a dataframe

David Winsemius

On Oct 22, 2012, at 7:48 AM, arun wrote:

> HI,
> Another way:
> dat1<-read.table(text="
> Observation Gender Dosage Alertness
> 1            1      m      a        8
> 2            2      m      a        12
> 3            3      m      a        13
> 4            4      m      a        12
> 5            5      m      b        6
> 6            6      m      b        7
> 7            7      m      b        23
> 8            8      m      b        14
> 9            9      f      a        15
> 10          10      f      a        12
> 11          11      f      a        22
> 12          12      f      a        14
> 13          13      f      b        15
> 14          14      f      b        12
> 15          15      f      b        18
> 16          16      f      b        22
> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>
>
> tapply(dat1$Observation,list(dat1$Gender,dat1$Dosage),length)
> #  a b
> #f 4 4
> #m 4 4

How is that different from:

table(dat1$Gender, dat1$Dosage)

--
David.

David Winsemius, MD
Alameda, CA, USA
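
On the balanced data in this thread, table(dat1$Gender, dat1$Dosage) and the tapply(..., length) call give the same counts. One difference shows up in an unbalanced layout with an empty cell, which is the case the original question allows for. The sketch below uses a made-up data frame `unbal` (an assumption for illustration, not data from the thread) in which no female subject received dosage b.

```r
# Hypothetical unbalanced data: the f/b cell is empty.
unbal <- data.frame(
  Observation = 1:12,
  Gender      = rep(c("m", "f"), times = c(8, 4)),
  Dosage      = c(rep(c("a", "b"), each = 4), rep("a", 4))
)

counts_table  <- table(unbal$Gender, unbal$Dosage)
counts_tapply <- tapply(unbal$Observation,
                        list(unbal$Gender, unbal$Dosage), length)

print(counts_table)   # the empty f/b cell is reported as 0
print(counts_tapply)  # the empty f/b cell is reported as NA
```

For counting observations per cell, a 0 is usually the more convenient answer, which is one reason to prefer table() or xtabs() here.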

Re: Counting duplicates in a dataframe

asafwe
Thank you all; David -- this is, in fact, exactly what I need!

Asaf

Re: Counting duplicates in a dataframe

arun kirshna
In reply to this post by David Winsemius
Hi,
ctab() from the catspec package also gives the same result (see ?ctab).
library(catspec)
dat2<-within(dat1,{Gender<-factor(Gender);Dosage<-factor(Dosage)})
ctab(dat2$Gender,dat2$Dosage)
#   a b
#     
#f  4 4
#m  4 4
A.K.
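
If installing an extra package is not an option, the same counts can also be produced in long form with base R alone. This is only a sketch: the data frame name `cells` is a placeholder, and it holds just the two factor columns retyped from the original post.

```r
# Placeholder data frame with only the two factors from the posted data.
cells <- data.frame(
  Gender = rep(c("m", "f"), each = 8),
  Dosage = rep(rep(c("a", "b"), each = 4), times = 2)
)

# One row per Gender x Dosage combination, with the count in column Freq
long_counts <- as.data.frame(table(Gender = cells$Gender,
                                   Dosage = cells$Dosage))
print(long_counts)
```

The long layout is handy when the counts need to be merged back onto other per-cell summaries.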


