Joining two (or more) frequency tables

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Joining two (or more) frequency tables

Jan.Sunde
Hi,
I am still new to R and have, in spite of searching all over, found no
"understandable" documentation for how to join two (or more) frequency
tables.
Why would I want to do that and what do I mean by "joining" ? Let me try
to explain:

I have a diagnosis form that allows registration of up to three diagnosis
codes on animals with birth defects, like this:

father.id diagnosis diagnosis2 diagnosis3
a             100         200            300
b              200        100            340
....

and so on.

As you can see the same diagnosis can show up in either field (1, 2 or 3).

I am interested in how many times each diagnosis shows up for offspring
from the same father (looking at inheritance)

The best I have come up with so far is

table(father.id, diagnosis)
table(father.id, diagnosis2)
table(father.id, diagnosis3)

which gives me the frequency counts for each combination of father.id and
diagnosis code

and then I manually join these tables together by cutting and pasting (in
Excel!!). Oh the horror!

The resulting table looks like this though and gives me what I want:

father.id      100      200      300   340  
a                   1         1         1     0
b                   1          1         0    1
.....

This is incredibly cumbersome and therefore I want to find a way of doing
this completely in R script so that I have the code ready when I need it
again.
So far I have tried with several varieties of merge() and xtabs() but have
given up. I can not afford to spend more time on this and as a last resort
I'm hoping anyone here could pass me a tip.
I use R for modelling but I am so far finding it hard to wrap my head
around it when it comes to the pre-analysis data manipulation like this as
I am not proficient enough (i.e. do not quite understand) in the language
and data structure intricacies.

Regards,
Jan
        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Joining two (or more) frequency tables

jholtman
use the 'reshape package:

> require(reshape)
Loading required package: reshape
Loading required package: plyr
> x
  father.id diagnosis diagnosis2 diagnosis3
1         a       100        200        300
2         b       200        100        340

> melt(x)
Using father.id as id variables
  father.id   variable value
1         a  diagnosis   100
2         b  diagnosis   200
3         a diagnosis2   200
4         b diagnosis2   100
5         a diagnosis3   300
6         b diagnosis3   340
> z <- melt(x)
Using father.id as id variables
> table(z$father.id, z$value)

    100 200 300 340
  a   1   1   1   0
  b   1   1   0   1
>


On Wed, May 19, 2010 at 5:56 AM,  <[hidden email]> wrote:

> Hi,
> I am still new to R and have, in spite of searching all over, found no
> "understandable" documentation for how to join two (or more) frequency
> tables.
> Why would I want to do that and what do I mean by "joining" ? Let me try
> to explain:
>
> I have a diagnosis form that allows registration of up to three diagnosis
> codes on animals with birth defects, like this:
>
> father.id diagnosis diagnosis2 diagnosis3
> a             100         200            300
> b              200        100            340
> ....
>
> and so on.
>
> As you can see the same diagnosis can show up in either field (1, 2 or 3).
>
> I am interested in how many times each diagnosis shows up for offspring
> from the same father (looking at inheritance)
>
> The best I have come up with so far is
>
> table(father.id, diagnosis)
> table(father.id, diagnosis2)
> table(father.id, diagnosis3)
>
> which gives me the frequency counts for each combination of father.id and
> diagnosis code
>
> and then I manually join these tables together by cutting and pasting (in
> Excel!!). Oh the horror!
>
> The resulting table looks like this though and gives me what I want:
>
> father.id      100      200      300   340
> a                   1         1         1     0
> b                   1          1         0    1
> .....
>
> This is incredibly cumbersome and therefore I want to find a way of doing
> this completely in R script so that I have the code ready when I need it
> again.
> So far I have tried with several varieties of merge() and xtabs() but have
> given up. I can not afford to spend more time on this and as a last resort
> I'm hoping anyone here could pass me a tip.
> I use R for modelling but I am so far finding it hard to wrap my head
> around it when it comes to the pre-analysis data manipulation like this as
> I am not proficient enough (i.e. do not quite understand) in the language
> and data structure intricacies.
>
> Regards,
> Jan
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Joining two (or more) frequency tables

Wu Gong
In reply to this post by Jan.Sunde
> ## Create a sample data.
> data <- data.frame(father.id = letters[1:5],
+ diagnosis = sample(c(100,200,300,340),5,replace=TRUE),
+ diagnosis1 = sample(c(100,200,300,340),5,replace=TRUE),
+ diagnosis2 = sample(c(100,200,300,340),5,replace=TRUE))
> data
  father.id diagnosis diagnosis1 diagnosis2
1         a       340        100        300
2         b       200        300        340
3         c       200        300        100
4         d       200        100        300
5         e       200        100        100
>
> ## Create a matrix by replicating the father.id column 3 times
> ## Because we have 3 diagnosis columns to match when use table() function
> match <- as.matrix(data[,rep(1:2,c(3,0))])
> match
     father.id father.id.1 father.id.2
[1,] "a"       "a"         "a"        
[2,] "b"       "b"         "b"        
[3,] "c"       "c"         "c"        
[4,] "d"       "d"         "d"        
[5,] "e"       "e"         "e"        
>
> ## Count
> table(match,as.matrix(data[,2:4]))
     
match 100 200 300 340
    a   1   0   1   1
    b   0   1   1   1
    c   1   1   1   0
    d   1   1   1   0
    e   2   1   0   0
>


Please refer to:
http://r.789695.n4.nabble.com/Counting-Frequencies-in-Data-Frame-tt2221342.html#a2221487