

I'm trying to find a more elegant way of doing this. What I'm trying to accomplish is to count the frequency of letters (major / minor alleles) in a string grouped by the factor levels in another column of my data frame.
Ex.
> DF <- data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF) <- c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>
I have an ugly solution, which works if you know the factor levels of Y in advance.
> ans <- rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
+              table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'U', 1]), ""))))
> rownames(ans) <- c("L", "U")
> ans
  C G
L 2 2
U 3 1
I've played with table, xtabs, tabulate, aggregate, tapply, etc., but haven't found a combination that gives a more general solution to this problem.
Any ideas?
Brian
I fiddled around and found this solution, which is far from elegant,
but it doesn't require you to know the factor levels in advance.
t <- with(DF, tapply(as.character(X), Y, table))
lapply(t, function(x)
table(strsplit(paste(names(x),collapse=""),split="")))
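One caveat with the approach above: names(x) holds each distinct genotype only once, so a genotype that occurs several times within a level is counted a single time. It happens to give the right answer for the example data. A hedged variant that weights each genotype by its count could look like the sketch below; the small data frame and the names by_level and res are made up for illustration.

```r
# Illustrative data (not from the thread): "CC" occurs twice in level L
DF <- data.frame(X = c("CC", "CC", "CC", "CG"),
                 Y = c("L", "L", "U", "U"))

# Table of genotype strings within each level of Y, as in the code above
by_level <- with(DF, tapply(as.character(X), Y, table))

# Repeat each genotype by its count before splitting into letters,
# so repeated genotypes contribute all of their letters
res <- lapply(by_level, function(x)
  table(unlist(strsplit(rep(names(x), as.vector(x)), split = ""))))
res
```

With this data, level L should report four C's rather than two.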
Darin
On Fri, Sep 10, 2010 at 02:40:50PM -0500, Davis, Brian wrote:
In my quest for brevity I think I sacrificed too much clarity.
I'll try to be a little less brief in the hopes of being more clear.
Say I have a data frame like this, as before:
> DF <- data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF) <- c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>
I need to count the frequency of the unique individual characters in DF$X at each factor level in DF$Y
So for DF$Y == "L" there are 2 "C"'s and 2 "G"'s
and for DF$Y == "U" there are 3 "C"'s and 1 "G"
The NA's should not contribute to the counts.
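For what it's worth, one way to get these counts without knowing the levels of Y in advance is to split every genotype into letters, repeat the matching Y value once per letter, and cross-tabulate the two vectors. This is only a sketch on the example data; the names letters_list, y_rep and counts are made up for illustration.

```r
# Example data from this thread
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# Split each genotype into single letters; an NA genotype stays a single NA
letters_list <- strsplit(as.character(DF$X), "")

# Repeat each Y value once per letter so both vectors have the same length
y_rep <- rep(DF$Y, lengths(letters_list))

# Cross-tabulate; table() leaves out NA in either vector by default
counts <- table(Y = y_rep, allele = unlist(letters_list))
counts
```

On the example data this gives 2 C's and 2 G's for L, and 3 C's and 1 G for U, with the NA rows left out.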
If I had an individual character in DF$X instead of a string, like:
> DF2 <- data.frame(c("C", "C", NA, "C", "G", "G"), c("L", "U", "L", "U", "L", NA))
> colnames(DF2) <- c("X", "Y")
> DF2
     X    Y
1    C    L
2    C    U
3 <NA>    L
4    C    U
5    G    L
6    G <NA>
Then table gives me exactly what I need.
> table(DF2)
   Y
X   L U
  C 1 2
  G 1 0
Hopefully this makes it a little clearer what I'm trying to accomplish.
Brian
Brian,
Here's the only thing I can come up with to give the
same result as your "ans", but it doesn't seem to correspond
with your description of the problem.
> DF1 = DF
> DF1$X = sapply(strsplit(as.character(DF$X),''),'[',1)
> DF2 = DF
> DF2$X = sapply(strsplit(as.character(DF$X),''),'[',2)
> newDF = rbind(DF1,DF2)
> table(newDF$Y,newDF$X)
    C G
  L 2 2
  U 3 1
Thomas,
I don't *believe* I have the heterozygote coded both ways. However, I haven't checked thoroughly. I do notice that some SNPs (X's) only have 2 levels, say AA and AT. One could build the multiplication table from the levels, I suppose. (Or force the factor to include all three levels.)
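Forcing the factor to carry all three levels could look something like the sketch below. The vector snp and the genotype labels are made up for illustration; the real columns may be coded differently.

```r
# Hypothetical SNP column in which only two of the three genotypes occur
snp <- c("AA", "AT", "AA", NA, "AT")

# Declare all three genotypes as levels, including the unobserved "TT"
geno <- factor(snp, levels = c("AA", "AT", "TT"))

# The unobserved level now shows up with a zero count
geno_counts <- table(geno)
geno_counts
```

Any genotype string not listed in levels (for example a heterozygote coded the other way round, "TA") would become NA here, so the level set needs checking against the data first.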
Brian
