Counting occurrences of a letter by a factor


Davis, Brian
I'm trying to find a more elegant way of doing this.  I want to count the frequency of letters (major/minor alleles) in a string, grouped by the factor levels in another column of my data frame.

Ex.
> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF)<-c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>

I have an ugly solution, which works if you know the factor levels of Y in advance.

> ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
+ table(unlist(strsplit(as.character(DF[DF[ ,'Y']  == 'U', 1]), ""))))
> rownames(ans)<-c("L", "U")
> ans
  C G
L 2 2
U 3 1


I've played with table, xtabs, tabulate, aggregate, tapply, etc., but haven't found a combination that gives a more general solution to this problem.

Any ideas?

Brian

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Counting occurrences of a letter by a factor

Peng, C
try:

?ftable
Re: Counting occurrences of a letter by a factor

Darin A. England
In reply to this post by Davis, Brian
I fiddled around and found this solution, which is far from elegant,
but it doesn't require you to know the factor levels in advance.

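# Tabulate the genotype strings within each level of Y
tabs <- with(DF, tapply(as.character(X), Y, table))
# Repeat each genotype by its count before splitting into letters,
# so a genotype that occurs more than once in a group is fully counted
lapply(tabs, function(x)
    table(strsplit(paste(rep(names(x), x),collapse=""),split="")))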

Darin


Re: Counting occurrences of a letter by a factor

Davis, Brian
In reply to this post by Davis, Brian
In my quest for brevity I think I sacrificed too much clarity.

I'll try to be a little less brief in the hopes of being more clear.

Say I have a data frame like this, as before:
> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF)<-c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>

I need to count the frequency of the unique individual characters in DF$X at each factor level in DF$Y

So for DF$Y == "L"  there are 2 "C"'s and 2 "G"'s
and for DF$Y == "U" there are 3 "C"'s and 1 "G"

The NA's should not contribute to the counts.

If I had an individual character in DF$X instead of a string, like:

> DF2<-data.frame(c("C", "C", NA, "C", "G", "G"), c("L", "U", "L", "U", "L", NA))
> colnames(DF2)<-c("X", "Y")
> DF2
     X    Y
1    C    L
2    C    U
3 <NA>    L
4    C    U
5    G    L
6    G <NA>

Then table gives me exactly what I need.

> table(DF2)
   Y
X   L U
  C 1 2
  G 1 0



Hopefully this makes it a little clearer what I'm trying to accomplish.

Brian
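
One way to get the counts described above in base R is to reshape the data to one row per letter and then cross-tabulate. This is only a sketch under assumptions: the names `chars`, `long`, `allele`, and `counts` are made up here for illustration, and the example data frame is rebuilt so the block runs on its own.

```r
# Rebuild the example data so the block is self-contained
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# Split each genotype string into single letters; an NA stays a single NA
chars <- strsplit(as.character(DF$X), "")

# One row per letter, carrying along the Y value of the original row
# ('long' and 'allele' are illustrative names, not from the thread)
long <- data.frame(allele = unlist(chars),
                   Y = rep(DF$Y, lengths(chars)))

# table() drops NA in either column by default
counts <- table(long$Y, long$allele)
counts
```

For the example data this should give 2 C and 2 G for L, and 3 C and 1 G for U, without naming the levels of Y anywhere.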

-----Original Message-----
From: Phil Spector [mailto:[hidden email]]
Sent: Friday, September 10, 2010 2:52 PM
To: Davis, Brian
Subject: Re: [R] Counting occurances of a letter by a factor

Brian -
    Here's the only thing I can come up with to give the
same result as your "ans", but it doesn't seem to correspond
with your description of the problem.

> DF1 = DF
> DF1$X = sapply(strsplit(as.character(DF$X),''),'[',1)
> DF2 = DF
> DF2$X = sapply(strsplit(as.character(DF$X),''),'[',2)
> newDF = rbind(DF1,DF2)
> table(newDF$Y,newDF$X)

     C G
   L 2 2
   U 3 1

  - Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  [hidden email]



Re: Counting occurrences of a letter by a factor

Brian Diggs
In reply to this post by Davis, Brian

You are almost there.  The "plyr" package gets you the rest of the way.
You already have something that will, for a group of cases with the
same "Y" value, tabulate the "X" values the way you want.  ddply will
split the data frame up by "Y" values and run that on each part.

library("plyr")

tab <- ddply(DF, .(Y),
function(x) {table(unlist(strsplit(as.character(x$X),"")))})
tab

#     Y C G
#1    L 2 2
#2    U 3 1
#3 <NA> 1 1

It is almost what you asked for.  If you really want it as a matrix with
named rows:

tab2 <- as.matrix(tab[,-1])
rownames(tab2) <- tab[,1]

It still has an entry for the NA value of "Y", but that can be filtered
out at whatever step you like.
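
For comparison, the same split-apply-combine idea can be sketched in base R with split() and sapply(). This is not the plyr approach itself, only an analogue; `alleles`, `by_group`, `count_letters`, and `tab3` are names invented for the sketch. Because split() drops rows whose grouping value is NA, no NA row appears in the result.

```r
# Rebuild the example data so the block is self-contained
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# All letters seen anywhere in X, so every group reports the same columns
x_chr <- as.character(DF$X)
alleles <- sort(unique(unlist(strsplit(x_chr[!is.na(x_chr)], ""))))

# split() drops rows whose Y is NA, unlike the ddply() call above
by_group <- split(x_chr, DF$Y)

# Count each letter within one group, ignoring NA genotypes
count_letters <- function(x) {
  chars <- unlist(strsplit(x[!is.na(x)], ""))
  sapply(alleles, function(a) sum(chars == a))
}

# sapply() returns letters in rows and groups in columns; transpose it
tab3 <- t(sapply(by_group, count_letters))
tab3
```

Counting against a fixed set of letters means a group that lacks one of them still reports a zero rather than a shorter result.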

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

Re: Counting occurrences of a letter by a factor

Thomas Lumley
In reply to this post by Davis, Brian

I would use table() as the first step
> table(DF[,1],DF[,2])

      L U
   CC 1 1
   CG 0 1
   GC 0 0
   GG 1 0


and then multiply by a matrix that counts C and G:  
> cg<-rbind(C=c(2,1,1,0),G=c(0,1,1,2))
> cg%*%table(DF[,1],DF[,2])

     L U
   C 2 3
   G 2 1

If the genotype is a factor then you don't have to worry about empty genotypes.

Also, do you actually get the heterozygotes coded both ways?  When I have had to do this it has been simplified by having the heterozygotes all coded the same way (i.e., only one of CG and GC appears), so that as.numeric() on the factor variable, minus 1, gives the number of copies of the alphabetically later allele.
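
To illustrate the coding remark: if each heterozygote appears in only one orientation and the factor has its three levels in alphabetical order, the integer codes are 1, 2, 3, so subtracting 1 gives the number of copies of the later allele. The data below are invented for illustration (the thread's example with GC recoded as CG), and `geno`, `grp`, `copies_G`, `n_G`, `n_typed`, and `n_C` are hypothetical names.

```r
# Hypothetical data with heterozygotes coded one way only (CG, never GC)
geno <- factor(c("CC", "CC", NA, "CG", "GG", "CG"),
               levels = c("CC", "CG", "GG"))
grp <- c("L", "U", "L", "U", "L", NA)

# Integer codes are 1, 2, 3, so code - 1 is the number of G alleles
copies_G <- as.numeric(geno) - 1

# G count per group; split() drops rows whose group is NA
n_G <- sapply(split(copies_G, grp), sum, na.rm = TRUE)

# Each non-missing genotype carries two alleles in total
n_typed <- sapply(split(!is.na(geno), grp), sum)
n_C <- 2 * n_typed - n_G

rbind(C = n_C, G = n_G)
```

With this coding no string splitting is needed at all; the counts follow from sums of the numeric codes.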

          -thomas


Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle

Re: Counting occurrences of a letter by a factor

Davis, Brian
Thomas,

I don't *believe* I have the heterozygote coded both ways.  However, I haven't checked thoroughly.  I do notice that some SNPs (X's) only have 2 levels, say AA and AT.  One could build the multiplication table from the levels, I suppose (or force the factor to include all three levels).

Brian

Re: Counting occurrences of a letter by a factor

Thomas Lumley


I would force the factor to include all three levels.

    -thomas
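
A minimal sketch of both ideas together, forcing the factor levels and building the counting matrix from those levels rather than typing it in. The SNP data here are hypothetical (only AA and AT observed), and `geno_levels`, `alleles`, `count_mat`, and `allele_counts` are names made up for the example.

```r
# Hypothetical SNP where only two of the three genotypes were observed
X <- c("AA", "AT", NA, "AA", "AT", "AA")
Y <- c("L", "U", "L", "U", "L", NA)

# Force all three genotype levels, including the unobserved TT
geno_levels <- c("AA", "AT", "TT")
geno <- factor(X, levels = geno_levels)

# Build the allele-count matrix from the levels:
# one row per allele, one column per genotype
alleles <- sort(unique(unlist(strsplit(geno_levels, ""))))
count_mat <- sapply(geno_levels, function(g) {
  chars <- strsplit(g, "")[[1]]
  sapply(alleles, function(a) sum(chars == a))
})

# The genotype-by-group table keeps a zero row for TT,
# so the dimensions conform for the matrix product
allele_counts <- count_mat %*% table(geno, Y)
allele_counts
```

Because the table always has one row per forced level, the same `count_mat` can be reused across SNPs that share the same pair of alleles, whether or not every genotype was observed.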

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle
