|
Dear all,
I am an R beginner, and I am looking for a way to do the same thing for all levels of a column in a table.

Basically, I have a bunch of protein sequences composed of different amino acid residues, and each residue is represented by an uppercase letter. I want to calculate the ratio of different amino acid residues at each position of the proteins. Here is an example table:

Proteins  Time_zero  1  2  3  4  5  6  7  8
p1        0.0050723  L  E  Y  I  I  P  D  A
p2        0.0002731  T  E  N  L  V  P  G  A
p3        9.757E-05  L  M  Y  Q  I  P  E  C
p4        0.0002077  R  E  Y  L  I  S  E  A

If I name this table as myfile.txt, I have the following scripts to calculate the ratio of each amino acid residue at position 1:

# showing levels of the 3rd column, which means the types of residues
> myfile[,3]

# calculating the ratio of L
> list=c(which(myfile[,3]=="L"))
> time0total=sum(myfile[,2])
> AA_L=0
> for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
> ratio_L=AA_L/time0total

So how can I write a script to do the same thing for the other two levels (T and R) in column 3, and also do this for every column that contains amino acid residues?

Many thanks for any help you could give me on this topic! :)

Regards,
Zhao

--
Zhao JIN
Ph.D. Candidate
Ruth Ley Lab
467 Biotech
Field of Microbiology, Cornell University
Lab: 607.255.4954
Cell: 412.889.3675

[[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
|
The first thing is to supply the data in a usable format. As posted, it is essentially unreadable. All R beginners do this. :)

Have a look at the dput function (?dput) for a good way to supply sample data in an email. If you have a large dataset, a few dozen lines of data would probably be fine. Something like dput(head(mydata)) should do. Just copy and paste the output into your email.

Welcome to R. I think you will like it.

John Kane
Kingston ON Canada |
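To make the dput() suggestion concrete, here is a minimal sketch. The data frame `mydata` and its values are a hypothetical stand-in, not the poster's real table; the point is only that the text printed by dput() can be pasted into an email and evaluated on the other end to rebuild the same object.

```r
# Hypothetical toy data standing in for the poster's table
mydata <- data.frame(
  Proteins  = c("p1", "p2", "p3"),
  Time_zero = c(0.0050723, 0.0002731, 9.757e-05),
  X1        = c("L", "T", "L"),
  stringsAsFactors = TRUE
)

# dput() prints R code that rebuilds the object; this is what goes in the email
dput(head(mydata))

# Round trip: capture that text and evaluate it, as a reader of the email would
txt     <- paste(capture.output(dput(mydata)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# Compare the rebuilt object with the original
print(all.equal(mydata, rebuilt))
```

A reader would normally just paste the dput() text after `mydata <-`; the capture.output()/parse() step here only imitates that copy and paste inside one script.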
|
Hi John,
Thank you for the tips. My apologies about the unreadable sample data... So here is the output of the sample data, and hopefully it works this time :)

structure(list(Proteins = structure(1:4, .Label = c("p1", "p2", "p3", "p4"), class = "factor"),
    Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
    X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L", "R", "T"), class = "factor"),
    X2 = structure(c(1L, 1L, 2L, 1L), .Label = c("E", "M"), class = "factor"),
    X3 = structure(c(2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"),
    X4 = structure(c(1L, 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"),
    X5 = structure(c(1L, 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"),
    X6 = structure(c(1L, 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"),
    X7 = structure(c(1L, 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"),
    X8 = structure(c(1L, 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")),
    .Names = c("Proteins", "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"),
    row.names = c(NA, 4L), class = "data.frame")

And here is my original question:

Basically, I have a bunch of protein sequences composed of different amino acid residues, and each residue is represented by an uppercase letter. I want to calculate the ratio of different amino acid residues at each position of the proteins.

If I name this table as myfile.txt, I have the following scripts to calculate the ratio of each amino acid residue at position 1:

# showing levels of the 3rd column, which means the types of residues
> myfile[,3]

# calculating the ratio of L
> list=c(which(myfile[,3]=="L"))
> time0total=sum(myfile[,2])
> AA_L=0
> for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
> ratio_L=AA_L/time0total

So how can I write a script to do the same thing for the other two levels (T and R) in column 3, and also do this for every column that contains amino acid residues?

Thanks a lot!
Regards,
Zhao |
|
I think this does what you want, using two packages, plyr and reshape2, that you may have to install. If so, install.packages(c("plyr", "reshape2")) should do the trick.

library(plyr)
library(reshape2)

# 'myfile' is your dput() output assigned to a data frame, i.e. myfile <- structure(...)
time0total = sum(myfile[,2])
mydata <- myfile[, 2:10]
md1 <- melt(mydata, id = "Time_zero")
ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)

John Kane
Kingston ON Canada |
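For readers who would rather not install anything, roughly the same reshape-then-summarise idea can be sketched in base R. This is an assumed equivalent, not John's code: the data frame is re-typed from the posted table (first three position columns only, to keep it short), and the names `pos_cols`, `long` and `res` are made up for illustration.

```r
# Re-typed from the posted table; only three position columns for brevity
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  stringsAsFactors = TRUE
)

# Every column that is not an identifier or the weight is a position column
pos_cols <- setdiff(names(myfile), c("Proteins", "Time_zero"))

# Long format: one row per (protein, position) pair, similar to what melt() gives
long <- data.frame(
  Time_zero = rep(myfile$Time_zero, times = length(pos_cols)),
  variable  = rep(pos_cols, each = nrow(myfile)),
  value     = unlist(lapply(myfile[pos_cols], as.character), use.names = FALSE)
)

# Sum Time_zero within each position/residue pair, then divide by the grand total
res <- aggregate(Time_zero ~ variable + value, data = long, FUN = sum)
res$ratio <- res$Time_zero / sum(myfile$Time_zero)
res <- res[order(res$variable, res$value), ]
print(res)
```

The `ratio` column should play the same role as the `sum` column in the ddply() call: each residue's share of the total Time_zero at that position.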
|
The OP's request is a bit ambiguous to me: at a given residue, do you wish to calculate the proportions for only those amino acids that appear at that residue, or do you wish to include the proportions for all amino acids, some of which might then be 0?

Assuming the former, then I don't think one needs to go to the lengths described by John.

Using your example (thanks!), the following seems to suffice:

> sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))
$X1
x
   L    R    T 
0.50 0.25 0.25 

$X2
x
   E    M 
0.75 0.25 

$X3
x
   N    Y 
0.25 0.75 

$X4
x
   I    L    Q 
0.25 0.50 0.25 

$X5
x
   I    V 
0.75 0.25 

$X6
x
   P    S 
0.75 0.25 

$X7
x
   D    E    G 
0.25 0.50 0.25 

$X8
x
   A    C 
0.75 0.25 

This could, of course, then be modified to add zero proportions for all non-appearing amino acids.

-- Cheers,
Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm |
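The last remark, adding zero proportions for amino acids that never appear at a position, might look like this. It is a sketch under assumptions: the 20 one-letter codes in `aa` are the standard amino acid alphabet, which may or may not be the right set for the real data, and the data frame is re-typed from the posted table with only two position columns.

```r
# Re-typed from the posted table; two position columns are enough to show the idea
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  stringsAsFactors = TRUE
)

# Assumption: the 20 standard one-letter amino acid codes
aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
        "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

# Re-level every position column to the full alphabet before tabulating,
# so residues that never occur are counted as 0 instead of being dropped
props <- sapply(myfile[, -c(1, 2)], function(x) {
  prop.table(table(factor(x, levels = aa)))
})
print(round(props, 2))
```

Because every column now has the same 20 levels, sapply() can simplify the result to a matrix (residues in rows, positions in columns) rather than returning a list.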
|
OK, I admit it: I re-read what you wrote and now I'm confused. Is:
> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))
            X1  X2        X3    X4  X5  X6    X7  X8
[1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
[2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
[3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
[4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2

what you want?

-- Bert |
|
Sorry. Typo in my previous. Should be:
> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x,sum)))
$X1
         L          R          T
0.91491320 0.03675651 0.04833030

$X2
        E         M
0.9827278 0.0172722

$X3
        N         Y
0.0483303 0.9516697

$X4
        I         L         Q
0.8976410 0.0850868 0.0172722

$X5
        I         V
0.9516697 0.0483303

$X6
         P          S
0.96324349 0.03675651

$X7
        D         E         G
0.8976410 0.0540287 0.0483303

$X8
        A         C
0.9827278 0.0172722

--
Bert Gunter
Genentech Nonclinical Biostatistics |
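As a sanity check on the corrected call, the per-residue sum from the original question can be compared with the `tapply` version for position 1. This is only a sketch: the vectors `w` and `pos1` are made-up names, typed in by hand from the posted sample data, standing in for `myfile[,2]` and `myfile[,3]`.

```r
# Sample data typed in from the dput() output earlier in the thread.
# w and pos1 are illustrative names, not objects from the original posts.
w    <- c(0.0050723, 0.0002731, 9.76e-05, 0.0002077)  # Time_zero values
pos1 <- factor(c("L", "T", "L", "R"))                  # residues at position 1

# Same quantity as the loop in the original question:
# total weight of the rows carrying "L", divided by the overall total
ratio_L_direct <- sum(w[pos1 == "L"]) / sum(w)

# Grouped version: sum the weights per residue, then normalise to proportions
ratio_all <- prop.table(tapply(w, pos1, sum))

print(ratio_all)
print(all.equal(as.numeric(ratio_all["L"]), ratio_L_direct))
```

The value for L should agree with the `$X1` entry in the output above, and the proportions at a position add up to 1 because every row contributes its weight to exactly one residue.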
|
... and I neglected to mention that f = myfile[,2]
Sigh.... More coffee needed.

-- Bert |
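For anyone following along, here is a self-contained sketch that puts the pieces together. It rebuilds the sample data by hand from the `dput()` output posted earlier, takes `f` to be column 2 as stated in the follow-up, and then also tries the variant mentioned earlier in the thread where residues that never occur at a position get a proportion of zero. The alphabet `aa` (the 20 standard one-letter codes) and the names `observed` and `with_zeros` are assumptions made for illustration.

```r
# Sample data rebuilt by hand from the dput() output posted earlier in the thread
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

f <- myfile[, 2]  # the weights (Time_zero)

# Weighted proportion of each residue that actually occurs at each position.
# lapply() is used so the result is always a list, one element per position.
observed <- lapply(myfile[, -c(1, 2)],
                   function(x) prop.table(tapply(f, x, sum)))

# Variant with a zero entry for residues that never occur at a position.
# aa is an assumed alphabet: the 20 standard one-letter amino acid codes.
aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
with_zeros <- sapply(myfile[, -c(1, 2)], function(x) {
  # split() keeps empty groups for unused factor levels; sum(numeric(0)) is 0
  totals <- sapply(split(f, factor(x, levels = aa)), sum)
  totals / sum(totals)
})

print(observed$X1)
print(round(with_zeros, 4))
```

The second result is a matrix with one row per residue and one column per position, which may be easier to compare across positions than a list of vectors of different lengths.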
|
Hi John and Bert,
Thank you so much for your replies. Both of your scripts worked well, so now I've learnt two ways to do it. :)

Bert: I was not very clear on what I wanted to do. I just want to calculate the ratios for the residues shown in the table, not for all residues. The *apply* functions are amazing!

John: as I am still digesting the code, I am not sure I fully understand the argument .(variable, value) in the *ddply* line. The description of *ddply* says that .variables gives the variables to split the data frame by, as quoted variables, a formula or a character vector. So does .(variable, value) tell R to split the data frame by value, i.e. by the type of amino acid residue?

Thank you all again.

Cheers,
Zhao |
|
No, it's actually telling it to split by the two variables (variable, value), if I understand your question correctly.

The confusion is my fault. I tend to be lazy when running examples and did not rename the melt() output to something meaningful. I sometimes forget that it's not just me reading the code.

If you run:

    md1 <- melt(mydata, id = "Time_zero", variable.name = "xvars", value.name = "aminos")
    ddply(md1, .(xvars, aminos), summarise, sum = sum(Time_zero)/time0total)

I think it will show what is happening.

John Kane
Kingston ON Canada

-----Original Message-----
From: [hidden email]
Sent: Tue, 24 Jul 2012 15:26:52 -0400
To: [hidden email]
Subject: Re: [R] How to do the same thing for all levels of a column?

Hi John and Bert,

Thank you so much for your replies. Both of your scripts worked well, so now I've learnt two ways to do it. :)

Bert: I was not very clear on what I wanted to do. I just would like to calculate the ratios for the residues shown in the table, not for all residues. The apply functions are amazing!

John: as I am still digesting the code, I am not sure if I fully understood the argument .(variable, value) in the ddply line. The description of ddply says that .variables gives the variables to split the data frame by, as quoted variables, a formula or a character vector. So does .(variable, value) tell R to split the data frame by value, i.e. by the types of amino acid residues?

Thank you all again.

Cheers,
Zhao

2012/7/24 Bert Gunter <[hidden email]>

... and I neglected to mention that f = myfile[,2]

Sigh.... More coffee needed.

-- Bert

On Tue, Jul 24, 2012 at 9:43 AM, Bert Gunter <[hidden email]> wrote:
> Sorry. Typo in my previous. Should be:
>
>> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x,sum)))
> $X1
>          L          R          T
> 0.91491320 0.03675651 0.04833030
>
> $X2
>         E         M
> 0.9827278 0.0172722
>
> $X3
>         N         Y
> 0.0483303 0.9516697
>
> $X4
>         I         L         Q
> 0.8976410 0.0850868 0.0172722
>
> $X5
>         I         V
> 0.9516697 0.0483303
>
> $X6
>          P          S
> 0.96324349 0.03675651
>
> $X7
>         D         E         G
> 0.8976410 0.0540287 0.0483303
>
> $X8
>         A         C
> 0.9827278 0.0172722
>
> On Tue, Jul 24, 2012 at 9:37 AM, Bert Gunter <[hidden email]> wrote:
>> OK, I admit it: I re-read what you wrote and now I'm confused. Is:
>>
>>> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))
>>
>>             X1  X2        X3    X4  X5  X6    X7  X8
>> [1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
>> [2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
>> [3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
>> [4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2
>>
>> what you want?
>>
>> -- Bert
>>
>> On Tue, Jul 24, 2012 at 9:17 AM, Bert Gunter <[hidden email]> wrote:
>>> The OP's request is a bit ambiguous to me: at a given residue, do you
>>> wish to calculate the proportions for only those amino acids that
>>> appear at that residue, or do you wish to include the proportions for
>>> all amino acids, some of which might then be 0?
>>>
>>> Assuming the former, then I don't think one needs to go to the lengths
>>> described by John below.
>>>
>>> Using your example (thanks!), the following seems to suffice:
>>>
>>>> sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))
>>>
>>> $X1
>>> x
>>>    L    R    T
>>> 0.50 0.25 0.25
>>>
>>> $X2
>>> x
>>>    E    M
>>> 0.75 0.25
>>>
>>> $X3
>>> x
>>>    N    Y
>>> 0.25 0.75
>>>
>>> $X4
>>> x
>>>    I    L    Q
>>> 0.25 0.50 0.25
>>>
>>> $X5
>>> x
>>>    I    V
>>> 0.75 0.25
>>>
>>> $X6
>>> x
>>>    P    S
>>> 0.75 0.25
>>>
>>> $X7
>>> x
>>>    D    E    G
>>> 0.25 0.50 0.25
>>>
>>> $X8
>>> x
>>>    A    C
>>> 0.75 0.25
>>>
>>> This could, of course, then be modified to add zero proportions for
>>> all non-appearing amino acids.
>>>
>>> -- Cheers,
>>> Bert
>>>
>>> On Tue, Jul 24, 2012 at 8:18 AM, John Kane <[hidden email]> wrote:
>>>>
>>>> I think this does what you want using two packages, plyr and reshape2, that
>>>> you may have to install. If so, install.packages(c("plyr", "reshape2")) should
>>>> do the trick.
>>>>
>>>> library(plyr)
>>>> library(reshape2)
>>>> # using supplied file "myfile" from below
>>>> time0total = sum(myfile[,2])
>>>> mydata <- myfile[, 2:10]
>>>> md1 <- melt(mydata, id = "Time_zero")
>>>> ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)
>>>>
>>>> John Kane
>>>> Kingston ON Canada
>>>>
>>>> -----Original Message-----
>>>> From: [hidden email]
>>>> Sent: Tue, 24 Jul 2012 10:25:21 -0400
>>>> To: [hidden email]
>>>> Subject: Re: [R] How to do the same thing for all levels of a column?
>>>>
>>>> Hi John,
>>>> Thank you for the tips. My apologies about the unreadable sample data...
>>>> So here is the output of the sample data, and hopefully it works this time :)
>>>>
>>>> myfile <- structure(list(Proteins = structure(1:4, .Label = c("p1", "p2",
>>>> "p3", "p4"), class = "factor"), Time_zero = c(0.0050723, 0.0002731,
>>>> 9.76e-05, 0.0002077), X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L",
>>>> "R", "T"), class = "factor"), X2 = structure(c(1L, 1L, 2L, 1L
>>>> ), .Label = c("E", "M"), class = "factor"), X3 = structure(c(2L,
>>>> 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), X4 = structure(c(1L,
>>>> 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"), X5 = structure(c(1L,
>>>> 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"), X6 = structure(c(1L,
>>>> 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"), X7 = structure(c(1L,
>>>> 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"), X8 = structure(c(1L,
>>>> 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")), .Names = c("Proteins",
>>>> "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names = c(NA,
>>>> 4L), class = "data.frame")
|
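For readers following the thread, the "split by two variables" idea John describes (group by position and by residue together, then sum Time_zero within each group) can also be sketched in base R, without plyr or reshape2. This is only an illustrative sketch, not code posted by anyone in the thread: the data frame is retyped from the sample data above, and the names residues, long and ratios are made up for the example.

```r
# Sample data retyped from the thread (plain character columns instead of factors)
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"),
  X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"),
  X8 = c("A", "A", "C", "A")
)

time0total <- sum(myfile$Time_zero)
residues   <- myfile[, -(1:2)]          # position columns X1..X8 only

# Long format: one row per (protein, position) pair, playing the role of melt()
long <- data.frame(
  Time_zero = rep(myfile$Time_zero, times = ncol(residues)),
  xvars     = rep(names(residues), each = nrow(residues)),
  aminos    = unlist(lapply(residues, as.character), use.names = FALSE)
)

# Split by BOTH xvars and aminos, then compute the weighted ratio per group;
# this is the step that .(xvars, aminos) expresses in the ddply() call
ratios <- aggregate(Time_zero ~ xvars + aminos, data = long,
                    FUN = function(w) sum(w) / time0total)
names(ratios)[3] <- "ratio"
ratios <- ratios[order(ratios$xvars, ratios$aminos), ]
print(ratios)
```

Each row of ratios is one (position, residue) combination that actually occurs in the data. The value for L at X1 should come out near 0.915, in line with the tapply(f, x, sum) result Bert posted, and the ratios within each position should add up to 1.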
