How to do the same thing for all levels of a column?

Zhao Jin-2
Dear all,

I am an R beginner, and I am looking for a way to do the same thing for all
levels of a column in a table.

Basically, I have a bunch of protein sequences composed of different amino
acid residues, and each residue is represented by an uppercase letter. I
want to calculate the ratio of different amino acid residues at each
position of the proteins. Here is an example table:

Proteins  Time_zero  1  2  3  4  5  6  7  8
p1        0.0050723  L  E  Y  I  I  P  D  A
p2        0.0002731  T  E  N  L  V  P  G  A
p3        9.757E-05  L  M  Y  Q  I  P  E  C
p4        0.0002077  R  E  Y  L  I  S  E  A



If I name this table as myfile.txt, I have the following scripts to
calculate the ratio of each amino acid residue at position 1:

# showing levels of the 3rd column, which means the types of residues
> myfile[,3]

# calculating the ratio of L
> list=c(which(myfile[,3]=="L"))
> time0total=sum(myfile[,2])
> AA_L=0
> for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
> ratio_L=AA_L/time0total



So how can I write a script to do the same thing for the other two levels
(T and R) in column 3, and also do this for every column that contains
amino acid residues?



Many thanks for any help you could give me on this topic! :)



Regards,

Zhao
--
Zhao JIN
Ph.D. Candidate
Ruth Ley Lab
467 Biotech
Field of Microbiology, Cornell University
Lab: 607.255.4954
Cell: 412.889.3675

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to do the same thing for all levels of a column?

John Kane
First thing is to supply the data in a usable format.  As is, it is essentially unreadable.  All R beginners do this. :)

Have a look at the dput function  (?dput) for a good way to supply sample data in an email.

If you have a large dataset probably a few dozen lines of data would be fine.

Something like dput(head(mydata)) should be fine.  Just copy and paste the output into your email.
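
As an illustration of the dput() advice, here is a minimal sketch. The data frame mydata below is a made-up stand-in, not the poster's real data, and capture.output() with eval(parse()) is used only to simulate the copy-and-paste round trip through an email.

```r
# Hypothetical small data frame standing in for the real data
mydata <- data.frame(
  Proteins  = c("p1", "p2", "p3"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05),
  X1        = factor(c("L", "T", "L"))
)

# dput() prints R code that recreates the object; head() keeps it short
dput(head(mydata))

# Simulate pasting that text into another R session:
# capture the printed code as text, then parse and evaluate it again
txt      <- paste(capture.output(dput(mydata)), collapse = "\n")
restored <- eval(parse(text = txt))

# Compare the rebuilt object with the original
all.equal(mydata, restored)
```

A reader of the email would simply type a name and an assignment arrow in front of the pasted text, for example restored <- structure(...).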

Welcome to R.  I think you will like it.

John Kane
Kingston ON Canada



Re: How to do the same thing for all levels of a column?

Zhao Jin-2
Hi John,

Thank you for the tips. My apologies about the unreadable sample data...

So here is the output of the sample data, and hopefully it works this time
:)

structure(list(
  Proteins = structure(1:4, .Label = c("p1", "p2", "p3", "p4"), class = "factor"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L", "R", "T"), class = "factor"),
  X2 = structure(c(1L, 1L, 2L, 1L), .Label = c("E", "M"), class = "factor"),
  X3 = structure(c(2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"),
  X4 = structure(c(1L, 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"),
  X5 = structure(c(1L, 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"),
  X6 = structure(c(1L, 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"),
  X7 = structure(c(1L, 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"),
  X8 = structure(c(1L, 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")),
  .Names = c("Proteins", "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"),
  row.names = c(NA, 4L), class = "data.frame")



Thanks a lot!


Regards,

Zhao


Re: How to do the same thing for all levels of a column?

John Kane

   I think this does what you want using two packages, plyr and reshape2, that
   you may have to install.  If so, install.packages(c("plyr", "reshape2"))
   should do the trick.

   library(plyr)
   library(reshape2)
   # using the supplied data frame 'myfile' (the dput output posted earlier in the thread)
   time0total <- sum(myfile[, 2])
   mydata <- myfile[, 2:10]
   md1 <- melt(mydata, id = "Time_zero")
   ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)
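
One way to sanity-check a single entry of the ddply() result is to compute it by hand. The sketch below rebuilds only Time_zero and position 1 (X1) from the posted data, which is an abbreviation made for this example, and compares a loop in the style of the original question with a vectorised sum(). The name rows_L is introduced here for illustration.

```r
# Abbreviated copy of the posted data: weights and position 1 only
myfile <- data.frame(
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1        = factor(c("L", "T", "L", "R"))
)

time0total <- sum(myfile$Time_zero)

# Loop in the style of the original question: add up Time_zero
# over the rows where position 1 is "L"
rows_L <- which(myfile$X1 == "L")
AA_L <- 0
for (i in rows_L) AA_L <- AA_L + myfile$Time_zero[i]
ratio_L_loop <- AA_L / time0total

# Vectorised equivalent: logical subsetting plus sum(), no loop needed
ratio_L <- sum(myfile$Time_zero[myfile$X1 == "L"]) / time0total

# The two approaches should agree
all.equal(ratio_L_loop, ratio_L)
```

The value of ratio_L should match the row for variable X1 and value L in the ddply() output.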


   John Kane
   Kingston ON Canada


Re: How to do the same thing for all levels of a column?

Bert Gunter
The OP's request is a bit ambiguous to me: at a given residue, do you
wish to calculate the proportions for only those amino acids that
appear at that residue, or do you wish to include the proportions for
all amino acids, some of which might then be 0?

Assuming the former, then I don't think one needs to go to the lengths
described by John below.

Using your example (thanks!), the following seems to suffice:

> sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))

$X1
x
   L    R    T
0.50 0.25 0.25

$X2
x
   E    M
0.75 0.25

$X3
x
   N    Y
0.25 0.75

$X4
x
   I    L    Q
0.25 0.50 0.25

$X5
x
   I    V
0.75 0.25

$X6
x
   P    S
0.75 0.25

$X7
x
   D    E    G
0.25 0.50 0.25

$X8
x
   A    C
0.75 0.25


This could, of course, then be modified to add zero proportions for
all non-appearing amino acids.
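
A possible way to add the zero proportions, sketched under assumptions: the vector aa of the 20 standard one-letter amino acid codes is introduced here for illustration, and only the first three positions of the posted data are rebuilt to keep the example short.

```r
# Abbreviated copy of the posted data: positions 1 to 3 only
myfile <- data.frame(
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  stringsAsFactors = TRUE
)

# The 20 standard amino acids as one-letter codes (assumed alphabet)
aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
        "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

# Re-level every column to the full alphabet before tabulating, so that
# residues absent at a position are counted as 0 instead of being dropped
props <- sapply(myfile, function(x) prop.table(table(factor(x, levels = aa))))

# Every column now has the same length, so the result is a matrix with
# one row per amino acid and one column per position
dim(props)
props[c("L", "R", "T", "A"), "X1"]
```

Because all columns share the same levels, sapply() can simplify the result to a matrix rather than returning a list.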

-- Cheers,
Bert




--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: How to do the same thing for all levels of a column?

Bert Gunter
OK, I admit it: I re-read what you wrote and now I'm confused. Is:

> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))

            X1  X2        X3    X4  X5  X6    X7  X8
[1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
[2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
[3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
[4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2

what you want?
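
Note that f is not defined anywhere in the post, and tapply() called without a function returns group indices rather than per-group summaries, so the matrix above may not be the intended result. If the goal is the Time_zero-weighted ratio from the original question, a sketch could look like the following. It assumes f was meant to be the Time_zero column (called w here) and that the missing function was sum; both are guesses, not something the post confirms.

```r
# Posted data rebuilt as a plain data frame
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"),
  X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"),
  X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

# Weights: the abundance of each protein at time zero (assumed meaning of f)
w <- myfile$Time_zero

# For each position, sum the weights per residue, then divide by the total.
# lapply() is used because positions have different numbers of residues.
ratios <- lapply(myfile[, -c(1, 2)],
                 function(x) prop.table(tapply(w, x, sum)))

# Weighted ratios of L, R and T at position 1
ratios$X1
```

Each element of ratios sums to 1, and ratios$X1 gives the weighted share of every residue seen at position 1, which is the quantity the original loop computed for L alone.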

-- Bert
>>    > E
>>    >
>>    > Y
>>    >
>>    > I
>>    >
>>    > I
>>    >
>>    > P
>>    >
>>    > D
>>    >
>>    > A
>>    >
>>    > p2
>>    >
>>    > 0.0002731
>>    >
>>    > T
>>    >
>>    > E
>>    >
>>    > N
>>    >
>>    > L
>>    >
>>    > V
>>    >
>>    > P
>>    >
>>    > G
>>    >
>>    > A
>>    >
>>    > p3
>>    >
>>    > 9.757E-05
>>    >
>>    > L
>>    >
>>    > M
>>    >
>>    > Y
>>    >
>>    > Q
>>    >
>>    > I
>>    >
>>    > P
>>    >
>>    > E
>>    >
>>    > C
>>    >
>>    > p4
>>    >
>>    > 0.0002077
>>    >
>>    > R
>>    >
>>    > E
>>    >
>>    > Y
>>    >
>>    > L
>>    >
>>    > I
>>    >
>>    > S
>>    >
>>    > E
>>    >
>>    > A
>>    >
>>    >
>>    >
>>    > If I name this table as myfile.txt, I have the following scripts to
>>    > calculate the ratio of each amino acid residue at position 1:
>>    >
>>    > # showing levels of the 3rd column, which means the types of residues
>>    >
>>    > >myfile[,3]
>>    >
>>    >
>>    >
>>    > # calculating the ratio of L
>>    >
>>    > >list=c(which(myfile[,3]=="L"))
>>    >
>>    > >time0total=sum(myfile[,2])
>>    >
>>    > >AA_L=0
>>    >
>>    > >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>    >
>>    > >ratio_L=AA_L/time0total
>>    >
>>    >
>>    >
>>    > So how can I write a script to do the same thing for the other two levels
>>    > (T and R) in column 3, and also do this for every column that contains
>>    > amino acid residues?
>>    >
>>    >
>>    >
>>    > Many thanks for any help you could give me on this topic! :)
>>    >
>>    >
>>    >
>>    > Regards,
>>    >
>>    > Zhao
>>    > --
>>    > Zhao JIN
>>    > Ph.D. Candidate
>>    > Ruth Ley Lab
>>    > 467 Biotech
>>    > Field of Microbiology, Cornell University
>>    > Lab: 607.255.4954
>>    > Cell: 412.889.3675
>>    >
>>
>>      >       [[alternative HTML version deleted]]
>>      >
>>      > ______________________________________________
>>      > [4][hidden email] mailing list
>>      > [5]https://stat.ethz.ch/mailman/listinfo/r-help
>>      > PLEASE do read the posting guide
>>      > [6]http://www.R-project.org/posting-guide.html
>>      > and provide commented, minimal, self-contained, reproducible code.
>>      ____________________________________________________________
>>      FREE 3D MARINE AQUARIUM SCREENSAVER - Watch dolphins, sharks & orcas on
>>      your desktop!
>>      Check it out at [7]http://www.inbox.com/marineaquarium
>>
>>    --
>>    Zhao JIN
>>    Ph.D. Candidate
>>    Ruth Ley Lab
>>    467 Biotech
>>    Field of Microbiology, Cornell University
>>    Lab: 607.255.4954
>>    Cell: 412.889.3675
>>      _________________________________________________________________
>>
>>    [8]3D Earth Screensaver Preview
>>    Free 3D Earth Screensaver
>>    Watch   the   Earth   right   on   your   desktop!  Check  it  out  at
>>    [9]www.inbox.com/earth
>>
>> References
>>
>>    1. mailto:[hidden email]
>>    2. mailto:[hidden email]
>>    3. mailto:[hidden email]
>>    4. mailto:[hidden email]
>>    5. https://stat.ethz.ch/mailman/listinfo/r-help
>>    6. http://www.R-project.org/posting-guide.html
>>    7. http://www.inbox.com/marineaquarium
>>    8. http://www.inbox.com/earth
>>    9. http://www.inbox.com/earth
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm




Re: How to do the same thing for all levels of a column?

Bert Gunter
Sorry, there was a typo in my previous message. It should be:

> sapply(myfile[, -c(1, 2)], function(x) prop.table(tapply(f, x, sum)))
$X1
         L          R          T
0.91491320 0.03675651 0.04833030

$X2
        E         M
0.9827278 0.0172722

$X3
        N         Y
0.0483303 0.9516697

$X4
        I         L         Q
0.8976410 0.0850868 0.0172722

$X5
        I         V
0.9516697 0.0483303

$X6
         P          S
0.96324349 0.03675651

$X7
        D         E         G
0.8976410 0.0540287 0.0483303

$X8
        A         C
0.9827278 0.0172722
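
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It assumes that f is the Time_zero column (myfile[, 2]) and rebuilds the sample data frame by hand from the dput() output posted earlier in the thread. The name weighted is only illustrative, and lapply is used in place of sapply so that the result is always a list, even when every position happens to have the same number of residue types.

```r
# Sample data rebuilt from the dput() output in the thread.
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"),
  X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"),
  X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

# Assumption: f is the weight attached to each protein (column 2).
f <- myfile$Time_zero

# For each residue column: sum the weights per residue type,
# then divide by the total so the shares at each position add up to 1.
weighted <- lapply(myfile[, -c(1, 2)],
                   function(x) prop.table(tapply(f, x, sum)))

print(weighted$X1)
```

Each element of weighted is a named vector of shares for one position, so weighted$X4["L"] would pick out a single residue at position 4.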







Re: How to do the same thing for all levels of a column?

Bert Gunter
... and I neglected to mention that f = myfile[,2]

Sigh....  More coffee needed.

-- Bert





Re: How to do the same thing for all levels of a column?

Zhao Jin-2
Hi John and Bert,

Thank you so much for your replies. Both of your scripts worked well, so
now I've learnt two ways to do it. :)

Bert: I was not very clear about what I wanted to do. I only want to
calculate the ratios for the residues shown in the table, not for all
residues. The *apply* functions are amazing!

John: as I am still digesting the code, I am not sure I fully
understand the argument .(variable, value) in the *ddply* line. The
description of *ddply* says that .variables gives the variables to split
the data frame by, as quoted variables, a formula or a character vector. So
does .(variable, value) tell R to split the data frame by value, that is, by
the type of amino acid residue?

Thank you all again.

Cheers,
Zhao
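
On the grouping question, a rough base-R analogue may help; it is a sketch rather than John's actual code, and it swaps melt() and ddply() for rep(), unlist() and aggregate() so that no extra packages are needed. The column names variable and value mimic what melt() produces, and long and res are illustrative names. The point it tries to show is that the data are split by the combination of position (variable) and residue type (value), not by value alone.

```r
# Sample data rebuilt from the dput() output in the thread.
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

residue_cols <- names(myfile)[3:10]

# Long format: one row per (protein, position), like the output of melt().
long <- data.frame(
  Time_zero = rep(myfile$Time_zero, times = length(residue_cols)),
  variable  = rep(residue_cols, each = nrow(myfile)),
  value     = unlist(lapply(myfile[residue_cols], as.character),
                     use.names = FALSE)
)

# Group by BOTH position and residue, then sum the weights in each group.
res <- aggregate(Time_zero ~ variable + value, data = long, FUN = sum)
res$ratio <- res$Time_zero / sum(myfile$Time_zero)

print(res[order(res$variable, res$value), c("variable", "value", "ratio")])
```

Grouping by value alone would pool, for example, the L at position 1 with the L at position 4, which is why both names appear inside .( ).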



Re: How to do the same thing for all levels of a column?

John Kane

   No, if I understand your question correctly, it is actually telling ddply
   to split by the two variables together (variable, value), so you get one
   group for each combination of position and amino acid.
   The confusion is my fault. I tend to be lazy when running examples and did
   not rename the melt() output to something meaningful. I sometimes forget
   that it's not just me reading the code.
   If you run (with mydata and time0total as defined earlier):
   md1  <-  melt(mydata, id = "Time_zero",
                 variable.name = "xvars",
                 value.name = "aminos")
   ddply(md1, .(xvars, aminos), summarise, sum = sum(Time_zero)/time0total)
   I think it will show what is happening.
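
   The same grouping can also be seen without plyr or reshape2. The sketch below is a base-R stand-in, not the code above: it assumes the sample data posted earlier in the thread (rebuilt by hand), builds the long table manually instead of calling melt(), and uses aggregate() in place of ddply(). The names `pos_cols`, `long`, `res` and `ratio` are illustrative.

```r
# Sample data from the thread, rebuilt by hand (assumed to match the dput output).
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)
time0total <- sum(myfile$Time_zero)

# Long format: one row per (protein, position), like the melt() output
# with the columns named xvars (position) and aminos (residue).
pos_cols <- paste0("X", 1:8)
long <- data.frame(
  Time_zero = rep(myfile$Time_zero, times = length(pos_cols)),
  xvars     = rep(pos_cols, each = nrow(myfile)),
  aminos    = unlist(lapply(myfile[pos_cols], as.character), use.names = FALSE)
)

# Split by BOTH variables: one group per position/residue combination,
# then sum Time_zero within each group and divide by the grand total.
res <- aggregate(Time_zero ~ xvars + aminos, data = long,
                 FUN = function(v) sum(v) / time0total)
names(res)[3] <- "ratio"

print(res[order(res$xvars, res$aminos), ])
```

   Grouping by aminos alone would pool, for example, the L at position 1 with the L at position 4, which is why both variables are needed.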



   John Kane
   Kingston ON Canada
