

Dear R-experts,
I am struggling with the following problem, and I am looking for advice
from more experienced R-users: I have a data frame with 2 identifying
variables (comn and mi), and an output variable (x). comn is a variable for
a company and mi is a variable for a month.
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x <- c("0.0031", "0.0009", "0.007", "0.1929", "0.0087", "0.099", "0.089",
"0.005", "0.0078", "0.67")
df <- data.frame(comn=comn, mi=mi, x=x)
For each company, within a particular month, I would like to compute the
standard deviation of x: for example, for abc, I would like to compute the
sd of x for month1 (when mi=1) and for month2 (when mi=2).
In other languages (Stata for instance), I would create a grouping variable
(grouping on comn and mi) and then apply the sd function to each group.
However, I can't find an elegant way to do the same in R:
I was thinking about the following: I could subset my data frame by mi and
create one file per month, and then make a loop and in each file, use a
"by" operator for each comn. I am sure it would work, but I feel that it
would be like killing an ant with a tank.
I was wondering if anyone knew a more straightforward way to implement that
kind of operation?
Thanks a lot,
Best,
Aurelien
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Like this?
library(plyr)
ddply(df,.(comn,mi),summarise,stDEV=sd(x))
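
One caveat when running this on the data as posted: x was entered as quoted strings, so it has to be converted to numeric before sd() gives a meaningful result. The following is only a sketch under that assumption. The as.character() step is there because data.frame() turned strings into factors by default before R 4.0, and the requireNamespace() guard is an added convenience so the snippet still runs when plyr is not installed:

```r
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi   <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x    <- c("0.0031", "0.0009", "0.007", "0.1929", "0.0087", "0.099", "0.089",
          "0.005", "0.0078", "0.67")
df   <- data.frame(comn = comn, mi = mi, x = x)

# Go through as.character() first: if x is a factor, as.numeric() alone
# would return the integer level codes instead of the values.
df$x <- as.numeric(as.character(df$x))

# Same call as above, run only if plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  res <- plyr::ddply(df, plyr::.(comn, mi), plyr::summarise, stDEV = sd(x))
  print(res)
}
```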
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx

>From: Aurélien PHILIPPOT <[hidden email]>
>To: [hidden email]
>Sent: Sunday, December 4, 2011 12:32 PM
>Subject: [R] Group several variables and apply a function to the group


Exactly like that!
Thanks a lot.
Aurelien


One way would be to use the aggregate function.
# Your Data ...
# Note: I have removed the quotes off the output variable x
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x <- c(0.0031, 0.0009, 0.007, 0.1929, 0.0087, 0.099, 0.089, 0.005, 0.0078, 0.67)
df <- data.frame(comn=comn, mi=mi, x=x)
# Aggregate Function
aggregate(df$x, by=list(df$comn, df$mi), FUN=sd)
HTH
Pete
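
For what it's worth, aggregate() also has a formula interface, which keeps the original column names in the result instead of Group.1 / Group.2. A small sketch on the same data (the column name sd_x is just an illustrative choice, not something from the original post):

```r
df <- data.frame(
  comn = c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz"),
  mi   = c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3"),
  x    = c(0.0031, 0.0009, 0.007, 0.1929, 0.0087, 0.099, 0.089, 0.005, 0.0078, 0.67)
)

# sd of x for every (comn, mi) combination that actually occurs in the data
res <- aggregate(x ~ comn + mi, data = df, FUN = sd)
names(res)[3] <- "sd_x"   # rename the summarised column (hypothetical name)
res
```

Only the combinations that are present in df appear in the result, one row per company and month.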


?aggregate should do it
aggregate(df$x,list(df$comn, df$mi), sd)
There are other ways of course
Using the reshape2 package
library(reshape2)
x1 <- melt(df, id=c("comn", "mi"))
dcast(x1, comn + mi ~ variable, sd)


... with() is useful here: e.g. in base R, simply tapply() or ave() with with()
with(df, ave(x, comn, mi, FUN = sd))
-- Bert
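
The two base R options mentioned above return differently shaped results, which may matter depending on what is needed afterwards. A rough sketch, assuming x has already been made numeric (the names group_sd and sd_x are illustrative only):

```r
df <- data.frame(
  comn = c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz"),
  mi   = c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3"),
  x    = c(0.0031, 0.0009, 0.007, 0.1929, 0.0087, 0.099, 0.089, 0.005, 0.0078, 0.67)
)

# tapply(): one value per group, returned as a comn-by-mi matrix;
# combinations that do not occur in the data come back as NA.
group_sd <- with(df, tapply(x, list(comn, mi), sd))

# ave(): a vector as long as x, with each row carrying the sd of its own group,
# so it can be stored as a new column next to the original data.
df$sd_x <- with(df, ave(x, comn, mi, FUN = sd))

group_sd
df
```

So tapply() is the closer match for a summary table, while ave() is handy when the group statistic has to sit alongside each observation.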
On Sun, Dec 4, 2011 at 1:07 PM, John Kane < [hidden email]> wrote:
> ?aggregate should do it
>
> aggregate(df$x,list(df$comn, df$mi), sd)
>
> There are other ways of course
>
> Using the reshape2 package
>
> library(reshape2)
> x1 <- melt(df, id=c("comn", "mi"))
> dcast(x1, comn + mi ~ variable, sd)

Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 4677374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdbfunctionalgroups/pdbbiostatistics/pdbncbhome.htm

