Group several variables and apply a function to the group

Group several variables and apply a function to the group

pipo
Dear R-experts,
I am struggling with the following problem, and I am looking for advice
from more experienced R-users: I have a data frame with 2 identifying
variables (comn and mi), and an output variable (x). comn is a variable for
a company and mi is a variable for a month.

comn<-c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz","xyz", "xyz")
mi<- c("1", "1","1", "2", "2", "2", "1", "1", "3", "3")
x<- c("-0.0031", "0.0009", "-0.007", "0.1929","0.0087", "0.099","-0.089",
"0.005", "-0.0078", "0.67" )
df<- data.frame(comn=comn, mi=mi, x=x)


For each company, within a particular month, I would like to compute the
standard deviation of x: for example, for abc, I would like to compute the
sd of x for month1 (when mi=1) and for month2 (when mi=2).

In other languages (Stata for instance), I would create a grouping variable
(grouping comn and mi) and then apply the sd function to each group.

However, I can't find an elegant way to do the same in R:

I was thinking about the following: I could subset my data frame by mi and
create one file per month, and then make a loop and in each file, use a
"by" operator for each comn. I am sure it would work, but I feel that it
would be like killing an ant with a tank.

I was wondering if anyone knew a more straightforward way to implement that
kind of operation?

Thanks a lot,

Best,
Aurelien


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Group several variables and apply a function to the group

Felipe Carrillo
 Like this?
library(plyr)
ddply(df,.(comn,mi),summarise,stDEV=sd(x))
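
One thing to watch before running the call above: x was built from quoted strings, so in df it is a character (or factor) column rather than a number. A minimal sketch, using base R only, of converting it and hand-checking one group; the name sd_abc_1 is made up for illustration:

```r
# Rebuild the poster's data as given (x is created from quoted strings).
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x <- c("-0.0031", "0.0009", "-0.007", "0.1929", "0.0087", "0.099", "-0.089",
       "0.005", "-0.0078", "0.67")
df <- data.frame(comn = comn, mi = mi, x = x)

# Go through as.character() first, so the conversion is also correct
# if x happens to be stored as a factor.
df$x <- as.numeric(as.character(df$x))

# Hand check of a single group (company abc, month 1).
sd_abc_1 <- sd(df$x[df$comn == "abc" & df$mi == "1"])
sd_abc_1
```

With x numeric, the grouped result for abc in month 1 should match sd_abc_1.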

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx


From: Aurélien PHILIPPOT <[hidden email]>

>To: [hidden email]
>Sent: Sunday, December 4, 2011 12:32 PM
>Subject: [R] Group several variables and apply a function to the group

Re: Group several variables and apply a function to the group

pipo
Exactly like that!
Thanks a lot.

Aurelien

2011/12/4 Felipe Carrillo <[hidden email]>

>  Like this?
> library(plyr)
> ddply(df,.(comn,mi),summarise,stDEV=sd(x))

Re: Group several variables and apply a function to the group

Pete Brecknock
In reply to this post by pipo
One way would be to use the aggregate function.

# Your data ...
# Note: I have removed the quotes from the output variable x so that it is numeric
comn<-c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz","xyz", "xyz")
mi<- c("1", "1","1", "2", "2", "2", "1", "1", "3", "3")
x<- c(-0.0031, 0.0009, -0.007, 0.1929,0.0087, 0.099,-0.089, 0.005, -0.0078, 0.67)
df<- data.frame(comn=comn, mi=mi, x=x)

# Aggregate Function
aggregate(df$x, by=list(df$comn,df$mi),FUN=sd)
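
As a variation on the call above, aggregate() also has a formula interface, which keeps the original column names in the result instead of Group.1, Group.2 and x. A sketch, assuming x is numeric as in the data above; the column name sd_x is just an illustrative choice:

```r
# Same data as above, with x numeric.
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x <- c(-0.0031, 0.0009, -0.007, 0.1929, 0.0087, 0.099, -0.089, 0.005, -0.0078, 0.67)
df <- data.frame(comn = comn, mi = mi, x = x)

# One row per (comn, mi) combination that actually occurs in the data.
res <- aggregate(x ~ comn + mi, data = df, FUN = sd)

# Rename the summary column so it is not confused with the raw x.
names(res)[names(res) == "x"] <- "sd_x"
res
```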

HTH

Pete

Re: Group several variables and apply a function to the group

John Kane-2
In reply to this post by pipo
?aggregate should do it

aggregate(df$x,list(df$comn, df$mi), sd)

There are other ways, of course, for example
using the reshape2 package:

library(reshape2)
x1 <- melt(df, id=c("comn", "mi"))
dcast(x1, comn + mi ~ variable, sd)





Re: Group several variables and apply a function to the group

Bert Gunter
... with() is useful here: in base R, simply use tapply() or ave() inside with(), e.g.

with(df,ave(x, comn,mi, FUN = sd))
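
Note that the two functions return different shapes: ave() gives one value per row of df (the group's sd repeated for each member), while tapply() gives one value per group. A sketch, assuming the numeric version of x; per_row and per_group are illustrative names:

```r
# Same data as in the thread, with x numeric.
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x <- c(-0.0031, 0.0009, -0.007, 0.1929, 0.0087, 0.099, -0.089, 0.005, -0.0078, 0.67)
df <- data.frame(comn = comn, mi = mi, x = x)

# ave(): a vector as long as df, handy for adding a new column.
per_row <- with(df, ave(x, comn, mi, FUN = sd))

# tapply(): a comn-by-mi matrix; combinations with no data come out as NA.
per_group <- with(df, tapply(x, list(comn, mi), sd))

per_row
per_group
```

ave() fits when the group sd should sit next to each observation; tapply() fits when a compact summary table is wanted.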

-- Bert



On Sun, Dec 4, 2011 at 1:07 PM, John Kane <[hidden email]> wrote:

> ?aggregate should do it
>
> aggregate(df$x,list(df$comn, df$mi), sd)
>
> There are other ways of course
>
> Using the reshape2 package
>
> library(reshape2)
> x1 <- melt(df, id=c("comn", "mi"))
> dcast(x1, comn + mi ~ variable, sd)

--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
