Dear R-experts,
I am struggling with the following problem, and I am looking for advice from more experienced R users. I have a data frame with two identifying variables (comn and mi) and an output variable (x). comn identifies a company and mi identifies a month.

comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x <- c("-0.0031", "0.0009", "-0.007", "0.1929", "0.0087", "0.099", "-0.089", "0.005", "-0.0078", "0.67")
df <- data.frame(comn=comn, mi=mi, x=x)

For each company, within a particular month, I would like to compute the standard deviation of x. For example, for abc, I would like to compute the sd of x for month 1 (when mi=1) and for month 2 (when mi=2).

In other languages (Stata, for instance), I would create a grouping variable (grouping comn and mi) and then apply the sd function to each group.

However, I cannot find an elegant way to do the same in R. I was thinking about the following: I could subset my data frame by mi, create one file per month, and then loop over the files, using a "by" operator for each comn in each file. I am sure it would work, but I feel that it would be like killing an ant with a tank.

Does anyone know a more straightforward way to implement this kind of operation?

Thanks a lot,
Best,
Aurelien

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
|
Like this?
library(plyr)
ddply(df, .(comn, mi), summarise, stDEV = sd(x))

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx

From: Aurélien PHILIPPOT <[hidden email]>
To: [hidden email]
Sent: Sunday, December 4, 2011 12:32 PM
Subject: [R] Group several variables and apply a function to the group
|
Exactly like that! Thanks a lot.
Aurelien

2011/12/4 Felipe Carrillo <[hidden email]>
> Like this?
> library(plyr)
> ddply(df,.(comn,mi),summarise,stDEV=sd(x))
|
In reply to this post by pipo
One way would be to use the aggregate function.

# Your data
# Note: I have removed the quotes from the output variable x
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x <- c(-0.0031, 0.0009, -0.007, 0.1929, 0.0087, 0.099, -0.089, 0.005, -0.0078, 0.67)
df <- data.frame(comn=comn, mi=mi, x=x)

# Aggregate function
aggregate(df$x, by=list(df$comn, df$mi), FUN=sd)

HTH
Pete
|
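If the data frame already exists with x stored as text, retyping the vector is not the only option. Here is a minimal sketch, assuming x is a character or factor column as in the original post: convert it in place, then use the formula interface of aggregate(), which keeps the original column names in the result. The name res is only for illustration.

```r
# Data as originally posted: x is quoted, so it is not numeric
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi   <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x    <- c("-0.0031", "0.0009", "-0.007", "0.1929", "0.0087", "0.099",
          "-0.089", "0.005", "-0.0078", "0.67")
df <- data.frame(comn = comn, mi = mi, x = x)

# Going through as.character() first means this also works
# when x has been stored as a factor rather than as character
df$x <- as.numeric(as.character(df$x))

# Formula interface: one row per (comn, mi) combination that occurs,
# with result columns named comn, mi and x
res <- aggregate(x ~ comn + mi, data = df, FUN = sd)
res
```

The same conversion should be done before trying any of the other suggestions in this thread, since sd() expects numeric input.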
In reply to this post by pipo
?aggregate should do it
aggregate(df$x, list(df$comn, df$mi), sd)

There are other ways, of course, for example using the reshape2 package:

library(reshape2)
x1 <- melt(df, id=c("comn", "mi"))
dcast(x1, comn + mi ~ variable, sd)
|
with() is useful here: in base R, simply use tapply() or ave() inside with(), e.g.

with(df, ave(x, comn, mi, FUN = sd))

-- Bert

On Sun, Dec 4, 2011 at 1:07 PM, John Kane <[hidden email]> wrote:
> ?aggregate should do it
>
> aggregate(df$x,list(df$comn, df$mi), sd)
>
> There are other ways of course
>
> Using the reshape2 package
>
> library(reshape2)
> x1 <- melt(df, id=c("comn", "mi"))
> dcast(x1, comn + mi ~ variable, sd)

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
|
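Note that the two base R functions mentioned above return different shapes, which matters for what you want to do next. A small sketch, assuming x has already been made numeric: tapply() gives one value per group, while ave() gives one value per row. The names sd_by_group and sd_x are made up for the example, and wrapping the grouping variables in interaction(..., drop = TRUE) is an addition here, so that FUN is only called on company/month combinations that actually occur.

```r
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi   <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x    <- c(-0.0031, 0.0009, -0.007, 0.1929, 0.0087, 0.099,
          -0.089, 0.005, -0.0078, 0.67)
df <- data.frame(comn = comn, mi = mi, x = x)

# tapply(): one value per group, returned as a comn-by-mi matrix
sd_by_group <- with(df, tapply(x, list(comn, mi), sd))

# ave(): one value per row; every row receives the sd of its own group,
# so the result can be stored as a new column of df
df$sd_x <- with(df, ave(x, interaction(comn, mi, drop = TRUE), FUN = sd))

sd_by_group
df
```

Combinations that never occur in the data (abc in month 3, xyz in month 2) show up as NA in the tapply() matrix. The ave() form is handy when the group sd is needed alongside the original rows, for example to standardise x within each company and month.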