|
Hello!
I wrote some code that works, but it looks ugly to me: it's full of loops. I am sure there is a much more elegant and shorter way to do it.
Thanks a lot for any hints!
Dimitri

# I have a data frame:
x<-data.frame(group=c("group1","group2","group1","group2"),
  myweight=c(0.4,0.6,0.4,0.6),
  myweek=as.Date(c("2012-07-09","2012-07-09","2012-07-16","2012-07-16")),
  var1=c(1,10,2,20),var2=c(10,1,20,2))
(x)

# For each week in "myweek", I'd like to build a weighted mean for var1 and var2 - using "myweight" as weight.
# Below is my inelegant code:
myweeks<-unique(x$myweek)
nr.of.weeks<-length(myweeks)
myvars<-c("var1","var2")
mylist<-NULL
for(i in 1:nr.of.weeks){ # i<-1
  out<-NULL
  for(var in myvars){ # var<-myvars[2]
    temp.x<-x[x$myweek %in% myweeks[i],c("myweight",var)]
    temp.out<-weighted.mean(temp.x[[2]],temp.x[[1]])
    out<-c(out,temp.out)
  }
  mylist[[i]]<-out
  names(mylist)[i]<-as.character(myweeks[i])
}
desired<-as.data.frame(do.call(rbind,mylist))
names(desired)<-myvars
(desired)

--
Dimitri Liakhovitski
marketfusionanalytics.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
|
The plyr package is very helpful for this:
library(plyr)
ddply(x, .(myweek), summarize,
      m1=weighted.mean(var1,myweight),
      m2=weighted.mean(var2,myweight)) |
|
Thanks a lot, David.
Indeed, it's much shorter. Unfortunately, in my real task I have dozens and dozens of variables like var1 and var2, so manually specifying things like "m1=weighted.mean(var1,myweight)" would take a lot of code and a very long time.
Dimitri

On Tue, Jul 17, 2012 at 6:34 PM, David Freedman <[hidden email]> wrote:
> The plyr package is very helpful for this:
>
> library(plyr)
> ddply(x ,.(myweek), summarize, m1=weighted.mean(var1,myweight),
> m2=weighted.mean(var2,myweight)) |
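One way to keep using plyr without naming every variable is to pass ddply an anonymous function that loops over a vector of column names. The following is only a sketch under assumptions: it reuses the example data frame from the first message, and `myvars` and `res` are illustrative names; in a real task `myvars` would hold the dozens of column names.

```r
library(plyr)

# Example data from the original question
x <- data.frame(
  group    = c("group1", "group2", "group1", "group2"),
  myweight = c(0.4, 0.6, 0.4, 0.6),
  myweek   = as.Date(c("2012-07-09", "2012-07-09", "2012-07-16", "2012-07-16")),
  var1     = c(1, 10, 2, 20),
  var2     = c(10, 1, 20, 2)
)

# Columns to average (assumed; extend this vector for more variables)
myvars <- c("var1", "var2")

# For each week, compute the weighted mean of every column in myvars,
# using that week's rows of myweight as the weights
res <- ddply(x, .(myweek), function(d) {
  as.data.frame(lapply(d[myvars], weighted.mean, w = d$myweight))
})
print(res)
```

The result has one row per week and one column per entry in `myvars`, which matches the shape of `desired` in the original loop.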
|
If there are many variables, I'd then suggest the data.table package:
library(data.table)
dt=data.table(x)
dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby=c('group', 'myweek')]

'.SD' refers to all the variables in the data table (excluding the grouping variables). There's an .SDcols= 'variables of interest' option if you want to limit the dozens of variables to only some of them. Or, in the data.table(x) statement, you could limit the created data table to only the variables you're interested in.

As an added benefit, the data.table approach is amazingly fast (particularly when there are numerous grouping categories). |
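The .SDcols option mentioned above could be used roughly as follows. This is a sketch, not a definitive implementation: it assumes the example data frame from the first message, groups by 'myweek' only, and uses `myvars` and `res` as illustrative names.

```r
library(data.table)

# Example data from the original question
x <- data.frame(
  group    = c("group1", "group2", "group1", "group2"),
  myweight = c(0.4, 0.6, 0.4, 0.6),
  myweek   = as.Date(c("2012-07-09", "2012-07-09", "2012-07-16", "2012-07-16")),
  var1     = c(1, 10, 2, 20),
  var2     = c(10, 1, 20, 2)
)
dt <- data.table(x)

# Columns to average (assumed; extend this vector for more variables)
myvars <- c("var1", "var2")

# .SDcols restricts .SD to the columns in myvars, so 'group' and 'myweight'
# are not averaged; myweight is still visible inside j and serves as weights
res <- dt[, lapply(.SD, weighted.mean, w = myweight),
          keyby = "myweek", .SDcols = myvars]
print(res)
```

With .SDcols there is no need to drop 'group' from the data first, and 'myweight' does not show up as a column averaged against itself.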
|
David, many thanks.
Did something get omitted from your line?
ddply(x ,.(myweek), summarize, m1=weighted.mean(var1,myweight), m2=weighted.mean(var2,myweight))
Because it just reproduces x - in a somewhat different order...
Thank you!
Dimitri

On Tue, Jul 17, 2012 at 9:22 PM, David Freedman <[hidden email]> wrote:
> If there are many variables, I'd then suggest the data.table package:
>
> library(data.table)
> dt=data.table(x)
> dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby=c('group',
> 'myweek')]
>
> The '.SD' is an abbreviation for all the variables in the data table
> (excluding the grouping variables). There's an .SDcols= 'variables of
> interest' option if you want to limit the dozens of variables to only some
> of them. Or, in the data.table(x) statement, you could limit the created
> data table to only the variables you're interested in.
>
> As an added benefit, the data.table approach is amazingly fast (particularly
> when there are numerous grouping categories) |
|
Honestly, I wasn't sure what you wanted to do with 'group'. Here it is with the 'group' variable deleted
library(data.table)
dt=data.table(x[,-1])
dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby='myweek'] |
|
David, thanks a lot!
I tried x[-1] myself but forgot to delete 'group' from the keyby statement - this explains why it did not work for me.
This is amazing - just 2 lines instead of my many-many.
Great learning!
Dimitri

On Tue, Jul 17, 2012 at 10:49 PM, David Freedman <[hidden email]> wrote:
> Honestly, I wasn't sure what you wanted to do with 'group'. Here it is with
> the 'group' variable deleted
>
> library(data.table)
> dt=data.table(x[,-1])
> dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby='myweek'] |
