weighted mean by week

Dimitri Liakhovitski-2
Hello!

I wrote some code that works, but it looks ugly to me: it's full of loops.
I am sure there is a much more elegant and shorter way to do it.
Thanks a lot for any hints!
Dimitri

# I have a data frame:
x<-data.frame(group=c("group1","group2","group1","group2"),
  myweight=c(0.4,0.6,0.4,0.6),
  myweek=as.Date(c("2012-07-09","2012-07-09","2012-07-16","2012-07-16")),
  var1=c(1,10,2,20),var2=c(10,1,20,2))
(x)

# For each week in "myweek", I'd like to compute the weighted mean of
# var1 and var2, using "myweight" as the weight.
# Below is my inelegant code:

myweeks<-unique(x$myweek)
nr.of.weeks<-length(myweeks)
myvars<-c("var1","var2")

mylist<-list()
for(i in seq_len(nr.of.weeks)){   # i<-1
  out<-NULL
  for(var in myvars){ # var<-myvars[2]
        temp.x<-x[x$myweek %in% myweeks[i],c("myweight",var)]
        temp.out<-weighted.mean(temp.x[[2]],temp.x[[1]])
        out<-c(out,temp.out)
  }
  mylist[[i]]<-out
  names(mylist)[i]<-as.character(myweeks[i])
}
desired<-as.data.frame(do.call(rbind,mylist))
names(desired)<-myvars
(desired)
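
For reference, the numbers in "desired" can be checked by hand. The sketch below only restates the usual weighted-mean formula, sum(w * v) / sum(w), for var1 in the first week; the names v, w, by_hand and built_in are illustrative and do not appear in the post.

```r
# Week of 2012-07-09: var1 values and their weights, copied from x above
v <- c(1, 10)
w <- c(0.4, 0.6)

# Weighted mean written out explicitly
by_hand <- sum(w * v) / sum(w)

# The same quantity from the built-in function used in the loop
built_in <- weighted.mean(v, w)

print(c(by_hand = by_hand, built_in = built_in))
```

Both values should agree with the first row of the var1 column in "desired".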

--
Dimitri Liakhovitski
marketfusionanalytics.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: weighted mean by week

David Freedman
The plyr package is very helpful for this:

library(plyr)
ddply(x ,.(myweek), summarize, m1=weighted.mean(var1,myweight), m2=weighted.mean(var2,myweight))

Re: weighted mean by week

Dimitri Liakhovitski-2
Thanks a lot, David.
Indeed, it's much shorter.
Unfortunately, in my real task I have dozens and dozens of variables
like var1 and var2, so manually specifying things like
"m1=weighted.mean(var1,myweight)" would take a lot of code and a very
long time.
Dimitri
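
One way to avoid naming each variable by hand, using only base R, is to split the data frame by week and apply weighted.mean() to every column listed in a character vector. This is a minimal sketch, not something posted in the thread; by_week and res are names made up for illustration, while x and myvars follow the original post.

```r
# Example data from the first post
x <- data.frame(group = c("group1", "group2", "group1", "group2"),
                myweight = c(0.4, 0.6, 0.4, 0.6),
                myweek = as.Date(c("2012-07-09", "2012-07-09",
                                   "2012-07-16", "2012-07-16")),
                var1 = c(1, 10, 2, 20), var2 = c(10, 1, 20, 2))

# Columns to average; in a real task this vector could hold dozens of names
myvars <- c("var1", "var2")

# One sub-data-frame per week
by_week <- split(x, x$myweek)

# For each week, compute the weighted mean of every column in myvars,
# then transpose so that weeks are rows and variables are columns
res <- t(sapply(by_week, function(d) {
  sapply(d[myvars], weighted.mean, w = d$myweight)
}))
res <- as.data.frame(res)
print(res)
```

Adding more variables then only means extending myvars; the rest of the code stays the same.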

On Tue, Jul 17, 2012 at 6:34 PM, David Freedman <[hidden email]> wrote:

> The plyr package is very helpful for this:
>
> library(plyr)
> ddply(x ,.(myweek), summarize, m1=weighted.mean(var1,myweight),
> m2=weighted.mean(var2,myweight))

Re: weighted mean by week

David Freedman
If there are many variables, I'd then suggest the data.table package:

library(data.table)
dt=data.table(x)
dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby=c('group', 'myweek')]

The '.SD' is an abbreviation for all the variables in the data table (excluding the grouping variables).  There's an .SDcols= 'variables of interest' option if you want to limit the dozens of variables to only some of them.  Or, in the data.table(x) statement, you could limit the created data table to only the variables you're interested in.

As an added benefit, the data.table approach is amazingly fast (particularly when there are numerous grouping categories).
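
The .SDcols option mentioned above could be used roughly as follows. This is a sketch that assumes the example data frame x from the first post and groups by week only; myvars and res are illustrative names. Restricting .SD to myvars also keeps myweight itself out of the averaged columns.

```r
library(data.table)

# Example data from the first post
x <- data.frame(group = c("group1", "group2", "group1", "group2"),
                myweight = c(0.4, 0.6, 0.4, 0.6),
                myweek = as.Date(c("2012-07-09", "2012-07-09",
                                   "2012-07-16", "2012-07-16")),
                var1 = c(1, 10, 2, 20), var2 = c(10, 1, 20, 2))
dt <- as.data.table(x)

# Only these columns end up in .SD
myvars <- c("var1", "var2")

# Weighted mean of each selected column within each week;
# myweight is still visible inside j even though it is not in .SD
res <- dt[, lapply(.SD, weighted.mean, w = myweight),
          keyby = "myweek", .SDcols = myvars]
print(res)
```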

Re: weighted mean by week

Dimitri Liakhovitski-2
David, many thanks.
Did something get omitted from your line?

ddply(x ,.(myweek), summarize, m1=weighted.mean(var1,myweight),
m2=weighted.mean(var2,myweight))

Because it just reproduces x - in a somewhat different order...

Thank you!
Dimitri

On Tue, Jul 17, 2012 at 9:22 PM, David Freedman <[hidden email]> wrote:

> If there are many variables, I'd then suggest the data.table package:
>
> library(data.table)
> dt=data.table(x)
> dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby=c('group',
> 'myweek')]
>
> The '.SD' is an abbreviation for all the variables in the data table
> (excluding the grouping variables).  There's an .SDcols= 'variables of
> interest' option if you want to limit the dozens of variables to only some
> of them.  Or, in the data.table(x) statement, you could limit the created
> data table to only the variables you're interested in.
>
> As an added benefit, the data.table approach is amazingly fast (particularly
> when there are numerous grouping categories)

Re: weighted mean by week

David Freedman
Honestly, I wasn't sure what you wanted to do with 'group'.  Here it is with the 'group' variable deleted:

library(data.table)
dt=data.table(x[,-1])
dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby='myweek']



Re: weighted mean by week

Dimitri Liakhovitski-2
David, thanks a lot!
I tried x[-1] myself but forgot to delete 'group' from the keyby
statement - this explains why it did not work for me.
This is amazing - just 2 lines instead of my many-many.
Great learning!
Dimitri


On Tue, Jul 17, 2012 at 10:49 PM, David Freedman <[hidden email]> wrote:

> Honestly, I wasn't sure what you wanted to do with 'group'.  Here it is with
> the 'group' variable deleted
>
> library(data.table)
> dt=data.table(x[,-1])
> dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby='myweek']