Dear Soren and R users:
I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows: library(doBy) ## make up some data response = rnorm(100) group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) weights = runif(100, 0, 1) mydata = data.frame(response,group,weights) ## run summaryBy without weights: summaryBy(response~group, data = mydata, FUN = mean) ## attempt to run summaryBy with weights, throws error summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights ) ## throws the error: # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { : # arguments must have same length My guess is that summaryBy is not giving weighted.mean() each group of weights, but instead is passing all of the weights in the data set each time it calls weighted.mean(). Do you know if there is some way to get summaryBy to pass weights to weighted.mean() only for each group? I suspect this functionality would be a tremendous benefit to R users who regularly work with weighted data, such as myself. Thanks, Solomon Messing www.stanford.edu/~messing PS I know this basic example can be done using lapply(split(...)) approach referenced here: http://www.mail-archive.com/r-help@.../msg12349.html but for more complex tasks the lapply approach will mean writing a lot of extra code to run everything and then to get things formatted as nicely as summaryBy() was designed to do. [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Dear Solomon,
On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing <[hidden email]> wrote: > Dear Soren and R users: > > I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows: > > library(doBy) > ## make up some data > response = rnorm(100) > group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) > weights = runif(100, 0, 1) > mydata = data.frame(response,group,weights) > > ## run summaryBy without weights: > summaryBy(response~group, data = mydata, FUN = mean) > > ## attempt to run summaryBy with weights, throws error > summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights ) > > ## throws the error: > # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { : > # arguments must have same length > > My guess is that summaryBy is not giving weighted.mean() each group of weights, but instead is passing all of the weights in the data set each time it calls weighted.mean(). Yes, of course. It has no way of knowing that the weights should also be being broken down by group....they are not in the formula. > Do you know if there is some way to get summaryBy to pass weights to weighted.mean() only for each group? Ideally there would be a way to pass more than one variable to a function (e.g., response and weights) or just an entire object (mydata) broken down by group. Then you would just make a wrapper function to pass the right values to the x and w arguments of weighted.mean. Instead here is a somewhat hacked version: library(doBy) ## make up some data (easier) mydata <- data.frame(response = rnorm(100), group = rep(1:5, each = 20), weights = runif(100, 0, 1)) ## manually compute weighted mean tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum) tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum)) tmp ## weighted means ## here's the 'problem', if you will, even with +, they are passed one at a time summaryBy(response + weights ~ group, data = mydata, FUN = str) summaryBy(mydata ~ group, data = mydata, FUN = str) ## here is an option using by(): xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, z$weights)) xy ## if you don't like the formatting.... data.frame(group = names(c(xy)), weighted.mean = c(xy)) HTH, Josh > > I suspect this functionality would be a tremendous benefit to R users who regularly work with weighted data, such as myself. > > Thanks, > > Solomon Messing > www.stanford.edu/~messing > > PS I know this basic example can be done using lapply(split(...)) approach referenced here: > > http://www.mail-archive.com/r-help@.../msg12349.html > > but for more complex tasks the lapply approach will mean writing a lot of extra code to run everything and then to get things formatted as nicely as summaryBy() was designed to do. > > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
In reply to this post by Solomon Messing
You might use the plyr package to get group-wise weighted means
library(plyr) ddply(mydata,~group,summarise, b=mean(weights), c=weighted.mean(response,weights)) hth david freedman |
In reply to this post by Joshua Wiley-2
Thanks Josh. I built on your example and ended up with the code below--if you or anyone sees any issues please let me know. It would be great if there were a slicker way to get these kinds of summary stats in R, but this gets the job done.
# takes data frame z with weights w and data x, returns weighted mean, weighted SE, and N msenw = function(z){ N = length(na.omit(z)$response) i = which(!is.na(z$response)) return( c( W.M = weighted.mean(z$response, z$weights, na.rm=T), W.SE = sqrt(wtd.var(z$response, weights = z$weights))/sqrt(sum(z$weights[i])), N=N ) ) } library(doBy) library(Hmisc) ## make up some data (easier) mydata <- data.frame(response = rnorm(100), group = rep(1:5, each = 20), weights = runif(100, 0, 1)) xy <- by(mydata, mydata$group, msenw) data.frame( group = names(c(xy)), do.call(rbind, xy) ) ## can be extended to other data using: xy <- by(data.frame(response = mydata$response, weights = mydata$weights), mydata$group, msenw) Solomon Messing www.stanford.edu/~messing On Jan 16, 2011, at 11:16 PM, Joshua Wiley wrote: > Dear Solomon, > > On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing > <[hidden email]> wrote: >> Dear Soren and R users: >> >> I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows: >> >> library(doBy) >> ## make up some data >> response = rnorm(100) >> group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) >> weights = runif(100, 0, 1) >> mydata = data.frame(response,group,weights) >> >> ## run summaryBy without weights: >> summaryBy(response~group, data = mydata, FUN = mean) >> >> ## attempt to run summaryBy with weights, throws error >> summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights ) >> >> ## throws the error: >> # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { : >> # arguments must have same length >> >> My guess is that summaryBy is not giving weighted.mean() each group of weights, but instead is passing all of the weights in the data set each time it calls weighted.mean(). > > Yes, of course. It has no way of knowing that the weights should also > be being broken down by group....they are not in the formula. > >> Do you know if there is some way to get summaryBy to pass weights to weighted.mean() only for each group? > > Ideally there would be a way to pass more than one variable to a > function (e.g., response and weights) or just an entire object > (mydata) broken down by group. Then you would just make a wrapper > function to pass the right values to the x and w arguments of > weighted.mean. Instead here is a somewhat hacked version: > > library(doBy) > ## make up some data (easier) > mydata <- data.frame(response = rnorm(100), > group = rep(1:5, each = 20), weights = runif(100, 0, 1)) > > ## manually compute weighted mean > tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum) > tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum)) > tmp ## weighted means > > ## here's the 'problem', if you will, even with +, they are passed > one at a time > summaryBy(response + weights ~ group, data = mydata, FUN = str) > summaryBy(mydata ~ group, data = mydata, FUN = str) > > ## here is an option using by(): > xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, z$weights)) > xy > ## if you don't like the formatting.... > data.frame(group = names(c(xy)), weighted.mean = c(xy)) > > HTH, > > Josh > >> >> I suspect this functionality would be a tremendous benefit to R users who regularly work with weighted data, such as myself. >> >> Thanks, >> >> Solomon Messing >> www.stanford.edu/~messing >> >> PS I know this basic example can be done using lapply(split(...)) approach referenced here: >> >> http://www.mail-archive.com/r-help@.../msg12349.html >> >> but for more complex tasks the lapply approach will mean writing a lot of extra code to run everything and then to get things formatted as nicely as summaryBy() was designed to do. >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > University of California, Los Angeles > http://www.joshuawiley.com/ [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Hi:
Does this do what you need? wstats <- function(d) { require(Hmisc) N <- length(d$response[!is.na(d$response)]) c(WM = wtd.mean(d$response, d$weights), WSE = sqrt(wtd.var(d$response, d$weights)), N = N) } library(plyr) ddply(mydata, .(group), wstats) group WM WSE N 1 1 0.1302255752 1.1911298 20 2 2 -0.2814664362 0.8582928 20 3 3 -0.3640550516 1.2618343 20 4 4 0.0002852392 1.1463205 20 5 5 -0.0070283053 1.2315683 20 The trick to writing this function for input into plyr is that the argument is a data frame. When called in ddply(), the function wstats() will be applied to each sub-frame corresponding to the grouping factor(s). Inside it, the variables of interest are extracted relative to the input data frame and the three quantities are computed. I used wtd.mean() and wtd.var() from Hmisc, as both will remove NAs by default. In the ddply call, the function name is simply cited since a sub-data frame is the sole argument of the function. I couldn't figure out how to get doBy to get this to work, as it seems best suited to functions of one argument (a single response), but here's an alternative using the data.table package: library(data.table) # Assumes Hmisc is already loaded... myDT <- data.table(mydata, key = 'group') myDT[, list(N = length(response[!is.na(response)]), wtdMean = wtd.mean(response, weights), wtdSE = sqrt(wtd.var(response, weights))), by = 'group'] group N wtdMean wtdSE [1,] 1 20 0.1302255752 1.1911298 [2,] 2 20 -0.2814664362 0.8582928 [3,] 3 20 -0.3640550516 1.2618343 [4,] 4 20 0.0002852392 1.1463205 [5,] 5 20 -0.0070283053 1.2315683 data.table uses a different model of data organization from data frames. A simplistic description is that it you can think of a data.table as analogous to a table in a DBMS. Notice that the 'function call' is indexed inside the data table: the first 'subscript' corresponds to what are called I() operations (analogous to 'select' statements in an SQL); the second 'subscript' corresponds to J() operations, (analogous to 'where' statements), while the third argument is the by group(s), or sub-data tables, to which (in this case) the J() operations apply. For functions that take multiple arguments and that are meant to be applied in a groupwise fashion, I find plyr and data.table to be very good options. There are also base package alternatives (e.g., some combination of lapply(), mapply() and do.call()) and several other packages, but plyr and data.table are generally pretty good at handling most of the niggling details. Having said that, both have learning curves - data.table, in particular, will be much easier to pick up if you have some background in SQLs, since its syntax uses primary principles of SQL. data.table has a vignette and FAQ, along with an independent help list - for details, see its page on R-forge: http://r-forge.r-project.org/projects/datatable/ For plyr's documentation, see http://had.co.nz/plyr/ A link to its mailing list is found on that page as well. HTH, Dennis On Mon, Jan 17, 2011 at 10:24 AM, Solomon Messing <[hidden email] > wrote: > Thanks Josh. I built on your example and ended up with the code below--if > you or anyone sees any issues please let me know. It would be great if > there were a slicker way to get these kinds of summary stats in R, but this > gets the job done. > > # takes data frame z with weights w and data x, returns weighted mean, > weighted SE, and N > msenw = function(z){ > N = length(na.omit(z)$response) > i = which(!is.na(z$response)) > return( > c( W.M = weighted.mean(z$response, z$weights, > na.rm=T), > W.SE = sqrt(wtd.var(z$response, weights = > z$weights))/sqrt(sum(z$weights[i])), > N=N ) ) > } > > library(doBy) > library(Hmisc) > ## make up some data (easier) > mydata <- data.frame(response = rnorm(100), > group = rep(1:5, each = 20), weights = runif(100, 0, 1)) > > xy <- by(mydata, mydata$group, msenw) > data.frame( group = names(c(xy)), do.call(rbind, xy) ) > > ## can be extended to other data using: > xy <- by(data.frame(response = mydata$response, weights = mydata$weights), > mydata$group, msenw) > > > Solomon Messing > www.stanford.edu/~messing <http://www.stanford.edu/%7Emessing> > > > > On Jan 16, 2011, at 11:16 PM, Joshua Wiley wrote: > > > Dear Solomon, > > > > On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing > > <[hidden email]> wrote: > >> Dear Soren and R users: > >> > >> I am trying to use the summaryBy function with weights. Is this > possible? An example that illustrates what I am trying to do follows: > >> > >> library(doBy) > >> ## make up some data > >> response = rnorm(100) > >> group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) > >> weights = runif(100, 0, 1) > >> mydata = data.frame(response,group,weights) > >> > >> ## run summaryBy without weights: > >> summaryBy(response~group, data = mydata, FUN = mean) > >> > >> ## attempt to run summaryBy with weights, throws error > >> summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights ) > >> > >> ## throws the error: > >> # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { > : > >> # arguments must have same length > >> > >> My guess is that summaryBy is not giving weighted.mean() each group of > weights, but instead is passing all of the weights in the data set each time > it calls weighted.mean(). > > > > Yes, of course. It has no way of knowing that the weights should also > > be being broken down by group....they are not in the formula. > > > >> Do you know if there is some way to get summaryBy to pass weights to > weighted.mean() only for each group? > > > > Ideally there would be a way to pass more than one variable to a > > function (e.g., response and weights) or just an entire object > > (mydata) broken down by group. Then you would just make a wrapper > > function to pass the right values to the x and w arguments of > > weighted.mean. Instead here is a somewhat hacked version: > > > > library(doBy) > > ## make up some data (easier) > > mydata <- data.frame(response = rnorm(100), > > group = rep(1:5, each = 20), weights = runif(100, 0, 1)) > > > > ## manually compute weighted mean > > tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum) > > tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum)) > > tmp ## weighted means > > > > ## here's the 'problem', if you will, even with +, they are passed > > one at a time > > summaryBy(response + weights ~ group, data = mydata, FUN = str) > > summaryBy(mydata ~ group, data = mydata, FUN = str) > > > > ## here is an option using by(): > > xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, > z$weights)) > > xy > > ## if you don't like the formatting.... > > data.frame(group = names(c(xy)), weighted.mean = c(xy)) > > > > HTH, > > > > Josh > > > >> > >> I suspect this functionality would be a tremendous benefit to R users > who regularly work with weighted data, such as myself. > >> > >> Thanks, > >> > >> Solomon Messing > >> www.stanford.edu/~messing <http://www.stanford.edu/%7Emessing> > >> > >> PS I know this basic example can be done using lapply(split(...)) > approach referenced here: > >> > >> http://www.mail-archive.com/r-help@.../msg12349.html > >> > >> but for more complex tasks the lapply approach will mean writing a lot > of extra code to run everything and then to get things formatted as nicely > as summaryBy() was designed to do. > >> > >> > >> [[alternative HTML version deleted]] > >> > >> ______________________________________________ > >> [hidden email] mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > > > > > -- > > Joshua Wiley > > Ph.D. Student, Health Psychology > > University of California, Los Angeles > > http://www.joshuawiley.com/ > > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
In reply to this post by Joshua Wiley-2
It is currently not possible to pass weights in summaryBy.
Regards Søren ________________________________________ Fra: Joshua Wiley [[hidden email]] Sendt: 17. januar 2011 08:16 Til: Solomon Messing Cc: [hidden email]; Søren Højsgaard Emne: Re: [R] Using summaryBy with weighted data Dear Solomon, On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing <[hidden email]> wrote: > Dear Soren and R users: > > I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows: > > library(doBy) > ## make up some data > response = rnorm(100) > group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) > weights = runif(100, 0, 1) > mydata = data.frame(response,group,weights) > > ## run summaryBy without weights: > summaryBy(response~group, data = mydata, FUN = mean) > > ## attempt to run summaryBy with weights, throws error > summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights ) > > ## throws the error: > # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { : > # arguments must have same length > > My guess is that summaryBy is not giving weighted.mean() each group of weights, but instead is passing all of the weights in the data set each time it calls weighted.mean(). Yes, of course. It has no way of knowing that the weights should also be being broken down by group....they are not in the formula. > Do you know if there is some way to get summaryBy to pass weights to weighted.mean() only for each group? Ideally there would be a way to pass more than one variable to a function (e.g., response and weights) or just an entire object (mydata) broken down by group. Then you would just make a wrapper function to pass the right values to the x and w arguments of weighted.mean. Instead here is a somewhat hacked version: library(doBy) ## make up some data (easier) mydata <- data.frame(response = rnorm(100), group = rep(1:5, each = 20), weights = runif(100, 0, 1)) ## manually compute weighted mean tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum) tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum)) tmp ## weighted means ## here's the 'problem', if you will, even with +, they are passed one at a time summaryBy(response + weights ~ group, data = mydata, FUN = str) summaryBy(mydata ~ group, data = mydata, FUN = str) ## here is an option using by(): xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, z$weights)) xy ## if you don't like the formatting.... data.frame(group = names(c(xy)), weighted.mean = c(xy)) HTH, Josh > > I suspect this functionality would be a tremendous benefit to R users who regularly work with weighted data, such as myself. > > Thanks, > > Solomon Messing > www.stanford.edu/~messing > > PS I know this basic example can be done using lapply(split(...)) approach referenced here: > > http://www.mail-archive.com/r-help@.../msg12349.html > > but for more complex tasks the lapply approach will mean writing a lot of extra code to run everything and then to get things formatted as nicely as summaryBy() was designed to do. > > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
In reply to this post by djmuseR
Thanks Dennis, looks like there's even less boiler plate code with plyr. By the way, what I labelled "W.SE" is meant to represent the weighted standard error of the mean. Your "WSE" calculations appear to be providing the weighted standard deviation of the variable. Is this a matter of needing to change my labels to W.SEM to avoid this kind of confusion, or is there literature suggesting that I should be using the standard deviation of the variable to estimate the weighted standard error of the mean?
Thanks, -Solomon On Jan 17, 2011, at 1:11 PM, Dennis Murphy wrote: > Hi: > > Does this do what you need? > > wstats <- function(d) { > require(Hmisc) > N <- length(d$response[!is.na(d$response)]) > c(WM = wtd.mean(d$response, d$weights), > WSE = sqrt(wtd.var(d$response, d$weights)), > N = N) > } > library(plyr) > ddply(mydata, .(group), wstats) > group WM WSE N > 1 1 0.1302255752 1.1911298 20 > 2 2 -0.2814664362 0.8582928 20 > 3 3 -0.3640550516 1.2618343 20 > 4 4 0.0002852392 1.1463205 20 > 5 5 -0.0070283053 1.2315683 20 > > The trick to writing this function for input into plyr is that the argument is a data frame. When called in ddply(), the function wstats() will be applied to each sub-frame corresponding to the grouping factor(s). Inside it, the variables of interest are extracted relative to the input data frame and the three quantities are computed. I used wtd.mean() and wtd.var() from Hmisc, as both will remove NAs by default. In the ddply call, the function name is simply cited since a sub-data frame is the sole argument of the function. > > I couldn't figure out how to get doBy to get this to work, as it seems best suited to functions of one argument (a single response), but here's an alternative using the data.table package: > > library(data.table) > # Assumes Hmisc is already loaded... > myDT <- data.table(mydata, key = 'group') > myDT[, list(N = length(response[!is.na(response)]), > wtdMean = wtd.mean(response, weights), > wtdSE = sqrt(wtd.var(response, weights))), by = 'group'] > group N wtdMean wtdSE > [1,] 1 20 0.1302255752 1.1911298 > [2,] 2 20 -0.2814664362 0.8582928 > [3,] 3 20 -0.3640550516 1.2618343 > [4,] 4 20 0.0002852392 1.1463205 > [5,] 5 20 -0.0070283053 1.2315683 > > data.table uses a different model of data organization from data frames. A simplistic description is that it you can think of a data.table as analogous to a table in a DBMS. Notice that the 'function call' is indexed inside the data table: the first 'subscript' corresponds to what are called I() operations (analogous to 'select' statements in an SQL); the second 'subscript' corresponds to J() operations, (analogous to 'where' statements), while the third argument is the by group(s), or sub-data tables, to which (in this case) the J() operations apply. > > For functions that take multiple arguments and that are meant to be applied in a groupwise fashion, I find plyr and data.table to be very good options. There are also base package alternatives (e.g., some combination of lapply(), mapply() and do.call()) and several other packages, but plyr and data.table are generally pretty good at handling most of the niggling details. Having said that, both have learning curves - data.table, in particular, will be much easier to pick up if you have some background in SQLs, since its syntax uses primary principles of SQL. > > data.table has a vignette and FAQ, along with an independent help list - for details, see its page on R-forge: > http://r-forge.r-project.org/projects/datatable/ > For plyr's documentation, see > http://had.co.nz/plyr/ > A link to its mailing list is found on that page as well. > > HTH, > Dennis > > On Mon, Jan 17, 2011 at 10:24 AM, Solomon Messing <[hidden email]> wrote: > Thanks Josh. I built on your example and ended up with the code below--if you or anyone sees any issues please let me know. It would be great if there were a slicker way to get these kinds of summary stats in R, but this gets the job done. > > # takes data frame z with weights w and data x, returns weighted mean, weighted SE, and N > msenw = function(z){ > N = length(na.omit(z)$response) > i = which(!is.na(z$response)) > return( > c( W.M = weighted.mean(z$response, z$weights, na.rm=T), > W.SE = sqrt(wtd.var(z$response, weights = z$weights))/sqrt(sum(z$weights[i])), > N=N ) ) > } > > library(doBy) > library(Hmisc) > ## make up some data (easier) > mydata <- data.frame(response = rnorm(100), > group = rep(1:5, each = 20), weights = runif(100, 0, 1)) > > xy <- by(mydata, mydata$group, msenw) > data.frame( group = names(c(xy)), do.call(rbind, xy) ) > > ## can be extended to other data using: > xy <- by(data.frame(response = mydata$response, weights = mydata$weights), mydata$group, msenw) > > > Solomon Messing > www.stanford.edu/~messing > > > > On Jan 16, 2011, at 11:16 PM, Joshua Wiley wrote: > > > Dear Solomon, > > > > On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing > > <[hidden email]> wrote: > >> Dear Soren and R users: > >> > >> I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows: > >> > >> library(doBy) > >> ## make up some data > >> response = rnorm(100) > >> group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20)) > >> weights = runif(100, 0, 1) > >> mydata = data.frame(response,group,weights) > >> > >> ## run summaryBy without weights: > >> summaryBy(response~group, data = mydata, FUN = mean) > >> > >> ## attempt to run summaryBy with weights, throws error > >> summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights ) > >> > >> ## throws the error: > >> # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { : > >> # arguments must have same length > >> > >> My guess is that summaryBy is not giving weighted.mean() each group of weights, but instead is passing all of the weights in the data set each time it calls weighted.mean(). > > > > Yes, of course. It has no way of knowing that the weights should also > > be being broken down by group....they are not in the formula. > > > >> Do you know if there is some way to get summaryBy to pass weights to weighted.mean() only for each group? > > > > Ideally there would be a way to pass more than one variable to a > > function (e.g., response and weights) or just an entire object > > (mydata) broken down by group. Then you would just make a wrapper > > function to pass the right values to the x and w arguments of > > weighted.mean. Instead here is a somewhat hacked version: > > > > library(doBy) > > ## make up some data (easier) > > mydata <- data.frame(response = rnorm(100), > > group = rep(1:5, each = 20), weights = runif(100, 0, 1)) > > > > ## manually compute weighted mean > > tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum) > > tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum)) > > tmp ## weighted means > > > > ## here's the 'problem', if you will, even with +, they are passed > > one at a time > > summaryBy(response + weights ~ group, data = mydata, FUN = str) > > summaryBy(mydata ~ group, data = mydata, FUN = str) > > > > ## here is an option using by(): > > xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, z$weights)) > > xy > > ## if you don't like the formatting.... > > data.frame(group = names(c(xy)), weighted.mean = c(xy)) > > > > HTH, > > > > Josh > > > >> > >> I suspect this functionality would be a tremendous benefit to R users who regularly work with weighted data, such as myself. > >> > >> Thanks, > >> > >> Solomon Messing > >> www.stanford.edu/~messing > >> > >> PS I know this basic example can be done using lapply(split(...)) approach referenced here: > >> > >> http://www.mail-archive.com/r-help@.../msg12349.html > >> > >> but for more complex tasks the lapply approach will mean writing a lot of extra code to run everything and then to get things formatted as nicely as summaryBy() was designed to do. > >> > >> > >> [[alternative HTML version deleted]] > >> > >> ______________________________________________ > >> [hidden email] mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > > > > > -- > > Joshua Wiley > > Ph.D. Student, Health Psychology > > University of California, Los Angeles > > http://www.joshuawiley.com/ > > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Hi everyone,
I am trying to run Sweave.bat (batchfiles_0.6-1) from the command line on Windows, but I get this error: C:\batchfiles_0.6-1>Sweave.bat Sweave-test-1 "Error: rterm.exe not found" I don't know how to set up the path if this one were the problem... I ran rcmd.bat and I got this... so I don't know if it is a path problem. C:\batchfiles_0.6-1>Rcmd,bat R_ARCH=/x64 R_ARCH0=x64 R_ARCH0=x64 cmdpath=C:\R\R-2.12.1\bin\x64\Rcmd.exe args=,bat 'bat' is not recognized as an internal or external command, operable program or batch file. the path of rterm.exe in my computer is: C:\R\R-2.12.1\bin\x64 thank you in advance! -- Sebastián Daza [hidden email] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
In reply to this post by Solomon Messing
Hi everyone,
I am trying to run Sweave.bat (batchfiles_0.6-1) from the command line on Windows, but I get this error: C:\batchfiles_0.6-1>Sweave.bat Sweave-test-1 "Error: rterm.exe not found" I don't know how to set up the path if this one were the problem... I ran rcmd.bat and I got this... so I don't know if it is a path problem. C:\batchfiles_0.6-1>Rcmd,bat R_ARCH=/x64 R_ARCH0=x64 R_ARCH0=x64 cmdpath=C:\R\R-2.12.1\bin\x64\Rcmd.exe args=,bat 'bat' is not recognized as an internal or external command, operable program or batch file. the path of rterm.exe in my computer is: C:\R\R-2.12.1\bin\x64 thank you in advance! -- Sebastián Daza [hidden email] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
2011/1/17 Sebastián Daza <[hidden email]>:
> Hi everyone, > I am trying to run Sweave.bat (batchfiles_0.6-1) from the command line on > Windows, but I get this error: > > C:\batchfiles_0.6-1>Sweave.bat Sweave-test-1 > "Error: rterm.exe not found" > > I don't know how to set up the path if this one were the problem... I ran > rcmd.bat and I got this... so I don't know if it is a path problem. > > C:\batchfiles_0.6-1>Rcmd,bat > R_ARCH=/x64 > R_ARCH0=x64 > R_ARCH0=x64 > cmdpath=C:\R\R-2.12.1\bin\x64\Rcmd.exe > args=,bat > 'bat' is not recognized as an internal or external command, > operable program or batch file. > > the path of rterm.exe in my computer is: C:\R\R-2.12.1\bin\x64 > thank you in advance! > You should not have to set up any paths. The whole point of these batch commands are to save you from doing that. If you contact me privately we can try to determine what has gone wrong with Sweave.bat . Regarding Rcmd.bat, the comma in your command line should be a dot. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Powered by Nabble | Edit this page |