# Problem with Weighted Variance in Hmisc


## Problem with Weighted Variance in Hmisc

The function `wtd.var(x, w)` in Hmisc calculates the weighted variance of `x`, where `w` are the weights. It seems to me that `wtd.var(x, w)` should equal `var(x)` when all of the weights are equal, but this does not appear to be the case. Can someone point out to me where I am going wrong here? Thanks.

Tom La Bone

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: Problem with Weighted Variance in Hmisc

On 2007-June-01, at 01:03, Tom La Bone wrote:

> The function wtd.var(x,w) in Hmisc calculates the weighted variance of x where w are the weights. It appears to me that wtd.var(x,w) = var(x) if all of the weights are equal, but this does not appear to be the case. Can someone point out to me where I am going wrong here? Thanks.

The true formula of weighted variance is this one:
http://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weighvar.pdf

But for computation purposes, `wtd.var` uses another definition, which treats the weights as repeats (frequency weights) instead of true weights. However, if the weights are normalized (rescaled to sum to the number of observations, which is what `normwt = TRUE` does), the two formulas are equal. If you consider your weights as real weights instead of repeats, I would recommend using this option. With `normwt = TRUE`, your issue is solved:

```r
library(Hmisc)

a <- 1:10
b <- rep(2, length(a))   # all weights equal 2
b
## [1] 2 2 2 2 2 2 2 2 2 2

wtd.var(a, b)
## [1] 8.68421
# all weights equal 2 <=> there are two repeats of each element of a
var(c(a, a))
## [1] 8.68421

wtd.var(a, b, normwt = TRUE)
## [1] 9.166667
var(a)
## [1] 9.166667
```

Cheers,

JiHO

---
http://jo.irisson.free.fr/
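
To make the two definitions concrete, here is a minimal base-R sketch (no Hmisc required) of the formulas as described in this thread: the "repeats" (frequency-weight) estimator, which uses `sum(w)` as N, and the same estimator after rescaling the weights to sum to `length(x)`, which is what `normwt = TRUE` is documented to do. The function names `var_freq` and `var_norm` are hypothetical, and this is not Hmisc's source code; a particular Hmisc release may compute the `normwt = TRUE` case differently when the weights are unequal.

```r
# Frequency ("repeat") weights: w[i] counts how many times x[i] occurred,
# so the effective sample size N is sum(w). Hypothetical helper name.
var_freq <- function(x, w) {
  xbar <- sum(w * x) / sum(w)           # weighted mean
  sum(w * (x - xbar)^2) / (sum(w) - 1)  # unbiased estimator with N = sum(w)
}

# Rescale the weights to sum to length(x) first, so the effective N is the
# number of data points rather than sum(w). Hypothetical helper name.
var_norm <- function(x, w) {
  w <- w * length(x) / sum(w)
  var_freq(x, w)
}

a <- 1:10
b <- rep(2, 10)

var_freq(a, b)          # 165 / 19, the same as var(c(a, a))
var_norm(a, b)          # 82.5 / 9, the same as var(a)
var(rep(a, times = b))  # the "repeats" written out explicitly
```

With integer weights, `var_freq(x, w)` agrees with `var(rep(x, w))`, which is exactly the "repeats" interpretation; rescaling the weights only matters through the `- 1` in the denominator.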

## Re: Problem with Weighted Variance in Hmisc

Thanks. I have another, related question:

The equation for weighted variance given in the NIST DataPlot documentation is the usual variance equation with the weights inserted. The variance of the weighted mean is then estimated as this weighted variance divided by N. There is another approach to calculating the variance of the weighted mean, which propagates the uncertainty of each term in the weighted mean (see *Data Reduction and Error Analysis for the Physical Sciences* by Bevington & Robinson). The two approaches do not give the same answer. Can anyone suggest a reference that discusses the merits of the DataPlot approach versus the Bevington approach?

Tom La Bone

-----Original Message-----
From: jiho [mailto:[hidden email]]
Sent: Friday, June 01, 2007 2:17 AM
To: [hidden email]; R-help
Subject: Re: [R] Problem with Weighted Variance in Hmisc
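
A small base-R sketch may help show why the two approaches Tom describes generally disagree. It assumes Bevington's convention of inverse-variance weights, w_i = 1/σ_i², uses made-up data purely for illustration, and the function names are hypothetical. Approach (1) is driven by the observed scatter of the `x` values, while approach (2) uses only the assigned uncertainties, so the two can be expected to agree only roughly, and only when the scatter is consistent with the stated σ_i.

```r
# Hypothetical data, made up for illustration only.
x     <- c(10.1, 9.8, 10.4, 10.0)   # measurements
sigma <- c(0.2, 0.1, 0.3, 0.2)      # standard uncertainties of each measurement
w     <- 1 / sigma^2                # inverse-variance weights (Bevington's convention)

# (1) Scatter-based: weighted variance in the NIST DataPlot form,
#     sum(w (x - xbar)^2) / ((N' - 1) / N' * sum(w)), with N' the number of
#     nonzero weights, then divided by N = length(x) for the weighted mean.
var_mean_scatter <- function(x, w) {
  xbar <- sum(w * x) / sum(w)
  n_nz <- sum(w != 0)
  s2_w <- sum(w * (x - xbar)^2) / ((n_nz - 1) / n_nz * sum(w))
  s2_w / length(x)
}

# (2) Propagated uncertainty: depends only on the stated sigmas,
#     not on how much the x values actually scatter.
var_mean_propagated <- function(sigma) 1 / sum(1 / sigma^2)

var_mean_scatter(x, w)
var_mean_propagated(sigma)
```

Note that (1) is unchanged if all weights are multiplied by a constant, whereas (2) only makes sense if the weights really are inverse variances.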

## Re: Problem with Weighted Variance in Hmisc

jiho wrote:

> If you consider weights as real weights instead of repeats, I would recommend to use this option. With normwt=T, your issue is solved:

The issue is what is being assumed for N in the denominator of the variance formula, since the unbiased estimator subtracts one. Using `normwt = TRUE` means you are in effect assuming N is the number of elements in the data vector, ignoring the weights.

Frank Harrell

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
Department of Biostatistics   Vanderbilt University
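
Frank Harrell's point about N can be written out explicitly. This uses the frequency-weight estimator as described in this thread (not taken from Hmisc's source), with weighted mean $\bar x_w = \sum_i w_i x_i / \sum_i w_i$ and rescaled weights $w'_i = n\,w_i / \sum_j w_j$, so that $\sum_i w'_i = n$ and the weighted mean is unchanged:

$$
s^2_{\text{freq}} = \frac{\sum_{i=1}^{n} w_i (x_i - \bar x_w)^2}{\sum_{i=1}^{n} w_i - 1},
\qquad
s^2_{\text{norm}} = \frac{\sum_{i=1}^{n} w'_i (x_i - \bar x_w)^2}{n - 1}
= \frac{\sum_{i=1}^{n} w_i (x_i - \bar x_w)^2}{\frac{n-1}{n}\sum_{i=1}^{n} w_i}
$$

The last form is the NIST DataPlot expression when all $n$ weights are nonzero. With equal weights $w_i = c$, $s^2_{\text{norm}}$ reduces to the ordinary sample variance, while $s^2_{\text{freq}} = c\sum_i (x_i - \bar x)^2 / (nc - 1)$. For JiHO's example ($c = 2$, $n = 10$, $\sum_i (x_i - \bar x)^2 = 82.5$) this gives $165/19 \approx 8.684$ versus $82.5/9 \approx 9.167$.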