Weighted variance function?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Weighted variance function?

Arun.stat
There is a R function to calculate weighted mean : weighted.mean() under
stats package. Is there any direct R function for calculating weighted
variance as well?

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Weighted variance function?

Gavin Simpson
On Thu, 2008-07-24 at 02:25 +0530, Arun Kumar Saha wrote:
> There is a R function to calculate weighted mean : weighted.mean() under
> stats package. Is there any direct R function for calculating weighted
> variance as well?

Here are two ways; weighted.var() is via the usual formula and
weighted.var2() uses a running sums approach. The formulae for which are
both on the weighted mean entry page on wikipedia for example.

The removal of NA is as per weighted.mean, but I have not included any
of the sanity checks that that functions contains.

weighted.var <- function(x, w, na.rm = FALSE) {
    if (na.rm) {
        w <- w[i <- !is.na(x)]
        x <- x[i]
    }
    sum.w <- sum(w)
    sum.w2 <- sum(w^2)
    mean.w <- sum(x * w) / sum(w)
    (sum.w / (sum.w^2 - sum.w2)) * sum(w * (x - mean.w)^2, na.rm =
na.rm)
}

weighted.var2 <- function(x, w, na.rm = FALSE) {
    if (na.rm) {
        w <- w[i <- !is.na(x)]
        x <- x[i]
    }
    sum.w <- sum(w)
    (sum(w*x^2) * sum.w - sum(w*x)^2) / (sum.w^2 - sum(w^2))
}

## from example section in ?weighted.mean
## GPA from Siegel 1994
wt <- c(5,  5,  4,  1)/15
x <- c(3.7,3.3,3.5,2.8)
weighted.mean(x,wt)

weighted.var(x, wt)

weighted.var2(x, wt)

And some timings:

> system.time(replicate(100000, weighted.var(x, wt)))
   user  system elapsed
  2.679   0.014   2.820
> system.time(replicate(100000, weighted.var2(x, wt)))
   user  system elapsed
  2.224   0.010   2.315

HTH

G
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Weighted variance function?

Arun.stat
ur prog gives following result:

weighted.var(c(1,-1), c(0.5,0.5))
[1] 2

is it ok?

On Thu, Jul 24, 2008 at 7:57 PM, Gavin Simpson <[hidden email]>wrote:

> On Thu, 2008-07-24 at 02:25 +0530, Arun Kumar Saha wrote:
> > There is a R function to calculate weighted mean : weighted.mean() under
> > stats package. Is there any direct R function for calculating weighted
> > variance as well?
>
> Here are two ways; weighted.var() is via the usual formula and
> weighted.var2() uses a running sums approach. The formulae for which are
> both on the weighted mean entry page on wikipedia for example.
>
> The removal of NA is as per weighted.mean, but I have not included any
> of the sanity checks that that functions contains.
>
> weighted.var <- function(x, w, na.rm = FALSE) {
>    if (na.rm) {
>        w <- w[i <- !is.na(x)]
>        x <- x[i]
>    }
>    sum.w <- sum(w)
>    sum.w2 <- sum(w^2)
>    mean.w <- sum(x * w) / sum(w)
>    (sum.w / (sum.w^2 - sum.w2)) * sum(w * (x - mean.w)^2, na.rm =
> na.rm)
> }
>
> weighted.var2 <- function(x, w, na.rm = FALSE) {
>    if (na.rm) {
>        w <- w[i <- !is.na(x)]
>        x <- x[i]
>    }
>    sum.w <- sum(w)
>    (sum(w*x^2) * sum.w - sum(w*x)^2) / (sum.w^2 - sum(w^2))
> }
>
> ## from example section in ?weighted.mean
> ## GPA from Siegel 1994
> wt <- c(5,  5,  4,  1)/15
> x <- c(3.7,3.3,3.5,2.8)
> weighted.mean(x,wt)
>
> weighted.var(x, wt)
>
> weighted.var2(x, wt)
>
> And some timings:
>
> > system.time(replicate(100000, weighted.var(x, wt)))
>   user  system elapsed
>  2.679   0.014   2.820
> > system.time(replicate(100000, weighted.var2(x, wt)))
>   user  system elapsed
>  2.224   0.010   2.315
>
> HTH
>
> G
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/<http://www.ucl.ac.uk/%7Eucfagls/>
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Weighted variance function?

Arun.stat
I feel there is something wrong in definition in weighted variance.

Say for a fair dice the variance of outcome should be : 2.92 (
http://en.wikipedia.org/wiki/Variance)

However if I use cov.wt() or weighted.var() by Gavin, I get :

cov.wt(as.data.frame(1:6), rep(0.6, 6))
$cov
    1:6
1:6 3.5

$center
1:6
3.5

$n.obs
[1] 6

$wt
[1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667

i.e. 3.5

Therefore if I want to calculate Variance for a r.v. with different prob for
different values then should not use those formulae. Is it the case?

On Sun, Jul 27, 2008 at 2:30 PM, Patrick Burns <[hidden email]>wrote:

> Have you seen 'cov.wt'?
>
>
> Patrick Burns
> [hidden email]
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and "A Guide for the Unwilling S User")
>
> Arun Kumar Saha wrote:
>
>> ur prog gives following result:
>>
>> weighted.var(c(1,-1), c(0.5,0.5))
>> [1] 2
>>
>> is it ok?
>>
>> On Thu, Jul 24, 2008 at 7:57 PM, Gavin Simpson <[hidden email]
>> >wrote:
>>
>>
>>
>>> On Thu, 2008-07-24 at 02:25 +0530, Arun Kumar Saha wrote:
>>>
>>>
>>>> There is a R function to calculate weighted mean : weighted.mean() under
>>>> stats package. Is there any direct R function for calculating weighted
>>>> variance as well?
>>>>
>>>>
>>> Here are two ways; weighted.var() is via the usual formula and
>>> weighted.var2() uses a running sums approach. The formulae for which are
>>> both on the weighted mean entry page on wikipedia for example.
>>>
>>> The removal of NA is as per weighted.mean, but I have not included any
>>> of the sanity checks that that functions contains.
>>>
>>> weighted.var <- function(x, w, na.rm = FALSE) {
>>>   if (na.rm) {
>>>       w <- w[i <- !is.na(x)]
>>>       x <- x[i]
>>>   }
>>>   sum.w <- sum(w)
>>>   sum.w2 <- sum(w^2)
>>>   mean.w <- sum(x * w) / sum(w)
>>>   (sum.w / (sum.w^2 - sum.w2)) * sum(w * (x - mean.w)^2, na.rm =
>>> na.rm)
>>> }
>>>
>>> weighted.var2 <- function(x, w, na.rm = FALSE) {
>>>   if (na.rm) {
>>>       w <- w[i <- !is.na(x)]
>>>       x <- x[i]
>>>   }
>>>   sum.w <- sum(w)
>>>   (sum(w*x^2) * sum.w - sum(w*x)^2) / (sum.w^2 - sum(w^2))
>>> }
>>>
>>> ## from example section in ?weighted.mean
>>> ## GPA from Siegel 1994
>>> wt <- c(5,  5,  4,  1)/15
>>> x <- c(3.7,3.3,3.5,2.8)
>>> weighted.mean(x,wt)
>>>
>>> weighted.var(x, wt)
>>>
>>> weighted.var2(x, wt)
>>>
>>> And some timings:
>>>
>>>
>>>
>>>> system.time(replicate(100000, weighted.var(x, wt)))
>>>>
>>>>
>>>  user  system elapsed
>>>  2.679   0.014   2.820
>>>
>>>
>>>> system.time(replicate(100000, weighted.var2(x, wt)))
>>>>
>>>>
>>>  user  system elapsed
>>>  2.224   0.010   2.315
>>>
>>> HTH
>>>
>>> G
>>> --
>>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>>>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>>>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>>>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/<http://www.ucl.ac.uk/%7Eucfagls/>
>>> <http://www.ucl.ac.uk/%7Eucfagls/>
>>>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
>>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>>
>>>
>>>
>>>
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>>
>


--

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Weighted variance function?

Arun.stat
oh now I got, R is taking unbiased estimator, anyways thanks

On Sun, Jul 27, 2008 at 2:54 PM, Arun Kumar Saha
<[hidden email]>wrote:

> I feel there is something wrong in definition in weighted variance.
>
> Say for a fair dice the variance of outcome should be : 2.92 (
> http://en.wikipedia.org/wiki/Variance)
>
> However if I use cov.wt() or weighted.var() by Gavin, I get :
>
> cov.wt(as.data.frame(1:6), rep(0.6, 6))
> $cov
>     1:6
> 1:6 3.5
>
> $center
> 1:6
> 3.5
>
> $n.obs
> [1] 6
>
> $wt
> [1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
>
> i.e. 3.5
>
> Therefore if I want to calculate Variance for a r.v. with different prob
> for different values then should not use those formulae. Is it the case?
>
>
> On Sun, Jul 27, 2008 at 2:30 PM, Patrick Burns <[hidden email]>wrote:
>
>> Have you seen 'cov.wt'?
>>
>>
>> Patrick Burns
>> [hidden email]
>> +44 (0)20 8525 0696
>> http://www.burns-stat.com
>> (home of S Poetry and "A Guide for the Unwilling S User")
>>
>> Arun Kumar Saha wrote:
>>
>>> ur prog gives following result:
>>>
>>> weighted.var(c(1,-1), c(0.5,0.5))
>>> [1] 2
>>>
>>> is it ok?
>>>
>>> On Thu, Jul 24, 2008 at 7:57 PM, Gavin Simpson <[hidden email]
>>> >wrote:
>>>
>>>
>>>
>>>> On Thu, 2008-07-24 at 02:25 +0530, Arun Kumar Saha wrote:
>>>>
>>>>
>>>>> There is a R function to calculate weighted mean : weighted.mean()
>>>>> under
>>>>> stats package. Is there any direct R function for calculating weighted
>>>>> variance as well?
>>>>>
>>>>>
>>>> Here are two ways; weighted.var() is via the usual formula and
>>>> weighted.var2() uses a running sums approach. The formulae for which are
>>>> both on the weighted mean entry page on wikipedia for example.
>>>>
>>>> The removal of NA is as per weighted.mean, but I have not included any
>>>> of the sanity checks that that functions contains.
>>>>
>>>> weighted.var <- function(x, w, na.rm = FALSE) {
>>>>   if (na.rm) {
>>>>       w <- w[i <- !is.na(x)]
>>>>       x <- x[i]
>>>>   }
>>>>   sum.w <- sum(w)
>>>>   sum.w2 <- sum(w^2)
>>>>   mean.w <- sum(x * w) / sum(w)
>>>>   (sum.w / (sum.w^2 - sum.w2)) * sum(w * (x - mean.w)^2, na.rm =
>>>> na.rm)
>>>> }
>>>>
>>>> weighted.var2 <- function(x, w, na.rm = FALSE) {
>>>>   if (na.rm) {
>>>>       w <- w[i <- !is.na(x)]
>>>>       x <- x[i]
>>>>   }
>>>>   sum.w <- sum(w)
>>>>   (sum(w*x^2) * sum.w - sum(w*x)^2) / (sum.w^2 - sum(w^2))
>>>> }
>>>>
>>>> ## from example section in ?weighted.mean
>>>> ## GPA from Siegel 1994
>>>> wt <- c(5,  5,  4,  1)/15
>>>> x <- c(3.7,3.3,3.5,2.8)
>>>> weighted.mean(x,wt)
>>>>
>>>> weighted.var(x, wt)
>>>>
>>>> weighted.var2(x, wt)
>>>>
>>>> And some timings:
>>>>
>>>>
>>>>
>>>>> system.time(replicate(100000, weighted.var(x, wt)))
>>>>>
>>>>>
>>>>  user  system elapsed
>>>>  2.679   0.014   2.820
>>>>
>>>>
>>>>> system.time(replicate(100000, weighted.var2(x, wt)))
>>>>>
>>>>>
>>>>  user  system elapsed
>>>>  2.224   0.010   2.315
>>>>
>>>> HTH
>>>>
>>>> G
>>>> --
>>>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>>>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>>>>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>>>>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>>>>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/<http://www.ucl.ac.uk/%7Eucfagls/>
>>>> <http://www.ucl.ac.uk/%7Eucfagls/>
>>>>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
>>>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>>>
>>>>
>>>>
>>>>
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>>
>>
>
>
> --
>
>


--

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.