Percentiles/Quantiles with Weighting

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Percentiles/Quantiles with Weighting

Brigid Mooney
Hi All,

I am looking at applications of percentiles to time sequenced data.  I had
just been using the quantile function to get percentiles over various
periods, but am more interested in if there is an accepted (and/or
R-implemented) method to apply weighting to the data so as to weigh recent
data more heavily.

I wrote the following function, but it seems quite inefficient, and not
really very flexible in its applications - so if anyone has any suggestions
on how to look at quantiles/percentiles within R while also using a
weighting schema, I would be very interested.

Note - this function supposes the data in X is time-sequenced, with the most
recent (and thus heaviest weighted) data at the end of the vector

WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
{
  Xprime <- NA

  for(i in 1:length(X))
  {
    Xprime <- c(Xprime, rep(X[i], times=i))
  }

  print("Percentiles:")
  print(quantile(X, pctile))
  print("Weighted:")
  print(Xprime)
  print("Weighted Percentiles:")
  print(quantile(Xprime, pctile, na.rm=TRUE))
}

WtPercentile(1:10)
WtPercentile(rnorm(10))

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Percentiles/Quantiles with Weighting

David Winsemius
I do know that Harrell's Quantile function in the Hmisc package will  
allow quantile estimates from models. Whether it is general enough to  
extend to time series, I have no experience and cannot say.

--
David Winsemius


On Feb 17, 2009, at 11:57 AM, Brigid Mooney wrote:

> Hi All,
>
> I am looking at applications of percentiles to time sequenced data.  
> I had
> just been using the quantile function to get percentiles over various
> periods, but am more interested in if there is an accepted (and/or
> R-implemented) method to apply weighting to the data so as to weigh  
> recent
> data more heavily.
>
> I wrote the following function, but it seems quite inefficient, and  
> not
> really very flexible in its applications - so if anyone has any  
> suggestions
> on how to look at quantiles/percentiles within R while also using a
> weighting schema, I would be very interested.
>
> Note - this function supposes the data in X is time-sequenced, with  
> the most
> recent (and thus heaviest weighted) data at the end of the vector
>
> WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
> {
>  Xprime <- NA
>
>  for(i in 1:length(X))
>  {
>    Xprime <- c(Xprime, rep(X[i], times=i))
>  }
>
>  print("Percentiles:")
>  print(quantile(X, pctile))
>  print("Weighted:")
>  print(Xprime)
>  print("Weighted Percentiles:")
>  print(quantile(Xprime, pctile, na.rm=TRUE))
> }
>
> WtPercentile(1:10)
> WtPercentile(rnorm(10))
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Percentiles/Quantiles with Weighting

Brigid Mooney
In reply to this post by Brigid Mooney
Thanks for pointing me to the quantreg package as a resource.  I was hoping
to ask be able to address one quick follow-up question...

I get slightly different variants between using the rq funciton with formula
= mydata ~ 1 as I would if I ran the same data using the quantile function.

Example:

mydata <- (1:10)^2/2
pctile <- seq(.59, .99, .1)

quantile(mydata, pctile)
59%    69%    79%    89%    99%
20.015 26.075 32.935 40.595 49.145

rq(mydata~1, tau=pctile)
Call:
rq(formula = mydata ~ 1, tau = pctile)
Coefficients:
            tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99
(Intercept)        18      24.5        32      40.5        50
Degrees of freedom: 10 total; 9 residual

Is it correct to assume this is due to the different accepted methods of
calculating quantiles?  If so, do you know where I would be able to see the
algorithms used in these functions?  I'm not finding it in the documentation
for function rq, and am new enough to R that I don't know where those
references would generally be.




On Tue, Feb 17, 2009 at 12:29 PM, roger koenker <[hidden email]> wrote:

> http://www.nabble.com/weighted-quantiles-to19864562.html#a19865869
>
> gives one possibility...
>
> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    [hidden email]            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Champaign, IL 61820
>
>
>
>
> On Feb 17, 2009, at 10:57 AM, Brigid Mooney wrote:
>
>   Hi All,
>>
>> I am looking at applications of percentiles to time sequenced data.  I had
>> just been using the quantile function to get percentiles over various
>> periods, but am more interested in if there is an accepted (and/or
>> R-implemented) method to apply weighting to the data so as to weigh recent
>> data more heavily.
>>
>> I wrote the following function, but it seems quite inefficient, and not
>> really very flexible in its applications - so if anyone has any
>> suggestions
>> on how to look at quantiles/percentiles within R while also using a
>> weighting schema, I would be very interested.
>>
>> Note - this function supposes the data in X is time-sequenced, with the
>> most
>> recent (and thus heaviest weighted) data at the end of the vector
>>
>> WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
>> {
>>  Xprime <- NA
>>
>>  for(i in 1:length(X))
>>  {
>>   Xprime <- c(Xprime, rep(X[i], times=i))
>>  }
>>
>>  print("Percentiles:")
>>  print(quantile(X, pctile))
>>  print("Weighted:")
>>  print(Xprime)
>>  print("Weighted Percentiles:")
>>  print(quantile(Xprime, pctile, na.rm=TRUE))
>> }
>>
>> WtPercentile(1:10)
>> WtPercentile(rnorm(10))
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Percentiles/Quantiles with Weighting

RKoenker

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    [hidden email]            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Champaign, IL 61820



On Feb 17, 2009, at 1:58 PM, Brigid Mooney wrote:

> Thanks for pointing me to the quantreg package as a resource.  I was  
> hoping to ask be able to address one quick follow-up question...
>
> I get slightly different variants between using the rq funciton with  
> formula = mydata ~ 1 as I would if I ran the same data using the  
> quantile function.
> Example:
>
> mydata <- (1:10)^2/2
> pctile <- seq(.59, .99, .1)
>
> quantile(mydata, pctile)
> 59%    69%    79%    89%    99%
> 20.015 26.075 32.935 40.595 49.145
>
> rq(mydata~1, tau=pctile)
> Call:
> rq(formula = mydata ~ 1, tau = pctile)
> Coefficients:
>             tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99
> (Intercept)        18      24.5        32      40.5        50
> Degrees of freedom: 10 total; 9 residual
>
> Is it correct to assume this is due to the different accepted  
> methods of calculating quantiles?  If so, do you know where I would  
> be able to see the algorithms used in these functions?  I'm not  
> finding it in the documentation for function rq, and am new enough  
> to R that I don't know where those references would generally be.
>

Yes,  quantile() in base R documents 9 varieties of quantiles, 2 more  
than
William Empson's famous 7 Types of Ambiguity.  In quantreg the function
rq() finds a solution to an underlying optimization problem and  
doesn't ask
any further into the nature of the ambiguity -- it does often produce a
warning indicating that there may be more than one solution.  The  
default
base R quantile is interpolated, while the default rq() with method =  
"br"
using the simplex algorithm finds an order statistic, typically.  If  
you prefer
something more like interpolation, you can try rq() with method = "fn"
which is using an interior point algorithm and when there are multiple
solutions it tends to produce something more like the centroid of the
solution set.  I hope that this helps.

>
>
>
> On Tue, Feb 17, 2009 at 12:29 PM, roger koenker <[hidden email]>  
> wrote:
> http://www.nabble.com/weighted-quantiles-to19864562.html#a19865869
>
> gives one possibility...
>
> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    [hidden email]            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Champaign, IL 61820
>
>
>
>
> On Feb 17, 2009, at 10:57 AM, Brigid Mooney wrote:
>
> Hi All,
>
> I am looking at applications of percentiles to time sequenced data.  
> I had
> just been using the quantile function to get percentiles over various
> periods, but am more interested in if there is an accepted (and/or
> R-implemented) method to apply weighting to the data so as to weigh  
> recent
> data more heavily.
>
> I wrote the following function, but it seems quite inefficient, and  
> not
> really very flexible in its applications - so if anyone has any  
> suggestions
> on how to look at quantiles/percentiles within R while also using a
> weighting schema, I would be very interested.
>
> Note - this function supposes the data in X is time-sequenced, with  
> the most
> recent (and thus heaviest weighted) data at the end of the vector
>
> WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
> {
>  Xprime <- NA
>
>  for(i in 1:length(X))
>  {
>   Xprime <- c(Xprime, rep(X[i], times=i))
>  }
>
>  print("Percentiles:")
>  print(quantile(X, pctile))
>  print("Weighted:")
>  print(Xprime)
>  print("Weighted Percentiles:")
>  print(quantile(Xprime, pctile, na.rm=TRUE))
> }
>
> WtPercentile(1:10)
> WtPercentile(rnorm(10))
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Percentiles/Quantiles with Weighting

macrakis
In reply to this post by Brigid Mooney
Here is one kind of weighted quantile function.

The basic idea is very simple:

wquantile <- function( v, w, p )
  {
    v <- v[order(v)]
    w <- w[order(v)]
    v [ which.max( cumsum(w) / sum(w) >= p ) ]
  }

With some more error-checking and general clean-up, it looks like this:

# Simple weighted quantile
#
# v  A numeric vector of observations
# w  A numeric vector of positive weights
# p  The probability 0<=p<=1
#
# Nothing fancy: no interpolation etc.

# Basic idea

wquantile <- function( v, w, p )
  {
    v <- v[order(v)]
    w <- w[order(v)]
    v [ which.max( cumsum(w) / sum(w) >= p ) ]
  }

# Simple weighted quantile
#
# v  A numeric vector of observations
# w  A numeric vector of positive weights
# p  The probability 0<=p<=1
#
# Nothing fancy: no interpolation etc.

wquantile <- function(v,w=rep(1,length(v)),p=.5)
   {
     if (!is.numeric(v) || !is.numeric(w) || length(v) != length(w))
       stop("Values and weights must be equal-length numeric vectors")
     if ( !is.numeric(p) || any( p<0 | p>1 ) )
       stop("Quantiles must be 0<=p<=1")
     ranking <- order(v)
     sumw <- cumsum(w[ranking])
     if ( is.na(w[1]) || w[1]<0 ) stop("Weights must be non-negative numbers")
     plist <- sumw/sumw[length(sumw)]
     sapply(p, function(p) v [ ranking [ which.max( plist >= p ) ] ])
   }

I would appreciate any comments people have on this -- whether
correctness, efficiency, style, ....

              -s


On Tue, Feb 17, 2009 at 11:57 AM, Brigid Mooney <[hidden email]> wrote:

> Hi All,
>
> I am looking at applications of percentiles to time sequenced data.  I had
> just been using the quantile function to get percentiles over various
> periods, but am more interested in if there is an accepted (and/or
> R-implemented) method to apply weighting to the data so as to weigh recent
> data more heavily.
>
> I wrote the following function, but it seems quite inefficient, and not
> really very flexible in its applications - so if anyone has any suggestions
> on how to look at quantiles/percentiles within R while also using a
> weighting schema, I would be very interested.
>
> Note - this function supposes the data in X is time-sequenced, with the most
> recent (and thus heaviest weighted) data at the end of the vector
>
> WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
> {
>  Xprime <- NA
>
>  for(i in 1:length(X))
>  {
>    Xprime <- c(Xprime, rep(X[i], times=i))
>  }
>
>  print("Percentiles:")
>  print(quantile(X, pctile))
>  print("Weighted:")
>  print(Xprime)
>  print("Weighted Percentiles:")
>  print(quantile(Xprime, pctile, na.rm=TRUE))
> }
>
> WtPercentile(1:10)
> WtPercentile(rnorm(10))
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Fwd: Percentiles/Quantiles with Weighting

macrakis
Some minor improvements and corrections below

# Simple weighted quantile
#
# v  A vector of sortable observations
# w A numeric vector of positive weights
# p  The quantile 0<=p<=1
#
# Nothing fancy: no interpolation etc.; NA cases not thought through

 wquantile <- function(v,w=rep(1,length(v)),p=.5)
  {
    if ( !is.numeric(w) || length(v) != length(w) )
      stop("Values and weights must be equal-length vectors")
    if ( !is.numeric(p) || any( p<0 | p>1) )
      stop("Quantiles must be 0<=p<=1")
    if ( min(w) < 0 ) stop("Weights must be non-negative numbers")
    ranking <- order(v)
    sumw <- cumsum(w[ranking])
    plist <- sumw / sumw[ length(sumw) ]
    sapply(p, function(p) v [ ranking [ which.max( plist >= p ) ] ] )
  }

I would appreciate any comments people have on this -- whether
correctness, efficiency, style, ....

             -s

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Fwd: Percentiles/Quantiles with Weighting

Dieter Menne
Stavros Macrakis <macrakis <at> alum.mit.edu> writes:

>
> Some minor improvements and corrections below
>
> # Simple weighted quantile
> #
> # v  A vector of sortable observations
> # w A numeric vector of positive weights
> # p  The quantile 0<=p<=1
> #
> # Nothing fancy: no interpolation etc.; NA cases not thought through
>
>  wquantile <- function(v,w=rep(1,length(v)),p=.5)

You could compare it with wtd.quantile in Hmisc.

Dieter

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Percentiles/Quantiles with Weighting

Thomas Lumley
In reply to this post by Brigid Mooney
On Tue, 17 Feb 2009, Brigid Mooney wrote:

> Thanks for pointing me to the quantreg package as a resource.  I was hoping
> to ask be able to address one quick follow-up question...
>
> I get slightly different variants between using the rq funciton with formula
> = mydata ~ 1 as I would if I ran the same data using the quantile function.
>
> Example:
>
> mydata <- (1:10)^2/2
> pctile <- seq(.59, .99, .1)
>
> quantile(mydata, pctile)
> 59%    69%    79%    89%    99%
> 20.015 26.075 32.935 40.595 49.145
>
> rq(mydata~1, tau=pctile)
> Call:
> rq(formula = mydata ~ 1, tau = pctile)
> Coefficients:
>            tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99
> (Intercept)        18      24.5        32      40.5        50
> Degrees of freedom: 10 total; 9 residual
>
> Is it correct to assume this is due to the different accepted methods of
> calculating quantiles?

If you try
  lapply(1:9, function(i)quantile(mydata, pctile,type=i))
the answers from type=1 or 2 agree with rq().

       -thomas

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.