

Hi All,
I am looking at applications of percentiles to time sequenced data. I had
just been using the quantile function to get percentiles over various
periods, but am more interested in if there is an accepted (and/or
Rimplemented) method to apply weighting to the data so as to weigh recent
data more heavily.
I wrote the following function, but it seems quite inefficient, and not
really very flexible in its applications  so if anyone has any suggestions
on how to look at quantiles/percentiles within R while also using a
weighting schema, I would be very interested.
Note  this function supposes the data in X is timesequenced, with the most
recent (and thus heaviest weighted) data at the end of the vector
WtPercentile < function(X=rnorm(100), pctile=seq(.1,1,.1))
{
Xprime < NA
for(i in 1:length(X))
{
Xprime < c(Xprime, rep(X[i], times=i))
}
print("Percentiles:")
print(quantile(X, pctile))
print("Weighted:")
print(Xprime)
print("Weighted Percentiles:")
print(quantile(Xprime, pctile, na.rm=TRUE))
}
WtPercentile(1:10)
WtPercentile(rnorm(10))
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


I do know that Harrell's Quantile function in the Hmisc package will
allow quantile estimates from models. Whether it is general enough to
extend to time series, I have no experience and cannot say.

David Winsemius
On Feb 17, 2009, at 11:57 AM, Brigid Mooney wrote:
> Hi All,
>
> I am looking at applications of percentiles to time sequenced data.
> I had
> just been using the quantile function to get percentiles over various
> periods, but am more interested in if there is an accepted (and/or
> Rimplemented) method to apply weighting to the data so as to weigh
> recent
> data more heavily.
>
> I wrote the following function, but it seems quite inefficient, and
> not
> really very flexible in its applications  so if anyone has any
> suggestions
> on how to look at quantiles/percentiles within R while also using a
> weighting schema, I would be very interested.
>
> Note  this function supposes the data in X is timesequenced, with
> the most
> recent (and thus heaviest weighted) data at the end of the vector
>
> WtPercentile < function(X=rnorm(100), pctile=seq(.1,1,.1))
> {
> Xprime < NA
>
> for(i in 1:length(X))
> {
> Xprime < c(Xprime, rep(X[i], times=i))
> }
>
> print("Percentiles:")
> print(quantile(X, pctile))
> print("Weighted:")
> print(Xprime)
> print("Weighted Percentiles:")
> print(quantile(Xprime, pctile, na.rm=TRUE))
> }
>
> WtPercentile(1:10)
> WtPercentile(rnorm(10))
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Thanks for pointing me to the quantreg package as a resource. I was hoping
to ask be able to address one quick followup question...
I get slightly different variants between using the rq funciton with formula
= mydata ~ 1 as I would if I ran the same data using the quantile function.
Example:
mydata < (1:10)^2/2
pctile < seq(.59, .99, .1)
quantile(mydata, pctile)
59% 69% 79% 89% 99%
20.015 26.075 32.935 40.595 49.145
rq(mydata~1, tau=pctile)
Call:
rq(formula = mydata ~ 1, tau = pctile)
Coefficients:
tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99
(Intercept) 18 24.5 32 40.5 50
Degrees of freedom: 10 total; 9 residual
Is it correct to assume this is due to the different accepted methods of
calculating quantiles? If so, do you know where I would be able to see the
algorithms used in these functions? I'm not finding it in the documentation
for function rq, and am new enough to R that I don't know where those
references would generally be.
On Tue, Feb 17, 2009 at 12:29 PM, roger koenker < [hidden email]> wrote:
> http://www.nabble.com/weightedquantilesto19864562.html#a19865869>
> gives one possibility...
>
> url: www.econ.uiuc.edu/~roger Roger Koenker
> email [hidden email] Department of Economics
> vox: 2173334558 University of Illinois
> fax: 2172446678 Champaign, IL 61820
>
>
>
>
> On Feb 17, 2009, at 10:57 AM, Brigid Mooney wrote:
>
> Hi All,
>>
>> I am looking at applications of percentiles to time sequenced data. I had
>> just been using the quantile function to get percentiles over various
>> periods, but am more interested in if there is an accepted (and/or
>> Rimplemented) method to apply weighting to the data so as to weigh recent
>> data more heavily.
>>
>> I wrote the following function, but it seems quite inefficient, and not
>> really very flexible in its applications  so if anyone has any
>> suggestions
>> on how to look at quantiles/percentiles within R while also using a
>> weighting schema, I would be very interested.
>>
>> Note  this function supposes the data in X is timesequenced, with the
>> most
>> recent (and thus heaviest weighted) data at the end of the vector
>>
>> WtPercentile < function(X=rnorm(100), pctile=seq(.1,1,.1))
>> {
>> Xprime < NA
>>
>> for(i in 1:length(X))
>> {
>> Xprime < c(Xprime, rep(X[i], times=i))
>> }
>>
>> print("Percentiles:")
>> print(quantile(X, pctile))
>> print("Weighted:")
>> print(Xprime)
>> print("Weighted Percentiles:")
>> print(quantile(Xprime, pctile, na.rm=TRUE))
>> }
>>
>> WtPercentile(1:10)
>> WtPercentile(rnorm(10))
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rhelp>> PLEASE do read the posting guide
>> http://www.Rproject.org/postingguide.html< http://www.rproject.org/postingguide.html>
>> and provide commented, minimal, selfcontained, reproducible code.
>>
>
>
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


url: www.econ.uiuc.edu/~roger Roger Koenker
email [hidden email] Department of Economics
vox: 2173334558 University of Illinois
fax: 2172446678 Champaign, IL 61820
On Feb 17, 2009, at 1:58 PM, Brigid Mooney wrote:
> Thanks for pointing me to the quantreg package as a resource. I was
> hoping to ask be able to address one quick followup question...
>
> I get slightly different variants between using the rq funciton with
> formula = mydata ~ 1 as I would if I ran the same data using the
> quantile function.
> Example:
>
> mydata < (1:10)^2/2
> pctile < seq(.59, .99, .1)
>
> quantile(mydata, pctile)
> 59% 69% 79% 89% 99%
> 20.015 26.075 32.935 40.595 49.145
>
> rq(mydata~1, tau=pctile)
> Call:
> rq(formula = mydata ~ 1, tau = pctile)
> Coefficients:
> tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99
> (Intercept) 18 24.5 32 40.5 50
> Degrees of freedom: 10 total; 9 residual
>
> Is it correct to assume this is due to the different accepted
> methods of calculating quantiles? If so, do you know where I would
> be able to see the algorithms used in these functions? I'm not
> finding it in the documentation for function rq, and am new enough
> to R that I don't know where those references would generally be.
>
Yes, quantile() in base R documents 9 varieties of quantiles, 2 more
than
William Empson's famous 7 Types of Ambiguity. In quantreg the function
rq() finds a solution to an underlying optimization problem and
doesn't ask
any further into the nature of the ambiguity  it does often produce a
warning indicating that there may be more than one solution. The
default
base R quantile is interpolated, while the default rq() with method =
"br"
using the simplex algorithm finds an order statistic, typically. If
you prefer
something more like interpolation, you can try rq() with method = "fn"
which is using an interior point algorithm and when there are multiple
solutions it tends to produce something more like the centroid of the
solution set. I hope that this helps.
>
>
>
> On Tue, Feb 17, 2009 at 12:29 PM, roger koenker < [hidden email]>
> wrote:
> http://www.nabble.com/weightedquantilesto19864562.html#a19865869>
> gives one possibility...
>
> url: www.econ.uiuc.edu/~roger Roger Koenker
> email [hidden email] Department of Economics
> vox: 2173334558 University of Illinois
> fax: 2172446678 Champaign, IL 61820
>
>
>
>
> On Feb 17, 2009, at 10:57 AM, Brigid Mooney wrote:
>
> Hi All,
>
> I am looking at applications of percentiles to time sequenced data.
> I had
> just been using the quantile function to get percentiles over various
> periods, but am more interested in if there is an accepted (and/or
> Rimplemented) method to apply weighting to the data so as to weigh
> recent
> data more heavily.
>
> I wrote the following function, but it seems quite inefficient, and
> not
> really very flexible in its applications  so if anyone has any
> suggestions
> on how to look at quantiles/percentiles within R while also using a
> weighting schema, I would be very interested.
>
> Note  this function supposes the data in X is timesequenced, with
> the most
> recent (and thus heaviest weighted) data at the end of the vector
>
> WtPercentile < function(X=rnorm(100), pctile=seq(.1,1,.1))
> {
> Xprime < NA
>
> for(i in 1:length(X))
> {
> Xprime < c(Xprime, rep(X[i], times=i))
> }
>
> print("Percentiles:")
> print(quantile(X, pctile))
> print("Weighted:")
> print(Xprime)
> print("Weighted Percentiles:")
> print(quantile(Xprime, pctile, na.rm=TRUE))
> }
>
> WtPercentile(1:10)
> WtPercentile(rnorm(10))
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Here is one kind of weighted quantile function.
The basic idea is very simple:
wquantile < function( v, w, p )
{
v < v[order(v)]
w < w[order(v)]
v [ which.max( cumsum(w) / sum(w) >= p ) ]
}
With some more errorchecking and general cleanup, it looks like this:
# Simple weighted quantile
#
# v A numeric vector of observations
# w A numeric vector of positive weights
# p The probability 0<=p<=1
#
# Nothing fancy: no interpolation etc.
# Basic idea
wquantile < function( v, w, p )
{
v < v[order(v)]
w < w[order(v)]
v [ which.max( cumsum(w) / sum(w) >= p ) ]
}
# Simple weighted quantile
#
# v A numeric vector of observations
# w A numeric vector of positive weights
# p The probability 0<=p<=1
#
# Nothing fancy: no interpolation etc.
wquantile < function(v,w=rep(1,length(v)),p=.5)
{
if (!is.numeric(v)  !is.numeric(w)  length(v) != length(w))
stop("Values and weights must be equallength numeric vectors")
if ( !is.numeric(p)  any( p<0  p>1 ) )
stop("Quantiles must be 0<=p<=1")
ranking < order(v)
sumw < cumsum(w[ranking])
if ( is.na(w[1])  w[1]<0 ) stop("Weights must be nonnegative numbers")
plist < sumw/sumw[length(sumw)]
sapply(p, function(p) v [ ranking [ which.max( plist >= p ) ] ])
}
I would appreciate any comments people have on this  whether
correctness, efficiency, style, ....
s
On Tue, Feb 17, 2009 at 11:57 AM, Brigid Mooney < [hidden email]> wrote:
> Hi All,
>
> I am looking at applications of percentiles to time sequenced data. I had
> just been using the quantile function to get percentiles over various
> periods, but am more interested in if there is an accepted (and/or
> Rimplemented) method to apply weighting to the data so as to weigh recent
> data more heavily.
>
> I wrote the following function, but it seems quite inefficient, and not
> really very flexible in its applications  so if anyone has any suggestions
> on how to look at quantiles/percentiles within R while also using a
> weighting schema, I would be very interested.
>
> Note  this function supposes the data in X is timesequenced, with the most
> recent (and thus heaviest weighted) data at the end of the vector
>
> WtPercentile < function(X=rnorm(100), pctile=seq(.1,1,.1))
> {
> Xprime < NA
>
> for(i in 1:length(X))
> {
> Xprime < c(Xprime, rep(X[i], times=i))
> }
>
> print("Percentiles:")
> print(quantile(X, pctile))
> print("Weighted:")
> print(Xprime)
> print("Weighted Percentiles:")
> print(quantile(Xprime, pctile, na.rm=TRUE))
> }
>
> WtPercentile(1:10)
> WtPercentile(rnorm(10))
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Some minor improvements and corrections below
# Simple weighted quantile
#
# v A vector of sortable observations
# w A numeric vector of positive weights
# p The quantile 0<=p<=1
#
# Nothing fancy: no interpolation etc.; NA cases not thought through
wquantile < function(v,w=rep(1,length(v)),p=.5)
{
if ( !is.numeric(w)  length(v) != length(w) )
stop("Values and weights must be equallength vectors")
if ( !is.numeric(p)  any( p<0  p>1) )
stop("Quantiles must be 0<=p<=1")
if ( min(w) < 0 ) stop("Weights must be nonnegative numbers")
ranking < order(v)
sumw < cumsum(w[ranking])
plist < sumw / sumw[ length(sumw) ]
sapply(p, function(p) v [ ranking [ which.max( plist >= p ) ] ] )
}
I would appreciate any comments people have on this  whether
correctness, efficiency, style, ....
s
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Stavros Macrakis <macrakis <at> alum.mit.edu> writes:
>
> Some minor improvements and corrections below
>
> # Simple weighted quantile
> #
> # v A vector of sortable observations
> # w A numeric vector of positive weights
> # p The quantile 0<=p<=1
> #
> # Nothing fancy: no interpolation etc.; NA cases not thought through
>
> wquantile < function(v,w=rep(1,length(v)),p=.5)
You could compare it with wtd.quantile in Hmisc.
Dieter
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On Tue, 17 Feb 2009, Brigid Mooney wrote:
> Thanks for pointing me to the quantreg package as a resource. I was hoping
> to ask be able to address one quick followup question...
>
> I get slightly different variants between using the rq funciton with formula
> = mydata ~ 1 as I would if I ran the same data using the quantile function.
>
> Example:
>
> mydata < (1:10)^2/2
> pctile < seq(.59, .99, .1)
>
> quantile(mydata, pctile)
> 59% 69% 79% 89% 99%
> 20.015 26.075 32.935 40.595 49.145
>
> rq(mydata~1, tau=pctile)
> Call:
> rq(formula = mydata ~ 1, tau = pctile)
> Coefficients:
> tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99
> (Intercept) 18 24.5 32 40.5 50
> Degrees of freedom: 10 total; 9 residual
>
> Is it correct to assume this is due to the different accepted methods of
> calculating quantiles?
If you try
lapply(1:9, function(i)quantile(mydata, pctile,type=i))
the answers from type=1 or 2 agree with rq().
thomas
Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

