Please do the comparison with R-devel. That has a different policy,

including switching to quicksort when partial has more than 10 elements.

BTW you need to compare with quicksort (which suffices here and is quite a

lot faster than shellsort for large numeric vectors).

For 1 million+1 I am getting timings on my Windows laptop

sort 0.39

quicksort 0.24

quantile(50%) 0.13

quantile() 0.17

kuantile(50%) 0.16

kuantile() 0.17

(timings for a million are about 50% more in the last four lines as

roughly twice as many raw quantiles are needed.)

Note that there is never more than about a factor of three gain over

quicksort (remembering that quantile has some overhead of its own, so I

also timed a version of quantile always using quicksort, using about

0.34). Given how rare this is as a task for R, it would probably be quite

sufficient to always use quicksort.

On Sat, 11 Mar 2006, roger koenker wrote:

> Motivated by Deepayan's recent inquiries about the efficiency of the

> R 'quantile'

> function:

>

>

http://tolstoy.newcastle.edu.au/R/devel/05/11/3305.html>

http://tolstoy.newcastle.edu.au/R/devel/06/03/4358.html>

> I decided to try to revive an old project to implement a version of

> the Floyd

> and Rivest (1975) algorithm for finding quantiles with O(n)

> comparisons. I

> used to have a direct translation from FR's algol68 in my quantreg

> package,

> but was forced to remove it when it encountered g77 compilers that

> didn't like the way I had handled recursion.

>

> Fortunately, in the interim, I've corresponded with Krzysztof Kiwiel

> in Warsaw

> who has made a detailed study of the algorithm:

>

> K.C. Kiwiel: On Floyd and Rivest's SELECT Algorithm, Theoretical

> Computer Sci. 347 (2005) 214-238.

>

> He has also kindly agreed to allow me to incorporate his

> implementation of

> the FR algorithm in R under GPL.

>

> For the moment, I have made an alpha-test package with a single

> function --

> 'kuantile' -- that attempts to reproduce the functionality of the

> current

> base quantile function, essentially replacing the partial sorting done

> there with calls to Kiwiel's 'select' function. This package is

> available

> from:

>

>

http://www.econ.uiuc.edu/~roger/research/rq/kuantile>

> The good news is that the new function _is_ somewhat quicker than

> the base 'quantile' function. The following table is based on means

> of 10 replications. The first two columns report times for computing

> just the median, the next two columns for computing the five default

> quantiles: seq(0,1,.25), and the last column is the time needed for

> a call

> of sort().

>

> mean system.time()[1] for various calls*

>

> median only default 5

> n quantile kuantile quantile kuantile sort

>

> 100 0.002 0.001 0.004 0.004 0.000

> 1000 0.002 0.001 0.003 0.002 0.002

> 10000 0.003 0.002 0.009 0.005 0.005

> 1e+05 0.024 0.010 0.061 0.013 0.031

> 1e+06 0.206 0.120 0.535 0.142 0.502

> 1e+07 2.132 0.790 5.575 1.035 7.484

>

> * On a mac G5 running R 2.2.1.

>

> The bad news is that this approach doesn't help with Deepayan's

> original question which involved instances in which quantile() was

> being called for rather large number of probabilities. In such

> cases it seems that it is difficult to improve upon doing a full sort.

>

> I would welcome comments on any of this....

>

> url: www.econ.uiuc.edu/~roger Roger Koenker

> email:

[hidden email] Department of Economics

> vox: 217-333-4558 University of Illinois

> fax: 217-244-6678 Champaign, IL 61820

>

> ______________________________________________

>

[hidden email] mailing list

>

https://stat.ethz.ch/mailman/listinfo/r-devel>

>

--

Brian D. Ripley,

[hidden email]
Professor of Applied Statistics,

http://www.stats.ox.ac.uk/~ripley/University of Oxford, Tel: +44 1865 272861 (self)

1 South Parks Road, +44 1865 272866 (PA)

Oxford OX1 3TG, UK Fax: +44 1865 272595

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel