Fast Kendall's Tau


Adler, Avraham
Hello.

Has any further action been taken regarding implementing David Simcha's fast Kendall tau code (now found in the package pcaPP as cor.fk) into R-base? It is literally hundreds of times faster, although I am uncertain as to whether he wrote code for testing the significance of the parameter. The last mention I have seen of this was in 2010 <https://stat.ethz.ch/pipermail/r-devel/2010-February/056745.html>.

Thank you,

--Avraham Adler

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Fast Kendall's Tau

Duncan Murdoch-2
On 12-06-25 2:48 PM, Adler, Avraham wrote:
> Hello.
>
> Has any further action been taken regarding implementing David Simcha's fast Kendall tau code (now found in the package pcaPP as cor.fk) into R-base? It is literally hundreds of times faster, although I am uncertain as to whether he wrote code for testing the significance of the parameter. The last mention I have seen of this was in 2010<https://stat.ethz.ch/pipermail/r-devel/2010-February/056745.html>.

You could check the NEWS file, but I don't remember anything being done
along these lines.  If the code is in a CRAN package, there doesn't seem
to be any need to move it to base R.

Duncan Murdoch

Re: Fast Kendall's Tau

Prof Brian Ripley
On 26/06/2012 22:44, Duncan Murdoch wrote:

> You could check the NEWS file, but I don't remember anything being done
> along these lines.  If the code is in a CRAN package, there doesn't seem
> to be any need to move it to base R.

In addition, this is something very specialized, and the code in R is
fast enough for all but the most unusual instances of that specialized
task.  example(cor.fk) shows the R implementation takes well under a
second for 2000 cases (a far higher value than is usual).

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: Fast Kendall's tau

Terry Therneau-2
In reply to this post by Adler, Avraham
Note that the survConcordance function, which is equivalent to Kendall's
tau, also is O(n log n) and it does compute a variance.   The variance
is about 4/5 of the work.
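
For readers wondering where the O(n log n) comes from: when there are no ties, Kendall's tau depends only on the number of discordant pairs, and that number equals the number of inversions in y after the data are sorted by x. A merge sort can count inversions without visiting every pair. What follows is a minimal sketch of that idea in base R, assuming no ties in x or y. It is not the code used in pcaPP or survival, and the names count_inversions and kendall_tau_sketch are made up for illustration.

```r
# Count inversions in v (pairs i < j with v[i] > v[j]) by merge sort.
# Assumes v has no ties. Returns the sorted vector and the count.
count_inversions <- function(v) {
  n <- length(v)
  if (n < 2L) return(list(sorted = v, inv = 0))
  mid <- n %/% 2L
  left  <- count_inversions(v[1:mid])
  right <- count_inversions(v[(mid + 1L):n])
  a <- left$sorted
  b <- right$sorted
  # findInterval(b, a) = number of elements of sorted a that are <= each b[j];
  # the remaining elements of a are larger than b[j] and so form inversions.
  below_b <- findInterval(b, a)
  cross <- sum(as.numeric(length(a) - below_b))
  # Merge: each element's final position is its own rank plus the number
  # of elements from the other half that precede it.
  merged <- numeric(n)
  merged[seq_along(a) + findInterval(a, b)] <- a
  merged[seq_along(b) + below_b] <- b
  list(sorted = merged, inv = left$inv + right$inv + cross)
}

# Kendall's tau (tau-a) for data without ties (hypothetical helper).
kendall_tau_sketch <- function(x, y) {
  n <- as.numeric(length(x))
  discordant <- count_inversions(y[order(x)])$inv
  1 - 4 * discordant / (n * (n - 1))
}

# Compare against base R on continuous (tie-free) data.
set.seed(42)
x <- runif(500)
y <- x + runif(500)
all.equal(kendall_tau_sketch(x, y), cor(x, y, method = "kendall"))
```

Handling ties and computing a variance, as survConcordance does, needs extra bookkeeping that this sketch leaves out.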

Using R 2.15.0 on an older Linux box:
 > require(survival)
 > require(pcaPP)
 > tfun <- function(n) {
+     x <- 1:n + runif(n)*n
+     y <- 1:n
+     t1 <- system.time(cor.test(x,y, method="kendall"))
+     t2 <- system.time(cor.fk(x,y))
+     t3 <- system.time(survConcordance(Surv(y) ~ x))
+     rbind("cor.test"=t1, "cor.fk"=t2, "survConcordance"= t3)
+ }
 > tfun(1e2)
                 user.self sys.self elapsed user.child sys.child
cor.test            0.000        0   0.004          0         0
cor.fk              0.000        0   0.001          0         0
survConcordance     0.004        0   0.006          0         0

 > tfun(1e3)
                 user.self sys.self elapsed user.child sys.child
cor.test            0.024        0   0.026          0         0
cor.fk              0.000        0   0.000          0         0
survConcordance     0.004        0   0.004          0         0

 > tfun(1e4)
                 user.self sys.self elapsed user.child sys.child
cor.test            2.224    0.004   2.227          0         0
cor.fk              0.004    0.000   0.003          0         0
survConcordance     0.028    0.000   0.028          0         0

 > tfun(5e4)
                 user.self sys.self elapsed user.child sys.child
cor.test           55.551    0.008  55.574          0         0
cor.fk              0.016    0.000   0.018          0         0
survConcordance     0.204    0.016   0.221          0         0

I agree with Brian, especially since the Spearman and Kendall results
rarely (ever?) disagree on their main message for n>50. At the very
most, one might add a footnote to the help page for cor.test
pointing to the faster codes.
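
The Spearman/Kendall comparison is easy to try with base R alone. The sketch below reuses the data construction from tfun above with an arbitrarily chosen n = 200 and seed; it illustrates one data set only and is not evidence for the general claim.

```r
set.seed(1)
n <- 200
x <- 1:n + runif(n) * n   # same construction as in tfun above
y <- 1:n

kt <- cor.test(x, y, method = "kendall")
st <- cor.test(x, y, method = "spearman")

# The estimates are on different scales (tau vs rho), so compare the
# sign and the p-values rather than the magnitudes.
rbind(kendall  = c(estimate = unname(kt$estimate), p.value = kt$p.value),
      spearman = c(estimate = unname(st$estimate), p.value = st$p.value))
```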

Terry T.


Re: Fast Kendall's Tau

Adler, Avraham
In reply to this post by Prof Brian Ripley
> -----Original Message-----
> From: Prof Brian Ripley [mailto:[hidden email]]
> Sent: Wednesday, June 27, 2012 1:24 AM
> To: Duncan Murdoch
> Cc: Adler, Avraham; [hidden email]
> Subject: Re: [Rd] Fast Kendall's Tau
>
> In addition, this is something very specialized, and the code in R is fast
> enough for all but the most unusual instances of that specialized task.
> example(cor.fk) shows the R implementation takes well under a second for 2000
> cases (a far higher value than is usual).

Thank you all very much for the replies. I was approaching the problem from the vantage point of trying to fit Archimedean copulas to events which come from non-elliptical distributions, and had a few hundred thousand data points. That is not as bad as the authors of this paper <http://vigna.dsi.unimi.it/ftp/papers/ParadoxicalPageRank.pdf>, who needed to calculate Kendall's tau based on hundreds of millions of pairs(!). I wrote an implementation in VBA, and when I went to R to confirm my calculations, I was surprised to see that even my VBA code was probably hundreds of times as fast as R (on a vector of exactly 100,000 pairs). The implementation in pcaPP runs in a second or less on the same vector.
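
One common reason Kendall's tau comes up when fitting Archimedean copulas is that several one-parameter families have a closed-form relation between tau and the copula parameter, so an estimate of tau can be inverted to get a starting value for theta. Whether that was the approach used here is not stated, so the following is only a sketch under that assumption. It uses the standard relations tau = theta / (theta + 2) for the Clayton family and tau = 1 - 1/theta for the Gumbel family; the function names are hypothetical.

```r
# Invert the tau-theta relation for two Archimedean families.
# Valid for 0 < tau < 1 (positive dependence).
clayton_theta_from_tau <- function(tau) 2 * tau / (1 - tau)
gumbel_theta_from_tau  <- function(tau) 1 / (1 - tau)

set.seed(123)
u <- runif(1000)
v <- u + runif(1000)            # arbitrary positively dependent data
tau_hat <- cor(u, v, method = "kendall")

c(tau = tau_hat,
  clayton = clayton_theta_from_tau(tau_hat),
  gumbel  = gumbel_theta_from_tau(tau_hat))
```

With a few hundred thousand points, the cor() call is the step that a faster routine such as cor.fk would replace.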

Perhaps, as was suggested in another e-mail, the least intrusive (and best bang-for-buck) option is to have the documentation/help of "cor" updated to refer to cor.fk, so that those of us who have to deal with ungainly data sets are made aware that it is available.

Thank you again,

Avraham Adler
Re: Fast Kendall's tau

Adler, Avraham
In reply to this post by Terry Therneau-2
> From: Terry Therneau [mailto:[hidden email]]
> Sent: Wednesday, June 27, 2012 8:16 AM
> To: [hidden email]; [hidden email]; Adler, Avraham
> Subject: Re: Fast Kendall's tau
>
> Note that the survConcordance function, which is equivalent to Kendall's tau, also is O(n log n)
> and it does compute a variance.   The variance is about 4/5 of the work.

[snip]
       
> I agree with Brian, especially since the Spearman and Kendall results rarely (ever?) disagree
> on their main message for n>50. At the very most, one might add a footnote to the help page for
> cor.test pointing to the faster codes.
>
> Terry T.


Thank you, Terry, for pointing me to survConcordance. As for documentation, I agree that would probably be optimal in terms of reward for least amount of work/disruption.

Thank you again,

Avraham Adler

Re: Fast Kendall's Tau

dsimcha-2
In reply to this post by Duncan Murdoch-2
Sorry for the late reply.  I've been extremely busy, out of town, etc.
lately.  I wrote the core algorithm for this function back in 2010, but did
not make any attempt to integrate it into R-base, for several reasons:

1.  I was having difficulty figuring out what all the missing value options
(which I never use) are supposed to do and how to efficiently adapt my code
to them.

2.  The cor() function (IIRC; this is from memory from over a year ago)
implements Pearson, Spearman and Kendall very messily all in one function.
I felt that this could use some refactoring to make the integration of a
new Kendall algorithm sane, but didn't know the codebase well enough to do
it myself w/o a significant learning curve.

3.  Testing the integration would require setting up a build environment
for R.
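
For context on point 1, the missing value options are presumably those of the use argument of cor(). The sketch below shows, on a made-up six-point example, what two of them do for a pair of vectors in base R. With only two vectors, "complete.obs" amounts to dropping every pair that contains an NA before computing tau.

```r
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, 1, 3, NA, 6, 5)

# Default use = "everything": any NA in the inputs propagates to the result.
tau_default <- cor(x, y, method = "kendall")
tau_default                      # NA

# use = "complete.obs": pairs with a missing value are dropped first.
tau_complete <- cor(x, y, method = "kendall", use = "complete.obs")

# The same thing done by hand, which is how a routine without NA
# handling could be wrapped.
ok <- complete.cases(x, y)
tau_manual <- cor(x[ok], y[ok], method = "kendall")

all.equal(tau_complete, tau_manual)
```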

Basically, I wanted to contribute this algorithm but didn't want to go
through the necessary learning curve to become a regular R-base
contributor.  I was hoping the integration work would be trivial to someone
who contributes to this codebase regularly.
