dist like function but where you can configure the method

Witold E Wolski
Looking for a fast dist implementation
where I could pass my own dist function to the "method" parameter,


i.e.

mydistfun = function(x,y){
 return(ks.test(x,y)$p.value)   #some mystique implementation
}


wow = dist(data,method=mydistfun)


Thanks


--
Witold Eryk Wolski

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: dist like function but where you can configure the method

Jari Oksanen
Witold E Wolski <wewolski <at> gmail.com> writes:

> [...]

I think it is best to write that function yourself.

The "dist" object is a vector holding the lower triangle (without
the diagonal) of a symmetric matrix, plus attributes. The attributes
are: class, which should be c("mydist", "dist"); Size, which is
length(x); Labels (optional), the names of your items, which if given
should have length(x); call = match.call(); Diag = FALSE;
Upper = FALSE; and the method name. All you need is a vector with
attributes.

All this adds very little overhead to your calculation, so for all
practical purposes this implementation is just as fast as your
"mystique implementation" of pairwise distances. Your example
(ks.test()) would probably be pretty slow. If you can vectorize your
distance, it can be really fast, even if you calculate the full
symmetric matrix and throw away the diagonal and upper triangle.
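Here is a minimal sketch of the function described above. It rests on some assumptions. The name fundist() and its arguments are hypothetical and do not come from any package. As with stats::dist(), the items are taken to be the rows of a matrix x. FUN is assumed to take two numeric vectors and return a single number.

The double loop fills the lower triangle column by column, which is the order in which dist objects store it. The function then attaches the attributes listed above.

```r
# Hypothetical dist()-like helper: FUN(a, b) is any function of two
# numeric vectors returning a single number. Items are the rows of x.
fundist <- function(x, FUN, method = "custom") {
  cl <- match.call()
  x <- as.matrix(x)
  n <- nrow(x)
  d <- numeric(n * (n - 1) / 2)
  k <- 1
  # dist objects store the lower triangle column by column:
  # (2,1), (3,1), ..., (n,1), (3,2), ..., (n,n-1)
  for (j in seq_len(n - 1)) {
    for (i in (j + 1):n) {
      d[k] <- FUN(x[i, ], x[j, ])
      k <- k + 1
    }
  }
  # Attach the attributes that print(), as.matrix() etc. rely on
  structure(d,
            Size = n,
            Labels = rownames(x),
            Diag = FALSE,
            Upper = FALSE,
            method = method,
            call = cl,
            class = c("mydist", "dist"))
}

# Usage: Euclidean distance between the rows of a small matrix
m <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)
d <- fundist(m, function(a, b) sqrt(sum((a - b)^2)))
print(d)        # lower triangle holds 5, 10, 5
as.matrix(d)    # full symmetric 3 x 3 matrix
```

Suppose the samples in the original question are the columns of data. Then the call would presumably be fundist(t(data), function(a, b) ks.test(a, b)$p.value).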

Cheers, Jari Oksanen


Re: dist like function but where you can configure the method

Witold E Wolski
Dear Jari,

Thanks for your reply...

The overhead would be two for loops:
for(i in 1:dim(x)[2])
for(j in i:dim(x)[2])

wouldn't it? Or do you see a different way to implement it?

A for loop is pretty expensive in R. Therefore I am looking for an
implementation similar to apply or lapply, where the iteration is done
in native code.





On 16 May 2014 15:57, Jari Oksanen <[hidden email]> wrote:

> [...]



--
Witold Eryk Wolski


Re: dist like function but where you can configure the method

barry rowlingson
In reply to this post by Jari Oksanen
On Fri, May 16, 2014 at 4:46 PM, Witold E Wolski <[hidden email]> wrote:

> [...]
> A for loop is pretty expensive in R. Therefore I am looking for an
> implementation similar to apply or lapply were the iteration is made
> in native code.

No, a for loop is not pretty expensive in R, at least not compared
to doing a k-s test:

 > system.time(for(i in 1:10000){ks.test(runif(100),runif(100))})
   user  system elapsed
  3.680   0.012   3.697

3.68 seconds to do 10000 ks tests (and generate 200 runifs for each)

 > system.time(for(i in 1:10000){})
   user  system elapsed
  0.000   0.000   0.001

0.000 s to do 10000 empty loops. Oh, let's nest it for fun:

 > system.time(for(i in 1:100){for(i in 1:100){ks.test(runif(100),runif(100))}})
   user  system elapsed
  3.692   0.004   3.701

No different. Even with only 5 items per sample, the 10000 ks-tests take me 2.2 seconds.

Moral: don't worry about the for loops.

Barry


Re: dist like function but where you can configure the method

Bert Gunter
Yes, ... and further

apply-type functions still have to loop at the interpreter level, and
generally take about the same time as their translation to for loops
(with suitable caveats for this kind of vague assertion). Their chief
advantage is readability and adherence to R's functional paradigm
(again with suitable caveats).

Alternatively, byte code compilation with the compiler package **may**
(significantly) improve speed, but it very much depends ...

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Fri, May 16, 2014 at 9:12 AM, Barry Rowlingson
<[hidden email]> wrote:

> [...]


Re: dist like function but where you can configure the method

Jari Oksanen
In reply to this post by Witold E Wolski
I did not regard the loops as overhead but as part of the process. The overhead is setting the attributes. The loop is not very expensive compared to ks.test(). You can always replace the loop with an apply over a vector of indices, but about the only way to speed up the calculations is to use parallel processing (the parLapply, parSapply and parRapply functions of the parallel package).
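The index-based variant mentioned above might look like the sketch below. The name fundist_pairs() is hypothetical. As in stats::dist(), the items are assumed to be the rows of a matrix x, and FUN is assumed to take two vectors and return one number.

combn(n, 2) lists the pairs (1,2), (1,3), ..., (n-1,n). That is the same order in which a dist object stores its lower triangle, so apply() over its columns yields the distance vector directly. The explicit loops disappear from your own code, but FUN is still called once per pair.

```r
# Hypothetical helper: pairwise FUN over the rows of x, driven by an
# index matrix instead of explicit for loops. Requires nrow(x) >= 2.
fundist_pairs <- function(x, FUN) {
  x <- as.matrix(x)
  n <- nrow(x)
  # 2 x choose(n, 2) matrix; column k holds row indices (j, i) with j < i
  pairs <- combn(n, 2)
  # Call FUN(row i, row j) for each pair, matching dist's (i, j), i > j
  d <- apply(pairs, 2, function(p) FUN(x[p[2], ], x[p[1], ]))
  structure(d, Size = n, Labels = rownames(x), Diag = FALSE,
            Upper = FALSE, method = "custom", class = "dist")
}

m <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)
fundist_pairs(m, function(a, b) sqrt(sum((a - b)^2)))
```

For the parallel route, parallel::parApply(cl, pairs, 2, ...) takes the same arguments plus a cluster cl from parallel::makeCluster(). Whether it pays off presumably depends on how expensive FUN is.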

I wrote about vectorization: that would be faster, but it cannot be done blindly for just "any function"; you must deconstruct the function to see whether it can be decomposed into operations on vectors. In vegan:::designdist we do that for some function types, but you really must *think* about the function you are using to know whether you can write it in vectorized form. It is not automatic.
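The sketch below illustrates what decomposing a distance into vector operations can mean. It uses plain Euclidean distance, chosen only because its algebra is simple; it is not vegan's code, and vecdist() is a hypothetical name.

Expanding ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b gives the whole n x n matrix from one matrix product. as.dist() then keeps only the lower triangle. In other words, it computes the full symmetric matrix and throws away the diagonal and upper triangle.

There is no comparably obvious decomposition for a ks.test() p-value, which illustrates why this is not automatic.

```r
# Vectorized Euclidean distances between the rows of x, using
# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * (a . b) for all pairs at once
vecdist <- function(x) {
  x <- as.matrix(x)
  sq <- rowSums(x^2)                           # ||a||^2 for every row
  d2 <- outer(sq, sq, "+") - 2 * tcrossprod(x) # full n x n matrix
  d2[d2 < 0] <- 0                              # clip tiny negative rounding errors
  as.dist(sqrt(d2))                            # keep only the lower triangle
}

set.seed(42)
x <- matrix(rnorm(200 * 10), nrow = 200)
all.equal(as.vector(vecdist(x)), as.vector(dist(x)))
```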

Cheers, Jari Oksanen
On 16/05/2014, at 18:46 PM, Witold E Wolski wrote:

> [...]


Re: dist like function but where you can configure the method

Rui Barradas
In reply to this post by barry rowlingson
Hello,

The compiler package is good at speeding up for loops, but in this case
the gain is negligible. The ks test is what really takes the time.

library(compiler)

f1 <- function(n){
        for(i in 1:100){
                for(i in 1:100){
                        ks.test(runif(100),runif(100))
                }
        }
}

f1.c <- cmpfun(f1)

system.time(f1())
    user  system elapsed
    3.50    0.00    3.53
system.time(f1.c())
    user  system elapsed
    3.47    0.00    3.48


Rui Barradas

On 16-05-2014 17:12, Barry Rowlingson wrote:

> [...]


Re: dist like function but where you can configure the method

Witold E Wolski
Ouch,

First: my question was not how to implement dist, but whether there is a
more generic dist function than stats::dist.

Second: ks.test is meant as a placeholder (see the comment in the
code I sent) for any other function taking two vector arguments.

Third: I do subscribe to the idea that a function call is easier to
read and understand than a for loop. @Bert: apply is a native C
function and the loop is not interpreted, AFAIK.

@Rui @Barry @Jari What are you benchmarking? An empty loop?

Look at the trivial benchmarks below: _apply_ clearly outperforms a
for loop in R. It always has; it even outperforms an empty for loop.
# an empty, unrealistic for loop as suggested by Rui, Barry and Jari
f1 <- function(n){
  for(i in 1:n){
    for(j in 1:n){
    }
  }}


myfunc = function(x,y=x){x-y}

# a for loop which does actually something
f2 <- function(n){
  mm <- matrix(0,ncol=n,nrow=n)
  for(i in 1:n){
    for(j in 1:n){
      mm[i,j] = myfunc(i,j)
    }
  }
  return(mm)
}

# and the same with a plain vector instead of a matrix
f3 = function(n){
  res = rep(0,n*n)
  for(i in 1:(n*n))
  {
    res[i] = myfunc(i)
  }
}


n = 1000
system.time(f1(n))
system.time(f2(n))
system.time(f3(n))
system.time(apply(t(1:(n*n)),1,myfunc))


> system.time(f1(n))
   user  system elapsed
   0.28    0.00    0.28
> system.time(f2(n))
   user  system elapsed
   6.80    0.00    7.09
> system.time(f3(n))
   user  system elapsed
   5.83    0.00    5.98
> system.time(apply(t(1:(n*n)),1,myfunc))
   user  system elapsed
   0.19    0.00    0.19






On 16 May 2014 20:55, Rui Barradas <[hidden email]> wrote:

> [...]



--
Witold Eryk Wolski


Re: dist like function but where you can configure the method

William Dunlap
>> system.time(apply(t(1:(n*n)),1,myfunc))
>    user  system elapsed
>    0.19    0.00    0.19

That calls 'myfunc' exactly once:

> system.time(apply(t(1:(3*3)), 1, print))
[1] 1 2 3 4 5 6 7 8 9
   user  system elapsed
      0       0       0
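One way to check this is to count the calls. counting() below is a hypothetical helper that increments a counter each time apply() invokes it.

t(1:9) is a 1 x 9 matrix, so apply(..., 1, ...) has a single row to visit. A 9 x 1 matrix has nine. For reference, base::apply() is itself R code built around a for loop, as printing apply at the prompt shows.

```r
# Count how often apply() actually calls FUN
calls <- 0
counting <- function(v) {
  calls <<- calls + 1   # update the counter in the enclosing environment
  sum(v)
}

calls <- 0
apply(t(1:9), 1, counting)                 # 1 x 9 matrix: one row
calls_wide <- calls                        # 1

calls <- 0
apply(matrix(1:9, ncol = 1), 1, counting)  # 9 x 1 matrix: nine rows
calls_tall <- calls                        # 9
```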


Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, May 16, 2014 at 1:00 PM, Witold E Wolski <[hidden email]> wrote:

> [...]


Re: dist like function but where you can configure the method

Bert Gunter
In reply to this post by Witold E Wolski
If the apply() call is not empty, its contents must of course be
interpreted. That's where the time goes.

> system.time(for(i in 1:1e6)rnorm(1))
   user  system elapsed
   5.25    0.00    5.29

> system.time(lapply(1:1e6,rnorm,n=1))
   user  system elapsed
   9.64    0.01    9.72

> system.time(vapply(1:1e6,rnorm,FUN.VALUE=0,n=1))
   user  system elapsed
   5.69    0.00    5.73


I rest my case.

Cheers,
Bert





On Fri, May 16, 2014 at 1:00 PM, Witold E Wolski <[hidden email]> wrote:

> [...]

Re: dist like function but where you can configure the method

David Carlson
In reply to this post by Witold E Wolski
Function designdist() in package vegan lets you define your own distance measure, but it does not let you simply provide a function as your original request indicated. The documentation of function distance() in package ecodist says it is written to make adding new distance functions simple, but warns that it is not efficient for large matrices.
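For comparison, this is roughly how a measure is defined in designdist(). Instead of an R function, you give a formula string in terms of three quantities:

- A and B, the totals for the two rows;
- J, their shared part, which is the sum of pairwise minima when terms = "minimum".

The sketch below writes Bray-Curtis dissimilarity this way and checks it against vegdist(). It assumes vegan is installed and is skipped otherwise. The community matrix comm is made-up random data.

```r
# Requires the vegan package; skipped if it is not installed.
if (requireNamespace("vegan", quietly = TRUE)) {
  set.seed(7)
  comm <- matrix(rpois(5 * 8, lambda = 3), nrow = 5)  # 5 sites x 8 species
  # Bray-Curtis in designdist's notation: A, B = row totals,
  # J = sum of pairwise minima (terms = "minimum")
  bc <- vegan::designdist(comm, method = "(A+B-2*J)/(A+B)",
                          terms = "minimum")
  print(all.equal(as.vector(bc),
                  as.vector(vegan::vegdist(comm, method = "bray"))))
}
```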

David Carlson

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Witold E Wolski
Sent: Friday, May 16, 2014 3:00 PM
To: Rui Barradas
Cc: Jari Oksanen; [hidden email]; Barry Rowlingson
Subject: Re: [R] dist like function but where you can configure the method

[...]