how to replace my double for loop which is little efficient!

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

how to replace my double for loop which is little efficient!

Kevin Hao
Dear all,

My double for loop as follows, but it is little efficient, I hope all friends can give me a "vectorized" program to replace my code. thanks

x: is a matrix  202*263,  that is 202 samples, and 263 independent variables

num.compd<-nrow(x); # number of compounds
diss.all<-0
for( i in 1:num.compd)
   for (j in 1:num.compd)
      if (i!=j) {
        S1<-sum(x[i,]*x[j,])
        S2<-sum(x[i,]^2)
        S3<-sum(x[j,]^2)
        sim2<-S1/(S2+S3-S1)
        diss2<-1-sim2
        diss.all<-diss.all+diss2}

it will cost a long time to finish this computation! i really need "rapid" code to replace my code.

thanks

kevin

Reply | Threaded
Open this post in threaded view
|

Re: how to replace my double for loop which is little efficient!

Berend Hasselman
bbslover wrote
x: is a matrix  202*263,  that is 202 samples, and 263 independent variables

num.compd<-nrow(x); # number of compounds
diss.all<-0
for( i in 1:num.compd)
   for (j in 1:num.compd)
      if (i!=j) {
        S1<-sum(x[i,]*x[j,])
        S2<-sum(x[i,]^2)
        S3<-sum(x[j,]^2)
        sim2<-S1/(S2+S3-S1)
        diss2<-1-sim2
        diss.all<-diss.all+diss2}

it will cost a long time to finish this computation! i really need "rapid" code to replace my code.
Alternative 1:  j-loop only needs to start at i+1 so

for( i in 1:num.compd) {
    for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
            S1<-sum(x[i,]*x[j,])
            S2<-sum(x[i,]^2)
            S3<-sum(x[j,]^2)
            sim2<-S1/(S2+S3-S1)
            diss2<-1-sim2
            diss2.all<-diss2.all+diss2
    }
}
diss2.all <- 2 * diss2.all

On my pc this is about twice as fast as your version (with 202 samples and 263 variables)

Alternative 2: all sum() are not necessary. Use some matrix algebra:

xtx <- x %*% t(x)
diss3.all <- 0
for( i in 1:num.compd) {
    for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {
            S1 <- xtx[i,j]
            S2 <- xtx[i,i]
            S3 <- xtx[j,j]
            sim2<-S1/(S2+S3-S1)
            diss2<-1-sim2
            diss3.all<-diss3.all+diss2
    }
}
diss3.all <- 2 * diss3.all

This is about four times as fast as alternative 1.

I'm quite sure that more expert R gurus can get some more speed up.

Note: I generated the x matrix with: set.seed(1);x<-matrix(runif(202*263),nrow=202)
(Timings on iMac 2.16Ghz and using 64-bit R)

Berend
Reply | Threaded
Open this post in threaded view
|

Re: how to replace my double for loop which is little efficient!

djmuseR
In reply to this post by Kevin Hao
Hi:


On Sun, Dec 26, 2010 at 4:18 AM, bbslover <[hidden email]> wrote:

>
> Dear all,
>
> My double for loop as follows, but it is little efficient, I hope all
> friends can give me a "vectorized" program to replace my code. thanks
>
> x: is a matrix  202*263,  that is 202 samples, and 263 independent
> variables
>
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
>   for (j in 1:num.compd)
>      if (i!=j) {
>

Isn't this just X'X?

>        S1<-sum(x[i,]*x[j,])
>
Aren't each of S2 and S3 just diag(X'X)?

>        S2<-sum(x[i,]^2)
>
       S3<-sum(x[j,]^2)
>        sim2<-S1/(S2+S3-S1)
>        diss2<-1-sim2
>        diss.all<-diss.all+diss2}
>

I tried
s1 <- crossprod(x)
s2 <- diag(s1)
s3 <-outer(s2, s2, '+') - s1
s1/s3

This yields a symmetric matrix with 1's along the diagonal and quantities
between 0 and 1 in the off-diagonal. Something like it could conceivably be
used as a similarity matrix. Is that what you're looking for with sim2?

I agree with Berend: it looks like a problem that could be easily solved
with some matrix algebra. R can do matrix algebra quite efficiently,
y'know...

(BTW, I tried this on a 1000 x 1000 input matrix:
system.time(myfunc(x))
   user  system elapsed
   0.99    0.02    1.02

I expect it could be improved by an order of magnitude if one actually knew
what you were computing... )

HTH,
Dennis

it will cost a long time to finish this computation! i really need "rapid"

> code to replace my code.
>
> thanks
>
> kevin
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to replace my double for loop which is little efficient!

Kevin Hao
In reply to this post by Berend Hasselman
thanks for your help, it is great. In addition, In the beginning, the format of x is dataframe, and i run my code, it is so slow, after your help, I change x for matirx, it is so quick. I am very grateful your kind help, and your code is so good!

kevin
Reply | Threaded
Open this post in threaded view
|

Re: how to replace my double for loop which is little efficient!

Kevin Hao
In reply to this post by djmuseR
thanks for your help. I am sorry I do not full understand your code, so i can not correct using your code to my data. here is the attachment of my data, and what I want to compute is the equation in the word document of the attachment:

the code form Berend can get the answer i want to get.

my_data.rar

Reply | Threaded
Open this post in threaded view
|

Re: how to replace my double for loop which is little efficient!

Berend Hasselman
In reply to this post by djmuseR
djmuseR wrote
On Sun, Dec 26, 2010 at 4:18 AM, bbslover <dluthm@yeah.net> wrote:

>
> x: is a matrix  202*263,  that is 202 samples, and 263 independent
> variables
>
> num.compd<-nrow(x); # number of compounds
> diss.all<-0
> for( i in 1:num.compd)
>   for (j in 1:num.compd)
>      if (i!=j) {
>

Isn't this just X'X?

>        S1<-sum(x[i,]*x[j,])
>
Aren't each of S2 and S3 just diag(X'X)?

>        S2<-sum(x[i,]^2)
>
       S3<-sum(x[j,]^2)
>        sim2<-S1/(S2+S3-S1)
>        diss2<-1-sim2
>        diss.all<-diss.all+diss2}
>

I tried
s1 <- crossprod(x)
s2 <- diag(s1)
s3 <-outer(s2, s2, '+') - s1
s1/s3

This yields a symmetric matrix with 1's along the diagonal and quantities
between 0 and 1 in the off-diagonal. Something like it could conceivably be
used as a similarity matrix. Is that what you're looking for with sim2?

I agree with Berend: it looks like a problem that could be easily solved
with some matrix algebra. R can do matrix algebra quite efficiently,
y'know...

(BTW, I tried this on a 1000 x 1000 input matrix:
system.time(myfunc(x))
   user  system elapsed
   0.99    0.02    1.02

I expect it could be improved by an order of magnitude if one actually knew
what you were computing... )
I did some more work along Dennis' lines

xtx <- tcrossprod(x)
xtd <- diag(xtx)
xzz <- outer(xtd,xtd,'+')
zz  <- 1 - xtx/(xzz-xtx)
diss.all <- sum(zz)

this appears to give the desired result and it's quite a bit faster than my alternative 2.
It would indeed be nice to know what is being computed.

Berend
Reply | Threaded
Open this post in threaded view
|

Re: how to replace my double for loop which is little efficient!

Berend Hasselman
In reply to this post by Kevin Hao
bbslover wrote
thanks for your help. I am sorry I do not full understand your code, so i can not correct using your code to my data. here is the attachment of my data, and what I want to compute is the equation in the word document of the attachment:

the code form Berend can get the answer i want to get.
I've seen what you want to do.
OpenOffice mangled the equations so I used an online conversion to pdf.
Thanks.

See my previous post for the fastest version. It's just converting your formulas with some matrix algebra.

Berend
Reply | Threaded
Open this post in threaded view
|

Re: how to replace my double for loop which is little efficient!

Berend Hasselman


Found this: http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29

and an R site search with "tanimoto" yielded some more interesting stuff.

The rest is up to you.

Berend
Reply | Threaded
Open this post in threaded view
|

Re: how to replace my double for loop which is little efficient!

Kevin Hao
In reply to this post by Berend Hasselman
Thank Berend,

It seems like that it is better to attach a PDF file for avoiding messy code.

Yes, I want to obtain is Tanimoto coefficient and your web site "wikipedia" is about this coefficient. I also search R site about tanimoto coefficient and learn it more.

About your code, I has saved and learned it.

Thank Berend and Dennis




Kevin