# how to replace my double for loop which is little efficient!

9 messages
Open this post in threaded view
|

## how to replace my double for loop which is little efficient!

 Dear all, My double for loop as follows, but it is little efficient, I hope all friends can give me a "vectorized" program to replace my code. thanks x: is a matrix  202*263,  that is 202 samples, and 263 independent variables num.compd<-nrow(x); # number of compounds diss.all<-0 for( i in 1:num.compd)    for (j in 1:num.compd)       if (i!=j) {         S1<-sum(x[i,]*x[j,])         S2<-sum(x[i,]^2)         S3<-sum(x[j,]^2)         sim2<-S1/(S2+S3-S1)         diss2<-1-sim2         diss.all<-diss.all+diss2} it will cost a long time to finish this computation! i really need "rapid" code to replace my code. thanks kevin
Open this post in threaded view
|

## Re: how to replace my double for loop which is little efficient!

 bbslover wrote x: is a matrix  202*263,  that is 202 samples, and 263 independent variables num.compd<-nrow(x); # number of compounds diss.all<-0 for( i in 1:num.compd)    for (j in 1:num.compd)       if (i!=j) {         S1<-sum(x[i,]*x[j,])         S2<-sum(x[i,]^2)         S3<-sum(x[j,]^2)         sim2<-S1/(S2+S3-S1)         diss2<-1-sim2         diss.all<-diss.all+diss2} it will cost a long time to finish this computation! i really need "rapid" code to replace my code. Alternative 1:  j-loop only needs to start at i+1 so for( i in 1:num.compd) {     for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {             S1<-sum(x[i,]*x[j,])             S2<-sum(x[i,]^2)             S3<-sum(x[j,]^2)             sim2<-S1/(S2+S3-S1)             diss2<-1-sim2             diss2.all<-diss2.all+diss2     } } diss2.all <- 2 * diss2.all On my pc this is about twice as fast as your version (with 202 samples and 263 variables) Alternative 2: all sum() are not necessary. Use some matrix algebra: xtx <- x %*% t(x) diss3.all <- 0 for( i in 1:num.compd) {     for (j in seq(from=i+1,to=num.compd,length.out=max(0,num.compd-i))) {             S1 <- xtx[i,j]             S2 <- xtx[i,i]             S3 <- xtx[j,j]             sim2<-S1/(S2+S3-S1)             diss2<-1-sim2             diss3.all<-diss3.all+diss2     } } diss3.all <- 2 * diss3.all This is about four times as fast as alternative 1. I'm quite sure that more expert R gurus can get some more speed up. Note: I generated the x matrix with: set.seed(1);x<-matrix(runif(202*263),nrow=202) (Timings on iMac 2.16Ghz and using 64-bit R) Berend
Open this post in threaded view
|

## Re: how to replace my double for loop which is little efficient!

 In reply to this post by Kevin Hao Hi: On Sun, Dec 26, 2010 at 4:18 AM, bbslover <[hidden email]> wrote: > > Dear all, > > My double for loop as follows, but it is little efficient, I hope all > friends can give me a "vectorized" program to replace my code. thanks > > x: is a matrix  202*263,  that is 202 samples, and 263 independent > variables > > num.compd<-nrow(x); # number of compounds > diss.all<-0 > for( i in 1:num.compd) >   for (j in 1:num.compd) >      if (i!=j) { > Isn't this just X'X? >        S1<-sum(x[i,]*x[j,]) > Aren't each of S2 and S3 just diag(X'X)? >        S2<-sum(x[i,]^2) >        S3<-sum(x[j,]^2) >        sim2<-S1/(S2+S3-S1) >        diss2<-1-sim2 >        diss.all<-diss.all+diss2} > I tried s1 <- crossprod(x) s2 <- diag(s1) s3 <-outer(s2, s2, '+') - s1 s1/s3 This yields a symmetric matrix with 1's along the diagonal and quantities between 0 and 1 in the off-diagonal. Something like it could conceivably be used as a similarity matrix. Is that what you're looking for with sim2? I agree with Berend: it looks like a problem that could be easily solved with some matrix algebra. R can do matrix algebra quite efficiently, y'know... (BTW, I tried this on a 1000 x 1000 input matrix: system.time(myfunc(x))    user  system elapsed    0.99    0.02    1.02 I expect it could be improved by an order of magnitude if one actually knew what you were computing... ) HTH, Dennis it will cost a long time to finish this computation! i really need "rapid" > code to replace my code. > > thanks > > kevin > > > -- > View this message in context: > http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html> Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. >         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: how to replace my double for loop which is little efficient!

 In reply to this post by Berend Hasselman thanks for your help, it is great. In addition, In the beginning, the format of x is dataframe, and i run my code, it is so slow, after your help, I change x for matirx, it is so quick. I am very grateful your kind help, and your code is so good! kevin
Open this post in threaded view
|

## Re: how to replace my double for loop which is little efficient!

 In reply to this post by djmuseR thanks for your help. I am sorry I do not full understand your code, so i can not correct using your code to my data. here is the attachment of my data, and what I want to compute is the equation in the word document of the attachment: the code form Berend can get the answer i want to get. my_data.rar
Open this post in threaded view
|

## Re: how to replace my double for loop which is little efficient!

 In reply to this post by djmuseR djmuseR wrote On Sun, Dec 26, 2010 at 4:18 AM, bbslover  wrote: > > x: is a matrix  202*263,  that is 202 samples, and 263 independent > variables > > num.compd<-nrow(x); # number of compounds > diss.all<-0 > for( i in 1:num.compd) >   for (j in 1:num.compd) >      if (i!=j) { > Isn't this just X'X? >        S1<-sum(x[i,]*x[j,]) > Aren't each of S2 and S3 just diag(X'X)? >        S2<-sum(x[i,]^2) >        S3<-sum(x[j,]^2) >        sim2<-S1/(S2+S3-S1) >        diss2<-1-sim2 >        diss.all<-diss.all+diss2} > I tried s1 <- crossprod(x) s2 <- diag(s1) s3 <-outer(s2, s2, '+') - s1 s1/s3 This yields a symmetric matrix with 1's along the diagonal and quantities between 0 and 1 in the off-diagonal. Something like it could conceivably be used as a similarity matrix. Is that what you're looking for with sim2? I agree with Berend: it looks like a problem that could be easily solved with some matrix algebra. R can do matrix algebra quite efficiently, y'know... (BTW, I tried this on a 1000 x 1000 input matrix: system.time(myfunc(x))    user  system elapsed    0.99    0.02    1.02 I expect it could be improved by an order of magnitude if one actually knew what you were computing... ) I did some more work along Dennis' lines xtx <- tcrossprod(x) xtd <- diag(xtx) xzz <- outer(xtd,xtd,'+') zz  <- 1 - xtx/(xzz-xtx) diss.all <- sum(zz) this appears to give the desired result and it's quite a bit faster than my alternative 2. It would indeed be nice to know what is being computed. Berend
Open this post in threaded view
|

## Re: how to replace my double for loop which is little efficient!

 In reply to this post by Kevin Hao bbslover wrote thanks for your help. I am sorry I do not full understand your code, so i can not correct using your code to my data. here is the attachment of my data, and what I want to compute is the equation in the word document of the attachment: the code form Berend can get the answer i want to get. I've seen what you want to do. OpenOffice mangled the equations so I used an online conversion to pdf. Thanks. See my previous post for the fastest version. It's just converting your formulas with some matrix algebra. Berend