Hi R developers,
I am new to this forum and hope someone can help me with .Call in R. I would greatly appreciate any help!

Say I have a vector called "vecA" of length 10000, and I generate a vector called "vecR" with elements drawn from Uniform[0,1]. Both vecA and vecR are of type double. I want to replace the elements of vecA by the elements of vecR only if the sum of the elements of vecR is greater than or equal to 5000. Otherwise, vecA remains unchanged. This is easy to do in R:

    vecA <- something
    vecR <- runif(10000)
    if (sum(vecR) >= 5000) {
        vecA <- vecR
    }

Now my question is: if I do the same thing using .Call, how can I make it more efficient (i.e. less computation time than the pure R code above)? My C code (in "change_vecA.c") using .Call looks like this:

    #include <R.h>
    #include <Rinternals.h>
    #include <Rmath.h>      /* runif() */

    SEXP change_vecA(SEXP vecA){
        int i, vecA_len;
        double sum, *res_ptr, *vecR_ptr, *vecA_ptr;

        vecA_ptr = REAL(vecA);
        vecA_len = length(vecA);
        SEXP res_vec, vecR;

        PROTECT(res_vec = allocVector(REALSXP, vecA_len));
        PROTECT(vecR = allocVector(REALSXP, vecA_len));
        res_ptr = REAL(res_vec);
        vecR_ptr = REAL(vecR);
        GetRNGstate();
        sum = 0.0;
        /* draw the uniform deviates and accumulate their sum */
        for (i = 0; i < vecA_len; i++){
            vecR_ptr[i] = runif(0, 1);
            sum += vecR_ptr[i];
        }
        if (sum >= 5000){
            /* copy vecR to the vector to be returned */
            for (i = 0; i < vecA_len; i++){
                res_ptr[i] = vecR_ptr[i];
            }
        }
        else{
            /* copy vecA to the vector to be returned */
            for (i = 0; i < vecA_len; i++){
                res_ptr[i] = vecA_ptr[i];
            }
        }

        PutRNGstate();
        UNPROTECT(2);
        return res_vec;
    }

My R wrapper function is

    change_vecA <- function(vecA){
        dyn.load("change_vecA.so")
        .Call("change_vecA", vecA)
    }

Given the two loops (one generates the random vector and one fills the vector to be returned), can .Call still be faster than the pure R code (which only copies vecR to vecA when the condition is met)? Or, how can I improve my C code to avoid redundant loops, if there are any? My biggest concern is that if vecA is large (say of length 1000000 or even bigger), the loops in the C code can slow things down. Thanks for any help!
On 17 November 2011 at 09:09, Raymond wrote:
| [...]
| Now my question is, if I am going to do the same thing in R using .Call.
| How can I achieve it in a more efficient way (i.e. less computation time
| compared with pure R code above.). My c code (called "change_vecA.c") using
| .Call is like this:

Here is my take on it, using about the same number of commands in C++ thanks to Rcpp and its vectorised sum() and runif() commands (which mimic the R commands):

    R> library(inline)
    R> library(Rcpp)
    R>
    R> set.seed(42)                  # fix RNG seed
    R> vecA <- rt(10000, 6)          # 'something' in vecA: t-dist with 6 df
    R>
    R> fun <- cxxfunction(signature(va="numeric"),  # pass in a vector
    +                    plugin="Rcpp",             # use Rcpp, and code below
    +                    body='
    +
    +    Rcpp::NumericVector vA(va);
    +    Rcpp::RNGScope tmp;                      // make sure RNG is set up
    +    Rcpp::NumericVector vR = runif(10000);   // 10k of a U(0,1)
    +
    +    if (sum(vR) >= 5000) {                   // sum is an Rcpp sugar op.
    +        vA = vR;                             // assign vR to vA
    +    }
    +    return(vA);                              // return vA
    + ')
    R>
    R> sum( fun( vecA ) )
    [1] 5033
    R> sum( fun( vecA ) )
    [1] 5015
    R> sum( fun( vecA ) )
    [1] 66
    R> sum( fun( vecA ) )
    [1] 66
    R> sum( fun( vecA ) )
    [1] 5015
    R> sum( fun( vecA ) )
    [1] 5024
    R> sum( fun( vecA ) )
    [1] 5020
    R>
    R> sum(vecA)
    [1] 66
    R>

You can learn about Rcpp from the vignettes in the package, from my page at http://dirk.eddelbuettel.com/code/rcpp.html, as well as from various posts on my blog and, of course, the rcpp-devel mailing list. The example above uses cxxfunction() from the wonderful inline package, which you may find useful too as it compiles, links and loads your C or C++ snippets.

Dirk

-- 
"Outside of a dog, a book is a man's best friend. Inside of a dog, it is too dark to read." -- Groucho Marx

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
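For comparison with a compiled version, the pure-R logic from the original question can be wrapped in a function with the same interface. This is only a minimal sketch: the name change_vecA_r and the generalisation of the threshold from 5000 to length(vecA) / 2 are assumptions made for illustration, not something taken from the thread.

```r
# Pure-R counterpart of the logic in the question.
# 'change_vecA_r' is a hypothetical name; the threshold n / 2 generalises
# the 5000 used for n = 10000 and is an assumption.
change_vecA_r <- function(vecA) {
  n <- length(vecA)
  vecR <- runif(n)                       # one vectorised draw of U(0,1) deviates
  if (sum(vecR) >= n / 2) vecR else vecA # keep vecA unless the sum is large enough
}

set.seed(42)
vecA <- rt(10000, 6)                     # 'something' in vecA, as in the reply above
res <- change_vecA_r(vecA)

# Rough timing of repeated calls; the numbers depend on the machine
print(system.time(for (i in 1:1000) change_vecA_r(vecA)))
```

Timing this against a compiled function on the same input is one way to check whether the C or C++ route pays off for a given vector length.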
In reply to this post by Raymond
Hi,
A probably very naive remark, but I believe that the probability of sum(runif(10000)) >= 5000 is exactly 0.5. So why not just test that, and generate the uniform values only if needed?

Karl Forner
On 11/18/2011 07:08 AM, Karl Forner wrote:
> Hi,
>
> A probably very naive remark, but I believe that the probability of
> sum(runif(10000)) >= 5000 is exactly 0.5. So why not just test that, and
> generate the uniform values only if needed?

My thought as well, but actually the deviates need to have mean > .5, so you'd do something like

    repeat {
        vecA <- runif(10000)
        if (mean(vecA) > .5) break
    }

You'd do this 1/2 the time, and you'd have to iterate on average 1 / (1/2) = 2 times before getting a vector satisfying the constraint, so the expected number of iterations is 1/2 * 2 = 1, the same as in the original implementation! It does suggest that only one allocation would be required if this were coded at the C level. But since sum(), mean(), and runif() all go more or less directly to C anyway, it doesn't seem like this is the right problem for a C solution.

Martin

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
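The argument above can be sketched as a complete R function: decide with probability 1/2 whether vecA is kept, and otherwise sample until a vector passes the test. The function name draw_conditional, the coin flip via runif(1), and the threshold length(vecA) / 2 are assumptions made for this illustration. The result has the same distribution as the original code, and the expected number of full-length draws is still one, so this is a demonstration of the reasoning rather than a speed-up.

```r
# Hypothetical helper illustrating the "decide first, then sample" idea.
# The name and the n / 2 threshold are assumptions, not from the thread.
draw_conditional <- function(vecA) {
  n <- length(vecA)
  # By symmetry, a fresh draw fails the test with probability 1/2;
  # in that case vecA is returned and no deviates are generated.
  if (runif(1) < 0.5) return(vecA)
  # Otherwise sample until a vector with sum >= n / 2 turns up
  # (two attempts on average).
  repeat {
    vecR <- runif(n)
    if (sum(vecR) >= n / 2) return(vecR)
  }
}

set.seed(42)
vecA <- rep(-1, 1000)                  # placeholder input, easy to recognise
out <- draw_conditional(vecA)
```

Each call returns either vecA unchanged or a uniform vector whose sum meets the threshold, never a uniform vector that fails it.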
In reply to this post by Karl Forner
Because if you calculate the probability and then make uniform values, nothing guarantees that the sum of those uniform values actually is larger than 5,000. You only have a 50% chance it is, in fact...

Cheers
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
[hidden email]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Yes indeed. My mistake.
