.Call in R

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

.Call in R

Raymond
This post was updated on .
Hi R developers,

    I am new to this forum and hope someone can help me with .Call in R. Greatly appreciate any help!

    Say, I have a vector called "vecA" of length 10000, I generate a vector called "vecR" with elements randomly generated from Uniform[0,1]. Both vecA and vecR are of double type. I want to replace elements in vecA by elements in vecR only if sum of elements in vecR is greater than or equal to 5000. Otherwise, vecR remains unchanged. This is easy to do in R, which reads
    vecA<-something;
    vecR<-runif(10000);
    if (sum(vecR)>=5000){
       vecA<-vecR;
    }


    Now my question is, if I am going to do the same thing in R using .Call, how can I achieve it in a more efficient way (i.e. less computation time compared with pure R code above.)?  My c code (called "change_vecA.c") using .Call looks like this:

    SEXP change_vecA(SEXP vecA){
         int i,vecA_len;
         double sum,*res_ptr,*vecR_ptr,*vecA_ptr;

         vecA_ptr=REAL(vecA);
         vecA_len=length(vecA);
         SEXP res_vec,vecR;

         PROTECT(res_vec=allocVector(REALSXP, vecA_len));
         PROTECT(vecR=allocVector(REALSXP, vecA_len));
         res_ptr=REAL(res_vec);
         vecR_ptr=REAL(vecR);
         GetRNGstate();
         sum=0.0;
         for (i=0;i<vecA_len;i++){
              vecR_ptr[i]=runif(0,1);
              sum+=vecR_ptr[i];
         }
         if (sum>=5000){
            /*copy vecR to the vector to be returned*/
            for (i=0;i<vecA_len;i++){
                  res_ptr[i]=vecR_ptr[i];
            }
         }
         else{
                /*copy vecA to the vector to be returned*/
                for (i=0;i<vecA_len;i++){
                      res_ptr[i]=vecA_ptr[i];
                }
         }

         PutRNGstate();
         UNPROTECT(2);
         resturn(res_vec);
}
My R wrapper function is
        change_vecA<-function(vecA){
              dyn.load("change_vecA.so");
              .Call("change_vecA",vecA);
        }
   
         Now my question is, due to two loops (one generates the random vector and one determines the vector to be returned), can .Call still be faster than pure R code (only one loop to copy vecR to vecA given condition is met)? Or, how can I improve my c code to avoid redundant loops if any? My biggest concern is if vecA is large (say of length 1000000 or even bigger), loops in C code can slow things down.  Thanks for any help!  

         

Reply | Threaded
Open this post in threaded view
|

Re: .Call in R

Dirk Eddelbuettel

On 17 November 2011 at 09:09, Raymond wrote:
| Hi R developers,
|
|     I am new to this forum and hope someone can help me with .Call in R.
| Greatly appreciate any help!
|
|     Say, I have a vector called "vecA" of length 10000, I generate a vector
| called "vecR" with elements randomly generated from Uniform[0,1]. Both vecA
| and vecR are of double type. I want to replace elements vecA by elements in
| vecR only if sum of elements in vecR is greater than or equal to 5000.
| Otherwise, vecR remain unchanged. This is easy to do in R, which reads
|     vecA<-something;
|     vecR<-runif(10000);
|     if (sum(vecR)>=5000)){
|        vecA<-vecR;
|     }
|
|
|     Now my question is, if I am going to do the same thing in R using .Call.
| How can I achieve it in a more efficient way (i.e. less computation time
| compared with pure R code above.).  My c code (called "change_vecA.c") using
| .Call is like this:

Here is my take on it, using about the same number of commands in C++ thanks
to Rcpp and its vectorised sum() and runif() commands (which mimick the R
commands):

R> library(inline)
R> library(Rcpp)
R>
R> set.seed(42)         # fix RNG seed
R> vecA <- rt(10000, 6) # 'something' in vecA: t-dist with 6 df
R>
R> fun <- cxxfunction(signature(va="numeric"), # pass in a vector
+                    plugin="Rcpp",      # use Rcpp, and code below
+                    body='
+
+    Rcpp::NumericVector vA(va);
+    Rcpp::RNGScope tmp;                     // make sure RNG is set up
+    Rcpp::NumericVector vR = runif(10000);  // 10k of a U(0,1)
+
+    if (sum(vR) >= 5000) {                  // sum is an Rcpp sugar op.
+       vA = vR;                             // swap vR into vA
+    }
+    return(vA);                             // return vA
+ ')
R>
R> sum( fun( vecA ) )
[1] 5033
R> sum( fun( vecA ) )
[1] 5015
R> sum( fun( vecA ) )
[1] 66
R> sum( fun( vecA ) )
[1] 66
R> sum( fun( vecA ) )
[1] 5015
R> sum( fun( vecA ) )
[1] 5024
R> sum( fun( vecA ) )
[1] 5020
R>
R> sum(vecA)
[1] 66
R>

You can learn about Rcpp from the vignettes in the package, at my page at
http://dirk.eddelbuettel.com/code/rcpp.html as well different posts on my
blog, and of course the rcpp-devel mailing list.  The example above uses
cxxfunction() from the wonderful inline package you may find useful too as it
compiles, links and loads your C or C++ snippets.

Dirk

--
"Outside of a dog, a book is a man's best friend. Inside of a dog, it is too
dark to read." -- Groucho Marx

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: .Call in R

Karl Forner
In reply to this post by Raymond
Hi,

A probably very naive remark, but I believe that the probability of sum(
runif(10000) ) >= 50000 is exactly 0.5. So why not just test that, and
generate the uniform values only if needed ?


Karl Forner

On Thu, Nov 17, 2011 at 6:09 PM, Raymond <[hidden email]> wrote:

> Hi R developers,
>
>    I am new to this forum and hope someone can help me with .Call in R.
> Greatly appreciate any help!
>
>    Say, I have a vector called "vecA" of length 10000, I generate a vector
> called "vecR" with elements randomly generated from Uniform[0,1]. Both vecA
> and vecR are of double type. I want to replace elements vecA by elements in
> vecR only if sum of elements in vecR is greater than or equal to 5000.
> Otherwise, vecR remain unchanged. This is easy to do in R, which reads
>    vecA<-something;
>    vecR<-runif(10000);
>    if (sum(vecR)>=5000)){
>       vecA<-vecR;
>    }
>
>
>    Now my question is, if I am going to do the same thing in R using .Call.
> How can I achieve it in a more efficient way (i.e. less computation time
> compared with pure R code above.).  My c code (called "change_vecA.c")
> using
> .Call is like this:
>
>    SEXP change_vecA(SEXP vecA){
>         int i,vecA_len;
>         double sum,*res_ptr,*vecR_ptr,*vecA_ptr;
>
>         vecA_ptr=REAL(vecA);
>         vecA_len=length(vecA);
>         SEXP res_vec,vecR;
>
>         PROTECT(res_vec=allocVector(REALSXP, vec_len));
>         PROTECT(vecR=allocVector(REALSXP, vec_len));
>         res_ptr=REAL(res_vec);
>         vecR_ptr=REAL(vecR);
>         GetRNGstate();
>         sum=0.0;
>         for (i=0;i<vecA_len;i++){
>              vecR_ptr[i]=runif(0,1);
>              sum+=vecR_ptr[i];
>         }
>         if (sum>=5000){
>            /*copy vecR to the vector to be returned*/
>            for (i=0;i<vecA_len;i++){
>                  res_ptr[i]=vecR_ptr[i];
>            }
>         }
>         else{
>                /*copy vecA to the vector to be returned*/
>                for (i=0;i<vecA_len;i++){
>                      res_ptr[i]=vecA_ptr[i];
>                }
>         }
>
>         PutRNGstate();
>         UNPROTECT(2);
>         resturn(res);
> }
> My R wrapper function is
>        change_vecA<-function(vecA){
>              dyn.load("change_vecA.so");
>              .Call("change_vecA",vecA);
>        }
>
>         Now my question is, due to two loops (one generates the random
> vector and one determines the vector to be returned), can .Call still be
> faster than pure R code (only one loop to copy vecR to vecA given condition
> is met)? Or, how can I improve my c code to avoid redundant loops if any.
> My
> concern is if vecA is large (say of length 1000000 or even bigger), loops
> in
> C code can slow things down.  Thanks for any help!
>
>
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Call-in-R-tp4080721p4080721.html
> Sent from the R devel mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: .Call in R

Martin Morgan
On 11/18/2011 07:08 AM, Karl Forner wrote:
> Hi,
>
> A probably very naive remark, but I believe that the probability of sum(
> runif(10000) )>= 50000 is exactly 0.5. So why not just test that, and
> generate the uniform values only if needed ?

My thought as well, but actually the deviates need to have mean > .5 so
you'd do something like

   repeat {
      vecA <- runif(10000)
      if (mean(vecA) > .5) break
   }

You'd do this 1/2 the time, and you'd have to itearte on average 1 /
(1/2) = 2 times before getting the vector satisfying the constraint, so
the expected number of iterations is 1/2 * 2 = 1, the same as in the
original implementation!

It does suggest that there is only one allocation required, if this were
coded at the C level. But since sum(), mean(), and runif() all go more
or less directly to C anyway it doesn't seem like this is the right
problem for a C solution.

Martin

>
>
> Karl Forner
>
> On Thu, Nov 17, 2011 at 6:09 PM, Raymond<[hidden email]>  wrote:
>
>> Hi R developers,
>>
>>     I am new to this forum and hope someone can help me with .Call in R.
>> Greatly appreciate any help!
>>
>>     Say, I have a vector called "vecA" of length 10000, I generate a vector
>> called "vecR" with elements randomly generated from Uniform[0,1]. Both vecA
>> and vecR are of double type. I want to replace elements vecA by elements in
>> vecR only if sum of elements in vecR is greater than or equal to 5000.
>> Otherwise, vecR remain unchanged. This is easy to do in R, which reads
>>     vecA<-something;
>>     vecR<-runif(10000);
>>     if (sum(vecR)>=5000)){
>>        vecA<-vecR;
>>     }
>>
>>
>>     Now my question is, if I am going to do the same thing in R using .Call.
>> How can I achieve it in a more efficient way (i.e. less computation time
>> compared with pure R code above.).  My c code (called "change_vecA.c")
>> using
>> .Call is like this:
>>
>>     SEXP change_vecA(SEXP vecA){
>>          int i,vecA_len;
>>          double sum,*res_ptr,*vecR_ptr,*vecA_ptr;
>>
>>          vecA_ptr=REAL(vecA);
>>          vecA_len=length(vecA);
>>          SEXP res_vec,vecR;
>>
>>          PROTECT(res_vec=allocVector(REALSXP, vec_len));
>>          PROTECT(vecR=allocVector(REALSXP, vec_len));
>>          res_ptr=REAL(res_vec);
>>          vecR_ptr=REAL(vecR);
>>          GetRNGstate();
>>          sum=0.0;
>>          for (i=0;i<vecA_len;i++){
>>               vecR_ptr[i]=runif(0,1);
>>               sum+=vecR_ptr[i];
>>          }
>>          if (sum>=5000){
>>             /*copy vecR to the vector to be returned*/
>>             for (i=0;i<vecA_len;i++){
>>                   res_ptr[i]=vecR_ptr[i];
>>             }
>>          }
>>          else{
>>                 /*copy vecA to the vector to be returned*/
>>                 for (i=0;i<vecA_len;i++){
>>                       res_ptr[i]=vecA_ptr[i];
>>                 }
>>          }
>>
>>          PutRNGstate();
>>          UNPROTECT(2);
>>          resturn(res);
>> }
>> My R wrapper function is
>>         change_vecA<-function(vecA){
>>               dyn.load("change_vecA.so");
>>               .Call("change_vecA",vecA);
>>         }
>>
>>          Now my question is, due to two loops (one generates the random
>> vector and one determines the vector to be returned), can .Call still be
>> faster than pure R code (only one loop to copy vecR to vecA given condition
>> is met)? Or, how can I improve my c code to avoid redundant loops if any.
>> My
>> concern is if vecA is large (say of length 1000000 or even bigger), loops
>> in
>> C code can slow things down.  Thanks for any help!
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Call-in-R-tp4080721p4080721.html
>> Sent from the R devel mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: .Call in R

Joris FA Meys
In reply to this post by Karl Forner
Because if you calculate the probability and then make uniform values,
nothing guarantees that the sum of those uniform values actually is
larger than 50,000. You only have 50% chance it is, in fact...
Cheers
Joris

On Fri, Nov 18, 2011 at 4:08 PM, Karl Forner <[hidden email]> wrote:

> Hi,
>
> A probably very naive remark, but I believe that the probability of sum(
> runif(10000) ) >= 50000 is exactly 0.5. So why not just test that, and
> generate the uniform values only if needed ?
>
>
> Karl Forner
>
> On Thu, Nov 17, 2011 at 6:09 PM, Raymond <[hidden email]> wrote:
>
>> Hi R developers,
>>
>>    I am new to this forum and hope someone can help me with .Call in R.
>> Greatly appreciate any help!
>>
>>    Say, I have a vector called "vecA" of length 10000, I generate a vector
>> called "vecR" with elements randomly generated from Uniform[0,1]. Both vecA
>> and vecR are of double type. I want to replace elements vecA by elements in
>> vecR only if sum of elements in vecR is greater than or equal to 5000.
>> Otherwise, vecR remain unchanged. This is easy to do in R, which reads
>>    vecA<-something;
>>    vecR<-runif(10000);
>>    if (sum(vecR)>=5000)){
>>       vecA<-vecR;
>>    }
>>
>>
>>    Now my question is, if I am going to do the same thing in R using .Call.
>> How can I achieve it in a more efficient way (i.e. less computation time
>> compared with pure R code above.).  My c code (called "change_vecA.c")
>> using
>> .Call is like this:
>>
>>    SEXP change_vecA(SEXP vecA){
>>         int i,vecA_len;
>>         double sum,*res_ptr,*vecR_ptr,*vecA_ptr;
>>
>>         vecA_ptr=REAL(vecA);
>>         vecA_len=length(vecA);
>>         SEXP res_vec,vecR;
>>
>>         PROTECT(res_vec=allocVector(REALSXP, vec_len));
>>         PROTECT(vecR=allocVector(REALSXP, vec_len));
>>         res_ptr=REAL(res_vec);
>>         vecR_ptr=REAL(vecR);
>>         GetRNGstate();
>>         sum=0.0;
>>         for (i=0;i<vecA_len;i++){
>>              vecR_ptr[i]=runif(0,1);
>>              sum+=vecR_ptr[i];
>>         }
>>         if (sum>=5000){
>>            /*copy vecR to the vector to be returned*/
>>            for (i=0;i<vecA_len;i++){
>>                  res_ptr[i]=vecR_ptr[i];
>>            }
>>         }
>>         else{
>>                /*copy vecA to the vector to be returned*/
>>                for (i=0;i<vecA_len;i++){
>>                      res_ptr[i]=vecA_ptr[i];
>>                }
>>         }
>>
>>         PutRNGstate();
>>         UNPROTECT(2);
>>         resturn(res);
>> }
>> My R wrapper function is
>>        change_vecA<-function(vecA){
>>              dyn.load("change_vecA.so");
>>              .Call("change_vecA",vecA);
>>        }
>>
>>         Now my question is, due to two loops (one generates the random
>> vector and one determines the vector to be returned), can .Call still be
>> faster than pure R code (only one loop to copy vecR to vecA given condition
>> is met)? Or, how can I improve my c code to avoid redundant loops if any.
>> My
>> concern is if vecA is large (say of length 1000000 or even bigger), loops
>> in
>> C code can slow things down.  Thanks for any help!
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Call-in-R-tp4080721p4080721.html
>> Sent from the R devel mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
[hidden email]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: .Call in R

Karl Forner
Yes indeed. My mistake.

On Fri, Nov 18, 2011 at 4:45 PM, Joris Meys <[hidden email]> wrote:

> Because if you calculate the probability and then make uniform values,
> nothing guarantees that the sum of those uniform values actually is
> larger than 50,000. You only have 50% chance it is, in fact...
> Cheers
> Joris
>
> On Fri, Nov 18, 2011 at 4:08 PM, Karl Forner <[hidden email]>
> wrote:
> > Hi,
> >
> > A probably very naive remark, but I believe that the probability of sum(
> > runif(10000) ) >= 50000 is exactly 0.5. So why not just test that, and
> > generate the uniform values only if needed ?
> >
> >
> > Karl Forner
> >
> > On Thu, Nov 17, 2011 at 6:09 PM, Raymond <[hidden email]>
> wrote:
> >
> >> Hi R developers,
> >>
> >>    I am new to this forum and hope someone can help me with .Call in R.
> >> Greatly appreciate any help!
> >>
> >>    Say, I have a vector called "vecA" of length 10000, I generate a
> vector
> >> called "vecR" with elements randomly generated from Uniform[0,1]. Both
> vecA
> >> and vecR are of double type. I want to replace elements vecA by
> elements in
> >> vecR only if sum of elements in vecR is greater than or equal to 5000.
> >> Otherwise, vecR remain unchanged. This is easy to do in R, which reads
> >>    vecA<-something;
> >>    vecR<-runif(10000);
> >>    if (sum(vecR)>=5000)){
> >>       vecA<-vecR;
> >>    }
> >>
> >>
> >>    Now my question is, if I am going to do the same thing in R using
> .Call.
> >> How can I achieve it in a more efficient way (i.e. less computation time
> >> compared with pure R code above.).  My c code (called "change_vecA.c")
> >> using
> >> .Call is like this:
> >>
> >>    SEXP change_vecA(SEXP vecA){
> >>         int i,vecA_len;
> >>         double sum,*res_ptr,*vecR_ptr,*vecA_ptr;
> >>
> >>         vecA_ptr=REAL(vecA);
> >>         vecA_len=length(vecA);
> >>         SEXP res_vec,vecR;
> >>
> >>         PROTECT(res_vec=allocVector(REALSXP, vec_len));
> >>         PROTECT(vecR=allocVector(REALSXP, vec_len));
> >>         res_ptr=REAL(res_vec);
> >>         vecR_ptr=REAL(vecR);
> >>         GetRNGstate();
> >>         sum=0.0;
> >>         for (i=0;i<vecA_len;i++){
> >>              vecR_ptr[i]=runif(0,1);
> >>              sum+=vecR_ptr[i];
> >>         }
> >>         if (sum>=5000){
> >>            /*copy vecR to the vector to be returned*/
> >>            for (i=0;i<vecA_len;i++){
> >>                  res_ptr[i]=vecR_ptr[i];
> >>            }
> >>         }
> >>         else{
> >>                /*copy vecA to the vector to be returned*/
> >>                for (i=0;i<vecA_len;i++){
> >>                      res_ptr[i]=vecA_ptr[i];
> >>                }
> >>         }
> >>
> >>         PutRNGstate();
> >>         UNPROTECT(2);
> >>         resturn(res);
> >> }
> >> My R wrapper function is
> >>        change_vecA<-function(vecA){
> >>              dyn.load("change_vecA.so");
> >>              .Call("change_vecA",vecA);
> >>        }
> >>
> >>         Now my question is, due to two loops (one generates the random
> >> vector and one determines the vector to be returned), can .Call still be
> >> faster than pure R code (only one loop to copy vecR to vecA given
> condition
> >> is met)? Or, how can I improve my c code to avoid redundant loops if
> any.
> >> My
> >> concern is if vecA is large (say of length 1000000 or even bigger),
> loops
> >> in
> >> C code can slow things down.  Thanks for any help!
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://r.789695.n4.nabble.com/Call-in-R-tp4080721p4080721.html
> >> Sent from the R devel mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> [hidden email] mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
> >        [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel : +32 9 264 59 87
> [hidden email]
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: .Call in R

Raymond
In reply to this post by Martin Morgan
 I agree with Martin that this might not be suitable for a C solution.