_Rout.txt files. Notice that in the second _Rout.txt file the order of

the arguments matters; the correct order has the pointer first, then

the dimensions. For the first _Rout.txt file this part of the code is

not used.

> On Thu, Jul 14, 2011 at 10:21 AM, Alireza Mahani <[hidden email]> wrote:

>> (I am using a LINUX machine)

>>

>> Jeff,

>>

>> In creating reproducible results, I 'partially' answered my question. I have

>> attached two scripts, 'mvMultiply.r' and 'mvMultiply.cc'. Please copy both

>> files into your chosen directory, then run 'Rscript mvMultiply.r' in that

>> directory while changing the two boolean parameters 'INCLUDE_DATAPREP' and

>> 'ROWMAJOR' to all four permutations. (The variable 'diffVec' is there to

>> verify that the two methods produce the same output vector.)

>>

>> Below are the results that I get, along with discussion (tR and tCall are in

>> sec):

>>

>> INCLUDE_DATAPREP,ROWMAJOR,tR,tCall

>> F,F,13.536,13.875

>> F,T,13.824,14.299

>> T,F,13.688,18.167

>> T,T,13.982,30.730

>>

>> Interpretation: the execution time for the '.Call' line is nearly identical

>> to that of the call to the R operator '%*%'. The overhead for the '.Call'

>> method comes from two data-preparation lines:

>>

>> A <- t(A) (~12sec, or 12usec per call)

>> dim(A) <- dim(A)[1] * dim(A)[2] (~4sec, or 4usec per call)

>>

>> While the first line can be avoided by providing options in the C++ function

>> (as is done in the BLAS API), I wonder if the second line can be avoided,

>> aside from the obvious option of rewriting the R scripts to use vectors

>> instead of matrices. But that defeats one of the primary advantages of using

>> R, which is succinct, high-level coding. In particular, if one has several

>> matrices as input to a .Call function, the overhead from matrix-to-vector

>> transformations can add up. To summarize, my questions are:

>>

>> 1- Do the above results seem reasonable to you? Is there a similar penalty

>> in R's '%*%' operator for transforming matrices to vectors before calling

>> BLAS functions?

>> 2- Are there techniques for reducing the above overhead for developers

>> looking to augment their R code with compiled code?

>>

>> Regards,

>> Alireza

>>

>> ---------------------------------------

>> # mvMultiply.r

>> # comparing performance of matrix multiplication in R (using '%*%' operator)

>> # vs. calling compiled code (using the .Call function)

>> # y [m x 1] = A [m x n] %*% x [n x 1]

>>

>> rm(list = ls())

>> system("R CMD SHLIB mvMultiply.cc")

>> dyn.load("mvMultiply.so")

>>

>> INCLUDE_DATAPREP <- F

>> ROWMAJOR <- F # indicates whether the C++ function treats A in a row-major

>> # or column-major fashion

>>

>> m <- 100

>> n <- 10

>> N <- 1000000

>>

>> diffVec <- array(0, dim = N)

>> tR <- 0.0

>> tCall <- 0.0

>> for (i in 1:N) {

>>

>>   A <- runif(m*n); dim(A) <- c(m,n)

>>   x <- runif(n)

>>

>>   t1 <- proc.time()[3]

>>   y1 <- A %*% x

>>   tR <- tR + proc.time()[3] - t1

>>

>>   if (INCLUDE_DATAPREP) {

>>     t1 <- proc.time()[3]

>>   }

>>   # since R will convert the matrix to a vector in column-major order, if the

>>   # C++ function expects a row-major format we need to transpose A before

>>   # converting it to a vector

>>   if (ROWMAJOR) {

>>     A <- t(A)

>>   }

>>   dim(A) <- dim(A)[1] * dim(A)[2]

>>   if (!INCLUDE_DATAPREP) {

>>     t1 <- proc.time()[3]

>>   }

>>   y2 <- .Call("matvecMultiply", as.double(A), as.double(x),

>>               as.logical(c(ROWMAJOR)))

>>   tCall <- tCall + proc.time()[3] - t1

>>

>>   diffVec[i] <- max(abs(y2 - y1))

>> }

>> cat("Data prep time for '.Call' included: ", INCLUDE_DATAPREP, "\n")

>> cat("C++ function expects row-major matrix: ", ROWMAJOR, "\n")

>> cat("Time - Using '%*%' operator in R: ", tR, "sec\n")

>> cat("Time - Using '.Call' function: ", tCall, "sec\n")

>> cat("Maximum difference between methods: ", max(diffVec), "\n")

>>

>> dyn.unload("mvMultiply.so")

>> ---------------------------------------

>> // mvMultiply.cc

>> #include <Rinternals.h>

>> #include <R.h>

>>

>> extern "C" {

>>

>> SEXP matvecMultiply(SEXP A, SEXP x, SEXP rowmajor) {

>>   double *rA = REAL(A), *rx = REAL(x), *ry;

>>   int *rrm = LOGICAL(rowmajor);

>>   int n = length(x);

>>   int m = length(A) / n;

>>   SEXP y;

>>   PROTECT(y = allocVector(REALSXP, m));

>>   ry = REAL(y);

>>   for (int i = 0; i < m; i++) {

>>     ry[i] = 0.0;

>>     for (int j = 0; j < n; j++) {

>>       if (rrm[0] == 1) {

>>         ry[i] += rA[i * n + j] * rx[j];

>>       } else {

>>         ry[i] += rA[j * m + i] * rx[j];

>>       }

>>     }

>>   }

>>   UNPROTECT(1);

>>   return(y);

>> }

>>

>> }

>>

>

> I realize that you are just beginning to use the .C and .Call

> interfaces and your example is therefore a simple one. However, if

> you plan to continue with such development it is worthwhile learning

> of some of the tools available. I think one of the most important is

> the "inline" package that can take a C or C++ code segment as a text

> string and go through all the steps of creating and loading a

> .Call'able compiled function.

>
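
For concreteness, here is a minimal sketch of that inline workflow. It is not
taken from the original exchange: the function name 'matvec' and the argument
names are invented for illustration. Note that because the matrix is handed to
the compiled code as-is, its dimensions can be read from the dim attribute on
the C side, which also sidesteps the t(A) and dim(A) <- ... data-preparation
lines discussed in the first message.

library(inline)

code <- '
    int m = Rf_nrows(A), n = Rf_ncols(A);      /* A keeps its dim attribute */
    double *rA = REAL(A), *rx = REAL(x);
    SEXP y = PROTECT(Rf_allocVector(REALSXP, m));
    double *ry = REAL(y);
    for (int i = 0; i < m; i++) {
        ry[i] = 0.0;
        for (int j = 0; j < n; j++)
            ry[i] += rA[j * m + i] * rx[j];    /* R matrices are column-major */
    }
    UNPROTECT(1);
    return y;
'
## compiles, links and loads a .Call-able function in one step
matvec <- cxxfunction(signature(A = "matrix", x = "numeric"), code)

A <- matrix(runif(1000), 100, 10)
x <- runif(10)
all.equal(drop(A %*% x), matvec(A, x))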

> Second, if you are going to use C++ (the code you show could be C code

> as it doesn't use any C++ extensions) then you should look at the Rcpp

> package written by Dirk Eddelbuettel and Romain Francois which allows

> for comparatively painless interfacing of R objects and C++ objects.

> The Rcpp-devel list, which I have copied on this reply, is for

> questions related to that system. The inline package allows for

> various "plugin" constructions to wrap your code in the appropriate

> headers and point the compiler to the locations of header files and

> libraries. There are two extensions to Rcpp for numerical linear

> algebra in C++, RcppArmadillo and RcppEigen. I show the use of

> RcppEigen here.

>
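
As a rough illustration of that interface (again a sketch of mine, not code
from the thread; the names 'rcppMatvec', 'AA' and 'xx' are arbitrary), the same
computation written with plain Rcpp classes through inline's Rcpp plugin might
look like this:

library(inline)
library(Rcpp)

rcppCode <- '
    Rcpp::NumericMatrix A(AA);     // thin wrappers around the R objects
    Rcpp::NumericVector x(xx);
    int m = A.nrow(), n = A.ncol();
    Rcpp::NumericVector y(m);      // allocated by Rcpp, initialized to zero
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            y[i] += A(i, j) * x[j];    // traverse A in column-major order
    return Rcpp::wrap(y);
'
rcppMatvec <- cxxfunction(signature(AA = "matrix", xx = "numeric"),
                          rcppCode, plugin = "Rcpp")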

> Third, there are several packages in R that do the busy work of

> benchmarking expressions and neatly formatting the results. I use

> the rbenchmark package.

>
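
A short sketch of what that can look like, assuming the hypothetical 'matvec'
function from the inline sketch above has already been compiled:

library(rbenchmark)

A <- matrix(runif(1000), 100, 10)
x <- runif(10)

## each expression is evaluated 'replications' times and timed as a whole
benchmark("%*%"   = A %*% x,
          ".Call" = matvec(A, x),    # matvec: the inline-compiled sketch above
          replications = 100000,
          columns = c("test", "replications", "elapsed", "relative"),
          order = "elapsed")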

> Putting all these together yields the enclosed script and results.

>

> In Eigen, a MatrixXd object is the equivalent of R's numeric matrix

> (similarly MatrixXi for integer and MatrixXcd for complex) and a

> VectorXd object is the equivalent of a numeric vector. A "mapped"

> matrix or vector is one that uses the storage allocated by R, thereby

> avoiding a copy operation (similar to your accessing elements of the

> arrays through the pointer returned by REAL()). To adhere to R's

> functional programming semantics it is a good idea to declare such

> objects as const. The 'as' and 'wrap' functions are provided by Rcpp

> with extensions in RcppEigen to the Eigen classes in the development

> version. In the released versions of Rcpp and RcppEigen we use

> intermediate Rcpp objects. These functions have the advantage of

> checking the types of R objects being passed. The Eigen code for

> matrix multiplication will check the consistency of dimensions in the

> operation.

>
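
To make that description concrete, a rough sketch along those lines (my own
reconstruction, not the script that produced the enclosed results; the exact
as<>/wrap support for the Map types depends on the RcppEigen version, as noted
above):

library(inline)
library(RcppEigen)

eigenCode <- '
    using Eigen::Map;
    using Eigen::MatrixXd;
    using Eigen::VectorXd;
    // mapped objects reuse the memory R allocated, so no copy is made;
    // const documents that the data owned by R is not modified
    const Map<MatrixXd> A(Rcpp::as<Map<MatrixXd> >(AA));
    const Map<VectorXd> x(Rcpp::as<Map<VectorXd> >(xx));
    VectorXd y = A * x;        // Eigen checks that the dimensions conform
    return Rcpp::wrap(y);
'
eigenMatvec <- cxxfunction(signature(AA = "matrix", xx = "numeric"),
                           eigenCode, plugin = "RcppEigen")

A <- matrix(runif(1000), 100, 10)
x <- runif(10)
all.equal(drop(A %*% x), eigenMatvec(A, x))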

> Given the size of the matrix you are working with it is not surprising

> that interpretation overhead and checking will be a large part of the

> elapsed time, hence the relative differences between different methods

> of doing the numerical calculation will be small. The operation of

> multiplying a 100 x 10 matrix by a 10-vector involves "only" 1000

> floating point operations. Furthermore, each element of the matrix is

> used only once so sophisticated methods of manipulating cache contents

> won't buy you much. These benchmark results are from a system that

> uses Atlas BLAS (basic linear algebra subroutines); other systems will

> provide different results. Interestingly, I found that on some systems

> using R's reference BLAS, which is not accelerated, the R code is closer

> in speed to the code using Eigen. An example is given in the second

> version of the output.

>