Greetings. I'd like to get some advice about using OpenBLAS with R, rather
than using the BLAS that comes built in to R.

I've tried this on my Fedora 20 system (see the appended for details). I ran
a simple test -- multiplying two large matrices -- and the results were very
impressive, i.e., in favor of OpenBLAS, which is consistent with discussions
I've seen on the web.

My concern is that maybe this is too good to be true. I.e., the standard R
configuration is vetted by thousands of people every day. Can I have the same
degree of confidence with OpenBLAS that I have in the built-in version?

And/or are there other caveats to using OpenBLAS of which I should be aware?

Thanks.

-- Mike

#### Here's the version of R, compiled locally with configuration options:
#### ./configure --enable-R-shlib --enable-BLAS-shlib

$ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
.
.
.

#### Here's the R source code for this little test:

library(microbenchmark)

mSize <- 10000
set.seed(42)

aMat <- matrix(rnorm(mSize * mSize), nrow=mSize)
bMat <- matrix(rnorm(mSize * mSize), nrow=mSize)

cMat <- aMat %*% bMat  ## do the calculation once to see that it works

traceCMat <- sum(diag(cMat))  ## a mild sanity check on the calculation
traceCMat

microbenchmark(aMat %*% bMat, times=5L)  ## repeat a few more times

-----

#### Here is the output from code, running under various conditions:

> traceCMat    ###### Using the built-in BLAS from R
[1] -11367.55
> microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
          expr      min       lq     mean   median       uq     max neval
 aMat %*% bMat 675.0064 675.5325 675.4897 675.5857 675.6618 675.662     5

----------

> traceCMat    ###### Using libopenblas.so from Fedora
[1] -11367.55
> microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
          expr      min       lq     mean   median       uq      max neval
 aMat %*% bMat 70.67843 70.70545 70.76365 70.73026 70.83935 70.86475     5

----------

> traceCMat <- sum(diag(cMat))  ###### libopenblas.so from Fedora with
> traceCMat                     ###### export OMP_NUM_THREADS=6
[1] -11367.55
> microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
          expr      min       lq    mean   median       uq      max neval
 aMat %*% bMat 69.99146 70.02426 70.3466 70.08327 70.39537 71.23866     5

###### Fedora libopenblas.so appears to be single-threaded

----------

> traceCMat <- sum(diag(cMat))  ###### libopenblas.so compiled locally
> traceCMat                     ###### from source w/OMP_NUM_THREADS=6
[1] -11367.55
> microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
          expr      min       lq     mean   median       uq      max neval
 aMat %*% bMat 26.77385 27.10434 27.17862 27.12485 27.16301 27.72705     5

###### Locally-compiled openblas appears to be multi-threaded
###### The microbenchmark appeared to use all 8 processors, even
###### though I asked for only 6.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
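The trace comparison in the post above is, as its author says, only a mild sanity check. A somewhat stronger one, sketched here under assumptions that are not part of the original post (a small matrix size of 200, ten randomly chosen entries, and an arbitrary tolerance of 1e-10), recomputes a few entries of the product with plain vector arithmetic and `sum()`, which do not go through the BLAS matrix-multiply routine, and compares them with a tolerance rather than expecting bitwise equality.

```r
## Sketch: spot-check a BLAS matrix product against an explicit
## element-wise computation. Size, sample count and tolerance are
## illustrative assumptions, not values from the original post.
set.seed(42)
n <- 200
aMat <- matrix(rnorm(n * n), nrow = n)
bMat <- matrix(rnorm(n * n), nrow = n)

cMat <- aMat %*% bMat   ## uses whichever BLAS R is linked against

## Pick ten (row, column) positions and recompute each entry directly:
## c[i, j] = sum over k of a[i, k] * b[k, j]
idx <- cbind(sample(n, 10), sample(n, 10))
direct <- apply(idx, 1, function(ij) sum(aMat[ij[1], ] * bMat[, ij[2]]))

## Compare with a tolerance; different BLAS may round differently
maxRelErr <- max(abs(cMat[idx] - direct) / pmax(abs(direct), 1))
maxRelErr
stopifnot(maxRelErr < 1e-10)
```

Running the same script under each BLAS and comparing `maxRelErr` gives a rough idea of how far each one drifts from the direct computation; it does not replace a proper test suite.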
On 16/11/2014 00:11, Michael Hannon wrote:
> Greetings. I'd like to get some advice about using OpenBLAS with R, rather
> than using the BLAS that comes built in to R.

That was really a topic for the R-devel list: see the posting guide.

> I've tried this on my Fedora 20 system (see the appended for details). I ran
> a simple test -- multiplying two large matrices -- and the results were very
> impressive, i.e., in favor of OpenBLAS, which is consistent with discussions
> I've seen on the web.

If that is all you do, then you should be using an optimized BLAS, and
choose the one(s) best for your (unstated) machine(s).

> My concern is that maybe this is too good to be true. I.e., the standard R
> configuration is vetted by thousands of people every day. Can I have the same
> degree of confidence with OpenBLAS that I have in the built-in version?

No. And it is 'too good to be true' for most users of R, for whom BLAS
operations take a negligible proportion of their CPU time.

> And/or are there other caveats to using OpenBLAS of which I should be aware?

Yes: see the 'R Installation and Administration Manual'. Known issues
include:

1) Optimized BLAS trade accuracy for speed. Surprisingly much published R
code relies on using extended-precision FPU registers for intermediate
results, which optimized BLAS do much less than the reference BLAS.

Some packages rely on a particular sign of the solution to svd or eigen
problems: people then report as bugs that optimized BLAS give a different
sign from the reference BLAS.

2) Fast BLAS normally use multi-threading: that usually helps elapsed time
for a single R task at the expense of increased total CPU time. Fine if you
have unused CPU cores, but not advantageous in a fully-used multi-core
machine, e.g. one that is doing many R sessions in parallel.

3) Many BLAS optimize their use of CPU caches. This works best if the
BLAS-using process is the only task running on a particular core (or CPU
where CPU cores share cache). (It also means that optimizing on one CPU
model and running on another can be disastrous.)

--
Brian D. Ripley, [hidden email]
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
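To illustrate the point in the reply above about code that relies on a particular sign from svd or eigen, here is a minimal sketch of one way to make eigenvector signs deterministic, so that downstream code does not depend on which BLAS/LAPACK produced them. The helper name `normalizeSigns` and the convention it applies (largest-magnitude entry of each column made positive) are assumptions chosen for illustration, not something taken from the thread or from R itself. It only addresses the sign ambiguity; with repeated eigenvalues the eigenvectors are not unique even up to sign.

```r
## Sketch: fix the sign of each eigenvector by convention.
## normalizeSigns is a hypothetical helper, not part of base R.
normalizeSigns <- function(V) {
  ## Row index of the largest-magnitude entry in each column
  pivot <- apply(abs(V), 2, which.max)
  ## Sign of that entry, one per column
  signs <- sign(V[cbind(pivot, seq_len(ncol(V)))])
  ## Multiply column j by signs[j], making the pivot entry positive
  sweep(V, 2, signs, `*`)
}

set.seed(1)
S <- crossprod(matrix(rnorm(25), 5))   ## a symmetric test matrix
e <- eigen(S, symmetric = TRUE)

V1 <- normalizeSigns(e$vectors)
V2 <- normalizeSigns(-e$vectors)       ## mimic a BLAS returning flipped signs

## After normalization the two versions agree
stopifnot(isTRUE(all.equal(V1, V2)))
```

Comparing normalized vectors (or quantities that are invariant to sign, such as the reconstructed matrix) avoids reporting a sign flip as a numerical bug.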
Useful and interesting. Thanks for your prompt reply.
-- Mike

On Sun, Nov 16, 2014 at 2:29 AM, Prof Brian Ripley <[hidden email]> wrote:
> [...]