Using OpenBLAS with R


Using OpenBLAS with R

Michael Hannon-2
Greetings.  I'd like to get some advice about using OpenBLAS with R, rather
than using the BLAS that comes built in to R.

I've tried this on my Fedora 20 system (see the appended for details).  I ran
a simple test -- multiplying two large matrices -- and the results were very
impressive, i.e., in favor of OpenBLAS, which is consistent with discussions
I've seen on the web.

My concern is that maybe this is too good to be true.  I.e., the standard R
configuration is vetted by thousands of people every day.  Can I have the same
degree of confidence with OpenBLAS that I have in the built-in version?

And/or are there other caveats to using OpenBLAS of which I should be aware?

Thanks.

-- Mike

#### Here's the version of R, compiled locally with configuration options:
#### ./configure --enable-R-shlib --enable-BLAS-shlib

$ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
.
.
.
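
#### Since R was configured with --enable-BLAS-shlib, the BLAS lives in its
#### own shared library (libRblas.so) and can be swapped without rebuilding R.
#### A minimal sketch of one common way to do the swap -- all paths are
#### assumptions (a default /usr/local source install, Fedora openblas-devel),
#### not taken from this post:

```shell
cd /usr/local/lib64/R/lib                 # R's lib dir for a /usr/local install
mv libRblas.so libRblas.so.reference      # keep the reference BLAS around
ln -s /usr/lib64/libopenblas.so libRblas.so
# Revert with:
#   rm libRblas.so && mv libRblas.so.reference libRblas.so
```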

#### Here's the R source code for this little test:

library(microbenchmark)

mSize <- 10000
set.seed(42)

aMat <- matrix(rnorm(mSize * mSize), nrow=mSize)
bMat <- matrix(rnorm(mSize * mSize), nrow=mSize)

cMat <- aMat %*% bMat  ## do the calculation once to see that it works

traceCMat <- sum(diag(cMat))  ## a mild sanity check on the calculation
traceCMat

microbenchmark(aMat %*% bMat, times=5L)  ## repeat a few more times
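
#### One quick check (a sketch, not part of the original test script) of which
#### BLAS the running R process actually loaded, using /proc on Linux:

```r
## Linux-only: list BLAS-like shared objects mapped into this R process,
## to confirm which BLAS is really in use.
maps  <- readLines(sprintf("/proc/%d/maps", Sys.getpid()))
paths <- sub(".*\\s", "", maps)                 # last field is the file path
unique(grep("blas", basename(paths), value = TRUE, ignore.case = TRUE))
```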

-----

#### Here is the output from the code, run under various conditions:

> traceCMat ###### Using the built-in BLAS from R
[1] -11367.55
> microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
          expr      min       lq     mean   median       uq     max neval
 aMat %*% bMat 675.0064 675.5325 675.4897 675.5857 675.6618 675.662     5

----------

> traceCMat  ###### Using libopenblas.so from Fedora
[1] -11367.55
> microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
          expr      min       lq     mean   median       uq      max neval
 aMat %*% bMat 70.67843 70.70545 70.76365 70.73026 70.83935 70.86475     5
>

----------

> traceCMat <- sum(diag(cMat))  ###### libopenblas.so from Fedora with
> traceCMat                     ###### export OMP_NUM_THREADS=6
[1] -11367.55
> microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
          expr      min       lq    mean   median       uq      max neval
 aMat %*% bMat 69.99146 70.02426 70.3466 70.08327 70.39537 71.23866     5
>

###### Fedora libopenblas.so appears to be single-threaded

----------

> traceCMat <- sum(diag(cMat))  ###### libopenblas.so compiled locally
> traceCMat                     ###### from source w/OMP_NUM_THREADS=6
[1] -11367.55
> microbenchmark(aMat %*% bMat, times=5L)
Unit: seconds
          expr      min       lq     mean   median       uq      max neval
 aMat %*% bMat 26.77385 27.10434 27.17862 27.12485 27.16301 27.72705     5
>

###### Locally-compiled openblas appears to be multi-threaded
###### The microbenchmark appeared to use all 8 processors, even
###### though I asked for only 6.
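
###### A plausible explanation (my assumption, not stated above): OpenBLAS
###### only honours OMP_NUM_THREADS when built with OpenMP (USE_OPENMP=1);
###### a build using its own pthreads pool is controlled by
###### OPENBLAS_NUM_THREADS instead, which would explain the cap of 6 being
###### ignored.  To pin the thread count explicitly:

```shell
# OpenBLAS thread-control variables, in decreasing order of precedence:
export OPENBLAS_NUM_THREADS=6   # honoured by all OpenBLAS builds
export GOTO_NUM_THREADS=6       # older GotoBLAS-era name, still recognised
export OMP_NUM_THREADS=6        # only effective for USE_OPENMP=1 builds
```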

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Using OpenBLAS with R

Prof Brian Ripley
On 16/11/2014 00:11, Michael Hannon wrote:
> Greetings.  I'd like to get some advice about using OpenBLAS with R, rather
> than using the BLAS that comes built in to R.

That was really a topic for the R-devel list: see the posting guide.

> I've tried this on my Fedora 20 system (see the appended for details).  I ran
> a simple test -- multiplying two large matrices -- and the results were very
> impressive, i.e., in favor of OpenBLAS, which is consistent with discussions
> I've seen on the web.

If that is all you do, then you should be using an optimized BLAS, and
choose the one(s) best for your (unstated) machine(s).

> My concern is that maybe this is too good to be true.  I.e., the standard R
> configuration is vetted by thousands of people every day.  Can I have the same
> degree of confidence with OpenBLAS that I have in the built-in version?

No.  And it is 'too good to be true' for most users of R, for whom BLAS
operations take a negligible proportion of their CPU time.

> And/or are there other caveats to using OpenBLAS of which I should be aware?

Yes: see the 'R Installation and Administration Manual'.  Known issues
include:

1) Optimized BLAS trade accuracy for speed.  A surprising amount of
published R code relies on extended-precision FPU registers for
intermediate results, which optimized BLAS use much less than the
reference BLAS.

Some packages rely on a particular sign of the solution to svd or eigen
problems: people then report as bugs that optimized BLAS give a
different sign from the reference BLAS.
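
One defensive idiom (a sketch, not from this thread; the function name is
mine): since eigenvectors and singular vectors are only determined up to
sign, code can normalise the signs itself rather than depend on whatever a
particular BLAS/LAPACK returns, e.g. by forcing the largest-magnitude entry
of each vector to be positive:

```r
## Force each column's largest-magnitude entry to be positive, so results
## agree across BLAS/LAPACK implementations.
normalizeSigns <- function(V) {
  s <- apply(V, 2, function(v) sign(v[which.max(abs(v))]))
  sweep(V, 2, s, `*`)
}
set.seed(1)
A <- crossprod(matrix(rnorm(25), 5))   # symmetric 5 x 5 test matrix
V <- normalizeSigns(eigen(A)$vectors)
```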

2) Fast BLAS normally use multi-threading: that usually helps elapsed
time for a single R task at the expense of increased total CPU time.
Fine if you have unused CPU cores, but not advantageous in a fully-used
multi-core machine, e.g. one that is doing many R sessions in parallel.
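
For reference, one run-time option (an assumption on my part; the package is
not mentioned in this thread) is the RhpcBLASctl package from CRAN, which
can query and set the BLAS thread count from within R:

```r
## Requires the RhpcBLASctl package (an assumption, not part of this thread).
## Useful on a shared machine: drop to a serial BLAS for the current session.
library(RhpcBLASctl)
blas_get_num_procs()       # threads the BLAS would currently use
blas_set_num_threads(1)    # serial BLAS, leaving cores for other sessions
```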

3) Many BLAS optimize their use of CPU caches.  This works best if the
BLAS-using process is the only task running on a particular core (or CPU
where CPU cores share cache).  (It also means that optimizing on one CPU
model and running on another can be disastrous.)




--
Brian D. Ripley,                  [hidden email]
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK


Re: Using OpenBLAS with R

Michael Hannon-2
Useful and interesting.  Thanks for your prompt reply.

-- Mike

On Sun, Nov 16, 2014 at 2:29 AM, Prof Brian Ripley
<[hidden email]> wrote:

