

Hi,
I would like to compute: A %*% B %*% t(A)
A is a mxn matrix and B is an nxn symmetric, positivedefinite matrix, where m is large relative to n (e.g., m=50,000 and n=100).
Here is a sample code.
M < 10000
N < 100
A < matrix(rnorm(M*N), M, N)
B < crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric positivedefinite matrix
# method 1
system.time(D < A %*% B %*% t(A))
# I can obtain speedup by using a Cholesky decomposition of B
# method 2
system.time({
C < t(chol(B))
E < tcrossprod(A%*%C)
})
all.equal(D, E)
I am wondering how to obtain more substantial speedup. Any suggestions would be greatly appreciated.
Thanks,
Ravi
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi Ravi,
You can achieve substantial speed up by using a faster BLAS (e.g.,
OpenBLAS or MKL), especially on systems with multiple CPUs. On my (6
year old, but 8 core) system your example takes 3.9 seconds with using
the reference BLAS and only 0.9 seconds using OpenBLAS.
Best,
Ista
On Fri, Aug 10, 2018 at 11:46 AM Ravi Varadhan < [hidden email]> wrote:
>
> Hi,
>
> I would like to compute: A %*% B %*% t(A)
>
>
>
> A is a mxn matrix and B is an nxn symmetric, positivedefinite matrix, where m is large relative to n (e.g., m=50,000 and n=100).
>
>
>
> Here is a sample code.
>
>
>
> M < 10000
>
> N < 100
>
> A < matrix(rnorm(M*N), M, N)
>
> B < crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric positivedefinite matrix
>
>
>
> # method 1
>
> system.time(D < A %*% B %*% t(A))
>
>
>
> # I can obtain speedup by using a Cholesky decomposition of B
>
> # method 2
>
> system.time({
>
> C < t(chol(B))
>
> E < tcrossprod(A%*%C)
>
> })
>
>
>
> all.equal(D, E)
>
>
>
> I am wondering how to obtain more substantial speedup. Any suggestions would be greatly appreciated.
>
>
>
> Thanks,
>
> Ravi
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Yeah, you might not be able to go much faster here unless A has some
specialized structure that you can take advantage of (e.g., sparsity)?
On 8/10/18, 11:22 AM, "Ravi Varadhan" < [hidden email]> wrote:
>Hi,
>
>I would like to compute: A %*% B %*% t(A)
>
>
>
>A is a mxn matrix and B is an nxn symmetric, positivedefinite matrix,
>where m is large relative to n (e.g., m=50,000 and n=100).
>
>
>
>Here is a sample code.
>
>
>
>M < 10000
>
>N < 100
>
>A < matrix(rnorm(M*N), M, N)
>
>B < crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric
>positivedefinite matrix
>
>
>
># method 1
>
>system.time(D < A %*% B %*% t(A))
>
>
>
># I can obtain speedup by using a Cholesky decomposition of B
>
># method 2
>
>system.time({
>
>C < t(chol(B))
>
>E < tcrossprod(A%*%C)
>
>})
>
>
>
>all.equal(D, E)
>
>
>
>I am wondering how to obtain more substantial speedup. Any suggestions
>would be greatly appreciated.
>
>
>
>Thanks,
>
>Ravi
>
>
>
> [[alternative HTML version deleted]]
>
>______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp>PLEASE do read the posting guide
> http://www.Rproject.org/postingguide.html>and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi Ravi,
Like Ista, I have also had success in using R with OpenBLAS on our Linux
compute cluster.
This will involve installing R from source. If you'd like, I can send you
the configure settings I used, as well as the environment settings used to
control OpenBLAS multithreading. The CRAN site also has some good
documentation of this.
Note that I had issues using the latest version of OpenBLAS (0.3.x) with R
3.5.1. I'm not sure if you would have the same issues as me, but to be safe
I would suggest using OpenBLAS 0.2.x instead.
Peter
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi Ista,
Thank you for the response. I use Windows. Is there a precompiled version of openBLAS for windows that would make it easy for me to use it?
Thanks,
Ravi
Original Message
From: Ista Zahn < [hidden email]>
Sent: Friday, August 10, 2018 12:20 PM
To: Ravi Varadhan < [hidden email]>
Cc: [hidden email]
Subject: Re: [R] Fast matrix multiplication
Hi Ravi,
You can achieve substantial speed up by using a faster BLAS (e.g., OpenBLAS or MKL), especially on systems with multiple CPUs. On my (6 year old, but 8 core) system your example takes 3.9 seconds with using the reference BLAS and only 0.9 seconds using OpenBLAS.
Best,
Ista
On Fri, Aug 10, 2018 at 11:46 AM Ravi Varadhan < [hidden email]> wrote:
>
> Hi,
>
> I would like to compute: A %*% B %*% t(A)
>
>
>
> A is a mxn matrix and B is an nxn symmetric, positivedefinite matrix, where m is large relative to n (e.g., m=50,000 and n=100).
>
>
>
> Here is a sample code.
>
>
>
> M < 10000
>
> N < 100
>
> A < matrix(rnorm(M*N), M, N)
>
> B < crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric
> positivedefinite matrix
>
>
>
> # method 1
>
> system.time(D < A %*% B %*% t(A))
>
>
>
> # I can obtain speedup by using a Cholesky decomposition of B
>
> # method 2
>
> system.time({
>
> C < t(chol(B))
>
> E < tcrossprod(A%*%C)
>
> })
>
>
>
> all.equal(D, E)
>
>
>
> I am wondering how to obtain more substantial speedup. Any suggestions would be greatly appreciated.
>
>
>
> Thanks,
>
> Ravi
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide
> http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On Mon, Aug 13, 2018 at 2:41 PM Ravi Varadhan < [hidden email]> wrote:
>
> Hi Ista,
> Thank you for the response. I use Windows. Is there a precompiled version of openBLAS for windows that would make it easy for me to use it?
Not sure. If you want an easy way I would use MRO. More info at
https://mran.microsoft.com/rro#intelmkl1Ista
> Thanks,
> Ravi
>
> Original Message
> From: Ista Zahn < [hidden email]>
> Sent: Friday, August 10, 2018 12:20 PM
> To: Ravi Varadhan < [hidden email]>
> Cc: [hidden email]
> Subject: Re: [R] Fast matrix multiplication
>
>
> Hi Ravi,
>
> You can achieve substantial speed up by using a faster BLAS (e.g., OpenBLAS or MKL), especially on systems with multiple CPUs. On my (6 year old, but 8 core) system your example takes 3.9 seconds with using the reference BLAS and only 0.9 seconds using OpenBLAS.
>
> Best,
> Ista
> On Fri, Aug 10, 2018 at 11:46 AM Ravi Varadhan < [hidden email]> wrote:
> >
> > Hi,
> >
> > I would like to compute: A %*% B %*% t(A)
> >
> >
> >
> > A is a mxn matrix and B is an nxn symmetric, positivedefinite matrix, where m is large relative to n (e.g., m=50,000 and n=100).
> >
> >
> >
> > Here is a sample code.
> >
> >
> >
> > M < 10000
> >
> > N < 100
> >
> > A < matrix(rnorm(M*N), M, N)
> >
> > B < crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric
> > positivedefinite matrix
> >
> >
> >
> > # method 1
> >
> > system.time(D < A %*% B %*% t(A))
> >
> >
> >
> > # I can obtain speedup by using a Cholesky decomposition of B
> >
> > # method 2
> >
> > system.time({
> >
> > C < t(chol(B))
> >
> > E < tcrossprod(A%*%C)
> >
> > })
> >
> >
> >
> > all.equal(D, E)
> >
> >
> >
> > I am wondering how to obtain more substantial speedup. Any suggestions would be greatly appreciated.
> >
> >
> >
> > Thanks,
> >
> > Ravi
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list  To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/rhelp> > PLEASE do read the posting guide
> > http://www.Rproject.org/postingguide.html> > and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On Mon, Aug 13, 2018 at 12:18 PM Ista Zahn < [hidden email]> wrote:
>
> On Mon, Aug 13, 2018 at 2:41 PM Ravi Varadhan < [hidden email]> wrote:
> >
> > Hi Ista,
> > Thank you for the response. I use Windows. Is there a precompiled version of openBLAS for windows that would make it easy for me to use it?
>
> Not sure. If you want an easy way I would use MRO. More info at
> https://mran.microsoft.com/rro#intelmkl1OpenBLAS is provided as a binary for Windows, see http://www.openblas.net/ .
You may need to compile R from source though, unless you can use an
equivalent of the linux trick to replace libRblas.so with a symlink to
the compiled openBLAS library.
Peter
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Does Microsoft open R come with precompiled BLAS that is optimized for matrix computations?
Thanks,
Ravi
Original Message
From: Ista Zahn < [hidden email]>
Sent: Monday, August 13, 2018 3:18 PM
To: Ravi Varadhan < [hidden email]>
Cc: [hidden email]
Subject: Re: [R] Fast matrix multiplication
On Mon, Aug 13, 2018 at 2:41 PM Ravi Varadhan < [hidden email]> wrote:
>
> Hi Ista,
> Thank you for the response. I use Windows. Is there a precompiled version of openBLAS for windows that would make it easy for me to use it?
Not sure. If you want an easy way I would use MRO. More info at
https://mran.microsoft.com/rro#intelmkl1Ista
> Thanks,
> Ravi
>
> Original Message
> From: Ista Zahn < [hidden email]>
> Sent: Friday, August 10, 2018 12:20 PM
> To: Ravi Varadhan < [hidden email]>
> Cc: [hidden email]
> Subject: Re: [R] Fast matrix multiplication
>
>
> Hi Ravi,
>
> You can achieve substantial speed up by using a faster BLAS (e.g., OpenBLAS or MKL), especially on systems with multiple CPUs. On my (6 year old, but 8 core) system your example takes 3.9 seconds with using the reference BLAS and only 0.9 seconds using OpenBLAS.
>
> Best,
> Ista
> On Fri, Aug 10, 2018 at 11:46 AM Ravi Varadhan < [hidden email]> wrote:
> >
> > Hi,
> >
> > I would like to compute: A %*% B %*% t(A)
> >
> >
> >
> > A is a mxn matrix and B is an nxn symmetric, positivedefinite matrix, where m is large relative to n (e.g., m=50,000 and n=100).
> >
> >
> >
> > Here is a sample code.
> >
> >
> >
> > M < 10000
> >
> > N < 100
> >
> > A < matrix(rnorm(M*N), M, N)
> >
> > B < crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric
> > positivedefinite matrix
> >
> >
> >
> > # method 1
> >
> > system.time(D < A %*% B %*% t(A))
> >
> >
> >
> > # I can obtain speedup by using a Cholesky decomposition of B
> >
> > # method 2
> >
> > system.time({
> >
> > C < t(chol(B))
> >
> > E < tcrossprod(A%*%C)
> >
> > })
> >
> >
> >
> > all.equal(D, E)
> >
> >
> >
> > I am wondering how to obtain more substantial speedup. Any suggestions would be greatly appreciated.
> >
> >
> >
> > Thanks,
> >
> > Ravi
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list  To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/rhelp> > PLEASE do read the posting guide
> > http://www.Rproject.org/postingguide.html> > and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On Tue, Aug 14, 2018 at 9:41 AM Ravi Varadhan < [hidden email]> wrote:
>
> Does Microsoft open R come with precompiled BLAS that is optimized for matrix computations?
Yes, see https://mran.microsoft.com/rro#intelmkl1 for details.
Ista
>
> Thanks,
> Ravi
>
> Original Message
> From: Ista Zahn < [hidden email]>
> Sent: Monday, August 13, 2018 3:18 PM
> To: Ravi Varadhan < [hidden email]>
> Cc: [hidden email]
> Subject: Re: [R] Fast matrix multiplication
>
>
> On Mon, Aug 13, 2018 at 2:41 PM Ravi Varadhan < [hidden email]> wrote:
> >
> > Hi Ista,
> > Thank you for the response. I use Windows. Is there a precompiled version of openBLAS for windows that would make it easy for me to use it?
>
> Not sure. If you want an easy way I would use MRO. More info at
> https://mran.microsoft.com/rro#intelmkl1>
> Ista
>
> > Thanks,
> > Ravi
> >
> > Original Message
> > From: Ista Zahn < [hidden email]>
> > Sent: Friday, August 10, 2018 12:20 PM
> > To: Ravi Varadhan < [hidden email]>
> > Cc: [hidden email]
> > Subject: Re: [R] Fast matrix multiplication
> >
> >
> > Hi Ravi,
> >
> > You can achieve substantial speed up by using a faster BLAS (e.g., OpenBLAS or MKL), especially on systems with multiple CPUs. On my (6 year old, but 8 core) system your example takes 3.9 seconds with using the reference BLAS and only 0.9 seconds using OpenBLAS.
> >
> > Best,
> > Ista
> > On Fri, Aug 10, 2018 at 11:46 AM Ravi Varadhan < [hidden email]> wrote:
> > >
> > > Hi,
> > >
> > > I would like to compute: A %*% B %*% t(A)
> > >
> > >
> > >
> > > A is a mxn matrix and B is an nxn symmetric, positivedefinite matrix, where m is large relative to n (e.g., m=50,000 and n=100).
> > >
> > >
> > >
> > > Here is a sample code.
> > >
> > >
> > >
> > > M < 10000
> > >
> > > N < 100
> > >
> > > A < matrix(rnorm(M*N), M, N)
> > >
> > > B < crossprod(matrix(rnorm(N*N), N, N)) # creating a symmetric
> > > positivedefinite matrix
> > >
> > >
> > >
> > > # method 1
> > >
> > > system.time(D < A %*% B %*% t(A))
> > >
> > >
> > >
> > > # I can obtain speedup by using a Cholesky decomposition of B
> > >
> > > # method 2
> > >
> > > system.time({
> > >
> > > C < t(chol(B))
> > >
> > > E < tcrossprod(A%*%C)
> > >
> > > })
> > >
> > >
> > >
> > > all.equal(D, E)
> > >
> > >
> > >
> > > I am wondering how to obtain more substantial speedup. Any suggestions would be greatly appreciated.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Ravi
> > >
> > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > [hidden email] mailing list  To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/rhelp> > > PLEASE do read the posting guide
> > > http://www.Rproject.org/postingguide.html> > > and provide commented, minimal, selfcontained, reproducible code.
> >
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

