> On Mon, Mar 9, 2015 at 10:40 AM, Duncan Murdoch <[hidden email]> wrote:

> It's now on the main site at CRAN, and should propagate to the mirrors

> reasonably quickly. I'm hoping that tomorrow's R-devel build will use it,

> but there may be some last minute problems.

Using Rtools 3.3, once it had propagated through the CRAN servers, I
successfully built a 64-bit version of R on Windows 7, up through make
rinstaller. This build uses ICU_531 and links to 64-bit OpenBLAS 2.13
(4 threads).

As with yesterday's build using 4.9.2-seh (although that one left ICU
out), the only thing that seems to fail in make check-all is internet
connectivity, which is disabled by default. Loading R and calling
setInternet2() fixes that, and I plan to use the options built into the
installer I create to have it set at install time (like SDI). Would it
be possible to expose that setting in MkRules.dist so that it can be
set at compile time?

I also built microbenchmark, which requires packages ‘colorspace’,
‘Rcpp’, ‘stringr’, ‘RColorBrewer’, ‘dichromat’, ‘munsell’, ‘labeling’,
‘plyr’, ‘digest’, ‘gtable’, ‘reshape2’, ‘scales’, ‘proto’, and
‘ggplot2’, and they all worked fine. For what it is worth, I forgot to
uncomment (unhash) Hsiu-Khuern's addition to the NM filter, yet Rcpp
built fine and compiled C++ code fine as well, although about 3%-5%
slower than what I recall from last night's seh version.

So, apart from this hiccup of somehow now needing internet2 (which may
have to do with Microsoft Windows patches, for all I know), which
cannot be set as a default, the toolchain seems to be behaving well! I
have not tried building with curl, though; that looks a bit hairier,
although it may address the internet2 issue. Who knows.

For interest's sake, below is a comparison of speed across various
versions/compilers. The takeaway for me is that for matrix code a fast
BLAS is significantly more important than which version of GCC and
which exception handling model is used. For non-BLAS-specific code, at
least on my machine, the SJLJ build performed about 1%-2% *faster*. Go
figure! Maybe someone will run Simon Urbanek's benchmark against them.

Regardless, I'm much less apprehensive about 3.2's release in April.

Thank you, Duncan and all!

Avi

== Speed results compiled over a few months (except for the last two) ==

For the record, all code was run on an Intel i7-2600K overclocked to
4.6 GHz, with 16 GB RAM, under 64-bit Windows 7. Matrices A and B are
1000x1000 dense matrices, of which A is positive semi-definite and B is
not. I use this to test BLAS builds. I hope that the fixed width works
in plain text mode.
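The CSV files themselves are not attached to this message. As a rough sketch only, matrices with the properties described above could be generated along the following lines. The helper name `make_test_matrices`, the seed, and the use of `crossprod()` plus a small ridge are assumptions made for illustration; this is not how the original A.csv and B.csv were produced.

```r
# Hypothetical helper (not from the original post): builds a symmetric
# positive definite A and a generic dense B of the same size.
make_test_matrices <- function(n = 1000L, seed = 42L) {
  set.seed(seed)
  X <- matrix(rnorm(n * n), nrow = n)
  # crossprod(X) is t(X) %*% X, which is symmetric positive semi-definite;
  # the small ridge keeps it strictly positive definite so chol(A) succeeds.
  A <- crossprod(X) / n + diag(1e-6, n)
  # A plain Gaussian matrix is almost surely neither symmetric nor PSD.
  B <- matrix(rnorm(n * n), nrow = n)
  list(A = A, B = B)
}

# Smaller than 1000x1000 so the demonstration runs quickly.
m <- make_test_matrices(n = 200L)
```

Writing `m$A` and `m$B` out with `write.csv(..., row.names = FALSE)` would give files that the test code in this message can read back with `read.csv()`.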

=== Non-BLAS dependent ===

#Test code
library(microbenchmark)
A <- as.matrix(read.csv(file="F:/R/A.csv", colClasses='numeric'))
B <- as.matrix(read.csv(file="F:/R/B.csv", colClasses='numeric'))
colnames(A) <- colnames(B) <- NULL
Z <- microbenchmark(A + 2, A - 2, A * 2, A / 2, A + B, A - B, A * B, A / B,
                    A ^ 2, sqrt(A), control=list(order = 'block'), times = 1000L)

R-devel_2015-03-08 compiled using x86_64-4.9.2-release-win32-seh-rt_v3-rev1
(EOPTS = -O3 -march=native -mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC 4.9.1 (MinGW-64)

Unit: microseconds
   expr       min        lq      mean    median        uq      max neval
  A + 2   923.001  1844.215  2205.385  1858.957  1990.900 21714.18  1000
  A - 2  1742.652  1830.215  2196.901  1844.810  2507.798 21778.22  1000
  A * 2  1743.247  1843.023  2208.374  1860.298  2547.112 21776.43  1000
    A/2  2025.598  2111.375  2438.503  2122.097  2701.243 22034.06  1000
  A + B  2016.662  2124.182  2554.006  2143.690  2948.896 21964.07  1000
  A - B  2004.153  2103.930  2527.219  2128.203  2982.552 22295.27  1000
  A * B  2023.215  2119.715  2540.680  2141.010  3154.553 22074.27  1000
    A/B  3256.265  3354.700  3633.556  3368.252  3953.950 23189.67  1000
    A^2  1745.332  1835.279  2204.023  1850.469  2554.856 21869.66  1000
sqrt(A) 49945.064 50066.434 50506.344 50187.356 50883.403 70006.25  1000

R-devel_2015-03-09 compiled using Rtools 3.3 (GCC 4.9.2, SJLJ,
EOPTS = -O3 -march=native -mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC 4.9.1 (MinGW-64)

Unit: microseconds
   expr       min        lq      mean    median        uq      max neval
  A + 2   925.980  1777.350  2167.326  1791.795  2384.641 21660.28  1000
  A - 2  1673.256  1777.648  2188.756  1806.687  2670.715 21724.01  1000
  A * 2  1680.999  1786.434  2221.432  1835.130  2766.916 22254.16  1000
    A/2  1992.836  2085.165  2450.455  2108.694  2865.203 22803.08  1000
  A + B  1977.646  2089.632  2559.912  2121.204  3031.397 22884.99  1000
  A - B  1979.135  2081.591  2516.943  2101.398  3003.548 22377.77  1000
  A * B  1971.689  2073.699  2510.912  2092.462  2921.345 22308.37  1000
    A/B  3247.031  3345.169  3633.351  3361.402  3941.590 23231.97  1000
    A^2  1668.788  1771.244  2169.422  1788.220  2745.026 21786.86  1000
sqrt(A) 48662.871 48805.537 49357.270 49003.003 49715.283 69269.10  1000
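To put a number on the SEH versus SJLJ comparison for the non-BLAS code, the median columns of the two tables above can be compared directly. The snippet below is only a sketch of that arithmetic; the variable names are illustrative, and the values are copied from the median columns above.

```r
# Median timings in microseconds, copied from the two tables above.
expr <- c("A + 2", "A - 2", "A * 2", "A/2", "A + B",
          "A - B", "A * B", "A/B", "A^2", "sqrt(A)")
seh  <- c(1858.957, 1844.810, 1860.298, 2122.097, 2143.690,
          2128.203, 2141.010, 3368.252, 1850.469, 50187.356)
sjlj <- c(1791.795, 1806.687, 1835.130, 2108.694, 2121.204,
          2101.398, 2092.462, 3361.402, 1788.220, 49003.003)

# Positive values mean the SJLJ build had the lower (faster) median.
pct_faster <- 100 * (seh - sjlj) / seh
print(data.frame(expr, seh, sjlj, pct_faster = round(pct_faster, 2)))
```

With these medians every difference comes out positive, that is, in favour of the SJLJ build, although the size of the gap varies by operation and both sets of timings come from a single machine.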

=== BLAS dependent code (statistics gathered over a few months) ===

#Test code
library(microbenchmark)
library(Matrix)
A <- as.matrix(read.csv(file="F:/R/A.csv", colClasses='numeric'))
B <- as.matrix(read.csv(file="F:/R/B.csv", colClasses='numeric'))
colnames(A) <- colnames(B) <- NULL
Z <- microbenchmark(
  sort(A),
  t(A) %*% B,
  crossprod(A, B),
  solve(A),
  solve(A, diag(A)),
  chol(A),
  chol(B, pivot = TRUE),
  qr(A, LAPACK=TRUE),
  svd(A),
  eigen(A, symmetric = TRUE),
  eigen(A, symmetric = FALSE),
  eigen(B, symmetric = FALSE),
  lu(A),
  fft(A),
  times=100L, unit='ms', control = list(order = 'block'))

REFERENCE 3.1.1 compiled using Rtools 3.1 (GCC 4.6.3, default EOPTS flags)
reference BLAS

Unit: milliseconds
                       expr         min          lq        mean      median          uq         max neval
                    sort(A)   89.364120   90.760662   95.096270   91.561537   92.573725  154.081306   100
                 t(A) %*% B  463.145756  470.406496  487.680120  474.872066  490.043866  642.640917   100
            crossprod(A, B)  727.114903  729.128111  730.031458  729.785877  731.120320  733.078130   100
                   solve(A)  600.629979  604.814394  630.598703  608.606561  658.326032  662.879314   100
          solve(A, diag(A))  145.738089  146.774104  147.629655  147.959780  148.371535  148.883512   100
                    chol(A)  115.873110  116.019644  117.347118  116.212938  118.026150  172.853468   100
      chol(B, pivot = TRUE)    2.415134    2.548564    3.227905    2.559286    4.568473    4.689393   100
       qr(A, LAPACK = TRUE)  414.455301  416.033671  418.583569  416.972741  417.814271  473.541941   100
                     svd(A) 1952.765952 1957.070246 1974.547371 1959.374735 2010.263499 2017.405106   100
 eigen(A, symmetric = TRUE)  917.120317  920.482414  923.423802  921.784990  924.577926  980.692929   100
eigen(A, symmetric = FALSE) 2981.049436 2985.640691 3007.526012 2991.149276 3014.926832 3130.924137   100
eigen(B, symmetric = FALSE) 3964.874086 3974.978839 3999.080880 3991.973829 4019.799690 4078.083071   100
                      lu(A)  137.437464  138.229850  141.696849  138.906528  142.217546  198.202991   100
                     fft(A)  109.981065  110.321042  111.753592  110.640916  111.268152  116.670410   100

3.1.2 compiled using Rtools 3.2 (GCC 4.6.3, EOPTS = -march=native -O3
-std=gnu++0x -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper --param l1-cache-line-size=64
--param l1-cache-size=64 --param l2-cache-size=256)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC 4.9.1 (MinGW-64)

Unit: milliseconds
                       expr         min          lq        mean      median          uq         max neval
                    sort(A)   88.771066   89.748265   94.542642   90.596947   91.482709  149.171214   100
                 t(A) %*% B   27.507195   33.359067   40.378088   37.689446   41.512909   96.868916   100
            crossprod(A, B)   17.783759   22.327538   26.787467   27.059399   31.918288   36.209055   100
                   solve(A)   45.964657   54.856090   80.761447   60.499775  109.150759  118.817308   100
          solve(A, diag(A))   24.704266   26.370058   26.805694   26.936840   27.400868   29.522052   100
                    chol(A)    6.762058    7.088337    8.725137    8.145653    8.973040   65.570275   100
      chol(B, pivot = TRUE)    2.558110    2.702412    3.481314    2.831076    4.789643    5.346446   100
       qr(A, LAPACK = TRUE)   78.757538   81.620631   85.132413   82.940043   85.099350  141.434937   100
                     svd(A)  361.539846  366.637747  386.533779  370.769323  421.736275  445.087770   100
 eigen(A, symmetric = TRUE)  174.249560  180.402841  186.649060  182.628715  188.931063  241.414148   100
eigen(A, symmetric = FALSE)  734.881721  744.303748  772.203936  751.104077  795.883051  915.351575   100
eigen(B, symmetric = FALSE) 2522.750166 2551.112148 2596.798329 2581.940655 2633.440287 2861.722717   100
                      lu(A)   20.277535   21.227185   25.068971   23.319926   25.130468   84.837552   100
                     fft(A)  109.757747  110.347313  112.123488  110.725415  114.057152  120.250492   100

R-devel_2015-03-09 compiled using Rtools 3.3 (GCC 4.9.2, SJLJ,
EOPTS = -O3 -march=native -mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC 4.9.1 (MinGW-64)

Unit: milliseconds
                       expr         min          lq        mean      median          uq        max neval
                    sort(A)   88.025153   88.255828   92.701967   89.571826   90.320888  146.40380   100
                 t(A) %*% B   26.471552   30.866301   35.293662   34.069253   38.490212   85.57007   100
            crossprod(A, B)   17.606699   17.898879   23.999433   22.228699   28.620007   37.06744   100
                   solve(A)   43.410199   48.448279   54.914690   51.338798   55.865639  116.81746   100
          solve(A, diag(A))   24.655633   25.414227   27.692980   27.301179   28.757458   38.95692   100
                    chol(A)    6.620942    6.891379    8.010618    7.474695    8.233586   12.62357   100
      chol(B, pivot = TRUE)    2.456867    2.541751    3.737836    2.575556    2.722390   61.46246   100
       qr(A, LAPACK = TRUE)   78.153905   80.980389   83.663278   82.458112   84.998671  101.89696   100
                     svd(A)  353.204099  365.191932  390.446252  377.001957  417.792818  475.73975   100
 eigen(A, symmetric = TRUE)  173.627391  177.985954  186.068097  182.131711  187.866286  251.19902   100
eigen(A, symmetric = FALSE)  771.643075  788.242038  813.902106  801.689427  839.380539  921.24119   100
eigen(B, symmetric = FALSE) 2591.501370 2644.449833 2691.339277 2678.241053 2722.924657 2935.76884   100
                      lu(A)   19.969747   20.959164   24.298874   22.426017   24.017664   81.95253   100
                     fft(A)  106.862816  107.191480  108.985064  107.466682  110.465762  115.73511   100

R-devel_2015-03-08 compiled using x86_64-4.9.2-release-win32-seh-rt_v3-rev1
(EOPTS = -O3 -march=native -mfpmath=sse -msse2avx -mavx256-split-unaligned-load
-mavx256-split-unaligned-store -mvzeroupper -std=gnu++11 -pipe)
OpenBLAS 2.13 - Multi-threaded (max 4 threads) - compiled under GCC 4.9.1 (MinGW-64)

Unit: milliseconds
                       expr         min          lq        mean      median          uq        max neval
                    sort(A)   88.372432   88.811892   93.321491   90.093638   90.754540  150.02760   100
                 t(A) %*% B   26.583837   30.443074   34.765044   33.903505   37.455374   82.54761   100
            crossprod(A, B)   17.715707   22.088566   26.875521   27.185023   31.154311   36.72850   100
                   solve(A)   44.112203   49.217298   55.707862   52.651668   57.331152  116.44069   100
          solve(A, diag(A))   24.891819   25.468731   27.590369   27.302520   29.217172   37.90168   100
                    chol(A)    6.658469    6.872168    7.893779    7.058167    8.968203   13.32230   100
      chol(B, pivot = TRUE)    2.451208    2.529540    3.742339    2.578981    2.646143   62.62224   100
       qr(A, LAPACK = TRUE)   78.839230   80.413602   82.989497   81.778148   84.447373   98.13199   100
                     svd(A)  352.931278  362.746235  387.952468  374.631166  415.481743  500.52405   100
 eigen(A, symmetric = TRUE)  172.696946  178.109557  187.816872  181.375053  190.414291  256.44276   100
eigen(A, symmetric = FALSE)  778.904964  793.941318  820.598107  812.244809  841.944627  919.02527   100
eigen(B, symmetric = FALSE) 2494.645617 2514.200623 2562.484197 2561.112354 2586.092481 2806.00525   100
                      lu(A)   19.762154   20.663114   24.555941   22.403382   24.369411   80.98218   100
                     fft(A)  106.374956  107.120148  108.625520  107.433176  108.786850  116.43563   100
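The point about a fast BLAS mattering more than the compiler can be made concrete by dividing the reference-BLAS medians by the OpenBLAS medians from the Rtools 3.3 (SJLJ) table above. This is only a sketch of the arithmetic, with illustrative variable names. Note that the reference build also differs in R version and GCC version, so the ratios are not a clean BLAS-only comparison.

```r
# Median timings in milliseconds, copied from the tables above.
expr <- c("sort(A)", "t(A) %*% B", "crossprod(A, B)", "solve(A)",
          "solve(A, diag(A))", "chol(A)", "chol(B, pivot = TRUE)",
          "qr(A, LAPACK = TRUE)", "svd(A)", "eigen(A, symmetric = TRUE)",
          "eigen(A, symmetric = FALSE)", "eigen(B, symmetric = FALSE)",
          "lu(A)", "fft(A)")
# R 3.1.1, Rtools 3.1, reference BLAS
ref_blas <- c(91.561537, 474.872066, 729.785877, 608.606561, 147.959780,
              116.212938, 2.559286, 416.972741, 1959.374735, 921.784990,
              2991.149276, 3991.973829, 138.906528, 110.640916)
# R-devel 2015-03-09, Rtools 3.3 (SJLJ), OpenBLAS
openblas <- c(89.571826, 34.069253, 22.228699, 51.338798, 27.301179,
              7.474695, 2.575556, 82.458112, 377.001957, 182.131711,
              801.689427, 2678.241053, 22.426017, 107.466682)

# Ratio > 1 means the OpenBLAS build had the lower (faster) median.
speedup <- setNames(ref_blas / openblas, expr)
print(round(sort(speedup, decreasing = TRUE), 2))
```

The operations that do not go through BLAS or LAPACK, such as sort(A) and fft(A), stay close to a ratio of 1, while the matrix products and factorizations show the large gains.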

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel