# pmin(), pmax() - slower than necessary for common cases

1 message
Open this post in threaded view
|

## pmin(), pmax() - slower than necessary for common cases

 A few hours ago, I was making a small point on the R-SIG-robust mailing list on the point that  ifelse() was not too efficient in a situation where  pmax() could easily be used instead. However, this has reminded me of some timing experiments that I did 13 years ago with S-plus -- where I found that pmin() / pmax() were really relatively slow for the most common case where they are used with only two arguments {and typically one of the arguments is a scalar; but that's not even important here}. The main reason is that the function accept an arbitrary number of arguments and that they do recycling. Their source is at   https://svn.R-project.org/R/trunk/src/library/base/R/pmax.RIn April 2001 (as I see), I had repeated my timings with R (1.2.2) which confirmed the picture more or less,  but for some reason I never drew "proper" consequences of my findings. Of course one can argue  pmax() & pmin() are still quite fast functions; OTOH the experiment below shows that -- at least the special case with 2 (matching) arguments could be made faster by about a factor of 19 ... I don't have yet a constructive proposition; just note the fact that   pmin. <- function(k,x) (x+k - abs(x-k))/2   pmax. <- function(k,x) (x+k + abs(x-k))/2 are probably the fastest way of computing  pmin() and pmax() of two arguments {yes, they "suffer" from rounding error of about 1 to 2 bits...} currently in R. One "solution" could be to provide  pmin2() and pmax2() functions based on trival .Internal() versions. The experiments below are for the special case of  k=0  where I found the above mentioned factor of 19 which is a bit overoptimistic for the general case; here is my  pmax-ex.R  source file (as text/plain attachment ASCII-code --> easy cut & paste) demonstrating what I claim above. #### Martin Maechler, Aug.1992 (S+ 3.x) --- the same applies to R 1.2.2 #### #### Observation:  (abs(x) + x) / 2  is  MUCH faster than  pmax(0,x) !! #### #### The function  pmax.fast below is  very slightly slower than (|x|+x)/2 ### "this" directory  --- adapt! thisDir <- "/u/maechler/R/MM/MISC/Speed" ### For R's source, ###      egrep 'pm(ax|in)'  src/library/*/R/*.R ### shows you that most uses of  pmax() / pmin() are really just with arguments ###  ( ,  )  where the fast versions would be much better! pmax.fast <- function(scalar, vector) {  ## Purpose: FAST substitute for pmax(s, v) when length(s) == 1  ## Author: Martin Maechler, Date: 21 Aug 1992 (for S)  vector[ scalar > vector ] <- scalar  vector } ## 2 things: ##   1) the above also works when 'scalar' == vector of same length ##   2) The following is even (quite a bit!) faster : pmin. <- function(k,x) (x+k - abs(x-k))/2 pmax. <- function(k,x) (x+k + abs(x-k))/2 ### The following are small scale timing simulations which confirm: N <- 20 ## number of experiment replications kinds <- c("abs", "[.>.]<-", "Fpmax", "pmax0.", "pmax.0") Tmat <- matrix(NA, nrow = N, ncol = length(kinds),                dimnames = list(NULL, kinds)) T0 <- Tmat[,1] # `control group' n.inner <- 800 # should depend on the speed of your R / S  (i.e. CPU) set.seed(101) for(k in 1:N) {     cat(k,"")     ## no longer set.seed(101)     x <- rnorm(1000)     Tmat[k, "pmax0."] <- system.time(for(i in 1:n.inner)y <- pmax(0,x))[1]     Tmat[k, "pmax.0"] <- system.time(for(i in 1:n.inner)y <- pmax(x,0))[1]     Tmat[k,    "abs"] <- system.time(for(i in 1:n.inner)y <- (x + abs(x))/2)[1]     Tmat[k,"[.>.]<-"] <- system.time(for(i in 1:n.inner)y <-{x[0 > x] <- 0;x})[1]     Tmat[k,  "Fpmax"] <- system.time(for(i in 1:n.inner)y <- pmax.fast(0,x))[1] } save(Tmat, file = file.path(thisDir, "pmax-Tmat.rda")) ###-- Restart here {saving simulation/timing}: if(!exists("Tmat"))     load(file.path(thisDir, "pmax-Tmat.rda")) (Tm <- apply(Tmat, 2, mean, trim = .1)) ##      abs  [.>.]<-    Fpmax   pmax0.   pmax.0 ## 0.025625 0.078750 0.077500 0.488125 0.511250 round(100 * Tm / Tm[1]) ##     abs [.>.]<-   Fpmax  pmax0.  pmax.0 ##     100     307     302    1905    1995 ## earlier: ##  abs [.>.]<-   Fpmax  pmax0.  pmax.0 ##  100     289     344    1804    1884 ## pmax0. is really a bit faster than pmax.0 : ## P < .001 (for Wilcoxon; outliers!) t.test(Tmat[,4], Tmat[,5], paired = TRUE) t.test(Tmat[,4], Tmat[,5], paired = FALSE)# since random samples wilcox.test(Tmat[,4], Tmat[,5])# P = 0.00012 {but ties -> doubt} boxplot(data.frame(Tmat, check.names = FALSE),         notch = TRUE, ylim = range(0,Tmat),         main = "CPU times used for versions of pmax(0,x)") mtext(paste("x <- rnorm(1000)","  ", N, "replicates"), side = 3) mtext(paste(R.version.string, file.path(thisDir, "pmax-ex.R")),       side = 4, cex =.8, adj = 0) ## see if trimmed means were robust enough: points(Tm, col = 2, cex = 1.5, pch = "X") mtext("X : mean( * , trim = 0.10)", side = 1, line = -2.5, col = 2) mtext(paste(round(100 * Tm / Tm[1]),"%"), side = 1, line = -1.2, col = 2,       at = 1:5) Martin Maechler, ETH Zurich______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel