Revert to R 3.2.x code of logicalSubscript in subscript.c?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Revert to R 3.2.x code of logicalSubscript in subscript.c?

R devel mailing list
Currently, in function 'logicalSubscript' in subscript.c, the case of no recycling is handled like the implentation of R function 'which'. It passes through the data only once, but uses more memory. It is since R 3.3.0. For the case of recycling, two passes are done, first to get number of elements in the result.

Also since R 3.3.0, function 'makeSubscript' in subscript.c doesn't call 'duplicate' on logical index vector.

A side note: I guess that it is safe not to call 'duplicate' on logical index vector, even if it is the one being modified in subassignment, because it is converted to positive indices before use in extraction or replacement. If so, isn't it true for character index vector as well?

Here are examples of subsetting a numeric vector of length 10^8 with logical index vector, inspired by Hong Ooi's answer in https://stackoverflow.com/questions/17510778/why-is-subsetting-on-a-logical-type-slower-than-subsetting-on-numeric-type . I presents two extreme cases, each with no-recycling and recycling versions that convert to the same positive indices. Difference between the two versions can be attributed to function 'logicalSubscript'.

Example 1: select none
x <- numeric(1e8)
i <- rep(FALSE, length(x))# no reycling
system.time(x[i])
system.time(x[i])
i <- FALSE# recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed
  0.083   0.000   0.083
   user  system elapsed
  0.085   0.000   0.085
   user  system elapsed
  0.144   0.000   0.144
   user  system elapsed
  0.143   0.000   0.144

Example 2: select all
x <- numeric(1e8)
i <- rep(TRUE, length(x))# no reycling
system.time(x[i])
system.time(x[i])
i <- TRUE# recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed
  0.538   0.741   1.292
   user  system elapsed
  0.506   0.668   1.175
   user  system elapsed
  0.448   0.534   0.986
   user  system elapsed
  0.431   0.528   0.960

The results were from R 3.3.2 on http://rextester.com/l/r_online_compiler . The no-recycling version took longer time than the recycling version for example 2, where more time was taken in both versions.

Function 'logicalSubscript' in subscript.c in R 3.2.x also use a faster code for the case of no recycling, but does two passes in all cases. Treatment for the case of recycling is identical with current code.

Function 'logicalSubscript' in subscript.c affects subsetting with negative indices, because negative indices are converted first to a logical index vector with the same length as the vector (no recycling).

Example, comparing times of x[-1] and its equivalent, x[2:length(x)] :
x <- numeric(1e8)
system.time(x[-1])
system.time(x[-1])
system.time(x[2:length(x)])
system.time(x[2:length(x)])

Output from R 3.3.2 on http://rextester.com/l/r_online_compiler :
   user  system elapsed
  0.591   0.903   1.515
   user  system elapsed
  0.558   0.822   1.384
   user  system elapsed
  0.620   0.659   1.285
   user  system elapsed
  0.607   0.663   1.274

Output from R 3.2.2 in Zenppelin Notebook, https://my.datascientistworkbench.com/tools/zeppelin-notebook/ :
user  system elapsed
  1.156   1.636   2.794
   user  system elapsed
  0.884   1.528   2.413
   user  system elapsed
  0.932   1.544   2.476
   user  system elapsed
  0.932   1.584   2.519

From above, apparently, x[-1] consistently took longer time than x[2:length(x)] with R 3.3.2, but not with R 3.2.2.

So, how about reverting to R 3.2.x code of function 'logicalSubscript' in subscript.c?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Revert to R 3.2.x code of logicalSubscript in subscript.c?

Radford Neal
Suharto,

If you're interested in performance with subscripting, you might want
to look at pqR (pqR-project.org).  It has some substantial performance
improvements for subscripting over R Core versions.  This is
especially true for the current development version of pqR (probably
leading to a new release in about a month).

You can look at a somewhat-stable snapshot of recent pqR development at

  https://github.com/radfordneal/pqR/tree/05e32fa6

In particular, src/main/subscript.c might be of interest.

Note that you should read mods-dir/README if you want to build this,
and in particular, you need to run create-configure in the top-level
source directory first.

I modified your tests a bit, including producing versions using both
vectors of length 1e8 like you did (which will not fit in cache) and
vectors of length 1e5 (which will fit in at least the L3 cache).  I
ran tests on an Intel Skylake processor (E3-1270v5 @ 3.6GHz), using
gcc 7.2 with -O3 -march=native -mtune=native.

I got the following results with R-3.4.2 (with R_ENABLE_JIT=0, which
is slightly faster than using the JIT compiler):

R-3.4.2, LARGE VECTORS:

  > N <- 1e8; R <- 5
  > #N <- 1e5; R <- 1000
  >
  > x <- numeric(N)
  > i <- rep(FALSE, length(x))# no reycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.296   0.000   0.297
  > i <- FALSE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.416   0.000   0.418
  >
  > x <- numeric(N)
  > i <- rep(TRUE, length(x))# no reycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    1.416   0.352   1.773
  > i <- TRUE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    1.348   0.264   1.613
  >
  > x <- numeric(N)
  > system.time(for (r in 1:R) a <- x[-1])
     user  system elapsed
    1.516   0.376   1.895
  > system.time(for (r in 1:R) a <- x[2:length(x)])
     user  system elapsed
    1.516   0.308   1.827
  >
  > v <- 2:length(x)
  > system.time(for (r in 1:R) a <- x[v])
     user  system elapsed
    1.416   0.268   1.689

R-3.4.2, SMALL VECTORS:

  > #N <- 1e8; R <- 5
  > N <- 1e5; R <- 1000
  >
  > x <- numeric(N)
  > i <- rep(FALSE, length(x))# no reycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.088   0.000   0.089
  > i <- FALSE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.084   0.000   0.084
  >
  > x <- numeric(N)
  > i <- rep(TRUE, length(x))# no reycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.492   0.020   0.515
  > i <- TRUE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.408   0.008   0.420
  >
  > x <- numeric(N)
  > system.time(for (r in 1:R) a <- x[-1])
     user  system elapsed
    0.508   0.004   0.516
  > system.time(for (r in 1:R) a <- x[2:length(x)])
     user  system elapsed
    0.464   0.008   0.473
  >
  > v <- 2:length(x)
  > system.time(for (r in 1:R) a <- x[v])
     user  system elapsed
    0.428   0.000   0.428

Here are the results with the development version of pqR (uncompressed
pointers, no byte compilation):

pqR (devel), LARGE VECTORS:

  > N <- 1e8; R <- 5
  > #N <- 1e5; R <- 1000
  >
  > x <- numeric(N)
  > i <- rep(FALSE, length(x))# no reycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.192   0.000   0.193
  > i <- FALSE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.436   0.000   0.434
  >
  > x <- numeric(N)
  > i <- rep(TRUE, length(x))# no reycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.768   0.216   0.988
  > i <- TRUE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.832   0.272   1.105
  >
  > x <- numeric(N)
  > system.time(for (r in 1:R) a <- x[-1])
     user  system elapsed
    0.280   0.156   0.435
  > system.time(for (r in 1:R) a <- x[2:length(x)])
     user  system elapsed
    0.252   0.184   0.436
  >
  > v <- 2:length(x)
  > system.time(for (r in 1:R) a <- x[v])
     user  system elapsed
    0.828   0.168   0.998

pqR (devel), SMALL VECTORS:

  > #N <- 1e8; R <- 5
  > N <- 1e5; R <- 1000
  >
  > x <- numeric(N)
  > i <- rep(FALSE, length(x))# no reycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.040   0.000   0.038
  > i <- FALSE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.084   0.000   0.087
  >
  > x <- numeric(N)
  > i <- rep(TRUE, length(x))# no reycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.156   0.036   0.192
  > i <- TRUE# recycling
  > system.time(for (r in 1:R) a <- x[i])
     user  system elapsed
    0.184   0.012   0.195
  >
  > x <- numeric(N)
  > system.time(for (r in 1:R) a <- x[-1])
     user  system elapsed
    0.060   0.012   0.075
  > system.time(for (r in 1:R) a <- x[2:length(x)])
     user  system elapsed
    0.052   0.024   0.075
  >
  > v <- 2:length(x)
  > system.time(for (r in 1:R) a <- x[v])
     user  system elapsed
    0.180   0.004   0.182

Summarizing elapsed times:

  LARGE VECTORS   T1     T2     T3     T4     T5     T6     T7  

  R-3.4.2:      0.297  0.418  1.773  1.613  1.895  1.827  1.689
  pqR dev:      0.193  0.434  0.988  1.105  0.435  0.436  0.998

  SMALL VECTORS   T1     T2     T3     T4     T5     T6     T7  

  R-3.4.2:      0.089  0.084  0.515  0.420  0.516  0.473  0.428
  pqR dev:      0.038  0.087  0.192  0.195  0.075  0.075  0.182

As one can see, pqR is substantially faster for all except T2 (where
it's about the same).  The very large advantage of pqR on T5 and T6 is
partly because pqR has special code for efficiently handling things
like x[-1] and x[2:length(x)], so I added the x[v] test to see what
performance is like when this special handling isn't invoked.

There's no particular reason pqR's code for these operations couldn't
be adapted for use in the R Core implementaton, though there are
probably a few issues involving large vectors, and the special
handling of x[2:length(x)] would require implementing pqR's internal
"variant result" mechanism.  pqR also has much faster code for some
other subset and subset assignment operations.

   Radford Neal

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel