pre-allocation not always a timesaver

Ross Boylan
The R Inferno advises that if you are building up results in pieces it's
best to pre-allocate the result object and fill it in.  In some testing,
I see a benefit with this strategy for regular variables.  However, when
the results are held by a class, the opposite seems to be the case.

Comments?  Explanations?

Possibly for classes any update causes the entire object to be
replaced--perhaps to trigger the validation machinery?--and so
preallocation simply means on average a bigger object is being
manipulated.
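
One way to probe this hypothesis is to trace duplications of the vector held in a slot. The following is only a sketch: the class name "holder" is hypothetical, tracemem() is available only in R builds with memory profiling enabled, and whether (and how often) a copy is reported depends on the R version.

```r
library(methods)

# Hypothetical one-slot class, standing in for "simple" from the post
setClass("holder", representation(myx = "ANY"))
a <- new("holder", myx = numeric(5))

if (capabilities("profmem")) {
    tracemem(a@myx)      # report each duplication of the slot's vector
    a@myx[1] <- 1        # a "tracemem[...]" line here means a copy was made
    untracemem(a@myx)
} else {
    a@myx[1] <- 1        # tracing unavailable in this build; just do the update
}
```

If a copy is reported for every in-place update, the cost of each iteration grows with the size of the pre-allocated object, which would be consistent with the timings below.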

Here is some test code, with CPU seconds given in the comments.  I tried
everything twice in case there was some "first-time" overhead such as
growing total memory in the image.  When the 2 times differed noticeably
I reported both values.

# class definitions
refbase <- setRefClass("refBase", fields = list(dispatch="ANY", myx="ANY"),
                       methods = list( initialize = function(x0=NULL, ...) {
                           usingMethods("foo")
                           dispatch <<- foo
                           myx <<- x0
                       }
# some irrelevant methods edited out
                       ))

myclass <- setClass("simple", representation=list(myx="ANY"))

### Method 1: regular variables
pre <- function(n, j=1000) {
    x <- array(dim=(c(j, n)))
    for (i in 1:n) {
        x[,i] <- rnorm(j)
    }
    x
}
system.time(pre(1000)) # 0.3s

nopre <- function(n, j=1000) {
    x <- numeric(0)
    for (i in 1:n)
        x <- c(x, rnorm(j))
    x
}

system.time(nopre(1000))  # 2.0s, 2.7s

### Method 2: with ref class
pre2 <- function(n, j=1000) {
    a <- refbase(x0=numeric(0))
    a$myx <- array(dim=c(j, n))
    for (i in 1:n) {
        a$myx[,i] <- rnorm(j)
    }
    a$myx
}
system.time(pre2(1000)) # 4.0s

nopre2 <- function(n, j=1000) {
    a <- refbase(x0=numeric(0))
    for (i in 1:n)
        a$myx <- c(a$myx, rnorm(j))
    a$myx
}
system.time(nopre2(1000)) # 2.9s, 4.3s

### Method 3: with regular class
pre3 <- function(n, j=1000) {
    a <- myclass()
    a@myx <- array(dim=c(j, n))
    for (i in 1:n) {
        a@myx[,i] <- rnorm(j)
    }
    a@myx
}
system.time(pre3(1000)) # 7.3s

nopre3 <- function(n, j=1000) {
    a <- myclass(myx=numeric(0))
    for (i in 1:n)
        a@myx <- c(a@myx, rnorm(j))
    a@myx
}
system.time(nopre3(1000))  # 4.2s

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: pre-allocation not always a timesaver

Henrik Bengtsson-3
I don't think you got a response to this one;

    x <- array(dim=(c(j, n)))
    for (i in 1:n) {
        x[,i] <- rnorm(j)
    }

Note that array() allocates a logical array by default, which means
that in your first iteration (i==1) it has to be coerced to a double
array before assigning the value of rnorm(). That takes time. It also
takes time to garbage collect the "stray" logical array afterward.
Using,

    x <- array(NA_real_, dim=(c(j, n)))
    for (i in 1:n) {
        x[,i] <- rnorm(j)
    }

avoids this.
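
A quick check of the storage types involved (a small sketch; the names x_lgl and x_dbl are made up for illustration):

```r
# array() fills with a logical NA unless told otherwise
x_lgl <- array(dim = c(3, 2))
typeof(x_lgl)   # "logical"

# supplying NA_real_ allocates doubles from the start
x_dbl <- array(NA_real_, dim = c(3, 2))
typeof(x_dbl)   # "double"

# the first numeric assignment converts the whole logical array
x_lgl[, 1] <- rnorm(3)
typeof(x_lgl)   # "double"
```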

For updating list elements, you can avoid repetitive overhead from $<-
and $ by replacing:

    a$myx <- array(dim=c(j, n))
    for (i in 1:n) {
        a$myx[,i] <- rnorm(j)
    }
    a$myx

with

    myx <- array(NA_real_, dim=c(j, n))
    for (i in 1:n) {
        myx[,i] <- rnorm(j)
    }
    a$myx <- myx
    myx

Similarly for S4 slots and @<- and @.
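
For the S4 case, the same pattern might look like the sketch below. The function name pre3_local is hypothetical, and the class is redefined here only so the example is self-contained; it returns the object rather than the matrix so the slot can be inspected.

```r
library(methods)

# Stand-in for the "simple" class from the original post
myclass <- setClass("simple", representation(myx = "ANY"))

pre3_local <- function(n, j = 1000) {
    a <- myclass()
    myx <- array(NA_real_, dim = c(j, n))  # fill a plain local variable
    for (i in 1:n) {
        myx[, i] <- rnorm(j)
    }
    a@myx <- myx                           # single slot assignment at the end
    a
}

res <- pre3_local(10, j = 5)
dim(res@myx)   # 5 10
```

The loop now touches only an ordinary matrix, so @<- is called once rather than n times.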

/Henrik

On Thu, Feb 27, 2014 at 7:53 PM, Ross Boylan <[hidden email]> wrote:

> The R Inferno advises that if you are building up results in pieces it's
> best to pre-allocate the result object and fill it in.  In some testing,
> I see a benefit with this strategy for regular variables.  However, when
> the results are held by a class, the opposite seems to be the case.
