Dear all,

This is the sequel to a question of mine a couple of days ago about how to

implement systematic resampling from a discrete distribution efficiently. (See

below the original question and description of the problem - the problem, in a

nutshell, is how to draw N points from a discrete distribution on N points).

I did my homework and found an implementation of systematic resampling which

is O(N) - my initial one was O(N^2). The code is given below. However,

compared with multinomial resampling, the difference in execution time is

huge. Is there anybody out there who sees a way of improving the efficiency of

my code?? Is coding it in C my only option?

Thank you in advance for your suggestions and insight,

Giovanni

######### R example #########

> ### Timing systematic resampling

> set.seed(2)

> N <- 1e6

> x <- LETTERS[1 : N] # labels

> w <- rexp(N)

> w <- w / sum(w) # probabilities

> W <- cumsum(w) # cdf

> system.time({

+ u <- (0 : (N - 1) + runif(1)) / N

+ indexNew <- rep(0, N)

+ h <- 1

+ k <- 1

+ for (i in 1 : N)

+ {

+ j <- h

+ while (u[h] < W[k] && h <= N) h <- h + 1

+ indexNew[i] <- h - j

+ k <- k + 1

+ }

+ (y <- rep(x, indexNew))

+ })

user system elapsed

12.791 0.073 12.887

> ### ...while using multinomial resampling:

> system.time(

+ index <- sample(N, N, replace = TRUE, prob = w)

+ )

user system elapsed

0.344 0.020 0.366

> Date: Wed, 29 Jul 2009 13:32:14 -0500 (CDT)

> From: Giovanni Petris <

[hidden email]>

> Sender:

[hidden email]
> Precedence: list

>

>

>

> Dear all,

>

> Here is a little coding problem. It falls in the category of "how can I do

> this efficiently?" rather than "how can I do this?" (I know how to do it

> inefficiently). So, if you want to take the challenge, keep reading, otherwise

> just skip to the next post - I won't be offended by that ;-)

>

> I am coding a particle filter and I want to use systematic resampling to

> "regenerate" particles. Basically, what this means is the following.

>

> 1. Start with a vector of "labels", x, say, of length N and a vector of

> probabilities, w, say - w[i] is the probability attached to x[i].

>

> 2. Draw N equally spaced random number in (0,1) as u = (1 : N + runif(1)) / N.

>

> 3. Define a new sample of "labels", y, say, by inverting the cdf defined by

> the probabilities w. That is, set

> y[i] = x[k] if cumsum(w)[k-1] <= u[i] < cumsum(w)[k], i=1, ..., N.

>

> The following is a piece of R code that implements the procedure.

>

> > ### Systematic resampling

> > set.seed(2)

> > x <- LETTERS[1 : 6] # labels

> > w <- rexp(6)

> > w <- w / sum(w) # probabilities

> > W <- c(0, cumsum(w)) # cdf

> > u <- (1 : 6 + runif(1)) / 6

> > indNew <- sapply(seq_along(u),

> + function(i) (sum(W[i] <= u & u < W[i + 1])))

> > (y <- rep(x, indNew))

> [1] "A" "B" "D" "D" "F"

> > ## graphically...

> > plot(stepfun(seq_along(u), W), xaxt = "n")

> > axis(1, at = seq_along(u), x)

> > abline(h = u, lty = "dotted")

>

> The question is the following. I believe the way I compute "newInd" is of

> order N^2. Is there a smart way of doing it which is of order N? (I know there

> is one, I just cannot find it...)

>

> Thanks in advance,

> Giovanni

>

> ______________________________________________

>

[hidden email] mailing list

>

https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.

>

>

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide

http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.