Parallel processing question ...

Robbie-26
I am experimenting with parallel processing using foreach and seem to be
missing something fundamental. Cool stuff. I've gone through the list and
seen a couple of closely related issues, but nothing I've tried seems to
work.

I know that the results from foreach are combined, but what if there is more
than one variable within the loop?  Below is a snippet (non-functioning) of
code that I hope provides enough insight into what I am trying to do.  The
commented out lines are what I would be doing (successfully) if I wasn't
trying to implement the %dopar% . The goal is to do statistics on the
sequence of lambda vectors that were originally accumulated in the matrix
lambdas using cbind.

Thanks in advance for any suggestions,
Dave

---------------snip
update_N <- function(sets, indexes, lam) {
    n <- length(indexes)-1    # count of events
    N <- rep(0, K) # count of failures per node
    for (i in 1:n) {
        nodes <- as.numeric(sets[indexes[i]:(indexes[i+1]-1)])
        node <- resample(nodes, 1, prob=lam[nodes]/sum(lam[nodes]))
        N[node] = N[node] + 1
    }
    N
}

lambdas<- foreach(j=1:(2*burn_in), .combine=cbind) %dopar% {
    N <- update_N(min_sets, min_sets_indexes, lambda)
    lambda <- rgamma(K, shape=a+N, rate=bT)
    lambda
    if (j%%100==0) { print(j); print(lambda); print(N)}
#    if (j > burn_in) {
#        lambdas <- cbind(lambdas, lambda)
#    }
}

---------------snip

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Parallel processing question ...

Steve Lianoglou-6
Hi,

On Tue, Feb 8, 2011 at 6:18 PM, Robinson, David G <[hidden email]> wrote:

> I am experimenting with parallel processing using foreach and seem to be
> missing something fundamental. Cool stuff. I've gone through the list and
> seen a couple of closely related issues, but nothing I've tried seems to
> work.
>
> I know that the results from foreach are combined, but what if there is more
> than one variable within the loop?  Below is a snippet (non-functioning) of
> code that I hope provides enough insight into what I am trying to do.  The
> commented out lines are what I would be doing (successfully) if I wasn't
> trying to implement the %dopar% . The goal is to do statistics on the
> sequence of lambda vectors that were originally accumulated in the matrix
> lambdas using cbind.
>
> Thanks in advance for any suggestions,
> Dave
>
> ---------------snip
> update_N <- function(sets, indexes, lam) {
>    n <- length(indexes)-1    # count of events
>    N <- rep(0, K) # count of failures per node
>    for (i in 1:n) {
>        nodes <- as.numeric(sets[indexes[i]:(indexes[i+1]-1)])
>        node <- resample(nodes, 1, prob=lam[nodes]/sum(lam[nodes]))
>        N[node] = N[node] + 1
>    }
>    N
> }
>
> lambdas<- foreach(j=1:(2*burn_in), .combine=cbind) %dopar% {
>    N <- update_N(min_sets, min_sets_indexes, lambda)
>    lambda <- rgamma(K, shape=a+N, rate=bT)
>    lambda
>    if (j%%100==0) { print(j); print(lambda); print(N)}
> #    if (j > burn_in) {
> #        lambdas <- cbind(lambdas, lambda)
> #    }
> }
>
> ---------------snip

Sorry -- I don't get what you're asking/trying to do.

Is it a coincidence that your commented block uses the same variable
name as the one you are assigning the result of foreach() to?

Essentially, foreach will work just like an lapply ... if you changed
foreach to lapply here, what do you expect that %dopar% {} block to
return after each iteration?

I'm not sure if this is what you're asking, but if you want to return
two elements per iteration in your loop, just return a list with two
elements, and post process it later.

I'd start by trying to remove your .combine=cbind param/argument from
the foreach() function and get your code running so you get the right
"things" returned as a normal list (or list of lists, if you want to
return > 1 thing per foreach iteration). Once that's done, you can try
to auto 'cbind' your things if you think it's necessary.
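
A minimal sketch of the "return a list, post-process later" suggestion above. It uses base R's lapply as a stand-in for foreach (following the lapply analogy), so it runs without a parallel backend. The names `results` and `Ns` and the placeholder distributions are assumptions for illustration, not the original sampler.

```r
# Hypothetical stand-in for the loop body: each iteration returns two things.
set.seed(1)
K <- 3
results <- lapply(1:4, function(j) {
  N <- rpois(K, lambda = 2)                      # placeholder counts
  lambda <- rgamma(K, shape = 1 + N, rate = 2)   # placeholder draw
  list(lambda = lambda, N = N)                   # both values travel back together
})

# Post-process: pull each component out and bind the vectors as columns.
lambdas <- do.call(cbind, lapply(results, `[[`, "lambda"))
Ns      <- do.call(cbind, lapply(results, `[[`, "N"))
dim(lambdas)   # rows = K, columns = number of iterations
```

Once the list comes back with the right contents, the same body could presumably be handed to foreach, with the binding step done afterwards or via .combine.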

Sorry if this isn't helpful .. it's not clear to me what you're trying
to do, so I'm kind of stabbing in the dark here.

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: Parallel processing question ...

Robbie-26
Steve,
Thanks for taking the time to look at the question. My apologies for the confusing post. In an attempt to keep the post short, I seem to have confused the issue.

The variable of interest in each iteration is the vector lambda, and the goal is to collect all the lambda vectors and characterize the statistics of lambda over the course of the simulation (this is just a simple Gibbs sampler). In the serial processing world I simply use cbind to accumulate the lambda vectors into a matrix called lambdas (as done in the commented-out lines).

What I am trying to do now is use a combination of foreach/dopar to do the same type of accumulation.

I am not trying to capture any other variables from the loop except lambda. As you suggested, I have tried removing the .combine argument and simply collecting the resulting list. Unfortunately, the lambda vectors don't appear in the resulting list.

Thanks again for taking the time to try to figure this out.

Cheers,
Dave


Re: Parallel processing question ...

Steve Lianoglou-6
Hi David,

On Wed, Feb 9, 2011 at 10:11 AM, Robinson, David G <[hidden email]> wrote:

> Steve,
> Thanks for taking the time to look at the question. my apologies for the
> confusing post. In an attempt to keep the post short, I seem to have
> confused the issue.
>
> The variable of interest in each iteration is the vector lambda  and the
> goal is to collect all the lambda vectors and characterize the statistics of
> lambda over the course of the simulation (this is just a simply gibbs
> sampler) . In the series processing world I simply use cbind to accumulate
> the lambda vectors into an array called lambdas (as performed in  commented
> out  commands).
>
> What I am trying to do now is use a combination of foreach/dopar to do the
> same type of accumulation.
>
> I am not trying to capture any other variables from the loop except lambda.
> As you suggested, I have tried removing the .combine argument and simply
> collect the resulting list. Unfortunately, the lambda vectors don’t appear
> in the resulting list.

Ok, so let's take a look at your code again (without the commented block):

lambdas<- foreach(j=1:(2*burn_in), .combine=cbind) %dopar% {
   N <- update_N(min_sets, min_sets_indexes, lambda)
   lambda <- rgamma(K, shape=a+N, rate=bT)
   lambda
   if (j%%100==0) { print(j); print(lambda); print(N)}
}

Assuming all of your variables are defined properly, note that the first
line in the loop passes a variable `lambda` into your update_N
function, and the next line reassigns lambda to the result of your
rgamma call.

I guess these two things are different, so why not name one of them
something else just in case (I've renamed it to LAMBDA below).

The last line of the block in your loop is an if-statement, so the
value of that if-statement, not your lambda vector, is what each
iteration returns. Try:

lambdas<- foreach(j=1:(2*burn_in), .combine=cbind) %dopar% {
   N <- update_N(min_sets, min_sets_indexes, LAMBDA)
   lambda <- rgamma(K, shape=a+N, rate=bT)
   if (j%%100==0) { print(j); print(lambda); print(N)}
   lambda
}
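
To see the effect of the reordering without a parallel backend, here is a small sketch that again uses base R's lapply as a stand-in for foreach. The shape and rate values, and the names `broken`, `fixed` and `fixed_matrix`, are made up for illustration.

```r
set.seed(42)
K <- 3

# Body ordered as in the original post: the `if` is the last expression.
broken <- lapply(1:3, function(j) {
  lambda <- rgamma(K, shape = 2, rate = 1)   # placeholder draw
  lambda                                     # evaluated, then discarded
  if (j %% 100 == 0) { print(j) }
})

# Body after the reordering: `lambda` is the last expression.
fixed <- lapply(1:3, function(j) {
  lambda <- rgamma(K, shape = 2, rate = 1)   # placeholder draw
  if (j %% 100 == 0) { print(j) }
  lambda
})

all(vapply(broken, is.null, logical(1)))   # nothing useful came back
fixed_matrix <- do.call(cbind, fixed)      # one column of draws per iteration
```

With the first ordering there is nothing for cbind to combine; with the second, every iteration contributes a column.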

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: Parallel processing question ...

Steve Lianoglou-6
Hi David,

I'm CC-ing R-help in order to finish this one off ;-)

On Wed, Feb 9, 2011 at 10:59 AM, Robinson, David G <[hidden email]> wrote:
[snip]
> One of you comments pointed me in the right direction and I found the
> problem. I simply commented out the line " if (j%%100==0) { ...print(N)}"
> and the original program ran fine.  Not sure I understand why, but... it
> runs.
[/snip]

It's because the last line of a "block" or "function" or whatever is
the implicit return value of that block/function (as you already know
-- the last line of your `update_N` function is `N`, which means
that's the value you want that function to return).

The last line of the "block" inside your %dopar% { ... } was an
if-statement and not the value `lambda` that you wanted to return.

As a result the return value of your block was the result of that
if-statement. Keep in mind that in R, even `if` statements return
values, e.g.:

x <- if (FALSE) {
  1
} else {
  2
}

In the case above, x will be set to 2.
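
The same rule covers the case in the posted loop, where the `if` has no `else` branch. A short sketch, with `y`, `z` and the counts in `N` being made-up values for illustration:

```r
# An `if` with no `else` evaluates to NULL when its condition is FALSE.
y <- if (FALSE) 1
is.null(y)                    # TRUE

# When the condition is TRUE, the value is that of the last expression
# inside the braces. print() returns its argument invisibly.
N <- c(4, 0, 2)               # hypothetical counts
z <- if (TRUE) { print(N) }
identical(z, N)               # TRUE
```

So a block ending in `if (j%%100==0) { ...; print(N) }` would hand back NULL on most iterations and N on every 100th one, never lambda.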

Does that make it more clear now why your lambda vector wasn't being
returned (and further processed) after each iteration of your foreach
loop?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
