Environments and parallel processing

Environments and parallel processing

NiekB
When using parallelization, R seems to clone every environment returned from a child process, even though environments are normally passed by reference. Consider the following example:
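library(parallel)
env1 <- new.env()
# Serial: every element is a reference to the same env1
envs2 <- lapply(1:4, function(x) env1)

# FORK clusters require a Unix-alike (not available on Windows)
cl <- makeCluster(2, type = "FORK")
# Parallel: the environments returned by the workers
envs3 <- parLapply(cl, 1:4, function(x) env1)
# Parallel: the printed address of env1 as seen inside each worker
envs4 <- parLapply(cl, 1:4, function(x) capture.output(str(env1)))
stopCluster(cl)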

First I create an environment (env1). With the non-parallel lapply I get a list (envs2) in which every entry points to env1. With the parallel parLapply, the entries of the resulting list (envs3) point to different environments, which are presumably clones of env1. (Note that the first two entries share a pointer, as do the last two, presumably because I use 2 child nodes for a loop of length 4.) This cloning seems to happen when the child nodes return their results to the master. To see this, I save the address of env1 as seen in the child nodes in the list envs4.
Why are environments cloned at the moment they are returned, and is there a way to pass environments by reference when using parallel processing in R?

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Environments and parallel processing

Gábor Csárdi
This is all normal: a fork cluster works with processes, which do not
share memory. When you create a fork cluster, you create new
processes that have the same memory layout as the parent, but from
that moment on their memory is independent of the parent process.
When parLapply is done, the results are serialized and copied back to
the parent process. The serialized environment is independent of the
original environment in the parent process; when parLapply
unserializes the results, it creates new objects.

Environments have reference semantics, but not across processes.
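
The serialization step described above can be reproduced in a single R process, without a cluster. This is only a minimal sketch: the names env_copy and pair are illustrative, and it assumes that each worker's share of the results is serialized as one object, which would explain why entries computed by the same worker share a pointer.

```r
env1 <- new.env()
env1$x <- 1

# Round-trip through serialization, as happens to results sent back by a worker
env_copy <- unserialize(serialize(env1, connection = NULL))

identical(env1, env_copy)  # FALSE: a new environment object was created
env_copy$x                 # 1: the contents were copied

# Writing to the copy does not affect the original
env_copy$x <- 2
env1$x                     # 1

# References that are serialized together stay shared after unserializing
pair <- unserialize(serialize(list(env1, env1), connection = NULL))
identical(pair[[1]], pair[[2]])  # TRUE
identical(pair[[1]], env1)       # FALSE
```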

Gabor
On Wed, Sep 12, 2018 at 7:09 PM Niek Bouman <[hidden email]> wrote:

> While using parallelization R seems to clone all environments [...]


Re: Environments and parallel processing

Ralf Stubner
On 12.09.2018 20:20, Gábor Csárdi wrote:
> This is all normal, a fork cluster works with processes, that do not
> share memory.

And if you are after shared-memory parallelism, you can try the 'Rdsm'
package: https://cran.r-project.org/package=Rdsm

Greetings
Ralf


Re: Environments and parallel processing

Clark Fitzgerald
+1 to what Gabor and Ralf said.

In this case the memory address can be misleading. My understanding is that
the environments in all the processes, one parent and two children, have the
*same* virtual memory address, but once a process writes to them the
operating system copies the affected memory and maps the address to a
*different* physical address. This is the copy-on-write UNIX fork model.

The example below shows 3 processes that all use the same address for the
environment env1, yet there must be 3 different environments, because
env1$x has 3 different values simultaneously.

-Clark


library(parallel)
env1 <- new.env()
env1$x <- 0
cl <- makeCluster(2, type="FORK")

# Now we write to the environment, and env1$x has two distinct values
clusterEvalQ(cl, env1$x <- rnorm(1))
# [[1]]
# [1] -1.296702
#
# [[2]]
# [1] -0.4001104

# The environments in the 2 child processes still have the same address,
# which is the same as the original address
parLapply(cl, 1:4, function(x) capture.output(str(env1)))
env1

# The original x is unchanged
env1$x

stopCluster(cl)
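
Because writes made in a child never reach the parent's env1, one possible pattern is to let the workers return plain values and update the environment in the parent. This was not suggested in the thread; it is only a sketch, and the result_ names and the squaring function are placeholders. A default (PSOCK) cluster is used here on the assumption that the approach should not depend on forking.

```r
library(parallel)

env1 <- new.env()

cl <- makeCluster(2)  # default PSOCK cluster; type = "FORK" would behave the same here
# Workers return ordinary values instead of modifying an environment
results <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)

# Only the parent process writes to env1
for (i in seq_along(results)) {
  assign(paste0("result_", i), results[[i]], envir = env1)
}

sort(ls(env1))
```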


On Wed, Sep 12, 2018 at 11:31 AM Ralf Stubner <[hidden email]>
wrote:

> On 12.09.2018 20:20, Gábor Csárdi wrote:
> > This is all normal, a fork cluster works with processes, that do not
> > share memory.
>
> And if you are after shared-memory parallelism, you can try the 'Rdsm'
> package: https://cran.r-project.org/package=Rdsm