[parallel-package] feature request: set default cluster type via environment variable

Christian Krause
Dear all,

I work as an administrator of a High-Performance Computing (HPC) cluster that runs Linux. Many people use R on this cluster and, of course, the *parallel* package to speed up their computations.

It has been our collective experience that |makeForkCluster| yields an overall better experience /on Linux/ than |makePSOCKcluster|, for whatever definition of better. Let me just summarize that it works more smoothly. I believe other people working with *parallel* on Linux share this experience.

We also very much welcome the environment variable |MC_CORES|, which lets us specify (in job submit scripts) the number of CPU cores a user has been granted, most importantly for /dynamic resource requests/ (see below for an example).

What we would also appreciate, and now we finally get to the feature request, is another environment variable to choose the cluster type, as in:

|export MC_CLUSTER_TYPE=FORK |

Do you think something like this could be implemented in future releases?


      Parallel R job submit script

This works with the Univa Grid Engine and should work with other Grid Engine variants:

    #!/bin/bash

    # request a "parallel environment" with 2 to 20 cores
    #$ -pe smp 2-20

    # set number of cores for the R cluster to the granted value (between 2 and 20)
    export MC_CORES=$NSLOTS

    # we want this:
    export MC_CLUSTER_TYPE=FORK

    Rscript /path/to/script.R

Best Regards


--

Christian Krause

Scientific Computing Administration and Support

Phone: +49 341 97 33144

Email: [hidden email]

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig

Deutscher Platz 5e

04103 Leipzig

Germany

iDiv is a research centre of the DFG – Deutsche Forschungsgemeinschaft

iDiv is a central institution of Leipzig University within the meaning of § 92(1) SächsHSFG and is operated jointly with Martin Luther University Halle-Wittenberg and Friedrich Schiller University Jena, as well as in cooperation with the Helmholtz Centre for Environmental Research GmbH - UFZ. Participating cooperation partners are the following non-university research institutions: the Helmholtz Centre for Environmental Research GmbH - UFZ, the Max Planck Institute for Biogeochemistry (MPI BGC), the Max Planck Institute for Chemical Ecology (MPI CE), the Max Planck Institute for Evolutionary Anthropology (MPI EVA), the Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures (DSMZ), the Leibniz Institute of Plant Biochemistry (IPB), the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) and the Leibniz Institute Senckenberg Museum of Natural History Görlitz (SMNG). VAT ID No. DE 141510383


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [parallel-package] feature request: set default cluster type via environment variable

Prof Brian Ripley
On 24/11/2016 07:30, Christian Krause wrote:
> Dear all,
>
> I work as an administrator of a High-Performance Computing (HPC) cluster that runs Linux. Many people use R on this cluster and, of course, the *parallel* package to speed up their computations.
>
> It has been our collective experience that |makeForkCluster| yields an overall better experience /on Linux/ than |makePSOCKcluster|, for whatever definition of better. Let me just summarize that it works more smoothly. I believe other people working with *parallel* on Linux share this experience.

Usually, but not always.  And the differences are mainly in
initialization time, so small once workers are given a reasonable amount
of work (tens of seconds each).  However, as forked workers have a copy
of the whole master process, forking workers can lead to excessive
memory usage.

> We also very much welcome the environment variable |MC_CORES|, which lets us specify (in job submit scripts) the number of CPU cores a user has been granted, most importantly for /dynamic resource requests/ (see below for an example).

Hmm, MC_CORES is primarily for mclapply() and friends, not
makeCluster().  makeForkCluster() is a 'friend' so uses it, but
makePSOCKcluster() was designed for distributing across a cluster of
machines (whereas makeForkCluster is restricted to a single multicore
machine).
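
As an illustration of the point about |MC_CORES|, here is a minimal R sketch. It assumes that the *parallel* package initializes the |mc.cores| option from |MC_CORES| when the package is loaded, so the variable has to be set before R starts (as in a job submit script). The fallback of 2 workers and the toy squaring task are only placeholders.

```r
library(parallel)

# mc.cores is assumed to have been initialized from MC_CORES at package load;
# fall back to 2 workers if neither the option nor the variable is set.
n <- getOption("mc.cores", 2L)

# Forking is not available on Windows, where mclapply() only accepts 1 core.
if (.Platform$OS.type == "windows") n <- 1L

# mclapply() forks the current session; no cluster object is created.
res <- mclapply(1:4, function(i) i^2, mc.cores = n)
```

Note that nothing here selects a cluster type: |mclapply()| always forks, which is why the variable belongs to that family of functions.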

> What we would also appreciate, and now we finally get to the feature request, is another environment variable to choose the cluster type, as in:
>
> |export MC_CLUSTER_TYPE=FORK |
>
> Do you think something like this could be implemented in future releases?

No.  (Not least as 'MC_' refers to the former 'multicore' package.)

PSOCK and Fork clusters are not interchangeable, and the author of the
code has to check if Fork can be substituted for PSOCK (which starts
with a clean R environment, and that may well be assumed).
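
To make the difference concrete, the following sketch (not taken from the thread; the variable |offset| and the toy computation are made up for illustration) shows one way the two cluster types diverge. A PSOCK worker starts as a fresh R process and needs the master's objects exported to it, whereas a forked worker is a copy of the master and already sees them.

```r
library(parallel)

offset <- 10  # exists only in the master session

# PSOCK workers start with a clean environment, so `offset` must be exported.
cl <- makeCluster(2L, type = "PSOCK")
clusterExport(cl, "offset", envir = environment())
psock_res <- parSapply(cl, 1:3, function(i) i + offset)
stopCluster(cl)

# Fork workers (Unix-alikes only) inherit the master's state, `offset` included.
if (.Platform$OS.type == "unix") {
  cl <- makeForkCluster(2L)
  fork_res <- parSapply(cl, 1:3, function(i) i + offset)
  stopCluster(cl)
}
```

Code written for one type can therefore behave differently under the other, for example when it relies on workers not inheriting loaded packages, options, or large objects from the master.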

So rather, you need to ask your users to implement this in their calls
to parallel::makeCluster.
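
A minimal sketch of what such a user-side implementation might look like. The variable name |SITE_R_CLUSTER_TYPE| and both helper functions are hypothetical: the *parallel* package does not read this variable, and a site would have to agree on the name and export it in its job scripts. The script author still decides whether the code is safe to run on a Fork cluster.

```r
library(parallel)

# Read the cluster type from a site-defined (hypothetical) environment variable.
cluster_type_from_env <- function(default = "PSOCK") {
  type <- toupper(Sys.getenv("SITE_R_CLUSTER_TYPE", unset = default))
  if (!nzchar(type)) type <- default
  if (!type %in% c("PSOCK", "FORK")) {
    stop("unsupported cluster type: ", type)
  }
  # Fork clusters are not available on Windows; fall back to PSOCK there.
  if (type == "FORK" && .Platform$OS.type == "windows") type <- "PSOCK"
  type
}

# Number of workers granted by the scheduler, taken from MC_CORES.
n_workers <- function(default = 2L) {
  n <- suppressWarnings(as.integer(Sys.getenv("MC_CORES", unset = NA)))
  if (is.na(n) || n < 1L) default else n
}

cl <- makeCluster(n_workers(), type = cluster_type_from_env())
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
```

Defaulting to PSOCK keeps the script's behaviour unchanged on systems where the variable is not set.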



--
Brian D. Ripley,                  [hidden email]
Emeritus Professor of Applied Statistics, University of Oxford