Random behavior of mclapply

Thibault Vatter-2
Hi,

I have a question about the behavior described in the following Stack
Overflow question:

https://stackoverflow.com/questions/20674538/mclapply-returns-null-randomly

More specifically, I would like to know whether you have ever considered
the suggestion made in the comments on the first answer, namely to warn
the user when one of the processes has been killed by the out-of-memory
killer?

I am always surprised to see the random NULLs without message/warning/error
of any kind, and I think that it could be a useful feature to know whether
the function executed by mclapply returned a NULL or if the process was
killed for some reason.

In the following gist, I have an example of this (in this case non-random)
behavior:

https://gist.github.com/tvatter/2fcf3a9a99c256f9b9360f596b300715

For the record, I generate the list of NULLs in the 4th mclapply in the
gist above on a late 2013 MacBook Pro with macOS High Sierra and 16GB of
memory, and my sessionInfo() is:

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0    yaml_2.1.19

------------------------------------------------------------
Thibault Vatter
Department of Statistics
Columbia University


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Random behavior of mclapply

Tomas Kalibera

Hi Thibault,

mclapply has been designed to signal errors in two ways. User code
errors are returned as special objects (of class "try-error") in the
respective elements of the result list. All other errors (including a
killed process) are returned as NULL in the respective elements of the
result list. To detect these errors reliably, one needs to implement FUN
so that it never returns NULL normally (it also cannot return a raw
vector). This is how mclapply was designed and implemented (and also
mccollect, etc). It may be surprising to see multiple NULL elements when
a single process is killed, but this is expected with pre-scheduling
when that process has been tasked to compute multiple elements.
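
As an illustration of the detection rule described above, here is a minimal
sketch in R. The helpers boxed() and classify_results() are hypothetical
names, not part of the parallel package. The idea is to wrap FUN so that a
successful call always returns a one-element list; any bare NULL left in the
result then means that the job delivered nothing, and "try-error" objects
mark errors raised by the user code.

```r
# Hypothetical helper: wrap FUN so that a successful call never returns a
# bare NULL. A legitimate NULL result becomes list(value = NULL).
boxed <- function(FUN) {
  function(...) list(value = FUN(...))
}

# Hypothetical helper: label each element of an mclapply result.
classify_results <- function(res) {
  vapply(res, function(x) {
    if (is.null(x)) {
      "lost"    # the job did not deliver a result (e.g. process killed)
    } else if (inherits(x, "try-error")) {
      "error"   # FUN signalled an R error in the child process
    } else {
      "ok"
    }
  }, character(1))
}

# Hand-built list imitating the kinds of elements mclapply can return.
res <- list(
  list(value = 1),                   # normal result
  NULL,                              # job lost
  try(stop("boom"), silent = TRUE),  # user code error
  list(value = NULL)                 # FUN legitimately returned NULL
)
status <- classify_results(res)

# The same helpers applied to a real call. Forking is unavailable on
# Windows, so only one core is used there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
out <- parallel::mclapply(1:3,
                          boxed(function(i) if (i == 2) NULL else i),
                          mc.cores = cores)
out_status <- classify_results(out)
```

Unboxing the successful results is then a matter of
lapply(out[out_status == "ok"], `[[`, "value"). Boxing costs one extra list
per element, but it keeps NULL available as a normal return value of FUN.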

To make this API more user friendly, I've added a warning that is now
emitted when a job does not deliver a result (that is, when a vector
element is NULL because of such an error). I've also made it more explicit
in the documentation that NULL signals an error.
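
For callers who would rather fail than continue with partial results, a
possible sketch follows. strict_mclapply() is a hypothetical wrapper, not a
function of the parallel package. It assumes that FUN never returns NULL
normally, turns any warning raised by mclapply into an error, and also
checks for NULL elements itself so that it behaves the same way on R
versions that do not emit the warning.

```r
# Hypothetical wrapper: stop on any mclapply warning or NULL element.
# Assumes FUN never returns NULL on success.
strict_mclapply <- function(X, FUN, ...) {
  res <- withCallingHandlers(
    parallel::mclapply(X, FUN, ...),
    warning = function(w) {
      # Promote the warning to an error in the parent process.
      stop("mclapply warned: ", conditionMessage(w), call. = FALSE)
    }
  )
  # Explicit check, independent of whether a warning was emitted.
  lost <- which(vapply(res, is.null, logical(1)))
  if (length(lost) > 0L) {
    stop("no result delivered for element(s): ",
         paste(lost, collapse = ", "), call. = FALSE)
  }
  res
}

# mc.cores = 1L keeps the example runnable on every platform.
ok <- strict_mclapply(1:3, function(i) i * 2, mc.cores = 1L)
failed <- try(strict_mclapply(1:2, function(i) NULL, mc.cores = 1L),
              silent = TRUE)
```

Note that this treats every warning from mclapply as fatal, including the
ones it raises for errors in user code; a more lenient variant could inspect
the elements for "try-error" objects instead.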

Best,
Tomas


On 07/26/2018 08:37 PM, Thibault Vatter wrote:

> [...]

Re: Random behavior of mclapply

Thibault Vatter-2
Hi Tomas,

Thanks a lot for the explanation and the changes. The update in the
documentation is especially helpful.

Best,
Thibault




On Thu, Oct 18, 2018 at 10:48 AM Tomas Kalibera <[hidden email]>
wrote:

> [...]
