Bug in mclapply?

Bug in mclapply?

Winston Chang
I've been using mclapply and have encountered situations where it gives
errors or returns incorrect results. Here's a minimal example, which gives
the error on R 2.15.2 on Mac and Linux:

library(parallel)
f <- function(x) NULL
mclapply(1, f, mc.preschedule = FALSE, mc.cores = 1)
# Error in sum(sapply(res, inherits, "try-error")) :
#  invalid 'type' (list) of argument


I believe it happens when all of the following are true:
- The function returns NULL
- mc.preschedule = FALSE
- mc.cores >= length of the input data
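
The last two conditions depend only on the arguments of the call, so they can be checked before calling mclapply(). A minimal sketch follows; the helper name hits_suspect_case is made up for illustration and is not part of the parallel package. Whether the function returns NULL cannot be known in advance, so that condition is not covered.

```r
# Hypothetical helper, not part of the parallel package.
# TRUE when a call satisfies the two argument-level conditions listed above:
# mc.preschedule = FALSE and mc.cores >= length of the input.
hits_suspect_case <- function(X, mc.preschedule, mc.cores) {
  !isTRUE(mc.preschedule) && mc.cores >= length(X)
}

hits_suspect_case(1,   mc.preschedule = FALSE, mc.cores = 1)  # TRUE
hits_suspect_case(1:2, mc.preschedule = FALSE, mc.cores = 1)  # FALSE
hits_suspect_case(1:2, mc.preschedule = TRUE,  mc.cores = 2)  # FALSE
```

A check like this could be used to decide when to fall back to mc.preschedule = TRUE or plain lapply(), under the assumption that the conditions above really do describe the failing cases.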


Here are some examples I used to track down the problem.

library(parallel)
f <- function(x) NULL

# Error when mc.preschedule=FALSE and mc.cores >= length(x)
mclapply(1, f, mc.preschedule = FALSE, mc.cores = 1)    # Error
mclapply(1, f, mc.preschedule = FALSE, mc.cores = 2)    # Error
mclapply(1:2, f, mc.preschedule = FALSE, mc.cores = 1)  # OK

# In the following 2 cases, I get an error about 10-20% of the time.
# The rest of the time the result is worse: mclapply() silently returns
# a list with only one element instead of two!
mclapply(1:2, f, mc.preschedule = FALSE, mc.cores = 2)  # Error or short list
mclapply(1:2, f, mc.preschedule = FALSE, mc.cores = 3)  # Error or short list


# With mc.preschedule = TRUE, it always works
mclapply(1, f, mc.preschedule = TRUE, mc.cores = 1)    # OK
mclapply(1:2, f, mc.preschedule = TRUE, mc.cores = 1)  # OK
mclapply(1:2, f, mc.preschedule = TRUE, mc.cores = 2)  # OK

# lapply() always works
lapply(1, f)    # OK
lapply(1:2, f)  # OK


# If the function returns a non-NULL value, it works
g <- function(x) 0
mclapply(1, g, mc.preschedule = FALSE, mc.cores = 1)    # OK
mclapply(1:2, g, mc.preschedule = FALSE, mc.cores = 1)  # OK
mclapply(1:2, g, mc.preschedule = FALSE, mc.cores = 2)  # OK
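
Because the two-element cases fail only intermittently, it can help to repeat the call and tabulate the outcomes. The following is a rough sketch, assuming a platform where forking is available (Mac or Linux); classify_call is a hypothetical helper, not part of the parallel package, and the repetition count of 50 is arbitrary.

```r
library(parallel)

# Hypothetical helper, not part of the parallel package: run one mclapply()
# call and classify the outcome as an error, a result of the wrong length,
# or a result with one element per input.
classify_call <- function(X, FUN, ...) {
  res <- tryCatch(mclapply(X, FUN, ...), error = function(e) e)
  if (inherits(res, "error")) return("error")
  if (length(res) != length(X)) return("wrong length")
  "ok"
}

f <- function(x) NULL
g <- function(x) 0

# Repeat the intermittent case and count how often each outcome occurs.
outcomes <- replicate(
  50,
  classify_call(1:2, f, mc.preschedule = FALSE, mc.cores = 2)
)
table(outcomes)
```

On an unaffected R version every outcome should be "ok"; on an affected one the table would show how the failures split between errors and short lists.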



Digging around in mclapply(), I think it happens because mccollect(jobs)
returns an empty list. But when I set options(error=recover) and inspect
the failed call, calling mccollect(jobs) a second time returns a list with
values. It's as though the first mccollect() call returned too early. This
session illustrates:

> mclapply(1, f, mc.preschedule = FALSE, mc.cores = 1)
Error in sum(sapply(res, inherits, "try-error")) :
  invalid 'type' (list) of argument

Enter a frame number, or 0 to exit

1: mclapply(1, f, mc.preschedule = FALSE, mc.cores = 1)

Selection: 1
Called from: top level
Browse[1]> res
named list()
Browse[1]> res <- mccollect(jobs)
Browse[1]> res
$`12348`
NULL

The error is raised on line 63 of mclapply.R, two lines after `res <-
mccollect(jobs)` is called on line 61. At that point, res should be a
named list with values filled in, but it's empty. When I run `res <-
mccollect(jobs)` a second time, it gives the correct values.

Is there a good way to work around this issue for now?
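
One possible stopgap, based only on the observation above that non-NULL return values work, is to box each result in a list inside the child process and unbox it afterwards. This is an untested assumption about the failure, not a confirmed fix; mclapply_boxed is a hypothetical wrapper, not part of the parallel package, and it assumes a platform where forking is available.

```r
library(parallel)

# Hypothetical wrapper, not part of the parallel package.
# Each child returns list(FUN(x)), which is never a bare NULL; the parent
# then extracts the boxed value. All arguments in ... go to mclapply(), so
# extra arguments for FUN are not supported in this sketch.
mclapply_boxed <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  boxed <- mclapply(X, function(x) list(FUN(x)), ...)
  lapply(boxed, function(b) {
    # Leave "try-error" objects from failed children untouched.
    if (inherits(b, "try-error")) b else b[[1L]]
  })
}

f <- function(x) NULL
res <- mclapply_boxed(1:2, f, mc.preschedule = FALSE, mc.cores = 2)
str(res)
```

Since mc.preschedule = TRUE also worked in every case tried above, switching to it may be a simpler alternative where the scheduling mode does not matter.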

-Winston


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Bug in mclapply?

Winston Chang
(Sorry for the repeat message; I forgot to send the previous message
in plain text.)

