error in parallel:::sendMaster

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

error in parallel:::sendMaster

Andreas Kersting
Hi,

I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:

The basic outline of the problematic section of the script:

# parts is a data.table with 88 rows
mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
  # do some data wrangling and write the result to a file
  # ...

  print(paste0("part ", i, " written successfully."))
  return(TRUE)
}, mc.preschedule = FALSE, mc.cores = 2L)

str(mc_ret)


Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.

Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.

But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:

Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster

and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.


Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
- The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
- The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
- Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
- The problem also occurs for mc.preschedule = TRUE.
- There is no error if I set mc.cores to 1.
- And stressing again: the code works without any changes from Rscript for thousands of other data sets.


Rscript -e "sessionInfo()":
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.2


I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.

I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?

Regards,
Andreas




Actual output:

[1] "part 51 written successfully."
[1] "part 30 written successfully."
[1] "part 32 written successfully."
[1] "part 48 written successfully."
[1] "part 63 written successfully."
[1] "part 5 written successfully."
[1][1] "part 14 written successfully." "part 18 written successfully."

[1] "part 38 written successfully."
[1] "part 11 written successfully."
[1] "part 68 written successfully."
[1] "part 45 written successfully."
[1] "part 88 written successfully."
[1] "part 36 written successfully."
[1] "part 44 written successfully."
[1] "part 55 written successfully."
[1] "part 26 written successfully."
[1] "part 37 written successfully."
[1] "part 22 written successfully."
[1] "part 13 written successfully."
[1] "part 67 written successfully."
[1] "part 10 written successfully."
[1] "part 24 written successfully."
[1] "part 20 written successfully."
[1] "part 74 written successfully."
[1] "part 50 written successfully."
[1] "part 2 written successfully."
[1] "part 17 written successfully."
[1] "part 66 written successfully."
[1] "part 61 written successfully."
[1][1] "part 7 written successfully." "part 31 written successfully."

[1] "part 70 written successfully."
[1] "part 71 written successfully."
[1][1] "part 76 written successfully."
 "part 42 written successfully."
[1][1] "part 72 written successfully." "part 12 written successfully."

[1] "part 39 written successfully."
[1] "part 58 written successfully."
[1] "part 87 written successfully."
[1] "part 81 written successfully."
[1] "part 3 written successfully."
[1] "part 29 written successfully."
[1] "part 82 written successfully."
[1] "part 54 written successfully."
[1] "part 23 written successfully."
[1] "part 56 written successfully."
[1] "part 73 written successfully."
[1] "part 27 written successfully."
[1] "part 64 written successfully."
[1] "part 46 written successfully."
[1] "part 80 written successfully."
[1] "part 53 written successfully."
[1] "part 43 written successfully."
[1] "part 47 written successfully."
[1] "part 57 written successfully."
[1] "part 75 written successfully."
[1] "part 84 written successfully."
[1] "part 85 written successfully."
[1][1] "part 86 written successfully."
 "part 19 written successfully."
[1][1] "part 33 written successfully." "part 34 written successfully."

[1] "part 21 written successfully."
[1] "part 40 written successfully."
[1][1] "part 4 written successfully."
 "part 28 written successfully."
[1] "part 62 written successfully."
[1] "part 16 written successfully."
[1] "part 79 written successfully."
[1] "part 6 written successfully."
[1] "part 77 written successfully."
[1] "part 65 written successfully."
[1] "part 60 written successfully."
[1] "part 78 written successfully."
[1] "part 1 written successfully."
[1] "part 83 written successfully."
[1] "part 9 written successfully."
[1] "part 15 written successfully."
[1][1] "part 41 written successfully."
 "part 35 written successfully."
[1][1] "part 8 written successfully." "part 69 written successfully."

[1] "part 59 written successfully."
[1] "part 25 written successfully."
[1] "part 49 written successfully."
[1] "part 52 written successfully."
List of 88
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
[1] "part 31 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
[1] "part 71 written successfully."
[1] "part 35 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 42 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 4 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1][1] "part 30 written successfully." "part 40 written successfully."

Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 41 written successfully."
[1] "part 80 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 45 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 82 written successfully."
[1] "part 54 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 59 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 27 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 36 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 44 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 83 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 61 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 70 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 66 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 10 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 53 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 55 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 15 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 50 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 5 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 48 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 32 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 69 written successfully."
Execution halted
[1] "part 52 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 77 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 24 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 56 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 6 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 62 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 33 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 16 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 34 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 74 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 2 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 19 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 9 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 11 written successfully."
[1] "part 28 written successfully."
[1] "part 29 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 17 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 65 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 68 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 43 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 38 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 1 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 57 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 8 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 25 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1]List of 85
 $ : logi TRUE
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: error in parallel:::sendMaster

Andreas Kersting
Hi again,

One important correction of my first message: I misinterpreted the output. Actually in that R session 2 input files were processed one after the other in a loop. The first (with 88 parts went fine). The second (with 85 parts) produced the sendMaster errors and failed. If (in a new session via Rscript) I only process the second input file it will work. The other observations on R vs Rscript, NFS share etc. still hold.

Sorry for this! Regards,
Andreas

2019-11-27 12:10 GMT+01:00 Andreas Kersting<[hidden email]>:

> Hi,
>
> I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:
>
> The basic outline of the problematic section of the script:
>
> # parts is a data.table with 88 rows
> mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
>   # do some data wrangling and write the result to a file
>   # ...
>
>   print(paste0("part ", i, " written successfully."))
>   return(TRUE)
> }, mc.preschedule = FALSE, mc.cores = 2L)
>
> str(mc_ret)
>
>
> Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.
>
> Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.
>
> But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:
>
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>
> and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.
>
>
> Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
> - The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
> - The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
> - Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
> - The problem also occurs for mc.preschedule = TRUE.
> - There is no error if I set mc.cores to 1.
> - And stressing again: the code works without any changes from Rscript for thousands of other data sets.
>
>
> Rscript -e "sessionInfo()":
> R version 3.5.2 (2018-12-20)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS
>
> Matrix products: default
> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>
> locale:
>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.5.2
>
>
> I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.
>
> I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?
>
> Regards,
> Andreas
>
>
>
>
> Actual output:
>
> [1] "part 51 written successfully."
> [1] "part 30 written successfully."
> [1] "part 32 written successfully."
> [1] "part 48 written successfully."
> [1] "part 63 written successfully."
> [1] "part 5 written successfully."
> [1][1] "part 14 written successfully." "part 18 written successfully."
>
> [1] "part 38 written successfully."
> [1] "part 11 written successfully."
> [1] "part 68 written successfully."
> [1] "part 45 written successfully."
> [1] "part 88 written successfully."
> [1] "part 36 written successfully."
> [1] "part 44 written successfully."
> [1] "part 55 written successfully."
> [1] "part 26 written successfully."
> [1] "part 37 written successfully."
> [1] "part 22 written successfully."
> [1] "part 13 written successfully."
> [1] "part 67 written successfully."
> [1] "part 10 written successfully."
> [1] "part 24 written successfully."
> [1] "part 20 written successfully."
> [1] "part 74 written successfully."
> [1] "part 50 written successfully."
> [1] "part 2 written successfully."
> [1] "part 17 written successfully."
> [1] "part 66 written successfully."
> [1] "part 61 written successfully."
> [1][1] "part 7 written successfully." "part 31 written successfully."
>
> [1] "part 70 written successfully."
> [1] "part 71 written successfully."
> [1][1] "part 76 written successfully."
>  "part 42 written successfully."
> [1][1] "part 72 written successfully." "part 12 written successfully."
>
> [1] "part 39 written successfully."
> [1] "part 58 written successfully."
> [1] "part 87 written successfully."
> [1] "part 81 written successfully."
> [1] "part 3 written successfully."
> [1] "part 29 written successfully."
> [1] "part 82 written successfully."
> [1] "part 54 written successfully."
> [1] "part 23 written successfully."
> [1] "part 56 written successfully."
> [1] "part 73 written successfully."
> [1] "part 27 written successfully."
> [1] "part 64 written successfully."
> [1] "part 46 written successfully."
> [1] "part 80 written successfully."
> [1] "part 53 written successfully."
> [1] "part 43 written successfully."
> [1] "part 47 written successfully."
> [1] "part 57 written successfully."
> [1] "part 75 written successfully."
> [1] "part 84 written successfully."
> [1] "part 85 written successfully."
> [1][1] "part 86 written successfully."
>  "part 19 written successfully."
> [1][1] "part 33 written successfully." "part 34 written successfully."
>
> [1] "part 21 written successfully."
> [1] "part 40 written successfully."
> [1][1] "part 4 written successfully."
>  "part 28 written successfully."
> [1] "part 62 written successfully."
> [1] "part 16 written successfully."
> [1] "part 79 written successfully."
> [1] "part 6 written successfully."
> [1] "part 77 written successfully."
> [1] "part 65 written successfully."
> [1] "part 60 written successfully."
> [1] "part 78 written successfully."
> [1] "part 1 written successfully."
> [1] "part 83 written successfully."
> [1] "part 9 written successfully."
> [1] "part 15 written successfully."
> [1][1] "part 41 written successfully."
>  "part 35 written successfully."
> [1][1] "part 8 written successfully." "part 69 written successfully."
>
> [1] "part 59 written successfully."
> [1] "part 25 written successfully."
> [1] "part 49 written successfully."
> [1] "part 52 written successfully."
> List of 88
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
>  $ : logi TRUE
> [1] "part 31 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
> [1] "part 71 written successfully."
> [1] "part 35 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 42 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> [1] "part 4 written successfully."
> Execution halted
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1][1] "part 30 written successfully." "part 40 written successfully."
>
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> [1] "part 41 written successfully."
> [1] "part 80 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 45 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> [1] "part 82 written successfully."
> [1] "part 54 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 59 written successfully."
> Execution halted
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 27 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> [1] "part 36 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> [1] "part 44 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> Execution halted
> [1] "part 83 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 61 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 70 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 66 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 10 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> Execution halted
> [1] "part 53 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 55 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 15 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 50 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 5 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> Execution halted
> [1] "part 48 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 32 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 69 written successfully."
> Execution halted
> [1] "part 52 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 77 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> Execution halted
> [1] "part 24 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 56 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 6 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 62 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 33 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> [1] "part 16 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 34 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 74 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> [1] "part 2 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 19 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> [1] "part 9 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 11 written successfully."
> [1] "part 28 written successfully."
> [1] "part 29 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 17 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 65 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> Execution halted
> Execution halted
> [1] "part 68 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 43 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 38 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 1 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 57 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 8 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1] "part 25 written successfully."
> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
> [1]List of 85
>  $ : logi TRUE
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
>  $ : NULL
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: error in parallel:::sendMaster

Tomas Kalibera
Hi Andreas,

the error is reported when some child process cannot send results to the
master process, which originates from an error returned by write() -
when write() returns -1 or 0. The logic around the writing has not
changed since R 3.5.2. It should not be related to the printing in the
child, only to returning the value. The problem may be originating from
the execution environment, virtualization, and/or possibly from a lack
of robustness in R. To resolve this we need to find out which error was
returned and why. Either you can try to create a reproducible example
(something I could use to trigger an error on my system and then debug)
or to debug on your system (build R from source, ensure the bug is still
triggered, then instrument to print the exact error from the OS and
where it was detected, etc). In principle you could also try without
code instrumentation just using strace. Just from looking at the code in
R around the writing I am not seeing any bug there. If you choose to
debug on your system I can help with the instrumentation.

Best
Tomas

On 11/27/19 12:40 PM, Andreas Kersting wrote:

> Hi again,
>
> One important correction of my first message: I misinterpreted the output. Actually in that R session 2 input files were processed one after the other in a loop. The first (with 88 parts went fine). The second (with 85 parts) produced the sendMaster errors and failed. If (in a new session via Rscript) I only process the second input file it will work. The other observations on R vs Rscript, NFS share etc. still hold.
>
> Sorry for this! Regards,
> Andreas
>
> 2019-11-27 12:10 GMT+01:00 Andreas Kersting<[hidden email]>:
>> Hi,
>>
>> I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:
>>
>> The basic outline of the problematic section of the script:
>>
>> # parts is a data.table with 88 rows
>> mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
>>    # do some data wrangling and write the result to a file
>>    # ...
>>
>>    print(paste0("part ", i, " written successfully."))
>>    return(TRUE)
>> }, mc.preschedule = FALSE, mc.cores = 2L)
>>
>> str(mc_ret)
>>
>>
>> Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.
>>
>> Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.
>>
>> But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:
>>
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>
>> and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.
>>
>>
>> Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
>> - The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
>> - The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
>> - Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
>> - The problem also occurs for mc.preschedule = TRUE.
>> - There is no error if I set mc.cores to 1.
>> - And stressing again: the code works without any changes from Rscript for thousands of other data sets.
>>
>>
>> Rscript -e "sessionInfo()":
>> R version 3.5.2 (2018-12-20)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 18.04.2 LTS
>>
>> Matrix products: default
>> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
>> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>>
>> locale:
>>   [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
>>   [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
>>   [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
>> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.5.2
>>
>>
>> I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.
>>
>> I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?
>>
>> Regards,
>> Andreas
>>
>>
>>
>>
>> Actual output:
>>
>> [1] "part 51 written successfully."
>> [1] "part 30 written successfully."
>> [1] "part 32 written successfully."
>> [1] "part 48 written successfully."
>> [1] "part 63 written successfully."
>> [1] "part 5 written successfully."
>> [1][1] "part 14 written successfully." "part 18 written successfully."
>>
>> [1] "part 38 written successfully."
>> [1] "part 11 written successfully."
>> [1] "part 68 written successfully."
>> [1] "part 45 written successfully."
>> [1] "part 88 written successfully."
>> [1] "part 36 written successfully."
>> [1] "part 44 written successfully."
>> [1] "part 55 written successfully."
>> [1] "part 26 written successfully."
>> [1] "part 37 written successfully."
>> [1] "part 22 written successfully."
>> [1] "part 13 written successfully."
>> [1] "part 67 written successfully."
>> [1] "part 10 written successfully."
>> [1] "part 24 written successfully."
>> [1] "part 20 written successfully."
>> [1] "part 74 written successfully."
>> [1] "part 50 written successfully."
>> [1] "part 2 written successfully."
>> [1] "part 17 written successfully."
>> [1] "part 66 written successfully."
>> [1] "part 61 written successfully."
>> [1][1] "part 7 written successfully." "part 31 written successfully."
>>
>> [1] "part 70 written successfully."
>> [1] "part 71 written successfully."
>> [1][1] "part 76 written successfully."
>>   "part 42 written successfully."
>> [1][1] "part 72 written successfully." "part 12 written successfully."
>>
>> [1] "part 39 written successfully."
>> [1] "part 58 written successfully."
>> [1] "part 87 written successfully."
>> [1] "part 81 written successfully."
>> [1] "part 3 written successfully."
>> [1] "part 29 written successfully."
>> [1] "part 82 written successfully."
>> [1] "part 54 written successfully."
>> [1] "part 23 written successfully."
>> [1] "part 56 written successfully."
>> [1] "part 73 written successfully."
>> [1] "part 27 written successfully."
>> [1] "part 64 written successfully."
>> [1] "part 46 written successfully."
>> [1] "part 80 written successfully."
>> [1] "part 53 written successfully."
>> [1] "part 43 written successfully."
>> [1] "part 47 written successfully."
>> [1] "part 57 written successfully."
>> [1] "part 75 written successfully."
>> [1] "part 84 written successfully."
>> [1] "part 85 written successfully."
>> [1][1] "part 86 written successfully."
>>   "part 19 written successfully."
>> [1][1] "part 33 written successfully." "part 34 written successfully."
>>
>> [1] "part 21 written successfully."
>> [1] "part 40 written successfully."
>> [1][1] "part 4 written successfully."
>>   "part 28 written successfully."
>> [1] "part 62 written successfully."
>> [1] "part 16 written successfully."
>> [1] "part 79 written successfully."
>> [1] "part 6 written successfully."
>> [1] "part 77 written successfully."
>> [1] "part 65 written successfully."
>> [1] "part 60 written successfully."
>> [1] "part 78 written successfully."
>> [1] "part 1 written successfully."
>> [1] "part 83 written successfully."
>> [1] "part 9 written successfully."
>> [1] "part 15 written successfully."
>> [1][1] "part 41 written successfully."
>>   "part 35 written successfully."
>> [1][1] "part 8 written successfully." "part 69 written successfully."
>>
>> [1] "part 59 written successfully."
>> [1] "part 25 written successfully."
>> [1] "part 49 written successfully."
>> [1] "part 52 written successfully."
>> List of 88
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>>   $ : logi TRUE
>> [1] "part 31 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
>> [1] "part 71 written successfully."
>> [1] "part 35 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 42 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> [1] "part 4 written successfully."
>> Execution halted
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1][1] "part 30 written successfully." "part 40 written successfully."
>>
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> [1] "part 41 written successfully."
>> [1] "part 80 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 45 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> [1] "part 82 written successfully."
>> [1] "part 54 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 59 written successfully."
>> Execution halted
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 27 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> [1] "part 36 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> [1] "part 44 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> Execution halted
>> [1] "part 83 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 61 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 70 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 66 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 10 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> Execution halted
>> [1] "part 53 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 55 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 15 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 50 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 5 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> Execution halted
>> [1] "part 48 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 32 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 69 written successfully."
>> Execution halted
>> [1] "part 52 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 77 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> Execution halted
>> [1] "part 24 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 56 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 6 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 62 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 33 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> [1] "part 16 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 34 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 74 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> [1] "part 2 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 19 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> [1] "part 9 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 11 written successfully."
>> [1] "part 28 written successfully."
>> [1] "part 29 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 17 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 65 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> Execution halted
>> Execution halted
>> [1] "part 68 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 43 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 38 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 1 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 57 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 8 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1] "part 25 written successfully."
>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>    write error, closing pipe to the master
>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>> [1]List of 85
>>   $ : logi TRUE
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>>   $ : NULL
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: error in parallel:::sendMaster

Andreas Kersting
Hi Tomas,

Thanks for your prompt reply and your offer to help. I might need to get back to this since I am not too experienced in debugging these kinds of issues. Anyway, I gave it a try and I think I have found the immediate cause:

I installed the debug symbols (r-base-core-dbg), placed https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/parallel/src/fork.c in cwd and changed the wrapper code to:

mc_ret <- parallel::mclapply(seq_len(nrow(parts)), function(i) {
    # we fail for the input resulting in parts having 85 rows
    if (nrow(parts) == 85L && !file.exists(as.character(Sys.getpid()))) {
      file.create(as.character(Sys.getpid()))
      print(Sys.getpid())
      Sys.sleep(30)
    }

    # ...

    return(TRUE)
  }, mc.preschedule = TRUE, mc.cores = 2L)

This way I ended up with only two child processes to which I each attached a debugger. In total I ran about 10 debugging sessions and it was always the second child process failing. The errno after write returned -1 was 9 (EBADF).

From what I can see, the reason for this is that the second child tries to write to fd 7, but already during the very beginning of the first invocation of the anonymous function to parallelize, i.e. during Sys.sleep(30), there is no such file descriptor. From this observation I would conclude that it is NOT the code run from that function, i.e. # ...,  causing the issue. Let me point out again, that this is NOT the very first invocation of mclapply in this R session. There is at least one previous call to it, which works fine.


File descriptors directly after attaching gdb to both child processes during Sys.sleep(30):

### master
root@ip-10-0-48-30:~/latest_test# ls -l /proc/22119/fd
total 0
lrwx------ 1 root root 64 Nov 28 04:49 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:49 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:49 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 28 04:49 3 -> /path/to/script.R
lrwx------ 1 root root 64 Nov 28 04:49 4 -> /path/on/nfs/write.lock
lr-x------ 1 root root 64 Nov 28 04:49 5 -> 'pipe:[266120]'
l-wx------ 1 root root 64 Nov 28 04:49 8 -> 'pipe:[266121]'


### first child (writes to fd 6)
(gdb) shell ls -l /proc/22134/fd
total 0
lrwx------ 1 root root 64 Nov 28 04:42 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
l-wx------ 1 root root 64 Nov 28 04:42 6 -> 'pipe:[266120]'
l-wx------ 1 root root 64 Nov 28 04:42 8 -> 'pipe:[266121]'

### second child (tries writing to fd 7)
(gdb) shell ls -l /proc/22135/fd
total 0
lr-x------ 1 root root 64 Nov 28 04:42 0 -> 'pipe:[266123]'
lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
l-wx------ 1 root root 64 Nov 28 04:42 10 -> 'pipe:[266123]'
lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock


# debugging mc_send_master in both child processes:

### first child (all OK):
(gdb) break mc_send_master
Breakpoint 1 at 0x7f5b2687b0a0: file fork.c, line 681.
(gdb) c
Continuing.

Breakpoint 1, mc_send_master (what=0x555c6dd2cd00) at fork.c:681
warning: Source file is more recent than executable.
681 {
(gdb) info args
what = 0x555c6dd2cd00
(gdb) n
682    if (is_master)
(gdb) n
684    if (master_fd == -1)
(gdb) n
686    if (TYPEOF(what) != RAWSXP)
(gdb) n
688    R_xlen_t len = XLENGTH(what);
(gdb) n
689    unsigned char *b = RAW(what);
(gdb) n
693    if (writerep(master_fd, &len, sizeof(len)) != sizeof(len)) {
(gdb) info locals
len = 538
b = <optimized out>
n = <optimized out>
(gdb) s
writerep (fildes=6, buf=buf@entry=0x7ffdd6da7db0, nbyte=nbyte@entry=8)
    at fork.c:653
653 {
(gdb) info args
fildes = 6
buf = 0x7ffdd6da7db0
nbyte = 8
(gdb) n
654    size_t wbyte = 0;
(gdb) n
653 {
(gdb) n
657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w == -1) {
(gdb) n
657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w == -1) {
(gdb) n
664 if (w == 0) {
(gdb) n
670 wbyte += w;
(gdb) n
671 if (wbyte == nbyte)
(gdb) n
672    return wbyte;
(gdb) n
674 }
(gdb) n
mc_send_master (what=<optimized out>) at fork.c:699
699    for (R_xlen_t i = 0; i < len; i += n) {
(gdb) n
700 n = writerep(master_fd, b + i, len - i);
(gdb) n
701 if (n < 1) {
(gdb) n
699    for (R_xlen_t i = 0; i < len; i += n) {
(gdb) n
707    return ScalarLogical(1);
(gdb) n
708 }



### second child (write fails with EBADF):
(gdb) break mc_send_master
Breakpoint 1 at 0x7f5016f3e0a0: file fork.c, line 681.
(gdb) c
Continuing.

Breakpoint 1, mc_send_master (what=0x563382a71910) at fork.c:681
warning: Source file is more recent than executable.
681 {
(gdb) info args
what = 0x563382a71910
(gdb) n
682    if (is_master)
(gdb) n
684    if (master_fd == -1)
(gdb) n
686    if (TYPEOF(what) != RAWSXP)
(gdb) n
688    R_xlen_t len = XLENGTH(what);
(gdb) n
689    unsigned char *b = RAW(what);
(gdb) n
693    if (writerep(master_fd, &len, sizeof(len)) != sizeof(len)) {
(gdb) info locals
len = 526
b = <optimized out>
n = <optimized out>
(gdb) s
writerep (fildes=7, buf=buf@entry=0x7fff4027ad60, nbyte=nbyte@entry=8)
    at fork.c:653
653 {
(gdb) info args
fildes = 7
buf = 0x7fff4027ad60
nbyte = 8
(gdb) n
654    size_t wbyte = 0;
(gdb) n
653 {
(gdb) n
657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w == -1) {
(gdb) info locals
w = <optimized out>
wbyte = 0
ptr = 0x7fff4027ad60 "\016\002"
(gdb) n
657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
(gdb) n
658 if (w == -1) {
(gdb) n
659    if (errno == EINTR)
(gdb) n
674 }
(gdb) p __errno_location()
$1 = (int *) 0x7f50322cb540
(gdb) x/x $1
0x7f50322cb540: 0x00000009
(gdb) python import errno
(gdb) python print(errno.errorcode[9])
EBADF
(gdb) n
mc_send_master (what=<optimized out>) at fork.c:702
702    close(master_fd);
(gdb) n
704    error(_("write error, closing pipe to the master"));
(gdb) n
703    master_fd = -1;
(gdb) n
704    error(_("write error, closing pipe to the master"));
(gdb) n
685 error(_("there is no pipe to the master process"));


Does this help in any way?

Is there something else I can/should look at?

Regards,
Andreas


2019-11-27 15:04 GMT+01:00 Tomas Kalibera<[hidden email]>:
> Hi Andreas,
> the error is reported when some child process cannot send results to the master process, which originates from an error returned by write() - when write() returns -1 or 0. The logic around the writing has not changed since R 3.5.2. It should not be related to the printing in the child, only to returning the value. The problem may be originating from the execution environment, virtualization, and/or possibly from a lack of robustness in R. To resolve this we need to find out which error was returned and why. Either you can try to create a reproducible example (something I could use to trigger an error on my system and then debug) or to debug on your system (build R from source, ensure the bug is still triggered, then instrument to print the exact error from the OS and where it was detected, etc). In principle you could also try without code instrumentation just using strace. Just from looking at the code in R around the writing I am not seeing any bug there. If you choose to debug o
 n your system I can help with the instrumentation.

>
> Best
> Tomas
>
> On 11/27/19 12:40 PM, Andreas Kersting wrote:
>> Hi again,
>>
>> One important correction of my first message: I misinterpreted the output. Actually in that R session 2 input files were processed one after the other in a loop. The first (with 88 parts went fine). The second (with 85 parts) produced the sendMaster errors and failed. If (in a new session via Rscript) I only process the second input file it will work. The other observations on R vs Rscript, NFS share etc. still hold.
>>
>> Sorry for this! Regards,
>> Andreas
>>
>> 2019-11-27 12:10 GMT+01:00 Andreas Kersting<[hidden email]>:
>>> Hi,
>>>
>>> I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:
>>>
>>> The basic outline of the problematic section of the script:
>>>
>>> # parts is a data.table with 88 rows
>>> mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
>>>    # do some data wrangling and write the result to a file
>>>    # ...
>>>
>>>    print(paste0("part ", i, " written successfully."))
>>>    return(TRUE)
>>> }, mc.preschedule = FALSE, mc.cores = 2L)
>>>
>>> str(mc_ret)
>>>
>>>
>>> Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.
>>>
>>> Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.
>>>
>>> But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:
>>>
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>
>>> and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.
>>>
>>>
>>> Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
>>> - The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
>>> - The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
>>> - Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
>>> - The problem also occurs for mc.preschedule = TRUE.
>>> - There is no error if I set mc.cores to 1.
>>> - And stressing again: the code works without any changes from Rscript for thousands of other data sets.
>>>
>>>
>>> Rscript -e "sessionInfo()":
>>> R version 3.5.2 (2018-12-20)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> Running under: Ubuntu 18.04.2 LTS
>>>
>>> Matrix products: default
>>> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
>>> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>>>
>>> locale:
>>>   [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
>>>   [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
>>>   [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
>>> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> loaded via a namespace (and not attached):
>>> [1] compiler_3.5.2
>>>
>>>
>>> I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.
>>>
>>> I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?
>>>
>>> Regards,
>>> Andreas
>>>
>>>
>>>
>>>
>>> Actual output:
>>>
>>> [1] "part 51 written successfully."
>>> [1] "part 30 written successfully."
>>> [1] "part 32 written successfully."
>>> [1] "part 48 written successfully."
>>> [1] "part 63 written successfully."
>>> [1] "part 5 written successfully."
>>> [1][1] "part 14 written successfully." "part 18 written successfully."
>>>
>>> [1] "part 38 written successfully."
>>> [1] "part 11 written successfully."
>>> [1] "part 68 written successfully."
>>> [1] "part 45 written successfully."
>>> [1] "part 88 written successfully."
>>> [1] "part 36 written successfully."
>>> [1] "part 44 written successfully."
>>> [1] "part 55 written successfully."
>>> [1] "part 26 written successfully."
>>> [1] "part 37 written successfully."
>>> [1] "part 22 written successfully."
>>> [1] "part 13 written successfully."
>>> [1] "part 67 written successfully."
>>> [1] "part 10 written successfully."
>>> [1] "part 24 written successfully."
>>> [1] "part 20 written successfully."
>>> [1] "part 74 written successfully."
>>> [1] "part 50 written successfully."
>>> [1] "part 2 written successfully."
>>> [1] "part 17 written successfully."
>>> [1] "part 66 written successfully."
>>> [1] "part 61 written successfully."
>>> [1][1] "part 7 written successfully." "part 31 written successfully."
>>>
>>> [1] "part 70 written successfully."
>>> [1] "part 71 written successfully."
>>> [1][1] "part 76 written successfully."
>>>   "part 42 written successfully."
>>> [1][1] "part 72 written successfully." "part 12 written successfully."
>>>
>>> [1] "part 39 written successfully."
>>> [1] "part 58 written successfully."
>>> [1] "part 87 written successfully."
>>> [1] "part 81 written successfully."
>>> [1] "part 3 written successfully."
>>> [1] "part 29 written successfully."
>>> [1] "part 82 written successfully."
>>> [1] "part 54 written successfully."
>>> [1] "part 23 written successfully."
>>> [1] "part 56 written successfully."
>>> [1] "part 73 written successfully."
>>> [1] "part 27 written successfully."
>>> [1] "part 64 written successfully."
>>> [1] "part 46 written successfully."
>>> [1] "part 80 written successfully."
>>> [1] "part 53 written successfully."
>>> [1] "part 43 written successfully."
>>> [1] "part 47 written successfully."
>>> [1] "part 57 written successfully."
>>> [1] "part 75 written successfully."
>>> [1] "part 84 written successfully."
>>> [1] "part 85 written successfully."
>>> [1][1] "part 86 written successfully."
>>>   "part 19 written successfully."
>>> [1][1] "part 33 written successfully." "part 34 written successfully."
>>>
>>> [1] "part 21 written successfully."
>>> [1] "part 40 written successfully."
>>> [1][1] "part 4 written successfully."
>>>   "part 28 written successfully."
>>> [1] "part 62 written successfully."
>>> [1] "part 16 written successfully."
>>> [1] "part 79 written successfully."
>>> [1] "part 6 written successfully."
>>> [1] "part 77 written successfully."
>>> [1] "part 65 written successfully."
>>> [1] "part 60 written successfully."
>>> [1] "part 78 written successfully."
>>> [1] "part 1 written successfully."
>>> [1] "part 83 written successfully."
>>> [1] "part 9 written successfully."
>>> [1] "part 15 written successfully."
>>> [1][1] "part 41 written successfully."
>>>   "part 35 written successfully."
>>> [1][1] "part 8 written successfully." "part 69 written successfully."
>>>
>>> [1] "part 59 written successfully."
>>> [1] "part 25 written successfully."
>>> [1] "part 49 written successfully."
>>> [1] "part 52 written successfully."
>>> List of 88
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>>   $ : logi TRUE
>>> [1] "part 31 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
>>> [1] "part 71 written successfully."
>>> [1] "part 35 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 42 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> [1] "part 4 written successfully."
>>> Execution halted
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1][1] "part 30 written successfully." "part 40 written successfully."
>>>
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> [1] "part 41 written successfully."
>>> [1] "part 80 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 45 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> [1] "part 82 written successfully."
>>> [1] "part 54 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 59 written successfully."
>>> Execution halted
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 27 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> [1] "part 36 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> [1] "part 44 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> Execution halted
>>> [1] "part 83 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 61 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 70 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 66 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 10 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> Execution halted
>>> [1] "part 53 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 55 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 15 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 50 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 5 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> Execution halted
>>> [1] "part 48 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 32 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 69 written successfully."
>>> Execution halted
>>> [1] "part 52 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 77 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> Execution halted
>>> [1] "part 24 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 56 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 6 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 62 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 33 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> [1] "part 16 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 34 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 74 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> [1] "part 2 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 19 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> [1] "part 9 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 11 written successfully."
>>> [1] "part 28 written successfully."
>>> [1] "part 29 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 17 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 65 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> Execution halted
>>> Execution halted
>>> [1] "part 68 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 43 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 38 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 1 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 57 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 8 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1] "part 25 written successfully."
>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>    write error, closing pipe to the master
>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>> [1]List of 85
>>>   $ : logi TRUE
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>>   $ : NULL
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: error in parallel:::sendMaster

Tomas Kalibera
Hi Andreas,

thank you very much, good job finding it was EBADF. Now the question is
why the pipe has been closed prematurely; it could be accidentally by R
(a race condition in the cleanup code in fork.c) or possibly by some
other code running in the same process (maybe the R program itself or
some other code it runs). Maybe we can take this off the list and come
back when we know the cause or have it fixed.

It would help a lot if you could try with R built from source, with
optimizations disabled to get more accurate debug symbols (e.g. env
CFLAGS="-Wall -O0 -gdwarf-2 -g3" CXXFLAGS="-Wall -O0 -gdwarf-2 -g3"
./configure), and with MC_DEBUG defined in fork.c - line 26. Ideally in
R-devel, so that we are sure the problem still exists. The debug
messages should give a hint whether it was R (fork.c) that closed the
pipe and why. Maybe you could also add a debug message to
close_fds_child_ci() to see if it was closed there. Maybe you could find
this out even in your current debugging setup via breakpoints and
backtraces, but I think it may be easier to build from source with these
debugging messages.

Also if you could send me a complete example I could run that causes
this on your system, that would be nice (even if it didn't cause the
problem on my system).

Thanks
Tomas

On 11/28/19 6:35 AM, Andreas Kersting wrote:

> Hi Tomas,
>
> Thanks for your prompt reply and your offer to help. I might need to get back to this since I am not too experienced in debugging these kinds of issues. Anyway, I gave it a try and I think I have found the immediate cause:
>
> I installed the debug symbols (r-base-core-dbg), placed https://github.com/wch/r-source/blob/tags/R-3-5-2/src/library/parallel/src/fork.c in cwd and changed the wrapper code to:
>
> mc_ret <- parallel::mclapply(seq_len(nrow(parts)), function(i) {
>      # we fail for the input resulting in parts having 85 rows
>      if (nrow(parts) == 85L && !file.exists(as.character(Sys.getpid()))) {
>        file.create(as.character(Sys.getpid()))
>        print(Sys.getpid())
>        Sys.sleep(30)
>      }
>
>      # ...
>
>      return(TRUE)
>    }, mc.preschedule = TRUE, mc.cores = 2L)
>
> This way I ended up with only two child processes to which I each attached a debugger. In total I ran about 10 debugging sessions and it was always the second child process failing. The errno after write returned -1 was 9 (EBADF).
>
>  From what I can see, the reason for this is that the second child tries to write to fd 7, but already during the very beginning of the first invocation of the anonymous function to parallelize, i.e. during Sys.sleep(30), there is no such file descriptor. From this observation I would conclude that it is NOT the code run from that function, i.e. # ...,  causing the issue. Let me point out again, that this is NOT the very first invocation of mclapply in this R session. There is at least one previous call to it, which works fine.
>
>
> File descriptors directly after attaching gdb to both child processes during Sys.sleep(30):
>
> ### master
> root@ip-10-0-48-30:~/latest_test# ls -l /proc/22119/fd
> total 0
> lrwx------ 1 root root 64 Nov 28 04:49 0 -> /dev/pts/0
> lrwx------ 1 root root 64 Nov 28 04:49 1 -> /dev/pts/0
> lrwx------ 1 root root 64 Nov 28 04:49 2 -> /dev/pts/0
> lr-x------ 1 root root 64 Nov 28 04:49 3 -> /path/to/script.R
> lrwx------ 1 root root 64 Nov 28 04:49 4 -> /path/on/nfs/write.lock
> lr-x------ 1 root root 64 Nov 28 04:49 5 -> 'pipe:[266120]'
> l-wx------ 1 root root 64 Nov 28 04:49 8 -> 'pipe:[266121]'
>
>
> ### first child (writes to fd 6)
> (gdb) shell ls -l /proc/22134/fd
> total 0
> lrwx------ 1 root root 64 Nov 28 04:42 0 -> /dev/pts/0
> lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
> lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
> lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
> lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
> l-wx------ 1 root root 64 Nov 28 04:42 6 -> 'pipe:[266120]'
> l-wx------ 1 root root 64 Nov 28 04:42 8 -> 'pipe:[266121]'
>
> ### second child (tries writing to fd 7)
> (gdb) shell ls -l /proc/22135/fd
> total 0
> lr-x------ 1 root root 64 Nov 28 04:42 0 -> 'pipe:[266123]'
> lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
> l-wx------ 1 root root 64 Nov 28 04:42 10 -> 'pipe:[266123]'
> lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
> lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
> lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
>
>
> # debugging mc_send_master in both child processes:
>
> ### first child (all OK):
> (gdb) break mc_send_master
> Breakpoint 1 at 0x7f5b2687b0a0: file fork.c, line 681.
> (gdb) c
> Continuing.
>
> Breakpoint 1, mc_send_master (what=0x555c6dd2cd00) at fork.c:681
> warning: Source file is more recent than executable.
> 681 {
> (gdb) info args
> what = 0x555c6dd2cd00
> (gdb) n
> 682    if (is_master)
> (gdb) n
> 684    if (master_fd == -1)
> (gdb) n
> 686    if (TYPEOF(what) != RAWSXP)
> (gdb) n
> 688    R_xlen_t len = XLENGTH(what);
> (gdb) n
> 689    unsigned char *b = RAW(what);
> (gdb) n
> 693    if (writerep(master_fd, &len, sizeof(len)) != sizeof(len)) {
> (gdb) info locals
> len = 538
> b = <optimized out>
> n = <optimized out>
> (gdb) s
> writerep (fildes=6, buf=buf@entry=0x7ffdd6da7db0, nbyte=nbyte@entry=8)
>      at fork.c:653
> 653 {
> (gdb) info args
> fildes = 6
> buf = 0x7ffdd6da7db0
> nbyte = 8
> (gdb) n
> 654    size_t wbyte = 0;
> (gdb) n
> 653 {
> (gdb) n
> 657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
> (gdb) n
> 658 if (w == -1) {
> (gdb) n
> 657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
> (gdb) n
> 658 if (w == -1) {
> (gdb) n
> 664 if (w == 0) {
> (gdb) n
> 670 wbyte += w;
> (gdb) n
> 671 if (wbyte == nbyte)
> (gdb) n
> 672    return wbyte;
> (gdb) n
> 674 }
> (gdb) n
> mc_send_master (what=<optimized out>) at fork.c:699
> 699    for (R_xlen_t i = 0; i < len; i += n) {
> (gdb) n
> 700 n = writerep(master_fd, b + i, len - i);
> (gdb) n
> 701 if (n < 1) {
> (gdb) n
> 699    for (R_xlen_t i = 0; i < len; i += n) {
> (gdb) n
> 707    return ScalarLogical(1);
> (gdb) n
> 708 }
>
>
>
> ### second child (write fails with EBADF):
> (gdb) break mc_send_master
> Breakpoint 1 at 0x7f5016f3e0a0: file fork.c, line 681.
> (gdb) c
> Continuing.
>
> Breakpoint 1, mc_send_master (what=0x563382a71910) at fork.c:681
> warning: Source file is more recent than executable.
> 681 {
> (gdb) info args
> what = 0x563382a71910
> (gdb) n
> 682    if (is_master)
> (gdb) n
> 684    if (master_fd == -1)
> (gdb) n
> 686    if (TYPEOF(what) != RAWSXP)
> (gdb) n
> 688    R_xlen_t len = XLENGTH(what);
> (gdb) n
> 689    unsigned char *b = RAW(what);
> (gdb) n
> 693    if (writerep(master_fd, &len, sizeof(len)) != sizeof(len)) {
> (gdb) info locals
> len = 526
> b = <optimized out>
> n = <optimized out>
> (gdb) s
> writerep (fildes=7, buf=buf@entry=0x7fff4027ad60, nbyte=nbyte@entry=8)
>      at fork.c:653
> 653 {
> (gdb) info args
> fildes = 7
> buf = 0x7fff4027ad60
> nbyte = 8
> (gdb) n
> 654    size_t wbyte = 0;
> (gdb) n
> 653 {
> (gdb) n
> 657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
> (gdb) n
> 658 if (w == -1) {
> (gdb) info locals
> w = <optimized out>
> wbyte = 0
> ptr = 0x7fff4027ad60 "\016\002"
> (gdb) n
> 657 ssize_t w = write(fildes, ptr + wbyte, nbyte - wbyte);
> (gdb) n
> 658 if (w == -1) {
> (gdb) n
> 659    if (errno == EINTR)
> (gdb) n
> 674 }
> (gdb) p __errno_location()
> $1 = (int *) 0x7f50322cb540
> (gdb) x/x $1
> 0x7f50322cb540: 0x00000009
> (gdb) python import errno
> (gdb) python print(errno.errorcode[9])
> EBADF
> (gdb) n
> mc_send_master (what=<optimized out>) at fork.c:702
> 702    close(master_fd);
> (gdb) n
> 704    error(_("write error, closing pipe to the master"));
> (gdb) n
> 703    master_fd = -1;
> (gdb) n
> 704    error(_("write error, closing pipe to the master"));
> (gdb) n
> 685 error(_("there is no pipe to the master process"));
>
>
> Does this help in any way?
>
> Is there something else I can/should look at?
>
> Regards,
> Andreas
>
>
> 2019-11-27 15:04 GMT+01:00 Tomas Kalibera<[hidden email]>:
>> Hi Andreas,
>> the error is reported when some child process cannot send results to the master process, which originates from an error returned by write() - when write() returns -1 or 0. The logic around the writing has not changed since R 3.5.2. It should not be related to the printing in the child, only to returning the value. The problem may be originating from the execution environment, virtualization, and/or possibly from a lack of robustness in R. To resolve this we need to find out which error was returned and why. Either you can try to create a reproducible example (something I could use to trigger an error on my system and then debug) or to debug on your system (build R from source, ensure the bug is still triggered, then instrument to print the exact error from the OS and where it was detected, etc). In principle you could also try without code instrumentation just using strace. Just from looking at the code in R around the writing I am not seeing any bug there. If you choose to debug
 on your system I can help with the instrumentation.

>>
>> Best
>> Tomas
>>
>> On 11/27/19 12:40 PM, Andreas Kersting wrote:
>>> Hi again,
>>>
>>> One important correction of my first message: I misinterpreted the output. Actually in that R session 2 input files were processed one after the other in a loop. The first (with 88 parts went fine). The second (with 85 parts) produced the sendMaster errors and failed. If (in a new session via Rscript) I only process the second input file it will work. The other observations on R vs Rscript, NFS share etc. still hold.
>>>
>>> Sorry for this! Regards,
>>> Andreas
>>>
>>> 2019-11-27 12:10 GMT+01:00 Andreas Kersting<[hidden email]>:
>>>> Hi,
>>>>
>>>> I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:
>>>>
>>>> The basic outline of the problematic section of the script:
>>>>
>>>> # parts is a data.table with 88 rows
>>>> mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
>>>>     # do some data wrangling and write the result to a file
>>>>     # ...
>>>>
>>>>     print(paste0("part ", i, " written successfully."))
>>>>     return(TRUE)
>>>> }, mc.preschedule = FALSE, mc.cores = 2L)
>>>>
>>>> str(mc_ret)
>>>>
>>>>
>>>> Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.
>>>>
>>>> Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.
>>>>
>>>> But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:
>>>>
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>>
>>>> and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.
>>>>
>>>>
>>>> Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
>>>> - The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
>>>> - The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
>>>> - Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
>>>> - The problem also occurs for mc.preschedule = TRUE.
>>>> - There is no error if I set mc.cores to 1.
>>>> - And stressing again: the code works without any changes from Rscript for thousands of other data sets.
>>>>
>>>>
>>>> Rscript -e "sessionInfo()":
>>>> R version 3.5.2 (2018-12-20)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>> Running under: Ubuntu 18.04.2 LTS
>>>>
>>>> Matrix products: default
>>>> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
>>>> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>>>>
>>>> locale:
>>>>    [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
>>>>    [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
>>>>    [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
>>>> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] compiler_3.5.2
>>>>
>>>>
>>>> I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.
>>>>
>>>> I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?
>>>>
>>>> Regards,
>>>> Andreas
>>>>
>>>>
>>>>
>>>>
>>>> Actual output:
>>>>
>>>> [1] "part 51 written successfully."
>>>> [1] "part 30 written successfully."
>>>> [1] "part 32 written successfully."
>>>> [1] "part 48 written successfully."
>>>> [1] "part 63 written successfully."
>>>> [1] "part 5 written successfully."
>>>> [1][1] "part 14 written successfully." "part 18 written successfully."
>>>>
>>>> [1] "part 38 written successfully."
>>>> [1] "part 11 written successfully."
>>>> [1] "part 68 written successfully."
>>>> [1] "part 45 written successfully."
>>>> [1] "part 88 written successfully."
>>>> [1] "part 36 written successfully."
>>>> [1] "part 44 written successfully."
>>>> [1] "part 55 written successfully."
>>>> [1] "part 26 written successfully."
>>>> [1] "part 37 written successfully."
>>>> [1] "part 22 written successfully."
>>>> [1] "part 13 written successfully."
>>>> [1] "part 67 written successfully."
>>>> [1] "part 10 written successfully."
>>>> [1] "part 24 written successfully."
>>>> [1] "part 20 written successfully."
>>>> [1] "part 74 written successfully."
>>>> [1] "part 50 written successfully."
>>>> [1] "part 2 written successfully."
>>>> [1] "part 17 written successfully."
>>>> [1] "part 66 written successfully."
>>>> [1] "part 61 written successfully."
>>>> [1][1] "part 7 written successfully." "part 31 written successfully."
>>>>
>>>> [1] "part 70 written successfully."
>>>> [1] "part 71 written successfully."
>>>> [1][1] "part 76 written successfully."
>>>>    "part 42 written successfully."
>>>> [1][1] "part 72 written successfully." "part 12 written successfully."
>>>>
>>>> [1] "part 39 written successfully."
>>>> [1] "part 58 written successfully."
>>>> [1] "part 87 written successfully."
>>>> [1] "part 81 written successfully."
>>>> [1] "part 3 written successfully."
>>>> [1] "part 29 written successfully."
>>>> [1] "part 82 written successfully."
>>>> [1] "part 54 written successfully."
>>>> [1] "part 23 written successfully."
>>>> [1] "part 56 written successfully."
>>>> [1] "part 73 written successfully."
>>>> [1] "part 27 written successfully."
>>>> [1] "part 64 written successfully."
>>>> [1] "part 46 written successfully."
>>>> [1] "part 80 written successfully."
>>>> [1] "part 53 written successfully."
>>>> [1] "part 43 written successfully."
>>>> [1] "part 47 written successfully."
>>>> [1] "part 57 written successfully."
>>>> [1] "part 75 written successfully."
>>>> [1] "part 84 written successfully."
>>>> [1] "part 85 written successfully."
>>>> [1][1] "part 86 written successfully."
>>>>    "part 19 written successfully."
>>>> [1][1] "part 33 written successfully." "part 34 written successfully."
>>>>
>>>> [1] "part 21 written successfully."
>>>> [1] "part 40 written successfully."
>>>> [1][1] "part 4 written successfully."
>>>>    "part 28 written successfully."
>>>> [1] "part 62 written successfully."
>>>> [1] "part 16 written successfully."
>>>> [1] "part 79 written successfully."
>>>> [1] "part 6 written successfully."
>>>> [1] "part 77 written successfully."
>>>> [1] "part 65 written successfully."
>>>> [1] "part 60 written successfully."
>>>> [1] "part 78 written successfully."
>>>> [1] "part 1 written successfully."
>>>> [1] "part 83 written successfully."
>>>> [1] "part 9 written successfully."
>>>> [1] "part 15 written successfully."
>>>> [1][1] "part 41 written successfully."
>>>>    "part 35 written successfully."
>>>> [1][1] "part 8 written successfully." "part 69 written successfully."
>>>>
>>>> [1] "part 59 written successfully."
>>>> [1] "part 25 written successfully."
>>>> [1] "part 49 written successfully."
>>>> [1] "part 52 written successfully."
>>>> List of 88
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>>    $ : logi TRUE
>>>> [1] "part 31 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
>>>> [1] "part 71 written successfully."
>>>> [1] "part 35 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 42 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> [1] "part 4 written successfully."
>>>> Execution halted
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1][1] "part 30 written successfully." "part 40 written successfully."
>>>>
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> [1] "part 41 written successfully."
>>>> [1] "part 80 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 45 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> [1] "part 82 written successfully."
>>>> [1] "part 54 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 59 written successfully."
>>>> Execution halted
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 27 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> [1] "part 36 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> [1] "part 44 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> Execution halted
>>>> [1] "part 83 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 61 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 70 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 66 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 10 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> Execution halted
>>>> [1] "part 53 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 55 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 15 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 50 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 5 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> Execution halted
>>>> [1] "part 48 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 32 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 69 written successfully."
>>>> Execution halted
>>>> [1] "part 52 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 77 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> Execution halted
>>>> [1] "part 24 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 56 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 6 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 62 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 33 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> [1] "part 16 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 34 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 74 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> [1] "part 2 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 19 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> [1] "part 9 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 11 written successfully."
>>>> [1] "part 28 written successfully."
>>>> [1] "part 29 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 17 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 65 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> Execution halted
>>>> Execution halted
>>>> [1] "part 68 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 43 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 38 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 1 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 57 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 8 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1] "part 25 written successfully."
>>>> Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
>>>>     write error, closing pipe to the master
>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
>>>> [1]List of 85
>>>>    $ : logi TRUE
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>>    $ : NULL
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply | Threaded
Open this post in threaded view
|

Re: error in parallel:::sendMaster

Andreas Kersting
Hi Tomas,

I rebuild R (v3.5.2 for now, R-devel to follow) from the Debian package with MC_DEBUG defined and hopefully also with "-Wall -O0 -gdwarf-2 -g3", though I still have to verify this.

Below is the output. I think it is a total of two mclapply invocations in this R session, the failing one starting around the lines "[1] 15381" and "[1] 15382". The "Error in partDF::write_partDF ..." is because the script/package checks the return value of mclapply and detects that it is not a list of length 85 with only the elements "TRUE".

Regarding sending you the complete example: I first have to figure out if this is possible at all, because it would involve data of a client.

Regards,
Andreas

parent[15366] created pipes: comm (6->5), sir (8->7)
parent registers new child 15379
child process 15379 started
parent[15366] created pipes: comm (7->6), sir (10->9)
parent registers new child 15380
child process 15380 started
select_children: added child 15380 (6)
select_children: added child 15379 (5)
select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
child 15380: send_master (550 bytes)
  sr = 1
 - read select 1 children: 15380
child 15380: 'mcexit' called
child 15380 is waiting for permission to exit
read_child_ci(15380) - read length returned 8
read_child_ci(15380) - read 550 at 0 returned 550
select_children: added child 15380 (6)
select_children: added child 15379 (5)
select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
  sr = 1
 - read select 1 children: 15380
read_child_ci(15380) - read length returned 8
detached child 15380 (signal 10)
child process 15380 got SIGUSR1; child_exit_status=-1
child 15380: exiting
select_children: added child 15379 (5)
select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
child 15380 terminated with exit status 0
child 15379: send_master (550 bytes)
  sr = 1
 - read select 1 children: 15379
read_child_ci(15379) - read length returned 8
read_child_ci(15379) - read 550 at 0 returned 550
select_children: added child 15379 (5)
select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
child 15379: 'mcexit' called
child 15379 is waiting for permission to exit
  sr = 1
 - read select 1 children: 15379
read_child_ci(15379) - read length returned 8
detached child 15379 (signal 10)
child process 15379 got SIGUSR1; child_exit_status=-1
child 15379: exiting
removing waited-for child 15380 from the list
killed detached child 15379 (signal 15)
removing waited-for child 2147483647 from the list
child 15379 terminated with exit status 0
removing waited-for child 15379 from the list
parent[15366] created pipes: comm (6->5), sir (8->7)
parent registers new child 15381
child process 15381 started
parent[15366] created pipes: comm (7->6), sir (10->9)
[1] 15381
parent registers new child 15382
child process 15382 started
select_children: added child 15382 (6)
select_children: added child 15381 (5)
select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
  sr = 1
 - read select 1 children: 15382
read_child_ci(15382) - read length returned 0
detached child 15382 (signal 10)
child process 15382 got SIGUSR1; child_exit_status=-1
select_children: added child 15381 (5)
select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
[1] 15382
child 15382: send_master (526 bytes)
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> lapply -> FUN -> sendMaster
child 15382: 'mcexit' called
child 15382: exiting
child 15382 terminated with exit status 1
child 15381: send_master (538 bytes)
  sr = 1
 - read select 1 children: 15381
read_child_ci(15381) - read length returned 8
read_child_ci(15381) - read 538 at 0 returned 538
child 15381: 'mcexit' called
child 15381 is waiting for permission to exit
select_children: added child 15381 (5)
select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
  sr = 1
 - read select 1 children: 15381
read_child_ci(15381) - read length returned 8
detached child 15381 (signal 10)
child process 15381 got SIGUSR1; child_exit_status=-1
child 15381: exiting
removing waited-for child 15382 from the list
killed detached child 15381 (signal 15)
removing waited-for child 2147483647 from the list
Error in partDF::write_partDF(X, out_path, c("Date", "Panel", "Type",  :
  at least one core did not return from parllel write; maybe it was killed (be the Linux Out of Memory Killer ?) or there was a fatal error in the forked process(es)
Calls: lapply -> FUN -> write_partDF_repo -> <Anonymous>
Execution halted
killed detached child 15381 (signal 9)
child 15381 terminated with exit status 0
removing waited-for child 15381 from the list
process 15366 parallel shutdown ok


2019-11-28 10:40 GMT+01:00 Tomas Kalibera<[hidden email]>:

> Hi Andreas,
> thank you very much, good job finding it was EBADF. Now the question is=20
> why the pipe has been closed prematurely; it could be accidentally by R=20
> (a race condition in the cleanup code in fork.c) or possibly by some=20
> other code running in the same process (maybe the R program itself or=20
> some other code it runs). Maybe we can take this off the list and come=20
> back when we know the cause or have it fixed.
>
> It would help a lot if you could try with R built from source, with=20
> optimizations disabled to get more accurate debug symbols (e.g. env=20
> CFLAGS=3D"-Wall -O0 -gdwarf-2 -g3" CXXFLAGS=3D"-Wall -O0 -gdwarf-2 -g3"=20
> =2E/configure), and with MC_DEBUG defined in fork.c - line 26. Ideally in=
> =20
> R-devel, so that we are sure the problem still exists. The debug=20
> messages should give a hint whether it was R (fork.c) that closed the=20
> pipe and why. Maybe you could also add a debug message to=20
> close_fds_child_ci() to see if it was closed there. Maybe you could find =
>
> this out even in your current debugging setup via breakpoints and=20
> backtraces, but I think it may be easier to build from source with these =
>
> debugging messages.
>
> Also if you could send me a complete example I could run that causes=20
> this on your system, that would be nice (even if it didn't cause the=20
> problem on my system).
>
> Thanks
> Tomas
>
> On 11/28/19 6:35 AM, Andreas Kersting wrote:
>> Hi Tomas,
>>
>> Thanks for your prompt reply and your offer to help. I might need to ge=
> t back to this since I am not too experienced in debugging these kinds of=
> issues. Anyway, I gave it a try and I think I have found the immediate c=
> ause:
>>
>> I installed the debug symbols (r-base-core-dbg), placed https://github.=
> com/wch/r-source/blob/tags/R-3-5-2/src/library/parallel/src/fork.c in cwd=
> and changed the wrapper code to:
>>
>> mc_ret <- parallel::mclapply(seq_len(nrow(parts)), function(i) {
>>      # we fail for the input resulting in parts having 85 rows
>>      if (nrow(parts) =3D=3D 85L && !file.exists(as.character(Sys.getpid=
> ()))) {
>>        file.create(as.character(Sys.getpid()))
>>        print(Sys.getpid())
>>        Sys.sleep(30)
>>      }
>>
>>      # ...
>>
>>      return(TRUE)
>>    }, mc.preschedule =3D TRUE, mc.cores =3D 2L)
>>
>> This way I ended up with only two child processes to which I each attac=
> hed a debugger. In total I ran about 10 debugging sessions and it was alw=
> ays the second child process failing. The errno after write returned -1 w=
> as 9 (EBADF).
>>
>>  From what I can see, the reason for this is that the second child trie=
> s to write to fd 7, but already during the very beginning of the first in=
> vocation of the anonymous function to parallelize, i.e. during Sys.sleep(=
> 30), there is no such file descriptor. From this observation I would conc=
> lude that it is NOT the code run from that function, i.e. # ...,  causing=
> the issue. Let me point out again, that this is NOT the very first invoc=
> ation of mclapply in this R session. There is at least one previous call =
> to it, which works fine.
>>
>>
>> File descriptors directly after attaching gdb to both child processes d=
> uring Sys.sleep(30):
>>
>> ### master
>> root@ip-10-0-48-30:~/latest_test# ls -l /proc/22119/fd
>> total 0
>> lrwx------ 1 root root 64 Nov 28 04:49 0 -> /dev/pts/0
>> lrwx------ 1 root root 64 Nov 28 04:49 1 -> /dev/pts/0
>> lrwx------ 1 root root 64 Nov 28 04:49 2 -> /dev/pts/0
>> lr-x------ 1 root root 64 Nov 28 04:49 3 -> /path/to/script.R
>> lrwx------ 1 root root 64 Nov 28 04:49 4 -> /path/on/nfs/write.lock
>> lr-x------ 1 root root 64 Nov 28 04:49 5 -> 'pipe:[266120]'
>> l-wx------ 1 root root 64 Nov 28 04:49 8 -> 'pipe:[266121]'
>>
>>
>> ### first child (writes to fd 6)
>> (gdb) shell ls -l /proc/22134/fd
>> total 0
>> lrwx------ 1 root root 64 Nov 28 04:42 0 -> /dev/pts/0
>> lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
>> lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
>> lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
>> lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
>> l-wx------ 1 root root 64 Nov 28 04:42 6 -> 'pipe:[266120]'
>> l-wx------ 1 root root 64 Nov 28 04:42 8 -> 'pipe:[266121]'
>>
>> ### second child (tries writing to fd 7)
>> (gdb) shell ls -l /proc/22135/fd
>> total 0
>> lr-x------ 1 root root 64 Nov 28 04:42 0 -> 'pipe:[266123]'
>> lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
>> l-wx------ 1 root root 64 Nov 28 04:42 10 -> 'pipe:[266123]'
>> lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
>> lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
>> lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
>>
>>
>> # debugging mc_send_master in both child processes:
>>
>> ### first child (all OK):
>> (gdb) break mc_send_master
>> Breakpoint 1 at 0x7f5b2687b0a0: file fork.c, line 681.
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, mc_send_master (what=3D0x555c6dd2cd00) at fork.c:681
>> warning: Source file is more recent than executable.
>> 681 {
>> (gdb) info args
>> what =3D 0x555c6dd2cd00
>> (gdb) n
>> 682    if (is_master)
>> (gdb) n
>> 684    if (master_fd =3D=3D -1)
>> (gdb) n
>> 686    if (TYPEOF(what) !=3D RAWSXP)
>> (gdb) n
>> 688    R_xlen_t len =3D XLENGTH(what);
>> (gdb) n
>> 689    unsigned char *b =3D RAW(what);
>> (gdb) n
>> 693    if (writerep(master_fd, &len, sizeof(len)) !=3D sizeof(len)) {
>> (gdb) info locals
>> len =3D 538
>> b =3D <optimized out>
>> n =3D <optimized out>
>> (gdb) s
>> writerep (fildes=3D6, buf=3Dbuf@entry=3D0x7ffdd6da7db0, nbyte=3Dnbyte@e=
> ntry=3D8)
>>      at fork.c:653
>> 653 {
>> (gdb) info args
>> fildes =3D 6
>> buf =3D 0x7ffdd6da7db0
>> nbyte =3D 8
>> (gdb) n
>> 654    size_t wbyte =3D 0;
>> (gdb) n
>> 653 {
>> (gdb) n
>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>> (gdb) n
>> 658 if (w =3D=3D -1) {
>> (gdb) n
>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>> (gdb) n
>> 658 if (w =3D=3D -1) {
>> (gdb) n
>> 664 if (w =3D=3D 0) {
>> (gdb) n
>> 670 wbyte +=3D w;
>> (gdb) n
>> 671 if (wbyte =3D=3D nbyte)
>> (gdb) n
>> 672    return wbyte;
>> (gdb) n
>> 674 }
>> (gdb) n
>> mc_send_master (what=3D<optimized out>) at fork.c:699
>> 699    for (R_xlen_t i =3D 0; i < len; i +=3D n) {
>> (gdb) n
>> 700 n =3D writerep(master_fd, b + i, len - i);
>> (gdb) n
>> 701 if (n < 1) {
>> (gdb) n
>> 699    for (R_xlen_t i =3D 0; i < len; i +=3D n) {
>> (gdb) n
>> 707    return ScalarLogical(1);
>> (gdb) n
>> 708 }
>>
>>
>>
>> ### second child (write fails with EBADF):
>> (gdb) break mc_send_master
>> Breakpoint 1 at 0x7f5016f3e0a0: file fork.c, line 681.
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, mc_send_master (what=3D0x563382a71910) at fork.c:681
>> warning: Source file is more recent than executable.
>> 681 {
>> (gdb) info args
>> what =3D 0x563382a71910
>> (gdb) n
>> 682    if (is_master)
>> (gdb) n
>> 684    if (master_fd =3D=3D -1)
>> (gdb) n
>> 686    if (TYPEOF(what) !=3D RAWSXP)
>> (gdb) n
>> 688    R_xlen_t len =3D XLENGTH(what);
>> (gdb) n
>> 689    unsigned char *b =3D RAW(what);
>> (gdb) n
>> 693    if (writerep(master_fd, &len, sizeof(len)) !=3D sizeof(len)) {
>> (gdb) info locals
>> len =3D 526
>> b =3D <optimized out>
>> n =3D <optimized out>
>> (gdb) s
>> writerep (fildes=3D7, buf=3Dbuf@entry=3D0x7fff4027ad60, nbyte=3Dnbyte@e=
> ntry=3D8)
>>      at fork.c:653
>> 653 {
>> (gdb) info args
>> fildes =3D 7
>> buf =3D 0x7fff4027ad60
>> nbyte =3D 8
>> (gdb) n
>> 654    size_t wbyte =3D 0;
>> (gdb) n
>> 653 {
>> (gdb) n
>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>> (gdb) n
>> 658 if (w =3D=3D -1) {
>> (gdb) info locals
>> w =3D <optimized out>
>> wbyte =3D 0
>> ptr =3D 0x7fff4027ad60 "\016\002"
>> (gdb) n
>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>> (gdb) n
>> 658 if (w =3D=3D -1) {
>> (gdb) n
>> 659    if (errno =3D=3D EINTR)
>> (gdb) n
>> 674 }
>> (gdb) p __errno_location()
>> $1 =3D (int *) 0x7f50322cb540
>> (gdb) x/x $1
>> 0x7f50322cb540: 0x00000009
>> (gdb) python import errno
>> (gdb) python print(errno.errorcode[9])
>> EBADF
>> (gdb) n
>> mc_send_master (what=3D<optimized out>) at fork.c:702
>> 702    close(master_fd);
>> (gdb) n
>> 704    error(_("write error, closing pipe to the master"));
>> (gdb) n
>> 703    master_fd =3D -1;
>> (gdb) n
>> 704    error(_("write error, closing pipe to the master"));
>> (gdb) n
>> 685 error(_("there is no pipe to the master process"));
>>
>>
>> Does this help in any way?
>>
>> Is there something else I can/should look at?
>>
>> Regards,
>> Andreas
>>
>>
>> 2019-11-27 15:04 GMT+01:00 Tomas Kalibera<[hidden email]>:
>>> Hi Andreas,
>>> the error is reported when some child process cannot send results to t=
> he master process, which originates from an error returned by write() - w=
> hen write() returns -1 or 0. The logic around the writing has not changed=
> since R 3.5.2. It should not be related to the printing in the child, on=
> ly to returning the value. The problem may be originating from the execut=
> ion environment, virtualization, and/or possibly from a lack of robustnes=
> s in R. To resolve this we need to find out which error was returned and =
> why. Either you can try to create a reproducible example (something I cou=
> ld use to trigger an error on my system and then debug) or to debug on yo=
> ur system (build R from source, ensure the bug is still triggered, then i=
> nstrument to print the exact error from the OS and where it was detected,=
> etc). In principle you could also try without code instrumentation just =
> using strace. Just from looking at the code in R around the writing I am =
> not seeing any bug there. If you choose to debug on your system I can hel=
> p with the instrumentation.
>>>
>>> Best
>>> Tomas
>>>
>>> On 11/27/19 12:40 PM, Andreas Kersting wrote:
>>>> Hi again,
>>>>
>>>> One important correction of my first message: I misinterpreted the ou=
> tput. Actually in that R session 2 input files were processed one after t=
> he other in a loop. The first (with 88 parts went fine). The second (with=
> 85 parts) produced the sendMaster errors and failed. If (in a new sessio=
> n via Rscript) I only process the second input file it will work. The oth=
> er observations on R vs Rscript, NFS share etc. still hold.
>>>>
>>>> Sorry for this! Regards,
>>>> Andreas
>>>>
>>>> 2019-11-27 12:10 GMT+01:00 Andreas Kersting<[hidden email]>:
>>>>> Hi,
>>>>>
>>>>> I am facing a very weird problem with parallel::mclapply. I have a s=
> cript which does some data wrangling on an input dataset in parallel and =
> then writes the results to disk. I have been using this script daily for =
> more than one year always on an EC2 instance launched from the same AMI (=
> no updates installed after launch) and processed thousands of different i=
> nput data sets successfully. I now have an input dataset for which I face=
> the following bug:
>>>>>
>>>>> The basic outline of the problematic section of the script:
>>>>>
>>>>> # parts is a data.table with 88 rows
>>>>> mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
>>>>>     # do some data wrangling and write the result to a file
>>>>>     # ...
>>>>>
>>>>>     print(paste0("part ", i, " written successfully."))
>>>>>     return(TRUE)
>>>>> }, mc.preschedule =3D FALSE, mc.cores =3D 2L)
>>>>>
>>>>> str(mc_ret)
>>>>>
>>>>>
>>>>> Expected output: "part i written successfully." is printed 88 times,=
> once for each value of i. mc_ret is a list of length 88, each element be=
> ing TRUE. Its structure is printed once. All outputs are created successf=
> ully.
>>>>>
>>>>> Actual output (see end of the message): "part i written successfully=
> =2E" is printed 88 times, once for each value of i. mc_ret is a list of l=
> ength 88, each element being TRUE. Its structure is printed. All outputs =
> are created successfully. So far so good.
>>>>>
>>>>> But then "part i written successfully." it is printed another X time=
> s, for values of i for which it was already printed. This output is inter=
> mingled with X-1 times the following error message:
>>>>>
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>>
>>>>> and Y times the message "Execution halted". mc_ret is printed again,=
> now being a list of length 85, with the first element being TRUE and all=
> other elements being NULL. X and Y vary from run to run.
>>>>>
>>>>>
>>>>> Now to the main problem: I tried very hard to create a reproducible =
> example, but I failed. What I observed:
>>>>> - The output is (and has always been) written to path which is on an=
> NFS share. If I instead write to a path on a local disk it will work.
>>>>> - The script is invoked using Rscript. If I instead source it from a=
> n interactive R session it works. There are at least two more people who =
> have observed this: https://stackoverflow.com/questions/51986674/mclapply=
> -sendmaster-error-only-with-rscript
>>>>> - Before the call to mclapply the code acquires an exclusive file lo=
> ck on a dedicated lock file, not written to but also on the NFS share. If=
> I remove the code acquiring the lock, the whole script will also work if=
> called using Rscript.
>>>>> - The problem also occurs for mc.preschedule =3D TRUE.
>>>>> - There is no error if I set mc.cores to 1.
>>>>> - And stressing again: the code works without any changes from Rscri=
> pt for thousands of other data sets.
>>>>>
>>>>>
>>>>> Rscript -e "sessionInfo()":
>>>>> R version 3.5.2 (2018-12-20)
>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>> Running under: Ubuntu 18.04.2 LTS
>>>>>
>>>>> Matrix products: default
>>>>> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
>>>>> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>>>>>
>>>>> locale:
>>>>>    [1] LC_CTYPE=3DC.UTF-8       LC_NUMERIC=3DC           LC_TIME=3DC=
> =2EUTF-8
>>>>>    [4] LC_COLLATE=3DC.UTF-8     LC_MONETARY=3DC.UTF-8    LC_MESSAGES=
> =3DC.UTF-8
>>>>>    [7] LC_PAPER=3DC.UTF-8       LC_NAME=3DC              LC_ADDRESS=3D=
> C
>>>>> [10] LC_TELEPHONE=3DC         LC_MEASUREMENT=3DC.UTF-8 LC_IDENTIFICA=
> TION=3DC
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base=
>
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] compiler_3.5.2
>>>>>
>>>>>
>>>>> I know this is a fairly old R version. I have not been able to repro=
> duce the bug with a more recent version, but since it is so difficult to =
> trigger, this does not mean much, I guess. I have looked through the chan=
> ges made to the code of mclapply since that version and could not find so=
> mething directly related. I am not even sure if it is a problem in the pa=
> rallel package or some other (memory) bug. What strikes me is that others=
> have observed a very similar error when using Rscript but not when using=
> an interactive R session, just like I do.
>>>>>
>>>>> I am not expecting a fix based on the information I provide, but may=
> be someone has some thoughts on this!?
>>>>>
>>>>> Regards,
>>>>> Andreas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Actual output:
>>>>>
>>>>> [1] "part 51 written successfully."
>>>>> [1] "part 30 written successfully."
>>>>> [1] "part 32 written successfully."
>>>>> [1] "part 48 written successfully."
>>>>> [1] "part 63 written successfully."
>>>>> [1] "part 5 written successfully."
>>>>> [1][1] "part 14 written successfully." "part 18 written successfully=
> =2E"
>>>>>
>>>>> [1] "part 38 written successfully."
>>>>> [1] "part 11 written successfully."
>>>>> [1] "part 68 written successfully."
>>>>> [1] "part 45 written successfully."
>>>>> [1] "part 88 written successfully."
>>>>> [1] "part 36 written successfully."
>>>>> [1] "part 44 written successfully."
>>>>> [1] "part 55 written successfully."
>>>>> [1] "part 26 written successfully."
>>>>> [1] "part 37 written successfully."
>>>>> [1] "part 22 written successfully."
>>>>> [1] "part 13 written successfully."
>>>>> [1] "part 67 written successfully."
>>>>> [1] "part 10 written successfully."
>>>>> [1] "part 24 written successfully."
>>>>> [1] "part 20 written successfully."
>>>>> [1] "part 74 written successfully."
>>>>> [1] "part 50 written successfully."
>>>>> [1] "part 2 written successfully."
>>>>> [1] "part 17 written successfully."
>>>>> [1] "part 66 written successfully."
>>>>> [1] "part 61 written successfully."
>>>>> [1][1] "part 7 written successfully." "part 31 written successfully.=
> "
>>>>>
>>>>> [1] "part 70 written successfully."
>>>>> [1] "part 71 written successfully."
>>>>> [1][1] "part 76 written successfully."
>>>>>    "part 42 written successfully."
>>>>> [1][1] "part 72 written successfully." "part 12 written successfully=
> =2E"
>>>>>
>>>>> [1] "part 39 written successfully."
>>>>> [1] "part 58 written successfully."
>>>>> [1] "part 87 written successfully."
>>>>> [1] "part 81 written successfully."
>>>>> [1] "part 3 written successfully."
>>>>> [1] "part 29 written successfully."
>>>>> [1] "part 82 written successfully."
>>>>> [1] "part 54 written successfully."
>>>>> [1] "part 23 written successfully."
>>>>> [1] "part 56 written successfully."
>>>>> [1] "part 73 written successfully."
>>>>> [1] "part 27 written successfully."
>>>>> [1] "part 64 written successfully."
>>>>> [1] "part 46 written successfully."
>>>>> [1] "part 80 written successfully."
>>>>> [1] "part 53 written successfully."
>>>>> [1] "part 43 written successfully."
>>>>> [1] "part 47 written successfully."
>>>>> [1] "part 57 written successfully."
>>>>> [1] "part 75 written successfully."
>>>>> [1] "part 84 written successfully."
>>>>> [1] "part 85 written successfully."
>>>>> [1][1] "part 86 written successfully."
>>>>>    "part 19 written successfully."
>>>>> [1][1] "part 33 written successfully." "part 34 written successfully=
> =2E"
>>>>>
>>>>> [1] "part 21 written successfully."
>>>>> [1] "part 40 written successfully."
>>>>> [1][1] "part 4 written successfully."
>>>>>    "part 28 written successfully."
>>>>> [1] "part 62 written successfully."
>>>>> [1] "part 16 written successfully."
>>>>> [1] "part 79 written successfully."
>>>>> [1] "part 6 written successfully."
>>>>> [1] "part 77 written successfully."
>>>>> [1] "part 65 written successfully."
>>>>> [1] "part 60 written successfully."
>>>>> [1] "part 78 written successfully."
>>>>> [1] "part 1 written successfully."
>>>>> [1] "part 83 written successfully."
>>>>> [1] "part 9 written successfully."
>>>>> [1] "part 15 written successfully."
>>>>> [1][1] "part 41 written successfully."
>>>>>    "part 35 written successfully."
>>>>> [1][1] "part 8 written successfully." "part 69 written successfully.=
> "
>>>>>
>>>>> [1] "part 59 written successfully."
>>>>> [1] "part 25 written successfully."
>>>>> [1] "part 49 written successfully."
>>>>> [1] "part 52 written successfully."
>>>>> List of 88
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>>    $ : logi TRUE
>>>>> [1] "part 31 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> send=
> Master
>>>>> [1] "part 71 written successfully."
>>>>> [1] "part 35 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 42 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> [1] "part 4 written successfully."
>>>>> Execution halted
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1][1] "part 30 written successfully." "part 40 written successfully=
> =2E"
>>>>>
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> [1] "part 41 written successfully."
>>>>> [1] "part 80 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 45 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> [1] "part 82 written successfully."
>>>>> [1] "part 54 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 59 written successfully."
>>>>> Execution halted
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 27 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> [1] "part 36 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> [1] "part 44 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> Execution halted
>>>>> [1] "part 83 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 61 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 70 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 66 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 10 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> Execution halted
>>>>> [1] "part 53 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 55 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 15 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 50 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 5 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> Execution halted
>>>>> [1] "part 48 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 32 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 69 written successfully."
>>>>> Execution halted
>>>>> [1] "part 52 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 77 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> Execution halted
>>>>> [1] "part 24 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 56 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 6 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 62 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 33 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> [1] "part 16 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 34 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 74 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> [1] "part 2 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 19 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> [1] "part 9 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 11 written successfully."
>>>>> [1] "part 28 written successfully."
>>>>> [1] "part 29 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 17 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 65 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> Execution halted
>>>>> Execution halted
>>>>> [1] "part 68 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 43 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 38 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 1 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 57 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 8 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1] "part 25 written successfully."
>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>     write error, closing pipe to the master
>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
> ster
>>>>> [1]List of 85
>>>>>    $ : logi TRUE
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>>    $ : NULL
>>>>> ______________________________________________
>>>>> [hidden email] mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>
>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply | Threaded
Open this post in threaded view
|

Re: error in parallel:::sendMaster

Andreas Kersting-2
In reply to this post by Andreas Kersting
Hi all,

With the help of Tomas, I was able to track the issue down: Prior to R v3.6.0 the parallel package passes an uninitialized variable as the file descriptor argument to the close system call.

In my particular R session this uninitialized variable (reproducibly) was holding the value 7, which corresponded to the file descriptor of the write end of the pipe the second child would use to send its results to the master. Hence, the child unintentionally closed this pipe directly after fork in close_fds_child_ci() resulting in sendMaster() later failing with EBADF.

It was fixed with this commit: https://github.com/wch/r-source/commit/e08cffac1c5b9015a1625938d568b648eb1d8aee

Regards,
Andreas

2019-11-28 13:54 GMT+01:00 Andreas Kersting<[hidden email]>:

> Hi Tomas,
>
> I rebuild R (v3.5.2 for now, R-devel to follow) from the Debian package with MC_DEBUG defined and hopefully also with "-Wall -O0 -gdwarf-2 -g3", though I still have to verify this.
>
> Below is the output. I think it is a total of two mclapply invocations in this R session, the failing one starting around the lines "[1] 15381" and "[1] 15382". The "Error in partDF::write_partDF ..." is because the script/package checks the return value of mclapply and detects that it is not a list of length 85 with only the elements "TRUE".
>
> Regarding sending you the complete example: I first have to figure out if this is possible at all, because it would involve data of a client.
>
> Regards,
> Andreas
>
> parent[15366] created pipes: comm (6->5), sir (8->7)
> parent registers new child 15379
> child process 15379 started
> parent[15366] created pipes: comm (7->6), sir (10->9)
> parent registers new child 15380
> child process 15380 started
> select_children: added child 15380 (6)
> select_children: added child 15379 (5)
> select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
> child 15380: send_master (550 bytes)
>   sr = 1
>  - read select 1 children: 15380
> child 15380: 'mcexit' called
> child 15380 is waiting for permission to exit
> read_child_ci(15380) - read length returned 8
> read_child_ci(15380) - read 550 at 0 returned 550
> select_children: added child 15380 (6)
> select_children: added child 15379 (5)
> select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
>   sr = 1
>  - read select 1 children: 15380
> read_child_ci(15380) - read length returned 8
> detached child 15380 (signal 10)
> child process 15380 got SIGUSR1; child_exit_status=-1
> child 15380: exiting
> select_children: added child 15379 (5)
> select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
> child 15380 terminated with exit status 0
> child 15379: send_master (550 bytes)
>   sr = 1
>  - read select 1 children: 15379
> read_child_ci(15379) - read length returned 8
> read_child_ci(15379) - read 550 at 0 returned 550
> select_children: added child 15379 (5)
> select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
> child 15379: 'mcexit' called
> child 15379 is waiting for permission to exit
>   sr = 1
>  - read select 1 children: 15379
> read_child_ci(15379) - read length returned 8
> detached child 15379 (signal 10)
> child process 15379 got SIGUSR1; child_exit_status=-1
> child 15379: exiting
> removing waited-for child 15380 from the list
> killed detached child 15379 (signal 15)
> removing waited-for child 2147483647 from the list
> child 15379 terminated with exit status 0
> removing waited-for child 15379 from the list
> parent[15366] created pipes: comm (6->5), sir (8->7)
> parent registers new child 15381
> child process 15381 started
> parent[15366] created pipes: comm (7->6), sir (10->9)
> [1] 15381
> parent registers new child 15382
> child process 15382 started
> select_children: added child 15382 (6)
> select_children: added child 15381 (5)
> select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
>   sr = 1
>  - read select 1 children: 15382
> read_child_ci(15382) - read length returned 0
> detached child 15382 (signal 10)
> child process 15382 got SIGUSR1; child_exit_status=-1
> select_children: added child 15381 (5)
> select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
> [1] 15382
> child 15382: send_master (526 bytes)
> Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> lapply -> FUN -> sendMaster
> child 15382: 'mcexit' called
> child 15382: exiting
> child 15382 terminated with exit status 1
> child 15381: send_master (538 bytes)
>   sr = 1
>  - read select 1 children: 15381
> read_child_ci(15381) - read length returned 8
> read_child_ci(15381) - read 538 at 0 returned 538
> child 15381: 'mcexit' called
> child 15381 is waiting for permission to exit
> select_children: added child 15381 (5)
> select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
>   sr = 1
>  - read select 1 children: 15381
> read_child_ci(15381) - read length returned 8
> detached child 15381 (signal 10)
> child process 15381 got SIGUSR1; child_exit_status=-1
> child 15381: exiting
> removing waited-for child 15382 from the list
> killed detached child 15381 (signal 15)
> removing waited-for child 2147483647 from the list
> Error in partDF::write_partDF(X, out_path, c("Date", "Panel", "Type",  :
>   at least one core did not return from parllel write; maybe it was killed (be the Linux Out of Memory Killer ?) or there was a fatal error in the forked process(es)
> Calls: lapply -> FUN -> write_partDF_repo -> <Anonymous>
> Execution halted
> killed detached child 15381 (signal 9)
> child 15381 terminated with exit status 0
> removing waited-for child 15381 from the list
> process 15366 parallel shutdown ok
>
>
> 2019-11-28 10:40 GMT+01:00 Tomas Kalibera<[hidden email]>:
>> Hi Andreas,
>> thank you very much, good job finding it was EBADF. Now the question is=20
>> why the pipe has been closed prematurely; it could be accidentally by R=20
>> (a race condition in the cleanup code in fork.c) or possibly by some=20
>> other code running in the same process (maybe the R program itself or=20
>> some other code it runs). Maybe we can take this off the list and come=20
>> back when we know the cause or have it fixed.
>>
>> It would help a lot if you could try with R built from source, with=20
>> optimizations disabled to get more accurate debug symbols (e.g. env=20
>> CFLAGS=3D"-Wall -O0 -gdwarf-2 -g3" CXXFLAGS=3D"-Wall -O0 -gdwarf-2 -g3"=20
>> =2E/configure), and with MC_DEBUG defined in fork.c - line 26. Ideally in=
>> =20
>> R-devel, so that we are sure the problem still exists. The debug=20
>> messages should give a hint whether it was R (fork.c) that closed the=20
>> pipe and why. Maybe you could also add a debug message to=20
>> close_fds_child_ci() to see if it was closed there. Maybe you could find =
>>
>> this out even in your current debugging setup via breakpoints and=20
>> backtraces, but I think it may be easier to build from source with these =
>>
>> debugging messages.
>>
>> Also if you could send me a complete example I could run that causes=20
>> this on your system, that would be nice (even if it didn't cause the=20
>> problem on my system).
>>
>> Thanks
>> Tomas
>>
>> On 11/28/19 6:35 AM, Andreas Kersting wrote:
>>> Hi Tomas,
>>>
>>> Thanks for your prompt reply and your offer to help. I might need to ge=
>> t back to this since I am not too experienced in debugging these kinds of=
>> issues. Anyway, I gave it a try and I think I have found the immediate c=
>> ause:
>>>
>>> I installed the debug symbols (r-base-core-dbg), placed https://github.=
>> com/wch/r-source/blob/tags/R-3-5-2/src/library/parallel/src/fork.c in cwd=
>> and changed the wrapper code to:
>>>
>>> mc_ret <- parallel::mclapply(seq_len(nrow(parts)), function(i) {
>>>      # we fail for the input resulting in parts having 85 rows
>>>      if (nrow(parts) =3D=3D 85L && !file.exists(as.character(Sys.getpid=
>> ()))) {
>>>        file.create(as.character(Sys.getpid()))
>>>        print(Sys.getpid())
>>>        Sys.sleep(30)
>>>      }
>>>
>>>      # ...
>>>
>>>      return(TRUE)
>>>    }, mc.preschedule =3D TRUE, mc.cores =3D 2L)
>>>
>>> This way I ended up with only two child processes to which I each attac=
>> hed a debugger. In total I ran about 10 debugging sessions and it was alw=
>> ays the second child process failing. The errno after write returned -1 w=
>> as 9 (EBADF).
>>>
>>>  From what I can see, the reason for this is that the second child trie=
>> s to write to fd 7, but already during the very beginning of the first in=
>> vocation of the anonymous function to parallelize, i.e. during Sys.sleep(=
>> 30), there is no such file descriptor. From this observation I would conc=
>> lude that it is NOT the code run from that function, i.e. # ...,  causing=
>> the issue. Let me point out again, that this is NOT the very first invoc=
>> ation of mclapply in this R session. There is at least one previous call =
>> to it, which works fine.
>>>
>>>
>>> File descriptors directly after attaching gdb to both child processes d=
>> uring Sys.sleep(30):
>>>
>>> ### master
>>> root@ip-10-0-48-30:~/latest_test# ls -l /proc/22119/fd
>>> total 0
>>> lrwx------ 1 root root 64 Nov 28 04:49 0 -> /dev/pts/0
>>> lrwx------ 1 root root 64 Nov 28 04:49 1 -> /dev/pts/0
>>> lrwx------ 1 root root 64 Nov 28 04:49 2 -> /dev/pts/0
>>> lr-x------ 1 root root 64 Nov 28 04:49 3 -> /path/to/script.R
>>> lrwx------ 1 root root 64 Nov 28 04:49 4 -> /path/on/nfs/write.lock
>>> lr-x------ 1 root root 64 Nov 28 04:49 5 -> 'pipe:[266120]'
>>> l-wx------ 1 root root 64 Nov 28 04:49 8 -> 'pipe:[266121]'
>>>
>>>
>>> ### first child (writes to fd 6)
>>> (gdb) shell ls -l /proc/22134/fd
>>> total 0
>>> lrwx------ 1 root root 64 Nov 28 04:42 0 -> /dev/pts/0
>>> lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
>>> lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
>>> lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
>>> lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
>>> l-wx------ 1 root root 64 Nov 28 04:42 6 -> 'pipe:[266120]'
>>> l-wx------ 1 root root 64 Nov 28 04:42 8 -> 'pipe:[266121]'
>>>
>>> ### second child (tries writing to fd 7)
>>> (gdb) shell ls -l /proc/22135/fd
>>> total 0
>>> lr-x------ 1 root root 64 Nov 28 04:42 0 -> 'pipe:[266123]'
>>> lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
>>> l-wx------ 1 root root 64 Nov 28 04:42 10 -> 'pipe:[266123]'
>>> lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
>>> lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
>>> lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
>>>
>>>
>>> # debugging mc_send_master in both child processes:
>>>
>>> ### first child (all OK):
>>> (gdb) break mc_send_master
>>> Breakpoint 1 at 0x7f5b2687b0a0: file fork.c, line 681.
>>> (gdb) c
>>> Continuing.
>>>
>>> Breakpoint 1, mc_send_master (what=3D0x555c6dd2cd00) at fork.c:681
>>> warning: Source file is more recent than executable.
>>> 681 {
>>> (gdb) info args
>>> what =3D 0x555c6dd2cd00
>>> (gdb) n
>>> 682    if (is_master)
>>> (gdb) n
>>> 684    if (master_fd =3D=3D -1)
>>> (gdb) n
>>> 686    if (TYPEOF(what) !=3D RAWSXP)
>>> (gdb) n
>>> 688    R_xlen_t len =3D XLENGTH(what);
>>> (gdb) n
>>> 689    unsigned char *b =3D RAW(what);
>>> (gdb) n
>>> 693    if (writerep(master_fd, &len, sizeof(len)) !=3D sizeof(len)) {
>>> (gdb) info locals
>>> len =3D 538
>>> b =3D <optimized out>
>>> n =3D <optimized out>
>>> (gdb) s
>>> writerep (fildes=3D6, buf=3Dbuf@entry=3D0x7ffdd6da7db0, nbyte=3Dnbyte@e=
>> ntry=3D8)
>>>      at fork.c:653
>>> 653 {
>>> (gdb) info args
>>> fildes =3D 6
>>> buf =3D 0x7ffdd6da7db0
>>> nbyte =3D 8
>>> (gdb) n
>>> 654    size_t wbyte =3D 0;
>>> (gdb) n
>>> 653 {
>>> (gdb) n
>>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>>> (gdb) n
>>> 658 if (w =3D=3D -1) {
>>> (gdb) n
>>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>>> (gdb) n
>>> 658 if (w =3D=3D -1) {
>>> (gdb) n
>>> 664 if (w =3D=3D 0) {
>>> (gdb) n
>>> 670 wbyte +=3D w;
>>> (gdb) n
>>> 671 if (wbyte =3D=3D nbyte)
>>> (gdb) n
>>> 672    return wbyte;
>>> (gdb) n
>>> 674 }
>>> (gdb) n
>>> mc_send_master (what=3D<optimized out>) at fork.c:699
>>> 699    for (R_xlen_t i =3D 0; i < len; i +=3D n) {
>>> (gdb) n
>>> 700 n =3D writerep(master_fd, b + i, len - i);
>>> (gdb) n
>>> 701 if (n < 1) {
>>> (gdb) n
>>> 699    for (R_xlen_t i =3D 0; i < len; i +=3D n) {
>>> (gdb) n
>>> 707    return ScalarLogical(1);
>>> (gdb) n
>>> 708 }
>>>
>>>
>>>
>>> ### second child (write fails with EBADF):
>>> (gdb) break mc_send_master
>>> Breakpoint 1 at 0x7f5016f3e0a0: file fork.c, line 681.
>>> (gdb) c
>>> Continuing.
>>>
>>> Breakpoint 1, mc_send_master (what=3D0x563382a71910) at fork.c:681
>>> warning: Source file is more recent than executable.
>>> 681 {
>>> (gdb) info args
>>> what =3D 0x563382a71910
>>> (gdb) n
>>> 682    if (is_master)
>>> (gdb) n
>>> 684    if (master_fd =3D=3D -1)
>>> (gdb) n
>>> 686    if (TYPEOF(what) !=3D RAWSXP)
>>> (gdb) n
>>> 688    R_xlen_t len =3D XLENGTH(what);
>>> (gdb) n
>>> 689    unsigned char *b =3D RAW(what);
>>> (gdb) n
>>> 693    if (writerep(master_fd, &len, sizeof(len)) !=3D sizeof(len)) {
>>> (gdb) info locals
>>> len =3D 526
>>> b =3D <optimized out>
>>> n =3D <optimized out>
>>> (gdb) s
>>> writerep (fildes=3D7, buf=3Dbuf@entry=3D0x7fff4027ad60, nbyte=3Dnbyte@e=
>> ntry=3D8)
>>>      at fork.c:653
>>> 653 {
>>> (gdb) info args
>>> fildes =3D 7
>>> buf =3D 0x7fff4027ad60
>>> nbyte =3D 8
>>> (gdb) n
>>> 654    size_t wbyte =3D 0;
>>> (gdb) n
>>> 653 {
>>> (gdb) n
>>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>>> (gdb) n
>>> 658 if (w =3D=3D -1) {
>>> (gdb) info locals
>>> w =3D <optimized out>
>>> wbyte =3D 0
>>> ptr =3D 0x7fff4027ad60 "\016\002"
>>> (gdb) n
>>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>>> (gdb) n
>>> 658 if (w =3D=3D -1) {
>>> (gdb) n
>>> 659    if (errno =3D=3D EINTR)
>>> (gdb) n
>>> 674 }
>>> (gdb) p __errno_location()
>>> $1 =3D (int *) 0x7f50322cb540
>>> (gdb) x/x $1
>>> 0x7f50322cb540: 0x00000009
>>> (gdb) python import errno
>>> (gdb) python print(errno.errorcode[9])
>>> EBADF
>>> (gdb) n
>>> mc_send_master (what=3D<optimized out>) at fork.c:702
>>> 702    close(master_fd);
>>> (gdb) n
>>> 704    error(_("write error, closing pipe to the master"));
>>> (gdb) n
>>> 703    master_fd =3D -1;
>>> (gdb) n
>>> 704    error(_("write error, closing pipe to the master"));
>>> (gdb) n
>>> 685 error(_("there is no pipe to the master process"));
>>>
>>>
>>> Does this help in any way?
>>>
>>> Is there something else I can/should look at?
>>>
>>> Regards,
>>> Andreas
>>>
>>>
>>> 2019-11-27 15:04 GMT+01:00 Tomas Kalibera<[hidden email]>:
>>>> Hi Andreas,
>>>> the error is reported when some child process cannot send results to t=
>> he master process, which originates from an error returned by write() - w=
>> hen write() returns -1 or 0. The logic around the writing has not changed=
>> since R 3.5.2. It should not be related to the printing in the child, on=
>> ly to returning the value. The problem may be originating from the execut=
>> ion environment, virtualization, and/or possibly from a lack of robustnes=
>> s in R. To resolve this we need to find out which error was returned and =
>> why. Either you can try to create a reproducible example (something I cou=
>> ld use to trigger an error on my system and then debug) or to debug on yo=
>> ur system (build R from source, ensure the bug is still triggered, then i=
>> nstrument to print the exact error from the OS and where it was detected,=
>> etc). In principle you could also try without code instrumentation just =
>> using strace. Just from looking at the code in R around the writing I am =
>> not seeing any bug there. If you choose to debug on your system I can hel=
>> p with the instrumentation.
>>>>
>>>> Best
>>>> Tomas
>>>>
>>>> On 11/27/19 12:40 PM, Andreas Kersting wrote:
>>>>> Hi again,
>>>>>
>>>>> One important correction of my first message: I misinterpreted the ou=
>> tput. Actually in that R session 2 input files were processed one after t=
>> he other in a loop. The first (with 88 parts went fine). The second (with=
>> 85 parts) produced the sendMaster errors and failed. If (in a new sessio=
>> n via Rscript) I only process the second input file it will work. The oth=
>> er observations on R vs Rscript, NFS share etc. still hold.
>>>>>
>>>>> Sorry for this! Regards,
>>>>> Andreas
>>>>>
>>>>> 2019-11-27 12:10 GMT+01:00 Andreas Kersting<[hidden email]>:
>>>>>> Hi,
>>>>>>
>>>>>> I am facing a very weird problem with parallel::mclapply. I have a s=
>> cript which does some data wrangling on an input dataset in parallel and =
>> then writes the results to disk. I have been using this script daily for =
>> more than one year always on an EC2 instance launched from the same AMI (=
>> no updates installed after launch) and processed thousands of different i=
>> nput data sets successfully. I now have an input dataset for which I face=
>> the following bug:
>>>>>>
>>>>>> The basic outline of the problematic section of the script:
>>>>>>
>>>>>> # parts is a data.table with 88 rows
>>>>>> mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
>>>>>>     # do some data wrangling and write the result to a file
>>>>>>     # ...
>>>>>>
>>>>>>     print(paste0("part ", i, " written successfully."))
>>>>>>     return(TRUE)
>>>>>> }, mc.preschedule =3D FALSE, mc.cores =3D 2L)
>>>>>>
>>>>>> str(mc_ret)
>>>>>>
>>>>>>
>>>>>> Expected output: "part i written successfully." is printed 88 times,=
>> once for each value of i. mc_ret is a list of length 88, each element be=
>> ing TRUE. Its structure is printed once. All outputs are created successf=
>> ully.
>>>>>>
>>>>>> Actual output (see end of the message): "part i written successfully=
>> =2E" is printed 88 times, once for each value of i. mc_ret is a list of l=
>> ength 88, each element being TRUE. Its structure is printed. All outputs =
>> are created successfully. So far so good.
>>>>>>
>>>>>> But then "part i written successfully." it is printed another X time=
>> s, for values of i for which it was already printed. This output is inter=
>> mingled with X-1 times the following error message:
>>>>>>
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>>
>>>>>> and Y times the message "Execution halted". mc_ret is printed again,=
>> now being a list of length 85, with the first element being TRUE and all=
>> other elements being NULL. X and Y vary from run to run.
>>>>>>
>>>>>>
>>>>>> Now to the main problem: I tried very hard to create a reproducible =
>> example, but I failed. What I observed:
>>>>>> - The output is (and has always been) written to path which is on an=
>> NFS share. If I instead write to a path on a local disk it will work.
>>>>>> - The script is invoked using Rscript. If I instead source it from a=
>> n interactive R session it works. There are at least two more people who =
>> have observed this: https://stackoverflow.com/questions/51986674/mclapply=
>> -sendmaster-error-only-with-rscript
>>>>>> - Before the call to mclapply the code acquires an exclusive file lo=
>> ck on a dedicated lock file, not written to but also on the NFS share. If=
>> I remove the code acquiring the lock, the whole script will also work if=
>> called using Rscript.
>>>>>> - The problem also occurs for mc.preschedule =3D TRUE.
>>>>>> - There is no error if I set mc.cores to 1.
>>>>>> - And stressing again: the code works without any changes from Rscri=
>> pt for thousands of other data sets.
>>>>>>
>>>>>>
>>>>>> Rscript -e "sessionInfo()":
>>>>>> R version 3.5.2 (2018-12-20)
>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>> Running under: Ubuntu 18.04.2 LTS
>>>>>>
>>>>>> Matrix products: default
>>>>>> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
>>>>>> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>>>>>>
>>>>>> locale:
>>>>>>    [1] LC_CTYPE=3DC.UTF-8       LC_NUMERIC=3DC           LC_TIME=3DC=
>> =2EUTF-8
>>>>>>    [4] LC_COLLATE=3DC.UTF-8     LC_MONETARY=3DC.UTF-8    LC_MESSAGES=
>> =3DC.UTF-8
>>>>>>    [7] LC_PAPER=3DC.UTF-8       LC_NAME=3DC              LC_ADDRESS=3D=
>> C
>>>>>> [10] LC_TELEPHONE=3DC         LC_MEASUREMENT=3DC.UTF-8 LC_IDENTIFICA=
>> TION=3DC
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base=
>>
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] compiler_3.5.2
>>>>>>
>>>>>>
>>>>>> I know this is a fairly old R version. I have not been able to repro=
>> duce the bug with a more recent version, but since it is so difficult to =
>> trigger, this does not mean much, I guess. I have looked through the chan=
>> ges made to the code of mclapply since that version and could not find so=
>> mething directly related. I am not even sure if it is a problem in the pa=
>> rallel package or some other (memory) bug. What strikes me is that others=
>> have observed a very similar error when using Rscript but not when using=
>> an interactive R session, just like I do.
>>>>>>
>>>>>> I am not expecting a fix based on the information I provide, but may=
>> be someone has some thoughts on this!?
>>>>>>
>>>>>> Regards,
>>>>>> Andreas
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Actual output:
>>>>>>
>>>>>> [1] "part 51 written successfully."
>>>>>> [1] "part 30 written successfully."
>>>>>> [1] "part 32 written successfully."
>>>>>> [1] "part 48 written successfully."
>>>>>> [1] "part 63 written successfully."
>>>>>> [1] "part 5 written successfully."
>>>>>> [1][1] "part 14 written successfully." "part 18 written successfully=
>> =2E"
>>>>>>
>>>>>> [1] "part 38 written successfully."
>>>>>> [1] "part 11 written successfully."
>>>>>> [1] "part 68 written successfully."
>>>>>> [1] "part 45 written successfully."
>>>>>> [1] "part 88 written successfully."
>>>>>> [1] "part 36 written successfully."
>>>>>> [1] "part 44 written successfully."
>>>>>> [1] "part 55 written successfully."
>>>>>> [1] "part 26 written successfully."
>>>>>> [1] "part 37 written successfully."
>>>>>> [1] "part 22 written successfully."
>>>>>> [1] "part 13 written successfully."
>>>>>> [1] "part 67 written successfully."
>>>>>> [1] "part 10 written successfully."
>>>>>> [1] "part 24 written successfully."
>>>>>> [1] "part 20 written successfully."
>>>>>> [1] "part 74 written successfully."
>>>>>> [1] "part 50 written successfully."
>>>>>> [1] "part 2 written successfully."
>>>>>> [1] "part 17 written successfully."
>>>>>> [1] "part 66 written successfully."
>>>>>> [1] "part 61 written successfully."
>>>>>> [1][1] "part 7 written successfully." "part 31 written successfully.=
>> "
>>>>>>
>>>>>> [1] "part 70 written successfully."
>>>>>> [1] "part 71 written successfully."
>>>>>> [1][1] "part 76 written successfully."
>>>>>>    "part 42 written successfully."
>>>>>> [1][1] "part 72 written successfully." "part 12 written successfully=
>> =2E"
>>>>>>
>>>>>> [1] "part 39 written successfully."
>>>>>> [1] "part 58 written successfully."
>>>>>> [1] "part 87 written successfully."
>>>>>> [1] "part 81 written successfully."
>>>>>> [1] "part 3 written successfully."
>>>>>> [1] "part 29 written successfully."
>>>>>> [1] "part 82 written successfully."
>>>>>> [1] "part 54 written successfully."
>>>>>> [1] "part 23 written successfully."
>>>>>> [1] "part 56 written successfully."
>>>>>> [1] "part 73 written successfully."
>>>>>> [1] "part 27 written successfully."
>>>>>> [1] "part 64 written successfully."
>>>>>> [1] "part 46 written successfully."
>>>>>> [1] "part 80 written successfully."
>>>>>> [1] "part 53 written successfully."
>>>>>> [1] "part 43 written successfully."
>>>>>> [1] "part 47 written successfully."
>>>>>> [1] "part 57 written successfully."
>>>>>> [1] "part 75 written successfully."
>>>>>> [1] "part 84 written successfully."
>>>>>> [1] "part 85 written successfully."
>>>>>> [1][1] "part 86 written successfully."
>>>>>>    "part 19 written successfully."
>>>>>> [1][1] "part 33 written successfully." "part 34 written successfully=
>> =2E"
>>>>>>
>>>>>> [1] "part 21 written successfully."
>>>>>> [1] "part 40 written successfully."
>>>>>> [1][1] "part 4 written successfully."
>>>>>>    "part 28 written successfully."
>>>>>> [1] "part 62 written successfully."
>>>>>> [1] "part 16 written successfully."
>>>>>> [1] "part 79 written successfully."
>>>>>> [1] "part 6 written successfully."
>>>>>> [1] "part 77 written successfully."
>>>>>> [1] "part 65 written successfully."
>>>>>> [1] "part 60 written successfully."
>>>>>> [1] "part 78 written successfully."
>>>>>> [1] "part 1 written successfully."
>>>>>> [1] "part 83 written successfully."
>>>>>> [1] "part 9 written successfully."
>>>>>> [1] "part 15 written successfully."
>>>>>> [1][1] "part 41 written successfully."
>>>>>>    "part 35 written successfully."
>>>>>> [1][1] "part 8 written successfully." "part 69 written successfully.=
>> "
>>>>>>
>>>>>> [1] "part 59 written successfully."
>>>>>> [1] "part 25 written successfully."
>>>>>> [1] "part 49 written successfully."
>>>>>> [1] "part 52 written successfully."
>>>>>> List of 88
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>> [1] "part 31 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> send=
>> Master
>>>>>> [1] "part 71 written successfully."
>>>>>> [1] "part 35 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 42 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 4 written successfully."
>>>>>> Execution halted
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1][1] "part 30 written successfully." "part 40 written successfully=
>> =2E"
>>>>>>
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 41 written successfully."
>>>>>> [1] "part 80 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 45 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 82 written successfully."
>>>>>> [1] "part 54 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 59 written successfully."
>>>>>> Execution halted
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 27 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 36 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 44 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 83 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 61 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 70 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 66 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 10 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 53 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 55 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 15 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 50 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 5 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 48 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 32 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 69 written successfully."
>>>>>> Execution halted
>>>>>> [1] "part 52 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 77 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 24 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 56 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 6 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 62 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 33 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 16 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 34 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 74 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 2 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 19 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 9 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 11 written successfully."
>>>>>> [1] "part 28 written successfully."
>>>>>> [1] "part 29 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 17 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 65 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 68 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 43 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 38 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 1 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 57 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 8 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 25 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1]List of 85
>>>>>>    $ : logi TRUE
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>> ______________________________________________
>>>>>> [hidden email] mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>
>>
>>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply | Threaded
Open this post in threaded view
|

Re: error in parallel:::sendMaster

Andreas Kersting
In reply to this post by Andreas Kersting
Hi all,

With the help of Tomas, I was able to track the issue down: Prior to R v3.6.0 the parallel package passes an uninitialized variable as the file descriptor argument to the close system call.

In my particular R session this uninitialized variable (reproducibly) was holding the value 7, which corresponded to the file descriptor of the write end of the pipe the second child would use to send its results to the master. Hence, the child unintentionally closed this pipe directly after fork in close_fds_child_ci() resulting in sendMaster() later failing with EBADF.

It was fixed with this commit: https://github.com/wch/r-source/commit/e08cffac1c5b9015a1625938d568b648eb1d8aee

Regards,
Andreas

2019-11-28 13:54 GMT+01:00 Andreas Kersting<[hidden email]>:

> Hi Tomas,
>
> I rebuild R (v3.5.2 for now, R-devel to follow) from the Debian package with MC_DEBUG defined and hopefully also with "-Wall -O0 -gdwarf-2 -g3", though I still have to verify this.
>
> Below is the output. I think it is a total of two mclapply invocations in this R session, the failing one starting around the lines "[1] 15381" and "[1] 15382". The "Error in partDF::write_partDF ..." is because the script/package checks the return value of mclapply and detects that it is not a list of length 85 with only the elements "TRUE".
>
> Regarding sending you the complete example: I first have to figure out if this is possible at all, because it would involve data of a client.
>
> Regards,
> Andreas
>
> parent[15366] created pipes: comm (6->5), sir (8->7)
> parent registers new child 15379
> child process 15379 started
> parent[15366] created pipes: comm (7->6), sir (10->9)
> parent registers new child 15380
> child process 15380 started
> select_children: added child 15380 (6)
> select_children: added child 15379 (5)
> select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
> child 15380: send_master (550 bytes)
>   sr = 1
>  - read select 1 children: 15380
> child 15380: 'mcexit' called
> child 15380 is waiting for permission to exit
> read_child_ci(15380) - read length returned 8
> read_child_ci(15380) - read 550 at 0 returned 550
> select_children: added child 15380 (6)
> select_children: added child 15379 (5)
> select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
>   sr = 1
>  - read select 1 children: 15380
> read_child_ci(15380) - read length returned 8
> detached child 15380 (signal 10)
> child process 15380 got SIGUSR1; child_exit_status=-1
> child 15380: exiting
> select_children: added child 15379 (5)
> select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
> child 15380 terminated with exit status 0
> child 15379: send_master (550 bytes)
>   sr = 1
>  - read select 1 children: 15379
> read_child_ci(15379) - read length returned 8
> read_child_ci(15379) - read 550 at 0 returned 550
> select_children: added child 15379 (5)
> select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
> child 15379: 'mcexit' called
> child 15379 is waiting for permission to exit
>   sr = 1
>  - read select 1 children: 15379
> read_child_ci(15379) - read length returned 8
> detached child 15379 (signal 10)
> child process 15379 got SIGUSR1; child_exit_status=-1
> child 15379: exiting
> removing waited-for child 15380 from the list
> killed detached child 15379 (signal 15)
> removing waited-for child 2147483647 from the list
> child 15379 terminated with exit status 0
> removing waited-for child 15379 from the list
> parent[15366] created pipes: comm (6->5), sir (8->7)
> parent registers new child 15381
> child process 15381 started
> parent[15366] created pipes: comm (7->6), sir (10->9)
> [1] 15381
> parent registers new child 15382
> child process 15382 started
> select_children: added child 15382 (6)
> select_children: added child 15381 (5)
> select_children: maxfd=6, wlen=2, wcount=2, timeout=-1.000000
>   sr = 1
>  - read select 1 children: 15382
> read_child_ci(15382) - read length returned 0
> detached child 15382 (signal 10)
> child process 15382 got SIGUSR1; child_exit_status=-1
> select_children: added child 15381 (5)
> select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
> [1] 15382
> child 15382: send_master (526 bytes)
> Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
>   write error, closing pipe to the master
> Calls: lapply ... <Anonymous> -> <Anonymous> -> lapply -> FUN -> sendMaster
> child 15382: 'mcexit' called
> child 15382: exiting
> child 15382 terminated with exit status 1
> child 15381: send_master (538 bytes)
>   sr = 1
>  - read select 1 children: 15381
> read_child_ci(15381) - read length returned 8
> read_child_ci(15381) - read 538 at 0 returned 538
> child 15381: 'mcexit' called
> child 15381 is waiting for permission to exit
> select_children: added child 15381 (5)
> select_children: maxfd=5, wlen=1, wcount=1, timeout=-1.000000
>   sr = 1
>  - read select 1 children: 15381
> read_child_ci(15381) - read length returned 8
> detached child 15381 (signal 10)
> child process 15381 got SIGUSR1; child_exit_status=-1
> child 15381: exiting
> removing waited-for child 15382 from the list
> killed detached child 15381 (signal 15)
> removing waited-for child 2147483647 from the list
> Error in partDF::write_partDF(X, out_path, c("Date", "Panel", "Type",  :
>   at least one core did not return from parllel write; maybe it was killed (be the Linux Out of Memory Killer ?) or there was a fatal error in the forked process(es)
> Calls: lapply -> FUN -> write_partDF_repo -> <Anonymous>
> Execution halted
> killed detached child 15381 (signal 9)
> child 15381 terminated with exit status 0
> removing waited-for child 15381 from the list
> process 15366 parallel shutdown ok
>
>
> 2019-11-28 10:40 GMT+01:00 Tomas Kalibera<[hidden email]>:
>> Hi Andreas,
>> thank you very much, good job finding it was EBADF. Now the question is=20
>> why the pipe has been closed prematurely; it could be accidentally by R=20
>> (a race condition in the cleanup code in fork.c) or possibly by some=20
>> other code running in the same process (maybe the R program itself or=20
>> some other code it runs). Maybe we can take this off the list and come=20
>> back when we know the cause or have it fixed.
>>
>> It would help a lot if you could try with R built from source, with=20
>> optimizations disabled to get more accurate debug symbols (e.g. env=20
>> CFLAGS=3D"-Wall -O0 -gdwarf-2 -g3" CXXFLAGS=3D"-Wall -O0 -gdwarf-2 -g3"=20
>> =2E/configure), and with MC_DEBUG defined in fork.c - line 26. Ideally in=
>> =20
>> R-devel, so that we are sure the problem still exists. The debug=20
>> messages should give a hint whether it was R (fork.c) that closed the=20
>> pipe and why. Maybe you could also add a debug message to=20
>> close_fds_child_ci() to see if it was closed there. Maybe you could find =
>>
>> this out even in your current debugging setup via breakpoints and=20
>> backtraces, but I think it may be easier to build from source with these =
>>
>> debugging messages.
>>
>> Also if you could send me a complete example I could run that causes=20
>> this on your system, that would be nice (even if it didn't cause the=20
>> problem on my system).
>>
>> Thanks
>> Tomas
>>
>> On 11/28/19 6:35 AM, Andreas Kersting wrote:
>>> Hi Tomas,
>>>
>>> Thanks for your prompt reply and your offer to help. I might need to ge=
>> t back to this since I am not too experienced in debugging these kinds of=
>> issues. Anyway, I gave it a try and I think I have found the immediate c=
>> ause:
>>>
>>> I installed the debug symbols (r-base-core-dbg), placed https://github.=
>> com/wch/r-source/blob/tags/R-3-5-2/src/library/parallel/src/fork.c in cwd=
>> and changed the wrapper code to:
>>>
>>> mc_ret <- parallel::mclapply(seq_len(nrow(parts)), function(i) {
>>>      # we fail for the input resulting in parts having 85 rows
>>>      if (nrow(parts) =3D=3D 85L && !file.exists(as.character(Sys.getpid=
>> ()))) {
>>>        file.create(as.character(Sys.getpid()))
>>>        print(Sys.getpid())
>>>        Sys.sleep(30)
>>>      }
>>>
>>>      # ...
>>>
>>>      return(TRUE)
>>>    }, mc.preschedule =3D TRUE, mc.cores =3D 2L)
>>>
>>> This way I ended up with only two child processes to which I each attac=
>> hed a debugger. In total I ran about 10 debugging sessions and it was alw=
>> ays the second child process failing. The errno after write returned -1 w=
>> as 9 (EBADF).
>>>
>>>  From what I can see, the reason for this is that the second child trie=
>> s to write to fd 7, but already during the very beginning of the first in=
>> vocation of the anonymous function to parallelize, i.e. during Sys.sleep(=
>> 30), there is no such file descriptor. From this observation I would conc=
>> lude that it is NOT the code run from that function, i.e. # ...,  causing=
>> the issue. Let me point out again, that this is NOT the very first invoc=
>> ation of mclapply in this R session. There is at least one previous call =
>> to it, which works fine.
>>>
>>>
>>> File descriptors directly after attaching gdb to both child processes d=
>> uring Sys.sleep(30):
>>>
>>> ### master
>>> root@ip-10-0-48-30:~/latest_test# ls -l /proc/22119/fd
>>> total 0
>>> lrwx------ 1 root root 64 Nov 28 04:49 0 -> /dev/pts/0
>>> lrwx------ 1 root root 64 Nov 28 04:49 1 -> /dev/pts/0
>>> lrwx------ 1 root root 64 Nov 28 04:49 2 -> /dev/pts/0
>>> lr-x------ 1 root root 64 Nov 28 04:49 3 -> /path/to/script.R
>>> lrwx------ 1 root root 64 Nov 28 04:49 4 -> /path/on/nfs/write.lock
>>> lr-x------ 1 root root 64 Nov 28 04:49 5 -> 'pipe:[266120]'
>>> l-wx------ 1 root root 64 Nov 28 04:49 8 -> 'pipe:[266121]'
>>>
>>>
>>> ### first child (writes to fd 6)
>>> (gdb) shell ls -l /proc/22134/fd
>>> total 0
>>> lrwx------ 1 root root 64 Nov 28 04:42 0 -> /dev/pts/0
>>> lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
>>> lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
>>> lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
>>> lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
>>> l-wx------ 1 root root 64 Nov 28 04:42 6 -> 'pipe:[266120]'
>>> l-wx------ 1 root root 64 Nov 28 04:42 8 -> 'pipe:[266121]'
>>>
>>> ### second child (tries writing to fd 7)
>>> (gdb) shell ls -l /proc/22135/fd
>>> total 0
>>> lr-x------ 1 root root 64 Nov 28 04:42 0 -> 'pipe:[266123]'
>>> lrwx------ 1 root root 64 Nov 28 04:42 1 -> /dev/pts/0
>>> l-wx------ 1 root root 64 Nov 28 04:42 10 -> 'pipe:[266123]'
>>> lrwx------ 1 root root 64 Nov 28 04:42 2 -> /dev/pts/0
>>> lr-x------ 1 root root 64 Nov 28 04:42 3 -> /path/to/script.R
>>> lrwx------ 1 root root 64 Nov 28 04:42 4 -> /path/on/nfs/write.lock
>>>
>>>
>>> # debugging mc_send_master in both child processes:
>>>
>>> ### first child (all OK):
>>> (gdb) break mc_send_master
>>> Breakpoint 1 at 0x7f5b2687b0a0: file fork.c, line 681.
>>> (gdb) c
>>> Continuing.
>>>
>>> Breakpoint 1, mc_send_master (what=3D0x555c6dd2cd00) at fork.c:681
>>> warning: Source file is more recent than executable.
>>> 681 {
>>> (gdb) info args
>>> what =3D 0x555c6dd2cd00
>>> (gdb) n
>>> 682    if (is_master)
>>> (gdb) n
>>> 684    if (master_fd =3D=3D -1)
>>> (gdb) n
>>> 686    if (TYPEOF(what) !=3D RAWSXP)
>>> (gdb) n
>>> 688    R_xlen_t len =3D XLENGTH(what);
>>> (gdb) n
>>> 689    unsigned char *b =3D RAW(what);
>>> (gdb) n
>>> 693    if (writerep(master_fd, &len, sizeof(len)) !=3D sizeof(len)) {
>>> (gdb) info locals
>>> len =3D 538
>>> b =3D <optimized out>
>>> n =3D <optimized out>
>>> (gdb) s
>>> writerep (fildes=3D6, buf=3Dbuf@entry=3D0x7ffdd6da7db0, nbyte=3Dnbyte@e=
>> ntry=3D8)
>>>      at fork.c:653
>>> 653 {
>>> (gdb) info args
>>> fildes =3D 6
>>> buf =3D 0x7ffdd6da7db0
>>> nbyte =3D 8
>>> (gdb) n
>>> 654    size_t wbyte =3D 0;
>>> (gdb) n
>>> 653 {
>>> (gdb) n
>>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>>> (gdb) n
>>> 658 if (w =3D=3D -1) {
>>> (gdb) n
>>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>>> (gdb) n
>>> 658 if (w =3D=3D -1) {
>>> (gdb) n
>>> 664 if (w =3D=3D 0) {
>>> (gdb) n
>>> 670 wbyte +=3D w;
>>> (gdb) n
>>> 671 if (wbyte =3D=3D nbyte)
>>> (gdb) n
>>> 672    return wbyte;
>>> (gdb) n
>>> 674 }
>>> (gdb) n
>>> mc_send_master (what=3D<optimized out>) at fork.c:699
>>> 699    for (R_xlen_t i =3D 0; i < len; i +=3D n) {
>>> (gdb) n
>>> 700 n =3D writerep(master_fd, b + i, len - i);
>>> (gdb) n
>>> 701 if (n < 1) {
>>> (gdb) n
>>> 699    for (R_xlen_t i =3D 0; i < len; i +=3D n) {
>>> (gdb) n
>>> 707    return ScalarLogical(1);
>>> (gdb) n
>>> 708 }
>>>
>>>
>>>
>>> ### second child (write fails with EBADF):
>>> (gdb) break mc_send_master
>>> Breakpoint 1 at 0x7f5016f3e0a0: file fork.c, line 681.
>>> (gdb) c
>>> Continuing.
>>>
>>> Breakpoint 1, mc_send_master (what=3D0x563382a71910) at fork.c:681
>>> warning: Source file is more recent than executable.
>>> 681 {
>>> (gdb) info args
>>> what =3D 0x563382a71910
>>> (gdb) n
>>> 682    if (is_master)
>>> (gdb) n
>>> 684    if (master_fd =3D=3D -1)
>>> (gdb) n
>>> 686    if (TYPEOF(what) !=3D RAWSXP)
>>> (gdb) n
>>> 688    R_xlen_t len =3D XLENGTH(what);
>>> (gdb) n
>>> 689    unsigned char *b =3D RAW(what);
>>> (gdb) n
>>> 693    if (writerep(master_fd, &len, sizeof(len)) !=3D sizeof(len)) {
>>> (gdb) info locals
>>> len =3D 526
>>> b =3D <optimized out>
>>> n =3D <optimized out>
>>> (gdb) s
>>> writerep (fildes=3D7, buf=3Dbuf@entry=3D0x7fff4027ad60, nbyte=3Dnbyte@e=
>> ntry=3D8)
>>>      at fork.c:653
>>> 653 {
>>> (gdb) info args
>>> fildes =3D 7
>>> buf =3D 0x7fff4027ad60
>>> nbyte =3D 8
>>> (gdb) n
>>> 654    size_t wbyte =3D 0;
>>> (gdb) n
>>> 653 {
>>> (gdb) n
>>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>>> (gdb) n
>>> 658 if (w =3D=3D -1) {
>>> (gdb) info locals
>>> w =3D <optimized out>
>>> wbyte =3D 0
>>> ptr =3D 0x7fff4027ad60 "\016\002"
>>> (gdb) n
>>> 657 ssize_t w =3D write(fildes, ptr + wbyte, nbyte - wbyte);
>>> (gdb) n
>>> 658 if (w =3D=3D -1) {
>>> (gdb) n
>>> 659    if (errno =3D=3D EINTR)
>>> (gdb) n
>>> 674 }
>>> (gdb) p __errno_location()
>>> $1 =3D (int *) 0x7f50322cb540
>>> (gdb) x/x $1
>>> 0x7f50322cb540: 0x00000009
>>> (gdb) python import errno
>>> (gdb) python print(errno.errorcode[9])
>>> EBADF
>>> (gdb) n
>>> mc_send_master (what=3D<optimized out>) at fork.c:702
>>> 702    close(master_fd);
>>> (gdb) n
>>> 704    error(_("write error, closing pipe to the master"));
>>> (gdb) n
>>> 703    master_fd =3D -1;
>>> (gdb) n
>>> 704    error(_("write error, closing pipe to the master"));
>>> (gdb) n
>>> 685 error(_("there is no pipe to the master process"));
>>>
>>>
>>> Does this help in any way?
>>>
>>> Is there something else I can/should look at?
>>>
>>> Regards,
>>> Andreas
>>>
>>>
>>> 2019-11-27 15:04 GMT+01:00 Tomas Kalibera<[hidden email]>:
>>>> Hi Andreas,
>>>> the error is reported when some child process cannot send results to t=
>> he master process, which originates from an error returned by write() - w=
>> hen write() returns -1 or 0. The logic around the writing has not changed=
>> since R 3.5.2. It should not be related to the printing in the child, on=
>> ly to returning the value. The problem may be originating from the execut=
>> ion environment, virtualization, and/or possibly from a lack of robustnes=
>> s in R. To resolve this we need to find out which error was returned and =
>> why. Either you can try to create a reproducible example (something I cou=
>> ld use to trigger an error on my system and then debug) or to debug on yo=
>> ur system (build R from source, ensure the bug is still triggered, then i=
>> nstrument to print the exact error from the OS and where it was detected,=
>> etc). In principle you could also try without code instrumentation just =
>> using strace. Just from looking at the code in R around the writing I am =
>> not seeing any bug there. If you choose to debug on your system I can hel=
>> p with the instrumentation.
>>>>
>>>> Best
>>>> Tomas
>>>>
>>>> On 11/27/19 12:40 PM, Andreas Kersting wrote:
>>>>> Hi again,
>>>>>
>>>>> One important correction of my first message: I misinterpreted the ou=
>> tput. Actually in that R session 2 input files were processed one after t=
>> he other in a loop. The first (with 88 parts went fine). The second (with=
>> 85 parts) produced the sendMaster errors and failed. If (in a new sessio=
>> n via Rscript) I only process the second input file it will work. The oth=
>> er observations on R vs Rscript, NFS share etc. still hold.
>>>>>
>>>>> Sorry for this! Regards,
>>>>> Andreas
>>>>>
>>>>> 2019-11-27 12:10 GMT+01:00 Andreas Kersting<[hidden email]>:
>>>>>> Hi,
>>>>>>
>>>>>> I am facing a very weird problem with parallel::mclapply. I have a s=
>> cript which does some data wrangling on an input dataset in parallel and =
>> then writes the results to disk. I have been using this script daily for =
>> more than one year always on an EC2 instance launched from the same AMI (=
>> no updates installed after launch) and processed thousands of different i=
>> nput data sets successfully. I now have an input dataset for which I face=
>> the following bug:
>>>>>>
>>>>>> The basic outline of the problematic section of the script:
>>>>>>
>>>>>> # parts is a data.table with 88 rows
>>>>>> mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
>>>>>>     # do some data wrangling and write the result to a file
>>>>>>     # ...
>>>>>>
>>>>>>     print(paste0("part ", i, " written successfully."))
>>>>>>     return(TRUE)
>>>>>> }, mc.preschedule =3D FALSE, mc.cores =3D 2L)
>>>>>>
>>>>>> str(mc_ret)
>>>>>>
>>>>>>
>>>>>> Expected output: "part i written successfully." is printed 88 times,=
>> once for each value of i. mc_ret is a list of length 88, each element be=
>> ing TRUE. Its structure is printed once. All outputs are created successf=
>> ully.
>>>>>>
>>>>>> Actual output (see end of the message): "part i written successfully=
>> =2E" is printed 88 times, once for each value of i. mc_ret is a list of l=
>> ength 88, each element being TRUE. Its structure is printed. All outputs =
>> are created successfully. So far so good.
>>>>>>
>>>>>> But then "part i written successfully." it is printed another X time=
>> s, for values of i for which it was already printed. This output is inter=
>> mingled with X-1 times the following error message:
>>>>>>
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>>
>>>>>> and Y times the message "Execution halted". mc_ret is printed again,=
>> now being a list of length 85, with the first element being TRUE and all=
>> other elements being NULL. X and Y vary from run to run.
>>>>>>
>>>>>>
>>>>>> Now to the main problem: I tried very hard to create a reproducible =
>> example, but I failed. What I observed:
>>>>>> - The output is (and has always been) written to path which is on an=
>> NFS share. If I instead write to a path on a local disk it will work.
>>>>>> - The script is invoked using Rscript. If I instead source it from a=
>> n interactive R session it works. There are at least two more people who =
>> have observed this: https://stackoverflow.com/questions/51986674/mclapply=
>> -sendmaster-error-only-with-rscript
>>>>>> - Before the call to mclapply the code acquires an exclusive file lo=
>> ck on a dedicated lock file, not written to but also on the NFS share. If=
>> I remove the code acquiring the lock, the whole script will also work if=
>> called using Rscript.
>>>>>> - The problem also occurs for mc.preschedule =3D TRUE.
>>>>>> - There is no error if I set mc.cores to 1.
>>>>>> - And stressing again: the code works without any changes from Rscri=
>> pt for thousands of other data sets.
>>>>>>
>>>>>>
>>>>>> Rscript -e "sessionInfo()":
>>>>>> R version 3.5.2 (2018-12-20)
>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>> Running under: Ubuntu 18.04.2 LTS
>>>>>>
>>>>>> Matrix products: default
>>>>>> BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
>>>>>> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
>>>>>>
>>>>>> locale:
>>>>>>    [1] LC_CTYPE=3DC.UTF-8       LC_NUMERIC=3DC           LC_TIME=3DC=
>> =2EUTF-8
>>>>>>    [4] LC_COLLATE=3DC.UTF-8     LC_MONETARY=3DC.UTF-8    LC_MESSAGES=
>> =3DC.UTF-8
>>>>>>    [7] LC_PAPER=3DC.UTF-8       LC_NAME=3DC              LC_ADDRESS=3D=
>> C
>>>>>> [10] LC_TELEPHONE=3DC         LC_MEASUREMENT=3DC.UTF-8 LC_IDENTIFICA=
>> TION=3DC
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base=
>>
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] compiler_3.5.2
>>>>>>
>>>>>>
>>>>>> I know this is a fairly old R version. I have not been able to repro=
>> duce the bug with a more recent version, but since it is so difficult to =
>> trigger, this does not mean much, I guess. I have looked through the chan=
>> ges made to the code of mclapply since that version and could not find so=
>> mething directly related. I am not even sure if it is a problem in the pa=
>> rallel package or some other (memory) bug. What strikes me is that others=
>> have observed a very similar error when using Rscript but not when using=
>> an interactive R session, just like I do.
>>>>>>
>>>>>> I am not expecting a fix based on the information I provide, but may=
>> be someone has some thoughts on this!?
>>>>>>
>>>>>> Regards,
>>>>>> Andreas
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Actual output:
>>>>>>
>>>>>> [1] "part 51 written successfully."
>>>>>> [1] "part 30 written successfully."
>>>>>> [1] "part 32 written successfully."
>>>>>> [1] "part 48 written successfully."
>>>>>> [1] "part 63 written successfully."
>>>>>> [1] "part 5 written successfully."
>>>>>> [1][1] "part 14 written successfully." "part 18 written successfully=
>> =2E"
>>>>>>
>>>>>> [1] "part 38 written successfully."
>>>>>> [1] "part 11 written successfully."
>>>>>> [1] "part 68 written successfully."
>>>>>> [1] "part 45 written successfully."
>>>>>> [1] "part 88 written successfully."
>>>>>> [1] "part 36 written successfully."
>>>>>> [1] "part 44 written successfully."
>>>>>> [1] "part 55 written successfully."
>>>>>> [1] "part 26 written successfully."
>>>>>> [1] "part 37 written successfully."
>>>>>> [1] "part 22 written successfully."
>>>>>> [1] "part 13 written successfully."
>>>>>> [1] "part 67 written successfully."
>>>>>> [1] "part 10 written successfully."
>>>>>> [1] "part 24 written successfully."
>>>>>> [1] "part 20 written successfully."
>>>>>> [1] "part 74 written successfully."
>>>>>> [1] "part 50 written successfully."
>>>>>> [1] "part 2 written successfully."
>>>>>> [1] "part 17 written successfully."
>>>>>> [1] "part 66 written successfully."
>>>>>> [1] "part 61 written successfully."
>>>>>> [1][1] "part 7 written successfully." "part 31 written successfully.=
>> "
>>>>>>
>>>>>> [1] "part 70 written successfully."
>>>>>> [1] "part 71 written successfully."
>>>>>> [1][1] "part 76 written successfully."
>>>>>>    "part 42 written successfully."
>>>>>> [1][1] "part 72 written successfully." "part 12 written successfully=
>> =2E"
>>>>>>
>>>>>> [1] "part 39 written successfully."
>>>>>> [1] "part 58 written successfully."
>>>>>> [1] "part 87 written successfully."
>>>>>> [1] "part 81 written successfully."
>>>>>> [1] "part 3 written successfully."
>>>>>> [1] "part 29 written successfully."
>>>>>> [1] "part 82 written successfully."
>>>>>> [1] "part 54 written successfully."
>>>>>> [1] "part 23 written successfully."
>>>>>> [1] "part 56 written successfully."
>>>>>> [1] "part 73 written successfully."
>>>>>> [1] "part 27 written successfully."
>>>>>> [1] "part 64 written successfully."
>>>>>> [1] "part 46 written successfully."
>>>>>> [1] "part 80 written successfully."
>>>>>> [1] "part 53 written successfully."
>>>>>> [1] "part 43 written successfully."
>>>>>> [1] "part 47 written successfully."
>>>>>> [1] "part 57 written successfully."
>>>>>> [1] "part 75 written successfully."
>>>>>> [1] "part 84 written successfully."
>>>>>> [1] "part 85 written successfully."
>>>>>> [1][1] "part 86 written successfully."
>>>>>>    "part 19 written successfully."
>>>>>> [1][1] "part 33 written successfully." "part 34 written successfully=
>> =2E"
>>>>>>
>>>>>> [1] "part 21 written successfully."
>>>>>> [1] "part 40 written successfully."
>>>>>> [1][1] "part 4 written successfully."
>>>>>>    "part 28 written successfully."
>>>>>> [1] "part 62 written successfully."
>>>>>> [1] "part 16 written successfully."
>>>>>> [1] "part 79 written successfully."
>>>>>> [1] "part 6 written successfully."
>>>>>> [1] "part 77 written successfully."
>>>>>> [1] "part 65 written successfully."
>>>>>> [1] "part 60 written successfully."
>>>>>> [1] "part 78 written successfully."
>>>>>> [1] "part 1 written successfully."
>>>>>> [1] "part 83 written successfully."
>>>>>> [1] "part 9 written successfully."
>>>>>> [1] "part 15 written successfully."
>>>>>> [1][1] "part 41 written successfully."
>>>>>>    "part 35 written successfully."
>>>>>> [1][1] "part 8 written successfully." "part 69 written successfully.=
>> "
>>>>>>
>>>>>> [1] "part 59 written successfully."
>>>>>> [1] "part 25 written successfully."
>>>>>> [1] "part 49 written successfully."
>>>>>> [1] "part 52 written successfully."
>>>>>> List of 88
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>>    $ : logi TRUE
>>>>>> [1] "part 31 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> send=
>> Master
>>>>>> [1] "part 71 written successfully."
>>>>>> [1] "part 35 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 42 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 4 written successfully."
>>>>>> Execution halted
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1][1] "part 30 written successfully." "part 40 written successfully=
>> =2E"
>>>>>>
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 41 written successfully."
>>>>>> [1] "part 80 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 45 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 82 written successfully."
>>>>>> [1] "part 54 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 59 written successfully."
>>>>>> Execution halted
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 27 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 36 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 44 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 83 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 61 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 70 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 66 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 10 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 53 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 55 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 15 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 50 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 5 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 48 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 32 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 69 written successfully."
>>>>>> Execution halted
>>>>>> [1] "part 52 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 77 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 24 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 56 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 6 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 62 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 33 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 16 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 34 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 74 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 2 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 19 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> [1] "part 9 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 11 written successfully."
>>>>>> [1] "part 28 written successfully."
>>>>>> [1] "part 29 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 17 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 65 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> Execution halted
>>>>>> Execution halted
>>>>>> [1] "part 68 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 43 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 38 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 1 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 57 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 8 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1] "part 25 written successfully."
>>>>>> Error in sendMaster(try(eval(expr, env), silent =3D TRUE)) :
>>>>>>     write error, closing pipe to the master
>>>>>> Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMa=
>> ster
>>>>>> [1]List of 85
>>>>>>    $ : logi TRUE
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>>    $ : NULL
>>>>>> ______________________________________________
>>>>>> [hidden email] mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>
>>
>>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel