mccollect with NULL in R 3.6

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

mccollect with NULL in R 3.6

Gergely Daróczi
Dear All,

I'm running into issues with calling mccollect on a list containing NULL
using R 3.6 (this used to work in 3.5.3):

jobs <- lapply(
    list(NULL, 'foobar'),
    function(x) mcparallel(identity(x)))
mccollect(jobs, wait = FALSE, timeout = 0)
#> Error in names(res) <- pnames[match(s, pids)] :
#>   'names' attribute [2] must be the same length as the vector [1]

Note, setting a "name" for the jobs does not help, but the above works with
"wait=TRUE", and also if I change the order of NULL and "foobar", although
in that case, the second value (NULL) is ommitted.  It also works with
mclapply fine.

Any ideas/suggestion on how to get mccollect work with the above example?

Thanks,
Gergely

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: mccollect with NULL in R 3.6

Tomas Kalibera
On 5/1/19 12:25 AM, Gergely Daróczi wrote:

> Dear All,
>
> I'm running into issues with calling mccollect on a list containing NULL
> using R 3.6 (this used to work in 3.5.3):
>
> jobs <- lapply(
>      list(NULL, 'foobar'),
>      function(x) mcparallel(identity(x)))
> mccollect(jobs, wait = FALSE, timeout = 0)
> #> Error in names(res) <- pnames[match(s, pids)] :
> #>   'names' attribute [2] must be the same length as the vector [1]
>
> Note, setting a "name" for the jobs does not help, but the above works with
> "wait=TRUE", and also if I change the order of NULL and "foobar", although
> in that case, the second value (NULL) is ommitted.  It also works with
> mclapply fine.
>
> Any ideas/suggestion on how to get mccollect work with the above example?.

NULL is not a valid job identification. Perhaps mccollect() could give a
clearer error message, but I don't see, given its documentation, what
else than throwing an error it should do. What is the problem you were
trying to solve?

Best
Tomas

>
> Thanks,
> Gergely
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: mccollect with NULL in R 3.6

Gergely Daróczi
On Thu, May 2, 2019 at 7:24 PM Tomas Kalibera <[hidden email]> wrote:

>
> On 5/1/19 12:25 AM, Gergely Daróczi wrote:
> > Dear All,
> >
> > I'm running into issues with calling mccollect on a list containing NULL
> > using R 3.6 (this used to work in 3.5.3):
> >
> > jobs <- lapply(
> >      list(NULL, 'foobar'),
> >      function(x) mcparallel(identity(x)))
> > mccollect(jobs, wait = FALSE, timeout = 0)
> > #> Error in names(res) <- pnames[match(s, pids)] :
> > #>   'names' attribute [2] must be the same length as the vector [1]
> >
> > Note, setting a "name" for the jobs does not help, but the above works with
> > "wait=TRUE", and also if I change the order of NULL and "foobar", although
> > in that case, the second value (NULL) is ommitted.  It also works with
> > mclapply fine.
> >
> > Any ideas/suggestion on how to get mccollect work with the above example?.
>
> NULL is not a valid job identification. Perhaps mccollect() could give a
> clearer error message, but I don't see, given its documentation, what
> else than throwing an error it should do. What is the problem you were
> trying to solve?

Thank you very much for looking into this!

What was interesting to me is that it used to work before 3.6 -- I
have a script iterating over a list of data frames to train models,
but it started to fail with the 3.6 release.

The "NULL is not a valid job identification" problem doesn't seem to
stand for my production job, as each list element has a proper name,
but I think I can reproduce this with this minimal example as well:

library(parallel)
jobs <- lapply(1:2, function(x) {
    mcparallel(if (x == 1) NULL else x, name = as.character(x))
})
mccollect(jobs, wait = FALSE, timeout = 2)
#> Error in names(res) <- pnames[match(s, pids)] :
#>   'names' attribute [1] must be the same length as the vector [0]

So the jobs have proper name, but the NULL return value is causing
problems. Note, that it only causes problems when the NULL value is
the first, eg switching 1 and 2 works, also running this on 1:3 and
returning NULL on 2 etc.

Now, I'm aware that 7 months ago this was added to the docs at
https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4
that NULL should not be returned, so it seems to be a user error on my
end, but it seems to fail only when NULL is the first returned element
in mccollect, and working OK eg when NULL is the 2nd or other element
(although with side effects, eg missed elements).

So maybe failing with an explicit error message whenever mccollect
hits a NULL for consistency might help here instead of skipping
"delivered.result <- delivered.result + 1L" when the returned value is
not raw at https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4#diff-e634fbaed323aac88667e7826865b160R72
? Or even better (at least for my use case), maybe allowing to return
NULL and throwing the warning on line 108 in that case.

Thanks for considering this,
Gergely


>
> Best
> Tomas
>
> >
> > Thanks,
> > Gergely
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: mccollect with NULL in R 3.6

Tomas Kalibera
On 5/3/19 3:04 PM, Gergely Daróczi wrote:

> On Thu, May 2, 2019 at 7:24 PM Tomas Kalibera <[hidden email]> wrote:
>> On 5/1/19 12:25 AM, Gergely Daróczi wrote:
>>> Dear All,
>>>
>>> I'm running into issues with calling mccollect on a list containing NULL
>>> using R 3.6 (this used to work in 3.5.3):
>>>
>>> jobs <- lapply(
>>>       list(NULL, 'foobar'),
>>>       function(x) mcparallel(identity(x)))
>>> mccollect(jobs, wait = FALSE, timeout = 0)
>>> #> Error in names(res) <- pnames[match(s, pids)] :
>>> #>   'names' attribute [2] must be the same length as the vector [1]
>>>
>>> Note, setting a "name" for the jobs does not help, but the above works with
>>> "wait=TRUE", and also if I change the order of NULL and "foobar", although
>>> in that case, the second value (NULL) is ommitted.  It also works with
>>> mclapply fine.
>>>
>>> Any ideas/suggestion on how to get mccollect work with the above example?.
>> NULL is not a valid job identification. Perhaps mccollect() could give a
>> clearer error message, but I don't see, given its documentation, what
>> else than throwing an error it should do. What is the problem you were
>> trying to solve?
> Thank you very much for looking into this!
>
> What was interesting to me is that it used to work before 3.6 -- I
> have a script iterating over a list of data frames to train models,
> but it started to fail with the 3.6 release.
>
> The "NULL is not a valid job identification" problem doesn't seem to
> stand for my production job, as each list element has a proper name,
> but I think I can reproduce this with this minimal example as well:
>
> library(parallel)
> jobs <- lapply(1:2, function(x) {
>      mcparallel(if (x == 1) NULL else x, name = as.character(x))
> })
> mccollect(jobs, wait = FALSE, timeout = 2)
> #> Error in names(res) <- pnames[match(s, pids)] :
> #>   'names' attribute [1] must be the same length as the vector [0]
>
> So the jobs have proper name, but the NULL return value is causing
> problems. Note, that it only causes problems when the NULL value is
> the first, eg switching 1 and 2 works, also running this on 1:3 and
> returning NULL on 2 etc.
Thanks for the clarification, I got confused by that you wrote about
"calling mccollect on a list containing NULL", so I was answering a
different question. Now I see you actually meant collecting results from
jobs that return NULL, and even though that writing jobs that return
NULL is incorrect, I see that there is really a bug that causes the R error.
> Now, I'm aware that 7 months ago this was added to the docs at
> https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4
> that NULL should not be returned, so it seems to be a user error on my
> end, but it seems to fail only when NULL is the first returned element
> in mccollect, and working OK eg when NULL is the 2nd or other element
> (although with side effects, eg missed elements).
NULL has been reserved even well before that documentation update, and
if it somewhat worked as a result, it had been a coincidence. The
documentation change was only to make it more obvious, but it already
said "‘mccollect’ will return ‘NULL’ for a terminating job that has sent
its results already after which the job is no longer available."
> So maybe failing with an explicit error message whenever mccollect
> hits a NULL for consistency might help here instead of skipping
> "delivered.result <- delivered.result + 1L" when the returned value is
> not raw at https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4#diff-e634fbaed323aac88667e7826865b160R72
> ? Or even better (at least for my use case), maybe allowing to return
> NULL and throwing the warning on line 108 in that case.

Thanks for the report, I've fixed mccollect() in R-devel to return NULL
from the job. To be ported to R-pathed unless problems appear.

Best
Tomas

>
> Thanks for considering this,
> Gergely

>
>> Best
>> Tomas
>>
>>> Thanks,
>>> Gergely
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: mccollect with NULL in R 3.6

Gergely Daróczi
On Fri, May 3, 2019 at 4:34 PM Tomas Kalibera <[hidden email]> wrote:

>
> On 5/3/19 3:04 PM, Gergely Daróczi wrote:
> > On Thu, May 2, 2019 at 7:24 PM Tomas Kalibera <[hidden email]> wrote:
> >> On 5/1/19 12:25 AM, Gergely Daróczi wrote:
> >>> Dear All,
> >>>
> >>> I'm running into issues with calling mccollect on a list containing NULL
> >>> using R 3.6 (this used to work in 3.5.3):
> >>>
> >>> jobs <- lapply(
> >>>       list(NULL, 'foobar'),
> >>>       function(x) mcparallel(identity(x)))
> >>> mccollect(jobs, wait = FALSE, timeout = 0)
> >>> #> Error in names(res) <- pnames[match(s, pids)] :
> >>> #>   'names' attribute [2] must be the same length as the vector [1]
> >>>
> >>> Note, setting a "name" for the jobs does not help, but the above works with
> >>> "wait=TRUE", and also if I change the order of NULL and "foobar", although
> >>> in that case, the second value (NULL) is ommitted.  It also works with
> >>> mclapply fine.
> >>>
> >>> Any ideas/suggestion on how to get mccollect work with the above example?.
> >> NULL is not a valid job identification. Perhaps mccollect() could give a
> >> clearer error message, but I don't see, given its documentation, what
> >> else than throwing an error it should do. What is the problem you were
> >> trying to solve?
> > Thank you very much for looking into this!
> >
> > What was interesting to me is that it used to work before 3.6 -- I
> > have a script iterating over a list of data frames to train models,
> > but it started to fail with the 3.6 release.
> >
> > The "NULL is not a valid job identification" problem doesn't seem to
> > stand for my production job, as each list element has a proper name,
> > but I think I can reproduce this with this minimal example as well:
> >
> > library(parallel)
> > jobs <- lapply(1:2, function(x) {
> >      mcparallel(if (x == 1) NULL else x, name = as.character(x))
> > })
> > mccollect(jobs, wait = FALSE, timeout = 2)
> > #> Error in names(res) <- pnames[match(s, pids)] :
> > #>   'names' attribute [1] must be the same length as the vector [0]
> >
> > So the jobs have proper name, but the NULL return value is causing
> > problems. Note, that it only causes problems when the NULL value is
> > the first, eg switching 1 and 2 works, also running this on 1:3 and
> > returning NULL on 2 etc.
> Thanks for the clarification, I got confused by that you wrote about
> "calling mccollect on a list containing NULL", so I was answering a
> different question. Now I see you actually meant collecting results from
> jobs that return NULL, and even though that writing jobs that return
> NULL is incorrect, I see that there is really a bug that causes the R error.
> > Now, I'm aware that 7 months ago this was added to the docs at
> > https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4
> > that NULL should not be returned, so it seems to be a user error on my
> > end, but it seems to fail only when NULL is the first returned element
> > in mccollect, and working OK eg when NULL is the 2nd or other element
> > (although with side effects, eg missed elements).
> NULL has been reserved even well before that documentation update, and
> if it somewhat worked as a result, it had been a coincidence. The
> documentation change was only to make it more obvious, but it already
> said "‘mccollect’ will return ‘NULL’ for a terminating job that has sent
> its results already after which the job is no longer available."
> > So maybe failing with an explicit error message whenever mccollect
> > hits a NULL for consistency might help here instead of skipping
> > "delivered.result <- delivered.result + 1L" when the returned value is
> > not raw at https://github.com/wch/r-source/commit/f0d15be765dcf92e2349429428d49cd5b212abb4#diff-e634fbaed323aac88667e7826865b160R72
> > ? Or even better (at least for my use case), maybe allowing to return
> > NULL and throwing the warning on line 108 in that case.
>
> Thanks for the report, I've fixed mccollect() in R-devel to return NULL
> from the job. To be ported to R-pathed unless problems appear.

Thank you very much for the great news and your work!

>
> Best
> Tomas
>
> >
> > Thanks for considering this,
> > Gergely
>
> >
> >> Best
> >> Tomas
> >>
> >>> Thanks,
> >>> Gergely
> >>>
> >>>        [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> [hidden email] mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel