Detecting whether a process exists or not by its PID?

Henrik Bengtsson-5
Hi, I'd like to test whether a (localhost) PSOCK cluster node is still
running or not by its PID, e.g. it may have crashed / core dumped.
I'm OK with false positives caused by *another* process having since
started with the same PID.

I can get the PID of each cluster node by querying it for its
Sys.getpid(), e.g.

    pids <- parallel::clusterEvalQ(cl, Sys.getpid())

Is there a function in core R for testing whether a process with a
given PID exists or not? From trial'n'error, I found that on Linux:

  pid_exists <- function(pid) as.logical(tools::pskill(pid, signal = 0L))

returns TRUE for existing processes and FALSE otherwise, but I'm not
sure if I can trust this.  It's not a documented feature in
?tools::pskill, which also warns about 'signal' not being standardized
across OSes.

The other Linux alternative I can imagine is:

  pid_exists <- function(pid) {
    system2("ps", args = c("--pid", pid), stdout = FALSE) == 0L
  }

Can I expect this to work on macOS as well?  What about other *nix systems?

And, finally, what can be done on Windows?

I'm sure there are packages on CRAN that provide this, but I'd like
to keep dependencies at a minimum.

I appreciate any feedback. Thxs,

Henrik

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Detecting whether a process exists or not by its PID?

Gábor Csárdi
On Fri, Aug 31, 2018 at 1:18 AM Henrik Bengtsson
<[hidden email]> wrote:
[...]
>   pid_exists <- function(pid) as.logical(tools::pskill(pid, signal = 0L))
>
> returns TRUE for existing processes and FALSE otherwise, but I'm not
> sure if I can trust this.  It's not a documented feature in
> ?tools::pskill, which also warns about 'signal' not being standardized
> across OSes.

Yes, as long as tools::pskill() is willing to make a kill(pid, 0) system
call, AFAIK this will work fine on all UNIX systems.

> The other Linux alternative I can imagine is:
>
>   pid_exists <- function(pid) system2("ps", args = c("--pid", pid), stdout = FALSE) == 0L
>
> Can I expect this to work on macOS as well?  What about other *nix systems?

There is no --pid option on macOS. I think simply `ps <pid>` is
better, but some very minimal systems might not have ps at all.
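
A hedged sketch of a more portable variant of the ps approach: POSIX specifies `ps -p <pid>`, and the Linux (procps) and macOS ps implementations accept it. The name pid_exists_ps is made up for this illustration, and the function assumes that ps exits with a non-zero status when no process matches, which may not hold for every minimal ps implementation.

```r
# Hypothetical helper (not from the thread): test for a PID via `ps -p`.
# Assumes a POSIX-like `ps` is on the PATH; returns NA when it is not.
pid_exists_ps <- function(pid) {
  if (!nzchar(Sys.which("ps"))) return(NA)
  status <- system2("ps", args = c("-p", as.character(pid)),
                    stdout = FALSE, stderr = FALSE)
  status == 0L
}

# Check the current R process, which certainly exists.
pid_exists_ps(Sys.getpid())
```

Like the pskill() variant, this only says that *some* process currently has that PID, so it is still subject to PID reuse.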

> And, finally, what can be done on Windows?

You need to call OpenProcess from C, or find some base R function that
does that without messing up the process. Seems like tools::psnice()
does that.

> I'm sure there are packages on CRAN that provide this, but I'd like
> to keep dependencies at a minimum.

Yes, e.g. the ps package does this, and it does it properly, i.e. you
don't need to worry about pid reuse. Pid reuse does cause problems
quite frequently, especially on Windows, and especially on a system
that starts a lot of processes, like win-builder.

Gabor



Re: Detecting whether a process exists or not by its PID?

Tomas Kalibera
In reply to this post by Henrik Bengtsson-5
On 08/31/2018 01:18 AM, Henrik Bengtsson wrote:
> Hi, I'd like to test whether a (localhost) PSOCK cluster node is still
> running or not by its PID, e.g. it may have crashed / core dumped.
> I'm OK with false positives caused by *another* process having since
> started with the same PID.
kill(sig=0) is specified by POSIX but indeed as you say there is a race
condition due to PID-reuse.  In principle, detecting that a worker
process is still alive cannot be done correctly outside base R. At
user-level I would probably consider some watchdog, e.g. the parallel
tasks would be repeatedly touching a file.
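
One way the watchdog idea could look, as a minimal sketch only: the worker periodically refreshes the modification time of a heartbeat file, and the master treats the worker as alive while that file is recent enough. The helper names, the file location and the 30-second threshold are all assumptions made for this illustration, and master and worker would need to see the same file system.

```r
# Hypothetical helpers (not from the thread) for a file-based watchdog.
# Worker side: call periodically to refresh the heartbeat file's mtime.
touch_heartbeat <- function(path) {
  if (!file.exists(path)) file.create(path)
  Sys.setFileTime(path, Sys.time())
  invisible(path)
}

# Master side: the worker counts as alive if the file was touched
# within the last `max_age` seconds (an arbitrary threshold).
heartbeat_alive <- function(path, max_age = 30) {
  if (!file.exists(path)) return(FALSE)
  age <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "secs"))
  age <= max_age
}

hb <- tempfile("worker-heartbeat-")
touch_heartbeat(hb)
heartbeat_alive(hb)
```

The threshold trades detection latency against false alarms: a worker busy in a long computation cannot touch the file unless the task itself calls touch_heartbeat() now and then.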

In base R, one can do this correctly for forked processes via
mcparallel/mccollect, not for PSOCK cluster workers which are based on
system() (and I understand it would be a useful feature)

 > j <- mcparallel(Sys.sleep(1000))
 > mccollect(j, wait=FALSE)
NULL

# kill the child process

 > mccollect(j, wait=FALSE)
$`1542`
NULL

More details are in ?mcparallel. The key part is that the job must be
started as non-detached and, as soon as mccollect() collects it,
mccollect() must never be called on it again.

Tomas



Re: Detecting whether a process exists or not by its PID?

Gábor Csárdi
On Fri, Aug 31, 2018 at 2:51 PM Tomas Kalibera <[hidden email]> wrote:
[...]
> kill(sig=0) is specified by POSIX but indeed as you say there is a race
> condition due to PID-reuse.  In principle, detecting that a worker
> process is still alive cannot be done correctly outside base R.

I am not sure why you think so.

> At user-level I would probably consider some watchdog, e.g. the parallel
> tasks would be repeatedly touching a file.

I am pretty sure that there are simpler and better solutions. E.g. one
would be to ask the worker process for its startup time (with as much
precision as possible) and then use the (pid, startup_time) pair as a
unique id.

With this you can check if the process is still running, by checking
that the pid exists, and that its startup time matches.

This is all very simple with the ps package, on Linux, macOS and Windows.
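
A minimal sketch of the (pid, startup_time) idea with the ps package, which is an extra dependency. To my understanding a ps handle records the start time along with the PID and ps::ps_is_running() compares both, but treat that as an assumption to confirm in the package documentation. The wrapper names are made up for this illustration.

```r
# Sketch using the CRAN package 'ps' (an extra dependency).
# A ps handle records both the PID and the process start time.
worker_handle <- function(pid) {
  ps::ps_handle(as.integer(pid))
}

# TRUE only if a process with the same PID *and* start time still exists.
worker_alive <- function(handle) {
  ps::ps_is_running(handle)
}

# Demonstrated on the current R process; skipped if 'ps' is not installed.
if (requireNamespace("ps", quietly = TRUE)) {
  h <- worker_handle(Sys.getpid())
  worker_alive(h)
}
```

The handle should be created while the worker is known to be alive, e.g. right after the cluster has started, so that the recorded start time really belongs to the worker.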

Gabor



Re: Detecting whether a process exists or not by its PID?

Tomas Kalibera
On 08/31/2018 03:13 PM, Gábor Csárdi wrote:
> On Fri, Aug 31, 2018 at 2:51 PM Tomas Kalibera <[hidden email]> wrote:
> [...]
>> kill(sig=0) is specified by POSIX but indeed as you say there is a race
>> condition due to PID-reuse.  In principle, detecting that a worker
>> process is still alive cannot be done correctly outside base R.
> I am not sure why you think so.
To avoid the race with PID re-use one needs access to signal handling,
to blocking signals, to handling sigchld. system/system2 and
mcparallel/mccollect in base R use these features and the interaction is
still safe given the specific use in system/system2 and
mcparallel/mccollect, yet would have to be re-visited if either of the
two uses change. These features cannot be safely used outside of base R
in contributed packages.

Tomas



Re: Detecting whether a process exists or not by its PID?

Gábor Csárdi
On Fri, Aug 31, 2018 at 3:35 PM Tomas Kalibera <[hidden email]> wrote:

>
> On 08/31/2018 03:13 PM, Gábor Csárdi wrote:
> > On Fri, Aug 31, 2018 at 2:51 PM Tomas Kalibera <[hidden email]> wrote:
> > [...]
> >> kill(sig=0) is specified by POSIX but indeed as you say there is a race
> >> condition due to PID-reuse.  In principle, detecting that a worker
> >> process is still alive cannot be done correctly outside base R.
> > I am not sure why you think so.
> To avoid the race with PID re-use one needs access to signal handling,
> to blocking signals, to handling sigchld. system/system2 and
> mcparallel/mccollect in base R use these features and the interaction is
> still safe given the specific use in system/system2 and
> mcparallel/mccollect, yet would have to be re-visited if either of the
> two uses change. These features cannot be safely used outside of base R
> in contributed packages.

Yes, _in theory_ this is right, and of course this only works for
child processes.

_In practice_, you do not need signal handling. The startup time stamp
method is completely fine, because it is practically impossible to have
two processes with the same pid and the same (high precision) startup
time. This method also works for any process (not just child
processes), so for PSOCK clusters as well.

Gabor

[...]


Re: Detecting whether a process exists or not by its PID?

luke-tierney
On Fri, 31 Aug 2018, Gábor Csárdi wrote:

> On Fri, Aug 31, 2018 at 3:35 PM Tomas Kalibera <[hidden email]> wrote:
>>
>> On 08/31/2018 03:13 PM, Gábor Csárdi wrote:
>>> On Fri, Aug 31, 2018 at 2:51 PM Tomas Kalibera <[hidden email]> wrote:
>>> [...]
>>>> kill(sig=0) is specified by POSIX but indeed as you say there is a race
>>>> condition due to PID-reuse.  In principle, detecting that a worker
>>>> process is still alive cannot be done correctly outside base R.
>>> I am not sure why you think so.
>> To avoid the race with PID re-use one needs access to signal handling,
>> to blocking signals, to handling sigchld. system/system2 and
>> mcparallel/mccollect in base R use these features and the interaction is
>> still safe given the specific use in system/system2 and
>> mcparallel/mccollect, yet would have to be re-visited if either of the
>> two uses change. These features cannot be safely used outside of base R
>> in contributed packages.
>
> Yes, _in theory_ this is right, and of course this only works for
> child processes.
>
> _In practice_, you do not need signal handling. The startup time stamp
> method is
> completely fine, because it is practically impossible to have two
> processes with the
> same pid and the same (high precision) startup time. This method also
> works for any
> process (not just child processes), so for PSOCK clusters as well.

PSOCK workers may not be running on the same host as the master process.
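
A small sketch of how one might guard against this before doing any PID-based check: compare each worker's host name with the master's and only test PIDs for workers on the same machine. The helper name is hypothetical, and comparing Sys.info()[["nodename"]] values is only a heuristic, since different machines could in principle report the same node name.

```r
# Hypothetical helper (not from the thread): is `host` this machine?
is_local_host <- function(host) {
  identical(unname(host), unname(Sys.info()[["nodename"]]))
}

# With a PSOCK cluster `cl`, the worker host names could be gathered with
#   hosts <- unlist(parallel::clusterEvalQ(cl, Sys.info()[["nodename"]]))
# and PID checks restricted to workers where is_local_host(host) is TRUE.
is_local_host(Sys.info()[["nodename"]])
```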

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   [hidden email]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu