system/system2 and open file descriptors

Winston Chang
It seems that the system() and system2() functions don't close file
descriptors between the fork() and exec() (on Unix platforms, of course).
This means that the child processes inherit open files and socket
connections.

Running this (from a terminal) will result in the child process writing to
a file that was opened by R:

R
f <- file('foo.txt', 'w')
system('echo "abc" >&3')



You can also see the open files if you run the following:
  f <- file('foo.txt', 'w')
  system2('sleep', '100', wait = FALSE)

And then in another terminal:
  lsof -c R -c sleep
it will show that both the R and sleep processes have the file open:
  ...
  R       324 root    3w   REG   0,48        0   4259 /foo.txt
  ...
  sleep   327 root    3w   REG   0,48        0   4259 /foo.txt


This behavior can cause problems if R spawns a child process that outlives
the R process but keeps some of its resources open.

Would it be possible to add an option to close file descriptors for child
processes? It would be nice if that were the default, but I suspect that
making that change would break a lot of existing code.

To take an example from the Python world, subprocess.Popen() has an option,
close_fds, which closes all file descriptors except 0, 1, and 2.
  https://docs.python.org/2/library/subprocess.html#popen-constructor


-Winston


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: system/system2 and open file descriptors

Winston Chang
In addition to the issue of a child process holding onto open files, the
child process can also manipulate a file descriptor in a way that affects
the parent process. For example, calling lseek() in the child process will
move the file offset in the parent process.

Here is a set of commands that demonstrates it. They can be copied and
pasted in a terminal. What it does:
- Creates a C program that seeks to the beginning of file descriptor 3, and
compiles it to a program named "lseek".
- Creates a file with some text in it.
- Starts R. In R:
    - Opens the text file and reads the first line.
    - Runs lseek in a child process.
    - Reads the rest of the lines.


echo "#include <unistd.h>
int main(void) {
  /* Descriptor 3 is assumed to be the file that R opened. */
  lseek(3, 0, SEEK_SET);
  return 0;
}" > lseek.c

gcc lseek.c -o lseek

echo "line 1
line 2
line 3" > lines.txt

R
f <- file('lines.txt', 'r')
cat(readLines(f, n = 1), sep = "\n")
system('./lseek')
cat(readLines(f), sep = "\n")


Here's what it outputs:
> f <- file('lines.txt', 'r')
> cat(readLines(f, n = 1), sep = "\n")
line 1
> system('./lseek')
> cat(readLines(f), sep = "\n")
line 2
line 3
line 1
line 2
line 3

The child process has changed what the parent process reads from the file.
(I'm guessing that readLines() prints out "line 2" and "line 3" before
starting over because R had already buffered the whole file before lseek
was executed.)

This is obviously a highly contrived case, but it illustrates what's
possible. The other issue I mentioned, with child processes holding open
files after the R process exits, is more likely to cause problems in the
real world. That's actually how I encountered this issue in the first
place: when restarting R inside of RStudio on a Mac, if there are any
extant child processes started by system(), they keep some files open, and
this causes RStudio to hang. (There's a fix in progress for RStudio for
this particular issue.)

-Winston




Re: system/system2 and open file descriptors

Bill Dunlap (via R devel mailing list)
In S+ on Unix-alikes we dealt with this issue by using fcntl(fd,
F_SETFD, 1) to set the close-on-exec flag on a file descriptor as soon
as we opened it.
Bill Dunlap
TIBCO Software
wdunlap tibco.com

