tempdir() may be deleted during long-running R session

Mikko Korpela-2
Temporary files not accessed for a long time are automatically removed
in some Linux distributions and probably other operating systems too,
depending on system configuration. This may affect the per-session
temporary directory, the path of which is returned by tempdir(). I think
it would be nice if R automatically tried to recreate a missing
tempdir() but this could have some performance implications.

I ran the same test (below) on R 3.3.3 patched, R 3.4.0 beta, and
R-devel, all at r72499 (2017-04-09) and compiled by myself. The results
from the test were practically identical on all of those versions, the
test platform being Ubuntu 14.04.5 LTS. This system is configured for a
/tmp cleanup threshold of 7 days of inactivity (which is the default).
After a wait of roughly 10 days, the R temporary directory had been
deleted by an automatic cleanup procedure, and a call to `?` failed.
This StackExchange question has some answers about the Ubuntu /tmp
cleanup practice: https://askubuntu.com/q/20783

a <- print(tempdir())
# [1] "/tmp/user/1069138/RtmpGc9M5z"
dir.exists(a) # TRUE
# [1] TRUE
Sys.time()
# [1] "2017-04-10 16:00:30 EEST"
## Wait for one week (Ubuntu 14.04.5 LTS)
print(Sys.time()); ?regex
# [1] "2017-04-20 14:17:29 EEST"
# Error in file(out, "wt") : cannot open the connection
# In addition: Warning message:
# In file(out, "wt") :
#   cannot open file '/tmp/user/1069138/RtmpGc9M5z/Rtxt3dbb65870ad4': No such file or directory
b <- print(tempdir())
# [1] "/tmp/user/1069138/RtmpGc9M5z"
identical(a, b)
# [1] TRUE
dir.exists(b)
# [1] FALSE
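
The "recreate a missing tempdir()" idea can be approximated at user level. What follows is only a minimal sketch under stated assumptions: `ensure_tempdir()` is a hypothetical helper, not a base R function, and it does nothing to restore files that were inside the directory before it was removed.

```r
# Hypothetical helper (not part of base R): make sure a temporary
# directory exists before it is used, recreating it if it has vanished.
ensure_tempdir <- function(path = tempdir()) {
  if (!dir.exists(path)) {
    # Owner-only permissions; 'mode' is ignored on Windows.
    ok <- dir.create(path, recursive = TRUE, mode = "0700")
    if (!ok) stop("could not recreate temporary directory: ", path)
  }
  invisible(path)
}

# Simulate an external cleanup on a scratch directory instead of
# waiting for the system cleaner to act on tempdir() itself.
scratch <- file.path(tempdir(), "scratch-demo")
ensure_tempdir(scratch)
unlink(scratch, recursive = TRUE)            # the "cleanup"
existed_after_cleanup <- dir.exists(scratch)
ensure_tempdir(scratch)                      # the repair
exists_after_repair <- dir.exists(scratch)
```

Calling `ensure_tempdir()` with no argument immediately before `tempfile()` or `?` would cover the failure shown above, at the cost of one extra `dir.exists()` check per call.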

--
Mikko Korpela
Department of Geosciences and Geography
University of Helsinki

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

Prof Brian Ripley
 From the R-admin manual §5:

'Various environment variables can be set to determine where R creates
its per-session temporary directory. The environment variables TMPDIR,
TMP and TEMP are searched in turn and the first one which is set and
points to a writable area is used. If none do, the final default is /tmp
on Unix-alikes and the value of R_USER on Windows. The path should be an
absolute path not containing spaces (and it is best to avoid
non-alphanumeric characters such as +).

Some Unix-alike systems are set up to remove files and directories
periodically from /tmp, for example by a cron job running tmpwatch. Set
TMPDIR to another directory before starting long-running jobs on such a
system.'
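
The lookup order quoted above can be imitated from inside a session to see which base directory would be chosen. This is a rough sketch only: `pick_tmp_base()` is a hypothetical helper, the real selection happens inside R at startup, and the `/tmp` default applies to Unix-alikes (the manual names R_USER as the Windows default).

```r
# Hypothetical helper mirroring the documented search order
# TMPDIR, TMP, TEMP; it only inspects, it does not change anything.
pick_tmp_base <- function(vars = c("TMPDIR", "TMP", "TEMP"),
                          default = "/tmp") {
  for (v in vars) {
    p <- unname(Sys.getenv(v, unset = NA))
    # file.access(mode = 2) returns 0 when the path is writable
    if (!is.na(p) && nzchar(p) && dir.exists(p) &&
        file.access(p, 2) == 0) {
      return(list(variable = v, path = p))
    }
  }
  list(variable = NA_character_, path = default)
}

str(pick_tmp_base())   # which variable (if any) would win now
dirname(tempdir())     # where this session's directory actually lives
```

Because the per-session directory is created when R starts, the variable has to be set in the environment that launches R; changing it with `Sys.setenv()` afterwards does not move an existing `tempdir()`.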


On 21/04/2017 11:49, Mikko Korpela wrote:
> Temporary files not accessed for a long time are automatically removed
> in some Linux distributions and probably other operating systems too,
> depending on system configuration. This may affect the per-session
> temporary directory, the path of which is returned by tempdir(). I think

Not for those who follow the manual and know that sysadmins have
enabled such a script.

> it would be nice if R automatically tried to recreate a missing
> tempdir() but this could have some performance implications.


--
Brian D. Ripley,                  [hidden email]
Emeritus Professor of Applied Statistics, University of Oxford


Re: tempdir() may be deleted during long-running R session

Joris FA Meys
In defense of the OP: I would have checked ?tempdir and missed the
information in the manual as well. On the help page there's ample
information on the underlying processes that create the dir on multiple
platforms. I think adding the last two sentences of Prof. Ripley's quote
to the help page as a warning would be worth the effort.

I do wonder though why you would run something that lasts 10 days and still
rely on something that is called a "temporary" directory.

Best regards
Joris

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
[hidden email]

Re: tempdir() may be deleted during long-running R session

Mikko Korpela-2
In reply to this post by Prof Brian Ripley
On 21/04/17 14:03, Prof Brian Ripley wrote:

> From the R-admin manual §5:
>
> 'Various environment variables can be set to determine where R creates
> its per-session temporary directory. The environment variables TMPDIR,
> TMP and TEMP are searched in turn and the first one which is set and
> points to a writable area is used. If none do, the final default is /tmp
> on Unix-alikes and the value of R_USER on Windows. The path should be an
> absolute path not containing spaces (and it is best to avoid
> non-alphanumeric characters such as +).
>
> Some Unix-alike systems are set up to remove files and directories
> periodically from /tmp, for example by a cron job running tmpwatch. Set
> TMPDIR to another directory before starting long-running jobs on such a
> system.'

I am sorry for having missed this part of the manual, where the issue
indeed is clearly documented.

>
>
> On 21/04/2017 11:49, Mikko Korpela wrote:
>> Temporary files not accessed for a long time are automatically removed
>> in some Linux distributions and probably other operating systems too,
>> depending on system configuration. This may affect the per-session
>> temporary directory, the path of which is returned by tempdir(). I think
>
> Not for those who follow the manual and know that sysadmins have
> enabled such a script.
>
>> it would be nice if R automatically tried to recreate a missing
>> tempdir() but this could have some performance implications.

Despite my obvious failure to read the manual and report this properly,
I will try to make a case. I understand that data stored in a temporary
file may disappear, and for that reason using an alternative TMPDIR
might be advisable. However, I think that creating a new temporary file
is a different case, and it would be nice if `?` and `help` continued to
work, for example. I understand if this will not be put on the R core
list of things to do.


--
Mikko Korpela
Department of Geosciences and Geography
University of Helsinki


Re: tempdir() may be deleted during long-running R session

Mikko Korpela-2
In reply to this post by Joris FA Meys
On 21/04/17 14:42, Joris Meys wrote:
> In defense of the OP: I would have checked ?tempdir and missed the
> information in the manual as well. On the help page there's ample
> information on the underlying processes that create the dir on multiple
> platforms. I think adding the last two sentences of Prof. Ripley's quote
> to the help page as a warning would be worth the effort.
>
> I do wonder though why you would run something that lasts 10 days and still
> rely on something that is called a "temporary" directory.

For me, intuitively, the temporary directory is a place for temporary
files. I mean, the directory itself does not seem as temporary as the
files in it. I think this is not entirely unsupported by the wording in
?tempdir: "per-session temporary directory". I now understand that
"per-session" rather means that tempdir() returns a constant value
during a single R session, but the directory itself may disappear due to
things not controlled by R (documented elsewhere as pointed out by Prof
Ripley).

- Mikko


Re: tempdir() may be deleted during long-running R session

Dirk Eddelbuettel
In reply to this post by Mikko Korpela-2

On 21 April 2017 at 15:13, Mikko Korpela wrote:
| Despite my obvious failure to read the manual and report this properly,
| I will try to make a case. I understand that data stored in a temporary
| file may disappear, and for that reason using an alternative TMPDIR
| might be advisable. However, I think that creating a new temporary file
| is a different case, and it would be nice if `?` and `help` continued to
| work, for example. I understand if this will not be put on the R core
| list of things to do.

It's complicated, as it is clearly an interaction between the hosting OS and
the R application running on it.  "R cannot know" what policy the host OS may
have.

You could also talk to your sysadmins and have the service configured. E.g., on
my system the description for the tmpreaper package reads

 This package provides a program that can be used to clean out temporary-file
 directories.  It recursively searches the directory, refusing to chdir()
 across symlinks, and removes files that haven't been accessed in a
 user-specified amount of time.  You can specify a set of files to protect
 from deletion with a shell pattern.  It will not remove files owned by the
 process EUID that have the `w' bit clear, unless you ask it to, much like
 `rm -f'.  `tmpreaper' will not remove symlinks, sockets, fifos, or special
 files unless given a command line option enabling it to.
 .
 WARNING:  Please do not run `tmpreaper' on `/'.  There are no protections
 against this written into the program, as that would prevent it from
 functioning the way you'd expect it to in a `chroot(8)' environment.
 .
 The daily tmpreaper run can be configured through /etc/tmpreaper.conf .

which makes it clear that you can configure local behaviour.

Lastly, as the manual referenced in the initial reply says, you are in fact
in full control of this as you can set the environment variables for your R
sessions.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]


Re: tempdir() may be deleted during long-running R session

frederik
Hi Mikko,

I was bitten by this recently and I think some of the replies are
missing the point. As I understand it, the problem consists of these
elements:

1. When R starts, it creates a directory like /tmp/RtmpVIeFj4

2. Right after R starts I can create files in this directory with no
   error

3. After some hours or days I can no longer create files in this
   directory, because it has been deleted

If R expected the directory to be deleted at random, and if we expect
users to call dir.create every time they access tempdir, then why did
R create the directory for us at the beginning of the session? That's
just setting people up to get weird bugs, which only appear in
difficult-to-reproduce situations (i.e. after the session has been
open for a long time).

I think before we dismiss this we should think about possible in-R
solutions and why they are not feasible. Are there any packages which
would break if a call to 'tempdir' automatically recreated this
directory? (Or would it be too much of a performance hit to have
'tempdir' check and even just issue a warning when the directory is
found not to exist?) Should we have a timer which periodically updates
the modification time of tempdir()? What do other long-running
programs do (e.g. screen, emacs)?

Thank you,

Frederick

P.S. I noticed that dir.create does not seem to update the access or
modification time of the file. So there is also a remote possibility
that the directory could be "cleaned up" in between calling
'dir.create()' and putting a file in it. Maybe this is nitpicky, but
if we accept that the *really* correct practice is more complicated
than just calling 'dir.create()', this also argues for putting the
proper invocations into some kind of standard function - either
'tempdir()' or something else.
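
The "keep the directory looking fresh" idea from the questions and the P.S. above could look roughly like this. It is a sketch under assumptions: `touch_tempdir()` is a hypothetical helper, `Sys.setFileTime()` is the base R call used to set the timestamp, and whether a given cleaner honours that timestamp (rather than some other one) depends on its configuration.

```r
# Hypothetical keep-alive helper: recreate the directory if it is
# missing, then refresh its timestamp so it looks recently used.
touch_tempdir <- function(path = tempdir(), time = Sys.time()) {
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)
  ok <- Sys.setFileTime(path, time)   # TRUE on success
  if (!ok) warning("could not update timestamp of ", path)
  invisible(ok)
}

before <- file.mtime(tempdir())
touch_tempdir()
after <- file.mtime(tempdir())
```

Base R has no built-in timer, so something would still have to call `touch_tempdir()` periodically, for example the main loop of a long-running job.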


Re: tempdir() may be deleted during long-running R session

Dirk Eddelbuettel

On 21 April 2017 at 10:34, [hidden email] wrote:
| Hi Mikko,
|
| I was bitten by this recently and I think some of the replies are
| missing the point. As I understand it, the problem consists of these
| elements:
|
| 1. When R starts, it creates a directory like /tmp/RtmpVIeFj4
|
| 2. Right after R starts I can create files in this directory with no
|    error
|
| 3. After some hours or days I can no longer create files in this
|    directory, because it has been deleted

Nope. That is local to your system. Witness eg at my workstation:

/tmp$ ls -ltGd Rtmp*
drwx------ 3 edd 4096 Apr 21 16:12 Rtmp9K6bSN
drwx------ 3 edd 4096 Apr 21 11:48 RtmpRRbaMP
drwx------ 3 edd 4096 Apr 21 11:28 RtmpFlguFy
drwx------ 3 edd 4096 Apr 20 13:06 RtmpWJDF3U
drwx------ 3 edd 4096 Apr 18 15:58 RtmpY7ZIS1
drwx------ 3 edd 4096 Apr 18 12:12 Rtmpzr9W0v
drwx------ 2 edd 4096 Apr 16 16:02 RtmpeD27El
drwx------ 2 edd 4096 Apr 16 15:57 Rtmp572FHk
drwx------ 3 edd 4096 Apr 13 11:08 RtmpqP0JSf
drwx------ 3 edd 4096 Apr 10 18:47 RtmpzRzyFb
drwx------ 3 edd 4096 Apr  6 15:21 RtmpQhvAUb
drwx------ 3 edd 4096 Apr  6 11:24 Rtmp2lFKPz
drwx------ 3 edd 4096 Apr  5 20:57 RtmprCeWUS
drwx------ 2 edd 4096 Apr  3 15:12 Rtmp8xviDl
drwx------ 3 edd 4096 Mar 30 16:50 Rtmp8w9n5h
drwx------ 3 edd 4096 Mar 28 11:33 RtmpjAg6iY
drwx------ 2 edd 4096 Mar 28 09:26 RtmpYHSgZG
drwx------ 2 edd 4096 Mar 27 11:21 Rtmp0gSV4e
drwx------ 2 edd 4096 Mar 27 11:21 RtmpOnneiY
drwx------ 2 edd 4096 Mar 27 11:17 RtmpIWeiTJ
drwx------ 3 edd 4096 Mar 22 08:51 RtmpJkVsSJ
drwx------ 3 edd 4096 Mar 21 10:33 Rtmp9a5KxL
/tmp$

Clearly still there after a month. I tend to have some longer-running R
sessions in either Emacs/ESS or RStudio.

So what I wrote in my last message here *clearly* applies to you: a local
issue for which you have to take local action as R cannot know.  You also
have a choice of setting variables to affect this.
 
| If R expected the directory to be deleted at random, and if we expect
| users to call dir.create every time they access tempdir, then why did
| R create the directory for us at the beginning of the session? That's
| just setting people up to get weird bugs, which only appear in
| difficult-to-reproduce situations (i.e. after the session has been
| open for a long time).

I disagree. R has been doing this many years, possibly two decades.
 
| I think before we dismiss this we should think about possible in-R
| solutions and why they are not feasible. Are there any packages which
| would break if a call to 'tempdir' automatically recreated this
| directory? (Or would it be too much of a performance hit to have
| 'tempdir' check and even just issue a warning when the directory is
| found not to exist?) Should we have a timer which periodically updates
| the modification time of tempdir()? What do other long-running
| programs do (e.g. screen, emacs)?

There are options you have right now.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]


Re: tempdir() may be deleted during long-running R session

frederik
Dirk,

Your message felt a bit antagonistic to me, or maybe I'm not
understanding what you're trying to say. We all seem to agree that
different configurations exist, and that some Linux distributions are
configured to delete files in /tmp/ after a certain amount of time
(seems to be 10 days for Arch Linux, not sure about Ubuntu or Debian).

The question of how users of such distributions can individually work
around the problem Mikko identified has already been answered. The
question that remains is what we expect new users to do. It's not
really helpful to pretend that they will be reading the mailing list,
as exciting as it is, or that they'll read the "R Installation and
Administration" manual to make sure that their distribution did a good
job of packaging R. There are plenty of more visible places where this
"gotcha" could be documented than a manual I had never heard of until
now.

Even if a particular solution has to be implemented by the package
maintainers of various distributions, I think it is fitting to discuss
and solicit such solutions here on this mailing list. But it felt like
you were trying to stifle such discussion.

As it is, I don't even know what distributions are affected. I'm not
sure how to look up the contents of a "default" configuration on other
distributions.

Frederick



Re: tempdir() may be deleted during long-running R session

Dirk Eddelbuettel

On 24 April 2017 at 12:34, [hidden email] wrote:
| As it is, I don't even know what distributions are affected. I'm not
| sure how to look up the contents of a "default" configuration on other
| distributions.

So how do you think R can automate that?  Hint: It can't.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]


Re: tempdir() may be deleted during long-running R session

Martin Maechler
In reply to this post by Dirk Eddelbuettel
>>>>> Dirk Eddelbuettel <[hidden email]>
>>>>>     on Sun, 23 Apr 2017 09:15:18 -0500 writes:

    > On 21 April 2017 at 10:34, [hidden email] wrote:
    > | Hi Mikko,
    > |
    > | I was bitten by this recently and I think some of the replies are
    > | missing the point. As I understand it, the problem consists of these
    > | elements:
    > |
    > | 1. When R starts, it creates a directory like /tmp/RtmpVIeFj4
    > |
    > | 2. Right after R starts I can create files in this directory with no
    > |    error
    > |
    > | 3. After some hours or days I can no longer create files in this
    > |    directory, because it has been deleted

    > Nope. That is local to your system.

Correct.  OTOH, Mikko and Frederik have a point in my view (below).

    > Witness eg at my workstation:

    > /tmp$ ls -ltGd Rtmp*
    > drwx------ 3 edd 4096 Apr 21 16:12 Rtmp9K6bSN
    > drwx------ 3 edd 4096 Apr 21 11:48 RtmpRRbaMP
    > drwx------ 3 edd 4096 Apr 21 11:28 RtmpFlguFy
    > drwx------ 3 edd 4096 Apr 20 13:06 RtmpWJDF3U
    > drwx------ 3 edd 4096 Apr 18 15:58 RtmpY7ZIS1
    > drwx------ 3 edd 4096 Apr 18 12:12 Rtmpzr9W0v
    > drwx------ 2 edd 4096 Apr 16 16:02 RtmpeD27El
    > drwx------ 2 edd 4096 Apr 16 15:57 Rtmp572FHk
    > drwx------ 3 edd 4096 Apr 13 11:08 RtmpqP0JSf
    > drwx------ 3 edd 4096 Apr 10 18:47 RtmpzRzyFb
    > drwx------ 3 edd 4096 Apr  6 15:21 RtmpQhvAUb
    > drwx------ 3 edd 4096 Apr  6 11:24 Rtmp2lFKPz
    > drwx------ 3 edd 4096 Apr  5 20:57 RtmprCeWUS
    > drwx------ 2 edd 4096 Apr  3 15:12 Rtmp8xviDl
    > drwx------ 3 edd 4096 Mar 30 16:50 Rtmp8w9n5h
    > drwx------ 3 edd 4096 Mar 28 11:33 RtmpjAg6iY
    > drwx------ 2 edd 4096 Mar 28 09:26 RtmpYHSgZG
    > drwx------ 2 edd 4096 Mar 27 11:21 Rtmp0gSV4e
    > drwx------ 2 edd 4096 Mar 27 11:21 RtmpOnneiY
    > drwx------ 2 edd 4096 Mar 27 11:17 RtmpIWeiTJ
    > drwx------ 3 edd 4096 Mar 22 08:51 RtmpJkVsSJ
    > drwx------ 3 edd 4096 Mar 21 10:33 Rtmp9a5KxL
    > /tmp$

    > Clearly still there after a month. I tend to have some longer-running R
    > sessions in either Emacs/ESS or RStudio.

    > So what I wrote in my last message here *clearly* applies to you: a local
    > issue for which you have to take local action as R cannot know.  You also
    > have a choice of setting variables to affect this.

Thank you Dirk (and Brian).  That is all true, and of course I
have known about this myself "forever" as well.
 
    > | If R expected the directory to be deleted at random, and if we expect
    > | users to call dir.create every time they access tempdir, then why did
    > | R create the directory for us at the beginning of the session? That's
    > | just setting people up to get weird bugs, which only appear in
    > | difficult-to-reproduce situations (i.e. after the session has been
    > | open for a long time).

    > I disagree. R has been doing this many years, possibly two decades.

Yes, R has been doing this for a long time, including all the
configuration options with environment variables, and yes this
is sufficient "in principle".
 
    > | I think before we dismiss this we should think about possible in-R
    > | solutions and why they are not feasible.

Here Mikko and Frederik do have a point I think.

    > | Are there any packages which
    > | would break if a call to 'tempdir' automatically recreated this
    > | directory? (Or would it be too much of a performance hit to have
    > | 'tempdir' check and even just issue a warning when the directory is
    > | found not to exist?)

    > | Should we have a timer which periodically updates
    > | the modification time of tempdir()? What do other long-running
    > | programs do (e.g. screen, emacs)?

Valid questions, in my view.  Before answering, let's try to see
how hard it would be to make the tempdir() function in R more versatile.

As I've found it is not at all hard to add an option which
checks the existence and if the directory is no longer "valid",
tries to recreate it (and if it fails doing that it calls the
famous R_Suicide(), as it does when R starts up and tempdir()
cannot be initialized correctly).

The proposed entry in NEWS is

   • tempdir(check=TRUE) recreates the tempdir() if it is no longer valid.

and of course the default would be status quo, i.e.,  check = FALSE,
and once this is in R-devel, we (those who install R-devel) can
experiment with it.

Martin
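
For sessions running an R version without the proposed check argument, the same "check, then recreate" idea can be approximated at user level. The following is only a minimal sketch: ensure_tempdir() is a hypothetical helper name, not part of base R, and unlike the proposal above it does not try to pick a new location when recreation fails.

```r
# Hypothetical helper (not part of base R): make sure the per-session
# temporary directory exists, recreating it when it has been removed.
ensure_tempdir <- function() {
  td <- tempdir()
  if (!dir.exists(td)) {
    # recursive = TRUE also recreates missing parents such as /tmp/user/<uid>
    ok <- dir.create(td, recursive = TRUE, mode = "0700")
    if (!ok) stop("could not recreate tempdir(): ", td)
  }
  invisible(td)
}

# Usage: call it before code that writes temporary files.
ensure_tempdir()
f <- tempfile("foo")
writeLines("hello", f)
```

Any files that were in the directory before it was deleted are of course not restored by this.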

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

Jeroen Ooms
On Tue, Apr 25, 2017 at 1:00 PM, Martin Maechler
<[hidden email]> wrote:
> As I've found it is not at all hard to add an option which
> checks the existence and if the directory is no longer "valid",
> tries to recreate it (and if it fails doing that it calls the
> famous R_Suicide(), as it does when R starts up and tempdir()
> cannot be initialized correctly).

Perhaps this can also fix the problem with mcparallel deleting the
tempdir() when one of its children dies:

  file.exists(tempdir()) #TRUE
  parallel::mcparallel(q('no'))
  file.exists(tempdir()) # FALSE

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

Cook, Malcolm
In reply to this post by Martin Maechler
Chiming in late on this thread...

>     > | Are there any packages which
 >     > | would break if a call to 'tempdir' automatically recreated this
 >     > | directory? (Or would it be too much of a performance hit to have
 >     > | 'tempdir' check and even just issue a warning when the directory is
 >     > | found not to exist?)
 >
 >     > | Should we have a timer which periodically updates
 >     > | the modification time of tempdir()? What do other long-running
 >     > | programs do (e.g. screen, emacs)?
 >
 > Valid questions, in my view.  Before answering, let's try to see
 > how hard it would be to make the tempdir() function in R more versatile.

Might this combination serve the purpose:
        * the R session keeps an open handle on the tempdir it creates,
        * whatever tempdir-harvesting cron job the user has is made sensitive enough not to delete open files (including open directories)


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

Martin Maechler
In reply to this post by Jeroen Ooms
>>>>> Jeroen Ooms <[hidden email]>
>>>>>     on Tue, 25 Apr 2017 15:05:51 +0200 writes:

    > On Tue, Apr 25, 2017 at 1:00 PM, Martin Maechler
    > <[hidden email]> wrote:
    >> As I've found it is not at all hard to add an option
    >> which checks the existence and if the directory is no
    >> longer "valid", tries to recreate it (and if it fails
    >> doing that it calls the famous R_Suicide(), as it does
    >> when R starts up and tempdir() cannot be initialized
    >> correctly).

    > Perhaps this can also fix the problem with mcparallel
    > deleting the tempdir() when one of its children dies:

   >   file.exists(tempdir()) #TRUE
   >   parallel::mcparallel(q('no'))
   >   file.exists(tempdir()) # FALSE

Thank you, Jeroen, for the extra example.

I have now committed the new feature... (completely backward
compatible: in R's code tempdir() is not yet called with an
argument and the default is  check = FALSE ),
actually in a "suicide-free" way ...  which needed only slightly
more code.

In the worst case, one could save the R session by
   Sys.setenv(TMPDIR = "<something writable>")
if for instance /tmp/ suddenly became unwritable for the user.

What we could consider is making the default of 'check' settable
by an option, and experiment with setting the option to TRUE, so
all such problems would be auto-solved (says the incurable optimist ...).

Martin
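
The idea of making the default of 'check' settable by an option can be tried out with a small wrapper. This is a sketch under assumptions: the option name "tempdir.check" and the function name my_tempdir() are invented for illustration, and R itself does not consult such an option.

```r
# Hypothetical wrapper: the option name "tempdir.check" is invented for
# illustration and is not consulted by R itself.
my_tempdir <- function(check = getOption("tempdir.check", FALSE)) {
  if (isTRUE(check) && "check" %in% names(formals(tempdir))) {
    # Only reached in R versions whose tempdir() has a 'check' argument
    tempdir(check = TRUE)
  } else {
    tempdir()
  }
}

# Usage: switch checking on for the whole session, then call the wrapper.
options(tempdir.check = TRUE)
td <- my_tempdir()
```

Looking at formals(tempdir) keeps the wrapper usable on R versions that predate the argument, where it simply falls back to the unchecked call.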

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

Gabriel Becker
Martin,

Thanks for your work on this.

One thing that seems to be missing from the conversation is that recreating
the temp directory will prevent future failures when R wants to write a
temp file, but the files will, of course, not be there. Any code written
assuming the contract is that the temporary directory, and thus temporary
files, will not be cleaned up before the R process exits (which was my
naive assumption before this thread, and is the behavior AFAICT on all the
systems I regularly use) will still break.

I'm not saying that's necessarily fixable (though the R keeping a permanent
pointer to a file in the dir suggested by Malcom might? fix it.), but I
would argue if it IS fixable, a fix that includes that would be preferable.

Best,
~G
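
One way for code to notice that the contract described above has been broken, even after the directory has been recreated, is a sentinel file. The sketch below is an illustration only: the file name ".session-sentinel" and the function tempdir_was_cleaned() are hypothetical, and a cleaner that removes individual old files could delete the sentinel while leaving newer files alone.

```r
# Hypothetical sentinel: created once, early in the session.
tempdir_sentinel <- file.path(tempdir(), ".session-sentinel")
file.create(tempdir_sentinel)

# TRUE if the sentinel has vanished, i.e. something has cleaned up
# (at least part of) the session's temporary directory since startup.
tempdir_was_cleaned <- function() {
  !file.exists(tempdir_sentinel)
}

# Usage: check before relying on temporary files written earlier.
if (tempdir_was_cleaned()) {
  warning("temporary files from earlier in this session may be gone")
}
```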

--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

Cook, Malcolm
> Martin,
 >
 > Thanks for your work on this.
 >
 > One thing that seems to be missing from the conversation is that recreating
 > the temp directory will prevent future failures when R wants to write a
 > temp file, but the files will, of course, not be there. Any code written
 > assuming the contract is that the temporary directory, and thus temporary
 > files, will not be cleaned up before the R process exits (which was my
 > naive assumption before this thread, and is the behavior AFAICT on all the
 > systems I regularly use) will still break.
 >

That is the kind of scenario I was hoping to obviate with my suggestion...

 > I'm not saying that's necessarily fixable (though the R keeping a permanent
 > pointer to a file in the dir suggested by Malcom might? fix it.),

(and, FWIW, that's "Malcolm" with two "l"s.  I think all those missing "l"s are flattened out versions of all the extra close parens I typed in the 80s that somehow got lost on the nets...)))

> but I
 > would argue if it IS fixable, a fix that includes that would be preferable.

Agreed!


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

frederik
In reply to this post by Cook, Malcolm
On Tue, Apr 25, 2017 at 02:41:58PM +0000, Cook, Malcolm wrote:
> Might this combination serve the purpose:
> * R session keeps an open handle on the tempdir it creates,
> * whatever tempdir harvesting cron job the user has be made sensitive enough not to delete open files (including open directories)

Good suggestion but doesn't work with the (increasingly popular)
"Systemd":

    $ mkdir /tmp/somedir
    $ touch -d "12 days ago" /tmp/somedir/
    $ cd /tmp/somedir/
    $ sudo systemd-tmpfiles --clean
    $ ls /tmp/somedir/
    ls: cannot access '/tmp/somedir/': No such file or directory

I would advocate just changing 'tempfile()' so that it recreates the
directory where the file is (the "dirname") before returning the file
path. This would have fixed the issue I ran into. Changing 'tempdir()'
to recreate the directory is another option.

Thanks,

Frederick
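
The behaviour suggested for 'tempfile()' can be sketched at user level as follows. safe_tempfile() is a hypothetical name, not an existing function; it only wraps the real tempfile() and creates the "dirname" of the returned path if it is missing.

```r
# Hypothetical wrapper around tempfile(); safe_tempfile() is not an
# existing R function.
safe_tempfile <- function(pattern = "file", tmpdir = tempdir(), fileext = "") {
  path <- tempfile(pattern = pattern, tmpdir = tmpdir, fileext = fileext)
  d <- dirname(path)
  if (!dir.exists(d)) {
    # Recreate the directory the file is supposed to live in
    dir.create(d, recursive = TRUE, mode = "0700")
  }
  path
}

# Usage: the returned path can be written to even if the directory
# had been removed in the meantime.
f <- safe_tempfile("foo")
writeLines("hello", f)
```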

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

frederik
In reply to this post by Gabriel Becker
Hi Gabriel,

Thanks for asking for a better solution, as far as actually preventing
temporary files from getting deleted in the first place.

I still don't know very much about other peoples' distributions, but
Arch uses Systemd which is the culprit on my system. Systemd's
'tmpfiles.d(5)' man page says we can create configuration files in
locations like

    /usr/lib/tmpfiles.d/*.conf
    /etc/tmpfiles.d/*.conf

which control when temporary files are deleted. There is an 'x'
specifier which accepts glob paths and can protect everything in
/tmp/Rtmp* ...:

    $ mkdir /tmp/Rtmpaoeu
    $ touch -d "12 days ago" /tmp/Rtmpaoeu
    $ sudo systemd-tmpfiles --clean
    $ ls /tmp/Rtmpaoeu
    ls: cannot access '/tmp/Rtmpaoeu': No such file or directory

    $ sudo sh -c "echo 'x /tmp/Rtmp*' > /etc/tmpfiles.d/R.conf"
    $ mkdir /tmp/Rtmpaoeu
    $ touch -d "12 days ago" /tmp/Rtmpaoeu
    $ sudo systemd-tmpfiles --clean
    $ ls /tmp/Rtmpaoeu
    (still there)

I guess installing such a file is something that would be done by the
various distribution-specific R packages. Even though I run R from a
home-directory compiled version, I have my distribution's binary
package installed globally, and so I would get the benefit of this
protection from the distribution package. If this sounds like it makes
sense then I can ask the Arch package maintainer to do it. Of course I
don't need permission but it would be good to hear if I'm missing or
forgetting something.

Based on what other packages are doing the file should probably be
named:

    /usr/lib/tmpfiles.d/R.conf

and contain:

    x /tmp/Rtmp*

(For example on my system I have stuff like this owned by various
packages:

    $ pacman -Qo /usr/lib/tmpfiles.d/*
    /usr/lib/tmpfiles.d/apache.conf is owned by apache 2.4.25-1
    /usr/lib/tmpfiles.d/bind.conf is owned by bind 9.11.0.P3-3
    /usr/lib/tmpfiles.d/colord.conf is owned by colord 1.3.4-1
    /usr/lib/tmpfiles.d/etc.conf is owned by systemd 232-8
    /usr/lib/tmpfiles.d/gvfsd-fuse-tmpfiles.conf is owned by gvfs 1.30.3-1
    ...

)

Thanks!

Frederick


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

Martin Maechler
In reply to this post by frederik
>>>>>   <[hidden email]>
>>>>>     on Tue, 25 Apr 2017 21:13:59 -0700 writes:

    > On Tue, Apr 25, 2017 at 02:41:58PM +0000, Cook, Malcolm wrote:
    >> Might this combination serve the purpose:
    >> * R session keeps an open handle on the tempdir it creates,
    >> * whatever tempdir harvesting cron job the user has be made sensitive enough not to delete open files (including open directories)

I also agree that the above would be ideal - if possible.

    > Good suggestion but doesn't work with the (increasingly popular)
    > "Systemd":

    > $ mkdir /tmp/somedir
    > $ touch -d "12 days ago" /tmp/somedir/
    > $ cd /tmp/somedir/
    > $ sudo systemd-tmpfiles --clean
    > $ ls /tmp/somedir/
    > ls: cannot access '/tmp/somedir/': No such file or directory

Something like your example is what I'd expect to always be a
possibility on some platforms, all of course depending on low-level
things such as  root/sysadmin/...  "permission" to clean up etc.

Jeroen mentioned the fact that tempdir()s can also disappear
for other reasons {his was multicore child processes
.. buggily(?) implemented}.
Further reasons may be race conditions / user code bugs / user
errors, etc.
Note that the R process which created the tempdir on startup
always has the permission to remove it again.  But you can also
think of a full file system, etc.

Current  R-devel's    tempdir(check = TRUE)   would create a new
one or give an error (and then the user should be able to use
    Sys.setenv(TMPDIR = ...)
    to point to a directory she has write permission for)

Gabe's point of course is important too: If you have a long
running process that uses a tempfile,
and if  "big brother"  has removed the full tempdir() you will
be "unhappy" in any case.
Trying to prevent big brother from doing that in all cases seems
"not easy" in any case.

I did want to provide an easy solution to the OP situation:
Suddenly tempdir() is gone, and quite a few things stop working
in the current R process {he mentioned  help(), e.g.}.
With the new   tempdir(check=TRUE)  facility, code could be changed
to replace

   tempfile("foo")

either by
   tempfile("foo", tmpdir=tempdir(check=TRUE))

or by something like

   tryCatch(tempfile("foo"),
             error = function(e)
                tempfile("foo", tmpdir=tempdir(check=TRUE)))

or be even more sophisticated.

We could also consider allowing   check =  TRUE | NA | FALSE

and make  NA  the default and have that correspond to
check =TRUE  but additionally do the equivalent of
   warning("tempdir() has become invalid and been recreated")
in case the tempdir() had been invalid.

    > I would advocate just changing 'tempfile()' so that it recreates the
    > directory where the file is (the "dirname") before returning the file
    > path. This would have fixed the issue I ran into. Changing 'tempdir()'
    > to recreate the directory is another option.

In the end I had decided that

      tempfile("foo", tmpdir = tempdir(check = TRUE))

is actually better self-documenting than

      tempfile("foo", checkDir = TRUE)

which was my first inclination.

Note again that currently, the checking is _off_ by default.
I've just provided a tool -- which was relatively easy and
platform independent! --- to do more (real and thought)
experiments.

Martin
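
The three-valued  check = TRUE | NA | FALSE  idea can be tried out without changing R. The following is a sketch only: tempdir_checked() is a hypothetical user-level function, and it recreates the directory with dir.create() rather than through whatever mechanism R-devel uses internally.

```r
# Hypothetical user-level emulation of a three-valued 'check' argument.
#   FALSE: return tempdir() without looking at the file system
#   TRUE : recreate the directory silently if it is missing
#   NA   : recreate it and additionally warn
tempdir_checked <- function(check = NA) {
  td <- tempdir()
  if (identical(check, FALSE) || dir.exists(td)) {
    return(td)
  }
  if (!dir.create(td, recursive = TRUE, mode = "0700")) {
    stop("tempdir() is missing and could not be recreated: ", td)
  }
  if (is.na(check)) {
    warning("tempdir() has become invalid and been recreated")
  }
  td
}

# Usage, mirroring the tempfile() calls above:
f <- tempfile("foo", tmpdir = tempdir_checked())
```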

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: tempdir() may be deleted during long-running R session

Duncan Murdoch-2
On 26/04/2017 4:21 AM, Martin Maechler wrote:

> Note again that currently, the checking is _off_ by default.
> I've just provided a tool -- which was relatively easy and
> platform independent! --- to do more (real and thought)
> experiments.

This seems like the wrong approach.  The problem occurs as soon as the
tempdir() gets cleaned up:  there could be information in temp files
that gets lost at that point.  So the solution should be to prevent the
cleanup, not to continue on after it has occurred (as "check = TRUE"
does).  This follows the principle that it's better for the process to
always die than to sometimes silently produce incorrect results.

Frederick posted the way to do this in systems using systemd.  We should
be putting that in place, or the equivalent on systems using other
tempfile cleanups.  This looks to me like something that "make install"
should do, or perhaps it should be done by people putting together
packages for specific systems.

Duncan Murdoch
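
The "better to die than to silently produce incorrect results" principle can also be applied in user code that depends on earlier temporary files. A minimal sketch, assuming a hypothetical helper named require_tempdir(); it does not prevent the cleanup, it only refuses to continue after one.

```r
# Hypothetical guard: stop with an error instead of carrying on when
# the session's temporary directory has disappeared.
require_tempdir <- function() {
  td <- tempdir()
  if (!dir.exists(td)) {
    stop("tempdir() '", td, "' no longer exists; ",
         "temporary files may have been lost", call. = FALSE)
  }
  invisible(td)
}

# Usage: call it before reading back files written earlier in the session.
td <- require_tempdir()
```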

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel