checkpointing

Ross Boylan
I would like to checkpoint some of my calculations in R, specifically
those using optim.  As far as I can tell, R doesn't have this facility,
and there seems to have been little discussion of it.

Checkpointing is saving enough of the current state so that work can
resume where things were left off if, to take my own example, the system
crashes after 8 days of calculation.

My thought is that this could be added as an option to optim as one of
the control parameters.

I thought I'd check here to see if anyone is aware of any work in this
area or has any thoughts about how to proceed.  In particular, is save() a
reasonable way to write a few variables to disk?  I could also make the
code available when/if I get it working.
--
Ross Boylan                                      wk:  (415) 514-8146
185 Berry St #5700                               [hidden email]
Dept of Epidemiology and Biostatistics           fax: (415) 514-8150
University of California, San Francisco
San Francisco, CA 94107-1739                     hm:  (415) 550-1062

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: checkpointing

Prof Brian Ripley
I use save.image() or save(), which seem exactly what you are asking for.
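A minimal sketch of the save()/load() round trip for a handful of variables (save.image() does the same for the whole workspace). The variable names and the temporary file are illustrative assumptions, not part of the original message:

```r
# Illustrative state worth checkpointing: current parameters and an iteration count
params <- c(a = 1.5, b = -0.3)
iter <- 42L

ckpt <- tempfile(fileext = ".RData")
save(params, iter, file = ckpt)   # writes only the named objects, in binary form

rm(params, iter)                  # simulate losing the session
restored <- new.env()
load(ckpt, envir = restored)      # recreates 'params' and 'iter' inside 'restored'
restored$params
restored$iter
```

Loading into a separate environment, rather than the workspace, avoids silently overwriting objects of the same name on restart.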

On Mon, 2 Jan 2006, Ross Boylan wrote:

> I would like to checkpoint some of my calculations in R, specifically
> those using optim.  As far as I can tell, R doesn't have this facility,
> and there seems to have been little discussion of it.
>
> checkpointing is saving enough of the current state so that work can
> resume where things were left off if, to take my own example, the system
> crashes after 8 days of calculation.
>
> My thought is that this could be added as an option to optim as one of
> the control parameters.
>
> I thought I'd check here to see if anyone is aware of any work in this
> area or has any thoughts about how to proceed.  In particular, is save a
> reasonable way to save a few variables to disk?  I could also make the
> code available when/if I get it working.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: checkpointing

Kasper Daniel Hansen
On Jan 3, 2006, at 9:36 AM, Brian D Ripley wrote:

> I use save.image() or save(), which seem exactly what you are  
> asking for.

I have the (perhaps unsupported) impression that Ross wanted to save  
the progress during the optim run. Since it spends most of its time  
in the .Internal(optim(***)) call, save/save.image would not work.

/Kasper

Re: checkpointing

Gabor Grothendieck
One possibility for overcoming this problem might be to divide the
variables being optimized over into two sets, use a grid over one
set (which should probably consist of only one or two variables), and then,
with the gridded variables fixed, use optim over the rest.  In many problems
it's really just one or two variables that cause all the problems, and if that
were the case, each of the many runs of optim would be fast
and one could save its state upon completion.

Of course it would be even more convenient if there were some
built-in facility, as the poster proposed, but this might work depending
on the particulars of the problem.
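A minimal sketch of the grid-plus-optim idea, assuming a made-up three-parameter objective in which the first parameter is the awkward one. Each grid point gets its own optim() run, and the collected results are written with saveRDS() after every completed run, so a restart only repeats the unfinished grid points. The objective, grid, and file name are illustrative:

```r
# Hypothetical objective: theta[1] is gridded over;
# theta[2:3] are handed to optim() with theta[1] held fixed.
fn <- function(theta) {
  (theta[1] - 2)^2 + sum((theta[2:3] - c(1, -1))^2) + (theta[1] * theta[2])^2
}

grid <- seq(0, 4, by = 0.5)
ckpt <- file.path(tempdir(), "grid_ckpt.rds")

# On a restart, reload the runs that already finished
results <- if (file.exists(ckpt)) readRDS(ckpt) else list()

for (g in grid) {
  key <- format(g)
  if (!is.null(results[[key]])) next     # this grid point finished before the crash
  fit <- optim(c(0, 0), function(rest) fn(c(g, rest)), method = "BFGS")
  results[[key]] <- list(par = c(g, fit$par), value = fit$value)
  saveRDS(results, ckpt)                 # checkpoint after each completed run
}

values <- vapply(results, function(r) r$value, numeric(1))
best <- results[[which.min(values)]]
best$par
```

With two gridded variables the loop would run over the rows of expand.grid() instead; a coarse grid can be refined around the best point afterwards.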

On 1/3/06, Kasper Daniel Hansen <[hidden email]> wrote:

> On Jan 3, 2006, at 9:36 AM, Brian D Ripley wrote:
>
> > I use save.image() or save(), which seem exactly what you are
> > asking for.
>
> I have the (perhaps unsupported) impression that Ross wanted to save
> the progress during the optim run. Since it spends most of its time
> in the .Internal(optim(***)) call, save/save.image would not work.
>
> /Kasper

Re: checkpointing

Prof Brian Ripley
In reply to this post by Kasper Daniel Hansen
On Tue, 3 Jan 2006, Kasper Daniel Hansen wrote:

> On Jan 3, 2006, at 9:36 AM, Brian D Ripley wrote:
>
>> I use save.image() or save(), which seem exactly what you are asking for.
>
> I have the (perhaps unsupported) impression that Ross wanted to save the
> progress during the optim run. Since it spends most of its time in the
> .Internal(optim(***)) call, save/save.image would not work.

It certainly does not!  It is most likely spending time in the callbacks
to evaluate the function/gradient.  We have used save() to save the
current information (e.g. current parameter values) from inside optim so a
restart could be done, but then I have only once encountered someone
running a single optimization for over a week: there normally are ways to
speed things up.
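One way to save() from inside optim(), sketched under assumptions: the objective (the Rosenbrock function from the optim() help page), the file name, and the choice to save only when a new best point is found are all illustrative. Saving the best point rather than the latest one avoids restarting from a poor trial point of a line search:

```r
# Rosenbrock function, as in the ?optim examples; 'ckpt_file' is illustrative
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
ckpt_file <- file.path(tempdir(), "optim_ckpt.RData")

best <- new.env()
best$val <- Inf

# Wrapper: evaluate fr() and, whenever a new best point is found,
# overwrite the checkpoint file with that point and its value
fr_ckpt <- function(x) {
  val <- fr(x)
  if (val < best$val) {
    best$par <- x
    best$val <- val
    save(list = c("par", "val"), envir = best, file = ckpt_file)
  }
  val
}

start <- c(-1.2, 1)
if (file.exists(ckpt_file)) {   # restart: resume from the saved best point
  saved <- new.env()
  load(ckpt_file, envir = saved)
  start <- saved$par
}
fit <- optim(start, fr_ckpt, method = "BFGS")
```

optim() also calls the objective for finite-difference gradients, so with a cheap objective the disk writes would dominate; for an objective that takes minutes per call the cost is negligible.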

Re: checkpointing

Roger D. Peng
In reply to this post by Ross Boylan
One possibility is to build some checkpointing into your objective function,
such as saving the current parameter values via 'save()' or 'dput()'.
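The dput() variant writes a plain-text R expression instead of a binary file, so the checkpoint can be inspected or edited by hand and read back with dget(). A minimal sketch with illustrative names; note that dput() writes decimal text, so by default a double may not be reproduced bit for bit, whereas save() stores the binary values:

```r
params <- c(alpha = 0.25, beta = 3)
ckpt <- tempfile(fileext = ".R")

dput(params, file = ckpt)   # writes a text R expression that recreates 'params'
readLines(ckpt)             # human-readable, unlike the files written by save()
restored <- dget(ckpt)      # parses and evaluates the saved expression
```

Inside an objective function the dput() call would simply be made with the current parameter vector on each call, and dget() would supply the starting values on restart.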

-roger

--
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/

Re: checkpointing

barry rowlingson
Roger D. Peng wrote:
> One possibility is to write in some checkpointing into your objective function,
> such as saving the current parameter values via 'save()' or 'dput()'.

  Has anyone successfully checkpointed and restarted R using any of the
Linux process checkpointing solutions I find when I google for 'linux
process checkpointing'? I can't see why you'd bother implementing
checkpointing within optim() if you can do it at the process level and
hence in the middle of anything.

  Unless you're running Windows.

An example and some links here:

  http://www.cise.ufl.edu/~mfoster/research/uclik/uclik.htm

Barry

Re: checkpointing

Kasper Daniel Hansen
In reply to this post by Prof Brian Ripley

On Jan 3, 2006, at 2:26 PM, Prof Brian Ripley wrote:

> On Tue, 3 Jan 2006, Kasper Daniel Hansen wrote:
>
>> On Jan 3, 2006, at 9:36 AM, Brian D Ripley wrote:
>>
>>> I use save.image() or save(), which seem exactly what you are  
>>> asking for.
>>
>> I have the (perhaps unsupported) impression that Ross wanted to  
>> save the progress during the optim run. Since it spends most of  
>> its time in the .Internal(optim(***)) call, save/save.image would  
>> not work.
>
> It certainly does not!  It is most likely spending time in the  
> callbacks to evaluate the function/gradient.  We have used save()  
> to save the current information (e.g. current parameter values)  
> from inside optim so a restart could be done, but then I have only  
> once encountered someone running a single optimization for over a  
> week: there normally are ways to speed things up.

I stand corrected. Actually I should have thought of this.
/Kasper


Re: checkpointing

Ross Boylan
In reply to this post by Prof Brian Ripley
On Tue, Jan 03, 2006 at 01:26:39PM +0000, Prof Brian Ripley wrote:

> On Tue, 3 Jan 2006, Kasper Daniel Hansen wrote:
>
> >On Jan 3, 2006, at 9:36 AM, Brian D Ripley wrote:
> >
> >>I use save.image() or save(), which seem exactly what you are asking for.
> >
> >I have the (perhaps unsupported) impression that Ross wanted to save the
> >progress during the optim run. Since it spends most of its time in the
> >.Internal(optim(***)) call, save/save.image would not work.
>
> It certainly does not!  
I'm having trouble following: does that sentence mean the preceding
one is wrong, or that save won't work?

> It is most likely spending time in the callbacks
> to evaluate the function/gradient.  
Yes.
> We have used save() to save the
> current information (e.g. current parameter values) from inside optim so a
> restart could be done,
Did you do this by
* using an existing feature of optim I don't know about;
* modifying the code for optim; or
* writing an objective function that saved the parameters with which
  it was called (which, now that I think of it, might be the simplest
  approach)?

My guess was that optim keeps its state in local variables that would
not be captured by a save.image.  Are you saying the relevant
variables are saved and can be fished out if needed?

It would also probably save some time if the estimated matrix of 2nd
derivatives were saved too (I supply only the objective function, not
derivatives), but that's minor compared to having the parameter
values.

> but then I have only once encountered someone
> running a single optimization for over a week: there normally are ways to
> speed things up.

I certainly hope so.  However, the problem size is likely to remain
large.

In answer to the other question about using OS checkpointing
facilities, I haven't tried them, since the application will be running
on a cluster.  More precisely, the optimization will be driven from a
single machine, but the calculation of the objective function will be
distributed.  So checkpointing at the level of the optimization
function is a good fit for my needs.  There are some cluster OSes that
provide a kind of unified process space across the processors (Scyld,
MOSIX), but we're not using them, and checkpointing them is an unsolved
problem.  At least, it was unsolved a couple of years ago when I
looked into it.

Ross

Re: checkpointing

Ross Boylan
In reply to this post by Gabor Grothendieck
Here's some code I put together for checkpointing a function being
optimized. Hooking directly into optim would require modifying its C
code, so this seemed the easiest route.  I've wanted more information on
the iterations than is currently provided, so this also stores a record
of each call back in the calling environment (by default).

# wrapper to do checkpointing

# Ross Boylan [hidden email]
# 06-Jan-2006
# (C) 2006 Regents of University of California
# Distributed under the GNU General Public License v2 or later at your option

# If you want to checkpoint the optimization of a function f,
# use checkpoint(f) instead.  See below for other possible arguments.

# Default operation for checkpoint(fnfoo) is to record the iterations
# in fnfoo.trace in the calling environment.

# WARNING: Any existing variable whose name is given by the argument
# 'name' will be deleted from the indicated frame.
checkpoint <- function(f,
                       name = paste(deparse(substitute(f)), ".trace", sep=""),
                       fileName = deparse(substitute(f)),
                       nCalls = 1,
                       nTime = 60*15,
                       frame = parent.frame()) {
  # f is the objective function
  # frame is where to put the variable name
  # name will be a data.frame with one row per call (rows are numbered
  #   by call) and columns time, val, then one per parameter (X1, X2, ...)
  # fileName is the stem of the name to save for checkpointing;
  #   saving will alternate between files with 0 and 1 appended
  # Saving to disk will happen every nCalls calls or nTime seconds,
  #   whichever comes first
  # Evaluate all arguments once, up front, rather than lazily in the closure
  force(f); force(name); force(fileName); force(nCalls); force(nTime); force(frame)
  # inherits=FALSE so only a variable in frame itself is removed
  if (exists(name, envir=frame, inherits=FALSE))
    rm(list=name, envir=frame)
  ckpt.lastSave <- 0           # alternate 0/1 for file to write to
  ckpt.lastTime <- Sys.time()  # last time saved
  function(params, ...) {
    p <- as.list(params)
    names(p) <- seq_along(params)   # become columns X1, X2, ...
    newRow <- data.frame(time=Sys.time(), val=NA_real_, p)
    if (exists(name, envir=frame, inherits=FALSE)) {
      progress <- rbind(get(name, envir=frame, inherits=FALSE), newRow)
    } else
      progress <- newRow
    n <- nrow(progress)
    # write to disk; measure elapsed time explicitly in seconds, because
    # subtracting two times gives a difftime in automatically chosen units
    elapsed <- as.numeric(difftime(progress$time[n], ckpt.lastTime,
                                   units="secs"))
    if (n %% nCalls == 0 || elapsed > nTime) {
      ckpt.lastSave <<- (ckpt.lastSave + 1) %% 2
      save(progress, file=paste(fileName, ckpt.lastSave, sep=""))
      ckpt.lastTime <<- progress$time[n]
    }
    # the parameters are saved before f is evaluated, so the record
    # survives even if f itself crashes; the value is filled in afterwards
    v <- f(params, ...)
    progress$val[n] <- v
    assign(name, progress, envir=frame)
    v
  }
}
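After a crash, the two alternating files written by checkpoint() hold the latest snapshot of the trace. The helper below, resume_params(), is a hypothetical addition rather than part of the code above: it assumes the file layout that checkpoint() produces (a data.frame named progress with columns time, val, then one column per parameter), loads whichever file has more rows, and returns the best evaluated parameter vector as a starting point:

```r
# Hypothetical helper: recover restart values from the files paste0(stem, 0)
# and paste0(stem, 1) written by checkpoint(); each holds a data.frame
# named 'progress' with columns time, val, then one column per parameter.
resume_params <- function(stem) {
  files <- paste0(stem, 0:1)
  files <- files[file.exists(files)]
  if (length(files) == 0) stop("no checkpoint files found for stem ", stem)
  # Load each snapshot; within one run the trace only grows,
  # so the snapshot with more rows is the later one
  snaps <- lapply(files, function(fl) {
    env <- new.env()
    load(fl, envir = env)
    env$progress
  })
  progress <- snaps[[which.max(vapply(snaps, nrow, integer(1)))]]
  # Rows with a value have been evaluated; take the best of them (optim
  # minimizes by default), or the last point tried if none has a value yet
  done <- progress[!is.na(progress$val), , drop = FALSE]
  row <- if (nrow(done) > 0) done[which.min(done$val), ] else progress[nrow(progress), ]
  unlist(row[-(1:2)], use.names = FALSE)   # drop the time and val columns
}
```

For example, optim(resume_params("fnfoo"), checkpoint(fnfoo, fileName = "fnfoo.restart")) would restart from the best point recorded, writing new snapshots under a different stem so the old ones are not overwritten. This restarts rather than truly resumes the optimization: the iteration count and any quasi-Newton Hessian approximation built up inside optim() are lost.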
