withTimeout bug, it does not work properly with nlme anymore

withTimeout bug, it does not work properly with nlme anymore

Ramiro Barrantes-2
Hello,

I was relying on withTimeout (from R.utils) to help me stop nlme when it "hangs".  However, recently this stopped working.  I am pasting a reproducible example below: withTimeout should stop nlme after 10 seconds, but the code generates data for which nlme does not converge (or takes too long), and withTimeout does not stop it.  I tried this on both Linux (64 bit, CentOS 7, R 3.4.1, nlme 3.1-131, R.utils 2.6, and also with R 3.2.5) and Mac (Sierra 10.13.1, R 3.4.2, same versions of nlme and R.utils).  It takes over R and I need to use brute force to stop it.  As mentioned, this used to work, and it is very helpful when running a loop in which nlme goes through many models.

Thank you in advance for any help,
Ramiro

library(nlme)
library(R.utils)

dat<-data.frame(x=c(3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3,3,3,3,3,3,3,3,3,3,3,3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86),
y=c(0.35,0.69,0.57,1.48,6.08,-0.34,0.53,1.66,0.02,4.4,8.42,3.3,2.32,-2.3,7.52,-2.12,3.41,-4.76,7.9,5.04,10.26,-1.42,7.85,-1.88,3.81,-2.59,4.32,5.7,1.18, -1.74,1.81,6.16,4.2,-0.39,1.55,-1.4,1.76,-4.14,-2.36,-0.24,4.8,-7.07,1.34,1.98,0.86,-3.96,-0.61,2.68,-1.65,-2.06,3.67,-0.19,2.33,3.78,2.16,0.35, -5.6,1.32,2.99,4.21,-0.9,4.32,-4.01,2.03,0.9,-0.74,-5.78,5.76,0.52,1.37,-0.9,-4.06,-0.49,-2.39,-2.67,-0.71,-0.4,2.55,0.97,1.96,8.13,-5.93,4.01,0.79, -5.61,0.29,4.92,-2.89,-3.24,-3.06,-0.23,0.71,0.75,4.6,1.35, -3.35),
f.block=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
id=c("a2","a2","a2","a2","a1","a1","a1","a1","a3","a3","a3","a3","a2","a2","a2","a2","a1","a1","a1","a1","a3","a3","a3","a3","a2","a2","a2","a2","a1","a1","a1","a1","a3","a3","a3","a3","a2","a2","a2","a2","a1","a1","a1","a1","a3","a3","a3","a3","a2","a2","a2","a2","a1","a1","a1","a1","a3","a3","a3","a3","a2","a2","a2","a2","a1","a1","a1","a1","a3","a3","a3","a3","a2","a2","a2","a2","a1","a1","a1","a1","a3","a3","a3","a3","a2","a2","a2","a2","a1","a1","a1","a1","a3","a3","a3","a3"))

fpl.B.range <- function(lx,logbase,A,B,C,D) {
    A/(1+logbase^(-B*(lx-C)))+D
}
myFormula<-list(formula(A~id),formula(B~id),formula(C~id),formula(D~id))
INIT <- c(A.a1=1,A.a2=0,A.a3=0,B=1,B.a2=0,B.a3=0,C=0,C.a2=0,C.a3=0,D=1,D.a2=0,D.a3=0)


for (i in 1:100) {
    print(paste("Iteration ",i,"...this will stall soon"))
    set.seed(i)
    dat$y <- dat$y+rnorm(nrow(dat), mean = 0, sd = 0.1)
    try({withTimeout(nlme(model=y~fpl.B.range(x,exp(1),A,B,C,D),
                      control=nlmeControl(maxIter=50,pnlsMaxIter=7,msMaxIter=50,niterEM=25),
                          data=dat, na.action=na.omit,
                          fixed=myFormula,random=list(f.block=pdSymm(A+B+C+D~1)),
                          start=INIT),timeout=10)})
}




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: withTimeout bug, it does not work properly with nlme anymore

Martin Maechler
>>>>> Ramiro Barrantes <[hidden email]>
>>>>>     on Mon, 27 Nov 2017 21:02:52 +0000 writes:

    > Hello, I was relying on withTimeout (from R.utils) to help
    > me stop nlme when it "hangs".  However, recently this
    > stopped working.  I am pasting a reproducible example
    > below: withTimeout should stop nlme after 10 seconds but
    > the code will generate data for which nlme does not
    > converge (or takes too long) and withTimeout does not stop
    > it.  I tried this both on a linux (64 bit, CentOS 7, R
    > 3.4.1, nlme 3.1-131 R.util 2.6, and also with R 3.2.5) and
    > mac (Sierra 10.13.1, R 3.4.2, same versions or nlme and
    > R.utils).  It takes over R and I need to use brute-force
    > to stop it.  As mentioned, this used to work and it is
    > very helpful for the purposes of having a loop where nlme
    > goes through many models.

    > Thank you in advance for any help, Ramiro

Dear Ramiro,

as I thought you were reporting a bug in R.utils' withTimeout(),
I and maybe others did not react.

You've addressed this again in a non-public e-mail,
and indeed the underlying bug is really in nlme, which you do
mention implicitly.

I'm appending a version of your example that does not use R.utils
at all and reproducibly hangs for me with R 3.4.3, R 3.4.3
patched and R-devel (and almost surely with earlier versions of R,
which I did not check).

Indeed, the call to nlme() "stalls" / "hangs" / "freezes" R
and cannot be terminated in a regular way; like you, I need
"brute force" to stop it, killing the R process
too.
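
One reason a timeout wrapper cannot help here: R.utils' withTimeout() is built on setTimeLimit(), and R checks time limits only at points where a user interrupt could occur, which compiled code reaches only where its author asked for it. A hang inside .C() code is therefore out of its reach. A possible workaround, sketched below under assumptions, is to run each fit in a forked child process and kill the child when it overruns. The helper name run_with_hard_timeout and the "hard_timeout" class are hypothetical, the approach relies on parallel::mcparallel() and so works only on Unix-alikes (not Windows), and it is not something proposed in this thread.

```r
library(parallel)  # mcparallel()/mccollect(); forking works on Unix-alikes only

## Hypothetical helper: evaluate fun() in a forked child process and give up
## after 'timeout' seconds by killing the child.
run_with_hard_timeout <- function(fun, timeout) {
    job <- mcparallel(fun())
    ## wait = FALSE: wait at most 'timeout' seconds for a result
    res <- mccollect(job, wait = FALSE, timeout = timeout)
    if (is.null(res)) {
        ## no result in time: kill the child, even if it is stuck in compiled code
        tools::pskill(job$pid, tools::SIGKILL)
        mccollect(job, wait = FALSE)  # give 'parallel' a chance to clean up
        return(structure(list(), class = "hard_timeout"))
    }
    res[[1L]]  # value of fun(), or a "try-error" object if fun() failed
}

quick <- run_with_hard_timeout(function() sum(1:10), timeout = 5)
slow  <- run_with_hard_timeout(function() { Sys.sleep(60); "done" }, timeout = 1)
```

In a loop over many models, fun would wrap the nlme() call, and a result of class "hard_timeout" would mark that fit as abandoned. The fitted object has to be serialized back from the child, so this costs more than an in-process timeout.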

As the maintainer of 'nlme' *is* R-core,
we are asked to fix this, at least by making it interruptible.

Still, I should not take time for that during the next couple of
weeks, as I have several other day-job duties to fulfill,
and so will not promise anything here.

Tested (minimal) patches are welcome!

Here's a slightly simplified version of your script which
exhibits the problem and shows that it does not
happen in nlminb() -- which I wrongly assumed for a while --
but in nlme's call to its own .C() code.

I am looking into fixing this (making it interruptible // detecting
the infinite loop).
My guess is that it only happens in degenerate cases like here.

Martin Maechler
ETH Zurich



## From: Ramiro Barrantes <[hidden email]>
## To: "[hidden email]" <[hidden email]>
## Subject: [Rd] withTimeout bug, it does not work properly with nlme anymore
## Date: Mon, 27 Nov 2017 21:02:52 +0000

## Hello,

## I was relying on withTimeout (from R.utils) to help me stop nlme when it
## "hangs".  However, recently this stopped working.  I am pasting a
## reproducible example below: withTimeout should stop nlme after 10 seconds
## but the code will generate data for which nlme does not converge (or takes
## too long) and withTimeout does not stop it.  I tried this both on a linux
## (64 bit, CentOS 7, R 3.4.1, nlme 3.1-131 R.util 2.6, and also with R
## 3.2.5) and mac (Sierra 10.13.1, R 3.4.2, same versions or nlme and
## R.utils).  It takes over R and I need to use brute-force to stop it.  As
## mentioned, this used to work and it is very helpful for the purposes of
## having a loop where nlme goes through many models.

## Thank you in advance for any help,
## Ramiro

## (Modifications by Martin Maechler)
dat <- data.frame(
    x=c(3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3.69,3,3,3,3,3,3,3,3,3,3,3,3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,2.3,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,1.61,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.92,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-0.47,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86,-1.86),
    y=c(0.35,0.69,0.57,1.48,6.08,-0.34,0.53,1.66,0.02,4.4,8.42,3.3,2.32,-2.3,7.52,-2.12,3.41,-4.76,7.9,5.04,10.26,-1.42,7.85,-1.88,3.81,-2.59,4.32,5.7,1.18, -1.74,1.81,6.16,4.2,-0.39,1.55,-1.4,1.76,-4.14,-2.36,-0.24,4.8,-7.07,1.34,1.98,0.86,-3.96,-0.61,2.68,-1.65,-2.06,3.67,-0.19,2.33,3.78,2.16,0.35, -5.6,1.32,2.99,4.21,-0.9,4.32,-4.01,2.03,0.9,-0.74,-5.78,5.76,0.52,1.37,-0.9,-4.06,-0.49,-2.39,-2.67,-0.71,-0.4,2.55,0.97,1.96,8.13,-5.93,4.01,0.79, -5.61,0.29,4.92,-2.89,-3.24,-3.06,-0.23,0.71,0.75,4.6,1.35, -3.35),
    f.block = rep(1:4, 24),
    id= paste0("a", rep(c(2,1,3),each=4)))
str(dat)
## 'data.frame': 96 obs. of  4 variables:
##  $ x      : num  3.69 3.69 3.69 3.69 3.69 3.69 3.69 3.69 3.69 3.69 ...
##  $ y      : num  0.35 0.69 0.57 1.48 6.08 -0.34 0.53 1.66 0.02 4.4 ...
##  $ f.block: num  1 2 3 4 1 2 3 4 1 2 ...
##  $ id     : Factor w/ 3 levels "a1","a2","a3": 2 2 2 2 1 1 1 1 3 3 ...

table(dat$id) # 32 x 3 -- indeed the 2 factors are perfectly balanced:
xtabs(~id + f.block, data=dat)

## This is the version to directly trigger the bug
dd <- dat
set.seed(33)
dd$y <- dat$y + rnorm(nrow(dat), mean = 0, sd = 0.1)

library(nlme, lib = .Library) # <- get R's version not a newer one
cat("nlme version: ", format(packageVersion("nlme")), "\n")
## MM: Barrantes used 'logbase' and 'logbase^(..)' -- I just use exp(..):
fpl.B.range <- function(lx,A,B,C,D) {
    A/(1+exp(-B*(lx-C))) + D
}

INIT <- c(A.a1=1, A.a2=0, A.a3=0,
          B = 1,  B.a2=0, B.a3=0,
          C = 0,  C.a2=0, C.a3=0,
          D = 1,  D.a2=0, D.a3=0)

if(FALSE) # for interactive experiments, eval the following
debugonce(nlme.formula)

trace(nlminb, ## show arguments on entry:
      quote(print(ls.str())),
      exit = quote({cat("exiting nlminb();  port_msg(iv1):\n");
          port_msg(iv1); cat("variables:\n"); print(ls.str())}))


## MM: from watching 'htop' I don't see a clear memory leak...
## >>>>>>>>>>>>>>>Careful: This does "freeze R" : >>>>>>>>>>>>>>>>>
nlme(y ~ fpl.B.range(x, A,B,C,D), data = dd,
     fixed = list(A~id, B~id, C~id, D~id),
     random = list(f.block = pdSymm(A+B+C+D ~ 1)),
     start = INIT,
     control= nlmeControl(## NB: msMaxIter=200, ## gives singularity error at iter.55
         msVerbose=TRUE), #==> passed as 'trace' to nlminb()
     verbose = TRUE) -> res
## Shows that nlminb() is entered, then
## prints 50 iterations, and then *AGAIN* number 50 (!!)
## and then shows how nlminb() *is* exited, then shows -- thanks to verbose=TRUE
## **Iteration 1
## LME step: Loglik: -245.5092, nlminb iterations: 50
## reStruct  parameters:
##   f.block1   f.block2   f.block3   f.block4   f.block5   f.block6   f.block7   f.block8   f.block9  f.block10
##  2.3611369 -0.8382860 13.0713658 -1.0197240 -1.1551335 -0.3378552  5.4881588 -0.4035375 -3.3995335 14.7498195
## and then
## it stalls, I need to kill the R process
## --  [on lynne Fedora 26 (4.14.11-200.fc26.x86_64), Jan.2018]
## in R 3.4.3 and R 3.4.3 patched with nlme 3.1-131
##                    and R-devel with nlme 3.1-135


Re: withTimeout bug, it does not work properly with nlme anymore

Martin Maechler
>>>>> Martin Maechler
>>>>>     on Tue, 30 Jan 2018 15:17:50 +0100 writes:

(a bit more than 6 months ago)

>>>>> Ramiro Barrantes <[hidden email]>
>>>>>     on Mon, 27 Nov 2017 21:02:52 +0000 writes:

    >> Hello, I was relying on withTimeout (from R.utils) to
    >> help me stop nlme when it "hangs".  However, recently
    >> this stopped working.  I am pasting a reproducible
    >> example below: withTimeout should stop nlme after 10
    >> seconds but the code will generate data for which nlme
    >> does not converge (or takes too long) and withTimeout
    >> does not stop it.  I tried this both on a linux (64 bit,
    >> CentOS 7, R 3.4.1, nlme 3.1-131 R.util 2.6, and also with
    >> R 3.2.5) and mac (Sierra 10.13.1, R 3.4.2, same versions
    >> or nlme and R.utils).  It takes over R and I need to use
    >> brute-force to stop it.  As mentioned, this used to work
    >> and it is very helpful for the purposes of having a loop
    >> where nlme goes through many models.

    >> Thank you in advance for any help, Ramiro

    > Dear Ramiro,

    > as I thought you are reporting a bug about R.utils
    > withTimeout(), I and maybe others have not reacted.

    > You've addressed this again in a non-public e-mail, and
    > indeed the underlying bug is really in nlme which you do
    > mention implicitly.

    > I'm appending a version of your example that is not using
    > R.utils at all and reproducible hangs for me with R 3.4.3,
    > R 3.4.3 patched and R-devel (and almost surely earlier
    > versions of R which I did not check.

    > Indeed, the call to nlme() "stalls" // "hangs" / "freezes"
    > / ... R indeed, and cannot be terminated in a regular way,
    > and, as you, I do need "brute force" to stop it, killing
    > the R process too.

    > As the maintainer of the 'nlme' *is* R-core, we are asked
    > to fix this, at least making it interruptable.

    > Still I should not take time for that for the next couple
    > of weeks as I should fulfill several other day jobs
    > duties, instead, and so will not promise anything here.

    > Tested (minimal) patches are welcome!

I had forgotten to follow up on this here.
We did fix this bug in the nlme source code (in the end by simply
replacing old Fortran code in PYTHAG(), which looped infinitely when
passed a NaN, by a call to C99's hypot()).
This was released with nlme 3.1-137, which is also part of R
3.5.0 and 3.5.1 (and current R development versions).
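
To illustrate how such a routine can loop forever on a NaN, here is a sketch that assumes the old code followed the classic EISPACK-style pythag iteration. That is an assumption: nlme's actual source is not shown in this thread, and the function pythag_sketch and its max_iter guard are made up for the illustration. The loop's only exit is an equality test, and no comparison involving NaN is ever true.

```r
## Hypothetical R transcription of an EISPACK-style pythag iteration.
## isTRUE() mimics C/Fortran comparisons, where any test involving NaN is false.
pythag_sketch <- function(a, b, max_iter = 100L) {
    p <- max(abs(a), abs(b))
    if (isTRUE(p == 0))
        return(list(value = 0, iterations = 0L, stopped = TRUE))
    r <- (min(abs(a), abs(b)) / p)^2
    for (i in seq_len(max_iter)) {
        tt <- 4 + r
        if (isTRUE(tt == 4))   # the only exit of the iteration
            return(list(value = p, iterations = i, stopped = TRUE))
        s <- r / tt
        u <- 1 + 2 * s
        p <- u * p
        r <- (s / u)^2 * r
    }
    ## reachable only because of the max_iter guard added for this sketch
    list(value = p, iterations = max_iter, stopped = FALSE)
}

ok  <- pythag_sketch(3, 4)     # exit test succeeds after a few iterations
bad <- pythag_sketch(NaN, 1)   # exit test never succeeds; the guard ends the loop
```

Without the guard, the second call would never return, which matches the symptom described above.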

These examples should now all give an error (about a singular matrix)
instead of hanging.
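
Since a failing fit now raises an ordinary R error, a loop over many models can catch it and continue. A minimal sketch follows; fit_one is a hypothetical stand-in for the nlme() call and simply simulates a failure for some inputs.

```r
## Hypothetical stand-in for a model fit: fails for some inputs, much as
## nlme() would with a singular-matrix error.
fit_one <- function(i) {
    if (i %% 3 == 0) stop("singular matrix (simulated)")
    i^2
}

results <- lapply(1:6, function(i)
    tryCatch(fit_one(i),
             error = function(e) {
                 message("fit ", i, " failed: ", conditionMessage(e))
                 NULL   # record the failure and move on
             }))
failed <- vapply(results, is.null, logical(1))
```

Here failed flags the fits that raised an error, so the remaining models can be inspected without rerunning the loop.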

I also added a regression test for this problem, but
interestingly that test failed in rare circumstances, and hence
it was decided not to run it by default; it is activated only by an
environment variable setting.
The test *is* part of nlme's sources, and hence also available
directly from
  https://svn.r-project.org/R-packages/trunk/nlme/tests/nlme-stall.R
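
Gating a test on an environment variable can look roughly like the sketch below. The variable name NLME_STALL_TEST_DEMO and the function run_optional_test are made up for illustration; the name actually used in nlme-stall.R is not given in this thread.

```r
## "NLME_STALL_TEST_DEMO" is a made-up variable name for this sketch.
run_optional_test <- function(var = "NLME_STALL_TEST_DEMO") {
    if (!nzchar(Sys.getenv(var))) {
        return("skipped")   # variable unset or empty: do nothing
    }
    ## ... the expensive or occasionally failing checks would go here ...
    stopifnot(is.finite(sqrt(3^2 + 4^2)))
    "ran"
}
```

With the variable unset the function returns "skipped"; after Sys.setenv(NLME_STALL_TEST_DEMO = "true") it runs the checks.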

Best,
Martin Maechler
