multiple parameter optimization with optim()

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

multiple parameter optimization with optim()

Doran, Harold
I am trying to generalize a working piece of code for a single parameter to a multiple parameter problem. Reproducible code is below. The parameters to be estimated are a, b, and c. The estimation problem is such that there is one set of a, b, c parameters for each column of the data. Hence, in this sample data with 20 columns, there are 20 a params, 20 b-params, and 20 c-params.

Because I am estimating so many parameters, I am not certain that I have indicated to the function properly the right number of params to estimate and also if I have generated starting values in a sufficient way.

Thanks for any help.
Harold

dat <- replicate(20, sample(c(0,1), 2000, replace = T))
library(stat mod)
qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1)
nds <- qq$nodes
wts <- qq$weights
fn <- function(params){
a <- params[1:ncol(dat)]
b <- params[1:ncol(dat)]
c <- params[1:ncol(dat)]
L <- sapply(1:ncol(dat), function(i) dbinom(dat[,i], 1, c + ((1 - c)/(1 + exp(-1.7 * a * (nds[i] - b)))) * wts[i]))
r1 <- prod(colSums(L * wts))
-log(r1)
}
startVal <- rep(.5, ncol(dat))
opt <- optim(startVal, fn)

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: multiple parameter optimization with optim()

Prof J C Nash (U30A)
Some observations -- no solution here though:

1) the code is not executable. I tried. Maybe that makes it reproducible!
    Typos such as "stat mod", undefined Q etc.

2) My experience is that any setup with a ?apply approach that doesn't
then check to see that the structure of the data is correct has a high
probability of failure due to mismatch with the optimizer requirements.
It's worth being VERY pedestrian in setting up optimization functions
and checking obsessively that you get what you expect and that there are
no regions you are likely to wander into with divide by 0,
log(negative), etc.

3) optim() is a BAD choice here. I wrote the source for three of the
codes, and the one most appropriate for many parameters (CG) I have been
deprecating for about 30 years. Use Rcgmin or something else
instead.

4) If possible, analytic gradients are needed for CG like codes. You
probably need to dig out some source code for dbinom() to do this, but
your function is not particularly complicated, and doesn't have "if"
statements etc. However, you could test a case using the numDeriv
gradient that is an option for Rcgmin, but it will be painfully slow.
For a one-off computation, that may still be acceptable.

JN

On 15-02-18 06:00 AM, [hidden email] wrote:

> Message: 37
> Date: Tue, 17 Feb 2015 23:03:24 +0000
> From: "Doran, Harold" <[hidden email]>
> To: "[hidden email]" <[hidden email]>
> Subject: [R] multiple parameter optimization with optim()
> Message-ID: <D10931E1.23C0E%[hidden email]>
> Content-Type: text/plain; charset="UTF-8"
>
> I am trying to generalize a working piece of code for a single parameter to a multiple parameter problem. Reproducible code is below. The parameters to be estimated are a, b, and c. The estimation problem is such that there is one set of a, b, c parameters for each column of the data. Hence, in this sample data with 20 columns, there are 20 a params, 20 b-params, and 20 c-params.
>
> Because I am estimating so many parameters, I am not certain that I have indicated to the function properly the right number of params to estimate and also if I have generated starting values in a sufficient way.
>
> Thanks for any help.
> Harold
>
> dat <- replicate(20, sample(c(0,1), 2000, replace = T))
> library(stat mod)
> qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1)
> nds <- qq$nodes
> wts <- qq$weights
> fn <- function(params){
> a <- params[1:ncol(dat)]
> b <- params[1:ncol(dat)]
> c <- params[1:ncol(dat)]
> L <- sapply(1:ncol(dat), function(i) dbinom(dat[,i], 1, c + ((1 - c)/(1 + exp(-1.7 * a * (nds[i] - b)))) * wts[i]))
> r1 <- prod(colSums(L * wts))
> -log(r1)
> }
> startVal <- rep(.5, ncol(dat))
> opt <- optim(startVal, fn)
>
> [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: multiple parameter optimization with optim()

Doran, Harold
John et al

Thank you for your advice below. It was sloppy of me not to verify my reproducible code below.  I have tried a few of your suggestions and wrapped the working code into the function below called pl2. The function properly lands on the right model parameters when I use the optim or nlminb (for nlminb I had to increase max iterations).

The function is enormously slow. At first, I created the object rr1 with two calls to sapply(). This works, but creates an extremely large matrix at each iteration.

library(statmod)
dat <- replicate(20, sample(c(0,1), 2000, replace = T))
a <- b <- rep(1, 20)
Q <- 10
qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
nds <- qq$nodes
wts <- qq$weights

rr1 <- sapply(1:nrow(dat), function(j)
                                                sapply(1:Q, function(i)
                                                                exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (qq$nodes[i] - b))), log = TRUE))) * qq$weights[i]))

So, I thought to reduce some memory, I would do it this way which is equivalent, doesn't create such a large matrix, but instead uses an explicit loop. Both approaches are still equally as slow.

rr1 <- numeric(nrow(dat))
for(j in 1:length(rr1)){
                rr1[j] <- sum(sapply(1:Q,
                function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
        }

As you noted, my likelihood is not complex; in fact I have another program that uses newton-raphson with the analytic first and second derivatives because they are so easy to find. In that program, the model converges very (very) quickly. My purpose in using numeric differentiation is experiential in some respects and hoping to apply this to problems for which the analytic derivatives might not be so easy to come by.

I think the basic idea here to improve speed is to make a call to the gradient, which I understand to be the vector of first derivatives of my likelihood function, is that right? If that is right, in a multi-parameter problem, I'm not sure how to think about the gradient function. Since I am maximizing w.r.t. a and b (these are the parameters of the model), I would have a vector of first partials for a and another for b. So I conceptually do not understand what the gradient would be in this instance, perhaps some clarification would be helpful.

Below is the working function, which as I noted is enormously slow. Any advice on speed improvements here would be helpful. Thank you

pl2 <- function(dat, Q, startVal = NULL, ...){
                if(!is.null(startVal) && length(startVal) != ncol(dat) ){
                                stop("Length of argument startVal not equal to the number of parameters estimated")
                }            
                if(!is.null(startVal)){
                                startVal <- startVal
                                } else {
                                p <- colMeans(dat)
                                startValA <- rep(1, ncol(dat))
                                startValB <- as.vector(log((1 - p)/p))
                                startVal <- c(startValA,startValB)
                }
                rr1 <- numeric(nrow(dat))
                qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
        nds <- qq$nodes
        wts <- qq$weights
                dat <- as.matrix(dat)
                fn <- function(params){
                    a <- params[1:20]            
                    b <- params[21:40]        
                for(j in 1:length(rr1)){
                rr1[j] <- sum(sapply(1:Q,
                function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
                                }
                                -sum(log(rr1))
                }            
                #opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
                opt <-  nlminb(startVal, fn)
                #opt <- Rcgmin(startVal, fn)
                opt
                #list("coefficients" = opt$par, "LogLik" = -opt$value, "Std.Error" = sqrt(diag(solve(opt$hessian))))
}

dat <- replicate(20, sample(c(0,1), 2000, replace = T))
r2 <- pl2(datat, Q =10)

-----Original Message-----
From: Prof J C Nash (U30A) [mailto:[hidden email]]
Sent: Wednesday, February 18, 2015 9:07 AM
To: [hidden email]; Doran, Harold
Subject: Re: [R] multiple parameter optimization with optim()

Some observations -- no solution here though:

1) the code is not executable. I tried. Maybe that makes it reproducible!
    Typos such as "stat mod", undefined Q etc.

2) My experience is that any setup with a ?apply approach that doesn't then check to see that the structure of the data is correct has a high probability of failure due to mismatch with the optimizer requirements.
It's worth being VERY pedestrian in setting up optimization functions and checking obsessively that you get what you expect and that there are no regions you are likely to wander into with divide by 0, log(negative), etc.

3) optim() is a BAD choice here. I wrote the source for three of the codes, and the one most appropriate for many parameters (CG) I have been deprecating for about 30 years. Use Rcgmin or something else instead.

4) If possible, analytic gradients are needed for CG like codes. You probably need to dig out some source code for dbinom() to do this, but your function is not particularly complicated, and doesn't have "if"
statements etc. However, you could test a case using the numDeriv gradient that is an option for Rcgmin, but it will be painfully slow.
For a one-off computation, that may still be acceptable.

JN

On 15-02-18 06:00 AM, [hidden email] wrote:

> Message: 37
> Date: Tue, 17 Feb 2015 23:03:24 +0000
> From: "Doran, Harold" <[hidden email]>
> To: "[hidden email]" <[hidden email]>
> Subject: [R] multiple parameter optimization with optim()
> Message-ID: <D10931E1.23C0E%[hidden email]>
> Content-Type: text/plain; charset="UTF-8"
>
> I am trying to generalize a working piece of code for a single parameter to a multiple parameter problem. Reproducible code is below. The parameters to be estimated are a, b, and c. The estimation problem is such that there is one set of a, b, c parameters for each column of the data. Hence, in this sample data with 20 columns, there are 20 a params, 20 b-params, and 20 c-params.
>
> Because I am estimating so many parameters, I am not certain that I have indicated to the function properly the right number of params to estimate and also if I have generated starting values in a sufficient way.
>
> Thanks for any help.
> Harold
>
> dat <- replicate(20, sample(c(0,1), 2000, replace = T)) library(stat
> mod) qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1) nds
> <- qq$nodes wts <- qq$weights fn <- function(params){ a <-
> params[1:ncol(dat)] b <- params[1:ncol(dat)] c <- params[1:ncol(dat)]
> L <- sapply(1:ncol(dat), function(i) dbinom(dat[,i], 1, c + ((1 -
> c)/(1 + exp(-1.7 * a * (nds[i] - b)))) * wts[i]))
> r1 <- prod(colSums(L * wts))
> -log(r1)
> }
> startVal <- rep(.5, ncol(dat))
> opt <- optim(startVal, fn)
>
> [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: multiple parameter optimization with optim()

Bert Gunter
This is not the proper venue for a discussion of the mathematics of
optimization, no matter that it is interesting. Please take it off
list.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Fri, Feb 20, 2015 at 6:03 AM, Doran, Harold <[hidden email]> wrote:

> John et al
>
> Thank you for your advice below. It was sloppy of me not to verify my reproducible code below.  I have tried a few of your suggestions and wrapped the working code into the function below called pl2. The function properly lands on the right model parameters when I use the optim or nlminb (for nlminb I had to increase max iterations).
>
> The function is enormously slow. At first, I created the object rr1 with two calls to sapply(). This works, but creates an extremely large matrix at each iteration.
>
> library(statmod)
> dat <- replicate(20, sample(c(0,1), 2000, replace = T))
> a <- b <- rep(1, 20)
> Q <- 10
> qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
> nds <- qq$nodes
> wts <- qq$weights
>
> rr1 <- sapply(1:nrow(dat), function(j)
>                                                 sapply(1:Q, function(i)
>                                                                 exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (qq$nodes[i] - b))), log = TRUE))) * qq$weights[i]))
>
> So, I thought to reduce some memory, I would do it this way which is equivalent, doesn't create such a large matrix, but instead uses an explicit loop. Both approaches are still equally as slow.
>
> rr1 <- numeric(nrow(dat))
> for(j in 1:length(rr1)){
>                 rr1[j] <- sum(sapply(1:Q,
>                 function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
>         }
>
> As you noted, my likelihood is not complex; in fact I have another program that uses newton-raphson with the analytic first and second derivatives because they are so easy to find. In that program, the model converges very (very) quickly. My purpose in using numeric differentiation is experiential in some respects and hoping to apply this to problems for which the analytic derivatives might not be so easy to come by.
>
> I think the basic idea here to improve speed is to make a call to the gradient, which I understand to be the vector of first derivatives of my likelihood function, is that right? If that is right, in a multi-parameter problem, I'm not sure how to think about the gradient function. Since I am maximizing w.r.t. a and b (these are the parameters of the model), I would have a vector of first partials for a and another for b. So I conceptually do not understand what the gradient would be in this instance, perhaps some clarification would be helpful.
>
> Below is the working function, which as I noted is enormously slow. Any advice on speed improvements here would be helpful. Thank you
>
> pl2 <- function(dat, Q, startVal = NULL, ...){
>                 if(!is.null(startVal) && length(startVal) != ncol(dat) ){
>                                 stop("Length of argument startVal not equal to the number of parameters estimated")
>                 }
>                 if(!is.null(startVal)){
>                                 startVal <- startVal
>                                 } else {
>                                 p <- colMeans(dat)
>                                 startValA <- rep(1, ncol(dat))
>                                 startValB <- as.vector(log((1 - p)/p))
>                                 startVal <- c(startValA,startValB)
>                 }
>                 rr1 <- numeric(nrow(dat))
>                 qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
>         nds <- qq$nodes
>         wts <- qq$weights
>                 dat <- as.matrix(dat)
>                 fn <- function(params){
>                     a <- params[1:20]
>                     b <- params[21:40]
>                 for(j in 1:length(rr1)){
>                 rr1[j] <- sum(sapply(1:Q,
>                 function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (nds[i] - b))), log = TRUE))) * wts[i]))
>                                 }
>                                 -sum(log(rr1))
>                 }
>                 #opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
>                 opt <-  nlminb(startVal, fn)
>                 #opt <- Rcgmin(startVal, fn)
>                 opt
>                 #list("coefficients" = opt$par, "LogLik" = -opt$value, "Std.Error" = sqrt(diag(solve(opt$hessian))))
> }
>
> dat <- replicate(20, sample(c(0,1), 2000, replace = T))
> r2 <- pl2(datat, Q =10)
>
> -----Original Message-----
> From: Prof J C Nash (U30A) [mailto:[hidden email]]
> Sent: Wednesday, February 18, 2015 9:07 AM
> To: [hidden email]; Doran, Harold
> Subject: Re: [R] multiple parameter optimization with optim()
>
> Some observations -- no solution here though:
>
> 1) the code is not executable. I tried. Maybe that makes it reproducible!
>     Typos such as "stat mod", undefined Q etc.
>
> 2) My experience is that any setup with a ?apply approach that doesn't then check to see that the structure of the data is correct has a high probability of failure due to mismatch with the optimizer requirements.
> It's worth being VERY pedestrian in setting up optimization functions and checking obsessively that you get what you expect and that there are no regions you are likely to wander into with divide by 0, log(negative), etc.
>
> 3) optim() is a BAD choice here. I wrote the source for three of the codes, and the one most appropriate for many parameters (CG) I have been deprecating for about 30 years. Use Rcgmin or something else instead.
>
> 4) If possible, analytic gradients are needed for CG like codes. You probably need to dig out some source code for dbinom() to do this, but your function is not particularly complicated, and doesn't have "if"
> statements etc. However, you could test a case using the numDeriv gradient that is an option for Rcgmin, but it will be painfully slow.
> For a one-off computation, that may still be acceptable.
>
> JN
>
> On 15-02-18 06:00 AM, [hidden email] wrote:
>> Message: 37
>> Date: Tue, 17 Feb 2015 23:03:24 +0000
>> From: "Doran, Harold" <[hidden email]>
>> To: "[hidden email]" <[hidden email]>
>> Subject: [R] multiple parameter optimization with optim()
>> Message-ID: <D10931E1.23C0E%[hidden email]>
>> Content-Type: text/plain; charset="UTF-8"
>>
>> I am trying to generalize a working piece of code for a single parameter to a multiple parameter problem. Reproducible code is below. The parameters to be estimated are a, b, and c. The estimation problem is such that there is one set of a, b, c parameters for each column of the data. Hence, in this sample data with 20 columns, there are 20 a params, 20 b-params, and 20 c-params.
>>
>> Because I am estimating so many parameters, I am not certain that I have indicated to the function properly the right number of params to estimate and also if I have generated starting values in a sufficient way.
>>
>> Thanks for any help.
>> Harold
>>
>> dat <- replicate(20, sample(c(0,1), 2000, replace = T)) library(stat
>> mod) qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1) nds
>> <- qq$nodes wts <- qq$weights fn <- function(params){ a <-
>> params[1:ncol(dat)] b <- params[1:ncol(dat)] c <- params[1:ncol(dat)]
>> L <- sapply(1:ncol(dat), function(i) dbinom(dat[,i], 1, c + ((1 -
>> c)/(1 + exp(-1.7 * a * (nds[i] - b)))) * wts[i]))
>> r1 <- prod(colSums(L * wts))
>> -log(r1)
>> }
>> startVal <- rep(.5, ncol(dat))
>> opt <- optim(startVal, fn)
>>
>>       [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: multiple parameter optimization with optim()

Ravi Varadhan-2
In reply to this post by Doran, Harold
Harold,



Obviously the bottleneck is your objective function fn().  I have speeded up your function by a factor of about 2.4 by using `outer' instead of sapply.  I think it can be speeded much more.  I couldn't figure it out without spending a lot of time.  I am sure someone on this list-serv can come up with a cleverer way to program the objective function.



pl3 <- function(dat, Q, startVal = NULL, ...){
                 if(!is.null(startVal) && length(startVal) != ncol(dat) ){
                                 stop("Length of argument startVal not equal to the number of parameters estimated")
                 }
                 if(!is.null(startVal)){
                                 startVal <- startVal
                                 } else {
                                 p <- colMeans(dat)
                                 startValA <- rep(1, ncol(dat))
                                 startValB <- as.vector(log((1 - p)/p))
                                 startVal <- c(startValA,startValB)
                 }
                 rr1 <- rr2 <- numeric(nrow(dat))
                 qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
         nds <- qq$nodes
         wts <- qq$weights
         Q <- length(qq$nodes)
                 dat <- as.matrix(dat)
                 fn <- function(params){
                     a <- params[1:20]
                     b <- params[21:40]
                 for(j in 1:length(rr1)){
                 rr1[j] <- sum(wts*exp(colSums(outer(dat[j,], nds, function(x,y) dbinom(x, 1, 1/ (1 + exp(- 1.7 * a * (y - b))), log = TRUE)))))
      }
      -sum(log(rr1))
                  }
                 #opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
                 opt <-  nlminb(startVal, fn, control=list(trace=1, rel.tol=1.e-06, iter.max=50))
#                 opt <- spg(startVal, fn)
                 opt
                 #list("coefficients" = opt$par, "LogLik" = -opt$value, "Std.Error" = sqrt(diag(solve(opt$hessian))))
 }



Hope this is helpful,

Ravi

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.