Hessian Matrix Issue

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Hessian Matrix Issue

tzaihra
Dear All,

I am running a simulation to obtain coverage probability of Wald type
confidence intervals for my parameter d in a function of two parameters
(mu,d).

I am optimizing it using "optim" method "L-BFGS-B" to obtain MLE. As, I
want to invert the Hessian matrix to get Standard errors of the two
parameter estimates. However, my Hessian matrix at times becomes
non-invertible that is it is no more positive definite and I get the
following error msg:

"Error in solve.default(ac$hessian) : system is computationally singular:
reciprocal condition number = 6.89585e-21"
Thank you

Following is the code I am running I would really appreciate your comments
and suggestions:

#Start Code
#option to trace /recover error
#options(error = recover)

#Sample Size
n<-30
mu<-5
size<- 2

#true values of parameter d
d.true<-1+mu/size
d.true

#true value of  zero inflation index phi= 1+log(d)/(1-d)
z.true<-1+(log(d.true)/(1-d.true))
z.true

# Allocating space for simulation vectors and setting counters for simulation
counter<-0
iter<-10000
lower.d<-numeric(iter)
upper.d<-numeric(iter)

#set.seed(987654321)

#begining of simulation loop########

for (i in 1:iter){
r.NB<-rnbinom(n, mu = mu, size = size)
y<-sort(r.NB)
iter.num<-i
print(y)
print(iter.num)
#empirical estimates or sample moments
xbar<-mean(y)
variance<-(sum((y-xbar)^2))/length(y)
dbar<-variance/xbar
#sample estimate of proportion of zeros and zero inflation index
pbar<-length(y[y==0])/length(y)

### Simplified function #############################################

NegBin<-function(th){
mu<-th[1]
d<-th[2]
n<-length(y)

arg1<-n*mean(y)*ifelse(mu >= 0, log(mu),0)
#arg1<-n*mean(y)*log(mu)

#arg2<-n*log(d)*((mean(y))+mu/(d-1))
arg2<-n*ifelse(d>=0, log(d), 0)*((mean(y))+mu/ifelse((d-1)>= 0, (d-1),
0.0000001))

aa<-numeric(length(max(y)))
a<-numeric(length(y))
for (i in 1:n)
{
for (j in 1:y[i]){
aa[j]<-ifelse(((j-1)*(d-1))/mu >0,log(1+((j-1)*(d-1))/mu),0)
#aa[j]<-log(1+((j-1)*(d-1))/mu)
#print(aa[j])
}

a[i]<-sum(aa)
#print(a[i])
}
a
arg3<-sum(a)
llh<-arg1+arg2+arg3
if(! is.finite(llh))
llh<-1e+20
-llh
}
ac<-optim(NegBin,par=c(xbar,dbar),method="L-BFGS-B",hessian=TRUE,lower=
c(0,1) )
ac
print(ac$hessian)
muhat<-ac$par[1]
dhat<-ac$par[2]
zhat<- 1+(log(dhat)/(1-dhat))
infor<-solve(ac$hessian)
var.dhat<-infor[2,2]
se.dhat<-sqrt(var.dhat)
var.muhat<-infor[1,1]
se.muhat<-sqrt(var.muhat)
var.func<-dhat*muhat
var.func
d.prime<-cbind(dhat,muhat)

se.var.func<-d.prime%*%infor%*%t(d.prime)
se.var.func
lower.d[i]<-dhat-1.96*se.dhat
upper.d[i]<-dhat+1.96*se.dhat

if(lower.d[i]  <= d.true & d.true<= upper.d[i])
counter <-counter+1
}
counter
covg.prob<-counter/iter
covg.prob

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Hessian Matrix Issue

Uwe Ligges-3
I have not really looked into the details of the lengthy and almost
unreadable code below. In any case, there are good reasons why numerics
software typically uses Fisher scoring / IWLS in order to fit GLMs.....

And if your matrix is that "singular", even the common numerical tricks
may not save the day anymore. 7e-21 is very close to exact singularity!

Uwe Ligges






On 02.09.2011 21:33, [hidden email] wrote:

> Dear All,
>
> I am running a simulation to obtain coverage probability of Wald type
> confidence intervals for my parameter d in a function of two parameters
> (mu,d).
>
> I am optimizing it using "optim" method "L-BFGS-B" to obtain MLE. As, I
> want to invert the Hessian matrix to get Standard errors of the two
> parameter estimates. However, my Hessian matrix at times becomes
> non-invertible that is it is no more positive definite and I get the
> following error msg:
>
> "Error in solve.default(ac$hessian) : system is computationally singular:
> reciprocal condition number = 6.89585e-21"
> Thank you
>
> Following is the code I am running I would really appreciate your comments
> and suggestions:
>
> #Start Code
> #option to trace /recover error
> #options(error = recover)
>
> #Sample Size
> n<-30
> mu<-5
> size<- 2
>
> #true values of parameter d
> d.true<-1+mu/size
> d.true
>
> #true value of  zero inflation index phi= 1+log(d)/(1-d)
> z.true<-1+(log(d.true)/(1-d.true))
> z.true
>
> # Allocating space for simulation vectors and setting counters for simulation
> counter<-0
> iter<-10000
> lower.d<-numeric(iter)
> upper.d<-numeric(iter)
>
> #set.seed(987654321)
>
> #begining of simulation loop########
>
> for (i in 1:iter){
> r.NB<-rnbinom(n, mu = mu, size = size)
> y<-sort(r.NB)
> iter.num<-i
> print(y)
> print(iter.num)
> #empirical estimates or sample moments
> xbar<-mean(y)
> variance<-(sum((y-xbar)^2))/length(y)
> dbar<-variance/xbar
> #sample estimate of proportion of zeros and zero inflation index
> pbar<-length(y[y==0])/length(y)
>
> ### Simplified function #############################################
>
> NegBin<-function(th){
> mu<-th[1]
> d<-th[2]
> n<-length(y)
>
> arg1<-n*mean(y)*ifelse(mu>= 0, log(mu),0)
> #arg1<-n*mean(y)*log(mu)
>
> #arg2<-n*log(d)*((mean(y))+mu/(d-1))
> arg2<-n*ifelse(d>=0, log(d), 0)*((mean(y))+mu/ifelse((d-1)>= 0, (d-1),
> 0.0000001))
>
> aa<-numeric(length(max(y)))
> a<-numeric(length(y))
> for (i in 1:n)
> {
> for (j in 1:y[i]){
> aa[j]<-ifelse(((j-1)*(d-1))/mu>0,log(1+((j-1)*(d-1))/mu),0)
> #aa[j]<-log(1+((j-1)*(d-1))/mu)
> #print(aa[j])
> }
>
> a[i]<-sum(aa)
> #print(a[i])
> }
> a
> arg3<-sum(a)
> llh<-arg1+arg2+arg3
> if(! is.finite(llh))
> llh<-1e+20
> -llh
> }
> ac<-optim(NegBin,par=c(xbar,dbar),method="L-BFGS-B",hessian=TRUE,lower=
> c(0,1) )
> ac
> print(ac$hessian)
> muhat<-ac$par[1]
> dhat<-ac$par[2]
> zhat<- 1+(log(dhat)/(1-dhat))
> infor<-solve(ac$hessian)
> var.dhat<-infor[2,2]
> se.dhat<-sqrt(var.dhat)
> var.muhat<-infor[1,1]
> se.muhat<-sqrt(var.muhat)
> var.func<-dhat*muhat
> var.func
> d.prime<-cbind(dhat,muhat)
>
> se.var.func<-d.prime%*%infor%*%t(d.prime)
> se.var.func
> lower.d[i]<-dhat-1.96*se.dhat
> upper.d[i]<-dhat+1.96*se.dhat
>
> if(lower.d[i]<= d.true&  d.true<= upper.d[i])
> counter<-counter+1
> }
> counter
> covg.prob<-counter/iter
> covg.prob
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Hessian matrix issue

Prof J C Nash (U30A)
In reply to this post by tzaihra
Unless you are supplying analytic hessian code, you are almost certainly getting an
approximation. Worse, if you do not provide gradients, these are the result of two levels
of differencing, so you should expect some loss of precision in the approximate Hessian.

Moreover, if your estimate of the optimum is a little bit off, or the optimizer has
terminated (algorithms converge, programs terminate) to a point that is not an optimum,
there is no reason the Hessian should be positive definite.

Package optimx() uses the Jacobian of the gradient if the analytic gradient is available.
This drops the differencing to 1 level. Even better is to code the Hessian, but that is
messy and tedious in most cases.

Best, JN


On 09/03/2011 06:00 AM, [hidden email] wrote:

> Message: 59
> Date: Fri, 2 Sep 2011 15:33:13 -0400
> From: [hidden email]
> To: [hidden email]
> Subject: [R] Hessian Matrix Issue
> Message-ID:
> <[hidden email]>
> Content-Type: text/plain;charset=iso-8859-1
>
> Dear All,
>
> I am running a simulation to obtain coverage probability of Wald type
> confidence intervals for my parameter d in a function of two parameters
> (mu,d).
>
> I am optimizing it using "optim" method "L-BFGS-B" to obtain MLE. As, I
> want to invert the Hessian matrix to get Standard errors of the two
> parameter estimates. However, my Hessian matrix at times becomes
> non-invertible that is it is no more positive definite and I get the
> following error msg:
>
> "Error in solve.default(ac$hessian) : system is computationally singular:
> reciprocal condition number = 6.89585e-21"
> Thank you
>
> Following is the code I am running I would really appreciate your comments
> and suggestions:
>
> #Start Code
> #option to trace /recover error
> #options(error = recover)
>
> #Sample Size
> n<-30
> mu<-5
> size<- 2
>
> #true values of parameter d
> d.true<-1+mu/size
> d.true
>
> #true value of  zero inflation index phi= 1+log(d)/(1-d)
> z.true<-1+(log(d.true)/(1-d.true))
> z.true
>
> # Allocating space for simulation vectors and setting counters for simulation
> counter<-0
> iter<-10000
> lower.d<-numeric(iter)
> upper.d<-numeric(iter)
>
> #set.seed(987654321)
>
> #begining of simulation loop########
>
> for (i in 1:iter){
> r.NB<-rnbinom(n, mu = mu, size = size)
> y<-sort(r.NB)
> iter.num<-i
> print(y)
> print(iter.num)
> #empirical estimates or sample moments
> xbar<-mean(y)
> variance<-(sum((y-xbar)2))/length(y)
> dbar<-variance/xbar
> #sample estimate of proportion of zeros and zero inflation index
> pbar<-length(y[y==0])/length(y)
>
> ### Simplified function #############################################
>
> NegBin<-function(th){
> mu<-th[1]
> d<-th[2]
> n<-length(y)
>
> arg1<-n*mean(y)*ifelse(mu >= 0, log(mu),0)
> #arg1<-n*mean(y)*log(mu)
>
> #arg2<-n*log(d)*((mean(y))+mu/(d-1))
> arg2<-n*ifelse(d>=0, log(d), 0)*((mean(y))+mu/ifelse((d-1)>= 0, (d-1),
> 0.0000001))
>
> aa<-numeric(length(max(y)))
> a<-numeric(length(y))
> for (i in 1:n)
> {
> for (j in 1:y[i]){
> aa[j]<-ifelse(((j-1)*(d-1))/mu >0,log(1+((j-1)*(d-1))/mu),0)
> #aa[j]<-log(1+((j-1)*(d-1))/mu)
> #print(aa[j])
> }
>
> a[i]<-sum(aa)
> #print(a[i])
> }
> a
> arg3<-sum(a)
> llh<-arg1+arg2+arg3
> if(! is.finite(llh))
> llh<-1e+20
> -llh
> }
> ac<-optim(NegBin,par=c(xbar,dbar),method="L-BFGS-B",hessian=TRUE,lower=
> c(0,1) )
> ac
> print(ac$hessian)
> muhat<-ac$par[1]
> dhat<-ac$par[2]
> zhat<- 1+(log(dhat)/(1-dhat))
> infor<-solve(ac$hessian)
> var.dhat<-infor[2,2]
> se.dhat<-sqrt(var.dhat)
> var.muhat<-infor[1,1]
> se.muhat<-sqrt(var.muhat)
> var.func<-dhat*muhat
> var.func
> d.prime<-cbind(dhat,muhat)
>
> se.var.func<-d.prime%*%infor%*%t(d.prime)
> se.var.func
> lower.d[i]<-dhat-1.96*se.dhat
> upper.d[i]<-dhat+1.96*se.dhat
>
> if(lower.d[i]  <= d.true & d.true<= upper.d[i])
> counter <-counter+1
> }
> counter
> covg.prob<-counter/iter
> covg.prob
>
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Hessian Matrix Issue

dave fournier-2
In reply to this post by tzaihra

I wonder if your code is correct?

I ran your script until an error was reported. the data set
of 30 obs was


[1]  0  0  1  3  3  3  4  4  4  4  5  5  5  5  5  7  7  7  7  7  7  8  9
10 11
[26] 12 12 12 15 16

I created a tiny AD Model Builder program to do MLE on it.

DATA_SECTION
   init_int nobs
   init_vector y(1,nobs)
PARAMETER_SECTION
   init_number log_mu
   init_number log_alpha
   sdreport_number mu
   sdreport_number tau
   objective_function_value f
PROCEDURE_SECTION
   mu=exp(log_mu);
   tau=1.0+exp(log_alpha);
   for (int i=1;i<=nobs;i++)
   {
     f-=log_negbinomial_density(y(i),mu,tau);
   }
It converged quickly and

The eigenvalues of the Hessian were

     4.711089774    78.27632341

and the estimates and std devs of the parameters mu and tau were

index   name       value      std dev

      3   mu         6.6000e+00 7.7318e-01
      4   tau        2.7173e+00 7.8944e-01

where tau is the variance divided by the mean.

This was all so simple that I suspect your (rather difficult to read)
R code is wrong, otherwise R must really suck at this kind of problem.

       Dave

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Hessian Matrix Issue

bbolker
In reply to this post by Uwe Ligges-3
Uwe Ligges <ligges <at> statistik.tu-dortmund.de> writes:

>
> I have not really looked into the details of the lengthy and almost
> unreadable code below. In any case, there are good reasons why numerics
> software typically uses Fisher scoring / IWLS in order to fit GLMs.....
>
> And if your matrix is that "singular", even the common numerical tricks
> may not save the day anymore. 7e-21 is very close to exact singularity!
>
> Uwe Ligges
>

  Your problem is with the strategy you use to try to deal with
non-finite values, i.e. setting the negative log-likelihood to
10^20 if the calculated values are not finite.  What happens is
that, rather than just pushing the optimization away from a
bad value, you get stuck there, which leads to a "solution" to
the optimization, which is completely flat (because the objective
function is 1e20 for any value near the solution), which leads to
an uninvertible hessian.
   More specifically, inserting a browser() call at the point
after the "if (!is.finite())" call and inspecting the results
when the objective function is not finite shows that when d=1
the ifelse((d-1)>=0, ...) clause returns (d-1) as a denominator ...

  Beyond that, I can't spend any more time picking through
this ...

  Ben Bolker

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Hessian Matrix Issue

Paul Hiemstra-2
In reply to this post by dave fournier-2
 On 09/03/2011 08:22 PM, dave fournier wrote:

>
> I wonder if your code is correct?
>
> I ran your script until an error was reported. the data set
> of 30 obs was
>
>
> [1]  0  0  1  3  3  3  4  4  4  4  5  5  5  5  5  7  7  7  7  7  7  8
> 9 10 11
> [26] 12 12 12 15 16
>
> I created a tiny AD Model Builder program to do MLE on it.
>
> DATA_SECTION
>   init_int nobs
>   init_vector y(1,nobs)
> PARAMETER_SECTION
>   init_number log_mu
>   init_number log_alpha
>   sdreport_number mu
>   sdreport_number tau
>   objective_function_value f
> PROCEDURE_SECTION
>   mu=exp(log_mu);
>   tau=1.0+exp(log_alpha);
>   for (int i=1;i<=nobs;i++)
>   {
>     f-=log_negbinomial_density(y(i),mu,tau);
>   }
> It converged quickly and
>
> The eigenvalues of the Hessian were
>
>     4.711089774    78.27632341
>
> and the estimates and std devs of the parameters mu and tau were
>
> index   name       value      std dev
>
>      3   mu         6.6000e+00 7.7318e-01
>      4   tau        2.7173e+00 7.8944e-01
>
> where tau is the variance divided by the mean.
>
> This was all so simple that I suspect your (rather difficult to read)
> R code is wrong, otherwise R must really suck at this kind of problem.

I'd put my money on R!

Paul

>
>       Dave
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Hessian matrix issue

Giovanni Petris
In reply to this post by Prof J C Nash (U30A)

About the numerical calculation of the Hessian matrix, I have found
numDeriv:::hessian to be often more accurate than the Hessian returned
by optim.

Best,
Giovanni Petris

On Sat, 2011-09-03 at 13:39 -0400, John C Nash wrote:

> Unless you are supplying analytic hessian code, you are almost certainly getting an
> approximation. Worse, if you do not provide gradients, these are the result of two levels
> of differencing, so you should expect some loss of precision in the approximate Hessian.
>
> Moreover, if your estimate of the optimum is a little bit off, or the optimizer has
> terminated (algorithms converge, programs terminate) to a point that is not an optimum,
> there is no reason the Hessian should be positive definite.
>
> Package optimx() uses the Jacobian of the gradient if the analytic gradient is available.
> This drops the differencing to 1 level. Even better is to code the Hessian, but that is
> messy and tedious in most cases.
>
> Best, JN
>
>
> On 09/03/2011 06:00 AM, [hidden email] wrote:
> > Message: 59
> > Date: Fri, 2 Sep 2011 15:33:13 -0400
> > From: [hidden email]
> > To: [hidden email]
> > Subject: [R] Hessian Matrix Issue
> > Message-ID:
> > <[hidden email]>
> > Content-Type: text/plain;charset=iso-8859-1
> >
> > Dear All,
> >
> > I am running a simulation to obtain coverage probability of Wald type
> > confidence intervals for my parameter d in a function of two parameters
> > (mu,d).
> >
> > I am optimizing it using "optim" method "L-BFGS-B" to obtain MLE. As, I
> > want to invert the Hessian matrix to get Standard errors of the two
> > parameter estimates. However, my Hessian matrix at times becomes
> > non-invertible that is it is no more positive definite and I get the
> > following error msg:
> >
> > "Error in solve.default(ac$hessian) : system is computationally singular:
> > reciprocal condition number = 6.89585e-21"
> > Thank you
> >
> > Following is the code I am running I would really appreciate your comments
> > and suggestions:
> >
> > #Start Code
> > #option to trace /recover error
> > #options(error = recover)
> >
> > #Sample Size
> > n<-30
> > mu<-5
> > size<- 2
> >
> > #true values of parameter d
> > d.true<-1+mu/size
> > d.true
> >
> > #true value of  zero inflation index phi= 1+log(d)/(1-d)
> > z.true<-1+(log(d.true)/(1-d.true))
> > z.true
> >
> > # Allocating space for simulation vectors and setting counters for simulation
> > counter<-0
> > iter<-10000
> > lower.d<-numeric(iter)
> > upper.d<-numeric(iter)
> >
> > #set.seed(987654321)
> >
> > #begining of simulation loop########
> >
> > for (i in 1:iter){
> > r.NB<-rnbinom(n, mu = mu, size = size)
> > y<-sort(r.NB)
> > iter.num<-i
> > print(y)
> > print(iter.num)
> > #empirical estimates or sample moments
> > xbar<-mean(y)
> > variance<-(sum((y-xbar)2))/length(y)
> > dbar<-variance/xbar
> > #sample estimate of proportion of zeros and zero inflation index
> > pbar<-length(y[y==0])/length(y)
> >
> > ### Simplified function #############################################
> >
> > NegBin<-function(th){
> > mu<-th[1]
> > d<-th[2]
> > n<-length(y)
> >
> > arg1<-n*mean(y)*ifelse(mu >= 0, log(mu),0)
> > #arg1<-n*mean(y)*log(mu)
> >
> > #arg2<-n*log(d)*((mean(y))+mu/(d-1))
> > arg2<-n*ifelse(d>=0, log(d), 0)*((mean(y))+mu/ifelse((d-1)>= 0, (d-1),
> > 0.0000001))
> >
> > aa<-numeric(length(max(y)))
> > a<-numeric(length(y))
> > for (i in 1:n)
> > {
> > for (j in 1:y[i]){
> > aa[j]<-ifelse(((j-1)*(d-1))/mu >0,log(1+((j-1)*(d-1))/mu),0)
> > #aa[j]<-log(1+((j-1)*(d-1))/mu)
> > #print(aa[j])
> > }
> >
> > a[i]<-sum(aa)
> > #print(a[i])
> > }
> > a
> > arg3<-sum(a)
> > llh<-arg1+arg2+arg3
> > if(! is.finite(llh))
> > llh<-1e+20
> > -llh
> > }
> > ac<-optim(NegBin,par=c(xbar,dbar),method="L-BFGS-B",hessian=TRUE,lower=
> > c(0,1) )
> > ac
> > print(ac$hessian)
> > muhat<-ac$par[1]
> > dhat<-ac$par[2]
> > zhat<- 1+(log(dhat)/(1-dhat))
> > infor<-solve(ac$hessian)
> > var.dhat<-infor[2,2]
> > se.dhat<-sqrt(var.dhat)
> > var.muhat<-infor[1,1]
> > se.muhat<-sqrt(var.muhat)
> > var.func<-dhat*muhat
> > var.func
> > d.prime<-cbind(dhat,muhat)
> >
> > se.var.func<-d.prime%*%infor%*%t(d.prime)
> > se.var.func
> > lower.d[i]<-dhat-1.96*se.dhat
> > upper.d[i]<-dhat+1.96*se.dhat
> >
> > if(lower.d[i]  <= d.true & d.true<= upper.d[i])
> > counter <-counter+1
> > }
> > counter
> > covg.prob<-counter/iter
> > covg.prob
> >
> >
> >
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Hessian matrix issue

Ravi Varadhan
In reply to this post by tzaihra
Yes, numDeriv::hessian is very accurate.  So, I normally take the output from the optimizer, *if it is a local optimum*, and then apply numDeriv::hessian to it.  I, then, compute the standard errors from it.

However, it is important to know that you have obtained a local optimum.  In fact, you need the gradient and Hessian to actually verify this - the first and second order KKT conditions.  However, this is tricky in *constrained* optimization problems.  If you have constraints, and at least one of the constraints is active at the best parameter estimate, then the gradient of the original objective function need not be close to zero and the hessian is also incorrect.

If you employ an augmented Lagrangian approach (see alabama::auglag), then you can obtain the correct gradient and Hessian at best parameter estimate. These gradients and Hessian correspond to the modified objective function that includes a Lagrangian term augmented by a quadratic penalty term.  They can be used to check the KKT conditions.

Here is a simple example:

require(alabama)

fr <- function(x) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

grr <- function(x) { ## Gradient of 'fr'
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 *      (x2 - x1 * x1))
}

p0 <- c(0, 0)

ans1 <- optim(par=p0, fn=fr, gr=grr, upper=c(0.9, Inf), method="L-BFGS-B", hessian=TRUE)

grr(ans1$par)  # this will not be close to zero

ans1$hessian

# Using an augmented Lagrangian optimizer

hcon <- function(x) 0.9 - x[1]

hcon.jac <- function(x) matrix(c(-1, 0) , 1, 2)

ans2 <- auglag(par=p0, fn=fr, gr=grr, hin=hcon, hin.jac=hcon.jac)

ans2$gradient  # this will be close to zero

ans2$hessian

However, it is not known whether the standard errors obtained from this Hessian are asymptotically valid.

Best,
Ravi.
-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University

Ph. (410) 502-2619
email: [hidden email]<mailto:[hidden email]>


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Hessian matrix issue

Ravi Varadhan
"However, it is not known whether the standard errors obtained from this Hessian are asymptotically valid."

Let me rephrase this.  I think that as a measure of dispersion, the standard error obtained using the augmented Lagrangian algorithm is correct.  However, what is *not known* is the asymptotic distribution of the parameter estimates when constraints are active.  This is a "non-regular" situation where MLEs might have strange asymptotic behavior.  We cannot generally assume normality and use the standard error estimates to construct confidence intervals or calculate significance levels.

Best,
Ravi.
-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University

Ph. (410) 502-2619
email: [hidden email]<mailto:[hidden email]>

From: Ravi Varadhan
Sent: Wednesday, September 07, 2011 1:37 PM
To: '[hidden email]'; '[hidden email]'; '[hidden email]'
Subject: Hessian matrix issue

Yes, numDeriv::hessian is very accurate.  So, I normally take the output from the optimizer, *if it is a local optimum*, and then apply numDeriv::hessian to it.  I, then, compute the standard errors from it.

However, it is important to know that you have obtained a local optimum.  In fact, you need the gradient and Hessian to actually verify this - the first and second order KKT conditions.  However, this is tricky in *constrained* optimization problems.  If you have constraints, and at least one of the constraints is active at the best parameter estimate, then the gradient of the original objective function need not be close to zero and the hessian is also incorrect.

If you employ an augmented Lagrangian approach (see alabama::auglag), then you can obtain the correct gradient and Hessian at best parameter estimate. These gradients and Hessian correspond to the modified objective function that includes a Lagrangian term augmented by a quadratic penalty term.  They can be used to check the KKT conditions.

Here is a simple example:

require(alabama)

fr <- function(x) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

grr <- function(x) { ## Gradient of 'fr'
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 *      (x2 - x1 * x1))
}

p0 <- c(0, 0)

ans1 <- optim(par=p0, fn=fr, gr=grr, upper=c(0.9, Inf), method="L-BFGS-B", hessian=TRUE)

grr(ans1$par)  # this will not be close to zero

ans1$hessian

# Using an augmented Lagrangian optimizer

hcon <- function(x) 0.9 - x[1]

hcon.jac <- function(x) matrix(c(-1, 0) , 1, 2)

ans2 <- auglag(par=p0, fn=fr, gr=grr, hin=hcon, hin.jac=hcon.jac)

ans2$gradient  # this will be close to zero

ans2$hessian

However, it is not known whether the standard errors obtained from this Hessian are asymptotically valid.

Best,
Ravi.
-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University

Ph. (410) 502-2619
email: [hidden email]<mailto:[hidden email]>


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.