Re: Very slow optim(): solved

R help mailing list-2
Dear list,
 I am using optim() to estimate more than 60 thousand parameters, running the program on a server. After 5 hours it still had not produced any result. How can I display the intermediate results that optim() has already calculated?
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Very slow optim(): solved

Jeff Newmiller
Calculate fewer of them?

If you don't set up your code to save intermediate results, then you cannot see intermediate results.
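
A minimal sketch of one way to save intermediate results, assuming the objective can be wrapped in a logging function. The toy objective, the names `logged_objective`, `progress` and `checkpoint_file`, and the save frequency are all illustrative assumptions, not anything from the original poster's code. The `trace` and `REPORT` entries of `control` additionally make optim() print progress for the "L-BFGS-B" method.

```r
# Toy objective standing in for the real (expensive) one: a convex quadratic.
target <- seq(-1, 1, length.out = 200)
objective <- function(par) sum((par - target)^2)
gradient  <- function(par) 2 * (par - target)

# Keep the best point seen so far in an environment and write it to disk,
# so it can be inspected (or used as a restart point) while optim() runs.
progress <- new.env()
progress$n_eval <- 0L
progress$best_value <- Inf
progress$best_par <- NULL
checkpoint_file <- file.path(tempdir(), "optim_checkpoint.rds")
save_every <- 1L  # toy setting; use a larger value for an expensive problem

logged_objective <- function(par) {
  value <- objective(par)
  progress$n_eval <- progress$n_eval + 1L
  if (value < progress$best_value) {
    progress$best_value <- value
    progress$best_par <- par
  }
  if (progress$n_eval %% save_every == 0L) {
    saveRDS(list(par = progress$best_par,
                 value = progress$best_value,
                 n_eval = progress$n_eval),
            checkpoint_file)
  }
  value
}

est <- optim(par = rep(0, length(target)),
             fn = logged_objective, gr = gradient,
             method = "L-BFGS-B",
             control = list(trace = 1, REPORT = 5))  # print every 5 iterations

# From another R session, the latest checkpoint can be read with:
saved <- readRDS(checkpoint_file)
saved$value
```

Writing the checkpoint from inside the objective costs some I/O on every save, so for a slow objective a save every few hundred evaluations is probably a better trade-off than the toy setting used here.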

On March 11, 2021 8:32:17 PM PST, "毕芳妮 via R-help" <[hidden email]> wrote:

>Dear list,
>I am using optim() to estimate more than 60 thousand parameters, running
>the program on a server. After 5 hours it still had not produced any
>result. How can I display the intermediate results that optim() has
>already calculated?

--
Sent from my phone. Please excuse my brevity.


Re: Very slow optim()

J C Nash
optim() has no method really suitable for very large numbers of parameters.

- CG as set up has never worked very well in any of its implementations
  (I wrote it, so am allowed to say so!). Rcgmin in the optimx package works
  better, as does Rtnmin. Neither is really intended for 60K parameters,
  however.

- optim::L-BFGS-B is reasonable, but my experience is that it still is not
  intended for more than a couple of hundred parameters.
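
Whichever method is chosen, the cost per iteration grows with the number of parameters. One contributor that is easy to measure: when no gradient function is supplied, optim() approximates the gradient by finite differences, so every gradient costs a number of objective evaluations proportional to the number of parameters. The sketch below counts objective calls with and without an analytic gradient on a toy problem; the objective and the `counter` environment are illustrative assumptions.

```r
n <- 100
target <- seq_len(n) / n

counter <- new.env()
counter$calls <- 0L
fn <- function(p) {
  counter$calls <- counter$calls + 1L  # count every objective evaluation
  sum((p - target)^2)
}
gr <- function(p) 2 * (p - target)     # analytic gradient of fn

# Run 1: no gradient supplied, so optim() uses finite differences.
counter$calls <- 0L
fit_numeric <- optim(rep(0, n), fn, method = "L-BFGS-B")
calls_numeric <- counter$calls

# Run 2: analytic gradient supplied.
counter$calls <- 0L
fit_analytic <- optim(rep(0, n), fn, gr, method = "L-BFGS-B")
calls_analytic <- counter$calls

c(finite_difference = calls_numeric, analytic = calls_analytic)
```

With 60,000 parameters the finite-difference count per gradient scales up accordingly, which may by itself account for hours of run time if the objective is not cheap.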

JN



On 2021-03-12 9:31 p.m., Jeff Newmiller wrote:

> Calculate fewer of them?
>
> If you don't set up your code to save intermediate results, then you cannot see intermediate results.

Re: Very slow optim()

Spencer Graves-4
TWO COMMENTS:


1.  DID YOU ASSIGN THE OUTPUT OF "optim" to an object, like "est <-
optim(...)"?  If yes and if "optim" terminated normally, the 60,000+
parameters should be there as est$par.  See the documentation on "optim".


2.  WHAT PROBLEM ARE YOU TRYING TO SOLVE?


          I hope you will forgive me for being blunt (or perhaps bigoted), but
I'm skeptical about anyone wanting to use optim to estimate 60,000+
parameters.  With a situation like that, I think you would be wise to
recast the problem as one in which those 60,000+ parameters are sampled
from some hyperdistribution characterized by a small number of
hyperparameters.  Then write a model where your observations are sampled
from distribution(s) controlled by these random parameters.  Then
multiply the likelihood of the observations by the likelihood of the
hyperdistribution and integrate out the 60,000+ parameters, leaving only
a small number of hyperparameters.


          When everything is linear and all the random variables / random
effects and observation errors follow normal distributions, this is the
classic linear mixed-effects situation, which is routinely handled well
in most cases by the nlme package, documented in the companion book
Pinheiro and Bates (2000) Mixed-Effects Models in S and S-PLUS
(Springer).  If the models are nonlinear but with curvature that
is reasonably well behaved and the random variables / random effects and
observation errors are still normal, the nlme package and Pinheiro and
Bates still provide a great approach to most such situations, as far as
I know.  When the observations are non-normally distributed, then the
best software I know is the lme4 package.  I have not used it recently,
but it was written and is being maintained by some of the leading experts
in this area as far as I know.
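
A small illustration of the idea, using the Orthodont data that ships with nlme (the data set and the model formula are chosen here only as an example; they have nothing to do with the original poster's problem). Each subject has its own intercept, but only the variance of those intercepts is estimated, so the number of free parameters stays small however many subjects there are.

```r
library(nlme)  # recommended package, normally installed alongside R

# Orthodont: a growth measurement (distance) for 27 children at four ages.
# Random intercept per Subject; only its variance is a model parameter.
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

fixef(fit)        # fixed effects: intercept and age slope
VarCorr(fit)      # variance components: between-subject and residual
nrow(ranef(fit))  # one predicted effect per subject, not a free parameter
```

The per-subject effects returned by `ranef()` are predictions derived from the fitted hyperparameters, which is what keeps the optimization problem small.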


CONCLUSION:


          If you are short on time and "1" will work for you, do that.
Obviously, you will need to do some further analysis to understand the
60,000+ parameters you estimated -- which implies by itself that you
really should be using approach "2".  However, if I'm short on time and
need an answer, then I'd ignore "2" and hope to get something by
plotting and doing other things with the 60,000+ parameters that should
be in "est$par" if "optim" actually ended normally.
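
A short sketch of what checking for a normal termination could look like, on a five-parameter stand-in problem (the objective `fn` is an assumption made purely for illustration):

```r
# Small stand-in problem; the real objective would have 60,000+ parameters.
fn <- function(p) sum((p - 1:5)^2)
est <- optim(par = rep(0, 5), fn = fn, method = "BFGS")

est$convergence  # 0 means optim() reports successful completion
est$value        # objective value at the returned point
head(est$par)    # first few estimated parameters
est$counts       # function and gradient evaluations used
est$message      # extra information from the optimizer, possibly NULL
```

A nonzero `est$convergence` (for example 1, meaning the iteration limit was reached) suggests treating `est$par` as a partial result rather than an estimate.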


          However, if the problem is sufficiently important to justify more
work, then I'd want to cast it as some kind of mixed-effects model, per
"2" -- perhaps using an analysis of "1" as a first step towards "2".


          Hope this helps.
          Spencer


On 2021-03-12 20:53, J C Nash wrote:

> optim() has no method really suitable for very large numbers of parameters.
>
> - CG as set up has never worked very well in any of its implementations
>    (I wrote it, so am allowed to say so!). Rcgmin in optimx package works
>    better, as does Rtnmin. Neither are really intended for 60K parameters
>    however.
>
> - optim::L-BFGS-B is reasonable, but my experience is that it still is not
>    intended for more than a couple of hundred parameters.
>
> JN

Re: Very slow optim()

Deepayan Sarkar
On Sat, Mar 13, 2021 at 10:08 AM Spencer Graves
<[hidden email]> wrote:

>
> TWO COMMENTS:
>
>
> 1.  DID YOU ASSIGN THE OUTPUT OF "optim" to an object, like "est <-
> optim(...)"?  If yes and if "optim" terminated normally, the 60,000+
> parameters should be there as est$par.  See the documentation on "optim".
>
>
> 2.  WHAT PROBLEM ARE YOU TRYING TO SOLVE?
>
>
>           I hope you will forgive me for being blunt (or perhaps bigoted), but
> I'm skeptical about anyone wanting to use optim to estimate 60,000+
> parameters.  With a situation like that, I think you would be wise to
> recast the problem as one in which those 60,000+ parameters are sampled
> from some hyperdistribution characterized by a small number of
> hyperparameters.  Then write a model where your observations are sampled
> from distribution(s) controlled by these random parameters.  Then
> multiply the likelihood of the observations by the likelihood of the
> hyperdistribution and integrate out the 60,000+ parameters, leaving only
> a small number of hyperparameters.

Just a comment on this comment: I think it's perfectly reasonable to
optimize 60k+ parameters with conjugate gradient. CG was originally
developed to solve linear equations of the form Ax=b. If x was not
large in size, one would just use solve(A, b) instead of an iterative
method.

Use of CG is quite common in image processing. A relatively small
300x300 image will give you 90k parameters.
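
To make the link between optimization and Ax=b concrete: minimizing f(x) = 0.5 * x'Ax - b'x is equivalent to solving Ax=b when A is symmetric positive definite, because the gradient of f is Ax - b. A small sketch comparing solve(A, b) with optim(method = "CG") on a randomly generated, well-conditioned system (the matrix, its size and the seed are arbitrary choices for illustration):

```r
set.seed(1)
n <- 50
M <- matrix(rnorm(n * n), n, n)
A <- crossprod(M) / n + diag(n)  # symmetric positive definite by construction
b <- rnorm(n)

# Quadratic objective and its gradient; the minimizer satisfies A x = b.
f  <- function(x) drop(0.5 * crossprod(x, A %*% x) - crossprod(b, x))
gr <- function(x) drop(A %*% x - b)

x_direct <- solve(A, b)
est <- optim(rep(0, n), f, gr, method = "CG", control = list(maxit = 1000))

est$convergence               # 0 if optim() stopped before reaching maxit
max(abs(est$par - x_direct))  # discrepancy from the direct solution
```

At this size solve() is of course the better tool; the iterative route only pays off when A is too large to form or factor.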

-Deepayan


Re: Very slow optim()

Spencer Graves-4
Hi, Deepayan:


On 2021-03-13 01:27, Deepayan Sarkar wrote:

> Just a comment on this comment: I think it's perfectly reasonable to
> optimize 60k+ parameters with conjugate gradient. CG was originally
> developed to solve linear equations of the form Ax=b. If x was not
> large in size, one would just use solve(A, b) instead of an iterative
> method.
>
> Use of CG is quite common in image processing. A relatively small
> 300x300 image will give you 90k parameters.
>
> -Deepayan
>

          Thanks for this.


          If both A and b are 300x300, then x will also be 300x300.


          What do you do in this case if A is not square or even ill conditioned?


          Do you care if you get only one of many possible or approximate
solutions, and the algorithm spends most of its time making adjustments
in a singular subspace that would have best been avoided?


          Spencer


Re: Very slow optim()

Deepayan Sarkar
On Sat, Mar 13, 2021 at 4:33 PM Spencer Graves
<[hidden email]> wrote:

>
> Hi, Deepayan:
>
>           Thanks for this.
>
>
>           If both A and b are 300x300, then x will also be 300x300.

Sorry for being unclear: the images themselves (b or x) are viewed as
a vector, so would be 90000x1. A would be 90000x90000, so essentially
impossible to construct. CG can solve Ax=b as long as Ax can be
evaluated (for arbitrary x).
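
A minimal sketch of this matrix-free idea, written as plain linear CG rather than via optim(). The function names `cg_solve` and `apply_A`, and the particular operator (identity plus a multiple of the grid Laplacian, a simple smoothing-type operator that is symmetric positive definite), are assumptions chosen for illustration; A is never stored, only its action on a 90000-vector is computed.

```r
# Linear conjugate gradient for A x = b, where A is available only through
# a function apply_A(x) returning A %*% x. A must be symmetric positive definite.
cg_solve <- function(apply_A, b, tol = 1e-8, max_iter = 1000) {
  x <- numeric(length(b))
  r <- b - apply_A(x)  # residual
  p <- r               # search direction
  rs_old <- sum(r * r)
  n_iter <- 0L
  while (sqrt(rs_old) > tol && n_iter < max_iter) {
    Ap <- apply_A(p)
    alpha <- rs_old / sum(p * Ap)
    x <- x + alpha * p
    r <- r - alpha * Ap
    rs_new <- sum(r * r)
    p <- r + (rs_new / rs_old) * p
    rs_old <- rs_new
    n_iter <- n_iter + 1L
  }
  list(x = x, iterations = n_iter, residual_norm = sqrt(rs_old))
}

# Example operator on a 300 x 300 image stored as a vector of length 90000:
# A = I + lambda * L, with L the Laplacian of the pixel grid.
nr <- 300; nc <- 300; lambda <- 1
apply_A <- function(x) {
  img <- matrix(x, nr, nc)
  lap <- matrix(0, nr, nc)
  d <- img[-1, ] - img[-nr, ]        # differences between vertical neighbours
  lap[-1, ]  <- lap[-1, ] + d
  lap[-nr, ] <- lap[-nr, ] - d
  d <- img[, -1] - img[, -nc]        # differences between horizontal neighbours
  lap[, -1]  <- lap[, -1] + d
  lap[, -nc] <- lap[, -nc] - d
  as.vector(img + lambda * lap)
}

set.seed(1)
clean <- outer(sin(seq(0, pi, length.out = nr)), cos(seq(0, pi, length.out = nc)))
b <- as.vector(clean) + rnorm(nr * nc, sd = 0.1)  # noisy image as a vector

sol <- cg_solve(apply_A, b)
sol$iterations
sol$residual_norm
```

The dense 90000x90000 matrix would need tens of gigabytes, while each call to `apply_A` touches only a few arrays of 90000 numbers.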

>           What do you do in this case if A is not square or even ill conditioned?

A has to be symmetric positive definite (p.d.).

>           Do you care if you get only one of many possible or approximate
> solutions, and the algorithm spends most of its time making adjustments
> in a singular subspace that would have best been avoided?

Well, in my experience, optim(method="CG") behaves quite badly if A is
not full-rank. I don't think other implementations will be any better,
but I am not sure.

If you are interested in practical uses of this, we can discuss more off-list.

-Deepayan



Re: Very slow optim()

J C Nash
As per my post on this, it is important to distinguish between
"CG" as a general approach and optim::CG. The latter -- my algorithm 22
from Compact Numerical Methods for Computers in 1979 -- never worked
terribly well. But Rcgmin and Rtnmin from optimx often (but not always)
perform quite well.

There are some developments that I've poked at a few times. If the
optimization of very large numbers of parameters is of interest as a
general problem (rather than for a specific problem type), then this
would be worth pursuing for R.
However, we need some representative, and easy to set up, test problems.
Otherwise we get this very hand-waving list discussion.
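
One candidate for such a test problem, offered only as a sketch: the extended Rosenbrock function, which can be set up for any even number of parameters and has a known minimum of 0 at x = (1, ..., 1). The function names and the choice of n are illustrative assumptions, and this is just one of several standard scalable test functions.

```r
# Extended Rosenbrock function for an even number of parameters:
# sum over pairs (odd, even) of 100 * (even - odd^2)^2 + (1 - odd)^2
ext_rosenbrock <- function(x) {
  odd  <- x[c(TRUE, FALSE)]
  even <- x[c(FALSE, TRUE)]
  sum(100 * (even - odd^2)^2 + (1 - odd)^2)
}

# Analytic gradient, so no finite differences are needed.
ext_rosenbrock_gr <- function(x) {
  odd  <- x[c(TRUE, FALSE)]
  even <- x[c(FALSE, TRUE)]
  g <- numeric(length(x))
  g[c(TRUE, FALSE)] <- -400 * odd * (even - odd^2) - 2 * (1 - odd)
  g[c(FALSE, TRUE)] <- 200 * (even - odd^2)
  g
}

n <- 1000                          # any even number; increase to stress a method
start <- rep(c(-1.2, 1), n / 2)    # the customary starting point, repeated

fit <- optim(start, ext_rosenbrock, ext_rosenbrock_gr,
             method = "L-BFGS-B", control = list(maxit = 5000))
fit$value
fit$counts
```

Because the size is a single argument, the same few lines can be used to compare methods at 100, 10,000 or 60,000 parameters.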

JN


On 2021-03-13 6:56 a.m., Deepayan Sarkar wrote:

> Sorry for being unclear: the images themselves (b or x) are viewed as
> a vector, so would be 90000x1. A would be 90000x90000, so essentially
> impossible to construct. CG can solve Ax=b as long as Ax can be
> evaluated (for arbitrary x).
>
>>           What do you do in this case if A is not square or even ill conditioned?
>
> A has to be p.d.
>
>>           Do you care if you get only one of many possible or approximate
>> solutions, and the algorithm spends most of its time making adjustments
>> in a singular subspace that would have best been avoided?
>
> Well, in my experience, optim(method="CG") behaves quite badly if A is
> not full-rank. I don't think other implementations will be any better,
> but I am not sure.
>
> If you are interested in practical uses of this, we can discuss more off-list.
>
> -Deepayan