Nested AIC

Nested AIC

Aaron MacNeil
Greetings,
I have recently become confused over whether comparing models by AIC
requires that they be nested. Burnham & Anderson (2002) are explicit
that nesting is not required, but other respected statisticians have
suggested that nesting is a prerequisite for comparison. Could anyone
who feels strongly about either position post their arguments for or
against requiring nested models with AIC? This would assist me greatly
in an analysis I am currently conducting.
Many thanks,

Aaron

<<
m aaron macneil

school of marine science
     and technology
university of newcastle
newcastle upon tyne, uk
ne1 7ru

m.a.macneil at ncl.ac.uk
 >>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Nested AIC

Ruben Roa
-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Aaron MacNeil
Sent: 20 February 2006 15:17
To: [hidden email]
Subject: [R] Nested AIC

----
Hi, Aaron. Burnham & Anderson are explicit, but they do not go into any depth on this issue. Akaike's colleagues Sakamoto, Ishiguro, and Kitagawa (Akaike Information Criterion Statistics, 1986, KTK Scientific Publishers) do not deal with it directly either, and the examples they present that I have examined (fewer than half of those in the book) all involve nested models. However, from reading some of Akaike's papers and the book cited above, it does not appear to me that there is any restriction on the use of the AIC related to nestedness. In fact, the theory does not preclude comparing models with different *probability densities (or mass functions)* as long as you keep all constants (like 1/sqrt(2pi) in the normal) in the calculation.
Akaike (1973) stated his general principle in the first sentence of his paper, calling it an extension of the maximum likelihood principle:
"Given a set of estimates theta_hat's of the vector of parameters theta of a probability distribution with density f(x|theta) we adopt as our final estimate the one which will give the maximum of the expected log-likelihood, which is by definition
E(log f(X|theta_hat))=E(INTEGRAL f(x|theta)log f(x|theta_hat)dx)
Where X is a random variable following the distribution with the density function f(x|theta) and is independent of theta_hat".
All subsequent derivations in the paper, like the choice of distance measure, class of estimates, and elimination of the true parameter value, revolve around this principle. Now, nestedness is a mathematical property of what Burnham & Anderson call "the structural model", whereas Akaike's principle only concerns the probabilistic model f(x|theta) where the structural model is embedded.
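The point about keeping constants can be sketched in a few lines of Python. This is an illustration only, not something from the thread: the data values and the function names are made up. It fits a normal and a Laplace density to the same sample by maximum likelihood and computes AIC = -2 log L + 2k from the full log-likelihoods. The two families are not nested, but both are densities for the same continuous outcome. Dropping the 1/sqrt(2pi) factor from the normal shifts its AIC by n*log(2pi), which can change the ranking.

```python
import math
import statistics

def normal_loglik(x, keep_constant=True):
    """Maximised normal log-likelihood; optionally drop the 1/sqrt(2*pi) factor."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n  # ML estimate (divisor n)
    ll = -0.5 * n * (math.log(var) + 1.0)
    if keep_constant:
        ll -= 0.5 * n * math.log(2 * math.pi)
    return ll

def laplace_loglik(x):
    """Maximised Laplace log-likelihood (location = median, scale = mean abs. deviation)."""
    n = len(x)
    m = statistics.median(x)
    b = sum(abs(v - m) for v in x) / n
    return -n * (math.log(2 * b) + 1.0)

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

# Hypothetical sample, for illustration only.
x = [0.3, -1.2, 0.8, 2.9, -0.4, 0.1, -3.5, 0.6, 1.1, -0.2]
n = len(x)

aic_normal = aic(normal_loglik(x), k=2)    # mean and variance
aic_laplace = aic(laplace_loglik(x), k=2)  # location and scale
aic_normal_dropped = aic(normal_loglik(x, keep_constant=False), k=2)

print("normal, full likelihood:  ", round(aic_normal, 3))
print("Laplace, full likelihood: ", round(aic_laplace, 3))
print("normal, constant dropped: ", round(aic_normal_dropped, 3))
```

Only the first two numbers are comparable; the third differs from the first by exactly n*log(2pi), an artefact of the dropped constant rather than of the fit.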
I reply to you even though I do not feel strongly about this issue and you asked for replies from people who feel strongly about this issue.
Ruben


Re: Nested AIC

Thomas Lumley

This might be a more suitable message for, e.g., the stats-discuss mailing
list or one of the sci.stat.* newsgroups.

It is more complicated than it looks, partly because of the Anna Karenina
problem: all nested models are the same, but non-nested models can be
non-nested in different ways.

Some notes:

1) Sometimes the AIC is clearly inappropriate: e.g., comparing the fit of a
Poisson regression to a least-squares linear regression for count data.
Here the likelihoods are not densities with respect to the same
measure, so the likelihood ratio is meaningless.  You could also argue
that the linear model isn't really being fitted by maximum likelihood.

2) You need to be careful when fitting models with different R functions,
since they may omit different constants in the likelihood.

3) Transformations of the outcome are a problem. You can frame this as a
mathematical problem or just note the difficulty of saying what you mean
when you decide that the multiplicative error in one model is smaller than
the additive error in another model.

4) If you have two least-squares linear regression models with the same
outcome variable and different predictors then the AIC is choosing based
on a consistent estimate of the mean squared prediction error, and in that
sense it is a valid way to choose the model that predicts best.  This may
or may not be the criterion you want, but if it isn't what you want then
AIC isn't going to help.

5) If you have a large number of models then (nested or not) there is no
guarantee that the estimate of prediction error is *uniformly* consistent,
so the arguments behind AIC do not necessarily work.
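On note 3, one standard way to frame the transformation issue mathematically is the change-of-variables (Jacobian) term. The sketch below is an illustration under stated assumptions, not part of the original message: the data are made up, and both models are plain normal fits with no predictors. If log(y) is modelled as normal, the implied density of y is lognormal, so the log-likelihood on the y scale equals the log-scale log-likelihood minus sum(log(y)). Comparing the unadjusted AIC of the log-scale fit with the AIC of the raw-scale fit mixes two different outcome scales.

```python
import math

def normal_loglik(values):
    """Maximised normal log-likelihood with all constants kept."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # ML estimate (divisor n)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

# Hypothetical positive outcome, for illustration only.
y = [1.2, 0.8, 2.5, 3.1, 0.6, 1.9, 4.2, 1.1]
log_y = [math.log(v) for v in y]

aic_raw = aic(normal_loglik(y), k=2)            # y ~ Normal
aic_log_naive = aic(normal_loglik(log_y), k=2)  # log(y) ~ Normal, on the log scale

# Jacobian adjustment: express the second model as a density for y itself.
aic_log_on_y_scale = aic_log_naive + 2.0 * sum(log_y)

print("y ~ Normal:                       ", round(aic_raw, 3))
print("log(y) ~ Normal, unadjusted:      ", round(aic_log_naive, 3))
print("log(y) ~ Normal, on the y scale:  ", round(aic_log_on_y_scale, 3))
```

The adjustment handles only the mathematical side. The difficulty of saying what it means for a multiplicative error to be smaller than an additive one remains.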

        -thomas


Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle


Re: Nested AIC

Prof Brian Ripley
On Tue, 21 Feb 2006, Thomas Lumley wrote:

>
> This might be a more suitable message for, e.g., the stats-discuss mailing
> list or one of the sci.stat.* newsgroups.
>
> It is more complicated than it looks, partly because of the Anna Karenina
> problem: all nested models are the same, but non-nested models can be
> non-nested in different ways.

And I am sure Akaike appreciated that, which may be why he only (AFAIK)
derived a theoretical basis for AIC under strictly limited conditions
including nesting.

> Some notes:
>
> 5) If you have a large number of models then (nested or not) there is no
> guarantee that the estimate of prediction error is *uniformly* consistent,
> so the arguments behind AIC do not necessarily work.

(That only makes sense if the model class changes with 'n', suitably
defined.  You do get uniform consistency over a finite class of models,
one of Akaike (1973)'s conditions.  However, to use AIC you don't just
need a consistent estimator, but to worry about the consistency of
the O(1/n) term in the mean since AIC/n is effectively s^2 + 2p/n.)

One other note.

AIC/n is a consistent estimator but only if the model is true, and one
with a lot of sampling error.  Differences in AIC are much more precisely
estimated for a pair of nested models than for some non-nested pairs.  So
sampling error can make comparisons of AIC meaningless unless the
differences are large (and 'large' grows with 'n' for some appropriate
'n').
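The claim about sampling error can be explored with a small simulation. The following is a rough sketch under assumptions that are not in the thread: the data-generating models, coefficients, sample size, and function names are all hypothetical choices. It uses the Gaussian least-squares AIC, n*(log(2*pi) + 1 + log(RSS/n)) + 2k, and compares the spread of AIC differences for (a) a nested pair where the smaller model is true and (b) a non-nested pair of two equally misspecified simple regressions.

```python
import math
import random
import statistics

def rss_intercept_only(y):
    ybar = sum(y) / len(y)
    return sum((v - ybar) ** 2 for v in y)

def rss_simple_regression(x, y):
    """Residual sum of squares for y ~ 1 + x (closed form)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    syy = sum((b - ybar) ** 2 for b in y)
    return syy - sxy ** 2 / sxx

def gaussian_aic(rss, n, n_coef):
    k = n_coef + 1  # regression coefficients plus the error variance
    return n * (math.log(2 * math.pi) + 1.0 + math.log(rss / n)) + 2 * k

def simulate(n=200, reps=300, seed=1):
    """Return (sd of nested AIC differences, sd of non-nested AIC differences)."""
    rng = random.Random(seed)
    nested, non_nested = [], []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [rng.gauss(0, 1) for _ in range(n)]
        # (a) Nested pair, smaller model true: y ~ 1 versus y ~ 1 + x1.
        y0 = [rng.gauss(0, 1) for _ in range(n)]
        nested.append(gaussian_aic(rss_intercept_only(y0), n, 1)
                      - gaussian_aic(rss_simple_regression(x1, y0), n, 2))
        # (b) Non-nested pair, both misspecified: y ~ 1 + x1 versus y ~ 1 + x2.
        y1 = [0.5 * a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        non_nested.append(gaussian_aic(rss_simple_regression(x1, y1), n, 2)
                          - gaussian_aic(rss_simple_regression(x2, y1), n, 2))
    return statistics.stdev(nested), statistics.stdev(non_nested)

sd_nested, sd_non_nested = simulate()
print("sd of AIC difference, nested pair:    ", round(sd_nested, 2))
print("sd of AIC difference, non-nested pair:", round(sd_non_nested, 2))
```

In this particular setup the nested difference behaves like a chi-squared variable minus a constant, so its spread does not grow with n, while the spread of the non-nested difference grows roughly like sqrt(n). Other non-nested pairs can behave differently, which is consistent with the "some non-nested pairs" qualification above.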

A recent talk of mine

  http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf

may be illuminating.  There is a published paper version.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Selecting amongst large classes of models (Was: Nested AIC)

Andrew Robinson-6
Professor Ripley,

On Tue, Feb 21, 2006 at 05:36:15PM +0000, Prof Brian Ripley wrote:
>
> A recent talk of mine
>
>   http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf
>
> may be illuminating.  There is a published paper version.
>

Would you mind providing a citation for that published paper version?
I do not find details on your website, and Current Contents does not
provide any clues.

Cheers,

Andrew
--
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
Email: [hidden email]         http://www.ms.unimelb.edu.au


Re: Selecting amongst large classes of models (Was: Nested AIC)

Prof Brian Ripley
On Wed, 22 Feb 2006, Andrew Robinson wrote:

> Would you mind providing a citation for that published paper version?
> I do not find details on your website, and Current Contents does not
> provide any clues.

Ripley, B.D. (2004)
`Selecting amongst large classes of models'
In `Methods and Models in Statistics',
eds Adams, N., Crowder, M., Hand, D.J. and Stephens, D. Imperial College
Press, pp. 155-170.

