Quantcast

How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Jhope
Hi R-listers,

I have developed 47 GLM models with different combinations of interactions from 1 variable to 5 variables. I have manually made each model separately and put them into individual tables (organized by the number of variables) showing the AIC score. I want to compare all of these models.

1) What is the best way to compare various models with unique combinations and different number of variables?
2) I am trying to develop the most simplest model ideally. Even though adding another variable would lower the AIC, how do I interpret it is worth it to include another variable in the model? How do I know when to stop?

Definitions of Variables:
HTL - distance to high tide line (continuous)
Veg - distance to vegetation
Aeventexhumed - Event of exhumation
Sector - number measurements along the beach
Rayos - major sections of beach (grouped sectors)
TotalEggs - nest egg density

Example of how all models were created:
Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, data=data.to.analyze, family=binomial)
Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial, data.to.analyze)
Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs, data.to.analyze, family = binomial)
Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)

Please advise, thanks!
J
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Milan Bouchet-Valat
Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit :
> Hi R-listers,
>
> I have developed 47 GLM models with different combinations of interactions
> from 1 variable to 5 variables. I have manually made each model separately
> and put them into individual tables (organized by the number of variables)
> showing the AIC score. I want to compare all of these models.
>
> 1) What is the best way to compare various models with unique combinations
> and different number of variables?
See ?step or ?stepAIC (from package MASS) if you want an automated way
of doing this.

> 2) I am trying to develop the most simplest model ideally. Even though
> adding another variable would lower the AIC, how do I interpret it is worth
> it to include another variable in the model? How do I know when to stop?
This is a general statistical question, not specific to R. As a general
rule, if adding a variable lowers the AIC by a significant margin, then
it's worth including it. You should only stop when a variable increases
the AIC. But this is assuming you consider it a good indicator and you
know what you're doing. There's plenty of literature on this subject.

> Definitions of Variables:
> HTL - distance to high tide line (continuous)
> Veg - distance to vegetation
> Aeventexhumed - Event of exhumation
> Sector - number measurements along the beach
> Rayos - major sections of beach (grouped sectors)
> TotalEggs - nest egg density
>
> Example of how all models were created:
> Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> data=data.to.analyze, family=binomial)
> Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
> binomial, data.to.analyze)
> Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
> data.to.analyze, family = binomial)
> Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
> HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
To extract the AICs of all these models, it's easier to put them in a
list and get their AICs like this:
m <- list()
m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
data=data.to.analyze, family=binomial)
m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
binomial, data.to.analyze)

sapply(m, extractAIC)


Cheers

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Rubén Roa

A more 'manual' way to do it is this.
If you have all your models named properly and well organized, say Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces a list with one component named 'aic') then something like this:

x <- matrix(0,1081,3)
x[,1:2] <- t(combn(47,2))
for(i in 1:n){x[,3] <- abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic[1,])-unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[2,]))}

will calculate all the 1081 AIC differences between paired comparisons of the 47 models. It may not be pretty but works for me.

An AIC difference equal or less than 2 is a tie, anything higher is evidence for the model with the less AIC (Sakamoto et al., Akaike Information Criterion Statistics, KTK Scientific Publishers, Tokyo).

HTH

Rubén H. Roa-Ureta, Ph. D.
AZTI Tecnalia, Txatxarramendi Ugartea z/g,
Sukarrieta, Bizkaia, SPAIN


-----Original Message-----
From: [hidden email] on behalf of Milan Bouchet-Valat
Sent: Wed 1/25/2012 10:32 AM
To: Jhope
Cc: [hidden email]
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
 
Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit :
> Hi R-listers,
>
> I have developed 47 GLM models with different combinations of interactions
> from 1 variable to 5 variables. I have manually made each model separately
> and put them into individual tables (organized by the number of variables)
> showing the AIC score. I want to compare all of these models.
>
> 1) What is the best way to compare various models with unique combinations
> and different number of variables?
See ?step or ?stepAIC (from package MASS) if you want an automated way
of doing this.

> 2) I am trying to develop the most simplest model ideally. Even though
> adding another variable would lower the AIC, how do I interpret it is worth
> it to include another variable in the model? How do I know when to stop?
This is a general statistical question, not specific to R. As a general
rule, if adding a variable lowers the AIC by a significant margin, then
it's worth including it. You should only stop when a variable increases
the AIC. But this is assuming you consider it a good indicator and you
know what you're doing. There's plenty of literature on this subject.

> Definitions of Variables:
> HTL - distance to high tide line (continuous)
> Veg - distance to vegetation
> Aeventexhumed - Event of exhumation
> Sector - number measurements along the beach
> Rayos - major sections of beach (grouped sectors)
> TotalEggs - nest egg density
>
> Example of how all models were created:
> Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> data=data.to.analyze, family=binomial)
> Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
> binomial, data.to.analyze)
> Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
> data.to.analyze, family = binomial)
> Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
> HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
To extract the AICs of all these models, it's easier to put them in a
list and get their AICs like this:
m <- list()
m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
data=data.to.analyze, family=binomial)
m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
binomial, data.to.analyze)

sapply(m, extractAIC)


Cheers

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Frank Harrell
In reply to this post by Jhope
If you are trying to destroy all aspects of statistical inference this is a good way to go.  This is also a good way to ignore the subject matter in driving model selection.
Frank
Jhope wrote
Hi R-listers,

I have developed 47 GLM models with different combinations of interactions from 1 variable to 5 variables. I have manually made each model separately and put them into individual tables (organized by the number of variables) showing the AIC score. I want to compare all of these models.

1) What is the best way to compare various models with unique combinations and different number of variables?
2) I am trying to develop the most simplest model ideally. Even though adding another variable would lower the AIC, how do I interpret it is worth it to include another variable in the model? How do I know when to stop?

Definitions of Variables:
HTL - distance to high tide line (continuous)
Veg - distance to vegetation
Aeventexhumed - Event of exhumation
Sector - number measurements along the beach
Rayos - major sections of beach (grouped sectors)
TotalEggs - nest egg density

Example of how all models were created:
Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, data=data.to.analyze, family=binomial)
Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial, data.to.analyze)
Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs, data.to.analyze, family = binomial)
Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)

Please advise, thanks!
J
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

bbolker
In reply to this post by Rubén Roa
Rubén Roa <rroa <at> azti.es> writes:

> A more 'manual' way to do it is this.

> If you have all your models named properly and well organized, say
> Model1, Model2, ..., Model 47, with a slot with the AIC (glm
> produces a list with one component named 'aic') then something like
> this:
 
> x <- matrix(0,1081,3) x[,1:2] <- t(combn(47,2)) for(i in 1:n){x[,3]
> <- abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic[1,])-
> unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[2,]))}

 
> will calculate all the 1081 AIC differences between paired
> comparisons of the 47 models. It may not be pretty but works for me.

 
> An AIC difference equal or less than 2 is a tie, anything higher is
> evidence for the model with the less AIC (Sakamoto et al., Akaike
> Information Criterion Statistics, KTK Scientific Publishers, Tokyo).
 

  I wouldn't go quite as far as Frank Harrell did in his answer, but

(1) it seems silly to do all pairwise comparisons of models; one of
the big advantages of information-theoretic approaches is that you can
just list the models in order of AIC ...

(2) the dredge() function from the MuMIn package may be useful for
automating this sort of thing.  There is an also an AICtab function in
the emdbook package.

(3) If you're really just interested in picking the single model with
the best expected predictive capability (and all of the other
assumptions of the AIC approach are met -- very large data set, good
fit to the data, etc.), then just picking the model with the best AIC
will work.  It's when you start to make inferences on the individual
parameters within that model, without taking account of the model
selection process, that you run into trouble.  If you are really
concerned about good predictions then it may be better to do
multi-model averaging *or* use some form of shrinkage estimator.

(4) The "delta-AIC<2 means pretty much the same" rule of thumb is
fine, but it drives me nuts when people use it as a
quasi-significance-testing rule, as in "the simpler model is adequate
if dAIC<2".  If you're going to work in the model selection arena,
then don't follow hypothesis-testing procedures!  A smaller AIC means
lower expected K-L distance, period.

For the record, Brian Ripley has often expressed the (minority)
opinion that AIC is *not* appropriate for comparing non-nested models
(e.g.  <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).


>
> -----Original Message-----
> From: r-help-bounces <at> r-project.org on behalf of Milan Bouchet-Valat
> Sent: Wed 1/25/2012 10:32 AM
> To: Jhope
> Cc: r-help <at> r-project.org

> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
>  interactions and unique combinations?

 

> Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit :
> > Hi R-listers,
> >
> > I have developed 47 GLM models with different combinations of interactions
> > from 1 variable to 5 variables. I have manually made each model separately
> > and put them into individual tables (organized by the number of variables)
> > showing the AIC score. I want to compare all of these models.
> >
> > 1) What is the best way to compare various models with unique combinations
> > and different number of variables?
> See ?step or ?stepAIC (from package MASS) if you want an automated way
> of doing this.
>
> > 2) I am trying to develop the most simplest model ideally. Even though
> > adding another variable would lower the AIC,

  No, not necessarily.

> how do I interpret it is worth
> > it to include another variable in the model? How do I know when to stop?

  When the AIC stops decreasing.

> This is a general statistical question, not specific to R. As a general
> rule, if adding a variable lowers the AIC by a significant margin, then
> it's worth including it.

  If the variable lowers the AIC *at all* then your best estimate is that
you should include it.

> You should only stop when a variable increases
> the AIC. But this is assuming you consider it a good indicator and you
> know what you're doing. There's plenty of literature on this subject.
>
> > Definitions of Variables:
> > HTL - distance to high tide line (continuous)
> > Veg - distance to vegetation
> > Aeventexhumed - Event of exhumation
> > Sector - number measurements along the beach
> > Rayos - major sections of beach (grouped sectors)
> > TotalEggs - nest egg density
> >
> > Example of how all models were created:
> > Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> > data=data.to.analyze, family=binomial)
> > Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
> > binomial, data.to.analyze)
> > Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
> > data.to.analyze, family = binomial)
> > Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
> > HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
> To extract the AICs of all these models, it's easier to put them in a
> list and get their AICs like this:
> m <- list()
> m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> data=data.to.analyze, family=binomial)
> m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
> binomial, data.to.analyze)
>
> sapply(m, extractAIC)
>
> Cheers
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Rubén Roa
I think we have gone through this before.
First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding.
Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by -2AIC or more. Just sorting them by the AIC does not give you that if no model is better than the next best by less than 2AIC.
Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy or as a relative of significance tests procedures.
Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the model_ then I check numerical optimization diagnostics such as size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates. For some reasons I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.

Rubén

-----Mensaje original-----
De: [hidden email] [mailto:[hidden email]] En nombre de Ben Bolker
Enviado el: miércoles, 25 de enero de 2012 15:41
Para: [hidden email]
Asunto: Re: [R]How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Rubén Roa <rroa <at> azti.es> writes:

> A more 'manual' way to do it is this.

> If you have all your models named properly and well organized, say
> Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces
> a list with one component named 'aic') then something like
> this:
 
> x <- matrix(0,1081,3) x[,1:2] <- t(combn(47,2)) for(i in 1:n){x[,3]
> <- abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic[1,])-
> unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[2,]))}

 
> will calculate all the 1081 AIC differences between paired comparisons
> of the 47 models. It may not be pretty but works for me.

 
> An AIC difference equal or less than 2 is a tie, anything higher is
> evidence for the model with the less AIC (Sakamoto et al., Akaike
> Information Criterion Statistics, KTK Scientific Publishers, Tokyo).
 

  I wouldn't go quite as far as Frank Harrell did in his answer, but

(1) it seems silly to do all pairwise comparisons of models; one of the big advantages of information-theoretic approaches is that you can just list the models in order of AIC ...

(2) the dredge() function from the MuMIn package may be useful for automating this sort of thing.  There is an also an AICtab function in the emdbook package.

(3) If you're really just interested in picking the single model with the best expected predictive capability (and all of the other assumptions of the AIC approach are met -- very large data set, good fit to the data, etc.), then just picking the model with the best AIC will work.  It's when you start to make inferences on the individual parameters within that model, without taking account of the model selection process, that you run into trouble.  If you are really concerned about good predictions then it may be better to do multi-model averaging *or* use some form of shrinkage estimator.

(4) The "delta-AIC<2 means pretty much the same" rule of thumb is fine, but it drives me nuts when people use it as a quasi-significance-testing rule, as in "the simpler model is adequate if dAIC<2".  If you're going to work in the model selection arena, then don't follow hypothesis-testing procedures!  A smaller AIC means lower expected K-L distance, period.

For the record, Brian Ripley has often expressed the (minority) opinion that AIC is *not* appropriate for comparing non-nested models (e.g.  <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).


>
> -----Original Message-----
> From: r-help-bounces <at> r-project.org on behalf of Milan
> Bouchet-Valat
> Sent: Wed 1/25/2012 10:32 AM
> To: Jhope
> Cc: r-help <at> r-project.org

> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5  
> interactions and unique combinations?

 

> Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit :
> > Hi R-listers,
> >
> > I have developed 47 GLM models with different combinations of
> > interactions from 1 variable to 5 variables. I have manually made
> > each model separately and put them into individual tables (organized
> > by the number of variables) showing the AIC score. I want to compare all of these models.
> >
> > 1) What is the best way to compare various models with unique
> > combinations and different number of variables?
> See ?step or ?stepAIC (from package MASS) if you want an automated way
> of doing this.
>
> > 2) I am trying to develop the most simplest model ideally. Even
> > though adding another variable would lower the AIC,

  No, not necessarily.

> how do I interpret it is worth
> > it to include another variable in the model? How do I know when to stop?

  When the AIC stops decreasing.

> This is a general statistical question, not specific to R. As a
> general rule, if adding a variable lowers the AIC by a significant
> margin, then it's worth including it.

  If the variable lowers the AIC *at all* then your best estimate is that you should include it.

> You should only stop when a variable increases the AIC. But this is
> assuming you consider it a good indicator and you know what you're
> doing. There's plenty of literature on this subject.
>
> > Definitions of Variables:
> > HTL - distance to high tide line (continuous) Veg - distance to
> > vegetation Aeventexhumed - Event of exhumation Sector - number
> > measurements along the beach Rayos - major sections of beach
> > (grouped sectors) TotalEggs - nest egg density
> >
> > Example of how all models were created:
> > Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> > data=data.to.analyze, family=binomial) Model7.glm <-
> > glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial,
> > data.to.analyze) Model21.glm <- glm(cbind(Shells, TotalEggs-Shells)
> > ~ HTL:Veg:TotalEggs, data.to.analyze, family = binomial) Model37.glm
> > <- glm(cbind(Shells, TotalEggs-Shells) ~
> > HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
> To extract the AICs of all these models, it's easier to put them in a
> list and get their AICs like this:
> m <- list()
> m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> data=data.to.analyze, family=binomial)
> m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
> binomial, data.to.analyze)
>
> sapply(m, extractAIC)
>
> Cheers
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Jhope
This post was updated on .
In reply to this post by Rubén Roa
Thank you everyone for your dedication to improving 'R' and its function to statistical analysis and your comments.

I have now 48 models (unique combinations of 1 to 5 variables) and have put them into a list and gained the results for all models. Below is a sample of my script & results:

m$model48 <- Model48.glm
> sapply(m, extractAIC)

      model47  model48
[1,]    8.000   46.000
[2,] 6789.863 3549.543

Q
1) How do you interpret these results? What is 1 and 2?
2) I have been suggested to try a quasibinomial, once responses are fixed. Is this necessary? If so is there a way I can do this by considering all these models ?
3) Then gaussian?

Much appreciated!
Saludos, J
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

How do I compare a multiple staged response with multivariables to a Development Index?

Jhope
This post was updated on .
Hi R- listeners,

I should add that I would like also to compare my field data to an index model. The index was created by using the following script:

devel.index <- function(values, weights=c(1, 2, 3, 4, 5, 6)) {
  foo <- values*weights
  return(apply(foo, 1, sum) / apply(values, 1, sum))
}

Background:
Surveyed turtle egg embryos have been categorized into 6 stages of development in the field. The stages in the field data are named ST0, ST1, ST2, ST3, ST4, Shells. from the data = data.to.analyze.

Q?
1. What is the best way to analyze the field data on embryonic development of 6 stages?
2. Doing this while considering, testing the variables: Veg, HTL, Aeventexhumed, Sector, Rayos, TotalEggs?
3. And then compare the results to a development index.

The goal is to determine hatching success in various areas of the beach. And try to create a development index of these microenvironments. Seasonality would play a key role. Is this possible?

Many thanks!
Saludos, J

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare a multiple staged response with multivariables to a Development Index?

Michael Weylandt
This might get more traction on the R-SIG-Ecology lists.

And best of luck to you; I quite like turtles.

Michael

On Thu, Jan 26, 2012 at 4:37 AM, Jhope <[hidden email]> wrote:

> Hi R- listeners,
>
> I should add that I would like also to compare my field data to an index
> model. The index was created by using the following script:
>
> devel.index <- function(values, weights=c(1, 2, 3, 4, 5, 6)) {
>  foo <- values*weights
>  return(apply(foo, 1, sum) / apply(values, 1, sum))
> }
>
> Background:
> Surveyed turtle egg embryos have been categorized into 6 stages of
> development in the field. The stages in the field data are named ST0, ST1,
> ST2, ST3, ST4, Shells. from the data = data.to.analyze.
>
> Q?
> 1. What is the best way to analyze the field data on embryonic development
> of 6 stages?
> 2. Doing this while considering, testing the variables: Veg, HTL,
> Aeventexhumed, Sector, Rayos, TotalEggs?
> 3. And then compare the results to a development index.
>
> The goal is to determine hatching success in various areas of the beach. And
> try to create a development index of these microenvironments. Seasonality
> would play a key role. Is this possible?
>
> Many thanks!
> Saludos, Jean
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4329909.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Frank Harrell
In reply to this post by Rubén Roa
To pretend that AIC solves this problem is to ignore that AIC is just a restatement of P-values.
Frank
Rubén Roa wrote
I think we have gone through this before.
First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding.
Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by -2AIC or more. Just sorting them by the AIC does not give you that if no model is better than the next best by less than 2AIC.
Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy or as a relative of significance tests procedures.
Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the model_ then I check numerical optimization diagnostics such as size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates. For some reasons I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.

Rubén

-----Mensaje original-----
De: [hidden email] [mailto:[hidden email]] En nombre de Ben Bolker
Enviado el: miércoles, 25 de enero de 2012 15:41
Para: [hidden email]
Asunto: Re: [R]How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Rubén Roa <rroa <at> azti.es> writes:

> A more 'manual' way to do it is this.

> If you have all your models named properly and well organized, say
> Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces
> a list with one component named 'aic') then something like
> this:
 
> x <- matrix(0,1081,3) x[,1:2] <- t(combn(47,2)) for(i in 1:n){x[,3]
> <- abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic[1,])-
> unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[2,]))}

 
> will calculate all the 1081 AIC differences between paired comparisons
> of the 47 models. It may not be pretty but works for me.

 
> An AIC difference equal or less than 2 is a tie, anything higher is
> evidence for the model with the less AIC (Sakamoto et al., Akaike
> Information Criterion Statistics, KTK Scientific Publishers, Tokyo).
 

  I wouldn't go quite as far as Frank Harrell did in his answer, but

(1) it seems silly to do all pairwise comparisons of models; one of the big advantages of information-theoretic approaches is that you can just list the models in order of AIC ...

(2) the dredge() function from the MuMIn package may be useful for automating this sort of thing.  There is an also an AICtab function in the emdbook package.

(3) If you're really just interested in picking the single model with the best expected predictive capability (and all of the other assumptions of the AIC approach are met -- very large data set, good fit to the data, etc.), then just picking the model with the best AIC will work.  It's when you start to make inferences on the individual parameters within that model, without taking account of the model selection process, that you run into trouble.  If you are really concerned about good predictions then it may be better to do multi-model averaging *or* use some form of shrinkage estimator.

(4) The "delta-AIC<2 means pretty much the same" rule of thumb is fine, but it drives me nuts when people use it as a quasi-significance-testing rule, as in "the simpler model is adequate if dAIC<2".  If you're going to work in the model selection arena, then don't follow hypothesis-testing procedures!  A smaller AIC means lower expected K-L distance, period.

For the record, Brian Ripley has often expressed the (minority) opinion that AIC is *not* appropriate for comparing non-nested models (e.g.  <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).


>
> -----Original Message-----
> From: r-help-bounces <at> r-project.org on behalf of Milan
> Bouchet-Valat
> Sent: Wed 1/25/2012 10:32 AM
> To: Jhope
> Cc: r-help <at> r-project.org

> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5  
> interactions and unique combinations?

 
> Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit :
> > Hi R-listers,
> >
> > I have developed 47 GLM models with different combinations of
> > interactions from 1 variable to 5 variables. I have manually made
> > each model separately and put them into individual tables (organized
> > by the number of variables) showing the AIC score. I want to compare all of these models.
> >
> > 1) What is the best way to compare various models with unique
> > combinations and different number of variables?
> See ?step or ?stepAIC (from package MASS) if you want an automated way
> of doing this.
>
> > 2) I am trying to develop the most simplest model ideally. Even
> > though adding another variable would lower the AIC,

  No, not necessarily.

> how do I interpret it is worth
> > it to include another variable in the model? How do I know when to stop?

  When the AIC stops decreasing.

> This is a general statistical question, not specific to R. As a
> general rule, if adding a variable lowers the AIC by a significant
> margin, then it's worth including it.

  If the variable lowers the AIC *at all* then your best estimate is that you should include it.

> You should only stop when a variable increases the AIC. But this is
> assuming you consider it a good indicator and you know what you're
> doing. There's plenty of literature on this subject.
>
> > Definitions of Variables:
> > HTL - distance to high tide line (continuous) Veg - distance to
> > vegetation Aeventexhumed - Event of exhumation Sector - number
> > measurements along the beach Rayos - major sections of beach
> > (grouped sectors) TotalEggs - nest egg density
> >
> > Example of how all models were created:
> > Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> > data=data.to.analyze, family=binomial) Model7.glm <-
> > glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial,
> > data.to.analyze) Model21.glm <- glm(cbind(Shells, TotalEggs-Shells)
> > ~ HTL:Veg:TotalEggs, data.to.analyze, family = binomial) Model37.glm
> > <- glm(cbind(Shells, TotalEggs-Shells) ~
> > HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
> To extract the AICs of all these models, it's easier to put them in a
> list and get their AICs like this:
> m <- list()
> m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> data=data.to.analyze, family=binomial)
> m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
> binomial, data.to.analyze)
>
> sapply(m, extractAIC)
>
> Cheers
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Jhope
In reply to this post by bbolker
I ask the question about when to stop adding another variable even though it lowers the AIC because each time I add a variable the AIC is lower. How do I know when the model is a good fit? When to stop adding variables, keeping the model simple?

Thanks, J
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Bert Gunter
Simple question. 8 million pages in the statistical literature of
answers. What, indeed, is the secret to life?

Post on a statistical help list (e.g. stats.stackexchange.com). This
has almost nothing to do with R. Be prepared for an onslaught of often
conflicting "wisdom."

-- Bert

On Thu, Jan 26, 2012 at 1:25 PM, Jhope <[hidden email]> wrote:

> I ask the question about when to stop adding another variable even though it
> lowers the AIC because each time I add a variable the AIC is lower. How do I
> know when the model is a good fit? When to stop adding variables, keeping
> the model simple?
>
> Thanks, J
>
> --
> View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331848.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Rubén Roa
In reply to this post by Rubén Roa
-----Mensaje original-----
De: Bert Gunter [mailto:[hidden email]]
Enviado el: jueves, 26 de enero de 2012 21:20
Para: Rubén Roa
CC: Ben Bolker; Frank Harrell
Asunto: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa <[hidden email]> wrote:
> I think we have gone through this before.
> First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding.
> Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by -2AIC or more. Just sorting them by the AIC does not give you that if no model is better than the next best by less than 2AIC.
> Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy or as a relative of significance tests procedures.
> Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the model_ then I check numerical optimization diagnostics such as size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates.

-- And the mathematical basis for this claim is ....  ??

--
Ruben: In my area of work (building/testing/applying mechanistic nonlinear models of natural and economic phenomena) the issue of numerical optimization is a very serious one. It is often the case that a really good looking model does not converge properly (that's why ADMB is so popular among us). So if the AIC is inconclusive, but one AIC-tied model yields reasonably looking standard errors and low pairwise parameter estimates correlations, while the other wasn´t even able to produce a positive definite Hessian matrix (though it was able to maximize the log-likelihood), I think I have good reasons to select the properly converged model. I assume here that the lack of convergence is a symptom of model inadequacy to the data, that the AIC was not able to detect.
---
Ruben: For some reasons I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.

-- This is a religious, not a scientific statement, and has no place in any scientific discussion.

--
Ruben: Seriously, there is a wide and open place in scientific discussion for mechanistic model-building. I expect the models I built to be more than able predictors, I want them to capture some aspect of nature, to teach me something about nature, so I refrain from model averaging, which is an open admission that you don't care too much about what's really going on.

-- The belief that certain data analysis practices -- standard or not -- somehow can be trusted to yield reliable scientific results in the face of clear theoretical (mathematical )and practical results to the contrary, while widespread, impedes and often thwarts the progress of science, Evidence-based medicine and clinical trials came about for a reason. I would encourage you to reexamine the basis of your scientific practice and the role that "magical thinking" plays in it.

Best,
Bert

--
Ruben: All right Bert. I often re-examine the basis of my scientific praxis but less often than I did before, I have to confess. I like to think it is because I am converging on the right praxis so there are less issues to re-examine. But this problem of model selection is a tough one. Being a likelihoodist in inference naturally leads you to AIC-based model selection, Jim Lindsey being influent too. Wanting that your models say some something about nature is another strong conditioning factor. Put this two together and it becomes hard to do model-averaging. And it has nothing to do with religion (yuck!).

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Frank Harrell
Ruben, I'm not sure you are understanding the ramifications of what Bert said.  In addition you are making several assumptions implicitly:

1. model selection is needed (vs. fitting the full model and using shrinkage)
2. model selection works in the absence of shrinkage
3. model selection can find the "right" model and the features selected would be the same features selected if the data were slightly perturbed or a new sample taken
4. AIC tells you something that P-values don't (unless using structured multiple degree of freedom tests)
5. parsimony is good

None of these assumptions is true.  Model selection without shrinkage (penalization) seems to offer benefits but this is largely a mirage.

Frank

Rubén Roa wrote
-----Mensaje original-----
De: Bert Gunter [mailto:[hidden email]]
Enviado el: jueves, 26 de enero de 2012 21:20
Para: Rubén Roa
CC: Ben Bolker; Frank Harrell
Asunto: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa <[hidden email]> wrote:
> I think we have gone through this before.
> First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding.
> Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by -2AIC or more. Just sorting them by the AIC does not give you that if no model is better than the next best by less than 2AIC.
> Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy or as a relative of significance tests procedures.
> Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the model_ then I check numerical optimization diagnostics such as size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates.

-- And the mathematical basis for this claim is ....  ??

--
Ruben: In my area of work (building/testing/applying mechanistic nonlinear models of natural and economic phenomena) the issue of numerical optimization is a very serious one. It is often the case that a really good looking model does not converge properly (that's why ADMB is so popular among us). So if the AIC is inconclusive, but one AIC-tied model yields reasonably looking standard errors and low pairwise parameter estimates correlations, while the other wasn´t even able to produce a positive definite Hessian matrix (though it was able to maximize the log-likelihood), I think I have good reasons to select the properly converged model. I assume here that the lack of convergence is a symptom of model inadequacy to the data, that the AIC was not able to detect.
---
Ruben: For some reasons I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.

-- This is a religious, not a scientific statement, and has no place in any scientific discussion.

--
Ruben: Seriously, there is a wide and open place in scientific discussion for mechanistic model-building. I expect the models I built to be more than able predictors, I want them to capture some aspect of nature, to teach me something about nature, so I refrain from model averaging, which is an open admission that you don't care too much about what's really going on.

-- The belief that certain data analysis practices -- standard or not -- somehow can be trusted to yield reliable scientific results in the face of clear theoretical (mathematical )and practical results to the contrary, while widespread, impedes and often thwarts the progress of science, Evidence-based medicine and clinical trials came about for a reason. I would encourage you to reexamine the basis of your scientific practice and the role that "magical thinking" plays in it.

Best,
Bert

--
Ruben: All right Bert. I often re-examine the basis of my scientific praxis but less often than I did before, I have to confess. I like to think it is because I am converging on the right praxis so there are less issues to re-examine. But this problem of model selection is a tough one. Being a likelihoodist in inference naturally leads you to AIC-based model selection, Jim Lindsey being influent too. Wanting that your models say some something about nature is another strong conditioning factor. Put this two together and it becomes hard to do model-averaging. And it has nothing to do with religion (yuck!).

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Rubén Roa
-----Mensaje original-----
De: [hidden email] [mailto:[hidden email]] En nombre de Frank Harrell
Enviado el: viernes, 27 de enero de 2012 14:28
Para: [hidden email]
Asunto: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Ruben, I'm not sure you are understanding the ramifications of what Bert said.  In addition you are making several assumptions implicitly:

--
Ruben: Frank, I guess we are going nowhere now.
But thanks anyways. See below if you want.

1. model selection is needed (vs. fitting the full model and using shrinkage)
Ruben: Nonlinear mechanistic models that are too complex often just don't converge, they crash. No shrinkage to apply to a failed convergence model.

2. model selection works in the absence of shrinkage
Ruben: I think you are assuming that shrinkage is necessary.

3. model selection can find the "right" model and the features selected would be the same features selected if the data were slightly perturbed or a new sample taken
Ruben: I don't make this assumption. New data, new model.

4. AIC tells you something that P-values don't (unless using structured multiple degree of freedom tests)
Ruben: It does.

 5. parsimony is good
Ruben: It is.

None of these assumptions is true.  Model selection without shrinkage
(penalization) seems to offer benefits but this is largely a mirage.

Ruben: Have a good weekend!

Ruben

Rubén Roa wrote

>
> -----Mensaje original-----
> De: Bert Gunter [mailto:gunter.berton@] Enviado el: jueves, 26 de
> enero de 2012 21:20
> Para: Rubén Roa
> CC: Ben Bolker; Frank Harrell
> Asunto: Re: [R] How do I compare 47 GLM models with 1 to 5
> interactions and unique combinations?
>
> On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa &lt;rroa@&gt; wrote:
>> I think we have gone through this before.
>> First, the destruction of all aspects of statistical inference is not
>> at stake, Frank Harrell's post notwithstanding.
>> Second, checking all pairs is a way to see for _all pairs_ which
>> model outcompetes which in terms of predictive ability by -2AIC or
>> more. Just sorting them by the AIC does not give you that if no model
>> is better than the next best by less than 2AIC.
>> Third, I was not implying that AIC differences play the role of
>> significance tests. I agree with you that model selection is better
>> not understood as a proxy or as a relative of significance tests procedures.
>> Incidentally, when comparing many models the AIC is often inconclusive.
>> If one is bent on selecting just _the model_ then I check numerical
>> optimization diagnostics such as size of gradients, KKT criteria, and
>> other issues such as standard errors of parameter estimates and the
>> correlation matrix of parameter estimates.
>
> -- And the mathematical basis for this claim is ....  ??
>
> --
> Ruben: In my area of work (building/testing/applying mechanistic
> nonlinear models of natural and economic phenomena) the issue of
> numerical optimization is a very serious one. It is often the case
> that a really good looking model does not converge properly (that's
> why ADMB is so popular among us). So if the AIC is inconclusive, but
> one AIC-tied model yields reasonably looking standard errors and low
> pairwise parameter estimates correlations, while the other wasn´t even
> able to produce a positive definite Hessian matrix (though it was able
> to maximize the log-likelihood), I think I have good reasons to select
> the properly converged model. I assume here that the lack of
> convergence is a symptom of model inadequacy to the data, that the AIC was not able to detect.
> ---
> Ruben: For some reasons I don't find model averaging appealing. I
> guess deep in my heart I expect more from my model than just the best
> predictive ability.
>
> -- This is a religious, not a scientific statement, and has no place
> in any scientific discussion.
>
> --
> Ruben: Seriously, there is a wide and open place in scientific
> discussion for mechanistic model-building. I expect the models I built
> to be more than able predictors, I want them to capture some aspect of
> nature, to teach me something about nature, so I refrain from model
> averaging, which is an open admission that you don't care too much
> about what's really going on.
>
> -- The belief that certain data analysis practices -- standard or not
> -- somehow can be trusted to yield reliable scientific results in the
> face of clear theoretical (mathematical )and practical results to the
> contrary, while widespread, impedes and often thwarts the progress of
> science, Evidence-based medicine and clinical trials came about for a
> reason. I would encourage you to reexamine the basis of your
> scientific practice and the role that "magical thinking" plays in it.
>
> Best,
> Bert
>
> --
> Ruben: All right Bert. I often re-examine the basis of my scientific
> praxis but less often than I did before, I have to confess. I like to
> think it is because I am converging on the right praxis so there are
> less issues to re-examine. But this problem of model selection is a tough one.
> Being a likelihoodist in inference naturally leads you to AIC-based
> model selection, Jim Lindsey being influent too. Wanting that your
> models say some something about nature is another strong conditioning
> factor. Put this two together and it becomes hard to do
> model-averaging. And it has nothing to do with religion (yuck!).
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4333464.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Frank Harrell
Ruben you are mistaken on every single point.  But I see it's not worth continuing this discussion.
Frank
Rubén Roa wrote
-----Mensaje original-----
De: [hidden email] [mailto:[hidden email]] En nombre de Frank Harrell
Enviado el: viernes, 27 de enero de 2012 14:28
Para: [hidden email]
Asunto: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Ruben, I'm not sure you are understanding the ramifications of what Bert said.  In addition you are making several assumptions implicitly:

--
Ruben: Frank, I guess we are going nowhere now.
But thanks anyways. See below if you want.

1. model selection is needed (vs. fitting the full model and using shrinkage)
Ruben: Nonlinear mechanistic models that are too complex often just don't converge, they crash. No shrinkage to apply to a failed convergence model.

2. model selection works in the absence of shrinkage
Ruben: I think you are assuming that shrinkage is necessary.

3. model selection can find the "right" model and the features selected would be the same features selected if the data were slightly perturbed or a new sample taken
Ruben: I don't make this assumption. New data, new model.

4. AIC tells you something that P-values don't (unless using structured multiple degree of freedom tests)
Ruben: It does.

 5. parsimony is good
Ruben: It is.

None of these assumptions is true.  Model selection without shrinkage
(penalization) seems to offer benefits but this is largely a mirage.

Ruben: Have a good weekend!

Ruben

Rubén Roa wrote
>
> -----Mensaje original-----
> De: Bert Gunter [mailto:gunter.berton@] Enviado el: jueves, 26 de
> enero de 2012 21:20
> Para: Rubén Roa
> CC: Ben Bolker; Frank Harrell
> Asunto: Re: [R] How do I compare 47 GLM models with 1 to 5
> interactions and unique combinations?
>
> On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa <rroa@> wrote:
>> I think we have gone through this before.
>> First, the destruction of all aspects of statistical inference is not
>> at stake, Frank Harrell's post notwithstanding.
>> Second, checking all pairs is a way to see for _all pairs_ which
>> model outcompetes which in terms of predictive ability by -2AIC or
>> more. Just sorting them by the AIC does not give you that if no model
>> is better than the next best by less than 2AIC.
>> Third, I was not implying that AIC differences play the role of
>> significance tests. I agree with you that model selection is better
>> not understood as a proxy or as a relative of significance tests procedures.
>> Incidentally, when comparing many models the AIC is often inconclusive.
>> If one is bent on selecting just _the model_ then I check numerical
>> optimization diagnostics such as size of gradients, KKT criteria, and
>> other issues such as standard errors of parameter estimates and the
>> correlation matrix of parameter estimates.
>
> -- And the mathematical basis for this claim is ....  ??
>
> --
> Ruben: In my area of work (building/testing/applying mechanistic
> nonlinear models of natural and economic phenomena) the issue of
> numerical optimization is a very serious one. It is often the case
> that a really good looking model does not converge properly (that's
> why ADMB is so popular among us). So if the AIC is inconclusive, but
> one AIC-tied model yields reasonably looking standard errors and low
> pairwise parameter estimates correlations, while the other wasn´t even
> able to produce a positive definite Hessian matrix (though it was able
> to maximize the log-likelihood), I think I have good reasons to select
> the properly converged model. I assume here that the lack of
> convergence is a symptom of model inadequacy to the data, that the AIC was not able to detect.
> ---
> Ruben: For some reasons I don't find model averaging appealing. I
> guess deep in my heart I expect more from my model than just the best
> predictive ability.
>
> -- This is a religious, not a scientific statement, and has no place
> in any scientific discussion.
>
> --
> Ruben: Seriously, there is a wide and open place in scientific
> discussion for mechanistic model-building. I expect the models I built
> to be more than able predictors, I want them to capture some aspect of
> nature, to teach me something about nature, so I refrain from model
> averaging, which is an open admission that you don't care too much
> about what's really going on.
>
> -- The belief that certain data analysis practices -- standard or not
> -- somehow can be trusted to yield reliable scientific results in the
> face of clear theoretical (mathematical )and practical results to the
> contrary, while widespread, impedes and often thwarts the progress of
> science, Evidence-based medicine and clinical trials came about for a
> reason. I would encourage you to reexamine the basis of your
> scientific practice and the role that "magical thinking" plays in it.
>
> Best,
> Bert
>
> --
> Ruben: All right Bert. I often re-examine the basis of my scientific
> praxis but less often than I did before, I have to confess. I like to
> think it is because I am converging on the right praxis so there are
> less issues to re-examine. But this problem of model selection is a tough one.
> Being a likelihoodist in inference naturally leads you to AIC-based
> model selection, Jim Lindsey being influent too. Wanting that your
> models say some something about nature is another strong conditioning
> factor. Put this two together and it becomes hard to do
> model-averaging. And it has nothing to do with religion (yuck!).
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4333464.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Greg Snow-2
In reply to this post by Jhope
What variables to consider adding and when to stop adding them depends greatly upon what question(s) you are trying to answer and the science behind your data.

Are you trying to create a model to predict your outcome for future predictors?  How precise of predictions are needed?

Are you trying to understand how certain predictors relate to the response? How they relate after conditioning on other predictors?

Will humans be using your equation directly? Or will it be in a black box that the computer generates predictions from but people never need to look at the details?

What is the cost (money, time, difficulty, etc.) of collecting the different predictors?

Answers to the above questions will be much more valuable in choosing the "best" model than AIC or other values (though you should still look at the results from analyses for information to combine with the other information).  R and its programmers (no matter how great and wonderful they are) cannot answer these for you.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
801.408.8111


> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-
> project.org] On Behalf Of Jhope
> Sent: Thursday, January 26, 2012 2:26 PM
> To: [hidden email]
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
> interactions and unique combinations?
>
> I ask the question about when to stop adding another variable even
> though it
> lowers the AIC because each time I add a variable the AIC is lower. How
> do I
> know when the model is a good fit? When to stop adding variables,
> keeping
> the model simple?
>
> Thanks, J
>
> --
> View this message in context: http://r.789695.n4.nabble.com/How-do-I-
> compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-
> tp4326407p4331848.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Loading...