Logistic regression model selection with overdispersed/autocorrelated data

Jesse.Whittington


I am creating habitat selection models for caribou and other species with
data collected from GPS collars.  In my current situation the radio-collars
recorded the locations of 30 caribou every 6 hours.  I am then comparing
resources used at caribou locations to random locations using logistic
regression (standard habitat analysis).

The data are therefore highly autocorrelated, which inflates Type I error in
two ways: standard errors around the beta-coefficients are too small, and
models become over-parameterized during model selection.  Robust standard
errors are easily calculated by block-bootstrapping the data using “animal”
as a cluster with the Design library; however, I haven’t found a
satisfactory solution for model selection.
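
To make the block-bootstrap idea concrete, here is a minimal base R sketch. It is not the Design/bootcov code used in this thread: it resamples whole animals by hand and refits with glm(). The data are simulated, and every name in it (dat, x1, x2, n_per, and so on) is hypothetical.

```r
# Cluster (block) bootstrap of logistic-regression coefficients:
# resample whole animals, not individual locations.
# Simulated data; all object and column names are hypothetical.
set.seed(42)
n_animals <- 30
n_per     <- 40
dat <- data.frame(
  animal = rep(seq_len(n_animals), each = n_per),
  x1     = rnorm(n_animals * n_per),
  x2     = rnorm(n_animals * n_per)
)
# An animal-level random effect induces within-animal correlation.
u     <- rnorm(n_animals, sd = 1)
dat$y <- rbinom(nrow(dat), 1, plogis(-0.3 + 0.7 * dat$x1 + u[dat$animal]))

fit <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Row indices belonging to each animal.
rows_by_animal <- split(seq_len(nrow(dat)), dat$animal)
B <- 200  # kept small for speed; use many more replicates in practice

boot_coefs <- t(replicate(B, {
  picked <- sample(names(rows_by_animal), replace = TRUE)
  idx    <- unlist(rows_by_animal[picked], use.names = FALSE)
  coef(glm(y ~ x1 + x2, family = binomial, data = dat[idx, ]))
}))

naive_se  <- sqrt(diag(vcov(fit)))      # model-based standard errors
robust_se <- apply(boot_coefs, 2, sd)   # cluster-bootstrap standard errors
round(cbind(naive_se, robust_se), 3)
```

Comparing the two columns shows how far the model-based standard errors depart from the ones that respect the clustering by animal.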

A couple of options are:
1.  Using QAIC, where the deviance is divided by a variance inflation factor
(VIF; Burnham & Anderson).  However, this VIF can vary greatly depending on
the data set and the set of covariates used in the global model.
2.  Manual forward stepwise regression using both changes in deviance and
robust p-values for the beta-coefficients.
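
For reference, the QAIC of option 1 is usually written as below, following Burnham & Anderson. The symbols are the conventional ones, not notation from this thread: L(θ̂) is the maximized likelihood of the candidate model, K is its number of estimated parameters (plus one for estimating ĉ), and ĉ is the variance inflation factor, commonly estimated as the global model's goodness-of-fit chi-square divided by its degrees of freedom.

```latex
\mathrm{QAIC} = -\frac{2 \log L(\hat{\theta})}{\hat{c}} + 2K,
\qquad
\hat{c} = \frac{\chi^{2}_{\text{global}}}{\mathrm{df}_{\text{global}}}
```

Because ĉ is taken from the global model, changing the covariates in that model changes ĉ and therefore every QAIC value in the candidate set.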

I have been looking for a solution to this problem for a couple of years and
would appreciate any advice.

Jesse

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Logistic regression model selection with overdispersed/autocorrelated data

Frank Harrell

If you must do non-subject-matter-driven model selection, look at the
fastbw function in Design, which will use the cluster bootstrap variance
matrix.

Frank



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: Logistic regression model selection with overdispersed/autocorrelated data

Renaud Lancelot
In reply to this post by Jesse.Whittington
If you're not interested in fitting caribou-specific responses, you
can use beta-binomial logistic models. There are several packages
available for this purpose on CRAN, including aod. Because these
models are fitted using maximum-likelihood methods, you can use AIC
(or other information criteria) to compare different models.
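
A minimal sketch of that workflow, assuming the aod package is installed. It uses aod's bundled orob2 seed-germination data purely as a stand-in, since no caribou data appear in this thread. Note that betabin() expects grouped counts (successes and failures per group), so location-level data would first have to be aggregated, for example per animal.

```r
# Beta-binomial logistic models compared by AIC, using the aod package.
# orob2 is a demonstration data set shipped with aod, not habitat data.
library(aod)
data(orob2)

# cbind(successes, failures) ~ fixed effects; "~ 1" requests a single
# overdispersion parameter shared by all groups.
fm1 <- betabin(cbind(y, n - y) ~ seed,        ~ 1, data = orob2)
fm2 <- betabin(cbind(y, n - y) ~ seed + root, ~ 1, data = orob2)

# Information criteria for the two candidate models.
AIC(fm1, fm2)
```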

Best,

Renaud

2006/1/30, [hidden email] <[hidden email]>:
> [...]

--
Renaud LANCELOT
Département Elevage et Médecine Vétérinaire (EMVT) du CIRAD
Directeur adjoint chargé des affaires scientifiques

CIRAD, Animal Production and Veterinary Medicine Department
Deputy director for scientific affairs

Campus international de Baillarguet
TA 30 / B (Bât. B, Bur. 214)
34398 Montpellier Cedex 5 - France
Tél   +33 (0)4 67 59 37 17
Secr. +33 (0)4 67 59 39 04
Fax   +33 (0)4 67 59 37 95


Re: Logistic regression model selection with overdispersed/autocorrelated data

Jesse.Whittington
In reply to this post by Frank Harrell

Frank E Harrell Jr wrote:
> If you must do non-subject-matter-driven model selection, look at the
> fastbw function in Design, which will use the cluster bootstrap variance
> matrix.
>
> Frank


Thanks for the tip.  I didn't know that the fastbw function could account
for the clustered variance.  For others, the code to run such a model from
the Design library would be:

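library(Design)
# fit the model, storing the design matrix and response for bootcov()
model.1 <- lrm(y ~ x1 + x2 + x3 + x4, data = data, x = TRUE, y = TRUE)
# cluster bootstrap by animal to get a robust variance matrix
model.2 <- bootcov(model.1, cluster = data$animal, B = 10000)
# backward step-wise selection using that variance matrix
fastbw(model.2)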

Later we will examine individual caribou responses to trails
(subject-specific model selection).  For this we plan to use mixed effects
models (lmer).  Is this what you would also recommend?
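
A minimal sketch of what such a subject-specific model could look like, assuming the lme4 package. In current lme4, binomial mixed models are fitted with glmer() rather than lmer(). The data are simulated, and trail_dist is a hypothetical covariate standing in for a trail-related measure.

```r
# Mixed-effects logistic regression with a random intercept and a random
# slope per animal. Simulated data; all names are hypothetical.
library(lme4)
set.seed(7)
n_animals <- 30
n_per     <- 40
dat <- data.frame(
  animal     = factor(rep(seq_len(n_animals), each = n_per)),
  trail_dist = rnorm(n_animals * n_per)
)
# Each animal gets its own intercept and its own response to trails.
b0 <- rnorm(n_animals, mean = -0.3, sd = 0.8)
b1 <- rnorm(n_animals, mean =  0.6, sd = 0.5)
a  <- as.integer(dat$animal)
dat$y <- rbinom(nrow(dat), 1, plogis(b0[a] + b1[a] * dat$trail_dist))

fit <- glmer(y ~ trail_dist + (1 + trail_dist | animal),
             family = binomial, data = dat)

fixef(fit)              # population-level coefficients
head(coef(fit)$animal)  # animal-specific intercepts and slopes
```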

I look forward to reading the new edition of your book when it is
published.

Jesse
