conservative robust estimation in (nonlinear) mixed models

dave fournier

Conservative robust estimation methods do not appear to be
currently available in the standard mixed model methods for R,
where by conservative robust estimation I mean methods which
work almost as well as the methods based on assumptions of
normality when the assumption of normality *IS* satisfied.

We are considering adding such a conservative robust estimation option
for the random effects to our AD Model Builder mixed model package,
glmmADMB, for R, and perhaps extending it to do robust estimation for
linear mixed models at the same time.

An obvious candidate is to assume something like a mixture of
normals. I have tested this in a simple linear mixed model
using 5% contamination with a normal with 3 times the standard
deviation, which seems to be a common assumption. Simulation results
indicate that when the random effects are normally distributed this
estimator is about 3% less efficient, while when the random effects
are contaminated with 5% outliers the estimator is about 23% more
efficient. By 23% more efficient I mean that one would have to use a
sample size about 23% larger with the normal theory estimator to
obtain the same size confidence limits for the parameters.
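
For concreteness, here is a minimal Python sketch of the contaminated
normal described above: 95% of the mass from a normal with standard
deviation sd and 5% from a normal with 3 times that standard deviation.
It is not the glmmADMB code; the function names and the parameters
eps (contamination proportion) and ratio (ratio of standard deviations)
are assumptions made for illustration.

```python
import math
import random

def norm_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def contaminated_pdf(x, sd=1.0, eps=0.05, ratio=3.0):
    """Density of (1 - eps) * N(0, sd^2) + eps * N(0, (ratio * sd)^2)."""
    return (1.0 - eps) * norm_pdf(x, sd) + eps * norm_pdf(x, ratio * sd)

def contaminated_variance(sd=1.0, eps=0.05, ratio=3.0):
    """Variance of the mixture (both components have mean zero)."""
    return sd ** 2 * ((1.0 - eps) + eps * ratio ** 2)

def contaminated_sample(n, sd=1.0, eps=0.05, ratio=3.0, rng=None):
    """Draw n values: each one is an outlier with probability eps."""
    rng = rng or random.Random()
    return [rng.gauss(0.0, ratio * sd if rng.random() < eps else sd)
            for _ in range(n)]
```

With the defaults the mixture variance is 0.95 + 0.05 * 9 = 1.4 times
the variance of the central component, so a small contamination
proportion already inflates the variance noticeably.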

Question?

I wonder if there are other distributions besides a mixture of normals
which might be preferable. Three things to keep in mind are:

    1.)  It should be likelihood based so that the standard likelihood
         based tests are applicable.

    2.)  It should work well when the random effects are normally
         distributed so that things that are already fixed don't get
         broke.

    3.)  In order to implement the method efficiently it is necessary to
         be able to produce code for calculating the inverse of the
         cumulative distribution function. This enables one to extend
         methods based on the Laplace approximation for the random
         effects (i.e. the Laplace approximation itself, adaptive
         Gaussian integration, adaptive importance sampling) to the new
         distribution.
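
As an illustration of point 3, the inverse cumulative distribution
function of a normal mixture has no closed form, but it can be computed
numerically because the CDF is continuous and strictly increasing. The
Python sketch below uses plain bisection. It is only a sketch under
stated assumptions: the function names are hypothetical, the defaults
(5% contamination, ratio 3) come from the example above, and the last
function shows one possible use (mapping a standard normal variable to
a mixture variable), which is an assumption about how the inverse CDF
would be applied rather than something stated in this post.

```python
import math

def norm_cdf(x, sd=1.0):
    """CDF of N(0, sd^2) at x."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))

def mixture_cdf(x, sd=1.0, eps=0.05, ratio=3.0):
    """CDF of (1 - eps) * N(0, sd^2) + eps * N(0, (ratio * sd)^2)."""
    return (1.0 - eps) * norm_cdf(x, sd) + eps * norm_cdf(x, ratio * sd)

def mixture_inv_cdf(p, sd=1.0, eps=0.05, ratio=3.0, tol=1e-12):
    """Solve mixture_cdf(x) = p for x by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = -sd, sd
    # Widen the bracket until it contains the solution.
    while mixture_cdf(lo, sd, eps, ratio) > p:
        lo *= 2.0
    while mixture_cdf(hi, sd, eps, ratio) < p:
        hi *= 2.0
    # Halve the bracket until it is narrower than the tolerance.
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, sd, eps, ratio) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def normal_to_mixture(u, sd=1.0, eps=0.05, ratio=3.0):
    """Map a standard normal value u to the mixture scale (assumed use).

    Raises ValueError if norm_cdf(u) rounds to 0 or 1 in floating point.
    """
    return mixture_inv_cdf(norm_cdf(u), sd, eps, ratio)
```

Bisection is slow compared with a Newton iteration and it is not
differentiable code, so a production version would likely need something
smoother; the sketch is only meant to show that the quantity is well
defined and cheap to approximate.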

      Dave

--
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: conservative robust estimation in (nonlinear) mixed models

Spencer Graves
          I know of two fairly common models for robust methods.  One is the
contaminated normal that you mentioned.  The other is Student's t.  A
normal plot of the data or of residuals will often indicate whether the
assumption of normality is plausible or not;  when the plot indicates
problems, it will often also indicate whether a contaminated normal or
Student's t would be better.
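
A normal plot pairs the sorted values with the matching quantiles of a
standard normal. The Python sketch below computes those pairs without
drawing them; the function name and the plotting positions (i + 0.5) / n
are assumptions chosen for illustration, and other plotting positions
are in common use.

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Return (theoretical normal quantile, sorted value) pairs."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i + 0.5) / n), x)
            for i, x in enumerate(xs)]
```

If the points fall close to a straight line, normality is plausible;
tails that bend away from the line at both ends suggest a longer-tailed
distribution.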

          Using Student's t introduces one additional parameter.  A
contaminated normal would introduce 2;  however, in many applications
the contamination proportion (or its logit) will be highly
correlated with the ratio of the contamination standard deviation to
that of the central portion of the distribution.  Thus, it's often
wise to fix the ratio of the standard deviations and estimate
only the contamination proportion.
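
A minimal Python sketch of that last suggestion, under strong
simplifying assumptions: independent observations with mean zero, a
known central standard deviation, and the ratio fixed at 3. The names
loglik and estimate_eps are hypothetical. With the ratio fixed, the
density is linear in the contamination proportion, so the
log-likelihood is concave in it and a simple grid search finds the
maximum to within the grid spacing.

```python
import math
import random

def norm_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def loglik(eps, data, sd=1.0, ratio=3.0):
    """Log-likelihood of contamination proportion eps, ratio held fixed."""
    return sum(math.log((1.0 - eps) * norm_pdf(x, sd)
                        + eps * norm_pdf(x, ratio * sd))
               for x in data)

def estimate_eps(data, sd=1.0, ratio=3.0, steps=200, upper=0.5):
    """Grid-search estimate of eps on [0, upper]."""
    grid = [upper * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda e: loglik(e, data, sd, ratio))

if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated data: 5% of the values come from the wider normal.
    data = [rng.gauss(0.0, 3.0 if rng.random() < 0.05 else 1.0)
            for _ in range(500)]
    print("estimated contamination proportion:", estimate_eps(data))
```

In a real mixed model the central standard deviation and the fixed
effects would be estimated at the same time, which this sketch does not
attempt.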

          hope this helps.
          spencer graves

Re: conservative robust estimation in (nonlinear) mixed models

Bert Gunter
Ok, since Spencer has dived in, I'll go public. (I made some prior private
remarks to David because I didn't think they were worth wasting the list's
bandwidth on. Heck, they may still not be...)

My question: isn't the difficult issue which levels of the (co)variance
hierarchy get longer-tailed distributions rather than which distributions
are used to model long tails? Seems to me that there is an inherent
identifiability issue here, and even more so with nonlinear models. It's
easy to construct examples where it all essentially depends on your priors.

Cheers,
Bert

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
 

Re: conservative robust estimation in (nonlinear) mixed models

Spencer Graves
          Bert raised an issue I had overlooked.  Ideally, we would like to be
able to specify a different "family" for the observations and for each
random effect, with Student's t and contaminated normal as valid options
in both places.

          If I were allowed to specify a family (or a robust family) for either
observations or for random effects but not both, I think I'd pick the
observations.  I don't know, but I wonder if misspecification of the
observation distribution might create more problems with estimation and
inference than misspecification of the distribution of a random effect.
  As Bert indicated, there may be identifiability issues here, and the
choice of a model may depend on one's hypotheses about the situation
being modeled.

          spencer graves

conservative robust estimation in (nonlinear) mixed models

dave fournier
In reply to this post by Bert Gunter
I believe that Bert's comments are a non sequitur.
I did not and do not propose identifying which components
of the model are contaminated by outliers. What I do propose
is the more or less routine use of conservative robust methods
to replace the normal theory estimators. By definition such estimators
are to be almost as efficient as the normal theory estimators in the
case where the normal theory applies. One may argue that
conservative robust estimators do not exist for this class of
problems. I think they do, but the obvious way to establish this
claim is to carry out simulations.

Before such simulations can be carried out one must create the
software to do the analysis. So I am proposing to add that to our
R package glmmADMB. Then other R users can carry out their own
simulation analysis to investigate how the method performs.
I think that normal mixtures are better candidates for
conservative robust estimators than, say, Student's t distribution,
but I will try to include both (and perhaps any others that appear
useful).
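
To show the kind of simulation meant here, below is a minimal Python
sketch for a much simpler problem than a mixed model: estimating a
single location parameter from independent observations. It compares
the sample mean (the normal theory estimator) with the maximum
likelihood estimator under a contaminated normal with known scale, 5%
contamination and ratio 3. Everything in it (function names, sample
size, number of replicates, the EM-style reweighting used to maximize
the likelihood) is an assumption made for illustration, and it does
not reproduce the mixed model results quoted earlier in the thread.

```python
import math
import random
import statistics

def weight(r, sd=1.0, eps=0.05, ratio=3.0):
    """Expected precision (relative to 1/sd^2) given residual r."""
    f0 = (1.0 - eps) * math.exp(-0.5 * (r / sd) ** 2) / sd
    f1 = eps * math.exp(-0.5 * (r / (ratio * sd)) ** 2) / (ratio * sd)
    return (f0 + f1 / ratio ** 2) / (f0 + f1)

def robust_location(xs, sd=1.0, eps=0.05, ratio=3.0, tol=1e-10, max_iter=500):
    """Location MLE under the contaminated normal via reweighted means."""
    mu = statistics.median(xs)
    for _ in range(max_iter):
        ws = [weight(x - mu, sd, eps, ratio) for x in xs]
        new_mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

def relative_efficiency(eps_true, n=40, reps=200, seed=1):
    """Variance of the sample mean divided by variance of the robust estimate.

    Values above 1 favour the robust estimator. The true location is 0.
    """
    rng = random.Random(seed)
    means, robust = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 3.0 if rng.random() < eps_true else 1.0)
              for _ in range(n)]
        means.append(sum(xs) / n)
        robust.append(robust_location(xs))
    return statistics.pvariance(means) / statistics.pvariance(robust)

if __name__ == "__main__":
    print("normal data:      ", relative_efficiency(0.0))
    print("contaminated data:", relative_efficiency(0.05))
```

The variance ratio corresponds to the sample size interpretation of
efficiency used earlier in the thread; many more replicates than the
default would be needed before reading anything into the numbers.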

      Dave

