[R] Goodness of fit with robust regression


[R] Goodness of fit with robust regression

Celso Barros
Dear list members,

       I have been doing robust regressions in R, using the MASS package for
rlm and the robustbase package for robust logistic regressions. I must be
doing something wrong, because my output does not include R-squared (or
adjusted R-squared) values, or, in the case of glmrob, a -2 log-likelihood.
Does anyone know how to get an output that includes these?

        Thanks so much for the help

                                                        Celso


--
Celso F. Rocha de Barros
DPhil candidate in Sociology, University of Oxford


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Goodness of fit with robust regression

agent dunham
I have the same problem; can anybody help?

I would also like to see the p-values associated with the t-values of the coefficients.

At present I type summary(mod1.rlm) and neither of these things appears.

Thanks, [hidden email]
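One hedged workaround (not from the thread itself): summary() on an rlm fit reports t values but no p-values, because the reference distribution for M-estimators is not exact. If you are willing to treat the t statistics as approximately t-distributed on the residual degrees of freedom, you can compute approximate p-values yourself. The fit below is illustrative, using the built-in stackloss data rather than the poster's mod1.rlm:

```r
library(MASS)  # for rlm

# Illustrative robust fit on a built-in dataset
mod1.rlm <- rlm(stack.loss ~ ., data = stackloss)

# Coefficient table: Value, Std. Error, t value (no p-values)
ct <- summary(mod1.rlm)$coefficients

# Approximate p-values, treating the t statistics as t-distributed
# on n - p residual degrees of freedom (a common approximation,
# not an exact result for M-estimation)
dof <- nrow(stackloss) - length(coef(mod1.rlm))
pvals <- 2 * pt(-abs(ct[, "t value"]), df = dof)
cbind(ct, "Pr(>|t|)" = pvals)
```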
Re: Goodness of fit with robust regression

Spencer Graves-2
       I'm not an expert on robust modeling.  However, as far as I know,
most robust regression procedures are based on heuristics, justified by
the claim that "they seem to work" rather than by reference to assumptions
about a probability model that would make the procedures "optimal".  There
may be exceptions for procedures that assume a linear model plus noise
following a Student's t distribution or a contaminated normal.  Thus,
if you can't get a traditional R-squared from a standard robust regression
function, it may be because the people who wrote the function thought
that R-squared ("percent of variance explained") did not make sense
in that context.  This is particularly true for robust generalized linear
models.


       Fortunately, the prospects are not as grim as this explanation
might suggest:  The summary method for an "lmrob" object (from the
robustbase package) returned for me the standard table with estimates,
standard errors, t values, and p values for the regression
coefficients.  The robustbase package also includes an anova method for
two nested lmrob models.  This returns pseudoDF (a replacement for the
degrees of freedom), Test.Stat (analogous to 2*log(likelihood ratio)),
Df, and Pr(>chisq).  In addition to the 5 references in the lmrob help
page, help(package = robustbase) describes the package as ' "Essential"
Robust Statistics.  The goal is to provide tools allowing to analyze data
with robust methods.  This includes regression methodology including model
selections and multivariate statistics where we strive to cover the book
"Robust Statistics, Theory and Methods" by Maronna, Martin and Yohai;
Wiley 2006.'
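The workflow described above can be sketched as follows (a minimal illustration, not from the original thread, assuming the robustbase package is installed and using the built-in stackloss data):

```r
library(robustbase)  # for lmrob and its anova method

# Robust MM-type fit; summary() gives the usual coefficient table,
# including p-values
full <- lmrob(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
              data = stackloss)
summary(full)

# Robust comparison of two nested lmrob models via the anova method
reduced <- lmrob(stack.loss ~ Air.Flow + Water.Temp, data = stackloss)
anova(full, reduced)
```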


       I chose to use lmrob, because it seemed the obvious choice from a
search I did of Jonathan Baron's database of contributed R packages:


library(sos)
rls <- findFn('robust fit')     # 477 matches;  retrieved 400
rls.m <- findFn('robust model') # 2404 matches;  retrieved 400
rls. <- rls | rls.m # union of the two searches
installPackages(rls.)
# install missing packages with many matches
# so we can get more information about those packages
writeFindFn2xls(rls.)
# produce an Excel file with a package summary
# as well as a table of the individual matches


       Hope this helps.
       Spencer Graves


p.s.  The functions in MASS are very good.  I did not use rlm in this
case primarily because MASS was package number 27 in the package summary
in the Excel file produced by the above script.  Beyond that,
methods(class='rlm') identified predict, print, se.contrast, summary and
vcov methods for rlm objects, and showMethods(class='rlm') returned
nothing.  Conclusion:  If there is an anova method for rlm objects, I
couldn't find it.




Re: Goodness of fit with robust regression

Bert Gunter
Hi:

Just a few additional comments to Spencer's.

1. There is an R-SIG-Robust mailing list that you may wish to post your
question to if you have not already done so. There you **should** find
experts to help you (I am not one of them either).

2. The situation regarding the effectiveness of robust techniques is
not as grim as Spencer seems to imply. The heuristics have been backed up
by more than 30 years of simulations and asymptotic work. Asymptotic
optimality (which may or may not be all that useful) has been shown
for certain approaches in certain situations. Generally and vaguely
speaking, many "standard" robust estimation methods (e.g. M-estimators
with the Tukey biweight) have been found to be fairly harmless
(little loss of efficiency) when the data are well-behaved, and
potentially much better when they are not.
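To illustrate the point (a sketch not from the original thread, using simulated data): a few gross outliers can pull an ordinary least-squares fit well away from the truth, while rlm stays close to the true coefficients:

```r
library(MASS)  # for rlm

set.seed(42)
x <- seq(1, 10, length.out = 50)
y <- 2 + 3 * x + rnorm(50)   # true intercept 2, slope 3
y[1:5] <- y[1:5] + 25        # contaminate five observations

coef(lm(y ~ x))   # pulled noticeably toward the outliers
coef(rlm(y ~ x))  # remains close to (2, 3)
```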

3. A big problem in establishing reference distributions (for
inference) for the various approaches is that the robustness weights
depend on the data and on fitting parameters. If you can automate the
selection of these in some way, simulation can always be used -- and
that is, I believe, what is generally recommended for inference.

4. Probably the most important and difficult issue in applying robust
methods is finding good starting values. That is why MASS's rlm
function has a well-thought-out strategy: a very low-efficiency but
high-resistance (and typically computationally intensive) estimator
produces "safe" initial starting guesses, from which an M-estimator
then iterates to a solution (typically very quickly). This is also
something that needs to be automated (as rlm does) for
simulation/bootstrap-based inference.
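A hedged sketch of the kind of bootstrap inference described above (not from the thread; simple case resampling of the built-in stackloss data, with maxit raised so refits on resampled data are more likely to converge):

```r
library(MASS)  # for rlm

set.seed(1)
# Case-resampling bootstrap of the rlm coefficients
boot_coef <- replicate(999, {
  idx <- sample(nrow(stackloss), replace = TRUE)
  coef(rlm(stack.loss ~ ., data = stackloss[idx, ], maxit = 100))
})

# Percentile 95% confidence intervals, one column per coefficient
apply(boot_coef, 1, quantile, probs = c(0.025, 0.975))
```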

But again: Please refer to (1) above. All of my "advice" is subject to
modification by the folks there.

Cheers,
Bert





--
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml
