how to test robustness of correlation


how to test robustness of correlation

yang.x.qiu
Hi, there:

As you all know, correlation is not a very robust procedure.  Sometimes
a correlation can be driven by a few outliers. There are a few ways to
improve the robustness of correlation (Pearson correlation), either by an
outlier-removal procedure or by a resampling technique.

I am wondering if there is any R package or R code that incorporates
outlier removal or resampling procedures in calculating a correlation
coefficient.

Your help is greatly appreciated.

Thanks.
Yang

Yang Qiu
Integrated Data Analysis
Cheminformatics@RTP
GlaxoSmithKline

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: how to test robustness of correlation

Bert Gunter
Check out cov.rob() in MASS (among others, I'm sure). The procedure is far
more sophisticated than "outlier removal" or resampling (??). References are
given in the docs.
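A minimal sketch of what Bert suggests, assuming simulated data with three planted gross points (the data and variable names are hypothetical, chosen only for illustration). It compares the classical Pearson r with the robust correlations that MASS::cov.rob() returns for its "mve" and "mcd" methods.

```r
## Sketch: classical vs. robust correlation via MASS::cov.rob().
## The data are simulated (hypothetical); nothing here comes from the thread.
library(MASS)                     # cov.rob() ships with R in package MASS

set.seed(42)
n <- 50
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.3)       # population correlation about 0.96
x[1:3] <- c(8, 9, 10)             # overwrite three points with gross values
y[1:3] <- c(-8, -9, -10)

xy <- cbind(x, y)
r_classical <- cor(x, y)
r_mve <- cov.rob(xy, cor = TRUE, method = "mve")$cor[1, 2]
r_mcd <- cov.rob(xy, cor = TRUE, method = "mcd")$cor[1, 2]

round(c(classical = r_classical, mve = r_mve, mcd = r_mcd), 3)
```

With the three planted points, the classical r is pulled far away from the value for the bulk of the data, while both robust estimates should stay close to it.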

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of
> [hidden email]
> Sent: Wednesday, January 25, 2006 12:37 PM
> To: [hidden email]
> Subject: [R] how to test robustness of correlation


Re: how to test robustness of correlation

yang.x.qiu
Hi, Berton:
Thanks for getting back to me.

I played around with cov.rob().  Yes, I can get a robust correlation
coefficient matrix based on the mcd or mve outlier detection methods.

I have two further questions:

1) How do I get a p-value for the robust r?
2) What I mean by resampling is a "leave one out" procedure to get a
confidence interval for r.  Do you know if there is any package in R that
does it?  I suppose I could code it myself, but it would be nice if one
already exists.
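Yang's "leave one out" idea is what Bert later calls jackknifing. A minimal base-R sketch, assuming Bert's toy data from later in the thread with a seed added; `jackknife_cor()` is a hypothetical helper written for this note, not a function from any package. It applies the standard jackknife bias and standard-error formulas, and the leave-one-out values also show which single pair drives r.

```r
## Sketch: leave-one-out (jackknife) values of Pearson's r.
## jackknife_cor() is a hypothetical helper, not part of any package.
jackknife_cor <- function(x, y) {
  n <- length(x)
  r_full <- cor(x, y)
  # recompute r n times, each time dropping one (x, y) pair
  r_loo <- vapply(seq_len(n), function(i) cor(x[-i], y[-i]), numeric(1))
  r_bar <- mean(r_loo)
  list(
    r = r_full,
    r_loo = r_loo,
    bias = (n - 1) * (r_bar - r_full),                  # jackknife bias estimate
    se = sqrt((n - 1) / n * sum((r_loo - r_bar)^2)),    # jackknife standard error
    most_influential = which.max(abs(r_loo - r_full))   # pair whose removal moves r most
  )
}

set.seed(1)                                   # seed added for reproducibility
x <- c(1:10, 100)
y <- c(1:10 + rnorm(10, sd = 0.25), -100)
jk <- jackknife_cor(x, y)
jk$most_influential                           # 11: the point (100, -100)
round(jk$r_loo, 3)
```

This is not the bootstrap, and an interval of the form r +/- 2*se built from the jackknife standard error is only a rough approximation, especially when one point dominates as it does here.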

Thanks.
Yang





"Berton Gunter" <[hidden email]>
25-Jan-2006 15:57
To: [hidden email], [hidden email]
cc:
Subject: RE: [R] how to test robustness of correlation


Re: how to test robustness of correlation

Bert Gunter
Below



>
> Hi, Berton:
> thanks for getting back to me.
>
> I played around with cov.rob().  Yes, I can get a robust
> correlation coefficient matrix based on mcd or mve outlier
> detection methods.
>
> I have two further questions:
>
> 1) How do I get a p value of the robust r?

A p-value for what? That r == 0?

> 2) What I mean by resampling is "leave one out" procedure, to
> get a confidence interval of r.  Do you know if there is any
> package in R to do it?  I suppose I could code it myself,  
> but it is nice if there is already one.
>
> thanks.
> Yang

**An** answer to both is the same -- bootstrap it. "Leave one out" is not
resampling (/bootstrapping). It is usually referred to as "jackknifing," but
the jackknife prescribes more specific computations than "leave one out"
alone implies. Efron's little SIAM monograph, "The Jackknife, the Bootstrap
and Other Resampling Plans," explains them and their relationships in
detail. It is trivial to bootstrap cov.rob() in base R using sample()
(resample the <x,y> **pairs** -- or n-tuples generally -- not the marginals
separately). If you insist on a package, "boot" is the obvious one -- why
did you not attempt to find it yourself? Either way, expect it to take a
while for a decent number of resamples (e.g. >1e4).
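A minimal sketch of Bert's suggestion, assuming simulated data (hypothetical, with true correlation 0.6) and a deliberately small number of resamples so it runs quickly. Part (a) uses only sample() on row indices, so each (x, y) pair is resampled as a unit; part (b) does the same with boot() and boot.ci() from the recommended package boot. `robust_r()` is a hypothetical helper, and percentile intervals are used only for simplicity; boot.ci() offers other interval types.

```r
## Sketch: bootstrapping a robust correlation by resampling (x, y) pairs.
library(MASS)   # cov.rob()
library(boot)   # boot(), boot.ci(); recommended package shipped with R

## hypothetical helper: robust correlation of the two columns of d
robust_r <- function(d) cov.rob(d, cor = TRUE)$cor[1, 2]

set.seed(123)
n <- 40
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)   # simulated data; true correlation 0.6
xy <- cbind(x, y)

B <- 499  # kept small so the sketch runs quickly; Bert suggests > 1e4 in practice

## (a) base R: resample row indices so each (x, y) pair stays together
r_star <- replicate(B, robust_r(xy[sample(n, replace = TRUE), ]))
ci_base <- quantile(r_star, c(0.025, 0.975))   # percentile interval
ci_base

## (b) package boot: the statistic receives the data and a vector of row indices
b <- boot(xy, statistic = function(d, i) robust_r(d[i, ]), R = B)
ci_boot <- boot.ci(b, type = "perc")$percent[4:5]   # lower and upper limits
ci_boot

## rough stand-in for "is r different from 0?": does the interval exclude 0?
!(ci_boot[1] <= 0 && 0 <= ci_boot[2])
```

Checking whether the interval excludes 0 is only a rough answer to the p-value question, not a formal test of r == 0.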

-- Bert


Re: how to test robustness of correlation

Bert Gunter
In reply to this post by yang.x.qiu
One more thing ...


> I played around with cov.rob().  Yes, I can get a robust correlation
> coefficient matrix based on mcd or mve outlier detection methods.
>
> I have two further questions:
>

You might call it semantics, but I prefer "resistant estimation" to "outlier
detection methods." I recognize that they are equivalent (any resistant
estimator can be used to identify "outliers"; any outlier detection method
leads to a resistant estimator by downweighting the outliers). However, I
consider the distinction important. "Outlier detection" suggests:

1) That "outlier" is a statistically well-defined concept; it isn't. The
implied dichotomy is a fiction (a dangerous one, IMO -- but many would
disagree).

2) That some sort of hypothesis testing procedure is used to "reject"
points. None is.
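To illustrate the point that any resistant estimator can be used to flag points, here is a sketch using robust Mahalanobis distances from a cov.rob() fit on Bert's toy data from later in the thread (seed added). The 0.975 chi-square quantile is a conventional cutoff under an assumed bivariate normal core; in keeping with point 2 above, it is a screening rule, not a hypothesis test.

```r
## Sketch: a resistant fit used as a screen for unusual points.
library(MASS)                                   # cov.rob()

set.seed(1)                                     # seed added for reproducibility
x <- c(1:10, 100)
y <- c(1:10 + rnorm(10, sd = 0.25), -100)
xy <- cbind(x, y)

fit <- cov.rob(xy, method = "mcd")              # resistant center and scatter
d2 <- mahalanobis(xy, center = fit$center, cov = fit$cov)  # squared distances

cutoff <- qchisq(0.975, df = ncol(xy))          # conventional screening cutoff
flagged <- which(d2 > cutoff)
flagged                                         # includes observation 11
```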

Rather, MVE and MCD (cov.mve() and cov.mcd() in MASS) try to characterize
the behavior of the "central" mass of the distribution, using that
characterization to weight the informativeness of points outside that
mass. A 1-D analogue is the MAD (median absolute deviation) as a measure
of spread. This is a far cry from the bad old days of (sequential)
"outlier detection." These methods are, of course, crucially dependent on
modern computing power.
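A quick illustration of the 1-D analogue, assuming toy data of the integers 1 to 10 with one gross value appended. The standard deviation is dominated by that single value, while stats::mad() (scaled by 1.4826 by default so that it estimates the SD for normal data) barely moves.

```r
## Sketch: MAD versus standard deviation under one gross value.
x_clean <- 1:10
x_dirty <- c(1:10, 100)            # the same data plus one gross value

spread <- rbind(
  sd  = c(clean = sd(x_clean),  dirty = sd(x_dirty)),
  mad = c(clean = mad(x_clean), dirty = mad(x_dirty))  # default constant = 1.4826
)
round(spread, 2)
```

The SD jumps from about 3.0 to about 28.6, while the MAD moves only from about 3.7 to about 4.4.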

Cheers,

Bert


Re: how to test robustness of correlation

Gabor Grothendieck
In reply to this post by yang.x.qiu
The cor function can compute Spearman's rank correlation using
method = "spearman".
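A small sketch, assuming simulated monotone but non-linear data, showing that Spearman's rho as computed by cor() is Pearson's r on the ranks, and that cor.test() with the same method returns a p-value for rho = 0. (Bert's reply explains why ranks alone do not make the estimate resistant.)

```r
## Sketch: Spearman's rho is Pearson's r computed on ranks.
set.seed(2)                        # simulated data, for illustration only
x <- rnorm(30)
y <- x^3 + rnorm(30, sd = 0.5)     # monotone but non-linear relationship

rho <- cor(x, y, method = "spearman")
rho_by_hand <- cor(rank(x), rank(y))
all.equal(rho, rho_by_hand)        # TRUE

## test of rho = 0 with the same method
cor.test(x, y, method = "spearman")$p.value
```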


Re: how to test robustness of correlation

Bert Gunter
Gabor:

Contrary to popular belief, rank-based procedures are **not** resistant.

Example:

> x<-c(1:10,100);y<-c(1:10+rnorm(10,sd=.25),-100)
> cor(x,y)
[1] -0.9816899  ## awful

> cor(x,y,method='spearman')
[1] 0.5 ## better

> require(MASS)
Loading required package: MASS
[1] TRUE

> cov.rob(cbind(x,y),cor=TRUE)
## ... bunch of output omitted

$cor
          x         y
x 1.0000000 0.9977734  ## best
y 0.9977734 1.0000000

## Look at the plot, e.g. plot(x, y), to see why.
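Bert's transcript draws the noise without a seed, so the printed values vary slightly from run to run. A reproducible version, with set.seed(1) added as an assumption, computes the three estimates side by side; the exact digits depend on the seed.

```r
## Bert's example with a fixed seed (the seed is an addition).
library(MASS)                                  # cov.rob()

set.seed(1)
x <- c(1:10, 100)
y <- c(1:10 + rnorm(10, sd = 0.25), -100)

r <- c(
  pearson  = cor(x, y),
  spearman = cor(x, y, method = "spearman"),
  cov_rob  = cov.rob(cbind(x, y), cor = TRUE)$cor[1, 2]
)
round(r, 3)

plot(x, y)   # ten points near y = x and one at (100, -100)
```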

-- Bert


> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Gabor
> Grothendieck
> Sent: Thursday, January 26, 2006 9:05 AM
> To: [hidden email]
> Cc: [hidden email]
> Subject: Re: [R] how to test robustness of correlation
>
> The cor function can do spearman correlation using
> method = "spearman" .