Difference between cor.test() and formula everyone says to use


Jeremy Miles-2
I'm trying to understand how cor.test() is calculating the p-value of
a correlation. It gives a p-value based on t, but every text I've ever
seen gives the calculation based on z.

For example:
> data(cars)
> with(cars[1:10, ], cor.test(speed, dist))

Pearson's product-moment correlation

data:  speed and dist
t = 2.3893, df = 8, p-value = 0.04391
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.02641348 0.90658582
sample estimates:
      cor
0.6453079

But when I use the regular formula:
> library(psych)  # fisherz() is assumed to come from psych; atanh(r) is the base-R equivalent
> r <- cor(cars[1:10, ])[1, 2]
> r.z <- fisherz(r)
> se <- 1/sqrt(10 - 3)
> z <- r.z / se
> (1 - pnorm(z))*2
[1] 0.04237039

My p-value is different.  The help file for cor.test doesn't (seem to)
have any reference to this, and I can see in the source code that it
is doing something different. I'm just not sure what.

Thanks,

Jeremy

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Difference between cor.test() and formula everyone says to use

Joshua Wiley-2
Hi Jeremy,

I don't know about references, but this formula is around.  See for example:
http://afni.nimh.nih.gov/sscc/gangc/tr.html

the relevant line in cor.test is:

STATISTIC <- c(t = sqrt(df) * r/sqrt(1 - r^2))

You can convert *t*s to *r*s and vice versa.
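
A minimal sketch of that conversion, assuming the same ten rows of cars as in the original post. The variable names (t_stat, r_back, p_t) are illustrative and are not taken from the cor.test() source; only the r-to-t expression is the one quoted above.

```r
# Sketch: converting between r and t for a Pearson correlation.
x <- cars$speed[1:10]
y <- cars$dist[1:10]
n <- length(x)
df <- n - 2
r <- cor(x, y)

# r -> t, the same expression as the cor.test() line quoted above
t_stat <- sqrt(df) * r / sqrt(1 - r^2)

# t -> r, the algebraic inverse of that expression
r_back <- t_stat / sqrt(t_stat^2 + df)

# two-sided p-value from the t distribution with n - 2 degrees of freedom
p_t <- 2 * pt(-abs(t_stat), df)

print(c(t = t_stat, r = r_back, p = p_t))
```

The t and p values printed here should agree with the cor.test() output in the original post (t = 2.3893, p-value = 0.04391).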

Best,

Josh






--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518


Re: Difference between cor.test() and formula everyone says to use

Peter Dalgaard-2
This is pretty much standard. I'm quite sure that other stats packages do likewise and I wouldn't know who "everyone" is. It is not unheard of that textbook authors give suboptimal formulas in order not to confuse students, though.

The basic point is that the t transformation gives the exact distribution under the null. Fisher's Z is only approximately normally distributed.

The t transformation works because if beta is the regression coefficient of y on x, beta==0 iff rho==0, and we have exact theory for testing beta==0 by a t-test.
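
One way to see this equivalence numerically is to fit the regression and compare its slope test with cor.test(). This is only a sketch on the ten rows of cars from the original post; the object names (d, fit, slope, ct) are illustrative.

```r
# Sketch: the t-test of the regression slope matches the cor.test() t-test.
d <- cars[1:10, ]

# regression of dist on speed; pull out the coefficient row for the slope
fit <- lm(dist ~ speed, data = d)
slope <- summary(fit)$coefficients["speed", ]

# Pearson correlation test on the same data
ct <- with(d, cor.test(speed, dist))

print(slope[c("t value", "Pr(>|t|)")])
print(c(ct$statistic, p = ct$p.value))
```

Both lines of output should show the same t statistic and the same p-value.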

Off-null, the t-approach does not readily transfer, so confidence intervals tend to be based on the Z-transformation.
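
As a rough illustration, a Z-based interval can be computed by hand and compared with what cor.test() reports. The sketch assumes a 95% level and the usual standard error 1/sqrt(n - 3) for the transformed correlation; the variable names are illustrative.

```r
# Sketch: 95% confidence interval for a correlation via Fisher's Z.
d <- cars[1:10, ]
n <- nrow(d)
r <- cor(d$speed, d$dist)

# transform r, build a normal-theory interval, then transform back
z <- atanh(r)                 # Fisher's Z transformation
se <- 1 / sqrt(n - 3)         # approximate standard error on the Z scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(ci)
print(cor.test(d$speed, d$dist)$conf.int)
```

The hand-computed interval should match the one in the cor.test() output of the original post (0.02641348 to 0.90658582), even though the p-value there comes from the t distribution.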

-Peter D.




--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]


Re: Difference between cor.test() and formula everyone says to use

JLucke
In reply to this post by Jeremy Miles-2
When the true value is $\rho = 0$, the statistic $ndf * r^2 / (1-r^2)$
follows an $F(1,ndf)$ distribution; it is the square of the t statistic
that cor.test() reports.
So the t-test is the correct test for $\rho=0$.
Fisher's z is an asymptotically normal transformation for any value of
$\rho$.
Thus Fisher's z is better for testing $\rho = \rho_0$ or $\rho_1 = \rho_2$.
The two statistics will not be equivalent at $\rho=0$ because the
statistics are based on different assumptions.
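
A small sketch of both points, again on the ten rows of cars from the original post. The null value rho0 = 0.3 is an arbitrary number chosen only for illustration, and the variable names are not from any package.

```r
# Sketch: F form of the test of rho = 0, and a Fisher z test of rho = rho0.
d <- cars[1:10, ]
n <- nrow(d)
ndf <- n - 2
r <- cor(d$speed, d$dist)

# F statistic for rho = 0; its square root is the t statistic
f_stat <- ndf * r^2 / (1 - r^2)
p_f <- pf(f_stat, 1, ndf, lower.tail = FALSE)

# Fisher z test against a nonzero null value (rho0 is a hypothetical choice)
rho0 <- 0.3
z_stat <- (atanh(r) - atanh(rho0)) * sqrt(n - 3)
p_z <- 2 * pnorm(-abs(z_stat))

print(c(F = f_stat, p_F = p_f, z = z_stat, p_z = p_z))
```

The F-based p-value should equal the two-sided t-based p-value from cor.test(), since F(1, ndf) is the distribution of a squared t variable with ndf degrees of freedom.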



