I'm trying to understand how cor.test() is calculating the p-value of
a correlation. It gives a p-value based on t, but every text I've ever
seen gives the calculation based on z.

For example:

> data(cars)
> with(cars[1:10, ], cor.test(speed, dist))

        Pearson's product-moment correlation

data:  speed and dist
t = 2.3893, df = 8, p-value = 0.04391
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.02641348 0.90658582
sample estimates:
      cor
0.6453079

But when I use the regular formula:

> r <- cor(cars[1:10, ])[1, 2]
> r.z <- fisherz(r)
> se <- 1/sqrt(10 - 3)
> z <- r.z / se
> (1 - pnorm(z))*2
[1] 0.04237039

My p-value is different. The help file for cor.test doesn't (seem to)
have any reference to this, and I can see in the source code that it
is doing something different. I'm just not sure what.

Thanks,

Jeremy

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Jeremy,
I don't know about references, but this formula is around. See for example:
http://afni.nimh.nih.gov/sscc/gangc/tr.html

The relevant line in cor.test is:

    STATISTIC <- c(t = sqrt(df) * r/sqrt(1 - r^2))

You can convert *t*s to *r*s and vice versa.

Best,

Josh

On Fri, Oct 17, 2014 at 10:32 AM, Jeremy Miles <[hidden email]> wrote:

> I'm trying to understand how cor.test() is calculating the p-value of
> a correlation. It gives a p-value based on t, but every text I've ever
> seen gives the calculation based on z.
> [...]

--
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
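For readers following along, the conversion between r and t quoted from the cor.test source can be checked by hand. The sketch below is only an illustration on the same ten rows of `cars`; the variable names (`t_stat`, `p_t`, `r_back`) are made up here and are not taken from the cor.test source.

```r
# Illustrative sketch: rebuild the t statistic and p-value from r by hand.
# Variable names are hypothetical, chosen for this example only.
x <- cars$speed[1:10]
y <- cars$dist[1:10]

n  <- length(x)
df <- n - 2                        # degrees of freedom for the t-test
r  <- cor(x, y)                    # Pearson correlation

# Same formula as the STATISTIC line quoted above
t_stat <- sqrt(df) * r / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom
p_t <- 2 * pt(-abs(t_stat), df)

# Converting back: r can be recovered from t and df
r_back <- t_stat / sqrt(t_stat^2 + df)

print(c(t = t_stat, p = p_t, r = r_back))
```

The printed t and p should agree with the cor.test() output in the original post, and `r_back` should equal `r`, which is all that "converting ts to rs and vice versa" amounts to.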
This is pretty much standard. I'm quite sure that other stats packages do likewise and I wouldn't know who "everyone" is. It is not unheard of that textbook authors give suboptimal formulas in order not to confuse students, though.
The basic point is that the t transformation gives the exact
distribution under the null. Fisher's Z is only approximately normally
distributed. The t transformation works because if beta is the
regression coefficient of y on x, beta==0 iff rho==0, and we have exact
theory for testing beta==0 by a t-test. Off-null, the t-approach does
not readily transfer, so confidence intervals tend to be based on the
Z-transformation.

-Peter D.

On 17 Oct 2014, at 02:20 , Joshua Wiley <[hidden email]> wrote:

> [...]
> the relevant line in cor.test is:
>
> STATISTIC <- c(t = sqrt(df) * r/sqrt(1 - r^2))
> [...]

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]
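Both halves of Peter's explanation can be checked numerically. The sketch below, again on the first ten rows of `cars`, compares the t statistic for the slope in `lm()` with the one from `cor.test()`, and then builds a confidence interval from the Fisher Z transformation (`atanh()` in base R). The names `slope_t`, `half`, and `ci` are illustrative only, and the interval construction is the textbook one rather than a copy of the cor.test source.

```r
# Illustrative sketch: slope t-test vs. cor.test(), and a Fisher-Z interval.
# Names are hypothetical, chosen for this example only.
x <- cars$speed[1:10]
y <- cars$dist[1:10]
n <- length(x)
r <- cor(x, y)

# Exact test of beta == 0 in the regression of y on x
fit     <- lm(y ~ x)
slope_t <- coef(summary(fit))["x", "t value"]
slope_p <- coef(summary(fit))["x", "Pr(>|t|)"]

ct <- cor.test(x, y)

# Approximate 95% interval: normal on the Z scale, then back-transformed
half <- qnorm(0.975) / sqrt(n - 3)
ci   <- tanh(atanh(r) + c(-1, 1) * half)

print(c(slope_t = slope_t, cor_t = unname(ct$statistic)))
print(ci)
```

The slope t and p-value should match the cor.test() test of rho == 0, while the interval comes from the Z scale, which is why the reported test and the reported interval rest on different approximations.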
In reply to this post by Jeremy Miles-2
When the true value is $\rho = 0$, the statistic
$ndf \cdot r^2 / (1 - r^2)$ follows an $F(1, ndf)$ distribution. Since
an $F(1, ndf)$ variable is the square of a $t$ variable with $ndf$
degrees of freedom, the t-test is the correct test for $\rho = 0$.
Fisher's z is an asymptotically normal transformation for any value of
$\rho$. Thus Fisher's z is better for testing $\rho = \rho_0$ or
$\rho_1 = \rho_2$. The two statistics will not be equivalent at
$\rho = 0$ because the statistics are based on different assumptions.

Jeremy Miles <[hidden email]>
Sent by: [hidden email]
10/16/2014 07:32 PM

To: r-help <[hidden email]>
Subject: [R] Difference betweeen cor.test() and formula everyone says to use

> I'm trying to understand how cor.test() is calculating the p-value of
> a correlation. It gives a p-value based on t, but every text I've ever
> seen gives the calculation based on z.
> [...]
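A short numerical check of the F statement, plus a sketch of the Fisher z test for a non-zero null value, might look like this. The helper `fisher_z_test()` and the value `rho0 = 0.3` are hypothetical, introduced only for illustration; `atanh()` is used as the Fisher transformation so that no extra package is needed.

```r
# Illustrative sketch: F(1, ndf) form of the test, and a Fisher z test
# for a general null value rho0. Names are hypothetical.
x <- cars$speed[1:10]
y <- cars$dist[1:10]
n   <- length(x)
ndf <- n - 2
r   <- cor(x, y)

# F statistic and its upper-tail p-value
F_stat <- ndf * r^2 / (1 - r^2)
p_F    <- pf(F_stat, 1, ndf, lower.tail = FALSE)

# Same p-value from the two-sided t-test, since F_stat is t squared
p_t <- 2 * pt(-sqrt(F_stat), ndf)

# Approximate two-sided z test of rho == rho0 via Fisher's transformation
fisher_z_test <- function(r, n, rho0 = 0) {
  z <- (atanh(r) - atanh(rho0)) * sqrt(n - 3)
  2 * pnorm(-abs(z))
}

p_z0  <- fisher_z_test(r, n)              # rho0 = 0, the case in the question
p_z03 <- fisher_z_test(r, n, rho0 = 0.3)  # hypothetical non-zero null

print(c(p_F = p_F, p_t = p_t, p_z0 = p_z0, p_z03 = p_z03))
```

With `rho0 = 0` the z-based p-value should reproduce the 0.04237039 from the original post, while `p_F` and `p_t` coincide with the cor.test() value, so the two approaches visibly disagree at the null even on the same data.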