a problem 'cor' function


Tao Shi
Hi list,

One of my co-workers found this problem with 'cor' in his code, and I can confirm it too (see below).  He's using R 2.2.1 under Win 2K, and I'm using R 2.3.0 under Win XP.

===========================================
> R.Version()
$platform
[1] "i386-pc-mingw32"

$arch
[1] "i386"

$os
[1] "mingw32"

$system
[1] "i386, mingw32"

$status
[1] ""

$major
[1] "2"

$minor
[1] "3.0"

$year
[1] "2006"

$month
[1] "04"

$day
[1] "24"

$`svn rev`
[1] "37909"

$language
[1] "R"

$version.string
[1] "Version 2.3.0 (2006-04-24)"
> data(iris)
> cor(iris[1:4])
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length       1.0000     -0.1176       0.8718      0.8179
Sepal.Width       -0.1176      1.0000      -0.4284     -0.3661
Petal.Length       0.8718     -0.4284       1.0000      0.9629
Petal.Width        0.8179     -0.3661       0.9629      1.0000
> cor(iris[1:4])==1
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length         TRUE       FALSE        FALSE       FALSE
Sepal.Width         FALSE        TRUE        FALSE       FALSE
Petal.Length        FALSE       FALSE         TRUE       FALSE
Petal.Width         FALSE       FALSE        FALSE        TRUE
> cor(iris[1:4], iris[1:4])
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length       1.0000     -0.1176       0.8718      0.8179
Sepal.Width       -0.1176      1.0000      -0.4284     -0.3661
Petal.Length       0.8718     -0.4284       1.0000      0.9629
Petal.Width        0.8179     -0.3661       0.9629      1.0000
> cor(iris[1:4], iris[1:4])==1
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length         TRUE       FALSE        FALSE       FALSE
Sepal.Width         FALSE        TRUE        FALSE       FALSE
Petal.Length        FALSE       FALSE        FALSE       FALSE
Petal.Width         FALSE       FALSE        FALSE        TRUE
===========================================

The two ways of calculating correlation seem to generate the 'same' results, but the second one doesn't appear to be numerically stable (see the 3rd diagonal element of the last matrix).

thanks,

...Tao

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: a problem 'cor' function

P Ehlers
Looks like another case of the most F of all FAQs: FAQ 7.31 ("Why doesn't R think these numbers are equal?").

See if the following makes sense to you:

  pl <- iris[101:150, 3]
  all.equal(cor(pl,pl), 1)
  [1] TRUE
  cor(pl,pl) == 1
  [1] FALSE
  sprintf("%1.22g", cor(pl, pl))
  [1] "0.99999999999999989"
  sprintf("%1.22g", pl)
  [1] "6"                  "5.0999999999999996" "5.9000000000000004"
etc

Peter Ehlers
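
The same idea extends from a single pair of vectors to the whole matrix in the original post. The following is a minimal sketch, not part of the original reply: it compares each entry of cor(x, x) to 1 within a tolerance rather than with ==. The names r, exact, near_one and tol are made up for illustration, and the tolerance 1e-8 is an arbitrary assumption.

```r
# Tolerance-based check of which entries of a correlation matrix equal 1.
x <- iris[1:4]
r <- cor(x, x)

# Exact comparison: sensitive to rounding error in the last bits.
exact <- r == 1

# Comparison within a tolerance (1e-8 is an arbitrary choice for this sketch).
tol <- 1e-8
near_one <- abs(r - 1) < tol

# Show the diagonal with 17 significant digits so any rounding error is visible.
print(sprintf("%.17g", diag(r)))
print(near_one)
```

With a tolerance, every diagonal entry is treated as 1 regardless of how the last few bits were rounded, which is usually what code comparing correlations actually intends.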

--
Peter Ehlers
Chair, Division of Statistics and Actuarial Science
Department of Mathematics and Statistics
University of Calgary, 2500 University Dr. NW       ph: 403-220-3936
Calgary, Alberta  T2N 1N4, CANADA                  fax: 403-282-5150
email: [hidden email]

Re: a problem 'cor' function

Tao Shi
Hi Peter,
 
Thank you very much for your quick reply!
 
I'm aware of the floating-point accuracy issue in R.  What puzzled me is how the same function, used in two different ways, can produce different results (it's supposed to use the same algorithm).  The only explanation I can think of is that when cor(x) is used, the function automatically sets all the diagonal elements to 1.  But when cor(x, y) is used, there is no way to know which pairs of vectors are identical, so the function returns the 'real' computed results.
 
best,
 
...Tao
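
One way to look at the guess above empirically is to print the diagonals of both forms at full precision. This is only a sketch: the names d_one, d_two, diagonals and agree are made up for illustration, and whether cor(x) really sets the diagonal to exactly 1 is an assumption that this thread does not confirm.

```r
# Compare the diagonals of the one-argument and two-argument forms of cor().
x <- iris[1:4]

d_one <- diag(cor(x))      # cor(x): columns of x correlated with themselves
d_two <- diag(cor(x, x))   # cor(x, y) with y identical to x

# Print both diagonals with 17 significant digits so rounding error is visible.
diagonals <- data.frame(
  one_arg = sprintf("%.17g", d_one),
  two_arg = sprintf("%.17g", d_two),
  row.names = colnames(x)
)
print(diagonals)

# The two forms should agree within all.equal()'s default tolerance,
# even if they are not bit-for-bit identical.
agree <- isTRUE(all.equal(d_one, d_two))
print(agree)
```

The printed digits may differ between R versions and platforms, so the output is not reproduced here.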
 



> Date: Wed, 31 May 2006 16:49:18 -0600
> Subject: Re: [R] a problem 'cor' function