# [R] How to Get Categorical Correlation Coefficient

4 messages

## [R] How to Get Categorical Correlation Coefficient

Howdy Gurus!

I get different correlation results from what should be the same data. The string variable "corridor1" is coded as a number in the numeric variable "corridor2".

    > levels(corridor1)
    [1] "A" "B" "C" "D" "E" "F"
    > levels(as.factor(corridor2))
    [1] "0" "1" "2" "3" "4"

Using the cor() function, I get the following correlations:

    > cor(jh1_1, as.factor(corridor1))
    [1] 0.01528538
    > cor(jh1_1, as.factor(corridor2))
    [1] -0.4972571

I do not know why these correlation coefficients, computed from the same data, are different. They are 0.015 from as.factor(corridor1) and -0.497 from as.factor(corridor2). The string variable "corridor1" holds the same categorical data as the variable corridor2. The only difference is that "A" is replaced with "0", "B" with "1", "C" with "2", and so on.

Could you tell me why they are different, and which correlation coefficient is correct?

Thanks in advance,

--
Kum-Hoe Hwang, Ph.D.
Phone : 82-31-250-3516
Email : [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: [R] How to Get Categorical Correlation Coefficient

"Kum-Hoe Hwang" <[hidden email]> writes:

> Howdy Gurus!
>
> I get different correlation results from what should be the same data. The
> string variable "corridor1" is coded as a number in the numeric variable
> "corridor2".
>
> > levels(corridor1)
> [1] "A" "B" "C" "D" "E" "F"
> > levels(as.factor(corridor2))
> [1] "0" "1" "2" "3" "4"
>
> Using the cor() function, I get the following correlations:
>
> > cor(jh1_1, as.factor(corridor1))
> [1] 0.01528538
> > cor(jh1_1, as.factor(corridor2))
> [1] -0.4972571
>
> I do not know why these correlation coefficients, computed from the same
> data, are different.
> They are 0.015 from as.factor(corridor1) and -0.497 from as.factor(corridor2).
> The string variable "corridor1" holds the same categorical data as the
> variable corridor2.
> The only difference is that "A" is replaced with "0", "B" with "1", "C"
> with "2", and so on.
>
> Could you tell me why they are different, and which correlation
> coefficient is correct?

One thing that strikes me is that corridor1 has 6 levels and corridor2 has 5...

In general, correlations are not expected to work on factors, so I'd be explicit about taking as.numeric(). A glance at

    table(corridor1, corridor2)

should be informative too, as would a

    summary(as.numeric(as.factor(corridor1)) - as.numeric(as.factor(corridor2)))

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])                  FAX: (+45) 35327907
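
The checks suggested in this reply can be tried on made-up data. The following is only a sketch: the real jh1_1, corridor1 and corridor2 are not shown in the thread, so the values below are simulated, and the recoding in which "E" and "F" both map to 4 is purely an assumption chosen to reproduce the "6 levels versus 5" mismatch.

```r
# Hypothetical stand-ins for the poster's variables (not the real data).
set.seed(1)
corridor1 <- factor(sample(c("A", "B", "C", "D", "E", "F"), 200, replace = TRUE))

# Assumed recoding for illustration only: "E" and "F" collapse to the same code,
# giving six levels on one side and five on the other.
recode <- c(A = 0, B = 1, C = 2, D = 3, E = 4, F = 4)
corridor2 <- unname(recode[as.character(corridor1)])
jh1_1 <- rnorm(200)

# Cross-tabulate the two codings to see how the categories line up.
print(table(corridor1, corridor2))

# as.numeric() on a factor returns the internal level codes (1, 2, ...).
code1 <- as.numeric(corridor1)
code2 <- as.numeric(as.factor(corridor2))

# Any nonzero difference means the two variables are not coded identically.
print(summary(code1 - code2))

# Correlations computed on the explicit codes; they differ whenever the codes do.
r1 <- cor(jh1_1, code1)
r2 <- cor(jh1_1, code2)
print(c(r1 = r1, r2 = r2))
```

If the cross-table has more than one nonzero cell in any row or column, or the summary of differences is not all zeros, the two variables do not encode the same categories, and their correlations with jh1_1 need not agree.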