Variance-covariance by factor


Yang, Richard

Dear all,

        I have a data frame with one factor and four numeric variables and
wish to obtain the variance-covariance matrix separately for each level of
the factor. I tried by() and sapply() but got nowhere. I understand this can
be done by subsetting the data frame, but there should be a sleeker way of
doing it.

Here is a simulated data frame:

s <- rep(c("A","B","C"), c(25,22,18))

d <- c(rnorm(25,14,2.6),rnorm(22,15.2,2.8),rnorm(18,16.4,3.0))
h <- c(rnorm(25,10,1.4),rnorm(22,11.2,1.8),rnorm(18,12.3,2.0))
l <- c(rnorm(25,6.8,1.6), rnorm(22,7.0,1.7),rnorm(18,7.3,1.8))
w <- c(rnorm(25,2.5,0.65),rnorm(22,2.6,0.7),rnorm(18,2.8,0.71))

sim <- data.frame(cbind(S = s, D= d, H=h,L=l,W=w))

> sim.var <- sapply(split(sim, sim$S), function(z) var(z))
Error in var(z) : missing observations in cov/cor
In addition: Warning message:
NAs introduced by coercion

> sim.var <- by(sim, sim$S, function(z) var(z))
Error in var(z) : missing observations in cov/cor
In addition: Warning message:
NAs introduced by coercion

Running the function under debug() and trace() gave no further information.

Because the error suggested missing data, I looked at the data frame and was
surprised:

> str(sim)
`data.frame':   65 obs. of  5 variables:
 $ S: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
 $ D: Factor w/ 65 levels "10.0860856437045",..: 51 12 21 11 8 15 57 44 19 60 ...
 $ H: Factor w/ 65 levels "10.0345903489406",..: 17 2 4 52 6 21 29 9 62 10 ...
 $ L: Factor w/ 65 levels "10.3854663209663",..: 10 6 23 55 60 8 58 65 11 2 ...
 $ W: Factor w/ 65 levels "0.93902749732563",..: 38 13 33 12 22 39 47 31 36 53 ...

Why does data.frame() convert the numeric vectors d, h, l, w into factors?

Any suggestions on 1) how to compute the variance-covariance matrix by
factor in a data frame, and 2) why data.frame() converts numeric variables
into factors?

> sessionInfo()
R version 2.2.0, 2005-10-06, i386-pc-mingw32

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "base"

Thanks,

Richard Yang

Northern Forestry Centre   / Centre de foresterie du Nord
Canadian Forest Service   / Service canadien des forêts
Natural Resources Canada   / Ressources naturelles Canada
5320-122 Street       / 5320, rue 122

Edmonton Alberta Canada
T6H 3S5



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Variance-covariance by factor

Gabor Grothendieck
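The cbind should be removed. cbind() first builds a single matrix, and
because s is a character vector the whole matrix, numbers included, becomes
character; data.frame() then converts each character column to a factor
(stringsAsFactors = TRUE was the default before R 4.0.0). Pass the vectors
to data.frame() directly and then use by():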

sim <- data.frame(S = s, D= d, H=h, L=l, W=w)
by(sim[,-1], sim$S, var)
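A minimal sketch of the whole round trip, reusing the simulation from the question with an added set.seed() (an assumption, only so the run is reproducible). The names m, cov.by.group, cor.by.group and cov.list are illustrative. It shows that cbind() is what turns the numbers into text, that building the data frame directly keeps them numeric, and that by(), or split() plus lapply(), then returns one variance-covariance (or correlation) matrix per level of S.

```r
set.seed(1)  # assumption: added only to make the sketch reproducible

s <- rep(c("A", "B", "C"), c(25, 22, 18))
d <- c(rnorm(25, 14, 2.6),   rnorm(22, 15.2, 2.8), rnorm(18, 16.4, 3.0))
h <- c(rnorm(25, 10, 1.4),   rnorm(22, 11.2, 1.8), rnorm(18, 12.3, 2.0))
l <- c(rnorm(25, 6.8, 1.6),  rnorm(22, 7.0, 1.7),  rnorm(18, 7.3, 1.8))
w <- c(rnorm(25, 2.5, 0.65), rnorm(22, 2.6, 0.7),  rnorm(18, 2.8, 0.71))

# cbind() with one character vector yields a character matrix,
# so D, H, L and W are stored as text before data.frame() ever sees them.
m <- cbind(S = s, D = d, H = h, L = l, W = w)
is.character(m)                 # TRUE

# Passing the vectors straight to data.frame() keeps D, H, L, W numeric.
sim <- data.frame(S = s, D = d, H = h, L = l, W = w)
sapply(sim[, -1], is.numeric)   # all TRUE

# by() applies var (or cor) to the numeric columns within each group of S.
cov.by.group <- by(sim[, -1], sim$S, var)
cor.by.group <- by(sim[, -1], sim$S, cor)

# split() + lapply() gives a plain named list of 4 x 4 matrices.
# sapply() would instead simplify them into a 16 x 3 matrix (one column per group).
cov.list <- lapply(split(sim[, -1], sim$S), var)
cov.list$A
```

In R 4.0.0 and later the original data.frame(cbind(...)) call produces character columns rather than factors, so var() still fails on it; building the data frame from the vectors is the fix either way.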

On 1/13/06, Yang, Richard <[hidden email]> wrote:

