logical variables in models

Fox, John
Dear R-devel list members,

This is an observation about how logical variables in models are handled, followed by questions.

As a general matter, character variables and logical variables are treated as if they were factors when they appear on the RHS of a model formula; for example:

- - - - snip- - - - -

> set.seed(123)
> c <- sample(letters[1:3], 10, replace=TRUE)
> f <- as.factor(sample(LETTERS[1:3], 10, replace=TRUE))
> L <- sample(c(TRUE, FALSE), 10, replace=TRUE)
> y <- rnorm(10)
> options(contrasts=c("contr.sum", "contr.poly"))
> mod <- lm(y ~ c + f + L)
> model.matrix(mod)
   (Intercept) c1 c2 f1 f2 L1
1            1  1  0 -1 -1  1
2            1 -1 -1  0  1  1
3            1  0  1 -1 -1  1
4            1 -1 -1  0  1  1
5            1 -1 -1  1  0  1
6            1  1  0 -1 -1  1
7            1  0  1  1  0  1
8            1 -1 -1  1  0  1
9            1  0  1  1  0 -1
10           1  0  1 -1 -1 -1
attr(,"assign")
[1] 0 1 1 2 2 3
attr(,"contrasts")
attr(,"contrasts")$c
[1] "contr.sum"

attr(,"contrasts")$f
[1] "contr.sum"

attr(,"contrasts")$L
[1] "contr.sum"

- - - - snip- - - - -

But logical variables don’t appear in the $xlevels component of the objects created by lm() and similar functions:

- - - - snip- - - - -

> mod$xlevels
$c
[1] "a" "b" "c"

$f
[1] "A" "B" "C"

- - - - snip- - - - -

Why the discrepancy? It’s true that the level set (i.e., TRUE, FALSE) for a logical “factor” is known, but examining the $xlevels component is a simple way to detect variables treated as factors in the model. For example, I’d argue that .getXlevels() returns misleading information, because it omits L:

- - - - snip- - - - -

> .getXlevels(terms(mod), model.frame(mod))
$c
[1] "a" "b" "c"

$f
[1] "A" "B" "C"

- - - - snip- - - - -
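
One possible check, offered as a sketch rather than a fix: the terms object stored in an lm() fit carries a 'dataClasses' attribute that records the class of each model-frame variable, so logical predictors can be picked out even though $xlevels omits them. The names d, fit, dc, and factor_like below are introduced only for this illustration, and the data are made up.

```r
# Illustrative data only; d, fit, dc and factor_like are hypothetical names.
d <- data.frame(
  y = c(0.3, -1.1, 0.8, 1.5, -0.2, 0.6, -0.9, 1.1, 0.1, -0.4, 2.0, 0.7),
  c = rep(c("a", "b", "c"), length.out = 12),   # character predictor
  f = factor(rep(c("A", "B", "C"), each = 4)),  # factor predictor
  L = rep(c(TRUE, FALSE), times = 6),           # logical predictor
  stringsAsFactors = FALSE
)
fit <- lm(y ~ c + f + L, data = d)

# Class of each variable in the model frame, as recorded in the terms object
dc <- attr(terms(fit), "dataClasses")

# Predictors whose class means model.matrix() codes them with contrasts
factor_like <- names(dc)[dc %in% c("factor", "ordered", "character", "logical")]
print(factor_like)
```

This identifies the variables but, like the 'contrasts' attribute, it does not supply their levels.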

An alternative for detecting “factors” is to examine the 'contrasts' attribute of the model matrix, although that doesn’t produce levels:

- - - - snip- - - - -

> names(attr(model.matrix(mod), "contrasts"))
[1] "c" "f" "L"

- - - - snip- - - - -
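
To recover levels as well, the names from the 'contrasts' attribute can be combined with the model frame. The following is a minimal sketch under that assumption; factor_levels() is a hypothetical helper, not a function in R, and the data are invented for the example.

```r
# Hypothetical helper (not part of R): levels for every variable that
# model.matrix() coded with contrasts, logical variables included.
factor_levels <- function(mod) {
  mf <- model.frame(mod)
  vars <- names(attr(model.matrix(mod), "contrasts"))
  lapply(mf[vars], function(x) {
    # Factors keep their declared levels; character and logical
    # variables get the levels that as.factor() would assign.
    if (is.factor(x)) levels(x) else levels(as.factor(x))
  })
}

# Illustrative data only
dat <- data.frame(
  y = c(0.3, -1.1, 0.8, 1.5, -0.2, 0.6, -0.9, 1.1, 0.1, -0.4, 2.0, 0.7),
  c = rep(c("a", "b", "c"), length.out = 12),
  f = factor(rep(c("A", "B", "C"), each = 4)),
  L = rep(c(TRUE, FALSE), times = 6),
  stringsAsFactors = FALSE
)
m <- lm(y ~ c + f + L, data = dat)
lev <- factor_levels(m)
print(lev)
```

Note that the levels reported for a logical variable are only those observed in the data, so a variable that is always TRUE would show a single level.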

Is there an argument against making the treatment of logical variables consistent with that of factors and character variables? Comments?

Best,
 John

  -------------------------------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel