get_all_vars() does not handle rhs matrices in formulae

Thomas J. Leeper
Hello again,

It appears that get_all_vars() incorrectly handles model formulae that
use a right-hand side (rhs) matrix. For example, consider these two
substantively identical models:

# model using named variables
mpg <- mtcars$mpg
wt <- mtcars$wt
hp <- mtcars$hp
m1 <- lm(mpg ~ wt + hp)

# model using matrix
y <- mtcars$mpg
x <- cbind(mtcars$wt, mtcars$hp)
m2 <- lm(y ~ x)

For the first, get_all_vars() returns the correct data frame:

str(get_all_vars(m1, .GlobalEnv))
## 'data.frame':   32 obs. of  3 variables:
##  $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ wt : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ hp : num  110 110 93 110 175 105 245 62 95 123 ...

which could, for example, be passed on to predict() just like the
output from model.frame():

str(predict(m1, model.frame(m1)))
## Named num [1:32] 23.6 22.6 25.3 21.3 18.3 ...
## - attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...

str(predict(m1, get_all_vars(m1)))
## Named num [1:32] 23.6 22.6 25.3 21.3 18.3 ...
## - attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...

For the model specified with a rhs matrix, however, get_all_vars()
returns a three-column data frame with the second matrix column added
as an unnamed third column:

str(get_all_vars(m2, .GlobalEnv))
## 'data.frame':   32 obs. of  3 variables:
##  $ y : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ x : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ NA: num  110 110 93 110 175 105 245 62 95 123 ...

This means attempts to use this data structure in predict() fail:

str(predict(m2, get_all_vars(m2)))
## Error: variable 'x' was fitted with type "nmatrix.2" but type "numeric" was supplied

The correct structure needs to resemble the following in order for that to succeed:

newdat <- data.frame(y = y)
newdat$x <- x
str(newdat)
## 'data.frame':   32 obs. of  2 variables:
##  $ y: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ x: num [1:32, 1:2] 2.62 2.88 2.32 3.21 3.44 ...
str(predict(m2, newdat))
##  Named num [1:32] 23.6 22.6 25.3 21.3 18.3 ...
##  - attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...

The correct structure is basically what is returned by model.frame()
in cases involving a rhs matrix:

all.equal(newdat, model.frame(m2), check.attributes = FALSE)
## [1] TRUE

The issue seems to be in one of the very last lines of get_all_vars():

x <- setNames(as.data.frame(c(variables, extras), optional = TRUE),
        c(varnames, extranames))

This coerces `variables` to the wrong structure (a three-column data
frame instead of a two-column one) and, as a result, misnames the
resulting columns. Unfortunately, I don't know the most sensible/general
way to solve this; otherwise I would submit a patch. Does anyone know
how to fix this last line?
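
For illustration only, the flattening can be reproduced outside of
get_all_vars(), along with one possible way of keeping the matrix as a
single column. This is a rough sketch, not a proposed patch: the
`variables` list below is a stand-in for what the function builds
internally (the real internals may differ), and bind_as_columns() is a
hypothetical helper name, not an existing function.

```r
# Stand-in for the list of evaluated variables (an assumption, not the
# actual object inside get_all_vars()).
variables <- list(y = mtcars$mpg, x = cbind(mtcars$wt, mtcars$hp))

# as.data.frame() expands the two-column matrix into two separate columns.
flat <- as.data.frame(variables, optional = TRUE)
ncol(flat)
## [1] 3

# Hypothetical helper: assign each element as one column, so a matrix
# stays a matrix column instead of being expanded.
bind_as_columns <- function(vars) {
  out <- data.frame(row.names = seq_len(NROW(vars[[1]])))
  for (nm in names(vars)) out[[nm]] <- vars[[nm]]
  out
}

kept <- bind_as_columns(variables)
ncol(kept)
## [1] 2
dim(kept$x)
## [1] 32  2
```

The result has the same shape as `newdat` above, but I have not checked
how this would interact with the `extras` handling in get_all_vars().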

Best,
-Thomas

Thomas J. Leeper
http://www.thomasleeper.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel