formula wrangling

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

formula wrangling

Koenker, Roger W
I need some help with a formula processing problem that arose from a seemingly innocuous  request
that I add a “subset” argument to the additive modeling function “rqss” in my quantreg package.

I’ve tried to boil the relevant code down to something simpler as illustrated below.  The formulae in
question involve terms called “qss” that construct sparse matrix objects, but I’ve replaced all that with
a much simpler BoxCox construction that I hope illustrates the basic difficulty.  What is supposed to happen
is that xss objects are evaluated and cbind’d to the design matrix, subject to the same subset restriction
as the rest of the model frame.  However, this doesn’t happen, instead the xss vectors are evaluated
on the full sample and the cbind operation generates a warning which probably should be an error.
I’ve inserted a browser() to make it easy to verify that the length of xss[[[1]] doesn’t match dim(X).

Any suggestions would be most welcome, including other simplifications of the code.  Note that
the function untangle.specials() is adapted, or perhaps I should say adopted form the survival
package so you would need the quantreg package to run the attached code.

Thanks,
Roger



fit <- function(formula, subset, data, ...){
    call <- match.call()
    m <- match.call(expand.dots = FALSE)
    tmp <- c("", "formula", "subset", "data")
    m <- m[match(tmp, names(m), nomatch = 0)]
    m[[1]] <- as.name("model.frame")
    Terms <- if(missing(data)) terms(formula,special = "qss")
            else terms(formula, special = "qss", data = data)
    qssterms <- attr(Terms, "specials")$qss
    if (length(qssterms)) {
        tmpc <- untangle.specials(Terms, "qss")
        dropx <- tmpc$terms
        if (length(dropx))
            Terms <- Terms[-dropx]
        attr(Terms, "specials") <- tmpc$vars
        fnames <- function(x) {
            fy <- all.names(x[[2]])
            if (fy[1] == "cbind")
                fy <- fy[-1]
            fy
        }
        fqssnames <- unlist(lapply(parse(text = tmpc$vars), fnames))
        qssnames <- unlist(lapply(parse(text = tmpc$vars), function(x) deparse(x[[2]])))
    }
    if (exists("fqssnames")) {
        ffqss <- paste(fqssnames, collapse = "+")
        ff <- as.formula(paste(deparse(formula), "+", ffqss))
    }
    m$formula <- Terms
    m <- eval(m, parent.frame())
    Y <- model.extract(m, "response")
    X <- model.matrix(Terms, m)
    ef <- environment(formula)
    qss <- function(x, lambda) (x^lambda - 1)/lambda
    if (length(qssterms) > 0) {
        xss <- lapply(tmpc$vars, function(u) eval(parse(text = u), m, enclos = ef))
        for(i in 1:length(xss)){
            X <- cbind(X, xss[[i]]) # Here is the problem
        }
    }
    browser()
    z <- lm.fit(X,Y) # The dreaded least squares fit
    z
}
# Test case
n <- 200
x <- sort(rchisq(n,4))
z <- rnorm(n)
s <- sample(1:n, n/2)
y <- log(x) + rnorm(n)/5
D = data.frame(y = y, x = x, z = z, s = (1:n) %in% s)
plot(x, y)
lam = 0.2
#f0 <- fit(y ~ qss(x,lambda = lam) + z, subset = s)
f1 <- fit(y ~ qss(x, lambda = lam) + z, subset = s, data = D)
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: formula wrangling

Koenker, Roger W
Here is a revised snippet that seems to work the way that was intended.  Apologies to anyone
who wasted time looking at the original post.  Of course my interest in simpler or more efficient
solutions remains unabated.

if (exists("fqssnames")) {
        mff <- m
        mff$formula <- Terms
        ffqss <- paste(fqssnames, collapse = "+")
        mff$formula <- as.formula(paste(deparse(mff$formula), "+", ffqss))
    }
    m$formula <- Terms
    m <- eval(m, parent.frame())
    mff <- eval(mff, parent.frame())
    Y <- model.extract(m, "response")
    X <- model.matrix(Terms, m)
    ef <- environment(formula)
    qss <- function(x, lambda) (x^lambda - 1)/lambda
    if (length(qssterms) > 0) {
        xss <- lapply(tmpc$vars, function(u) eval(parse(text = u), mff))
        for(i in 1:length(xss)){
            X <- cbind(X, xss[[i]]) # Here is the problem
        }
    }


> On Sep 21, 2020, at 9:52 AM, Koenker, Roger W <[hidden email]> wrote:
>
> I need some help with a formula processing problem that arose from a seemingly innocuous  request
> that I add a “subset” argument to the additive modeling function “rqss” in my quantreg package.
>
> I’ve tried to boil the relevant code down to something simpler as illustrated below.  The formulae in
> question involve terms called “qss” that construct sparse matrix objects, but I’ve replaced all that with
> a much simpler BoxCox construction that I hope illustrates the basic difficulty.  What is supposed to happen
> is that xss objects are evaluated and cbind’d to the design matrix, subject to the same subset restriction
> as the rest of the model frame.  However, this doesn’t happen, instead the xss vectors are evaluated
> on the full sample and the cbind operation generates a warning which probably should be an error.
> I’ve inserted a browser() to make it easy to verify that the length of xss[[[1]] doesn’t match dim(X).
>
> Any suggestions would be most welcome, including other simplifications of the code.  Note that
> the function untangle.specials() is adapted, or perhaps I should say adopted form the survival
> package so you would need the quantreg package to run the attached code.
>
> Thanks,
> Roger
>
>
>
> fit <- function(formula, subset, data, ...){
>    call <- match.call()
>    m <- match.call(expand.dots = FALSE)
>    tmp <- c("", "formula", "subset", "data")
>    m <- m[match(tmp, names(m), nomatch = 0)]
>    m[[1]] <- as.name("model.frame")
>    Terms <- if(missing(data)) terms(formula,special = "qss")
>    else terms(formula, special = "qss", data = data)
>    qssterms <- attr(Terms, "specials")$qss
>    if (length(qssterms)) {
>        tmpc <- untangle.specials(Terms, "qss")
>        dropx <- tmpc$terms
>        if (length(dropx))
>            Terms <- Terms[-dropx]
>        attr(Terms, "specials") <- tmpc$vars
> fnames <- function(x) {
>            fy <- all.names(x[[2]])
>            if (fy[1] == "cbind")
>                fy <- fy[-1]
>            fy
>        }
>        fqssnames <- unlist(lapply(parse(text = tmpc$vars), fnames))
>        qssnames <- unlist(lapply(parse(text = tmpc$vars), function(x) deparse(x[[2]])))
>    }
>    if (exists("fqssnames")) {
>        ffqss <- paste(fqssnames, collapse = "+")
>        ff <- as.formula(paste(deparse(formula), "+", ffqss))
>    }
>    m$formula <- Terms
>    m <- eval(m, parent.frame())
>    Y <- model.extract(m, "response")
>    X <- model.matrix(Terms, m)
>    ef <- environment(formula)
>    qss <- function(x, lambda) (x^lambda - 1)/lambda
>    if (length(qssterms) > 0) {
>        xss <- lapply(tmpc$vars, function(u) eval(parse(text = u), m, enclos = ef))
> for(i in 1:length(xss)){
>    X <- cbind(X, xss[[i]]) # Here is the problem
> }
>    }
>    browser()
>    z <- lm.fit(X,Y) # The dreaded least squares fit
>    z
> }
> # Test case
> n <- 200
> x <- sort(rchisq(n,4))
> z <- rnorm(n)
> s <- sample(1:n, n/2)
> y <- log(x) + rnorm(n)/5
> D = data.frame(y = y, x = x, z = z, s = (1:n) %in% s)
> plot(x, y)
> lam = 0.2
> #f0 <- fit(y ~ qss(x,lambda = lam) + z, subset = s)
> f1 <- fit(y ~ qss(x, lambda = lam) + z, subset = s, data = D)
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: formula wrangling

Fox, John
In reply to this post by Koenker, Roger W
Dear Roger,

This is an interesting puzzle and I started to look at it when your
second message arrived. I can simplify your code slightly in two places,
here:

   if (exists("fqssnames")) {
     mff <- m
     ffqss <- paste(fqssnames, collapse = "+")
     mff$formula <- as.formula(paste(deparse(Terms), "+", ffqss))
   }

and here:

   if (length(qssterms) > 0) {
     X <- do.call(cbind,
                  c(list(X),
                    lapply(tmpc$vars, function(u) eval(parse(text = u),
mff))))
     }

and the following line is extraneous:

    ef <- environment(formula)

That doesn't amount to much, and I haven't tested my substitute code
beyond your example.

Best,
  John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-09-21 9:40 a.m., Koenker, Roger W wrote:

> Here is a revised snippet that seems to work the way that was intended.  Apologies to anyone
> who wasted time looking at the original post.  Of course my interest in simpler or more efficient
> solutions remains unabated.
>
> if (exists("fqssnames")) {
> mff <- m
> mff$formula <- Terms
>          ffqss <- paste(fqssnames, collapse = "+")
>          mff$formula <- as.formula(paste(deparse(mff$formula), "+", ffqss))
>      }
>      m$formula <- Terms
>      m <- eval(m, parent.frame())
>      mff <- eval(mff, parent.frame())
>      Y <- model.extract(m, "response")
>      X <- model.matrix(Terms, m)
>      ef <- environment(formula)
>      qss <- function(x, lambda) (x^lambda - 1)/lambda
>      if (length(qssterms) > 0) {
>          xss <- lapply(tmpc$vars, function(u) eval(parse(text = u), mff))
> for(i in 1:length(xss)){
>    X <- cbind(X, xss[[i]]) # Here is the problem
> }
>      }
>
>
>> On Sep 21, 2020, at 9:52 AM, Koenker, Roger W <[hidden email]> wrote:
>>
>> I need some help with a formula processing problem that arose from a seemingly innocuous  request
>> that I add a “subset” argument to the additive modeling function “rqss” in my quantreg package.
>>
>> I’ve tried to boil the relevant code down to something simpler as illustrated below.  The formulae in
>> question involve terms called “qss” that construct sparse matrix objects, but I’ve replaced all that with
>> a much simpler BoxCox construction that I hope illustrates the basic difficulty.  What is supposed to happen
>> is that xss objects are evaluated and cbind’d to the design matrix, subject to the same subset restriction
>> as the rest of the model frame.  However, this doesn’t happen, instead the xss vectors are evaluated
>> on the full sample and the cbind operation generates a warning which probably should be an error.
>> I’ve inserted a browser() to make it easy to verify that the length of xss[[[1]] doesn’t match dim(X).
>>
>> Any suggestions would be most welcome, including other simplifications of the code.  Note that
>> the function untangle.specials() is adapted, or perhaps I should say adopted form the survival
>> package so you would need the quantreg package to run the attached code.
>>
>> Thanks,
>> Roger
>>
>>
>>
>> fit <- function(formula, subset, data, ...){
>>     call <- match.call()
>>     m <- match.call(expand.dots = FALSE)
>>     tmp <- c("", "formula", "subset", "data")
>>     m <- m[match(tmp, names(m), nomatch = 0)]
>>     m[[1]] <- as.name("model.frame")
>>     Terms <- if(missing(data)) terms(formula,special = "qss")
>>    else terms(formula, special = "qss", data = data)
>>     qssterms <- attr(Terms, "specials")$qss
>>     if (length(qssterms)) {
>>         tmpc <- untangle.specials(Terms, "qss")
>>         dropx <- tmpc$terms
>>         if (length(dropx))
>>             Terms <- Terms[-dropx]
>>         attr(Terms, "specials") <- tmpc$vars
>> fnames <- function(x) {
>>             fy <- all.names(x[[2]])
>>             if (fy[1] == "cbind")
>>                 fy <- fy[-1]
>>             fy
>>         }
>>         fqssnames <- unlist(lapply(parse(text = tmpc$vars), fnames))
>>         qssnames <- unlist(lapply(parse(text = tmpc$vars), function(x) deparse(x[[2]])))
>>     }
>>     if (exists("fqssnames")) {
>>         ffqss <- paste(fqssnames, collapse = "+")
>>         ff <- as.formula(paste(deparse(formula), "+", ffqss))
>>     }
>>     m$formula <- Terms
>>     m <- eval(m, parent.frame())
>>     Y <- model.extract(m, "response")
>>     X <- model.matrix(Terms, m)
>>     ef <- environment(formula)
>>     qss <- function(x, lambda) (x^lambda - 1)/lambda
>>     if (length(qssterms) > 0) {
>>         xss <- lapply(tmpc$vars, function(u) eval(parse(text = u), m, enclos = ef))
>> for(i in 1:length(xss)){
>>    X <- cbind(X, xss[[i]]) # Here is the problem
>> }
>>     }
>>     browser()
>>     z <- lm.fit(X,Y) # The dreaded least squares fit
>>     z
>> }
>> # Test case
>> n <- 200
>> x <- sort(rchisq(n,4))
>> z <- rnorm(n)
>> s <- sample(1:n, n/2)
>> y <- log(x) + rnorm(n)/5
>> D = data.frame(y = y, x = x, z = z, s = (1:n) %in% s)
>> plot(x, y)
>> lam = 0.2
>> #f0 <- fit(y ~ qss(x,lambda = lam) + z, subset = s)
>> f1 <- fit(y ~ qss(x, lambda = lam) + z, subset = s, data = D)
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.