# looking for formula parser that allows coefficients

## looking for formula parser that allows coefficients

Can you point me at any packages that allow users to write a formula with coefficients?

I want to write a data simulator that has a matrix X with lots of columns, and then users can generate predictive models by entering a formula that uses some of the variables, allowing interactions, like

```r
y ~ 2 + 1.1 * x1 + 3 * x3 + 0.1 * x1:x3 + 0.2 * x2:x2
```

Currently, in the rockchalk package, I have a function that simulates data (genCorrelatedData2), but my interface for entering the beta coefficients is poor. I assumed the user would always enter 0's as placeholders for the unused coefficients, and that the intercept is always first. The unnamed vector is too confusing. I have them specify:

```r
c(2, 1.1, 0, 3, 0, 0, 0.2, ...)
```

In the documentation I say (ridiculously) that it is easy to figure out from the examples, but it really isn't. The function prints out the equation it thinks you intended; that's minimal protection against user error, but still not very good:

```r
dat <- genCorrelatedData2(N = 10, rho = 0.0,
          beta = c(1, 2, 1, 1, 0, 0.2, 0, 0, 0),
          means = c(0,0,0), sds = c(1,1,1), stde = 0)
## "The equation that was calculated was"
## y = 1 + 2*x1 + 1*x2 + 1*x3
##  + 0*x1*x1 + 0.2*x2*x1 + 0*x3*x1
##  + 0*x1*x2 + 0*x2*x2 + 0*x3*x2
##  + 0*x1*x3 + 0*x2*x3 + 0*x3*x3
##  + N(0,0) random error
```

But still, it is not very good.

As I look at this now, I realize it expects just the vech, not the whole vector of all interaction terms, so it is even more difficult than I thought to get the correct input. Hence, I'd like to let the user write a formula.

The alternative for the user interface is to have named coefficients. I can more or less easily allow a named vector for beta:

```r
beta = c("(Intercept)" = 1, "x1" = 2, "x2" = 1, "x3" = 1, "x2:x1" = 0.1)
```

I could build a formula from that. That's not too bad. But I still think it would be cool to allow formula input.

Have you ever seen it done?

pj

--
Paul E. Johnson http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
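The named-vector interface described in the post can be turned into a linear predictor directly, without building a formula at all. The following is a minimal sketch, not part of rockchalk and not how genCorrelatedData2 works: the helper name `linpred_from_beta` is hypothetical, and it assumes the columns of X are named (or receive default names x1, x2, ...) and that an interaction name such as "x2:x1" means the elementwise product of those columns.

```r
# Hypothetical helper (not part of rockchalk): build the linear predictor
# from a *named* coefficient vector. Names may be "(Intercept)", a column
# name of X, or an interaction "a:b" meaning the product of columns a and b.
linpred_from_beta <- function(X, beta) {
  if (is.null(colnames(X))) colnames(X) <- paste0("x", seq_len(ncol(X)))
  if (is.null(names(beta)) || any(names(beta) == ""))
    stop("every element of beta must be named")
  eta <- numeric(nrow(X))
  for (k in seq_along(beta)) {
    nm <- names(beta)[k]
    if (nm == "(Intercept)") {
      eta <- eta + beta[[k]]
      next
    }
    vars <- strsplit(nm, ":", fixed = TRUE)[[1]]
    # Fail loudly on a misspelled variable instead of silently dropping it
    bad <- setdiff(vars, colnames(X))
    if (length(bad) > 0)
      stop("unknown variable(s) in beta: ", paste(bad, collapse = ", "))
    # Elementwise product of the named columns (so "x2:x2" gives x2^2)
    term <- Reduce(`*`, lapply(vars, function(v) X[, v]))
    eta <- eta + beta[[k]] * term
  }
  eta
}

# Usage with the named vector from the post
X <- matrix(1:20, 5, 4)
beta <- c("(Intercept)" = 1, "x1" = 2, "x2" = 1, "x3" = 1, "x2:x1" = 0.1)
eta <- linpred_from_beta(X, beta)
# A simulated outcome: add normal noise with an error SD of your choosing
y <- eta + rnorm(nrow(X), sd = 1)
```

Because only the terms the user names are included, there are no zero placeholders and no dependence on the position of each coefficient.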

## Re: looking for formula parser that allows coefficients

Dear Paul,

Is it possible that you're overthinking this? That is, do you really need an R model formula, or do you just want to evaluate an arithmetic expression using the columns of X? If the latter, the following approach may work for you:

```r
evalFormula <- function(X, expr){
  # give unnamed columns the default names x1, x2, ...
  if (is.null(colnames(X))) colnames(X) <- paste0("x", 1:ncol(X))
  # evaluate the expression string with the columns of X as variables
  with(as.data.frame(X), eval(parse(text=expr)))
}
X <- matrix(1:20, 5, 4)
X
##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20
evalFormula(X, '2 + 3*x1 + 4*x2 + 5*x3 + 6*x1*x2')
## [1] 120 180 252 336 432
```

I hope that this helps,
John

-----------------------------------------------------------------
John Fox
Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: https://socialsciences.mcmaster.ca/jfox/

> -----Original Message-----
> From: R-help [mailto:[hidden email]] On Behalf Of Paul Johnson
> Sent: Tuesday, August 21, 2018 6:46 PM
> To: R-help <[hidden email]>
> Subject: [R] looking for formula parser that allows coefficients
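One difference from the formula in the original question: in an ordinary R expression `:` is the sequence operator, not multiplication, so an expression such as `0.1 * x1:x3` passed to `evalFormula()` would not compute an interaction (R warns that only the first element of each vector is used, and the result is not the product of the columns). A minimal sketch, assuming the user supplies a formula object and writes interactions with `:`, is to rewrite every `:` call into `*` in the right-hand side before evaluating it. The helper names `colon_to_times` and `evalFormulaRHS` are hypothetical, and the sketch only targets arithmetic on column names.

```r
# Hypothetical helper: recursively replace the `:` operator with `*`
# in an unevaluated R expression, so that x1:x3 means x1 * x3.
colon_to_times <- function(e) {
  if (is.call(e)) {
    if (identical(e[[1]], as.name(":"))) e[[1]] <- as.name("*")
    for (i in seq_along(e)[-1]) e[[i]] <- colon_to_times(e[[i]])
  }
  e
}

# Hypothetical variant of evalFormula() that takes a formula object and
# evaluates its right-hand side against the (named) columns of X.
evalFormulaRHS <- function(X, fo) {
  if (is.null(colnames(X))) colnames(X) <- paste0("x", seq_len(ncol(X)))
  rhs <- fo[[length(fo)]]  # works for both y ~ rhs and ~ rhs
  eval(colon_to_times(rhs), as.data.frame(X), environment(fo))
}

X <- matrix(1:20, 5, 4)
evalFormulaRHS(X, y ~ 2 + 3*x1 + 4*x2 + 5*x3 + 6*x1:x2)
## [1] 120 180 252 336 432
```

Since `:` binds more tightly than `*` in R, `6*x1:x2` parses as `6 * (x1:x2)`, and after the rewrite it becomes `6 * (x1 * x2)`, which reproduces the result of the string version above.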

## Re: looking for formula parser that allows coefficients

Some string manipulation can convert the formula to a named vector such as the one shown at the end of your post.

```r
library(gsubfn)

# input
fo <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2

# captures: optional sign, optional number, optional term name
pat <- "([+-])? *(\\d\\S*)? *\\*? *([[:alpha:]]\\S*)?"
ch <- format(fo[[3]])  # right-hand side as a single string
# one column per match (sign, coefficient, term); drop empty matches
m <- matrix(strapplyc(ch, pat)[[1]], 3)
m <- m[, colSums(m != "") > 0]
m[2, m[2, ] == ""] <- 1              # a term with no number has coefficient 1
m[3, m[3, ] == ""] <- "(Intercept)"  # a number with no term is the intercept
co <- as.numeric(paste0(m[1, ], m[2, ]))
v <- m[3, ]
setNames(co, v)
## (Intercept)          x1          x3       x1:x3       x2:x2
##         2.0        -1.1         1.0        -1.0         0.2
```

On Tue, Aug 21, 2018 at 6:46 PM Paul Johnson <[hidden email]> wrote:
>
> Can you point me at any packages that allow users to write a
> formula with coefficients?

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
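As an alternative to the regular expression, R's own parser already turns the right-hand side of a formula into a call tree, so the same named vector can be recovered by walking that tree. This is a minimal sketch under stated assumptions: the helper names `additive_terms` and `formula_coefs` are hypothetical, and each term is assumed to be a bare number, `number * term`, or a bare term such as `x3` or `x1:x3`. A coefficient written after the variable (for example `x1 * 1.1`) is not recognised and would come back as a term named `x1 * 1.1` with coefficient 1.

```r
# Hypothetical helper: flatten a sum/difference into a list of
# (sign, expression) pairs, e.g. 2 - 1.1 * x1 -> (+1, 2), (-1, 1.1 * x1)
additive_terms <- function(e, sign = 1) {
  if (is.call(e) && length(e) == 3 && identical(e[[1]], as.name("+")))
    return(c(additive_terms(e[[2]], sign), additive_terms(e[[3]], sign)))
  if (is.call(e) && length(e) == 3 && identical(e[[1]], as.name("-")))
    return(c(additive_terms(e[[2]], sign), additive_terms(e[[3]], -sign)))
  if (is.call(e) && length(e) == 2 && identical(e[[1]], as.name("-")))
    return(additive_terms(e[[2]], -sign))  # unary minus
  list(list(sign = sign, expr = e))
}

# Hypothetical helper: named coefficient vector from a formula's RHS
formula_coefs <- function(fo) {
  tl <- additive_terms(fo[[length(fo)]])
  co <- numeric(length(tl))
  nm <- character(length(tl))
  for (k in seq_along(tl)) {
    s <- tl[[k]]$sign
    e <- tl[[k]]$expr
    if (is.numeric(e)) {
      # a bare number is the intercept
      co[k] <- s * e
      nm[k] <- "(Intercept)"
    } else if (is.call(e) && identical(e[[1]], as.name("*")) && is.numeric(e[[2]])) {
      # number * term
      co[k] <- s * e[[2]]
      nm[k] <- paste(deparse(e[[3]]), collapse = "")
    } else {
      # bare term: coefficient is +1 or -1
      co[k] <- s
      nm[k] <- paste(deparse(e), collapse = "")
    }
  }
  setNames(co, nm)
}

formula_coefs(y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2)
## (Intercept)          x1          x3       x1:x3       x2:x2
##         2.0        -1.1         1.0        -1.0         0.2
```

Working on the parsed call rather than the deparsed string avoids depending on how numbers and spaces are printed, and the result has the same form as the named `beta` vector in the original post.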