# Order of formula terms in model.matrix

8 messages
Open this post in threaded view
|

## Order of formula terms in model.matrix

 I’d appreciate your help on understanding the following. It is not very clear to me from the model.matrix documentation, why simply changing the order of terms in the formula may change the number of resulting columns. Please note I’m purposely not including main effects in the model formula in this case.   set.seed(1) x1 <- rnorm(100) f1 <- factor(sample(letters[1:3], 100, replace = TRUE)) trt <- sample(c(-1,1), 100, replace = TRUE) df <- data.frame(x1=x1, f1=f1, trt=trt) dim(model.matrix( ~ x1:trt + f1:trt, data = df)) [1] 100 4 dim(model.matrix(~ f1:trt + x1:trt, data = df)) [1] 100 5 Thanks, Lars. ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Order of formula terms in model.matrix

 On Sun, 17 Jan 2016, Lars Bishop wrote: > I’d appreciate your help on understanding the following. > > It is not very clear to me from the model.matrix documentation, why > simply changing the order of terms in the formula may change the number > of resulting columns. Please note I’m purposely not including main > effects in the model formula in this case. IIRC, there are some heuristics involved harking back to the White Book. I recall there have been discussions of whether and how this could be fixed before on this list and or R-devel, but I cannot seem to lay my browser on them right now. > > > set.seed(1) > x1 <- rnorm(100) > f1 <- factor(sample(letters[1:3], 100, replace = TRUE)) > trt <- sample(c(-1,1), 100, replace = TRUE) > df <- data.frame(x1=x1, f1=f1, trt=trt) > > dim(model.matrix( ~ x1:trt + f1:trt, data = df)) > [1] 100 4 > > dim(model.matrix(~ f1:trt + x1:trt, data = df)) > [1] 100 5 > By `x1:trt' I guess you mean the same thing as `I(x1*trt)'. If you use the latter form, the issue you raise goes away. Note that `I(some.expr)' gives you the ability to force the behavior of model.matrix to be exactly what you want by suitably crafting `some.expr', heuristics notwithstanding. HTH, Chuck ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Order of formula terms in model.matrix

 This is very helpful, thanks! Lars. > On Jan 17, 2016, at 1:34 PM, Charles C. Berry <[hidden email]> wrote: > > On Sun, 17 Jan 2016, Lars Bishop wrote: > >> I’d appreciate your help on understanding the following. > >> It is not very clear to me from the model.matrix documentation, why simply changing the order of terms in the formula may change the number of resulting columns. Please note I’m purposely not including main effects in the model formula in this case. > > > IIRC, there are some heuristics involved harking back to the White Book. I recall there have been discussions of whether and how this could be fixed before on this list and or R-devel, but I cannot seem to lay my browser on them right now. > > >> >> set.seed(1) >> x1 <- rnorm(100) >> f1 <- factor(sample(letters[1:3], 100, replace = TRUE)) >> trt <- sample(c(-1,1), 100, replace = TRUE) >> df <- data.frame(x1=x1, f1=f1, trt=trt) >> >> dim(model.matrix( ~ x1:trt + f1:trt, data = df)) >> [1] 100 4 >> >> dim(model.matrix(~ f1:trt + x1:trt, data = df)) >> [1] 100 5 >> > > By `x1:trt' I guess you mean the same thing as `I(x1*trt)'. > > If you use the latter form, the issue you raise goes away. > > Note that `I(some.expr)' gives you the ability to force the behavior of model.matrix to be exactly what you want by suitably crafting `some.expr', heuristics notwithstanding. > > HTH, > > Chuck > ______________________________________________ > [hidden email] mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Order of formula terms in model.matrix

 Question: I(a*b) would work as long as both “a" and “b” are numeric. Is there a way I can force the behaviour of model.matrix when one of these variables is a factor (as in "f1:trt" from my example below)? Specifically, based on my example below, I would like to always return the first matrix (where the first level of “f1” is omitted in the resulting matrix). This would’t happen if the user specifies the terms in reverse order (as per the second matrix). Thanks again, Lars. > On Jan 17, 2016, at 2:53 PM, Lars Bishop <[hidden email]> wrote: > > This is very helpful, thanks! > > Lars. > > >> On Jan 17, 2016, at 1:34 PM, Charles C. Berry <[hidden email]> wrote: >> >> On Sun, 17 Jan 2016, Lars Bishop wrote: >> >>> I’d appreciate your help on understanding the following. >> >>> It is not very clear to me from the model.matrix documentation, why simply changing the order of terms in the formula may change the number of resulting columns. Please note I’m purposely not including main effects in the model formula in this case. >> >> >> IIRC, there are some heuristics involved harking back to the White Book. I recall there have been discussions of whether and how this could be fixed before on this list and or R-devel, but I cannot seem to lay my browser on them right now. >> >> >>> >>> set.seed(1) >>> x1 <- rnorm(100) >>> f1 <- factor(sample(letters[1:3], 100, replace = TRUE)) >>> trt <- sample(c(-1,1), 100, replace = TRUE) >>> df <- data.frame(x1=x1, f1=f1, trt=trt) >>> >>> dim(model.matrix( ~ x1:trt + f1:trt, data = df)) >>> [1] 100 4 >>> >>> dim(model.matrix(~ f1:trt + x1:trt, data = df)) >>> [1] 100 5 >>> >> >> By `x1:trt' I guess you mean the same thing as `I(x1*trt)'. >> >> If you use the latter form, the issue you raise goes away. >> >> Note that `I(some.expr)' gives you the ability to force the behavior of model.matrix to be exactly what you want by suitably crafting `some.expr', heuristics notwithstanding. >> >> HTH, >> >> Chuck >> ______________________________________________ >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html>> and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Order of formula terms in model.matrix

 In reply to this post by Berry, Charles > On 17 Jan 2016, at 19:34 , Charles C. Berry <[hidden email]> wrote: > > > IIRC, there are some heuristics involved harking back to the White Book. I recall there have been discussions of whether and how this could be fixed before on this list and or R-devel, but I cannot seem to lay my browser on them right now. > And IIRC: yup, and one of the issues is that (a) some rules work left-to-right (b) the logic is oblivious to the factor/vector distinction For factors a,b,c, what happens for ~a:b + b:c is that a:b gets the full term expansion since the marginals a and b are not in the model but since b is part of the fully expanded a:b,  b:c gets the reduced form expansion as it would in ~b + b:c (the c-within-b thing). Swapping the terms gives you a different result, but at least it is the same model in the sense that the columns span the same subspace. If a and b are vectors, and c is a factor, you get the same logic: expand a:b fully, then treat b:c as in b + b:c. Unfortunately, a:b is just the product of a and b, whether or not it is fully expanded, so it doesn't really make sense to proceed as if b is contained in a preceding term. So the net result is that you end up with one column less than you probably wanted. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: [hidden email]  Priv: [hidden email] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Order of formula terms in model.matrix

 In reply to this post by Lars Bishop-2 Is it really the same model? Following the example provided by Lars: set.seed(1) x1 <- rnorm(100) f1 <- factor(sample(letters[1:3], 100, replace = TRUE)) trt <- sample(c(-1,1), 100, replace = TRUE) y <- factor(sample(c(0,1), 100, T)) df <- data.frame(y=y, x1=x1, f1=f1, trt=trt) fit1 <- glm(y ~ x1:trt + f1:trt, data = df, family = binomial) coef(fit1) fit2 <- glm(y ~ f1:trt + x1:trt, data = df, family = binomial) coef(fit2) identical(fitted(fit1), fitted(fit2)) [1] FALSE _______________________________________________________________________ If you received this email in error, please advise the sender (by return email or otherwise) immediately. You have consented to receive the attached electronically at the above-noted email address; please retain a copy of this confirmation for future reference.   Si vous recevez ce courriel par erreur, veuillez en aviser l'expéditeur immédiatement, par retour de courriel ou par un autre moyen. Vous avez accepté de recevoir le(s) document(s) ci-joint(s) par voie électronique à l'adresse courriel indiquée ci-dessus; veuillez conserver une copie de cette confirmation pour les fins de reference future.         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.