Hello Tyler,
I want to bring to your attention the following document: "What happens if you omit the main effect in a regression model with an interaction?" (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-omit-the-main-effect-in-a-regression-model-with-an-interaction). This gives a useful review of the problem. Your example is Case 2: a continuous and a categorical regressor. The numerical examples are coded in Stata, and they give the same result as in R. Hence, if this is a bug in R then it is also a bug in Stata. That seems very unlikely. Here is a simulation in R of the above mentioned Case 2 in Stata: df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) print("Full model") print(model.matrix(~(socst+grp)^2 ,data=df)) print("Example 2.1: drop socst") print(model.matrix(~(socst+grp)^2 -socst ,data=df)) print("Example 2.2: drop grp") print(model.matrix(~(socst+grp)^2 -grp ,data=df)) This gives indeed the following regressors: "Full model" (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 "Example 2.1: drop socst" (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 socst:grp4 "Example 2.2: drop grp" (Intercept) socst socst:grp2 socst:grp3 socst:grp4 There is a little bit of R documentation about this, based on the concept of marginality, which typically forbids a model having an interaction but not the corresponding main effects. (You might see the references in https://en.wikipedia.org/wiki/Principle_of_marginality ) See "An Introduction to R", by Venables and Smith and the R Core Team. At the bottom of page 52 (PDF: 57) it says: "Although the details are complicated, model formulae in R will normally generate the models that an expert statistician would expect, provided that marginality is preserved. Fitting, for [a contrary] example, a model with an interaction but not the corresponding main effects will in general lead to surprising results ....". The Reference Manual states that the R functions dropterm() and addterm() resp. drop or add only terms such that marginality is preserved. Finally, about your singular matrix t(mm)%*%mm. This is in fact Example 2.1 in Case 2 discussed above. As discussed there, in Stata and in R the drop of the continuous variable has no effect on the degrees of freedom here: it is just a reparameterisation of the full model, protecting you against losing marginality... Hence the model.matrix 'mm' is still square and nonsingular after the drop of X1, unless of course when a row is removed from the matrix 'design' when before creating 'mm'. Arie On Sun, Oct 15, 2017 at 7:05 PM, Tyler <[hidden email]> wrote: > You could possibly try to explain away the behavior for a missing main > effects term, since without the main effects term we don't have main effect > columns in the model matrix used to compute the interaction columns (At > best this is undocumented behavior--I still think it's a bug, as we know > how we would encode the categorical factors if they were in fact present. > It's either specified in contrasts.arg or using the default set in > options). However, when all the main effects are present, why would the > three-factor interaction column not simply be the product of the main > effect columns? In my example: we know X1, we know X2, and we know X3. Why > does the encoding of X1:X2:X3 depend on whether we specified a two-factor > interaction, AND only changes for specific missing interactions? > > In addition, I can use a two-term example similar to yours to show how this > behavior results in a singular covariance matrix when, given the desired > factor encoding, it should not be singular. > > We start with a full factorial design for a two-level continuous factor and > a three-level categorical factor, and remove a single row. This design > matrix does not leave enough degrees of freedom to determine > goodness-of-fit, but should allow us to obtain parameter estimates. > >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) >> design = design[-1,] >> design > X1 X2 > 2 -1 A > 3 1 B > 4 -1 B > 5 1 C > 6 -1 C > > Here, we first calculate the model matrix for the full model, and then > manually remove the X1 column from the model matrix. This gives us the > model matrix one would expect if X1 were removed from the model. We then > successfully calculate the covariance matrix. > >> mm = model.matrix(~(X1+X2)^2,data=design) >> mm > (Intercept) X1 X2B X2C X1:X2B X1:X2C > 2 1 -1 0 0 0 0 > 3 1 1 1 0 1 0 > 4 1 -1 1 0 -1 0 > 5 1 1 0 1 0 1 > 6 1 -1 0 1 0 -1 > >> mm = mm[,-2] >> solve(t(mm) %*% mm) > (Intercept) X2B X2C X1:X2B X1:X2C > (Intercept) 1 -1.0 -1.0 0.0 0.0 > X2B -1 1.5 1.0 0.0 0.0 > X2C -1 1.0 1.5 0.0 0.0 > X1:X2B 0 0.0 0.0 0.5 0.0 > X1:X2C 0 0.0 0.0 0.0 0.5 > > Here, we see the actual behavior for model.matrix. The undesired re-coding > of the model matrix interaction term makes the information matrix singular. > >> mm = model.matrix(~(X1+X2)^2-X1,data=design) >> mm > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C > 2 1 0 0 -1 0 0 > 3 1 1 0 0 1 0 > 4 1 1 0 0 -1 0 > 5 1 0 1 0 0 1 > 6 1 0 1 0 0 -1 > >> solve(t(mm) %*% mm) > Error in solve.default(t(mm) %*% mm) : system is computationally singular: > reciprocal condition number = 5.55112e-18 > > I still believe this is a bug. > > Best regards, > Tyler Morgan-Wall > > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate <[hidden email]> > wrote: > >> I think it is not a bug. It is a general property of interactions. >> This property is best observed if all variables are factors >> (qualitative). >> >> For example, you have three variables (factors). You ask for as many >> interactions as possible, except an interaction term between two >> particular variables. When this interaction is not a constant, it is >> different for different values of the remaining variable. More >> precisely: for all values of that variable. In other words: you have a >> three-way interaction, with all values of that variable. >> >> An even smaller example is the following script with only two >> variables, each being a factor: >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) >> print(model.matrix(~(X1+X2)^2 ,data=df)) >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) >> >> The result is: >> >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C >> 1 1 0 0 0 0 0 >> 2 1 1 0 0 0 0 >> 3 1 0 1 0 0 0 >> 4 1 1 1 0 1 0 >> 5 1 0 0 1 0 0 >> 6 1 1 0 1 0 1 >> >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C >> 1 1 0 0 0 0 0 >> 2 1 0 0 1 0 0 >> 3 1 1 0 0 0 0 >> 4 1 1 0 0 1 0 >> 5 1 0 1 0 0 0 >> 6 1 0 1 0 0 1 >> >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C >> 1 1 0 0 0 0 0 >> 2 1 1 0 0 0 0 >> 3 1 0 1 0 0 0 >> 4 1 1 0 1 0 0 >> 5 1 0 0 0 1 0 >> 6 1 1 0 0 0 1 >> >> Thus, in the second result, we have no main effect of X1. Instead, the >> effect of X1 depends on the value of X2; either A or B or C. In fact, >> this is a two-way interaction, including all three values of X2. In >> the third result, we have no main effect of X2, The effect of X2 >> depends on the value of X1; either p or q. >> >> A complicating element with your example seems to be that your X1 and >> X2 are not factors. >> >> Arie >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <[hidden email]> wrote: >> > Hi, >> > >> > I recently ran into an inconsistency in the way model.matrix.default >> > handles factor encoding for higher level interactions with categorical >> > variables when the full hierarchy of effects is not present. Depending on >> > which lower level interactions are specified, the factor encoding changes >> > for a higher level interaction. Consider the following minimal >> reproducible >> > example: >> > >> > -------------- >> > >> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 X3B X3C >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> > 1 1 1 1 0 0 1 0 0 0 0 >> > 0 0 >> > 2 1 -1 1 0 0 -1 0 0 0 0 >> > 0 0 >> > 3 1 1 -1 0 0 -1 0 0 0 0 >> > 0 0 >> > 4 1 -1 -1 0 0 1 0 0 0 0 >> > 0 0 >> > 5 1 1 1 1 0 1 1 0 1 0 >> > 1 0 >> > 6 1 -1 1 1 0 -1 -1 0 1 0 >> > -1 0 >> > 7 1 1 -1 1 0 -1 1 0 -1 0 >> > -1 0 >> > 8 1 -1 -1 1 0 1 -1 0 -1 0 >> > 1 0 >> > 9 1 1 1 0 1 1 0 1 0 1 >> > 0 1 >> > 10 1 -1 1 0 1 -1 0 -1 0 1 >> > 0 -1 >> > 11 1 1 -1 0 1 -1 0 1 0 -1 >> > 0 -1 >> > 12 1 -1 -1 0 1 1 0 -1 0 -1 >> > 0 1 >> > attr(,"assign") >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 >> > attr(,"contrasts") >> > attr(,"contrasts")$X3 >> > [1] "contr.treatment" >> > >> > -------------- >> > >> > Specifying the full hierarchy gives us what we expect: the interaction >> > columns are simply calculated from the product of the main effect >> columns. >> > The interaction term X1:X2:X3 gives us two columns in the model matrix, >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main effects. >> > >> > If we remove either the X2:X3 interaction or the X1:X3 interaction, we >> get >> > what we would expect for the X1:X2:X3 interaction, but when we remove the >> > X1:X2 interaction the encoding for X1:X2:X3 changes completely: >> > >> > -------------- >> > >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) (Intercept) X1 X2 >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> > 1 1 1 1 0 0 1 0 0 0 0 >> > 2 1 -1 1 0 0 -1 0 0 0 0 >> > 3 1 1 -1 0 0 -1 0 0 0 0 >> > 4 1 -1 -1 0 0 1 0 0 0 0 >> > 5 1 1 1 1 0 1 1 0 1 0 >> > 6 1 -1 1 1 0 -1 1 0 -1 0 >> > 7 1 1 -1 1 0 -1 -1 0 -1 0 >> > 8 1 -1 -1 1 0 1 -1 0 1 0 >> > 9 1 1 1 0 1 1 0 1 0 1 >> > 10 1 -1 1 0 1 -1 0 1 0 -1 >> > 11 1 1 -1 0 1 -1 0 -1 0 -1 >> > 12 1 -1 -1 0 1 1 0 -1 0 1 >> > attr(,"assign") >> > [1] 0 1 2 3 3 4 5 5 6 6 >> > attr(,"contrasts") >> > attr(,"contrasts")$X3 >> > [1] "contr.treatment" >> > >> > >> > >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) (Intercept) X1 X2 >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C >> > 1 1 1 1 0 0 1 0 0 0 0 >> > 2 1 -1 1 0 0 -1 0 0 0 0 >> > 3 1 1 -1 0 0 -1 0 0 0 0 >> > 4 1 -1 -1 0 0 1 0 0 0 0 >> > 5 1 1 1 1 0 1 1 0 1 0 >> > 6 1 -1 1 1 0 -1 -1 0 -1 0 >> > 7 1 1 -1 1 0 -1 1 0 -1 0 >> > 8 1 -1 -1 1 0 1 -1 0 1 0 >> > 9 1 1 1 0 1 1 0 1 0 1 >> > 10 1 -1 1 0 1 -1 0 -1 0 -1 >> > 11 1 1 -1 0 1 -1 0 1 0 -1 >> > 12 1 -1 -1 0 1 1 0 -1 0 1 >> > attr(,"assign") >> > [1] 0 1 2 3 3 4 5 5 6 6 >> > attr(,"contrasts") >> > attr(,"contrasts")$X3 >> > [1] "contr.treatment" >> > >> > >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) (Intercept) X1 X2 >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C >> > 1 1 1 1 0 0 0 0 0 0 1 >> > 0 0 >> > 2 1 -1 1 0 0 0 0 0 0 -1 >> > 0 0 >> > 3 1 1 -1 0 0 0 0 0 0 -1 >> > 0 0 >> > 4 1 -1 -1 0 0 0 0 0 0 1 >> > 0 0 >> > 5 1 1 1 1 0 1 0 1 0 0 >> > 1 0 >> > 6 1 -1 1 1 0 -1 0 1 0 0 >> > -1 0 >> > 7 1 1 -1 1 0 1 0 -1 0 0 >> > -1 0 >> > 8 1 -1 -1 1 0 -1 0 -1 0 0 >> > 1 0 >> > 9 1 1 1 0 1 0 1 0 1 0 >> > 0 1 >> > 10 1 -1 1 0 1 0 -1 0 1 0 >> > 0 -1 >> > 11 1 1 -1 0 1 0 1 0 -1 0 >> > 0 -1 >> > 12 1 -1 -1 0 1 0 -1 0 -1 0 >> > 0 1 >> > attr(,"assign") >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 >> > attr(,"contrasts") >> > attr(,"contrasts")$X3 >> > [1] "contr.treatment" >> > >> > -------------- >> > >> > Here, we now see the encoding for the interaction X1:X2:X3 is now the >> > interaction of X1 and X2 with a new encoding for X3 where each factor >> level >> > is represented by its own column. I would expect, given the two column >> > dummy variable encoding for X3, that the X1:X2:X3 column would also be >> two >> > columns regardless of what two-factor interactions we also specified, but >> > in this case it switches to three. If other two factor interactions are >> > missing in addition to X1:X2, this issue still occurs. This also happens >> > regardless of the contrast specified in contrasts.arg for X3. I don't see >> > any reasoning for this behavior given in the documentation, so I suspect >> it >> > is a bug. >> > >> > Best regards, >> > Tyler Morgan-Wall >> > >> > [[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > [hidden email] mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-devel >> > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hi Arie,
Thank you for your further research into the issue. Regarding Stata: On the other hand, JMP gives model matrices that use the main effects contrasts in computing the higher order interactions, without the dummy variable encoding. I verified this both by analyzing the linear model given in my first example and noting that JMP has one more degree of freedom than R for the same model, as well as looking at the generated model matrices. It's easy to find a design where JMP will allow us fit our model with goodness-of-fit estimates and R will not due to the extra degree(s) of freedom required. Let's keep the conversation limited to R. I want to refocus back onto my original bug report, which was not for a missing main effects term, but rather for a missing lower-order interaction term. The behavior of model.matrix.default() for a missing main effects term is a nice example to demonstrate how model.matrix encodes with dummy variables instead of contrasts, but doesn't demonstrate the inconsistent behavior my bug report highlighted. I went looking for documentation on this behavior, and the issue stems not from model.matrix.default(), but rather the terms() function in interpreting the formula. This "clever" replacement of contrasts by dummy variables to maintain marginality (presuming that's the reason) is not described anywhere in the documentation for either the model.matrix() or the terms() function. In order to find a description for the behavior, I had to look in the underlying C code, buried above the "TermCode" function of the "model.c" file, which says: "TermCode decides on the encoding of a model term. Returns 1 if variable ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if it is to be encoded by dummy variables. This is decided using the heuristic described in Statistical Models in S, page 38." I do not have a copy of this book, and I suspect most R users do not as well. Thankfully, however, some of the pages describing this behavior were available as part of Amazon's "Look Inside" feature--but if not for that, I would have no idea what heuristic R was using. Since those pages could made unavailable by Amazon at any time, at the very least we have an problem with a lack of documentation. However, I still believe there is a bug when comparing R's implementation to the heuristic described in the book. From Statistical Models in S, page 38-39: "Suppose F_j is any factor included in term T_i. Let T_{i(j)} denote the margin of T_i for factor F_j--that is, the term obtained by dropping F_j from T_i. We say that T_{i(j)} has appeared in the formula if there is some term T_i' for i' < i such that T_i' contains all the factors appearing in T_{i(j)}. The usual case is that T_{i(j)} itself is one of the preceding terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in the formula and by dummy variables if it has not" Here, F_j refers to a factor (variable) in a model and not a categorical factor, as specified later in that section (page 40): "Numeric variables appear in the computations as themselves, uncoded. Therefore, the rule does not do anything special for them, and it remains valid, in a trivial sense, whenever any of the F_j is numeric rather than categorical." Going back to my original example with three variables: X1 (numeric), X2 (numeric), X3 (categorical). This heuristic prescribes encoding X1:X2:X3 with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the formula. When any of the preceding terms do not exist, this heuristic tells us to use dummy variables to encode the interaction (e.g. "F_j [the interaction term] is coded ... by dummy variables if it [any of the marginal terms obtained by dropping a single factor in the interaction] has not [appeared in the formula]"). However, the example I gave demonstrated that this dummy variable encoding only occurs for the model where the missing term is the numeric-numeric interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the interaction term X1:X2:X3 is encoded by contrasts, not dummy variables. This is inconsistent with the description of the intended behavior given in the book. Best regards, Tyler On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate <[hidden email]> wrote: > Hello Tyler, > > I want to bring to your attention the following document: "What > happens if you omit the main effect in a regression model with an > interaction?" (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-o > mit-the-main-effect-in-a-regression-model-with-an-interaction). > This gives a useful review of the problem. Your example is Case 2: a > continuous and a categorical regressor. > > The numerical examples are coded in Stata, and they give the same > result as in R. Hence, if this is a bug in R then it is also a bug in > Stata. That seems very unlikely. > > Here is a simulation in R of the above mentioned Case 2 in Stata: > > df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) > print("Full model") > print(model.matrix(~(socst+grp)^2 ,data=df)) > print("Example 2.1: drop socst") > print(model.matrix(~(socst+grp)^2 -socst ,data=df)) > print("Example 2.2: drop grp") > print(model.matrix(~(socst+grp)^2 -grp ,data=df)) > > This gives indeed the following regressors: > > "Full model" > (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 > "Example 2.1: drop socst" > (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 socst:grp4 > "Example 2.2: drop grp" > (Intercept) socst socst:grp2 socst:grp3 socst:grp4 > > There is a little bit of R documentation about this, based on the > concept of marginality, which typically forbids a model having an > interaction but not the corresponding main effects. (You might see the > references in https://en.wikipedia.org/wiki/Principle_of_marginality ) > See "An Introduction to R", by Venables and Smith and the R Core > Team. At the bottom of page 52 (PDF: 57) it says: "Although the > details are complicated, model formulae in R will normally generate > the models that an expert statistician would expect, provided that > marginality is preserved. Fitting, for [a contrary] example, a model > with an interaction but not the corresponding main effects will in > general lead to surprising results ....". > The Reference Manual states that the R functions dropterm() and > addterm() resp. drop or add only terms such that marginality is > preserved. > > Finally, about your singular matrix t(mm)%*%mm. This is in fact > Example 2.1 in Case 2 discussed above. As discussed there, in Stata > and in R the drop of the continuous variable has no effect on the > degrees of freedom here: it is just a reparameterisation of the full > model, protecting you against losing marginality... Hence the > model.matrix 'mm' is still square and nonsingular after the drop of > X1, unless of course when a row is removed from the matrix 'design' > when before creating 'mm'. > > Arie > > On Sun, Oct 15, 2017 at 7:05 PM, Tyler <[hidden email]> wrote: > > You could possibly try to explain away the behavior for a missing main > > effects term, since without the main effects term we don't have main > effect > > columns in the model matrix used to compute the interaction columns (At > > best this is undocumented behavior--I still think it's a bug, as we know > > how we would encode the categorical factors if they were in fact present. > > It's either specified in contrasts.arg or using the default set in > > options). However, when all the main effects are present, why would the > > three-factor interaction column not simply be the product of the main > > effect columns? In my example: we know X1, we know X2, and we know X3. > Why > > does the encoding of X1:X2:X3 depend on whether we specified a two-factor > > interaction, AND only changes for specific missing interactions? > > > > In addition, I can use a two-term example similar to yours to show how > this > > behavior results in a singular covariance matrix when, given the desired > > factor encoding, it should not be singular. > > > > We start with a full factorial design for a two-level continuous factor > and > > a three-level categorical factor, and remove a single row. This design > > matrix does not leave enough degrees of freedom to determine > > goodness-of-fit, but should allow us to obtain parameter estimates. > > > >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) > >> design = design[-1,] > >> design > > X1 X2 > > 2 -1 A > > 3 1 B > > 4 -1 B > > 5 1 C > > 6 -1 C > > > > Here, we first calculate the model matrix for the full model, and then > > manually remove the X1 column from the model matrix. This gives us the > > model matrix one would expect if X1 were removed from the model. We then > > successfully calculate the covariance matrix. > > > >> mm = model.matrix(~(X1+X2)^2,data=design) > >> mm > > (Intercept) X1 X2B X2C X1:X2B X1:X2C > > 2 1 -1 0 0 0 0 > > 3 1 1 1 0 1 0 > > 4 1 -1 1 0 -1 0 > > 5 1 1 0 1 0 1 > > 6 1 -1 0 1 0 -1 > > > >> mm = mm[,-2] > >> solve(t(mm) %*% mm) > > (Intercept) X2B X2C X1:X2B X1:X2C > > (Intercept) 1 -1.0 -1.0 0.0 0.0 > > X2B -1 1.5 1.0 0.0 0.0 > > X2C -1 1.0 1.5 0.0 0.0 > > X1:X2B 0 0.0 0.0 0.5 0.0 > > X1:X2C 0 0.0 0.0 0.0 0.5 > > > > Here, we see the actual behavior for model.matrix. The undesired > re-coding > > of the model matrix interaction term makes the information matrix > singular. > > > >> mm = model.matrix(~(X1+X2)^2-X1,data=design) > >> mm > > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C > > 2 1 0 0 -1 0 0 > > 3 1 1 0 0 1 0 > > 4 1 1 0 0 -1 0 > > 5 1 0 1 0 0 1 > > 6 1 0 1 0 0 -1 > > > >> solve(t(mm) %*% mm) > > Error in solve.default(t(mm) %*% mm) : system is computationally > singular: > > reciprocal condition number = 5.55112e-18 > > > > I still believe this is a bug. > > > > Best regards, > > Tyler Morgan-Wall > > > > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate <[hidden email]> > > wrote: > > > >> I think it is not a bug. It is a general property of interactions. > >> This property is best observed if all variables are factors > >> (qualitative). > >> > >> For example, you have three variables (factors). You ask for as many > >> interactions as possible, except an interaction term between two > >> particular variables. When this interaction is not a constant, it is > >> different for different values of the remaining variable. More > >> precisely: for all values of that variable. In other words: you have a > >> three-way interaction, with all values of that variable. > >> > >> An even smaller example is the following script with only two > >> variables, each being a factor: > >> > >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) > >> print(model.matrix(~(X1+X2)^2 ,data=df)) > >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) > >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) > >> > >> The result is: > >> > >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C > >> 1 1 0 0 0 0 0 > >> 2 1 1 0 0 0 0 > >> 3 1 0 1 0 0 0 > >> 4 1 1 1 0 1 0 > >> 5 1 0 0 1 0 0 > >> 6 1 1 0 1 0 1 > >> > >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C > >> 1 1 0 0 0 0 0 > >> 2 1 0 0 1 0 0 > >> 3 1 1 0 0 0 0 > >> 4 1 1 0 0 1 0 > >> 5 1 0 1 0 0 0 > >> 6 1 0 1 0 0 1 > >> > >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C > >> 1 1 0 0 0 0 0 > >> 2 1 1 0 0 0 0 > >> 3 1 0 1 0 0 0 > >> 4 1 1 0 1 0 0 > >> 5 1 0 0 0 1 0 > >> 6 1 1 0 0 0 1 > >> > >> Thus, in the second result, we have no main effect of X1. Instead, the > >> effect of X1 depends on the value of X2; either A or B or C. In fact, > >> this is a two-way interaction, including all three values of X2. In > >> the third result, we have no main effect of X2, The effect of X2 > >> depends on the value of X1; either p or q. > >> > >> A complicating element with your example seems to be that your X1 and > >> X2 are not factors. > >> > >> Arie > >> > >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <[hidden email]> wrote: > >> > Hi, > >> > > >> > I recently ran into an inconsistency in the way model.matrix.default > >> > handles factor encoding for higher level interactions with categorical > >> > variables when the full hierarchy of effects is not present. > Depending on > >> > which lower level interactions are specified, the factor encoding > changes > >> > for a higher level interaction. Consider the following minimal > >> reproducible > >> > example: > >> > > >> > -------------- > >> > > >> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> > >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 X3B X3C > >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> > 1 1 1 1 0 0 1 0 0 0 0 > >> > 0 0 > >> > 2 1 -1 1 0 0 -1 0 0 0 0 > >> > 0 0 > >> > 3 1 1 -1 0 0 -1 0 0 0 0 > >> > 0 0 > >> > 4 1 -1 -1 0 0 1 0 0 0 0 > >> > 0 0 > >> > 5 1 1 1 1 0 1 1 0 1 0 > >> > 1 0 > >> > 6 1 -1 1 1 0 -1 -1 0 1 0 > >> > -1 0 > >> > 7 1 1 -1 1 0 -1 1 0 -1 0 > >> > -1 0 > >> > 8 1 -1 -1 1 0 1 -1 0 -1 0 > >> > 1 0 > >> > 9 1 1 1 0 1 1 0 1 0 1 > >> > 0 1 > >> > 10 1 -1 1 0 1 -1 0 -1 0 1 > >> > 0 -1 > >> > 11 1 1 -1 0 1 -1 0 1 0 -1 > >> > 0 -1 > >> > 12 1 -1 -1 0 1 1 0 -1 0 -1 > >> > 0 1 > >> > attr(,"assign") > >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 > >> > attr(,"contrasts") > >> > attr(,"contrasts")$X3 > >> > [1] "contr.treatment" > >> > > >> > -------------- > >> > > >> > Specifying the full hierarchy gives us what we expect: the interaction > >> > columns are simply calculated from the product of the main effect > >> columns. > >> > The interaction term X1:X2:X3 gives us two columns in the model > matrix, > >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main effects. > >> > > >> > If we remove either the X2:X3 interaction or the X1:X3 interaction, we > >> get > >> > what we would expect for the X1:X2:X3 interaction, but when we remove > the > >> > X1:X2 interaction the encoding for X1:X2:X3 changes completely: > >> > > >> > -------------- > >> > > >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) (Intercept) X1 X2 > >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> > 1 1 1 1 0 0 1 0 0 0 0 > >> > 2 1 -1 1 0 0 -1 0 0 0 0 > >> > 3 1 1 -1 0 0 -1 0 0 0 0 > >> > 4 1 -1 -1 0 0 1 0 0 0 0 > >> > 5 1 1 1 1 0 1 1 0 1 0 > >> > 6 1 -1 1 1 0 -1 1 0 -1 0 > >> > 7 1 1 -1 1 0 -1 -1 0 -1 0 > >> > 8 1 -1 -1 1 0 1 -1 0 1 0 > >> > 9 1 1 1 0 1 1 0 1 0 1 > >> > 10 1 -1 1 0 1 -1 0 1 0 -1 > >> > 11 1 1 -1 0 1 -1 0 -1 0 -1 > >> > 12 1 -1 -1 0 1 1 0 -1 0 1 > >> > attr(,"assign") > >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> > attr(,"contrasts") > >> > attr(,"contrasts")$X3 > >> > [1] "contr.treatment" > >> > > >> > > >> > > >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) (Intercept) X1 X2 > >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C > >> > 1 1 1 1 0 0 1 0 0 0 0 > >> > 2 1 -1 1 0 0 -1 0 0 0 0 > >> > 3 1 1 -1 0 0 -1 0 0 0 0 > >> > 4 1 -1 -1 0 0 1 0 0 0 0 > >> > 5 1 1 1 1 0 1 1 0 1 0 > >> > 6 1 -1 1 1 0 -1 -1 0 -1 0 > >> > 7 1 1 -1 1 0 -1 1 0 -1 0 > >> > 8 1 -1 -1 1 0 1 -1 0 1 0 > >> > 9 1 1 1 0 1 1 0 1 0 1 > >> > 10 1 -1 1 0 1 -1 0 -1 0 -1 > >> > 11 1 1 -1 0 1 -1 0 1 0 -1 > >> > 12 1 -1 -1 0 1 1 0 -1 0 1 > >> > attr(,"assign") > >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> > attr(,"contrasts") > >> > attr(,"contrasts")$X3 > >> > [1] "contr.treatment" > >> > > >> > > >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) (Intercept) X1 X2 > >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C > >> > 1 1 1 1 0 0 0 0 0 0 1 > >> > 0 0 > >> > 2 1 -1 1 0 0 0 0 0 0 -1 > >> > 0 0 > >> > 3 1 1 -1 0 0 0 0 0 0 -1 > >> > 0 0 > >> > 4 1 -1 -1 0 0 0 0 0 0 1 > >> > 0 0 > >> > 5 1 1 1 1 0 1 0 1 0 0 > >> > 1 0 > >> > 6 1 -1 1 1 0 -1 0 1 0 0 > >> > -1 0 > >> > 7 1 1 -1 1 0 1 0 -1 0 0 > >> > -1 0 > >> > 8 1 -1 -1 1 0 -1 0 -1 0 0 > >> > 1 0 > >> > 9 1 1 1 0 1 0 1 0 1 0 > >> > 0 1 > >> > 10 1 -1 1 0 1 0 -1 0 1 0 > >> > 0 -1 > >> > 11 1 1 -1 0 1 0 1 0 -1 0 > >> > 0 -1 > >> > 12 1 -1 -1 0 1 0 -1 0 -1 0 > >> > 0 1 > >> > attr(,"assign") > >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 > >> > attr(,"contrasts") > >> > attr(,"contrasts")$X3 > >> > [1] "contr.treatment" > >> > > >> > -------------- > >> > > >> > Here, we now see the encoding for the interaction X1:X2:X3 is now the > >> > interaction of X1 and X2 with a new encoding for X3 where each factor > >> level > >> > is represented by its own column. I would expect, given the two column > >> > dummy variable encoding for X3, that the X1:X2:X3 column would also be > >> two > >> > columns regardless of what two-factor interactions we also specified, > but > >> > in this case it switches to three. If other two factor interactions > are > >> > missing in addition to X1:X2, this issue still occurs. This also > happens > >> > regardless of the contrast specified in contrasts.arg for X3. I don't > see > >> > any reasoning for this behavior given in the documentation, so I > suspect > >> it > >> > is a bug. > >> > > >> > Best regards, > >> > Tyler Morgan-Wall > >> > > >> > [[alternative HTML version deleted]] > >> > > >> > ______________________________________________ > >> > [hidden email] mailing list > >> > https://stat.ethz.ch/mailman/listinfo/r-devel > >> > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > [hidden email] mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hello Tyler,
Thank you for searching for, and finding, the basic description of the behavior of R in this matter. I think your example is in agreement with the book. But let me first note the following. You write: "F_j refers to a factor (variable) in a model and not a categorical factor". However: "a factor is a vector object used to specify a discrete classification" (start of chapter 4 of "An Introduction to R".) You might also see the description of the R function factor(). You note that the book says about a factor F_j: "... F_j is coded by contrasts if T_{i(j)} has appeared in the formula and by dummy variables if it has not" You find: "However, the example I gave demonstrated that this dummy variable encoding only occurs for the model where the missing term is the numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2." We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in T_i must be encoded by dummy variables, as indeed it is. Arie On Tue, Oct 31, 2017 at 4:01 PM, Tyler <[hidden email]> wrote: > Hi Arie, > > Thank you for your further research into the issue. > > Regarding Stata: On the other hand, JMP gives model matrices that use the > main effects contrasts in computing the higher order interactions, without > the dummy variable encoding. I verified this both by analyzing the linear > model given in my first example and noting that JMP has one more degree of > freedom than R for the same model, as well as looking at the generated model > matrices. It's easy to find a design where JMP will allow us fit our model > with goodness-of-fit estimates and R will not due to the extra degree(s) of > freedom required. Let's keep the conversation limited to R. > > I want to refocus back onto my original bug report, which was not for a > missing main effects term, but rather for a missing lower-order interaction > term. The behavior of model.matrix.default() for a missing main effects term > is a nice example to demonstrate how model.matrix encodes with dummy > variables instead of contrasts, but doesn't demonstrate the inconsistent > behavior my bug report highlighted. > > I went looking for documentation on this behavior, and the issue stems not > from model.matrix.default(), but rather the terms() function in interpreting > the formula. This "clever" replacement of contrasts by dummy variables to > maintain marginality (presuming that's the reason) is not described anywhere > in the documentation for either the model.matrix() or the terms() function. > In order to find a description for the behavior, I had to look in the > underlying C code, buried above the "TermCode" function of the "model.c" > file, which says: > > "TermCode decides on the encoding of a model term. Returns 1 if variable > ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if it is to > be encoded by dummy variables. This is decided using the heuristic > described in Statistical Models in S, page 38." > > I do not have a copy of this book, and I suspect most R users do not as > well. Thankfully, however, some of the pages describing this behavior were > available as part of Amazon's "Look Inside" feature--but if not for that, I > would have no idea what heuristic R was using. Since those pages could made > unavailable by Amazon at any time, at the very least we have an problem with > a lack of documentation. > > However, I still believe there is a bug when comparing R's implementation to > the heuristic described in the book. From Statistical Models in S, page > 38-39: > > "Suppose F_j is any factor included in term T_i. Let T_{i(j)} denote the > margin of T_i for factor F_j--that is, the term obtained by dropping F_j > from T_i. We say that T_{i(j)} has appeared in the formula if there is some > term T_i' for i' < i such that T_i' contains all the factors appearing in > T_{i(j)}. The usual case is that T_{i(j)} itself is one of the preceding > terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in the > formula and by dummy variables if it has not" > > Here, F_j refers to a factor (variable) in a model and not a categorical > factor, as specified later in that section (page 40): "Numeric variables > appear in the computations as themselves, uncoded. Therefore, the rule does > not do anything special for them, and it remains valid, in a trivial sense, > whenever any of the F_j is numeric rather than categorical." > > Going back to my original example with three variables: X1 (numeric), X2 > (numeric), X3 (categorical). This heuristic prescribes encoding X1:X2:X3 > with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the formula. When > any of the preceding terms do not exist, this heuristic tells us to use > dummy variables to encode the interaction (e.g. "F_j [the interaction term] > is coded ... by dummy variables if it [any of the marginal terms obtained by > dropping a single factor in the interaction] has not [appeared in the > formula]"). However, the example I gave demonstrated that this dummy > variable encoding only occurs for the model where the missing term is the > numeric-numeric interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the > interaction term X1:X2:X3 is encoded by contrasts, not dummy variables. This > is inconsistent with the description of the intended behavior given in the > book. > > Best regards, > Tyler > > > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate <[hidden email]> > wrote: >> >> Hello Tyler, >> >> I want to bring to your attention the following document: "What >> happens if you omit the main effect in a regression model with an >> interaction?" >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-omit-the-main-effect-in-a-regression-model-with-an-interaction). >> This gives a useful review of the problem. Your example is Case 2: a >> continuous and a categorical regressor. >> >> The numerical examples are coded in Stata, and they give the same >> result as in R. Hence, if this is a bug in R then it is also a bug in >> Stata. That seems very unlikely. >> >> Here is a simulation in R of the above mentioned Case 2 in Stata: >> >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) >> print("Full model") >> print(model.matrix(~(socst+grp)^2 ,data=df)) >> print("Example 2.1: drop socst") >> print(model.matrix(~(socst+grp)^2 -socst ,data=df)) >> print("Example 2.2: drop grp") >> print(model.matrix(~(socst+grp)^2 -grp ,data=df)) >> >> This gives indeed the following regressors: >> >> "Full model" >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 >> "Example 2.1: drop socst" >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 socst:grp4 >> "Example 2.2: drop grp" >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4 >> >> There is a little bit of R documentation about this, based on the >> concept of marginality, which typically forbids a model having an >> interaction but not the corresponding main effects. (You might see the >> references in https://en.wikipedia.org/wiki/Principle_of_marginality ) >> See "An Introduction to R", by Venables and Smith and the R Core >> Team. At the bottom of page 52 (PDF: 57) it says: "Although the >> details are complicated, model formulae in R will normally generate >> the models that an expert statistician would expect, provided that >> marginality is preserved. Fitting, for [a contrary] example, a model >> with an interaction but not the corresponding main effects will in >> general lead to surprising results ....". >> The Reference Manual states that the R functions dropterm() and >> addterm() resp. drop or add only terms such that marginality is >> preserved. >> >> Finally, about your singular matrix t(mm)%*%mm. This is in fact >> Example 2.1 in Case 2 discussed above. As discussed there, in Stata >> and in R the drop of the continuous variable has no effect on the >> degrees of freedom here: it is just a reparameterisation of the full >> model, protecting you against losing marginality... Hence the >> model.matrix 'mm' is still square and nonsingular after the drop of >> X1, unless of course when a row is removed from the matrix 'design' >> when before creating 'mm'. >> >> Arie >> >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <[hidden email]> wrote: >> > You could possibly try to explain away the behavior for a missing main >> > effects term, since without the main effects term we don't have main >> > effect >> > columns in the model matrix used to compute the interaction columns (At >> > best this is undocumented behavior--I still think it's a bug, as we know >> > how we would encode the categorical factors if they were in fact >> > present. >> > It's either specified in contrasts.arg or using the default set in >> > options). However, when all the main effects are present, why would the >> > three-factor interaction column not simply be the product of the main >> > effect columns? In my example: we know X1, we know X2, and we know X3. >> > Why >> > does the encoding of X1:X2:X3 depend on whether we specified a >> > two-factor >> > interaction, AND only changes for specific missing interactions? >> > >> > In addition, I can use a two-term example similar to yours to show how >> > this >> > behavior results in a singular covariance matrix when, given the desired >> > factor encoding, it should not be singular. >> > >> > We start with a full factorial design for a two-level continuous factor >> > and >> > a three-level categorical factor, and remove a single row. This design >> > matrix does not leave enough degrees of freedom to determine >> > goodness-of-fit, but should allow us to obtain parameter estimates. >> > >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) >> >> design = design[-1,] >> >> design >> > X1 X2 >> > 2 -1 A >> > 3 1 B >> > 4 -1 B >> > 5 1 C >> > 6 -1 C >> > >> > Here, we first calculate the model matrix for the full model, and then >> > manually remove the X1 column from the model matrix. This gives us the >> > model matrix one would expect if X1 were removed from the model. We then >> > successfully calculate the covariance matrix. >> > >> >> mm = model.matrix(~(X1+X2)^2,data=design) >> >> mm >> > (Intercept) X1 X2B X2C X1:X2B X1:X2C >> > 2 1 -1 0 0 0 0 >> > 3 1 1 1 0 1 0 >> > 4 1 -1 1 0 -1 0 >> > 5 1 1 0 1 0 1 >> > 6 1 -1 0 1 0 -1 >> > >> >> mm = mm[,-2] >> >> solve(t(mm) %*% mm) >> > (Intercept) X2B X2C X1:X2B X1:X2C >> > (Intercept) 1 -1.0 -1.0 0.0 0.0 >> > X2B -1 1.5 1.0 0.0 0.0 >> > X2C -1 1.0 1.5 0.0 0.0 >> > X1:X2B 0 0.0 0.0 0.5 0.0 >> > X1:X2C 0 0.0 0.0 0.0 0.5 >> > >> > Here, we see the actual behavior for model.matrix. The undesired >> > re-coding >> > of the model matrix interaction term makes the information matrix >> > singular. >> > >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design) >> >> mm >> > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C >> > 2 1 0 0 -1 0 0 >> > 3 1 1 0 0 1 0 >> > 4 1 1 0 0 -1 0 >> > 5 1 0 1 0 0 1 >> > 6 1 0 1 0 0 -1 >> > >> >> solve(t(mm) %*% mm) >> > Error in solve.default(t(mm) %*% mm) : system is computationally >> > singular: >> > reciprocal condition number = 5.55112e-18 >> > >> > I still believe this is a bug. >> > >> > Best regards, >> > Tyler Morgan-Wall >> > >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate <[hidden email]> >> > wrote: >> > >> >> I think it is not a bug. It is a general property of interactions. >> >> This property is best observed if all variables are factors >> >> (qualitative). >> >> >> >> For example, you have three variables (factors). You ask for as many >> >> interactions as possible, except an interaction term between two >> >> particular variables. When this interaction is not a constant, it is >> >> different for different values of the remaining variable. More >> >> precisely: for all values of that variable. In other words: you have a >> >> three-way interaction, with all values of that variable. >> >> >> >> An even smaller example is the following script with only two >> >> variables, each being a factor: >> >> >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) >> >> print(model.matrix(~(X1+X2)^2 ,data=df)) >> >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) >> >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) >> >> >> >> The result is: >> >> >> >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C >> >> 1 1 0 0 0 0 0 >> >> 2 1 1 0 0 0 0 >> >> 3 1 0 1 0 0 0 >> >> 4 1 1 1 0 1 0 >> >> 5 1 0 0 1 0 0 >> >> 6 1 1 0 1 0 1 >> >> >> >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C >> >> 1 1 0 0 0 0 0 >> >> 2 1 0 0 1 0 0 >> >> 3 1 1 0 0 0 0 >> >> 4 1 1 0 0 1 0 >> >> 5 1 0 1 0 0 0 >> >> 6 1 0 1 0 0 1 >> >> >> >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C >> >> 1 1 0 0 0 0 0 >> >> 2 1 1 0 0 0 0 >> >> 3 1 0 1 0 0 0 >> >> 4 1 1 0 1 0 0 >> >> 5 1 0 0 0 1 0 >> >> 6 1 1 0 0 0 1 >> >> >> >> Thus, in the second result, we have no main effect of X1. Instead, the >> >> effect of X1 depends on the value of X2; either A or B or C. In fact, >> >> this is a two-way interaction, including all three values of X2. In >> >> the third result, we have no main effect of X2, The effect of X2 >> >> depends on the value of X1; either p or q. >> >> >> >> A complicating element with your example seems to be that your X1 and >> >> X2 are not factors. >> >> >> >> Arie >> >> >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <[hidden email]> wrote: >> >> > Hi, >> >> > >> >> > I recently ran into an inconsistency in the way model.matrix.default >> >> > handles factor encoding for higher level interactions with >> >> > categorical >> >> > variables when the full hierarchy of effects is not present. >> >> > Depending on >> >> > which lower level interactions are specified, the factor encoding >> >> > changes >> >> > for a higher level interaction. Consider the following minimal >> >> reproducible >> >> > example: >> >> > >> >> > -------------- >> >> > >> >> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 X3B X3C >> >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> >> > 1 1 1 1 0 0 1 0 0 0 0 >> >> > 0 0 >> >> > 2 1 -1 1 0 0 -1 0 0 0 0 >> >> > 0 0 >> >> > 3 1 1 -1 0 0 -1 0 0 0 0 >> >> > 0 0 >> >> > 4 1 -1 -1 0 0 1 0 0 0 0 >> >> > 0 0 >> >> > 5 1 1 1 1 0 1 1 0 1 0 >> >> > 1 0 >> >> > 6 1 -1 1 1 0 -1 -1 0 1 0 >> >> > -1 0 >> >> > 7 1 1 -1 1 0 -1 1 0 -1 0 >> >> > -1 0 >> >> > 8 1 -1 -1 1 0 1 -1 0 -1 0 >> >> > 1 0 >> >> > 9 1 1 1 0 1 1 0 1 0 1 >> >> > 0 1 >> >> > 10 1 -1 1 0 1 -1 0 -1 0 1 >> >> > 0 -1 >> >> > 11 1 1 -1 0 1 -1 0 1 0 -1 >> >> > 0 -1 >> >> > 12 1 -1 -1 0 1 1 0 -1 0 -1 >> >> > 0 1 >> >> > attr(,"assign") >> >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 >> >> > attr(,"contrasts") >> >> > attr(,"contrasts")$X3 >> >> > [1] "contr.treatment" >> >> > >> >> > -------------- >> >> > >> >> > Specifying the full hierarchy gives us what we expect: the >> >> > interaction >> >> > columns are simply calculated from the product of the main effect >> >> columns. >> >> > The interaction term X1:X2:X3 gives us two columns in the model >> >> > matrix, >> >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main effects. >> >> > >> >> > If we remove either the X2:X3 interaction or the X1:X3 interaction, >> >> > we >> >> get >> >> > what we would expect for the X1:X2:X3 interaction, but when we remove >> >> > the >> >> > X1:X2 interaction the encoding for X1:X2:X3 changes completely: >> >> > >> >> > -------------- >> >> > >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) (Intercept) X1 X2 >> >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> >> > 1 1 1 1 0 0 1 0 0 0 0 >> >> > 2 1 -1 1 0 0 -1 0 0 0 0 >> >> > 3 1 1 -1 0 0 -1 0 0 0 0 >> >> > 4 1 -1 -1 0 0 1 0 0 0 0 >> >> > 5 1 1 1 1 0 1 1 0 1 0 >> >> > 6 1 -1 1 1 0 -1 1 0 -1 0 >> >> > 7 1 1 -1 1 0 -1 -1 0 -1 0 >> >> > 8 1 -1 -1 1 0 1 -1 0 1 0 >> >> > 9 1 1 1 0 1 1 0 1 0 1 >> >> > 10 1 -1 1 0 1 -1 0 1 0 -1 >> >> > 11 1 1 -1 0 1 -1 0 -1 0 -1 >> >> > 12 1 -1 -1 0 1 1 0 -1 0 1 >> >> > attr(,"assign") >> >> > [1] 0 1 2 3 3 4 5 5 6 6 >> >> > attr(,"contrasts") >> >> > attr(,"contrasts")$X3 >> >> > [1] "contr.treatment" >> >> > >> >> > >> >> > >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) (Intercept) X1 X2 >> >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C >> >> > 1 1 1 1 0 0 1 0 0 0 0 >> >> > 2 1 -1 1 0 0 -1 0 0 0 0 >> >> > 3 1 1 -1 0 0 -1 0 0 0 0 >> >> > 4 1 -1 -1 0 0 1 0 0 0 0 >> >> > 5 1 1 1 1 0 1 1 0 1 0 >> >> > 6 1 -1 1 1 0 -1 -1 0 -1 0 >> >> > 7 1 1 -1 1 0 -1 1 0 -1 0 >> >> > 8 1 -1 -1 1 0 1 -1 0 1 0 >> >> > 9 1 1 1 0 1 1 0 1 0 1 >> >> > 10 1 -1 1 0 1 -1 0 -1 0 -1 >> >> > 11 1 1 -1 0 1 -1 0 1 0 -1 >> >> > 12 1 -1 -1 0 1 1 0 -1 0 1 >> >> > attr(,"assign") >> >> > [1] 0 1 2 3 3 4 5 5 6 6 >> >> > attr(,"contrasts") >> >> > attr(,"contrasts")$X3 >> >> > [1] "contr.treatment" >> >> > >> >> > >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) (Intercept) X1 X2 >> >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C >> >> > 1 1 1 1 0 0 0 0 0 0 1 >> >> > 0 0 >> >> > 2 1 -1 1 0 0 0 0 0 0 -1 >> >> > 0 0 >> >> > 3 1 1 -1 0 0 0 0 0 0 -1 >> >> > 0 0 >> >> > 4 1 -1 -1 0 0 0 0 0 0 1 >> >> > 0 0 >> >> > 5 1 1 1 1 0 1 0 1 0 0 >> >> > 1 0 >> >> > 6 1 -1 1 1 0 -1 0 1 0 0 >> >> > -1 0 >> >> > 7 1 1 -1 1 0 1 0 -1 0 0 >> >> > -1 0 >> >> > 8 1 -1 -1 1 0 -1 0 -1 0 0 >> >> > 1 0 >> >> > 9 1 1 1 0 1 0 1 0 1 0 >> >> > 0 1 >> >> > 10 1 -1 1 0 1 0 -1 0 1 0 >> >> > 0 -1 >> >> > 11 1 1 -1 0 1 0 1 0 -1 0 >> >> > 0 -1 >> >> > 12 1 -1 -1 0 1 0 -1 0 -1 0 >> >> > 0 1 >> >> > attr(,"assign") >> >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 >> >> > attr(,"contrasts") >> >> > attr(,"contrasts")$X3 >> >> > [1] "contr.treatment" >> >> > >> >> > -------------- >> >> > >> >> > Here, we now see the encoding for the interaction X1:X2:X3 is now the >> >> > interaction of X1 and X2 with a new encoding for X3 where each factor >> >> level >> >> > is represented by its own column. I would expect, given the two >> >> > column >> >> > dummy variable encoding for X3, that the X1:X2:X3 column would also >> >> > be >> >> two >> >> > columns regardless of what two-factor interactions we also specified, >> >> > but >> >> > in this case it switches to three. If other two factor interactions >> >> > are >> >> > missing in addition to X1:X2, this issue still occurs. This also >> >> > happens >> >> > regardless of the contrast specified in contrasts.arg for X3. I don't >> >> > see >> >> > any reasoning for this behavior given in the documentation, so I >> >> > suspect >> >> it >> >> > is a bug. >> >> > >> >> > Best regards, >> >> > Tyler Morgan-Wall >> >> > >> >> > [[alternative HTML version deleted]] >> >> > >> >> > ______________________________________________ >> >> > [hidden email] mailing list >> >> > https://stat.ethz.ch/mailman/listinfo/r-devel >> >> >> > >> > [[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > [hidden email] mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-devel > > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hi Arie,
The book out of which this behavior is based does not use factor (in this section) to refer to categorical factor. I will again point to this sentence, from page 40, in the same section and referring to the behavior under question, that shows F_j is not limited to categorical factors: "Numeric variables appear in the computations as themselves, uncoded. Therefore, the rule does not do anything special for them, and it remains valid, in a trivial sense, whenever any of the F_j is numeric rather than categorical." Note the "... whenever any of the F_j is numeric rather than categorical." Factor here is used in the more general sense of the word, not referring to the R type "factor." The behavior of R does not match the heuristic that it's citing. Best regards, Tyler On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate <[hidden email]> wrote: > Hello Tyler, > > Thank you for searching for, and finding, the basic description of the > behavior of R in this matter. > > I think your example is in agreement with the book. > > But let me first note the following. You write: "F_j refers to a > factor (variable) in a model and not a categorical factor". However: > "a factor is a vector object used to specify a discrete > classification" (start of chapter 4 of "An Introduction to R".) You > might also see the description of the R function factor(). > > You note that the book says about a factor F_j: > "... F_j is coded by contrasts if T_{i(j)} has appeared in the > formula and by dummy variables if it has not" > > You find: > "However, the example I gave demonstrated that this dummy variable > encoding only occurs for the model where the missing term is the > numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2." > > We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then > T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in T_i > must be encoded by dummy variables, as indeed it is. > > Arie > > On Tue, Oct 31, 2017 at 4:01 PM, Tyler <[hidden email]> wrote: > > Hi Arie, > > > > Thank you for your further research into the issue. > > > > Regarding Stata: On the other hand, JMP gives model matrices that use the > > main effects contrasts in computing the higher order interactions, > without > > the dummy variable encoding. I verified this both by analyzing the linear > > model given in my first example and noting that JMP has one more degree > of > > freedom than R for the same model, as well as looking at the generated > model > > matrices. It's easy to find a design where JMP will allow us fit our > model > > with goodness-of-fit estimates and R will not due to the extra degree(s) > of > > freedom required. Let's keep the conversation limited to R. > > > > I want to refocus back onto my original bug report, which was not for a > > missing main effects term, but rather for a missing lower-order > interaction > > term. The behavior of model.matrix.default() for a missing main effects > term > > is a nice example to demonstrate how model.matrix encodes with dummy > > variables instead of contrasts, but doesn't demonstrate the inconsistent > > behavior my bug report highlighted. > > > > I went looking for documentation on this behavior, and the issue stems > not > > from model.matrix.default(), but rather the terms() function in > interpreting > > the formula. This "clever" replacement of contrasts by dummy variables to > > maintain marginality (presuming that's the reason) is not described > anywhere > > in the documentation for either the model.matrix() or the terms() > function. > > In order to find a description for the behavior, I had to look in the > > underlying C code, buried above the "TermCode" function of the "model.c" > > file, which says: > > > > "TermCode decides on the encoding of a model term. Returns 1 if variable > > ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if it > is to > > be encoded by dummy variables. This is decided using the heuristic > > described in Statistical Models in S, page 38." > > > > I do not have a copy of this book, and I suspect most R users do not as > > well. Thankfully, however, some of the pages describing this behavior > were > > available as part of Amazon's "Look Inside" feature--but if not for > that, I > > would have no idea what heuristic R was using. Since those pages could > made > > unavailable by Amazon at any time, at the very least we have an problem > with > > a lack of documentation. > > > > However, I still believe there is a bug when comparing R's > implementation to > > the heuristic described in the book. From Statistical Models in S, page > > 38-39: > > > > "Suppose F_j is any factor included in term T_i. Let T_{i(j)} denote the > > margin of T_i for factor F_j--that is, the term obtained by dropping F_j > > from T_i. We say that T_{i(j)} has appeared in the formula if there is > some > > term T_i' for i' < i such that T_i' contains all the factors appearing in > > T_{i(j)}. The usual case is that T_{i(j)} itself is one of the preceding > > terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in the > > formula and by dummy variables if it has not" > > > > Here, F_j refers to a factor (variable) in a model and not a categorical > > factor, as specified later in that section (page 40): "Numeric variables > > appear in the computations as themselves, uncoded. Therefore, the rule > does > > not do anything special for them, and it remains valid, in a trivial > sense, > > whenever any of the F_j is numeric rather than categorical." > > > > Going back to my original example with three variables: X1 (numeric), X2 > > (numeric), X3 (categorical). This heuristic prescribes encoding X1:X2:X3 > > with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the formula. > When > > any of the preceding terms do not exist, this heuristic tells us to use > > dummy variables to encode the interaction (e.g. "F_j [the interaction > term] > > is coded ... by dummy variables if it [any of the marginal terms > obtained by > > dropping a single factor in the interaction] has not [appeared in the > > formula]"). However, the example I gave demonstrated that this dummy > > variable encoding only occurs for the model where the missing term is the > > numeric-numeric interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the > > interaction term X1:X2:X3 is encoded by contrasts, not dummy variables. > This > > is inconsistent with the description of the intended behavior given in > the > > book. > > > > Best regards, > > Tyler > > > > > > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate <[hidden email]> > > wrote: > >> > >> Hello Tyler, > >> > >> I want to bring to your attention the following document: "What > >> happens if you omit the main effect in a regression model with an > >> interaction?" > >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you- > omit-the-main-effect-in-a-regression-model-with-an-interaction). > >> This gives a useful review of the problem. Your example is Case 2: a > >> continuous and a categorical regressor. > >> > >> The numerical examples are coded in Stata, and they give the same > >> result as in R. Hence, if this is a bug in R then it is also a bug in > >> Stata. That seems very unlikely. > >> > >> Here is a simulation in R of the above mentioned Case 2 in Stata: > >> > >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) > >> print("Full model") > >> print(model.matrix(~(socst+grp)^2 ,data=df)) > >> print("Example 2.1: drop socst") > >> print(model.matrix(~(socst+grp)^2 -socst ,data=df)) > >> print("Example 2.2: drop grp") > >> print(model.matrix(~(socst+grp)^2 -grp ,data=df)) > >> > >> This gives indeed the following regressors: > >> > >> "Full model" > >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 > >> "Example 2.1: drop socst" > >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 socst:grp4 > >> "Example 2.2: drop grp" > >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4 > >> > >> There is a little bit of R documentation about this, based on the > >> concept of marginality, which typically forbids a model having an > >> interaction but not the corresponding main effects. (You might see the > >> references in https://en.wikipedia.org/wiki/Principle_of_marginality ) > >> See "An Introduction to R", by Venables and Smith and the R Core > >> Team. At the bottom of page 52 (PDF: 57) it says: "Although the > >> details are complicated, model formulae in R will normally generate > >> the models that an expert statistician would expect, provided that > >> marginality is preserved. Fitting, for [a contrary] example, a model > >> with an interaction but not the corresponding main effects will in > >> general lead to surprising results ....". > >> The Reference Manual states that the R functions dropterm() and > >> addterm() resp. drop or add only terms such that marginality is > >> preserved. > >> > >> Finally, about your singular matrix t(mm)%*%mm. This is in fact > >> Example 2.1 in Case 2 discussed above. As discussed there, in Stata > >> and in R the drop of the continuous variable has no effect on the > >> degrees of freedom here: it is just a reparameterisation of the full > >> model, protecting you against losing marginality... Hence the > >> model.matrix 'mm' is still square and nonsingular after the drop of > >> X1, unless of course when a row is removed from the matrix 'design' > >> when before creating 'mm'. > >> > >> Arie > >> > >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <[hidden email]> wrote: > >> > You could possibly try to explain away the behavior for a missing main > >> > effects term, since without the main effects term we don't have main > >> > effect > >> > columns in the model matrix used to compute the interaction columns > (At > >> > best this is undocumented behavior--I still think it's a bug, as we > know > >> > how we would encode the categorical factors if they were in fact > >> > present. > >> > It's either specified in contrasts.arg or using the default set in > >> > options). However, when all the main effects are present, why would > the > >> > three-factor interaction column not simply be the product of the main > >> > effect columns? In my example: we know X1, we know X2, and we know X3. > >> > Why > >> > does the encoding of X1:X2:X3 depend on whether we specified a > >> > two-factor > >> > interaction, AND only changes for specific missing interactions? > >> > > >> > In addition, I can use a two-term example similar to yours to show how > >> > this > >> > behavior results in a singular covariance matrix when, given the > desired > >> > factor encoding, it should not be singular. > >> > > >> > We start with a full factorial design for a two-level continuous > factor > >> > and > >> > a three-level categorical factor, and remove a single row. This design > >> > matrix does not leave enough degrees of freedom to determine > >> > goodness-of-fit, but should allow us to obtain parameter estimates. > >> > > >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) > >> >> design = design[-1,] > >> >> design > >> > X1 X2 > >> > 2 -1 A > >> > 3 1 B > >> > 4 -1 B > >> > 5 1 C > >> > 6 -1 C > >> > > >> > Here, we first calculate the model matrix for the full model, and then > >> > manually remove the X1 column from the model matrix. This gives us the > >> > model matrix one would expect if X1 were removed from the model. We > then > >> > successfully calculate the covariance matrix. > >> > > >> >> mm = model.matrix(~(X1+X2)^2,data=design) > >> >> mm > >> > (Intercept) X1 X2B X2C X1:X2B X1:X2C > >> > 2 1 -1 0 0 0 0 > >> > 3 1 1 1 0 1 0 > >> > 4 1 -1 1 0 -1 0 > >> > 5 1 1 0 1 0 1 > >> > 6 1 -1 0 1 0 -1 > >> > > >> >> mm = mm[,-2] > >> >> solve(t(mm) %*% mm) > >> > (Intercept) X2B X2C X1:X2B X1:X2C > >> > (Intercept) 1 -1.0 -1.0 0.0 0.0 > >> > X2B -1 1.5 1.0 0.0 0.0 > >> > X2C -1 1.0 1.5 0.0 0.0 > >> > X1:X2B 0 0.0 0.0 0.5 0.0 > >> > X1:X2C 0 0.0 0.0 0.0 0.5 > >> > > >> > Here, we see the actual behavior for model.matrix. The undesired > >> > re-coding > >> > of the model matrix interaction term makes the information matrix > >> > singular. > >> > > >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design) > >> >> mm > >> > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C > >> > 2 1 0 0 -1 0 0 > >> > 3 1 1 0 0 1 0 > >> > 4 1 1 0 0 -1 0 > >> > 5 1 0 1 0 0 1 > >> > 6 1 0 1 0 0 -1 > >> > > >> >> solve(t(mm) %*% mm) > >> > Error in solve.default(t(mm) %*% mm) : system is computationally > >> > singular: > >> > reciprocal condition number = 5.55112e-18 > >> > > >> > I still believe this is a bug. > >> > > >> > Best regards, > >> > Tyler Morgan-Wall > >> > > >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate <[hidden email] > > > >> > wrote: > >> > > >> >> I think it is not a bug. It is a general property of interactions. > >> >> This property is best observed if all variables are factors > >> >> (qualitative). > >> >> > >> >> For example, you have three variables (factors). You ask for as many > >> >> interactions as possible, except an interaction term between two > >> >> particular variables. When this interaction is not a constant, it is > >> >> different for different values of the remaining variable. More > >> >> precisely: for all values of that variable. In other words: you have > a > >> >> three-way interaction, with all values of that variable. > >> >> > >> >> An even smaller example is the following script with only two > >> >> variables, each being a factor: > >> >> > >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) > >> >> print(model.matrix(~(X1+X2)^2 ,data=df)) > >> >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) > >> >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) > >> >> > >> >> The result is: > >> >> > >> >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C > >> >> 1 1 0 0 0 0 0 > >> >> 2 1 1 0 0 0 0 > >> >> 3 1 0 1 0 0 0 > >> >> 4 1 1 1 0 1 0 > >> >> 5 1 0 0 1 0 0 > >> >> 6 1 1 0 1 0 1 > >> >> > >> >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C > >> >> 1 1 0 0 0 0 0 > >> >> 2 1 0 0 1 0 0 > >> >> 3 1 1 0 0 0 0 > >> >> 4 1 1 0 0 1 0 > >> >> 5 1 0 1 0 0 0 > >> >> 6 1 0 1 0 0 1 > >> >> > >> >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C > >> >> 1 1 0 0 0 0 0 > >> >> 2 1 1 0 0 0 0 > >> >> 3 1 0 1 0 0 0 > >> >> 4 1 1 0 1 0 0 > >> >> 5 1 0 0 0 1 0 > >> >> 6 1 1 0 0 0 1 > >> >> > >> >> Thus, in the second result, we have no main effect of X1. Instead, > the > >> >> effect of X1 depends on the value of X2; either A or B or C. In fact, > >> >> this is a two-way interaction, including all three values of X2. In > >> >> the third result, we have no main effect of X2, The effect of X2 > >> >> depends on the value of X1; either p or q. > >> >> > >> >> A complicating element with your example seems to be that your X1 and > >> >> X2 are not factors. > >> >> > >> >> Arie > >> >> > >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <[hidden email]> wrote: > >> >> > Hi, > >> >> > > >> >> > I recently ran into an inconsistency in the way > model.matrix.default > >> >> > handles factor encoding for higher level interactions with > >> >> > categorical > >> >> > variables when the full hierarchy of effects is not present. > >> >> > Depending on > >> >> > which lower level interactions are specified, the factor encoding > >> >> > changes > >> >> > for a higher level interaction. Consider the following minimal > >> >> reproducible > >> >> > example: > >> >> > > >> >> > -------------- > >> >> > > >> >> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> > >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 X3B > X3C > >> >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> >> > 1 1 1 1 0 0 1 0 0 0 0 > >> >> > 0 0 > >> >> > 2 1 -1 1 0 0 -1 0 0 0 0 > >> >> > 0 0 > >> >> > 3 1 1 -1 0 0 -1 0 0 0 0 > >> >> > 0 0 > >> >> > 4 1 -1 -1 0 0 1 0 0 0 0 > >> >> > 0 0 > >> >> > 5 1 1 1 1 0 1 1 0 1 0 > >> >> > 1 0 > >> >> > 6 1 -1 1 1 0 -1 -1 0 1 0 > >> >> > -1 0 > >> >> > 7 1 1 -1 1 0 -1 1 0 -1 0 > >> >> > -1 0 > >> >> > 8 1 -1 -1 1 0 1 -1 0 -1 0 > >> >> > 1 0 > >> >> > 9 1 1 1 0 1 1 0 1 0 1 > >> >> > 0 1 > >> >> > 10 1 -1 1 0 1 -1 0 -1 0 1 > >> >> > 0 -1 > >> >> > 11 1 1 -1 0 1 -1 0 1 0 -1 > >> >> > 0 -1 > >> >> > 12 1 -1 -1 0 1 1 0 -1 0 -1 > >> >> > 0 1 > >> >> > attr(,"assign") > >> >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 > >> >> > attr(,"contrasts") > >> >> > attr(,"contrasts")$X3 > >> >> > [1] "contr.treatment" > >> >> > > >> >> > -------------- > >> >> > > >> >> > Specifying the full hierarchy gives us what we expect: the > >> >> > interaction > >> >> > columns are simply calculated from the product of the main effect > >> >> columns. > >> >> > The interaction term X1:X2:X3 gives us two columns in the model > >> >> > matrix, > >> >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main effects. > >> >> > > >> >> > If we remove either the X2:X3 interaction or the X1:X3 interaction, > >> >> > we > >> >> get > >> >> > what we would expect for the X1:X2:X3 interaction, but when we > remove > >> >> > the > >> >> > X1:X2 interaction the encoding for X1:X2:X3 changes completely: > >> >> > > >> >> > -------------- > >> >> > > >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) (Intercept) > X1 X2 > >> >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> >> > 1 1 1 1 0 0 1 0 0 0 > 0 > >> >> > 2 1 -1 1 0 0 -1 0 0 0 > 0 > >> >> > 3 1 1 -1 0 0 -1 0 0 0 > 0 > >> >> > 4 1 -1 -1 0 0 1 0 0 0 > 0 > >> >> > 5 1 1 1 1 0 1 1 0 1 > 0 > >> >> > 6 1 -1 1 1 0 -1 1 0 -1 > 0 > >> >> > 7 1 1 -1 1 0 -1 -1 0 -1 > 0 > >> >> > 8 1 -1 -1 1 0 1 -1 0 1 > 0 > >> >> > 9 1 1 1 0 1 1 0 1 0 > 1 > >> >> > 10 1 -1 1 0 1 -1 0 1 0 > -1 > >> >> > 11 1 1 -1 0 1 -1 0 -1 0 > -1 > >> >> > 12 1 -1 -1 0 1 1 0 -1 0 > 1 > >> >> > attr(,"assign") > >> >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> >> > attr(,"contrasts") > >> >> > attr(,"contrasts")$X3 > >> >> > [1] "contr.treatment" > >> >> > > >> >> > > >> >> > > >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) (Intercept) > X1 X2 > >> >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C > >> >> > 1 1 1 1 0 0 1 0 0 0 > 0 > >> >> > 2 1 -1 1 0 0 -1 0 0 0 > 0 > >> >> > 3 1 1 -1 0 0 -1 0 0 0 > 0 > >> >> > 4 1 -1 -1 0 0 1 0 0 0 > 0 > >> >> > 5 1 1 1 1 0 1 1 0 1 > 0 > >> >> > 6 1 -1 1 1 0 -1 -1 0 -1 > 0 > >> >> > 7 1 1 -1 1 0 -1 1 0 -1 > 0 > >> >> > 8 1 -1 -1 1 0 1 -1 0 1 > 0 > >> >> > 9 1 1 1 0 1 1 0 1 0 > 1 > >> >> > 10 1 -1 1 0 1 -1 0 -1 0 > -1 > >> >> > 11 1 1 -1 0 1 -1 0 1 0 > -1 > >> >> > 12 1 -1 -1 0 1 1 0 -1 0 > 1 > >> >> > attr(,"assign") > >> >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> >> > attr(,"contrasts") > >> >> > attr(,"contrasts")$X3 > >> >> > [1] "contr.treatment" > >> >> > > >> >> > > >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) (Intercept) > X1 X2 > >> >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C > >> >> > 1 1 1 1 0 0 0 0 0 0 1 > >> >> > 0 0 > >> >> > 2 1 -1 1 0 0 0 0 0 0 -1 > >> >> > 0 0 > >> >> > 3 1 1 -1 0 0 0 0 0 0 -1 > >> >> > 0 0 > >> >> > 4 1 -1 -1 0 0 0 0 0 0 1 > >> >> > 0 0 > >> >> > 5 1 1 1 1 0 1 0 1 0 0 > >> >> > 1 0 > >> >> > 6 1 -1 1 1 0 -1 0 1 0 0 > >> >> > -1 0 > >> >> > 7 1 1 -1 1 0 1 0 -1 0 0 > >> >> > -1 0 > >> >> > 8 1 -1 -1 1 0 -1 0 -1 0 0 > >> >> > 1 0 > >> >> > 9 1 1 1 0 1 0 1 0 1 0 > >> >> > 0 1 > >> >> > 10 1 -1 1 0 1 0 -1 0 1 0 > >> >> > 0 -1 > >> >> > 11 1 1 -1 0 1 0 1 0 -1 0 > >> >> > 0 -1 > >> >> > 12 1 -1 -1 0 1 0 -1 0 -1 0 > >> >> > 0 1 > >> >> > attr(,"assign") > >> >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 > >> >> > attr(,"contrasts") > >> >> > attr(,"contrasts")$X3 > >> >> > [1] "contr.treatment" > >> >> > > >> >> > -------------- > >> >> > > >> >> > Here, we now see the encoding for the interaction X1:X2:X3 is now > the > >> >> > interaction of X1 and X2 with a new encoding for X3 where each > factor > >> >> level > >> >> > is represented by its own column. I would expect, given the two > >> >> > column > >> >> > dummy variable encoding for X3, that the X1:X2:X3 column would also > >> >> > be > >> >> two > >> >> > columns regardless of what two-factor interactions we also > specified, > >> >> > but > >> >> > in this case it switches to three. If other two factor interactions > >> >> > are > >> >> > missing in addition to X1:X2, this issue still occurs. This also > >> >> > happens > >> >> > regardless of the contrast specified in contrasts.arg for X3. I > don't > >> >> > see > >> >> > any reasoning for this behavior given in the documentation, so I > >> >> > suspect > >> >> it > >> >> > is a bug. > >> >> > > >> >> > Best regards, > >> >> > Tyler Morgan-Wall > >> >> > > >> >> > [[alternative HTML version deleted]] > >> >> > > >> >> > ______________________________________________ > >> >> > [hidden email] mailing list > >> >> > https://stat.ethz.ch/mailman/listinfo/r-devel > >> >> > >> > > >> > [[alternative HTML version deleted]] > >> > > >> > ______________________________________________ > >> > [hidden email] mailing list > >> > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hello Tyler,
I rephrase my previous mail, as follows: In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2, which in the example is dropped from the model. Hence the X3 in T_i must be encoded by dummy variables, as indeed it is. Arie On Thu, Nov 2, 2017 at 4:11 PM, Tyler <[hidden email]> wrote: > Hi Arie, > > The book out of which this behavior is based does not use factor (in this > section) to refer to categorical factor. I will again point to this > sentence, from page 40, in the same section and referring to the behavior > under question, that shows F_j is not limited to categorical factors: > "Numeric variables appear in the computations as themselves, uncoded. > Therefore, the rule does not do anything special for them, and it remains > valid, in a trivial sense, whenever any of the F_j is numeric rather than > categorical." > > Note the "... whenever any of the F_j is numeric rather than categorical." > Factor here is used in the more general sense of the word, not referring to > the R type "factor." The behavior of R does not match the heuristic that > it's citing. > > Best regards, > Tyler > > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate <[hidden email]> wrote: >> >> Hello Tyler, >> >> Thank you for searching for, and finding, the basic description of the >> behavior of R in this matter. >> >> I think your example is in agreement with the book. >> >> But let me first note the following. You write: "F_j refers to a >> factor (variable) in a model and not a categorical factor". However: >> "a factor is a vector object used to specify a discrete >> classification" (start of chapter 4 of "An Introduction to R".) You >> might also see the description of the R function factor(). >> >> You note that the book says about a factor F_j: >> "... F_j is coded by contrasts if T_{i(j)} has appeared in the >> formula and by dummy variables if it has not" >> >> You find: >> "However, the example I gave demonstrated that this dummy variable >> encoding only occurs for the model where the missing term is the >> numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2." >> >> We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then >> T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in T_i >> must be encoded by dummy variables, as indeed it is. >> >> Arie >> >> On Tue, Oct 31, 2017 at 4:01 PM, Tyler <[hidden email]> wrote: >> > Hi Arie, >> > >> > Thank you for your further research into the issue. >> > >> > Regarding Stata: On the other hand, JMP gives model matrices that use >> > the >> > main effects contrasts in computing the higher order interactions, >> > without >> > the dummy variable encoding. I verified this both by analyzing the >> > linear >> > model given in my first example and noting that JMP has one more degree >> > of >> > freedom than R for the same model, as well as looking at the generated >> > model >> > matrices. It's easy to find a design where JMP will allow us fit our >> > model >> > with goodness-of-fit estimates and R will not due to the extra degree(s) >> > of >> > freedom required. Let's keep the conversation limited to R. >> > >> > I want to refocus back onto my original bug report, which was not for a >> > missing main effects term, but rather for a missing lower-order >> > interaction >> > term. The behavior of model.matrix.default() for a missing main effects >> > term >> > is a nice example to demonstrate how model.matrix encodes with dummy >> > variables instead of contrasts, but doesn't demonstrate the inconsistent >> > behavior my bug report highlighted. >> > >> > I went looking for documentation on this behavior, and the issue stems >> > not >> > from model.matrix.default(), but rather the terms() function in >> > interpreting >> > the formula. This "clever" replacement of contrasts by dummy variables >> > to >> > maintain marginality (presuming that's the reason) is not described >> > anywhere >> > in the documentation for either the model.matrix() or the terms() >> > function. >> > In order to find a description for the behavior, I had to look in the >> > underlying C code, buried above the "TermCode" function of the "model.c" >> > file, which says: >> > >> > "TermCode decides on the encoding of a model term. Returns 1 if variable >> > ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if it >> > is to >> > be encoded by dummy variables. This is decided using the heuristic >> > described in Statistical Models in S, page 38." >> > >> > I do not have a copy of this book, and I suspect most R users do not as >> > well. Thankfully, however, some of the pages describing this behavior >> > were >> > available as part of Amazon's "Look Inside" feature--but if not for >> > that, I >> > would have no idea what heuristic R was using. Since those pages could >> > made >> > unavailable by Amazon at any time, at the very least we have an problem >> > with >> > a lack of documentation. >> > >> > However, I still believe there is a bug when comparing R's >> > implementation to >> > the heuristic described in the book. From Statistical Models in S, page >> > 38-39: >> > >> > "Suppose F_j is any factor included in term T_i. Let T_{i(j)} denote the >> > margin of T_i for factor F_j--that is, the term obtained by dropping F_j >> > from T_i. We say that T_{i(j)} has appeared in the formula if there is >> > some >> > term T_i' for i' < i such that T_i' contains all the factors appearing >> > in >> > T_{i(j)}. The usual case is that T_{i(j)} itself is one of the preceding >> > terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in the >> > formula and by dummy variables if it has not" >> > >> > Here, F_j refers to a factor (variable) in a model and not a categorical >> > factor, as specified later in that section (page 40): "Numeric variables >> > appear in the computations as themselves, uncoded. Therefore, the rule >> > does >> > not do anything special for them, and it remains valid, in a trivial >> > sense, >> > whenever any of the F_j is numeric rather than categorical." >> > >> > Going back to my original example with three variables: X1 (numeric), X2 >> > (numeric), X3 (categorical). This heuristic prescribes encoding X1:X2:X3 >> > with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the formula. >> > When >> > any of the preceding terms do not exist, this heuristic tells us to use >> > dummy variables to encode the interaction (e.g. "F_j [the interaction >> > term] >> > is coded ... by dummy variables if it [any of the marginal terms >> > obtained by >> > dropping a single factor in the interaction] has not [appeared in the >> > formula]"). However, the example I gave demonstrated that this dummy >> > variable encoding only occurs for the model where the missing term is >> > the >> > numeric-numeric interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the >> > interaction term X1:X2:X3 is encoded by contrasts, not dummy variables. >> > This >> > is inconsistent with the description of the intended behavior given in >> > the >> > book. >> > >> > Best regards, >> > Tyler >> > >> > >> > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate <[hidden email]> >> > wrote: >> >> >> >> Hello Tyler, >> >> >> >> I want to bring to your attention the following document: "What >> >> happens if you omit the main effect in a regression model with an >> >> interaction?" >> >> >> >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-omit-the-main-effect-in-a-regression-model-with-an-interaction). >> >> This gives a useful review of the problem. Your example is Case 2: a >> >> continuous and a categorical regressor. >> >> >> >> The numerical examples are coded in Stata, and they give the same >> >> result as in R. Hence, if this is a bug in R then it is also a bug in >> >> Stata. That seems very unlikely. >> >> >> >> Here is a simulation in R of the above mentioned Case 2 in Stata: >> >> >> >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) >> >> print("Full model") >> >> print(model.matrix(~(socst+grp)^2 ,data=df)) >> >> print("Example 2.1: drop socst") >> >> print(model.matrix(~(socst+grp)^2 -socst ,data=df)) >> >> print("Example 2.2: drop grp") >> >> print(model.matrix(~(socst+grp)^2 -grp ,data=df)) >> >> >> >> This gives indeed the following regressors: >> >> >> >> "Full model" >> >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 >> >> "Example 2.1: drop socst" >> >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 socst:grp4 >> >> "Example 2.2: drop grp" >> >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4 >> >> >> >> There is a little bit of R documentation about this, based on the >> >> concept of marginality, which typically forbids a model having an >> >> interaction but not the corresponding main effects. (You might see the >> >> references in https://en.wikipedia.org/wiki/Principle_of_marginality ) >> >> See "An Introduction to R", by Venables and Smith and the R Core >> >> Team. At the bottom of page 52 (PDF: 57) it says: "Although the >> >> details are complicated, model formulae in R will normally generate >> >> the models that an expert statistician would expect, provided that >> >> marginality is preserved. Fitting, for [a contrary] example, a model >> >> with an interaction but not the corresponding main effects will in >> >> general lead to surprising results ....". >> >> The Reference Manual states that the R functions dropterm() and >> >> addterm() resp. drop or add only terms such that marginality is >> >> preserved. >> >> >> >> Finally, about your singular matrix t(mm)%*%mm. This is in fact >> >> Example 2.1 in Case 2 discussed above. As discussed there, in Stata >> >> and in R the drop of the continuous variable has no effect on the >> >> degrees of freedom here: it is just a reparameterisation of the full >> >> model, protecting you against losing marginality... Hence the >> >> model.matrix 'mm' is still square and nonsingular after the drop of >> >> X1, unless of course when a row is removed from the matrix 'design' >> >> when before creating 'mm'. >> >> >> >> Arie >> >> >> >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <[hidden email]> wrote: >> >> > You could possibly try to explain away the behavior for a missing >> >> > main >> >> > effects term, since without the main effects term we don't have main >> >> > effect >> >> > columns in the model matrix used to compute the interaction columns >> >> > (At >> >> > best this is undocumented behavior--I still think it's a bug, as we >> >> > know >> >> > how we would encode the categorical factors if they were in fact >> >> > present. >> >> > It's either specified in contrasts.arg or using the default set in >> >> > options). However, when all the main effects are present, why would >> >> > the >> >> > three-factor interaction column not simply be the product of the main >> >> > effect columns? In my example: we know X1, we know X2, and we know >> >> > X3. >> >> > Why >> >> > does the encoding of X1:X2:X3 depend on whether we specified a >> >> > two-factor >> >> > interaction, AND only changes for specific missing interactions? >> >> > >> >> > In addition, I can use a two-term example similar to yours to show >> >> > how >> >> > this >> >> > behavior results in a singular covariance matrix when, given the >> >> > desired >> >> > factor encoding, it should not be singular. >> >> > >> >> > We start with a full factorial design for a two-level continuous >> >> > factor >> >> > and >> >> > a three-level categorical factor, and remove a single row. This >> >> > design >> >> > matrix does not leave enough degrees of freedom to determine >> >> > goodness-of-fit, but should allow us to obtain parameter estimates. >> >> > >> >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) >> >> >> design = design[-1,] >> >> >> design >> >> > X1 X2 >> >> > 2 -1 A >> >> > 3 1 B >> >> > 4 -1 B >> >> > 5 1 C >> >> > 6 -1 C >> >> > >> >> > Here, we first calculate the model matrix for the full model, and >> >> > then >> >> > manually remove the X1 column from the model matrix. This gives us >> >> > the >> >> > model matrix one would expect if X1 were removed from the model. We >> >> > then >> >> > successfully calculate the covariance matrix. >> >> > >> >> >> mm = model.matrix(~(X1+X2)^2,data=design) >> >> >> mm >> >> > (Intercept) X1 X2B X2C X1:X2B X1:X2C >> >> > 2 1 -1 0 0 0 0 >> >> > 3 1 1 1 0 1 0 >> >> > 4 1 -1 1 0 -1 0 >> >> > 5 1 1 0 1 0 1 >> >> > 6 1 -1 0 1 0 -1 >> >> > >> >> >> mm = mm[,-2] >> >> >> solve(t(mm) %*% mm) >> >> > (Intercept) X2B X2C X1:X2B X1:X2C >> >> > (Intercept) 1 -1.0 -1.0 0.0 0.0 >> >> > X2B -1 1.5 1.0 0.0 0.0 >> >> > X2C -1 1.0 1.5 0.0 0.0 >> >> > X1:X2B 0 0.0 0.0 0.5 0.0 >> >> > X1:X2C 0 0.0 0.0 0.0 0.5 >> >> > >> >> > Here, we see the actual behavior for model.matrix. The undesired >> >> > re-coding >> >> > of the model matrix interaction term makes the information matrix >> >> > singular. >> >> > >> >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design) >> >> >> mm >> >> > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C >> >> > 2 1 0 0 -1 0 0 >> >> > 3 1 1 0 0 1 0 >> >> > 4 1 1 0 0 -1 0 >> >> > 5 1 0 1 0 0 1 >> >> > 6 1 0 1 0 0 -1 >> >> > >> >> >> solve(t(mm) %*% mm) >> >> > Error in solve.default(t(mm) %*% mm) : system is computationally >> >> > singular: >> >> > reciprocal condition number = 5.55112e-18 >> >> > >> >> > I still believe this is a bug. >> >> > >> >> > Best regards, >> >> > Tyler Morgan-Wall >> >> > >> >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate >> >> > <[hidden email]> >> >> > wrote: >> >> > >> >> >> I think it is not a bug. It is a general property of interactions. >> >> >> This property is best observed if all variables are factors >> >> >> (qualitative). >> >> >> >> >> >> For example, you have three variables (factors). You ask for as many >> >> >> interactions as possible, except an interaction term between two >> >> >> particular variables. When this interaction is not a constant, it is >> >> >> different for different values of the remaining variable. More >> >> >> precisely: for all values of that variable. In other words: you have >> >> >> a >> >> >> three-way interaction, with all values of that variable. >> >> >> >> >> >> An even smaller example is the following script with only two >> >> >> variables, each being a factor: >> >> >> >> >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) >> >> >> print(model.matrix(~(X1+X2)^2 ,data=df)) >> >> >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) >> >> >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) >> >> >> >> >> >> The result is: >> >> >> >> >> >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C >> >> >> 1 1 0 0 0 0 0 >> >> >> 2 1 1 0 0 0 0 >> >> >> 3 1 0 1 0 0 0 >> >> >> 4 1 1 1 0 1 0 >> >> >> 5 1 0 0 1 0 0 >> >> >> 6 1 1 0 1 0 1 >> >> >> >> >> >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C >> >> >> 1 1 0 0 0 0 0 >> >> >> 2 1 0 0 1 0 0 >> >> >> 3 1 1 0 0 0 0 >> >> >> 4 1 1 0 0 1 0 >> >> >> 5 1 0 1 0 0 0 >> >> >> 6 1 0 1 0 0 1 >> >> >> >> >> >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C >> >> >> 1 1 0 0 0 0 0 >> >> >> 2 1 1 0 0 0 0 >> >> >> 3 1 0 1 0 0 0 >> >> >> 4 1 1 0 1 0 0 >> >> >> 5 1 0 0 0 1 0 >> >> >> 6 1 1 0 0 0 1 >> >> >> >> >> >> Thus, in the second result, we have no main effect of X1. Instead, >> >> >> the >> >> >> effect of X1 depends on the value of X2; either A or B or C. In >> >> >> fact, >> >> >> this is a two-way interaction, including all three values of X2. In >> >> >> the third result, we have no main effect of X2, The effect of X2 >> >> >> depends on the value of X1; either p or q. >> >> >> >> >> >> A complicating element with your example seems to be that your X1 >> >> >> and >> >> >> X2 are not factors. >> >> >> >> >> >> Arie >> >> >> >> >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <[hidden email]> wrote: >> >> >> > Hi, >> >> >> > >> >> >> > I recently ran into an inconsistency in the way >> >> >> > model.matrix.default >> >> >> > handles factor encoding for higher level interactions with >> >> >> > categorical >> >> >> > variables when the full hierarchy of effects is not present. >> >> >> > Depending on >> >> >> > which lower level interactions are specified, the factor encoding >> >> >> > changes >> >> >> > for a higher level interaction. Consider the following minimal >> >> >> reproducible >> >> >> > example: >> >> >> > >> >> >> > -------------- >> >> >> > >> >> >> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> >> >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 X3B >> >> >> X3C >> >> >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> >> >> > 1 1 1 1 0 0 1 0 0 0 0 >> >> >> > 0 0 >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 0 >> >> >> > 0 0 >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 0 >> >> >> > 0 0 >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 0 >> >> >> > 0 0 >> >> >> > 5 1 1 1 1 0 1 1 0 1 0 >> >> >> > 1 0 >> >> >> > 6 1 -1 1 1 0 -1 -1 0 1 0 >> >> >> > -1 0 >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 0 >> >> >> > -1 0 >> >> >> > 8 1 -1 -1 1 0 1 -1 0 -1 0 >> >> >> > 1 0 >> >> >> > 9 1 1 1 0 1 1 0 1 0 1 >> >> >> > 0 1 >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 1 >> >> >> > 0 -1 >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 -1 >> >> >> > 0 -1 >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 -1 >> >> >> > 0 1 >> >> >> > attr(,"assign") >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 >> >> >> > attr(,"contrasts") >> >> >> > attr(,"contrasts")$X3 >> >> >> > [1] "contr.treatment" >> >> >> > >> >> >> > -------------- >> >> >> > >> >> >> > Specifying the full hierarchy gives us what we expect: the >> >> >> > interaction >> >> >> > columns are simply calculated from the product of the main effect >> >> >> columns. >> >> >> > The interaction term X1:X2:X3 gives us two columns in the model >> >> >> > matrix, >> >> >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main >> >> >> > effects. >> >> >> > >> >> >> > If we remove either the X2:X3 interaction or the X1:X3 >> >> >> > interaction, >> >> >> > we >> >> >> get >> >> >> > what we would expect for the X1:X2:X3 interaction, but when we >> >> >> > remove >> >> >> > the >> >> >> > X1:X2 interaction the encoding for X1:X2:X3 changes completely: >> >> >> > >> >> >> > -------------- >> >> >> > >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) (Intercept) X1 >> >> >> >> X2 >> >> >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> >> >> > 1 1 1 1 0 0 1 0 0 0 >> >> >> > 0 >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 >> >> >> > 0 >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 >> >> >> > 0 >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 >> >> >> > 0 >> >> >> > 5 1 1 1 1 0 1 1 0 1 >> >> >> > 0 >> >> >> > 6 1 -1 1 1 0 -1 1 0 -1 >> >> >> > 0 >> >> >> > 7 1 1 -1 1 0 -1 -1 0 -1 >> >> >> > 0 >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 >> >> >> > 0 >> >> >> > 9 1 1 1 0 1 1 0 1 0 >> >> >> > 1 >> >> >> > 10 1 -1 1 0 1 -1 0 1 0 >> >> >> > -1 >> >> >> > 11 1 1 -1 0 1 -1 0 -1 0 >> >> >> > -1 >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 >> >> >> > 1 >> >> >> > attr(,"assign") >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 >> >> >> > attr(,"contrasts") >> >> >> > attr(,"contrasts")$X3 >> >> >> > [1] "contr.treatment" >> >> >> > >> >> >> > >> >> >> > >> >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) (Intercept) X1 >> >> >> >> X2 >> >> >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C >> >> >> > 1 1 1 1 0 0 1 0 0 0 >> >> >> > 0 >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 >> >> >> > 0 >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 >> >> >> > 0 >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 >> >> >> > 0 >> >> >> > 5 1 1 1 1 0 1 1 0 1 >> >> >> > 0 >> >> >> > 6 1 -1 1 1 0 -1 -1 0 -1 >> >> >> > 0 >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 >> >> >> > 0 >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 >> >> >> > 0 >> >> >> > 9 1 1 1 0 1 1 0 1 0 >> >> >> > 1 >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 >> >> >> > -1 >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 >> >> >> > -1 >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 >> >> >> > 1 >> >> >> > attr(,"assign") >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 >> >> >> > attr(,"contrasts") >> >> >> > attr(,"contrasts")$X3 >> >> >> > [1] "contr.treatment" >> >> >> > >> >> >> > >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) (Intercept) X1 >> >> >> >> X2 >> >> >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C >> >> >> > 1 1 1 1 0 0 0 0 0 0 1 >> >> >> > 0 0 >> >> >> > 2 1 -1 1 0 0 0 0 0 0 -1 >> >> >> > 0 0 >> >> >> > 3 1 1 -1 0 0 0 0 0 0 -1 >> >> >> > 0 0 >> >> >> > 4 1 -1 -1 0 0 0 0 0 0 1 >> >> >> > 0 0 >> >> >> > 5 1 1 1 1 0 1 0 1 0 0 >> >> >> > 1 0 >> >> >> > 6 1 -1 1 1 0 -1 0 1 0 0 >> >> >> > -1 0 >> >> >> > 7 1 1 -1 1 0 1 0 -1 0 0 >> >> >> > -1 0 >> >> >> > 8 1 -1 -1 1 0 -1 0 -1 0 0 >> >> >> > 1 0 >> >> >> > 9 1 1 1 0 1 0 1 0 1 0 >> >> >> > 0 1 >> >> >> > 10 1 -1 1 0 1 0 -1 0 1 0 >> >> >> > 0 -1 >> >> >> > 11 1 1 -1 0 1 0 1 0 -1 0 >> >> >> > 0 -1 >> >> >> > 12 1 -1 -1 0 1 0 -1 0 -1 0 >> >> >> > 0 1 >> >> >> > attr(,"assign") >> >> >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 >> >> >> > attr(,"contrasts") >> >> >> > attr(,"contrasts")$X3 >> >> >> > [1] "contr.treatment" >> >> >> > >> >> >> > -------------- >> >> >> > >> >> >> > Here, we now see the encoding for the interaction X1:X2:X3 is now >> >> >> > the >> >> >> > interaction of X1 and X2 with a new encoding for X3 where each >> >> >> > factor >> >> >> level >> >> >> > is represented by its own column. I would expect, given the two >> >> >> > column >> >> >> > dummy variable encoding for X3, that the X1:X2:X3 column would >> >> >> > also >> >> >> > be >> >> >> two >> >> >> > columns regardless of what two-factor interactions we also >> >> >> > specified, >> >> >> > but >> >> >> > in this case it switches to three. If other two factor >> >> >> > interactions >> >> >> > are >> >> >> > missing in addition to X1:X2, this issue still occurs. This also >> >> >> > happens >> >> >> > regardless of the contrast specified in contrasts.arg for X3. I >> >> >> > don't >> >> >> > see >> >> >> > any reasoning for this behavior given in the documentation, so I >> >> >> > suspect >> >> >> it >> >> >> > is a bug. >> >> >> > >> >> >> > Best regards, >> >> >> > Tyler Morgan-Wall >> >> >> > >> >> >> > [[alternative HTML version deleted]] >> >> >> > >> >> >> > ______________________________________________ ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hi Arie,
I understand what you're saying. The following excerpt out of the book shows that F_j does not refer exclusively to categorical factors: "...the rule does not do anything special for them, and it remains valid, in a trivial sense, whenever any of the F_j is numeric rather than categorical." Since F_j refers to both categorical and numeric variables, the behavior of model.matrix is not consistent with the heuristic. Best regards, Tyler On Sat, Nov 4, 2017 at 6:50 AM, Arie ten Cate <[hidden email]> wrote: > Hello Tyler, > > I rephrase my previous mail, as follows: > > In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical > variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2, > which in the example is dropped from the model. Hence the X3 in T_i > must be encoded by dummy variables, as indeed it is. > > Arie > > > On Thu, Nov 2, 2017 at 4:11 PM, Tyler <[hidden email]> wrote: > > Hi Arie, > > > > The book out of which this behavior is based does not use factor (in this > > section) to refer to categorical factor. I will again point to this > > sentence, from page 40, in the same section and referring to the behavior > > under question, that shows F_j is not limited to categorical factors: > > "Numeric variables appear in the computations as themselves, uncoded. > > Therefore, the rule does not do anything special for them, and it remains > > valid, in a trivial sense, whenever any of the F_j is numeric rather than > > categorical." > > > > Note the "... whenever any of the F_j is numeric rather than > categorical." > > Factor here is used in the more general sense of the word, not referring > to > > the R type "factor." The behavior of R does not match the heuristic that > > it's citing. > > > > Best regards, > > Tyler > > > > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate <[hidden email]> > wrote: > >> > >> Hello Tyler, > >> > >> Thank you for searching for, and finding, the basic description of the > >> behavior of R in this matter. > >> > >> I think your example is in agreement with the book. > >> > >> But let me first note the following. You write: "F_j refers to a > >> factor (variable) in a model and not a categorical factor". However: > >> "a factor is a vector object used to specify a discrete > >> classification" (start of chapter 4 of "An Introduction to R".) You > >> might also see the description of the R function factor(). > >> > >> You note that the book says about a factor F_j: > >> "... F_j is coded by contrasts if T_{i(j)} has appeared in the > >> formula and by dummy variables if it has not" > >> > >> You find: > >> "However, the example I gave demonstrated that this dummy variable > >> encoding only occurs for the model where the missing term is the > >> numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2." > >> > >> We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then > >> T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in T_i > >> must be encoded by dummy variables, as indeed it is. > >> > >> Arie > >> > >> On Tue, Oct 31, 2017 at 4:01 PM, Tyler <[hidden email]> wrote: > >> > Hi Arie, > >> > > >> > Thank you for your further research into the issue. > >> > > >> > Regarding Stata: On the other hand, JMP gives model matrices that use > >> > the > >> > main effects contrasts in computing the higher order interactions, > >> > without > >> > the dummy variable encoding. I verified this both by analyzing the > >> > linear > >> > model given in my first example and noting that JMP has one more > degree > >> > of > >> > freedom than R for the same model, as well as looking at the generated > >> > model > >> > matrices. It's easy to find a design where JMP will allow us fit our > >> > model > >> > with goodness-of-fit estimates and R will not due to the extra > degree(s) > >> > of > >> > freedom required. Let's keep the conversation limited to R. > >> > > >> > I want to refocus back onto my original bug report, which was not for > a > >> > missing main effects term, but rather for a missing lower-order > >> > interaction > >> > term. The behavior of model.matrix.default() for a missing main > effects > >> > term > >> > is a nice example to demonstrate how model.matrix encodes with dummy > >> > variables instead of contrasts, but doesn't demonstrate the > inconsistent > >> > behavior my bug report highlighted. > >> > > >> > I went looking for documentation on this behavior, and the issue stems > >> > not > >> > from model.matrix.default(), but rather the terms() function in > >> > interpreting > >> > the formula. This "clever" replacement of contrasts by dummy variables > >> > to > >> > maintain marginality (presuming that's the reason) is not described > >> > anywhere > >> > in the documentation for either the model.matrix() or the terms() > >> > function. > >> > In order to find a description for the behavior, I had to look in the > >> > underlying C code, buried above the "TermCode" function of the > "model.c" > >> > file, which says: > >> > > >> > "TermCode decides on the encoding of a model term. Returns 1 if > variable > >> > ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if it > >> > is to > >> > be encoded by dummy variables. This is decided using the heuristic > >> > described in Statistical Models in S, page 38." > >> > > >> > I do not have a copy of this book, and I suspect most R users do not > as > >> > well. Thankfully, however, some of the pages describing this behavior > >> > were > >> > available as part of Amazon's "Look Inside" feature--but if not for > >> > that, I > >> > would have no idea what heuristic R was using. Since those pages could > >> > made > >> > unavailable by Amazon at any time, at the very least we have an > problem > >> > with > >> > a lack of documentation. > >> > > >> > However, I still believe there is a bug when comparing R's > >> > implementation to > >> > the heuristic described in the book. From Statistical Models in S, > page > >> > 38-39: > >> > > >> > "Suppose F_j is any factor included in term T_i. Let T_{i(j)} denote > the > >> > margin of T_i for factor F_j--that is, the term obtained by dropping > F_j > >> > from T_i. We say that T_{i(j)} has appeared in the formula if there is > >> > some > >> > term T_i' for i' < i such that T_i' contains all the factors appearing > >> > in > >> > T_{i(j)}. The usual case is that T_{i(j)} itself is one of the > preceding > >> > terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in the > >> > formula and by dummy variables if it has not" > >> > > >> > Here, F_j refers to a factor (variable) in a model and not a > categorical > >> > factor, as specified later in that section (page 40): "Numeric > variables > >> > appear in the computations as themselves, uncoded. Therefore, the rule > >> > does > >> > not do anything special for them, and it remains valid, in a trivial > >> > sense, > >> > whenever any of the F_j is numeric rather than categorical." > >> > > >> > Going back to my original example with three variables: X1 (numeric), > X2 > >> > (numeric), X3 (categorical). This heuristic prescribes encoding > X1:X2:X3 > >> > with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the > formula. > >> > When > >> > any of the preceding terms do not exist, this heuristic tells us to > use > >> > dummy variables to encode the interaction (e.g. "F_j [the interaction > >> > term] > >> > is coded ... by dummy variables if it [any of the marginal terms > >> > obtained by > >> > dropping a single factor in the interaction] has not [appeared in the > >> > formula]"). However, the example I gave demonstrated that this dummy > >> > variable encoding only occurs for the model where the missing term is > >> > the > >> > numeric-numeric interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the > >> > interaction term X1:X2:X3 is encoded by contrasts, not dummy > variables. > >> > This > >> > is inconsistent with the description of the intended behavior given in > >> > the > >> > book. > >> > > >> > Best regards, > >> > Tyler > >> > > >> > > >> > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate <[hidden email] > > > >> > wrote: > >> >> > >> >> Hello Tyler, > >> >> > >> >> I want to bring to your attention the following document: "What > >> >> happens if you omit the main effect in a regression model with an > >> >> interaction?" > >> >> > >> >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-o > mit-the-main-effect-in-a-regression-model-with-an-interaction). > >> >> This gives a useful review of the problem. Your example is Case 2: a > >> >> continuous and a categorical regressor. > >> >> > >> >> The numerical examples are coded in Stata, and they give the same > >> >> result as in R. Hence, if this is a bug in R then it is also a bug in > >> >> Stata. That seems very unlikely. > >> >> > >> >> Here is a simulation in R of the above mentioned Case 2 in Stata: > >> >> > >> >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) > >> >> print("Full model") > >> >> print(model.matrix(~(socst+grp)^2 ,data=df)) > >> >> print("Example 2.1: drop socst") > >> >> print(model.matrix(~(socst+grp)^2 -socst ,data=df)) > >> >> print("Example 2.2: drop grp") > >> >> print(model.matrix(~(socst+grp)^2 -grp ,data=df)) > >> >> > >> >> This gives indeed the following regressors: > >> >> > >> >> "Full model" > >> >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 > >> >> "Example 2.1: drop socst" > >> >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 > socst:grp4 > >> >> "Example 2.2: drop grp" > >> >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4 > >> >> > >> >> There is a little bit of R documentation about this, based on the > >> >> concept of marginality, which typically forbids a model having an > >> >> interaction but not the corresponding main effects. (You might see > the > >> >> references in https://en.wikipedia.org/wiki/Principle_of_marginality > ) > >> >> See "An Introduction to R", by Venables and Smith and the R Core > >> >> Team. At the bottom of page 52 (PDF: 57) it says: "Although the > >> >> details are complicated, model formulae in R will normally generate > >> >> the models that an expert statistician would expect, provided that > >> >> marginality is preserved. Fitting, for [a contrary] example, a model > >> >> with an interaction but not the corresponding main effects will in > >> >> general lead to surprising results ....". > >> >> The Reference Manual states that the R functions dropterm() and > >> >> addterm() resp. drop or add only terms such that marginality is > >> >> preserved. > >> >> > >> >> Finally, about your singular matrix t(mm)%*%mm. This is in fact > >> >> Example 2.1 in Case 2 discussed above. As discussed there, in Stata > >> >> and in R the drop of the continuous variable has no effect on the > >> >> degrees of freedom here: it is just a reparameterisation of the full > >> >> model, protecting you against losing marginality... Hence the > >> >> model.matrix 'mm' is still square and nonsingular after the drop of > >> >> X1, unless of course when a row is removed from the matrix 'design' > >> >> when before creating 'mm'. > >> >> > >> >> Arie > >> >> > >> >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <[hidden email]> wrote: > >> >> > You could possibly try to explain away the behavior for a missing > >> >> > main > >> >> > effects term, since without the main effects term we don't have > main > >> >> > effect > >> >> > columns in the model matrix used to compute the interaction columns > >> >> > (At > >> >> > best this is undocumented behavior--I still think it's a bug, as we > >> >> > know > >> >> > how we would encode the categorical factors if they were in fact > >> >> > present. > >> >> > It's either specified in contrasts.arg or using the default set in > >> >> > options). However, when all the main effects are present, why would > >> >> > the > >> >> > three-factor interaction column not simply be the product of the > main > >> >> > effect columns? In my example: we know X1, we know X2, and we know > >> >> > X3. > >> >> > Why > >> >> > does the encoding of X1:X2:X3 depend on whether we specified a > >> >> > two-factor > >> >> > interaction, AND only changes for specific missing interactions? > >> >> > > >> >> > In addition, I can use a two-term example similar to yours to show > >> >> > how > >> >> > this > >> >> > behavior results in a singular covariance matrix when, given the > >> >> > desired > >> >> > factor encoding, it should not be singular. > >> >> > > >> >> > We start with a full factorial design for a two-level continuous > >> >> > factor > >> >> > and > >> >> > a three-level categorical factor, and remove a single row. This > >> >> > design > >> >> > matrix does not leave enough degrees of freedom to determine > >> >> > goodness-of-fit, but should allow us to obtain parameter estimates. > >> >> > > >> >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) > >> >> >> design = design[-1,] > >> >> >> design > >> >> > X1 X2 > >> >> > 2 -1 A > >> >> > 3 1 B > >> >> > 4 -1 B > >> >> > 5 1 C > >> >> > 6 -1 C > >> >> > > >> >> > Here, we first calculate the model matrix for the full model, and > >> >> > then > >> >> > manually remove the X1 column from the model matrix. This gives us > >> >> > the > >> >> > model matrix one would expect if X1 were removed from the model. We > >> >> > then > >> >> > successfully calculate the covariance matrix. > >> >> > > >> >> >> mm = model.matrix(~(X1+X2)^2,data=design) > >> >> >> mm > >> >> > (Intercept) X1 X2B X2C X1:X2B X1:X2C > >> >> > 2 1 -1 0 0 0 0 > >> >> > 3 1 1 1 0 1 0 > >> >> > 4 1 -1 1 0 -1 0 > >> >> > 5 1 1 0 1 0 1 > >> >> > 6 1 -1 0 1 0 -1 > >> >> > > >> >> >> mm = mm[,-2] > >> >> >> solve(t(mm) %*% mm) > >> >> > (Intercept) X2B X2C X1:X2B X1:X2C > >> >> > (Intercept) 1 -1.0 -1.0 0.0 0.0 > >> >> > X2B -1 1.5 1.0 0.0 0.0 > >> >> > X2C -1 1.0 1.5 0.0 0.0 > >> >> > X1:X2B 0 0.0 0.0 0.5 0.0 > >> >> > X1:X2C 0 0.0 0.0 0.0 0.5 > >> >> > > >> >> > Here, we see the actual behavior for model.matrix. The undesired > >> >> > re-coding > >> >> > of the model matrix interaction term makes the information matrix > >> >> > singular. > >> >> > > >> >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design) > >> >> >> mm > >> >> > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C > >> >> > 2 1 0 0 -1 0 0 > >> >> > 3 1 1 0 0 1 0 > >> >> > 4 1 1 0 0 -1 0 > >> >> > 5 1 0 1 0 0 1 > >> >> > 6 1 0 1 0 0 -1 > >> >> > > >> >> >> solve(t(mm) %*% mm) > >> >> > Error in solve.default(t(mm) %*% mm) : system is computationally > >> >> > singular: > >> >> > reciprocal condition number = 5.55112e-18 > >> >> > > >> >> > I still believe this is a bug. > >> >> > > >> >> > Best regards, > >> >> > Tyler Morgan-Wall > >> >> > > >> >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate > >> >> > <[hidden email]> > >> >> > wrote: > >> >> > > >> >> >> I think it is not a bug. It is a general property of interactions. > >> >> >> This property is best observed if all variables are factors > >> >> >> (qualitative). > >> >> >> > >> >> >> For example, you have three variables (factors). You ask for as > many > >> >> >> interactions as possible, except an interaction term between two > >> >> >> particular variables. When this interaction is not a constant, it > is > >> >> >> different for different values of the remaining variable. More > >> >> >> precisely: for all values of that variable. In other words: you > have > >> >> >> a > >> >> >> three-way interaction, with all values of that variable. > >> >> >> > >> >> >> An even smaller example is the following script with only two > >> >> >> variables, each being a factor: > >> >> >> > >> >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) > >> >> >> print(model.matrix(~(X1+X2)^2 ,data=df)) > >> >> >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) > >> >> >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) > >> >> >> > >> >> >> The result is: > >> >> >> > >> >> >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C > >> >> >> 1 1 0 0 0 0 0 > >> >> >> 2 1 1 0 0 0 0 > >> >> >> 3 1 0 1 0 0 0 > >> >> >> 4 1 1 1 0 1 0 > >> >> >> 5 1 0 0 1 0 0 > >> >> >> 6 1 1 0 1 0 1 > >> >> >> > >> >> >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C > >> >> >> 1 1 0 0 0 0 0 > >> >> >> 2 1 0 0 1 0 0 > >> >> >> 3 1 1 0 0 0 0 > >> >> >> 4 1 1 0 0 1 0 > >> >> >> 5 1 0 1 0 0 0 > >> >> >> 6 1 0 1 0 0 1 > >> >> >> > >> >> >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C > >> >> >> 1 1 0 0 0 0 0 > >> >> >> 2 1 1 0 0 0 0 > >> >> >> 3 1 0 1 0 0 0 > >> >> >> 4 1 1 0 1 0 0 > >> >> >> 5 1 0 0 0 1 0 > >> >> >> 6 1 1 0 0 0 1 > >> >> >> > >> >> >> Thus, in the second result, we have no main effect of X1. Instead, > >> >> >> the > >> >> >> effect of X1 depends on the value of X2; either A or B or C. In > >> >> >> fact, > >> >> >> this is a two-way interaction, including all three values of X2. > In > >> >> >> the third result, we have no main effect of X2, The effect of X2 > >> >> >> depends on the value of X1; either p or q. > >> >> >> > >> >> >> A complicating element with your example seems to be that your X1 > >> >> >> and > >> >> >> X2 are not factors. > >> >> >> > >> >> >> Arie > >> >> >> > >> >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <[hidden email]> wrote: > >> >> >> > Hi, > >> >> >> > > >> >> >> > I recently ran into an inconsistency in the way > >> >> >> > model.matrix.default > >> >> >> > handles factor encoding for higher level interactions with > >> >> >> > categorical > >> >> >> > variables when the full hierarchy of effects is not present. > >> >> >> > Depending on > >> >> >> > which lower level interactions are specified, the factor > encoding > >> >> >> > changes > >> >> >> > for a higher level interaction. Consider the following minimal > >> >> >> reproducible > >> >> >> > example: > >> >> >> > > >> >> >> > -------------- > >> >> >> > > >> >> >> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1, > -1),X3=c("A","B","C"))> > >> >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 > X3B > >> >> >> X3C > >> >> >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> >> >> > 1 1 1 1 0 0 1 0 0 0 0 > >> >> >> > 0 0 > >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 0 > >> >> >> > 0 0 > >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 0 > >> >> >> > 0 0 > >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 0 > >> >> >> > 0 0 > >> >> >> > 5 1 1 1 1 0 1 1 0 1 0 > >> >> >> > 1 0 > >> >> >> > 6 1 -1 1 1 0 -1 -1 0 1 0 > >> >> >> > -1 0 > >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 0 > >> >> >> > -1 0 > >> >> >> > 8 1 -1 -1 1 0 1 -1 0 -1 0 > >> >> >> > 1 0 > >> >> >> > 9 1 1 1 0 1 1 0 1 0 1 > >> >> >> > 0 1 > >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 1 > >> >> >> > 0 -1 > >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 -1 > >> >> >> > 0 -1 > >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 -1 > >> >> >> > 0 1 > >> >> >> > attr(,"assign") > >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 > >> >> >> > attr(,"contrasts") > >> >> >> > attr(,"contrasts")$X3 > >> >> >> > [1] "contr.treatment" > >> >> >> > > >> >> >> > -------------- > >> >> >> > > >> >> >> > Specifying the full hierarchy gives us what we expect: the > >> >> >> > interaction > >> >> >> > columns are simply calculated from the product of the main > effect > >> >> >> columns. > >> >> >> > The interaction term X1:X2:X3 gives us two columns in the model > >> >> >> > matrix, > >> >> >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main > >> >> >> > effects. > >> >> >> > > >> >> >> > If we remove either the X2:X3 interaction or the X1:X3 > >> >> >> > interaction, > >> >> >> > we > >> >> >> get > >> >> >> > what we would expect for the X1:X2:X3 interaction, but when we > >> >> >> > remove > >> >> >> > the > >> >> >> > X1:X2 interaction the encoding for X1:X2:X3 changes completely: > >> >> >> > > >> >> >> > -------------- > >> >> >> > > >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) > (Intercept) X1 > >> >> >> >> X2 > >> >> >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> >> >> > 1 1 1 1 0 0 1 0 0 0 > >> >> >> > 0 > >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 > >> >> >> > 0 > >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 > >> >> >> > 0 > >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 > >> >> >> > 0 > >> >> >> > 5 1 1 1 1 0 1 1 0 1 > >> >> >> > 0 > >> >> >> > 6 1 -1 1 1 0 -1 1 0 -1 > >> >> >> > 0 > >> >> >> > 7 1 1 -1 1 0 -1 -1 0 -1 > >> >> >> > 0 > >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 > >> >> >> > 0 > >> >> >> > 9 1 1 1 0 1 1 0 1 0 > >> >> >> > 1 > >> >> >> > 10 1 -1 1 0 1 -1 0 1 0 > >> >> >> > -1 > >> >> >> > 11 1 1 -1 0 1 -1 0 -1 0 > >> >> >> > -1 > >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 > >> >> >> > 1 > >> >> >> > attr(,"assign") > >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> >> >> > attr(,"contrasts") > >> >> >> > attr(,"contrasts")$X3 > >> >> >> > [1] "contr.treatment" > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) > (Intercept) X1 > >> >> >> >> X2 > >> >> >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C > >> >> >> > 1 1 1 1 0 0 1 0 0 0 > >> >> >> > 0 > >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 > >> >> >> > 0 > >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 > >> >> >> > 0 > >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 > >> >> >> > 0 > >> >> >> > 5 1 1 1 1 0 1 1 0 1 > >> >> >> > 0 > >> >> >> > 6 1 -1 1 1 0 -1 -1 0 -1 > >> >> >> > 0 > >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 > >> >> >> > 0 > >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 > >> >> >> > 0 > >> >> >> > 9 1 1 1 0 1 1 0 1 0 > >> >> >> > 1 > >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 > >> >> >> > -1 > >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 > >> >> >> > -1 > >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 > >> >> >> > 1 > >> >> >> > attr(,"assign") > >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> >> >> > attr(,"contrasts") > >> >> >> > attr(,"contrasts")$X3 > >> >> >> > [1] "contr.treatment" > >> >> >> > > >> >> >> > > >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) > (Intercept) X1 > >> >> >> >> X2 > >> >> >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C > >> >> >> > 1 1 1 1 0 0 0 0 0 0 > 1 > >> >> >> > 0 0 > >> >> >> > 2 1 -1 1 0 0 0 0 0 0 > -1 > >> >> >> > 0 0 > >> >> >> > 3 1 1 -1 0 0 0 0 0 0 > -1 > >> >> >> > 0 0 > >> >> >> > 4 1 -1 -1 0 0 0 0 0 0 > 1 > >> >> >> > 0 0 > >> >> >> > 5 1 1 1 1 0 1 0 1 0 > 0 > >> >> >> > 1 0 > >> >> >> > 6 1 -1 1 1 0 -1 0 1 0 > 0 > >> >> >> > -1 0 > >> >> >> > 7 1 1 -1 1 0 1 0 -1 0 > 0 > >> >> >> > -1 0 > >> >> >> > 8 1 -1 -1 1 0 -1 0 -1 0 > 0 > >> >> >> > 1 0 > >> >> >> > 9 1 1 1 0 1 0 1 0 1 > 0 > >> >> >> > 0 1 > >> >> >> > 10 1 -1 1 0 1 0 -1 0 1 > 0 > >> >> >> > 0 -1 > >> >> >> > 11 1 1 -1 0 1 0 1 0 -1 > 0 > >> >> >> > 0 -1 > >> >> >> > 12 1 -1 -1 0 1 0 -1 0 -1 > 0 > >> >> >> > 0 1 > >> >> >> > attr(,"assign") > >> >> >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 > >> >> >> > attr(,"contrasts") > >> >> >> > attr(,"contrasts")$X3 > >> >> >> > [1] "contr.treatment" > >> >> >> > > >> >> >> > -------------- > >> >> >> > > >> >> >> > Here, we now see the encoding for the interaction X1:X2:X3 is > now > >> >> >> > the > >> >> >> > interaction of X1 and X2 with a new encoding for X3 where each > >> >> >> > factor > >> >> >> level > >> >> >> > is represented by its own column. I would expect, given the two > >> >> >> > column > >> >> >> > dummy variable encoding for X3, that the X1:X2:X3 column would > >> >> >> > also > >> >> >> > be > >> >> >> two > >> >> >> > columns regardless of what two-factor interactions we also > >> >> >> > specified, > >> >> >> > but > >> >> >> > in this case it switches to three. If other two factor > >> >> >> > interactions > >> >> >> > are > >> >> >> > missing in addition to X1:X2, this issue still occurs. This also > >> >> >> > happens > >> >> >> > regardless of the contrast specified in contrasts.arg for X3. I > >> >> >> > don't > >> >> >> > see > >> >> >> > any reasoning for this behavior given in the documentation, so I > >> >> >> > suspect > >> >> >> it > >> >> >> > is a bug. > >> >> >> > > >> >> >> > Best regards, > >> >> >> > Tyler Morgan-Wall > >> >> >> > > >> >> >> > [[alternative HTML version deleted]] > >> >> >> > > >> >> >> > ______________________________________________ > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hello Tyler,
You write that you understand what I am saying. However, I am now at loss about what exactly is the problem with the behavior of R. Here is a script which reproduces your experiments with three variables (excluding the full model): m=expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C")) model.matrix(~(X1+X2+X3)^3-X1:X3,data=m) model.matrix(~(X1+X2+X3)^3-X2:X3,data=m) model.matrix(~(X1+X2+X3)^3-X1:X2,data=m) Below are the three results, similar to your first mail. (The first two are basically the same, of course.) Please pick one result which you think is not consistent with the heuristic and please give what you think is the correct result: model.matrix(~(X1+X2+X3)^3-X1:X3) (Intercept) X1 X2 X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C model.matrix(~(X1+X2+X3)^3-X2:X3) (Intercept) X1 X2 X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C model.matrix(~(X1+X2+X3)^3-X1:X2) (Intercept) X1 X2 X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C (I take it that the combination of X3A and X3B and X3C implies dummy encoding, and the combination of only X3B and X3C implies contrasts encoding, with respect to X3A.) Thanks in advance, Arie On Sat, Nov 4, 2017 at 5:33 PM, Tyler <[hidden email]> wrote: > Hi Arie, > > I understand what you're saying. The following excerpt out of the book shows > that F_j does not refer exclusively to categorical factors: "...the rule > does not do anything special for them, and it remains valid, in a trivial > sense, whenever any of the F_j is numeric rather than categorical." Since > F_j refers to both categorical and numeric variables, the behavior of > model.matrix is not consistent with the heuristic. > > Best regards, > Tyler > > On Sat, Nov 4, 2017 at 6:50 AM, Arie ten Cate <[hidden email]> wrote: >> >> Hello Tyler, >> >> I rephrase my previous mail, as follows: >> >> In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical >> variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2, >> which in the example is dropped from the model. Hence the X3 in T_i >> must be encoded by dummy variables, as indeed it is. >> >> Arie >> >> >> On Thu, Nov 2, 2017 at 4:11 PM, Tyler <[hidden email]> wrote: >> > Hi Arie, >> > >> > The book out of which this behavior is based does not use factor (in >> > this >> > section) to refer to categorical factor. I will again point to this >> > sentence, from page 40, in the same section and referring to the >> > behavior >> > under question, that shows F_j is not limited to categorical factors: >> > "Numeric variables appear in the computations as themselves, uncoded. >> > Therefore, the rule does not do anything special for them, and it >> > remains >> > valid, in a trivial sense, whenever any of the F_j is numeric rather >> > than >> > categorical." >> > >> > Note the "... whenever any of the F_j is numeric rather than >> > categorical." >> > Factor here is used in the more general sense of the word, not referring >> > to >> > the R type "factor." The behavior of R does not match the heuristic that >> > it's citing. >> > >> > Best regards, >> > Tyler >> > >> > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate <[hidden email]> >> > wrote: >> >> >> >> Hello Tyler, >> >> >> >> Thank you for searching for, and finding, the basic description of the >> >> behavior of R in this matter. >> >> >> >> I think your example is in agreement with the book. >> >> >> >> But let me first note the following. You write: "F_j refers to a >> >> factor (variable) in a model and not a categorical factor". However: >> >> "a factor is a vector object used to specify a discrete >> >> classification" (start of chapter 4 of "An Introduction to R".) You >> >> might also see the description of the R function factor(). >> >> >> >> You note that the book says about a factor F_j: >> >> "... F_j is coded by contrasts if T_{i(j)} has appeared in the >> >> formula and by dummy variables if it has not" >> >> >> >> You find: >> >> "However, the example I gave demonstrated that this dummy variable >> >> encoding only occurs for the model where the missing term is the >> >> numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2." >> >> >> >> We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then >> >> T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in T_i >> >> must be encoded by dummy variables, as indeed it is. >> >> >> >> Arie >> >> >> >> On Tue, Oct 31, 2017 at 4:01 PM, Tyler <[hidden email]> wrote: >> >> > Hi Arie, >> >> > >> >> > Thank you for your further research into the issue. >> >> > >> >> > Regarding Stata: On the other hand, JMP gives model matrices that use >> >> > the >> >> > main effects contrasts in computing the higher order interactions, >> >> > without >> >> > the dummy variable encoding. I verified this both by analyzing the >> >> > linear >> >> > model given in my first example and noting that JMP has one more >> >> > degree >> >> > of >> >> > freedom than R for the same model, as well as looking at the >> >> > generated >> >> > model >> >> > matrices. It's easy to find a design where JMP will allow us fit our >> >> > model >> >> > with goodness-of-fit estimates and R will not due to the extra >> >> > degree(s) >> >> > of >> >> > freedom required. Let's keep the conversation limited to R. >> >> > >> >> > I want to refocus back onto my original bug report, which was not for >> >> > a >> >> > missing main effects term, but rather for a missing lower-order >> >> > interaction >> >> > term. The behavior of model.matrix.default() for a missing main >> >> > effects >> >> > term >> >> > is a nice example to demonstrate how model.matrix encodes with dummy >> >> > variables instead of contrasts, but doesn't demonstrate the >> >> > inconsistent >> >> > behavior my bug report highlighted. >> >> > >> >> > I went looking for documentation on this behavior, and the issue >> >> > stems >> >> > not >> >> > from model.matrix.default(), but rather the terms() function in >> >> > interpreting >> >> > the formula. This "clever" replacement of contrasts by dummy >> >> > variables >> >> > to >> >> > maintain marginality (presuming that's the reason) is not described >> >> > anywhere >> >> > in the documentation for either the model.matrix() or the terms() >> >> > function. >> >> > In order to find a description for the behavior, I had to look in the >> >> > underlying C code, buried above the "TermCode" function of the >> >> > "model.c" >> >> > file, which says: >> >> > >> >> > "TermCode decides on the encoding of a model term. Returns 1 if >> >> > variable >> >> > ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if >> >> > it >> >> > is to >> >> > be encoded by dummy variables. This is decided using the heuristic >> >> > described in Statistical Models in S, page 38." >> >> > >> >> > I do not have a copy of this book, and I suspect most R users do not >> >> > as >> >> > well. Thankfully, however, some of the pages describing this behavior >> >> > were >> >> > available as part of Amazon's "Look Inside" feature--but if not for >> >> > that, I >> >> > would have no idea what heuristic R was using. Since those pages >> >> > could >> >> > made >> >> > unavailable by Amazon at any time, at the very least we have an >> >> > problem >> >> > with >> >> > a lack of documentation. >> >> > >> >> > However, I still believe there is a bug when comparing R's >> >> > implementation to >> >> > the heuristic described in the book. From Statistical Models in S, >> >> > page >> >> > 38-39: >> >> > >> >> > "Suppose F_j is any factor included in term T_i. Let T_{i(j)} denote >> >> > the >> >> > margin of T_i for factor F_j--that is, the term obtained by dropping >> >> > F_j >> >> > from T_i. We say that T_{i(j)} has appeared in the formula if there >> >> > is >> >> > some >> >> > term T_i' for i' < i such that T_i' contains all the factors >> >> > appearing >> >> > in >> >> > T_{i(j)}. The usual case is that T_{i(j)} itself is one of the >> >> > preceding >> >> > terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in the >> >> > formula and by dummy variables if it has not" >> >> > >> >> > Here, F_j refers to a factor (variable) in a model and not a >> >> > categorical >> >> > factor, as specified later in that section (page 40): "Numeric >> >> > variables >> >> > appear in the computations as themselves, uncoded. Therefore, the >> >> > rule >> >> > does >> >> > not do anything special for them, and it remains valid, in a trivial >> >> > sense, >> >> > whenever any of the F_j is numeric rather than categorical." >> >> > >> >> > Going back to my original example with three variables: X1 (numeric), >> >> > X2 >> >> > (numeric), X3 (categorical). This heuristic prescribes encoding >> >> > X1:X2:X3 >> >> > with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the >> >> > formula. >> >> > When >> >> > any of the preceding terms do not exist, this heuristic tells us to >> >> > use >> >> > dummy variables to encode the interaction (e.g. "F_j [the interaction >> >> > term] >> >> > is coded ... by dummy variables if it [any of the marginal terms >> >> > obtained by >> >> > dropping a single factor in the interaction] has not [appeared in the >> >> > formula]"). However, the example I gave demonstrated that this dummy >> >> > variable encoding only occurs for the model where the missing term is >> >> > the >> >> > numeric-numeric interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the >> >> > interaction term X1:X2:X3 is encoded by contrasts, not dummy >> >> > variables. >> >> > This >> >> > is inconsistent with the description of the intended behavior given >> >> > in >> >> > the >> >> > book. >> >> > >> >> > Best regards, >> >> > Tyler >> >> > >> >> > >> >> > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate >> >> > <[hidden email]> >> >> > wrote: >> >> >> >> >> >> Hello Tyler, >> >> >> >> >> >> I want to bring to your attention the following document: "What >> >> >> happens if you omit the main effect in a regression model with an >> >> >> interaction?" >> >> >> >> >> >> >> >> >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-omit-the-main-effect-in-a-regression-model-with-an-interaction). >> >> >> This gives a useful review of the problem. Your example is Case 2: a >> >> >> continuous and a categorical regressor. >> >> >> >> >> >> The numerical examples are coded in Stata, and they give the same >> >> >> result as in R. Hence, if this is a bug in R then it is also a bug >> >> >> in >> >> >> Stata. That seems very unlikely. >> >> >> >> >> >> Here is a simulation in R of the above mentioned Case 2 in Stata: >> >> >> >> >> >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) >> >> >> print("Full model") >> >> >> print(model.matrix(~(socst+grp)^2 ,data=df)) >> >> >> print("Example 2.1: drop socst") >> >> >> print(model.matrix(~(socst+grp)^2 -socst ,data=df)) >> >> >> print("Example 2.2: drop grp") >> >> >> print(model.matrix(~(socst+grp)^2 -grp ,data=df)) >> >> >> >> >> >> This gives indeed the following regressors: >> >> >> >> >> >> "Full model" >> >> >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 >> >> >> "Example 2.1: drop socst" >> >> >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 >> >> >> socst:grp4 >> >> >> "Example 2.2: drop grp" >> >> >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4 >> >> >> >> >> >> There is a little bit of R documentation about this, based on the >> >> >> concept of marginality, which typically forbids a model having an >> >> >> interaction but not the corresponding main effects. (You might see >> >> >> the >> >> >> references in https://en.wikipedia.org/wiki/Principle_of_marginality >> >> >> ) >> >> >> See "An Introduction to R", by Venables and Smith and the R Core >> >> >> Team. At the bottom of page 52 (PDF: 57) it says: "Although the >> >> >> details are complicated, model formulae in R will normally generate >> >> >> the models that an expert statistician would expect, provided that >> >> >> marginality is preserved. Fitting, for [a contrary] example, a model >> >> >> with an interaction but not the corresponding main effects will in >> >> >> general lead to surprising results ....". >> >> >> The Reference Manual states that the R functions dropterm() and >> >> >> addterm() resp. drop or add only terms such that marginality is >> >> >> preserved. >> >> >> >> >> >> Finally, about your singular matrix t(mm)%*%mm. This is in fact >> >> >> Example 2.1 in Case 2 discussed above. As discussed there, in Stata >> >> >> and in R the drop of the continuous variable has no effect on the >> >> >> degrees of freedom here: it is just a reparameterisation of the full >> >> >> model, protecting you against losing marginality... Hence the >> >> >> model.matrix 'mm' is still square and nonsingular after the drop of >> >> >> X1, unless of course when a row is removed from the matrix 'design' >> >> >> when before creating 'mm'. >> >> >> >> >> >> Arie >> >> >> >> >> >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <[hidden email]> wrote: >> >> >> > You could possibly try to explain away the behavior for a missing >> >> >> > main >> >> >> > effects term, since without the main effects term we don't have >> >> >> > main >> >> >> > effect >> >> >> > columns in the model matrix used to compute the interaction >> >> >> > columns >> >> >> > (At >> >> >> > best this is undocumented behavior--I still think it's a bug, as >> >> >> > we >> >> >> > know >> >> >> > how we would encode the categorical factors if they were in fact >> >> >> > present. >> >> >> > It's either specified in contrasts.arg or using the default set in >> >> >> > options). However, when all the main effects are present, why >> >> >> > would >> >> >> > the >> >> >> > three-factor interaction column not simply be the product of the >> >> >> > main >> >> >> > effect columns? In my example: we know X1, we know X2, and we know >> >> >> > X3. >> >> >> > Why >> >> >> > does the encoding of X1:X2:X3 depend on whether we specified a >> >> >> > two-factor >> >> >> > interaction, AND only changes for specific missing interactions? >> >> >> > >> >> >> > In addition, I can use a two-term example similar to yours to show >> >> >> > how >> >> >> > this >> >> >> > behavior results in a singular covariance matrix when, given the >> >> >> > desired >> >> >> > factor encoding, it should not be singular. >> >> >> > >> >> >> > We start with a full factorial design for a two-level continuous >> >> >> > factor >> >> >> > and >> >> >> > a three-level categorical factor, and remove a single row. This >> >> >> > design >> >> >> > matrix does not leave enough degrees of freedom to determine >> >> >> > goodness-of-fit, but should allow us to obtain parameter >> >> >> > estimates. >> >> >> > >> >> >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) >> >> >> >> design = design[-1,] >> >> >> >> design >> >> >> > X1 X2 >> >> >> > 2 -1 A >> >> >> > 3 1 B >> >> >> > 4 -1 B >> >> >> > 5 1 C >> >> >> > 6 -1 C >> >> >> > >> >> >> > Here, we first calculate the model matrix for the full model, and >> >> >> > then >> >> >> > manually remove the X1 column from the model matrix. This gives us >> >> >> > the >> >> >> > model matrix one would expect if X1 were removed from the model. >> >> >> > We >> >> >> > then >> >> >> > successfully calculate the covariance matrix. >> >> >> > >> >> >> >> mm = model.matrix(~(X1+X2)^2,data=design) >> >> >> >> mm >> >> >> > (Intercept) X1 X2B X2C X1:X2B X1:X2C >> >> >> > 2 1 -1 0 0 0 0 >> >> >> > 3 1 1 1 0 1 0 >> >> >> > 4 1 -1 1 0 -1 0 >> >> >> > 5 1 1 0 1 0 1 >> >> >> > 6 1 -1 0 1 0 -1 >> >> >> > >> >> >> >> mm = mm[,-2] >> >> >> >> solve(t(mm) %*% mm) >> >> >> > (Intercept) X2B X2C X1:X2B X1:X2C >> >> >> > (Intercept) 1 -1.0 -1.0 0.0 0.0 >> >> >> > X2B -1 1.5 1.0 0.0 0.0 >> >> >> > X2C -1 1.0 1.5 0.0 0.0 >> >> >> > X1:X2B 0 0.0 0.0 0.5 0.0 >> >> >> > X1:X2C 0 0.0 0.0 0.0 0.5 >> >> >> > >> >> >> > Here, we see the actual behavior for model.matrix. The undesired >> >> >> > re-coding >> >> >> > of the model matrix interaction term makes the information matrix >> >> >> > singular. >> >> >> > >> >> >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design) >> >> >> >> mm >> >> >> > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C >> >> >> > 2 1 0 0 -1 0 0 >> >> >> > 3 1 1 0 0 1 0 >> >> >> > 4 1 1 0 0 -1 0 >> >> >> > 5 1 0 1 0 0 1 >> >> >> > 6 1 0 1 0 0 -1 >> >> >> > >> >> >> >> solve(t(mm) %*% mm) >> >> >> > Error in solve.default(t(mm) %*% mm) : system is computationally >> >> >> > singular: >> >> >> > reciprocal condition number = 5.55112e-18 >> >> >> > >> >> >> > I still believe this is a bug. >> >> >> > >> >> >> > Best regards, >> >> >> > Tyler Morgan-Wall >> >> >> > >> >> >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate >> >> >> > <[hidden email]> >> >> >> > wrote: >> >> >> > >> >> >> >> I think it is not a bug. It is a general property of >> >> >> >> interactions. >> >> >> >> This property is best observed if all variables are factors >> >> >> >> (qualitative). >> >> >> >> >> >> >> >> For example, you have three variables (factors). You ask for as >> >> >> >> many >> >> >> >> interactions as possible, except an interaction term between two >> >> >> >> particular variables. When this interaction is not a constant, it >> >> >> >> is >> >> >> >> different for different values of the remaining variable. More >> >> >> >> precisely: for all values of that variable. In other words: you >> >> >> >> have >> >> >> >> a >> >> >> >> three-way interaction, with all values of that variable. >> >> >> >> >> >> >> >> An even smaller example is the following script with only two >> >> >> >> variables, each being a factor: >> >> >> >> >> >> >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) >> >> >> >> print(model.matrix(~(X1+X2)^2 ,data=df)) >> >> >> >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) >> >> >> >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) >> >> >> >> >> >> >> >> The result is: >> >> >> >> >> >> >> >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C >> >> >> >> 1 1 0 0 0 0 0 >> >> >> >> 2 1 1 0 0 0 0 >> >> >> >> 3 1 0 1 0 0 0 >> >> >> >> 4 1 1 1 0 1 0 >> >> >> >> 5 1 0 0 1 0 0 >> >> >> >> 6 1 1 0 1 0 1 >> >> >> >> >> >> >> >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C >> >> >> >> 1 1 0 0 0 0 0 >> >> >> >> 2 1 0 0 1 0 0 >> >> >> >> 3 1 1 0 0 0 0 >> >> >> >> 4 1 1 0 0 1 0 >> >> >> >> 5 1 0 1 0 0 0 >> >> >> >> 6 1 0 1 0 0 1 >> >> >> >> >> >> >> >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C >> >> >> >> 1 1 0 0 0 0 0 >> >> >> >> 2 1 1 0 0 0 0 >> >> >> >> 3 1 0 1 0 0 0 >> >> >> >> 4 1 1 0 1 0 0 >> >> >> >> 5 1 0 0 0 1 0 >> >> >> >> 6 1 1 0 0 0 1 >> >> >> >> >> >> >> >> Thus, in the second result, we have no main effect of X1. >> >> >> >> Instead, >> >> >> >> the >> >> >> >> effect of X1 depends on the value of X2; either A or B or C. In >> >> >> >> fact, >> >> >> >> this is a two-way interaction, including all three values of X2. >> >> >> >> In >> >> >> >> the third result, we have no main effect of X2, The effect of X2 >> >> >> >> depends on the value of X1; either p or q. >> >> >> >> >> >> >> >> A complicating element with your example seems to be that your X1 >> >> >> >> and >> >> >> >> X2 are not factors. >> >> >> >> >> >> >> >> Arie >> >> >> >> >> >> >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <[hidden email]> wrote: >> >> >> >> > Hi, >> >> >> >> > >> >> >> >> > I recently ran into an inconsistency in the way >> >> >> >> > model.matrix.default >> >> >> >> > handles factor encoding for higher level interactions with >> >> >> >> > categorical >> >> >> >> > variables when the full hierarchy of effects is not present. >> >> >> >> > Depending on >> >> >> >> > which lower level interactions are specified, the factor >> >> >> >> > encoding >> >> >> >> > changes >> >> >> >> > for a higher level interaction. Consider the following minimal >> >> >> >> reproducible >> >> >> >> > example: >> >> >> >> > >> >> >> >> > -------------- >> >> >> >> > >> >> >> >> >> runmatrix = >> >> >> >> >> expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> >> >> >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 >> >> >> >> X3B >> >> >> >> X3C >> >> >> >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> >> >> >> > 1 1 1 1 0 0 1 0 0 0 0 >> >> >> >> > 0 0 >> >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 0 >> >> >> >> > 0 0 >> >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 0 >> >> >> >> > 0 0 >> >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 0 >> >> >> >> > 0 0 >> >> >> >> > 5 1 1 1 1 0 1 1 0 1 0 >> >> >> >> > 1 0 >> >> >> >> > 6 1 -1 1 1 0 -1 -1 0 1 0 >> >> >> >> > -1 0 >> >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 0 >> >> >> >> > -1 0 >> >> >> >> > 8 1 -1 -1 1 0 1 -1 0 -1 0 >> >> >> >> > 1 0 >> >> >> >> > 9 1 1 1 0 1 1 0 1 0 1 >> >> >> >> > 0 1 >> >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 1 >> >> >> >> > 0 -1 >> >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 -1 >> >> >> >> > 0 -1 >> >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 -1 >> >> >> >> > 0 1 >> >> >> >> > attr(,"assign") >> >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 >> >> >> >> > attr(,"contrasts") >> >> >> >> > attr(,"contrasts")$X3 >> >> >> >> > [1] "contr.treatment" >> >> >> >> > >> >> >> >> > -------------- >> >> >> >> > >> >> >> >> > Specifying the full hierarchy gives us what we expect: the >> >> >> >> > interaction >> >> >> >> > columns are simply calculated from the product of the main >> >> >> >> > effect >> >> >> >> columns. >> >> >> >> > The interaction term X1:X2:X3 gives us two columns in the model >> >> >> >> > matrix, >> >> >> >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main >> >> >> >> > effects. >> >> >> >> > >> >> >> >> > If we remove either the X2:X3 interaction or the X1:X3 >> >> >> >> > interaction, >> >> >> >> > we >> >> >> >> get >> >> >> >> > what we would expect for the X1:X2:X3 interaction, but when we >> >> >> >> > remove >> >> >> >> > the >> >> >> >> > X1:X2 interaction the encoding for X1:X2:X3 changes completely: >> >> >> >> > >> >> >> >> > -------------- >> >> >> >> > >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) (Intercept) >> >> >> >> >> X1 >> >> >> >> >> X2 >> >> >> >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> >> >> >> > 1 1 1 1 0 0 1 0 0 0 >> >> >> >> > 0 >> >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 >> >> >> >> > 0 >> >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 >> >> >> >> > 0 >> >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 >> >> >> >> > 0 >> >> >> >> > 5 1 1 1 1 0 1 1 0 1 >> >> >> >> > 0 >> >> >> >> > 6 1 -1 1 1 0 -1 1 0 -1 >> >> >> >> > 0 >> >> >> >> > 7 1 1 -1 1 0 -1 -1 0 -1 >> >> >> >> > 0 >> >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 >> >> >> >> > 0 >> >> >> >> > 9 1 1 1 0 1 1 0 1 0 >> >> >> >> > 1 >> >> >> >> > 10 1 -1 1 0 1 -1 0 1 0 >> >> >> >> > -1 >> >> >> >> > 11 1 1 -1 0 1 -1 0 -1 0 >> >> >> >> > -1 >> >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 >> >> >> >> > 1 >> >> >> >> > attr(,"assign") >> >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 >> >> >> >> > attr(,"contrasts") >> >> >> >> > attr(,"contrasts")$X3 >> >> >> >> > [1] "contr.treatment" >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) (Intercept) >> >> >> >> >> X1 >> >> >> >> >> X2 >> >> >> >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C >> >> >> >> > 1 1 1 1 0 0 1 0 0 0 >> >> >> >> > 0 >> >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 >> >> >> >> > 0 >> >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 >> >> >> >> > 0 >> >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 >> >> >> >> > 0 >> >> >> >> > 5 1 1 1 1 0 1 1 0 1 >> >> >> >> > 0 >> >> >> >> > 6 1 -1 1 1 0 -1 -1 0 -1 >> >> >> >> > 0 >> >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 >> >> >> >> > 0 >> >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 >> >> >> >> > 0 >> >> >> >> > 9 1 1 1 0 1 1 0 1 0 >> >> >> >> > 1 >> >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 >> >> >> >> > -1 >> >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 >> >> >> >> > -1 >> >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 >> >> >> >> > 1 >> >> >> >> > attr(,"assign") >> >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 >> >> >> >> > attr(,"contrasts") >> >> >> >> > attr(,"contrasts")$X3 >> >> >> >> > [1] "contr.treatment" >> >> >> >> > >> >> >> >> > >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) (Intercept) >> >> >> >> >> X1 >> >> >> >> >> X2 >> >> >> >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C >> >> >> >> > 1 1 1 1 0 0 0 0 0 0 >> >> >> >> > 1 >> >> >> >> > 0 0 >> >> >> >> > 2 1 -1 1 0 0 0 0 0 0 >> >> >> >> > -1 >> >> >> >> > 0 0 >> >> >> >> > 3 1 1 -1 0 0 0 0 0 0 >> >> >> >> > -1 >> >> >> >> > 0 0 >> >> >> >> > 4 1 -1 -1 0 0 0 0 0 0 >> >> >> >> > 1 >> >> >> >> > 0 0 >> >> >> >> > 5 1 1 1 1 0 1 0 1 0 >> >> >> >> > 0 >> >> >> >> > 1 0 >> >> >> >> > 6 1 -1 1 1 0 -1 0 1 0 >> >> >> >> > 0 >> >> >> >> > -1 0 >> >> >> >> > 7 1 1 -1 1 0 1 0 -1 0 >> >> >> >> > 0 >> >> >> >> > -1 0 >> >> >> >> > 8 1 -1 -1 1 0 -1 0 -1 0 >> >> >> >> > 0 >> >> >> >> > 1 0 >> >> >> >> > 9 1 1 1 0 1 0 1 0 1 >> >> >> >> > 0 >> >> >> >> > 0 1 >> >> >> >> > 10 1 -1 1 0 1 0 -1 0 1 >> >> >> >> > 0 >> >> >> >> > 0 -1 >> >> >> >> > 11 1 1 -1 0 1 0 1 0 -1 >> >> >> >> > 0 >> >> >> >> > 0 -1 >> >> >> >> > 12 1 -1 -1 0 1 0 -1 0 -1 >> >> >> >> > 0 >> >> >> >> > 0 1 >> >> >> >> > attr(,"assign") >> >> >> >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 >> >> >> >> > attr(,"contrasts") >> >> >> >> > attr(,"contrasts")$X3 >> >> >> >> > [1] "contr.treatment" >> >> >> >> > >> >> >> >> > -------------- >> >> >> >> > >> >> >> >> > Here, we now see the encoding for the interaction X1:X2:X3 is >> >> >> >> > now >> >> >> >> > the >> >> >> >> > interaction of X1 and X2 with a new encoding for X3 where each >> >> >> >> > factor >> >> >> >> level >> >> >> >> > is represented by its own column. I would expect, given the two >> >> >> >> > column >> >> >> >> > dummy variable encoding for X3, that the X1:X2:X3 column would >> >> >> >> > also >> >> >> >> > be >> >> >> >> two >> >> >> >> > columns regardless of what two-factor interactions we also >> >> >> >> > specified, >> >> >> >> > but >> >> >> >> > in this case it switches to three. If other two factor >> >> >> >> > interactions >> >> >> >> > are >> >> >> >> > missing in addition to X1:X2, this issue still occurs. This >> >> >> >> > also >> >> >> >> > happens >> >> >> >> > regardless of the contrast specified in contrasts.arg for X3. I >> >> >> >> > don't >> >> >> >> > see >> >> >> >> > any reasoning for this behavior given in the documentation, so >> >> >> >> > I >> >> >> >> > suspect >> >> >> >> it >> >> >> >> > is a bug. >> >> >> >> > >> >> >> >> > Best regards, >> >> >> >> > Tyler Morgan-Wall >> >> >> >> > >> >> >> >> > [[alternative HTML version deleted]] >> >> >> >> > >> >> >> >> > ______________________________________________ > > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hi Arie,
Given the heuristic, in all of my examples with a missing two-factor interaction the three-factor interaction should be coded with dummy variables. In reality, it is encoded by dummy variables only when the numeric:numeric interaction is missing, and by contrasts for the other two. The heuristic does not specify separate behavior for numeric vs categorical factors (When the author of Statistical Models in S refers to F_j as a "factor", it is a more general usage than the R type "factor" and includes numeric variables--the language used later on in the chapter on page 40 confirms this): when there is a missing marginal term in the formula, the higher-order interaction should be coded by dummy variables, regardless of type. Thus, the terms() function is only following the cited behavior 1/3rd of the time. Best regards, Tyler On Mon, Nov 6, 2017 at 6:45 AM, Arie ten Cate <[hidden email]> wrote: > Hello Tyler, > > You write that you understand what I am saying. However, I am now at > loss about what exactly is the problem with the behavior of R. Here > is a script which reproduces your experiments with three variables > (excluding the full model): > > m=expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C")) > model.matrix(~(X1+X2+X3)^3-X1:X3,data=m) > model.matrix(~(X1+X2+X3)^3-X2:X3,data=m) > model.matrix(~(X1+X2+X3)^3-X1:X2,data=m) > > Below are the three results, similar to your first mail. (The first > two are basically the same, of course.) Please pick one result which > you think is not consistent with the heuristic and please give what > you think is the correct result: > > model.matrix(~(X1+X2+X3)^3-X1:X3) > (Intercept) > X1 X2 X3B X3C > X1:X2 X2:X3B X2:X3C > X1:X2:X3B X1:X2:X3C > > model.matrix(~(X1+X2+X3)^3-X2:X3) > (Intercept) > X1 X2 X3B X3C > X1:X2 X1:X3B X1:X3C > X1:X2:X3B X1:X2:X3C > > model.matrix(~(X1+X2+X3)^3-X1:X2) > (Intercept) > X1 X2 X3B X3C > X1:X3B X1:X3C X2:X3B X2:X3C > X1:X2:X3A X1:X2:X3B X1:X2:X3C > > (I take it that the combination of X3A and X3B and X3C implies dummy > encoding, and the combination of only X3B and X3C implies contrasts > encoding, with respect to X3A.) > > Thanks in advance, > > Arie > > > On Sat, Nov 4, 2017 at 5:33 PM, Tyler <[hidden email]> wrote: > > Hi Arie, > > > > I understand what you're saying. The following excerpt out of the book > shows > > that F_j does not refer exclusively to categorical factors: "...the rule > > does not do anything special for them, and it remains valid, in a trivial > > sense, whenever any of the F_j is numeric rather than categorical." Since > > F_j refers to both categorical and numeric variables, the behavior of > > model.matrix is not consistent with the heuristic. > > > > Best regards, > > Tyler > > > > On Sat, Nov 4, 2017 at 6:50 AM, Arie ten Cate <[hidden email]> > wrote: > >> > >> Hello Tyler, > >> > >> I rephrase my previous mail, as follows: > >> > >> In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical > >> variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2, > >> which in the example is dropped from the model. Hence the X3 in T_i > >> must be encoded by dummy variables, as indeed it is. > >> > >> Arie > >> > >> > >> On Thu, Nov 2, 2017 at 4:11 PM, Tyler <[hidden email]> wrote: > >> > Hi Arie, > >> > > >> > The book out of which this behavior is based does not use factor (in > >> > this > >> > section) to refer to categorical factor. I will again point to this > >> > sentence, from page 40, in the same section and referring to the > >> > behavior > >> > under question, that shows F_j is not limited to categorical factors: > >> > "Numeric variables appear in the computations as themselves, uncoded. > >> > Therefore, the rule does not do anything special for them, and it > >> > remains > >> > valid, in a trivial sense, whenever any of the F_j is numeric rather > >> > than > >> > categorical." > >> > > >> > Note the "... whenever any of the F_j is numeric rather than > >> > categorical." > >> > Factor here is used in the more general sense of the word, not > referring > >> > to > >> > the R type "factor." The behavior of R does not match the heuristic > that > >> > it's citing. > >> > > >> > Best regards, > >> > Tyler > >> > > >> > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate <[hidden email]> > >> > wrote: > >> >> > >> >> Hello Tyler, > >> >> > >> >> Thank you for searching for, and finding, the basic description of > the > >> >> behavior of R in this matter. > >> >> > >> >> I think your example is in agreement with the book. > >> >> > >> >> But let me first note the following. You write: "F_j refers to a > >> >> factor (variable) in a model and not a categorical factor". However: > >> >> "a factor is a vector object used to specify a discrete > >> >> classification" (start of chapter 4 of "An Introduction to R".) You > >> >> might also see the description of the R function factor(). > >> >> > >> >> You note that the book says about a factor F_j: > >> >> "... F_j is coded by contrasts if T_{i(j)} has appeared in the > >> >> formula and by dummy variables if it has not" > >> >> > >> >> You find: > >> >> "However, the example I gave demonstrated that this dummy variable > >> >> encoding only occurs for the model where the missing term is the > >> >> numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2." > >> >> > >> >> We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then > >> >> T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in > T_i > >> >> must be encoded by dummy variables, as indeed it is. > >> >> > >> >> Arie > >> >> > >> >> On Tue, Oct 31, 2017 at 4:01 PM, Tyler <[hidden email]> wrote: > >> >> > Hi Arie, > >> >> > > >> >> > Thank you for your further research into the issue. > >> >> > > >> >> > Regarding Stata: On the other hand, JMP gives model matrices that > use > >> >> > the > >> >> > main effects contrasts in computing the higher order interactions, > >> >> > without > >> >> > the dummy variable encoding. I verified this both by analyzing the > >> >> > linear > >> >> > model given in my first example and noting that JMP has one more > >> >> > degree > >> >> > of > >> >> > freedom than R for the same model, as well as looking at the > >> >> > generated > >> >> > model > >> >> > matrices. It's easy to find a design where JMP will allow us fit > our > >> >> > model > >> >> > with goodness-of-fit estimates and R will not due to the extra > >> >> > degree(s) > >> >> > of > >> >> > freedom required. Let's keep the conversation limited to R. > >> >> > > >> >> > I want to refocus back onto my original bug report, which was not > for > >> >> > a > >> >> > missing main effects term, but rather for a missing lower-order > >> >> > interaction > >> >> > term. The behavior of model.matrix.default() for a missing main > >> >> > effects > >> >> > term > >> >> > is a nice example to demonstrate how model.matrix encodes with > dummy > >> >> > variables instead of contrasts, but doesn't demonstrate the > >> >> > inconsistent > >> >> > behavior my bug report highlighted. > >> >> > > >> >> > I went looking for documentation on this behavior, and the issue > >> >> > stems > >> >> > not > >> >> > from model.matrix.default(), but rather the terms() function in > >> >> > interpreting > >> >> > the formula. This "clever" replacement of contrasts by dummy > >> >> > variables > >> >> > to > >> >> > maintain marginality (presuming that's the reason) is not described > >> >> > anywhere > >> >> > in the documentation for either the model.matrix() or the terms() > >> >> > function. > >> >> > In order to find a description for the behavior, I had to look in > the > >> >> > underlying C code, buried above the "TermCode" function of the > >> >> > "model.c" > >> >> > file, which says: > >> >> > > >> >> > "TermCode decides on the encoding of a model term. Returns 1 if > >> >> > variable > >> >> > ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if > >> >> > it > >> >> > is to > >> >> > be encoded by dummy variables. This is decided using the heuristic > >> >> > described in Statistical Models in S, page 38." > >> >> > > >> >> > I do not have a copy of this book, and I suspect most R users do > not > >> >> > as > >> >> > well. Thankfully, however, some of the pages describing this > behavior > >> >> > were > >> >> > available as part of Amazon's "Look Inside" feature--but if not for > >> >> > that, I > >> >> > would have no idea what heuristic R was using. Since those pages > >> >> > could > >> >> > made > >> >> > unavailable by Amazon at any time, at the very least we have an > >> >> > problem > >> >> > with > >> >> > a lack of documentation. > >> >> > > >> >> > However, I still believe there is a bug when comparing R's > >> >> > implementation to > >> >> > the heuristic described in the book. From Statistical Models in S, > >> >> > page > >> >> > 38-39: > >> >> > > >> >> > "Suppose F_j is any factor included in term T_i. Let T_{i(j)} > denote > >> >> > the > >> >> > margin of T_i for factor F_j--that is, the term obtained by > dropping > >> >> > F_j > >> >> > from T_i. We say that T_{i(j)} has appeared in the formula if there > >> >> > is > >> >> > some > >> >> > term T_i' for i' < i such that T_i' contains all the factors > >> >> > appearing > >> >> > in > >> >> > T_{i(j)}. The usual case is that T_{i(j)} itself is one of the > >> >> > preceding > >> >> > terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in > the > >> >> > formula and by dummy variables if it has not" > >> >> > > >> >> > Here, F_j refers to a factor (variable) in a model and not a > >> >> > categorical > >> >> > factor, as specified later in that section (page 40): "Numeric > >> >> > variables > >> >> > appear in the computations as themselves, uncoded. Therefore, the > >> >> > rule > >> >> > does > >> >> > not do anything special for them, and it remains valid, in a > trivial > >> >> > sense, > >> >> > whenever any of the F_j is numeric rather than categorical." > >> >> > > >> >> > Going back to my original example with three variables: X1 > (numeric), > >> >> > X2 > >> >> > (numeric), X3 (categorical). This heuristic prescribes encoding > >> >> > X1:X2:X3 > >> >> > with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the > >> >> > formula. > >> >> > When > >> >> > any of the preceding terms do not exist, this heuristic tells us to > >> >> > use > >> >> > dummy variables to encode the interaction (e.g. "F_j [the > interaction > >> >> > term] > >> >> > is coded ... by dummy variables if it [any of the marginal terms > >> >> > obtained by > >> >> > dropping a single factor in the interaction] has not [appeared in > the > >> >> > formula]"). However, the example I gave demonstrated that this > dummy > >> >> > variable encoding only occurs for the model where the missing term > is > >> >> > the > >> >> > numeric-numeric interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the > >> >> > interaction term X1:X2:X3 is encoded by contrasts, not dummy > >> >> > variables. > >> >> > This > >> >> > is inconsistent with the description of the intended behavior given > >> >> > in > >> >> > the > >> >> > book. > >> >> > > >> >> > Best regards, > >> >> > Tyler > >> >> > > >> >> > > >> >> > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate > >> >> > <[hidden email]> > >> >> > wrote: > >> >> >> > >> >> >> Hello Tyler, > >> >> >> > >> >> >> I want to bring to your attention the following document: "What > >> >> >> happens if you omit the main effect in a regression model with an > >> >> >> interaction?" > >> >> >> > >> >> >> > >> >> >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you- > omit-the-main-effect-in-a-regression-model-with-an-interaction). > >> >> >> This gives a useful review of the problem. Your example is Case > 2: a > >> >> >> continuous and a categorical regressor. > >> >> >> > >> >> >> The numerical examples are coded in Stata, and they give the same > >> >> >> result as in R. Hence, if this is a bug in R then it is also a bug > >> >> >> in > >> >> >> Stata. That seems very unlikely. > >> >> >> > >> >> >> Here is a simulation in R of the above mentioned Case 2 in Stata: > >> >> >> > >> >> >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) > >> >> >> print("Full model") > >> >> >> print(model.matrix(~(socst+grp)^2 ,data=df)) > >> >> >> print("Example 2.1: drop socst") > >> >> >> print(model.matrix(~(socst+grp)^2 -socst ,data=df)) > >> >> >> print("Example 2.2: drop grp") > >> >> >> print(model.matrix(~(socst+grp)^2 -grp ,data=df)) > >> >> >> > >> >> >> This gives indeed the following regressors: > >> >> >> > >> >> >> "Full model" > >> >> >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 > >> >> >> "Example 2.1: drop socst" > >> >> >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 > >> >> >> socst:grp4 > >> >> >> "Example 2.2: drop grp" > >> >> >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4 > >> >> >> > >> >> >> There is a little bit of R documentation about this, based on the > >> >> >> concept of marginality, which typically forbids a model having an > >> >> >> interaction but not the corresponding main effects. (You might see > >> >> >> the > >> >> >> references in https://en.wikipedia.org/wiki/ > Principle_of_marginality > >> >> >> ) > >> >> >> See "An Introduction to R", by Venables and Smith and the R > Core > >> >> >> Team. At the bottom of page 52 (PDF: 57) it says: "Although the > >> >> >> details are complicated, model formulae in R will normally > generate > >> >> >> the models that an expert statistician would expect, provided that > >> >> >> marginality is preserved. Fitting, for [a contrary] example, a > model > >> >> >> with an interaction but not the corresponding main effects will in > >> >> >> general lead to surprising results ....". > >> >> >> The Reference Manual states that the R functions dropterm() > and > >> >> >> addterm() resp. drop or add only terms such that marginality is > >> >> >> preserved. > >> >> >> > >> >> >> Finally, about your singular matrix t(mm)%*%mm. This is in fact > >> >> >> Example 2.1 in Case 2 discussed above. As discussed there, in > Stata > >> >> >> and in R the drop of the continuous variable has no effect on the > >> >> >> degrees of freedom here: it is just a reparameterisation of the > full > >> >> >> model, protecting you against losing marginality... Hence the > >> >> >> model.matrix 'mm' is still square and nonsingular after the drop > of > >> >> >> X1, unless of course when a row is removed from the matrix > 'design' > >> >> >> when before creating 'mm'. > >> >> >> > >> >> >> Arie > >> >> >> > >> >> >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <[hidden email]> wrote: > >> >> >> > You could possibly try to explain away the behavior for a > missing > >> >> >> > main > >> >> >> > effects term, since without the main effects term we don't have > >> >> >> > main > >> >> >> > effect > >> >> >> > columns in the model matrix used to compute the interaction > >> >> >> > columns > >> >> >> > (At > >> >> >> > best this is undocumented behavior--I still think it's a bug, as > >> >> >> > we > >> >> >> > know > >> >> >> > how we would encode the categorical factors if they were in fact > >> >> >> > present. > >> >> >> > It's either specified in contrasts.arg or using the default set > in > >> >> >> > options). However, when all the main effects are present, why > >> >> >> > would > >> >> >> > the > >> >> >> > three-factor interaction column not simply be the product of the > >> >> >> > main > >> >> >> > effect columns? In my example: we know X1, we know X2, and we > know > >> >> >> > X3. > >> >> >> > Why > >> >> >> > does the encoding of X1:X2:X3 depend on whether we specified a > >> >> >> > two-factor > >> >> >> > interaction, AND only changes for specific missing interactions? > >> >> >> > > >> >> >> > In addition, I can use a two-term example similar to yours to > show > >> >> >> > how > >> >> >> > this > >> >> >> > behavior results in a singular covariance matrix when, given the > >> >> >> > desired > >> >> >> > factor encoding, it should not be singular. > >> >> >> > > >> >> >> > We start with a full factorial design for a two-level continuous > >> >> >> > factor > >> >> >> > and > >> >> >> > a three-level categorical factor, and remove a single row. This > >> >> >> > design > >> >> >> > matrix does not leave enough degrees of freedom to determine > >> >> >> > goodness-of-fit, but should allow us to obtain parameter > >> >> >> > estimates. > >> >> >> > > >> >> >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) > >> >> >> >> design = design[-1,] > >> >> >> >> design > >> >> >> > X1 X2 > >> >> >> > 2 -1 A > >> >> >> > 3 1 B > >> >> >> > 4 -1 B > >> >> >> > 5 1 C > >> >> >> > 6 -1 C > >> >> >> > > >> >> >> > Here, we first calculate the model matrix for the full model, > and > >> >> >> > then > >> >> >> > manually remove the X1 column from the model matrix. This gives > us > >> >> >> > the > >> >> >> > model matrix one would expect if X1 were removed from the model. > >> >> >> > We > >> >> >> > then > >> >> >> > successfully calculate the covariance matrix. > >> >> >> > > >> >> >> >> mm = model.matrix(~(X1+X2)^2,data=design) > >> >> >> >> mm > >> >> >> > (Intercept) X1 X2B X2C X1:X2B X1:X2C > >> >> >> > 2 1 -1 0 0 0 0 > >> >> >> > 3 1 1 1 0 1 0 > >> >> >> > 4 1 -1 1 0 -1 0 > >> >> >> > 5 1 1 0 1 0 1 > >> >> >> > 6 1 -1 0 1 0 -1 > >> >> >> > > >> >> >> >> mm = mm[,-2] > >> >> >> >> solve(t(mm) %*% mm) > >> >> >> > (Intercept) X2B X2C X1:X2B X1:X2C > >> >> >> > (Intercept) 1 -1.0 -1.0 0.0 0.0 > >> >> >> > X2B -1 1.5 1.0 0.0 0.0 > >> >> >> > X2C -1 1.0 1.5 0.0 0.0 > >> >> >> > X1:X2B 0 0.0 0.0 0.5 0.0 > >> >> >> > X1:X2C 0 0.0 0.0 0.0 0.5 > >> >> >> > > >> >> >> > Here, we see the actual behavior for model.matrix. The undesired > >> >> >> > re-coding > >> >> >> > of the model matrix interaction term makes the information > matrix > >> >> >> > singular. > >> >> >> > > >> >> >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design) > >> >> >> >> mm > >> >> >> > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C > >> >> >> > 2 1 0 0 -1 0 0 > >> >> >> > 3 1 1 0 0 1 0 > >> >> >> > 4 1 1 0 0 -1 0 > >> >> >> > 5 1 0 1 0 0 1 > >> >> >> > 6 1 0 1 0 0 -1 > >> >> >> > > >> >> >> >> solve(t(mm) %*% mm) > >> >> >> > Error in solve.default(t(mm) %*% mm) : system is computationally > >> >> >> > singular: > >> >> >> > reciprocal condition number = 5.55112e-18 > >> >> >> > > >> >> >> > I still believe this is a bug. > >> >> >> > > >> >> >> > Best regards, > >> >> >> > Tyler Morgan-Wall > >> >> >> > > >> >> >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate > >> >> >> > <[hidden email]> > >> >> >> > wrote: > >> >> >> > > >> >> >> >> I think it is not a bug. It is a general property of > >> >> >> >> interactions. > >> >> >> >> This property is best observed if all variables are factors > >> >> >> >> (qualitative). > >> >> >> >> > >> >> >> >> For example, you have three variables (factors). You ask for as > >> >> >> >> many > >> >> >> >> interactions as possible, except an interaction term between > two > >> >> >> >> particular variables. When this interaction is not a constant, > it > >> >> >> >> is > >> >> >> >> different for different values of the remaining variable. More > >> >> >> >> precisely: for all values of that variable. In other words: you > >> >> >> >> have > >> >> >> >> a > >> >> >> >> three-way interaction, with all values of that variable. > >> >> >> >> > >> >> >> >> An even smaller example is the following script with only two > >> >> >> >> variables, each being a factor: > >> >> >> >> > >> >> >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) > >> >> >> >> print(model.matrix(~(X1+X2)^2 ,data=df)) > >> >> >> >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) > >> >> >> >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) > >> >> >> >> > >> >> >> >> The result is: > >> >> >> >> > >> >> >> >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C > >> >> >> >> 1 1 0 0 0 0 0 > >> >> >> >> 2 1 1 0 0 0 0 > >> >> >> >> 3 1 0 1 0 0 0 > >> >> >> >> 4 1 1 1 0 1 0 > >> >> >> >> 5 1 0 0 1 0 0 > >> >> >> >> 6 1 1 0 1 0 1 > >> >> >> >> > >> >> >> >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C > >> >> >> >> 1 1 0 0 0 0 0 > >> >> >> >> 2 1 0 0 1 0 0 > >> >> >> >> 3 1 1 0 0 0 0 > >> >> >> >> 4 1 1 0 0 1 0 > >> >> >> >> 5 1 0 1 0 0 0 > >> >> >> >> 6 1 0 1 0 0 1 > >> >> >> >> > >> >> >> >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C > >> >> >> >> 1 1 0 0 0 0 0 > >> >> >> >> 2 1 1 0 0 0 0 > >> >> >> >> 3 1 0 1 0 0 0 > >> >> >> >> 4 1 1 0 1 0 0 > >> >> >> >> 5 1 0 0 0 1 0 > >> >> >> >> 6 1 1 0 0 0 1 > >> >> >> >> > >> >> >> >> Thus, in the second result, we have no main effect of X1. > >> >> >> >> Instead, > >> >> >> >> the > >> >> >> >> effect of X1 depends on the value of X2; either A or B or C. In > >> >> >> >> fact, > >> >> >> >> this is a two-way interaction, including all three values of > X2. > >> >> >> >> In > >> >> >> >> the third result, we have no main effect of X2, The effect of > X2 > >> >> >> >> depends on the value of X1; either p or q. > >> >> >> >> > >> >> >> >> A complicating element with your example seems to be that your > X1 > >> >> >> >> and > >> >> >> >> X2 are not factors. > >> >> >> >> > >> >> >> >> Arie > >> >> >> >> > >> >> >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <[hidden email]> > wrote: > >> >> >> >> > Hi, > >> >> >> >> > > >> >> >> >> > I recently ran into an inconsistency in the way > >> >> >> >> > model.matrix.default > >> >> >> >> > handles factor encoding for higher level interactions with > >> >> >> >> > categorical > >> >> >> >> > variables when the full hierarchy of effects is not present. > >> >> >> >> > Depending on > >> >> >> >> > which lower level interactions are specified, the factor > >> >> >> >> > encoding > >> >> >> >> > changes > >> >> >> >> > for a higher level interaction. Consider the following > minimal > >> >> >> >> reproducible > >> >> >> >> > example: > >> >> >> >> > > >> >> >> >> > -------------- > >> >> >> >> > > >> >> >> >> >> runmatrix = > >> >> >> >> >> expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> > >> >> >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 > >> >> >> >> X3B > >> >> >> >> X3C > >> >> >> >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> >> >> >> > 1 1 1 1 0 0 1 0 0 0 > 0 > >> >> >> >> > 0 0 > >> >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 > 0 > >> >> >> >> > 0 0 > >> >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 > 0 > >> >> >> >> > 0 0 > >> >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 > 0 > >> >> >> >> > 0 0 > >> >> >> >> > 5 1 1 1 1 0 1 1 0 1 > 0 > >> >> >> >> > 1 0 > >> >> >> >> > 6 1 -1 1 1 0 -1 -1 0 1 > 0 > >> >> >> >> > -1 0 > >> >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 > 0 > >> >> >> >> > -1 0 > >> >> >> >> > 8 1 -1 -1 1 0 1 -1 0 -1 > 0 > >> >> >> >> > 1 0 > >> >> >> >> > 9 1 1 1 0 1 1 0 1 0 > 1 > >> >> >> >> > 0 1 > >> >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 > 1 > >> >> >> >> > 0 -1 > >> >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 > -1 > >> >> >> >> > 0 -1 > >> >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 > -1 > >> >> >> >> > 0 1 > >> >> >> >> > attr(,"assign") > >> >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 > >> >> >> >> > attr(,"contrasts") > >> >> >> >> > attr(,"contrasts")$X3 > >> >> >> >> > [1] "contr.treatment" > >> >> >> >> > > >> >> >> >> > -------------- > >> >> >> >> > > >> >> >> >> > Specifying the full hierarchy gives us what we expect: the > >> >> >> >> > interaction > >> >> >> >> > columns are simply calculated from the product of the main > >> >> >> >> > effect > >> >> >> >> columns. > >> >> >> >> > The interaction term X1:X2:X3 gives us two columns in the > model > >> >> >> >> > matrix, > >> >> >> >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main > >> >> >> >> > effects. > >> >> >> >> > > >> >> >> >> > If we remove either the X2:X3 interaction or the X1:X3 > >> >> >> >> > interaction, > >> >> >> >> > we > >> >> >> >> get > >> >> >> >> > what we would expect for the X1:X2:X3 interaction, but when > we > >> >> >> >> > remove > >> >> >> >> > the > >> >> >> >> > X1:X2 interaction the encoding for X1:X2:X3 changes > completely: > >> >> >> >> > > >> >> >> >> > -------------- > >> >> >> >> > > >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) > (Intercept) > >> >> >> >> >> X1 > >> >> >> >> >> X2 > >> >> >> >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> >> >> >> > 1 1 1 1 0 0 1 0 0 0 > >> >> >> >> > 0 > >> >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 > >> >> >> >> > 0 > >> >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 > >> >> >> >> > 0 > >> >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 > >> >> >> >> > 0 > >> >> >> >> > 5 1 1 1 1 0 1 1 0 1 > >> >> >> >> > 0 > >> >> >> >> > 6 1 -1 1 1 0 -1 1 0 -1 > >> >> >> >> > 0 > >> >> >> >> > 7 1 1 -1 1 0 -1 -1 0 -1 > >> >> >> >> > 0 > >> >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 > >> >> >> >> > 0 > >> >> >> >> > 9 1 1 1 0 1 1 0 1 0 > >> >> >> >> > 1 > >> >> >> >> > 10 1 -1 1 0 1 -1 0 1 0 > >> >> >> >> > -1 > >> >> >> >> > 11 1 1 -1 0 1 -1 0 -1 0 > >> >> >> >> > -1 > >> >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 > >> >> >> >> > 1 > >> >> >> >> > attr(,"assign") > >> >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> >> >> >> > attr(,"contrasts") > >> >> >> >> > attr(,"contrasts")$X3 > >> >> >> >> > [1] "contr.treatment" > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) > (Intercept) > >> >> >> >> >> X1 > >> >> >> >> >> X2 > >> >> >> >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C > >> >> >> >> > 1 1 1 1 0 0 1 0 0 0 > >> >> >> >> > 0 > >> >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 > >> >> >> >> > 0 > >> >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 > >> >> >> >> > 0 > >> >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 > >> >> >> >> > 0 > >> >> >> >> > 5 1 1 1 1 0 1 1 0 1 > >> >> >> >> > 0 > >> >> >> >> > 6 1 -1 1 1 0 -1 -1 0 -1 > >> >> >> >> > 0 > >> >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 > >> >> >> >> > 0 > >> >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 > >> >> >> >> > 0 > >> >> >> >> > 9 1 1 1 0 1 1 0 1 0 > >> >> >> >> > 1 > >> >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 > >> >> >> >> > -1 > >> >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 > >> >> >> >> > -1 > >> >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 > >> >> >> >> > 1 > >> >> >> >> > attr(,"assign") > >> >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> >> >> >> > attr(,"contrasts") > >> >> >> >> > attr(,"contrasts")$X3 > >> >> >> >> > [1] "contr.treatment" > >> >> >> >> > > >> >> >> >> > > >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) > (Intercept) > >> >> >> >> >> X1 > >> >> >> >> >> X2 > >> >> >> >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B > X1:X2:X3C > >> >> >> >> > 1 1 1 1 0 0 0 0 0 0 > >> >> >> >> > 1 > >> >> >> >> > 0 0 > >> >> >> >> > 2 1 -1 1 0 0 0 0 0 0 > >> >> >> >> > -1 > >> >> >> >> > 0 0 > >> >> >> >> > 3 1 1 -1 0 0 0 0 0 0 > >> >> >> >> > -1 > >> >> >> >> > 0 0 > >> >> >> >> > 4 1 -1 -1 0 0 0 0 0 0 > >> >> >> >> > 1 > >> >> >> >> > 0 0 > >> >> >> >> > 5 1 1 1 1 0 1 0 1 0 > >> >> >> >> > 0 > >> >> >> >> > 1 0 > >> >> >> >> > 6 1 -1 1 1 0 -1 0 1 0 > >> >> >> >> > 0 > >> >> >> >> > -1 0 > >> >> >> >> > 7 1 1 -1 1 0 1 0 -1 0 > >> >> >> >> > 0 > >> >> >> >> > -1 0 > >> >> >> >> > 8 1 -1 -1 1 0 -1 0 -1 0 > >> >> >> >> > 0 > >> >> >> >> > 1 0 > >> >> >> >> > 9 1 1 1 0 1 0 1 0 1 > >> >> >> >> > 0 > >> >> >> >> > 0 1 > >> >> >> >> > 10 1 -1 1 0 1 0 -1 0 1 > >> >> >> >> > 0 > >> >> >> >> > 0 -1 > >> >> >> >> > 11 1 1 -1 0 1 0 1 0 -1 > >> >> >> >> > 0 > >> >> >> >> > 0 -1 > >> >> >> >> > 12 1 -1 -1 0 1 0 -1 0 -1 > >> >> >> >> > 0 > >> >> >> >> > 0 1 > >> >> >> >> > attr(,"assign") > >> >> >> >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 > >> >> >> >> > attr(,"contrasts") > >> >> >> >> > attr(,"contrasts")$X3 > >> >> >> >> > [1] "contr.treatment" > >> >> >> >> > > >> >> >> >> > -------------- > >> >> >> >> > > >> >> >> >> > Here, we now see the encoding for the interaction X1:X2:X3 is > >> >> >> >> > now > >> >> >> >> > the > >> >> >> >> > interaction of X1 and X2 with a new encoding for X3 where > each > >> >> >> >> > factor > >> >> >> >> level > >> >> >> >> > is represented by its own column. I would expect, given the > two > >> >> >> >> > column > >> >> >> >> > dummy variable encoding for X3, that the X1:X2:X3 column > would > >> >> >> >> > also > >> >> >> >> > be > >> >> >> >> two > >> >> >> >> > columns regardless of what two-factor interactions we also > >> >> >> >> > specified, > >> >> >> >> > but > >> >> >> >> > in this case it switches to three. If other two factor > >> >> >> >> > interactions > >> >> >> >> > are > >> >> >> >> > missing in addition to X1:X2, this issue still occurs. This > >> >> >> >> > also > >> >> >> >> > happens > >> >> >> >> > regardless of the contrast specified in contrasts.arg for > X3. I > >> >> >> >> > don't > >> >> >> >> > see > >> >> >> >> > any reasoning for this behavior given in the documentation, > so > >> >> >> >> > I > >> >> >> >> > suspect > >> >> >> >> it > >> >> >> >> > is a bug. > >> >> >> >> > > >> >> >> >> > Best regards, > >> >> >> >> > Tyler Morgan-Wall > >> >> >> >> > > >> >> >> >> > [[alternative HTML version deleted]] > >> >> >> >> > > >> >> >> >> > ______________________________________________ > > > > > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hello Tyler,
model.matrix(~(X1+X2+X3)^3-X1:X3) T_i = X1:X2:X3. Let F_j = X3. (The numerical variables X1 and X2 are not encoded at all. Then, again, T_{i(j)} = X1:X2, which in this example is NOT dropped from the model. Hence the X3 in T_i must be encoded by contrast, as indeed it is. Arie On Mon, Nov 6, 2017 at 5:09 PM, Tyler <[hidden email]> wrote: > Hi Arie, > > Given the heuristic, in all of my examples with a missing two-factor > interaction the three-factor interaction should be coded with dummy > variables. In reality, it is encoded by dummy variables only when the > numeric:numeric interaction is missing, and by contrasts for the other two. > The heuristic does not specify separate behavior for numeric vs categorical > factors (When the author of Statistical Models in S refers to F_j as a > "factor", it is a more general usage than the R type "factor" and includes > numeric variables--the language used later on in the chapter on page 40 > confirms this): when there is a missing marginal term in the formula, the > higher-order interaction should be coded by dummy variables, regardless of > type. Thus, the terms() function is only following the cited behavior 1/3rd > of the time. > > Best regards, > Tyler > > On Mon, Nov 6, 2017 at 6:45 AM, Arie ten Cate <[hidden email]> wrote: >> >> Hello Tyler, >> >> You write that you understand what I am saying. However, I am now at >> loss about what exactly is the problem with the behavior of R. Here >> is a script which reproduces your experiments with three variables >> (excluding the full model): >> >> m=expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C")) >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=m) >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=m) >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=m) >> >> Below are the three results, similar to your first mail. (The first >> two are basically the same, of course.) Please pick one result which >> you think is not consistent with the heuristic and please give what >> you think is the correct result: >> >> model.matrix(~(X1+X2+X3)^3-X1:X3) >> (Intercept) >> X1 X2 X3B X3C >> X1:X2 X2:X3B X2:X3C >> X1:X2:X3B X1:X2:X3C >> >> model.matrix(~(X1+X2+X3)^3-X2:X3) >> (Intercept) >> X1 X2 X3B X3C >> X1:X2 X1:X3B X1:X3C >> X1:X2:X3B X1:X2:X3C >> >> model.matrix(~(X1+X2+X3)^3-X1:X2) >> (Intercept) >> X1 X2 X3B X3C >> X1:X3B X1:X3C X2:X3B X2:X3C >> X1:X2:X3A X1:X2:X3B X1:X2:X3C >> >> (I take it that the combination of X3A and X3B and X3C implies dummy >> encoding, and the combination of only X3B and X3C implies contrasts >> encoding, with respect to X3A.) >> >> Thanks in advance, >> >> Arie >> >> >> On Sat, Nov 4, 2017 at 5:33 PM, Tyler <[hidden email]> wrote: >> > Hi Arie, >> > >> > I understand what you're saying. The following excerpt out of the book >> > shows >> > that F_j does not refer exclusively to categorical factors: "...the rule >> > does not do anything special for them, and it remains valid, in a >> > trivial >> > sense, whenever any of the F_j is numeric rather than categorical." >> > Since >> > F_j refers to both categorical and numeric variables, the behavior of >> > model.matrix is not consistent with the heuristic. >> > >> > Best regards, >> > Tyler >> > >> > On Sat, Nov 4, 2017 at 6:50 AM, Arie ten Cate <[hidden email]> >> > wrote: >> >> >> >> Hello Tyler, >> >> >> >> I rephrase my previous mail, as follows: >> >> >> >> In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical >> >> variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2, >> >> which in the example is dropped from the model. Hence the X3 in T_i >> >> must be encoded by dummy variables, as indeed it is. >> >> >> >> Arie >> >> >> >> >> >> On Thu, Nov 2, 2017 at 4:11 PM, Tyler <[hidden email]> wrote: >> >> > Hi Arie, >> >> > >> >> > The book out of which this behavior is based does not use factor (in >> >> > this >> >> > section) to refer to categorical factor. I will again point to this >> >> > sentence, from page 40, in the same section and referring to the >> >> > behavior >> >> > under question, that shows F_j is not limited to categorical factors: >> >> > "Numeric variables appear in the computations as themselves, uncoded. >> >> > Therefore, the rule does not do anything special for them, and it >> >> > remains >> >> > valid, in a trivial sense, whenever any of the F_j is numeric rather >> >> > than >> >> > categorical." >> >> > >> >> > Note the "... whenever any of the F_j is numeric rather than >> >> > categorical." >> >> > Factor here is used in the more general sense of the word, not >> >> > referring >> >> > to >> >> > the R type "factor." The behavior of R does not match the heuristic >> >> > that >> >> > it's citing. >> >> > >> >> > Best regards, >> >> > Tyler >> >> > >> >> > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate <[hidden email]> >> >> > wrote: >> >> >> >> >> >> Hello Tyler, >> >> >> >> >> >> Thank you for searching for, and finding, the basic description of >> >> >> the >> >> >> behavior of R in this matter. >> >> >> >> >> >> I think your example is in agreement with the book. >> >> >> >> >> >> But let me first note the following. You write: "F_j refers to a >> >> >> factor (variable) in a model and not a categorical factor". However: >> >> >> "a factor is a vector object used to specify a discrete >> >> >> classification" (start of chapter 4 of "An Introduction to R".) You >> >> >> might also see the description of the R function factor(). >> >> >> >> >> >> You note that the book says about a factor F_j: >> >> >> "... F_j is coded by contrasts if T_{i(j)} has appeared in the >> >> >> formula and by dummy variables if it has not" >> >> >> >> >> >> You find: >> >> >> "However, the example I gave demonstrated that this dummy >> >> >> variable >> >> >> encoding only occurs for the model where the missing term is the >> >> >> numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2." >> >> >> >> >> >> We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then >> >> >> T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in >> >> >> T_i >> >> >> must be encoded by dummy variables, as indeed it is. >> >> >> >> >> >> Arie >> >> >> >> >> >> On Tue, Oct 31, 2017 at 4:01 PM, Tyler <[hidden email]> wrote: >> >> >> > Hi Arie, >> >> >> > >> >> >> > Thank you for your further research into the issue. >> >> >> > >> >> >> > Regarding Stata: On the other hand, JMP gives model matrices that >> >> >> > use >> >> >> > the >> >> >> > main effects contrasts in computing the higher order interactions, >> >> >> > without >> >> >> > the dummy variable encoding. I verified this both by analyzing the >> >> >> > linear >> >> >> > model given in my first example and noting that JMP has one more >> >> >> > degree >> >> >> > of >> >> >> > freedom than R for the same model, as well as looking at the >> >> >> > generated >> >> >> > model >> >> >> > matrices. It's easy to find a design where JMP will allow us fit >> >> >> > our >> >> >> > model >> >> >> > with goodness-of-fit estimates and R will not due to the extra >> >> >> > degree(s) >> >> >> > of >> >> >> > freedom required. Let's keep the conversation limited to R. >> >> >> > >> >> >> > I want to refocus back onto my original bug report, which was not >> >> >> > for >> >> >> > a >> >> >> > missing main effects term, but rather for a missing lower-order >> >> >> > interaction >> >> >> > term. The behavior of model.matrix.default() for a missing main >> >> >> > effects >> >> >> > term >> >> >> > is a nice example to demonstrate how model.matrix encodes with >> >> >> > dummy >> >> >> > variables instead of contrasts, but doesn't demonstrate the >> >> >> > inconsistent >> >> >> > behavior my bug report highlighted. >> >> >> > >> >> >> > I went looking for documentation on this behavior, and the issue >> >> >> > stems >> >> >> > not >> >> >> > from model.matrix.default(), but rather the terms() function in >> >> >> > interpreting >> >> >> > the formula. This "clever" replacement of contrasts by dummy >> >> >> > variables >> >> >> > to >> >> >> > maintain marginality (presuming that's the reason) is not >> >> >> > described >> >> >> > anywhere >> >> >> > in the documentation for either the model.matrix() or the terms() >> >> >> > function. >> >> >> > In order to find a description for the behavior, I had to look in >> >> >> > the >> >> >> > underlying C code, buried above the "TermCode" function of the >> >> >> > "model.c" >> >> >> > file, which says: >> >> >> > >> >> >> > "TermCode decides on the encoding of a model term. Returns 1 if >> >> >> > variable >> >> >> > ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 >> >> >> > if >> >> >> > it >> >> >> > is to >> >> >> > be encoded by dummy variables. This is decided using the >> >> >> > heuristic >> >> >> > described in Statistical Models in S, page 38." >> >> >> > >> >> >> > I do not have a copy of this book, and I suspect most R users do >> >> >> > not >> >> >> > as >> >> >> > well. Thankfully, however, some of the pages describing this >> >> >> > behavior >> >> >> > were >> >> >> > available as part of Amazon's "Look Inside" feature--but if not >> >> >> > for >> >> >> > that, I >> >> >> > would have no idea what heuristic R was using. Since those pages >> >> >> > could >> >> >> > made >> >> >> > unavailable by Amazon at any time, at the very least we have an >> >> >> > problem >> >> >> > with >> >> >> > a lack of documentation. >> >> >> > >> >> >> > However, I still believe there is a bug when comparing R's >> >> >> > implementation to >> >> >> > the heuristic described in the book. From Statistical Models in S, >> >> >> > page >> >> >> > 38-39: >> >> >> > >> >> >> > "Suppose F_j is any factor included in term T_i. Let T_{i(j)} >> >> >> > denote >> >> >> > the >> >> >> > margin of T_i for factor F_j--that is, the term obtained by >> >> >> > dropping >> >> >> > F_j >> >> >> > from T_i. We say that T_{i(j)} has appeared in the formula if >> >> >> > there >> >> >> > is >> >> >> > some >> >> >> > term T_i' for i' < i such that T_i' contains all the factors >> >> >> > appearing >> >> >> > in >> >> >> > T_{i(j)}. The usual case is that T_{i(j)} itself is one of the >> >> >> > preceding >> >> >> > terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in >> >> >> > the >> >> >> > formula and by dummy variables if it has not" >> >> >> > >> >> >> > Here, F_j refers to a factor (variable) in a model and not a >> >> >> > categorical >> >> >> > factor, as specified later in that section (page 40): "Numeric >> >> >> > variables >> >> >> > appear in the computations as themselves, uncoded. Therefore, the >> >> >> > rule >> >> >> > does >> >> >> > not do anything special for them, and it remains valid, in a >> >> >> > trivial >> >> >> > sense, >> >> >> > whenever any of the F_j is numeric rather than categorical." >> >> >> > >> >> >> > Going back to my original example with three variables: X1 >> >> >> > (numeric), >> >> >> > X2 >> >> >> > (numeric), X3 (categorical). This heuristic prescribes encoding >> >> >> > X1:X2:X3 >> >> >> > with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the >> >> >> > formula. >> >> >> > When >> >> >> > any of the preceding terms do not exist, this heuristic tells us >> >> >> > to >> >> >> > use >> >> >> > dummy variables to encode the interaction (e.g. "F_j [the >> >> >> > interaction >> >> >> > term] >> >> >> > is coded ... by dummy variables if it [any of the marginal terms >> >> >> > obtained by >> >> >> > dropping a single factor in the interaction] has not [appeared in >> >> >> > the >> >> >> > formula]"). However, the example I gave demonstrated that this >> >> >> > dummy >> >> >> > variable encoding only occurs for the model where the missing term >> >> >> > is >> >> >> > the >> >> >> > numeric-numeric interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the >> >> >> > interaction term X1:X2:X3 is encoded by contrasts, not dummy >> >> >> > variables. >> >> >> > This >> >> >> > is inconsistent with the description of the intended behavior >> >> >> > given >> >> >> > in >> >> >> > the >> >> >> > book. >> >> >> > >> >> >> > Best regards, >> >> >> > Tyler >> >> >> > >> >> >> > >> >> >> > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate >> >> >> > <[hidden email]> >> >> >> > wrote: >> >> >> >> >> >> >> >> Hello Tyler, >> >> >> >> >> >> >> >> I want to bring to your attention the following document: "What >> >> >> >> happens if you omit the main effect in a regression model with an >> >> >> >> interaction?" >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you-omit-the-main-effect-in-a-regression-model-with-an-interaction). >> >> >> >> This gives a useful review of the problem. Your example is Case >> >> >> >> 2: a >> >> >> >> continuous and a categorical regressor. >> >> >> >> >> >> >> >> The numerical examples are coded in Stata, and they give the same >> >> >> >> result as in R. Hence, if this is a bug in R then it is also a >> >> >> >> bug >> >> >> >> in >> >> >> >> Stata. That seems very unlikely. >> >> >> >> >> >> >> >> Here is a simulation in R of the above mentioned Case 2 in Stata: >> >> >> >> >> >> >> >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) >> >> >> >> print("Full model") >> >> >> >> print(model.matrix(~(socst+grp)^2 ,data=df)) >> >> >> >> print("Example 2.1: drop socst") >> >> >> >> print(model.matrix(~(socst+grp)^2 -socst ,data=df)) >> >> >> >> print("Example 2.2: drop grp") >> >> >> >> print(model.matrix(~(socst+grp)^2 -grp ,data=df)) >> >> >> >> >> >> >> >> This gives indeed the following regressors: >> >> >> >> >> >> >> >> "Full model" >> >> >> >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 >> >> >> >> "Example 2.1: drop socst" >> >> >> >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 >> >> >> >> socst:grp4 >> >> >> >> "Example 2.2: drop grp" >> >> >> >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4 >> >> >> >> >> >> >> >> There is a little bit of R documentation about this, based on the >> >> >> >> concept of marginality, which typically forbids a model having an >> >> >> >> interaction but not the corresponding main effects. (You might >> >> >> >> see >> >> >> >> the >> >> >> >> references in >> >> >> >> https://en.wikipedia.org/wiki/Principle_of_marginality >> >> >> >> ) >> >> >> >> See "An Introduction to R", by Venables and Smith and the R >> >> >> >> Core >> >> >> >> Team. At the bottom of page 52 (PDF: 57) it says: "Although the >> >> >> >> details are complicated, model formulae in R will normally >> >> >> >> generate >> >> >> >> the models that an expert statistician would expect, provided >> >> >> >> that >> >> >> >> marginality is preserved. Fitting, for [a contrary] example, a >> >> >> >> model >> >> >> >> with an interaction but not the corresponding main effects will >> >> >> >> in >> >> >> >> general lead to surprising results ....". >> >> >> >> The Reference Manual states that the R functions dropterm() >> >> >> >> and >> >> >> >> addterm() resp. drop or add only terms such that marginality is >> >> >> >> preserved. >> >> >> >> >> >> >> >> Finally, about your singular matrix t(mm)%*%mm. This is in fact >> >> >> >> Example 2.1 in Case 2 discussed above. As discussed there, in >> >> >> >> Stata >> >> >> >> and in R the drop of the continuous variable has no effect on the >> >> >> >> degrees of freedom here: it is just a reparameterisation of the >> >> >> >> full >> >> >> >> model, protecting you against losing marginality... Hence the >> >> >> >> model.matrix 'mm' is still square and nonsingular after the drop >> >> >> >> of >> >> >> >> X1, unless of course when a row is removed from the matrix >> >> >> >> 'design' >> >> >> >> when before creating 'mm'. >> >> >> >> >> >> >> >> Arie >> >> >> >> >> >> >> >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <[hidden email]> wrote: >> >> >> >> > You could possibly try to explain away the behavior for a >> >> >> >> > missing >> >> >> >> > main >> >> >> >> > effects term, since without the main effects term we don't have >> >> >> >> > main >> >> >> >> > effect >> >> >> >> > columns in the model matrix used to compute the interaction >> >> >> >> > columns >> >> >> >> > (At >> >> >> >> > best this is undocumented behavior--I still think it's a bug, >> >> >> >> > as >> >> >> >> > we >> >> >> >> > know >> >> >> >> > how we would encode the categorical factors if they were in >> >> >> >> > fact >> >> >> >> > present. >> >> >> >> > It's either specified in contrasts.arg or using the default set >> >> >> >> > in >> >> >> >> > options). However, when all the main effects are present, why >> >> >> >> > would >> >> >> >> > the >> >> >> >> > three-factor interaction column not simply be the product of >> >> >> >> > the >> >> >> >> > main >> >> >> >> > effect columns? In my example: we know X1, we know X2, and we >> >> >> >> > know >> >> >> >> > X3. >> >> >> >> > Why >> >> >> >> > does the encoding of X1:X2:X3 depend on whether we specified a >> >> >> >> > two-factor >> >> >> >> > interaction, AND only changes for specific missing >> >> >> >> > interactions? >> >> >> >> > >> >> >> >> > In addition, I can use a two-term example similar to yours to >> >> >> >> > show >> >> >> >> > how >> >> >> >> > this >> >> >> >> > behavior results in a singular covariance matrix when, given >> >> >> >> > the >> >> >> >> > desired >> >> >> >> > factor encoding, it should not be singular. >> >> >> >> > >> >> >> >> > We start with a full factorial design for a two-level >> >> >> >> > continuous >> >> >> >> > factor >> >> >> >> > and >> >> >> >> > a three-level categorical factor, and remove a single row. This >> >> >> >> > design >> >> >> >> > matrix does not leave enough degrees of freedom to determine >> >> >> >> > goodness-of-fit, but should allow us to obtain parameter >> >> >> >> > estimates. >> >> >> >> > >> >> >> >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) >> >> >> >> >> design = design[-1,] >> >> >> >> >> design >> >> >> >> > X1 X2 >> >> >> >> > 2 -1 A >> >> >> >> > 3 1 B >> >> >> >> > 4 -1 B >> >> >> >> > 5 1 C >> >> >> >> > 6 -1 C >> >> >> >> > >> >> >> >> > Here, we first calculate the model matrix for the full model, >> >> >> >> > and >> >> >> >> > then >> >> >> >> > manually remove the X1 column from the model matrix. This gives >> >> >> >> > us >> >> >> >> > the >> >> >> >> > model matrix one would expect if X1 were removed from the >> >> >> >> > model. >> >> >> >> > We >> >> >> >> > then >> >> >> >> > successfully calculate the covariance matrix. >> >> >> >> > >> >> >> >> >> mm = model.matrix(~(X1+X2)^2,data=design) >> >> >> >> >> mm >> >> >> >> > (Intercept) X1 X2B X2C X1:X2B X1:X2C >> >> >> >> > 2 1 -1 0 0 0 0 >> >> >> >> > 3 1 1 1 0 1 0 >> >> >> >> > 4 1 -1 1 0 -1 0 >> >> >> >> > 5 1 1 0 1 0 1 >> >> >> >> > 6 1 -1 0 1 0 -1 >> >> >> >> > >> >> >> >> >> mm = mm[,-2] >> >> >> >> >> solve(t(mm) %*% mm) >> >> >> >> > (Intercept) X2B X2C X1:X2B X1:X2C >> >> >> >> > (Intercept) 1 -1.0 -1.0 0.0 0.0 >> >> >> >> > X2B -1 1.5 1.0 0.0 0.0 >> >> >> >> > X2C -1 1.0 1.5 0.0 0.0 >> >> >> >> > X1:X2B 0 0.0 0.0 0.5 0.0 >> >> >> >> > X1:X2C 0 0.0 0.0 0.0 0.5 >> >> >> >> > >> >> >> >> > Here, we see the actual behavior for model.matrix. The >> >> >> >> > undesired >> >> >> >> > re-coding >> >> >> >> > of the model matrix interaction term makes the information >> >> >> >> > matrix >> >> >> >> > singular. >> >> >> >> > >> >> >> >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design) >> >> >> >> >> mm >> >> >> >> > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C >> >> >> >> > 2 1 0 0 -1 0 0 >> >> >> >> > 3 1 1 0 0 1 0 >> >> >> >> > 4 1 1 0 0 -1 0 >> >> >> >> > 5 1 0 1 0 0 1 >> >> >> >> > 6 1 0 1 0 0 -1 >> >> >> >> > >> >> >> >> >> solve(t(mm) %*% mm) >> >> >> >> > Error in solve.default(t(mm) %*% mm) : system is >> >> >> >> > computationally >> >> >> >> > singular: >> >> >> >> > reciprocal condition number = 5.55112e-18 >> >> >> >> > >> >> >> >> > I still believe this is a bug. >> >> >> >> > >> >> >> >> > Best regards, >> >> >> >> > Tyler Morgan-Wall >> >> >> >> > >> >> >> >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate >> >> >> >> > <[hidden email]> >> >> >> >> > wrote: >> >> >> >> > >> >> >> >> >> I think it is not a bug. It is a general property of >> >> >> >> >> interactions. >> >> >> >> >> This property is best observed if all variables are factors >> >> >> >> >> (qualitative). >> >> >> >> >> >> >> >> >> >> For example, you have three variables (factors). You ask for >> >> >> >> >> as >> >> >> >> >> many >> >> >> >> >> interactions as possible, except an interaction term between >> >> >> >> >> two >> >> >> >> >> particular variables. When this interaction is not a constant, >> >> >> >> >> it >> >> >> >> >> is >> >> >> >> >> different for different values of the remaining variable. More >> >> >> >> >> precisely: for all values of that variable. In other words: >> >> >> >> >> you >> >> >> >> >> have >> >> >> >> >> a >> >> >> >> >> three-way interaction, with all values of that variable. >> >> >> >> >> >> >> >> >> >> An even smaller example is the following script with only two >> >> >> >> >> variables, each being a factor: >> >> >> >> >> >> >> >> >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) >> >> >> >> >> print(model.matrix(~(X1+X2)^2 ,data=df)) >> >> >> >> >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) >> >> >> >> >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) >> >> >> >> >> >> >> >> >> >> The result is: >> >> >> >> >> >> >> >> >> >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C >> >> >> >> >> 1 1 0 0 0 0 0 >> >> >> >> >> 2 1 1 0 0 0 0 >> >> >> >> >> 3 1 0 1 0 0 0 >> >> >> >> >> 4 1 1 1 0 1 0 >> >> >> >> >> 5 1 0 0 1 0 0 >> >> >> >> >> 6 1 1 0 1 0 1 >> >> >> >> >> >> >> >> >> >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C >> >> >> >> >> 1 1 0 0 0 0 0 >> >> >> >> >> 2 1 0 0 1 0 0 >> >> >> >> >> 3 1 1 0 0 0 0 >> >> >> >> >> 4 1 1 0 0 1 0 >> >> >> >> >> 5 1 0 1 0 0 0 >> >> >> >> >> 6 1 0 1 0 0 1 >> >> >> >> >> >> >> >> >> >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C >> >> >> >> >> 1 1 0 0 0 0 0 >> >> >> >> >> 2 1 1 0 0 0 0 >> >> >> >> >> 3 1 0 1 0 0 0 >> >> >> >> >> 4 1 1 0 1 0 0 >> >> >> >> >> 5 1 0 0 0 1 0 >> >> >> >> >> 6 1 1 0 0 0 1 >> >> >> >> >> >> >> >> >> >> Thus, in the second result, we have no main effect of X1. >> >> >> >> >> Instead, >> >> >> >> >> the >> >> >> >> >> effect of X1 depends on the value of X2; either A or B or C. >> >> >> >> >> In >> >> >> >> >> fact, >> >> >> >> >> this is a two-way interaction, including all three values of >> >> >> >> >> X2. >> >> >> >> >> In >> >> >> >> >> the third result, we have no main effect of X2, The effect of >> >> >> >> >> X2 >> >> >> >> >> depends on the value of X1; either p or q. >> >> >> >> >> >> >> >> >> >> A complicating element with your example seems to be that your >> >> >> >> >> X1 >> >> >> >> >> and >> >> >> >> >> X2 are not factors. >> >> >> >> >> >> >> >> >> >> Arie >> >> >> >> >> >> >> >> >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <[hidden email]> >> >> >> >> >> wrote: >> >> >> >> >> > Hi, >> >> >> >> >> > >> >> >> >> >> > I recently ran into an inconsistency in the way >> >> >> >> >> > model.matrix.default >> >> >> >> >> > handles factor encoding for higher level interactions with >> >> >> >> >> > categorical >> >> >> >> >> > variables when the full hierarchy of effects is not present. >> >> >> >> >> > Depending on >> >> >> >> >> > which lower level interactions are specified, the factor >> >> >> >> >> > encoding >> >> >> >> >> > changes >> >> >> >> >> > for a higher level interaction. Consider the following >> >> >> >> >> > minimal >> >> >> >> >> reproducible >> >> >> >> >> > example: >> >> >> >> >> > >> >> >> >> >> > -------------- >> >> >> >> >> > >> >> >> >> >> >> runmatrix = >> >> >> >> >> >> expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 >> >> >> >> >> X3B >> >> >> >> >> X3C >> >> >> >> >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> >> >> >> >> > 1 1 1 1 0 0 1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 0 0 >> >> >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 0 0 >> >> >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 0 0 >> >> >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 0 0 >> >> >> >> >> > 5 1 1 1 1 0 1 1 0 1 >> >> >> >> >> > 0 >> >> >> >> >> > 1 0 >> >> >> >> >> > 6 1 -1 1 1 0 -1 -1 0 1 >> >> >> >> >> > 0 >> >> >> >> >> > -1 0 >> >> >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 >> >> >> >> >> > 0 >> >> >> >> >> > -1 0 >> >> >> >> >> > 8 1 -1 -1 1 0 1 -1 0 -1 >> >> >> >> >> > 0 >> >> >> >> >> > 1 0 >> >> >> >> >> > 9 1 1 1 0 1 1 0 1 0 >> >> >> >> >> > 1 >> >> >> >> >> > 0 1 >> >> >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 >> >> >> >> >> > 1 >> >> >> >> >> > 0 -1 >> >> >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 >> >> >> >> >> > -1 >> >> >> >> >> > 0 -1 >> >> >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 >> >> >> >> >> > -1 >> >> >> >> >> > 0 1 >> >> >> >> >> > attr(,"assign") >> >> >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 >> >> >> >> >> > attr(,"contrasts") >> >> >> >> >> > attr(,"contrasts")$X3 >> >> >> >> >> > [1] "contr.treatment" >> >> >> >> >> > >> >> >> >> >> > -------------- >> >> >> >> >> > >> >> >> >> >> > Specifying the full hierarchy gives us what we expect: the >> >> >> >> >> > interaction >> >> >> >> >> > columns are simply calculated from the product of the main >> >> >> >> >> > effect >> >> >> >> >> columns. >> >> >> >> >> > The interaction term X1:X2:X3 gives us two columns in the >> >> >> >> >> > model >> >> >> >> >> > matrix, >> >> >> >> >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main >> >> >> >> >> > effects. >> >> >> >> >> > >> >> >> >> >> > If we remove either the X2:X3 interaction or the X1:X3 >> >> >> >> >> > interaction, >> >> >> >> >> > we >> >> >> >> >> get >> >> >> >> >> > what we would expect for the X1:X2:X3 interaction, but when >> >> >> >> >> > we >> >> >> >> >> > remove >> >> >> >> >> > the >> >> >> >> >> > X1:X2 interaction the encoding for X1:X2:X3 changes >> >> >> >> >> > completely: >> >> >> >> >> > >> >> >> >> >> > -------------- >> >> >> >> >> > >> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) >> >> >> >> >> >> (Intercept) >> >> >> >> >> >> X1 >> >> >> >> >> >> X2 >> >> >> >> >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C >> >> >> >> >> > 1 1 1 1 0 0 1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 5 1 1 1 1 0 1 1 0 1 >> >> >> >> >> > 0 >> >> >> >> >> > 6 1 -1 1 1 0 -1 1 0 -1 >> >> >> >> >> > 0 >> >> >> >> >> > 7 1 1 -1 1 0 -1 -1 0 -1 >> >> >> >> >> > 0 >> >> >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 >> >> >> >> >> > 0 >> >> >> >> >> > 9 1 1 1 0 1 1 0 1 0 >> >> >> >> >> > 1 >> >> >> >> >> > 10 1 -1 1 0 1 -1 0 1 0 >> >> >> >> >> > -1 >> >> >> >> >> > 11 1 1 -1 0 1 -1 0 -1 0 >> >> >> >> >> > -1 >> >> >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 >> >> >> >> >> > 1 >> >> >> >> >> > attr(,"assign") >> >> >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 >> >> >> >> >> > attr(,"contrasts") >> >> >> >> >> > attr(,"contrasts")$X3 >> >> >> >> >> > [1] "contr.treatment" >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) >> >> >> >> >> >> (Intercept) >> >> >> >> >> >> X1 >> >> >> >> >> >> X2 >> >> >> >> >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C >> >> >> >> >> > 1 1 1 1 0 0 1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 2 1 -1 1 0 0 -1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 3 1 1 -1 0 0 -1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 4 1 -1 -1 0 0 1 0 0 0 >> >> >> >> >> > 0 >> >> >> >> >> > 5 1 1 1 1 0 1 1 0 1 >> >> >> >> >> > 0 >> >> >> >> >> > 6 1 -1 1 1 0 -1 -1 0 -1 >> >> >> >> >> > 0 >> >> >> >> >> > 7 1 1 -1 1 0 -1 1 0 -1 >> >> >> >> >> > 0 >> >> >> >> >> > 8 1 -1 -1 1 0 1 -1 0 1 >> >> >> >> >> > 0 >> >> >> >> >> > 9 1 1 1 0 1 1 0 1 0 >> >> >> >> >> > 1 >> >> >> >> >> > 10 1 -1 1 0 1 -1 0 -1 0 >> >> >> >> >> > -1 >> >> >> >> >> > 11 1 1 -1 0 1 -1 0 1 0 >> >> >> >> >> > -1 >> >> >> >> >> > 12 1 -1 -1 0 1 1 0 -1 0 >> >> >> >> >> > 1 >> >> >> >> >> > attr(,"assign") >> >> >> >> >> > [1] 0 1 2 3 3 4 5 5 6 6 >> >> >> >> >> > attr(,"contrasts") >> >> >> >> >> > attr(,"contrasts")$X3 >> >> >> >> >> > [1] "contr.treatment" >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) >> >> >> >> >> >> (Intercept) >> >> >> >> >> >> X1 >> >> >> >> >> >> X2 >> >> >> >> >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B >> >> >> >> >> X1:X2:X3C >> >> >> >> >> > 1 1 1 1 0 0 0 0 0 0 >> >> >> >> >> > 1 >> >> >> >> >> > 0 0 >> >> >> >> >> > 2 1 -1 1 0 0 0 0 0 0 >> >> >> >> >> > -1 >> >> >> >> >> > 0 0 >> >> >> >> >> > 3 1 1 -1 0 0 0 0 0 0 >> >> >> >> >> > -1 >> >> >> >> >> > 0 0 >> >> >> >> >> > 4 1 -1 -1 0 0 0 0 0 0 >> >> >> >> >> > 1 >> >> >> >> >> > 0 0 >> >> >> >> >> > 5 1 1 1 1 0 1 0 1 0 >> >> >> >> >> > 0 >> >> >> >> >> > 1 0 >> >> >> >> >> > 6 1 -1 1 1 0 -1 0 1 0 >> >> >> >> >> > 0 >> >> >> >> >> > -1 0 >> >> >> >> >> > 7 1 1 -1 1 0 1 0 -1 0 >> >> >> >> >> > 0 >> >> >> >> >> > -1 0 >> >> >> >> >> > 8 1 -1 -1 1 0 -1 0 -1 0 >> >> >> >> >> > 0 >> >> >> >> >> > 1 0 >> >> >> >> >> > 9 1 1 1 0 1 0 1 0 1 >> >> >> >> >> > 0 >> >> >> >> >> > 0 1 >> >> >> >> >> > 10 1 -1 1 0 1 0 -1 0 1 >> >> >> >> >> > 0 >> >> >> >> >> > 0 -1 >> >> >> >> >> > 11 1 1 -1 0 1 0 1 0 -1 >> >> >> >> >> > 0 >> >> >> >> >> > 0 -1 >> >> >> >> >> > 12 1 -1 -1 0 1 0 -1 0 -1 >> >> >> >> >> > 0 >> >> >> >> >> > 0 1 >> >> >> >> >> > attr(,"assign") >> >> >> >> >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 >> >> >> >> >> > attr(,"contrasts") >> >> >> >> >> > attr(,"contrasts")$X3 >> >> >> >> >> > [1] "contr.treatment" >> >> >> >> >> > >> >> >> >> >> > -------------- >> >> >> >> >> > >> >> >> >> >> > Here, we now see the encoding for the interaction X1:X2:X3 >> >> >> >> >> > is >> >> >> >> >> > now >> >> >> >> >> > the >> >> >> >> >> > interaction of X1 and X2 with a new encoding for X3 where >> >> >> >> >> > each >> >> >> >> >> > factor >> >> >> >> >> level >> >> >> >> >> > is represented by its own column. I would expect, given the >> >> >> >> >> > two >> >> >> >> >> > column >> >> >> >> >> > dummy variable encoding for X3, that the X1:X2:X3 column >> >> >> >> >> > would >> >> >> >> >> > also >> >> >> >> >> > be >> >> >> >> >> two >> >> >> >> >> > columns regardless of what two-factor interactions we also >> >> >> >> >> > specified, >> >> >> >> >> > but >> >> >> >> >> > in this case it switches to three. If other two factor >> >> >> >> >> > interactions >> >> >> >> >> > are >> >> >> >> >> > missing in addition to X1:X2, this issue still occurs. This >> >> >> >> >> > also >> >> >> >> >> > happens >> >> >> >> >> > regardless of the contrast specified in contrasts.arg for >> >> >> >> >> > X3. I >> >> >> >> >> > don't >> >> >> >> >> > see >> >> >> >> >> > any reasoning for this behavior given in the documentation, >> >> >> >> >> > so >> >> >> >> >> > I >> >> >> >> >> > suspect >> >> >> >> >> it >> >> >> >> >> > is a bug. >> >> >> >> >> > >> >> >> >> >> > Best regards, >> >> >> >> >> > Tyler Morgan-Wall >> >> >> >> >> > >> >> >> >> >> > [[alternative HTML version deleted]] >> >> >> >> >> > >> >> >> >> >> > ______________________________________________ >> > >> > > > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Free forum by Nabble | Edit this page |