Hadley,

The S modeling language was designed with Wilkinson and

Rogers in mind. The notation was changed from their paper to

retain consistency with the parsing rules for ordinary algebra in

S. I think of ":" as an indicator of an indexing system into the

dummy variables. It is not an indicator of degrees of freedom.

For simplicity in notation, let A be a factor with a levels and B

be a factor with b levels. Then A:B implies a set of dummy

variables with at most ab columns indexed by an A level and a B

level. The degrees of freedom associated with A:B depend on the

linear dependencies of the associated dummy variables with the

dummy variables of other terms in the model. The excess columns

can be suppressed when the dummy variables are generated or they

can be pivoted out during the analysis. When we have the special

case A:A, there is only one factor mentioned, so the indexing

scheme is based on just the one factor. You could generate the

full set of a^2 columns, and then you would discover that they

are all linearly dependent on the first a.
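
A minimal sketch in R of the points above. The factors A and B, their level names, and the balanced layout with a = 2 and b = 3 are made up for illustration. It builds the full set of ab dummy variables for A:B, counts how many of them remain linearly independent once A and B are already in the model, and then generates the a^2 columns for the special case A:A by hand.

```r
# Hypothetical balanced layout: A has a = 2 levels, B has b = 3 levels,
# two observations per cell (12 rows in total)
A <- factor(rep(c("a1", "a2"), each = 3, times = 2))
B <- factor(rep(c("b1", "b2", "b3"), times = 4))

# Full set of indicator columns for A:B, one per (A level, B level) pair
X_ab <- model.matrix(~ 0 + A:B)
n_ab <- ncol(X_ab)                      # a * b columns

# Columns for the main effects alone, then main effects plus A:B
X_main <- model.matrix(~ A + B)
X_full <- cbind(X_main, X_ab)

# Degrees of freedom for A:B after A and B: the increase in rank
df_ab <- qr(X_full)$rank - qr(X_main)$rank

# Special case A:A: generate all a^2 products of the A indicators by hand
X_a  <- model.matrix(~ 0 + A)
X_aa <- do.call(cbind, lapply(seq_len(ncol(X_a)),
                              function(j) X_a * X_a[, j]))
rank_aa <- qr(X_aa)$rank                # no larger than the rank of X_a

# The formula parser itself collapses the repeated factor
labels_aa <- attr(terms(~ A:A), "term.labels")

print(c(columns = n_ab, df = df_ab, rank_AA = rank_aa))
print(labels_aa)
```

Here the rank computation plays the role of "pivoting out" the excess columns during the analysis; suppressing them at generation time is what model.matrix() does when the marginal terms are included in the formula.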

The columns can be labeled either

a1b1 a1b2 a1b3 a2b1 a2b2 a2b3

or

a1b1 a2b1 a1b2 a2b2 a1b3 a2b3

If there is crossing, we would report a single sum of squares

and degrees of freedom for the interaction. If there is nesting,

say a/b, then it might make sense to group the dummy variables

say (a1b1 a1b2 a1b3) and (a2b1 a2b2 a2b3) and report simple

effects sum of squares and degrees of freedom for each of the

groups.
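
A rough R sketch of the two labelings and of the grouping for the nested case. The response y is simulated purely for illustration, and the helper rss() and the grouping by column name are assumptions of this sketch, not something built into S or R.

```r
# Hypothetical balanced layout with a = 2, b = 3 and a made-up response
A <- factor(rep(c("a1", "a2"), each = 3, times = 2))
B <- factor(rep(c("b1", "b2", "b3"), times = 4))
set.seed(1)
y <- rnorm(length(A))

# Two labelings of the same columns: in A:B the A index varies fastest,
# in B:A the B index varies fastest
lab_ab <- colnames(model.matrix(~ 0 + A:B))
lab_ba <- colnames(model.matrix(~ 0 + B:A))
print(lab_ab)
print(lab_ba)

# Nesting: columns of B within A, grouped by the A level they belong to
X      <- model.matrix(~ A / B)
nested <- grepl(":", colnames(X))
base   <- X[, !nested, drop = FALSE]     # intercept and A columns

# Residual sum of squares for a given design matrix (helper for this sketch)
rss <- function(M) sum(lm.fit(M, y)$residuals^2)

# Simple-effects sum of squares and degrees of freedom for each group
groups <- split(which(nested), sub(":.*", "", colnames(X)[nested]))
simple <- t(sapply(groups, function(idx) {
  c(Df = length(idx),
    SumSq = rss(base) - rss(cbind(base, X[, idx, drop = FALSE])))
}))
print(simple)

# Single pooled line for B within A, for comparison
pooled <- anova(lm(y ~ A / B))[2, "Sum Sq"]
```

In this balanced layout the two groups are orthogonal after adjusting for A, so the simple-effects sums of squares add up to the pooled line.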

The structure of the individual columns depends on the set of

contrasts used for the A and B factors.
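
As a small illustration in R (factor names and layout are again made up), the interaction columns look different under treatment and sum contrasts even though the number of columns and the rank of the model matrix stay the same:

```r
# Hypothetical balanced layout with a = 2 and b = 3
A <- factor(rep(c("a1", "a2"), each = 3, times = 2))
B <- factor(rep(c("b1", "b2", "b3"), times = 4))

# Same formula, two choices of contrasts for A and B
X_trt <- model.matrix(~ A * B,
                      contrasts.arg = list(A = "contr.treatment",
                                           B = "contr.treatment"))
X_sum <- model.matrix(~ A * B,
                      contrasts.arg = list(A = "contr.sum",
                                           B = "contr.sum"))

# Pick out the interaction columns (third term in the formula)
int_trt <- X_trt[, attr(X_trt, "assign") == 3, drop = FALSE]
int_sum <- X_sum[, attr(X_sum, "assign") == 3, drop = FALSE]

print(range(int_trt))   # entries of the treatment-coded columns
print(range(int_sum))   # entries of the sum-coded columns

# Column counts and ranks agree; only the individual columns differ
print(c(ncol(X_trt), ncol(X_sum)))
print(c(qr(X_trt)$rank, qr(X_sum)$rank))
```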

Rich
