# why NA coefficients

12 messages
## why NA coefficients

Hi, I am trying to run an ANOVA with an interaction term on two factors (`treat` has 7 levels, `group` has 2 levels). I found that the coefficient for the last interaction term always comes back as `NA`; see the attached dataset and the code below:

```r
> test <- read.table("test.txt", sep = '\t', header = TRUE, row.names = NULL)
> lm(y ~ factor(treat) * factor(group), test)

Call:
lm(formula = y ~ factor(treat) * factor(group), data = test)

Coefficients:
                  (Intercept)                 factor(treat)2                 factor(treat)3
                     0.429244                       0.499982                       0.352971
               factor(treat)4                 factor(treat)5                 factor(treat)6
                    -0.204752                       0.142042                       0.044155
               factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                    -0.007775                      -0.337907                      -0.208734
factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                    -0.195138                       0.800029                       0.227514
factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                     0.331548                             NA
```

I guess this is due to the model matrix being singular, that is, collinearity among its columns? But I can't figure out how the matrix is singular in this case. Can someone show me why?

Thanks,
John

test.txt (1K) Download Attachment

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
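[Editor's note: a minimal way to see the singularity John asks about, using synthetic data since test.txt is not included here. This is a smaller 3 x 2 design with the same defect — the (treat = 1, group = 2) cell is empty — showing that the model matrix loses full column rank.]

```r
# Synthetic stand-in for the posted data: 3 treat levels x 2 groups,
# with the (treat = 1, group = 2) cell deliberately left empty.
set.seed(1)
d <- expand.grid(treat = factor(1:3), group = factor(1:2))
d <- d[rep(seq_len(nrow(d)), each = 3), ]     # 3 replicates per cell
d <- subset(d, !(treat == 1 & group == 2))    # empty the (1, 2) cell
d$y <- rnorm(nrow(d))

X <- model.matrix(~ treat * group, d)
ncol(X)       # 6 columns in the full interaction coding
qr(X)$rank    # only 5: one column is a linear combination of the others
```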

## Re: why NA coefficients

On Nov 7, 2011, at 7:33 PM, array chip wrote:

> I guess this is due to model matrix being singular or collinearity among the matrix columns? But I can't figure out how the matrix is singular in this case? Can someone show me why this is the case?

Because you have no cases in one of the crossed categories.

--
David Winsemius, MD
West Hartford, CT

## Re: why NA coefficients

Thanks David. The only category that has no cases is "treat 1, group 2":

```r
> with(test, table(treat, group))
     group
treat 1 2
    1 8 0
    2 1 5
    3 5 5
    4 7 3
    5 7 4
    6 3 3
    7 8 2
```

But why is the coefficient for "treat 7, group 2" not estimable?

Thanks,
John
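[Editor's note: the singularity can be seen directly in the columns of the model matrix. Once one cell is empty, the group-2 dummy equals the sum of the interaction dummies, so the columns are exactly linearly dependent and one coefficient must be dropped. A sketch on synthetic 3 x 2 data (assumed stand-in for test.txt):]

```r
set.seed(1)
d <- expand.grid(treat = factor(1:3), group = factor(1:2))
d <- d[rep(seq_len(nrow(d)), each = 3), ]
d <- subset(d, !(treat == 1 & group == 2))   # no cases for (treat 1, group 2)
d$y <- rnorm(nrow(d))

X <- model.matrix(~ treat * group, d)
# Every group-2 row has treat != 1, so the group2 dummy is exactly the
# sum of the interaction dummies -- an exact linear dependence:
all(X[, "group2"] == X[, "treat2:group2"] + X[, "treat3:group2"])   # TRUE
```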

## Re: why NA coefficients

Hi Dennis,

The cell mean mu_12 from the model involves the intercept and the group-2 main effect:

```r
Coefficients:
                  (Intercept)                 factor(treat)2                 factor(treat)3
                     0.429244                       0.499982                       0.352971
               factor(treat)4                 factor(treat)5                 factor(treat)6
                    -0.204752                       0.142042                       0.044155
               factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                    -0.007775                      -0.337907                      -0.208734
factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                    -0.195138                       0.800029                       0.227514
factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                     0.331548                             NA
```

So mu_12 = 0.429244 - 0.337907 = 0.091337. This can be verified by:

```r
> predict(fit, data.frame(list(treat = 1, group = 2)))
         1
0.09133691
Warning message:
In predict.lm(fit, data.frame(list(treat = 1, group = 2))) :
  prediction from a rank-deficient fit may be misleading
```

But as you can see, it gave a warning about a rank-deficient fit. Why is this a rank-deficient fit? "treat 1, group 2" has no cases, yet its cell mean apparently can still be predicted; why then is "treat 7, group 2", which has 2 cases, the one reported as not estimable?

Thanks,
John

> From: Dennis Murphy <[hidden email]>
> Sent: Monday, November 7, 2011 9:29 PM
> Subject: Re: [R] why NA coefficients
>
> Hi John:
>
> What is the estimate of the cell mean \mu_{12}? Which model effects involve that cell mean? With this data arrangement, the expected population marginal means of treatment 1 and group 2 are not estimable either, unless you're willing to assume a no-interaction model.
>
> Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data (vol. 1) cover this topic in some detail, but it assumes you're familiar with the matrix form of a linear statistical model. Both chapters cover the two-way model with interaction - Ch. 13 from the cell means model approach and Ch. 14 from the model effects approach. Because this was written in the mid 80s and republished in the early 90s, all the code used is in SAS.
>
> HTH,
> Dennis

## Re: why NA coefficients

On Nov 7, 2011, at 10:07 PM, array chip wrote:

> Thanks David. The only category that has no cases is "treat 1, group 2". But why is the coefficient for "treat 7, group 2" not estimable?

Well, it had to omit one of them, didn't it? (But I don't know why that level was chosen.)

--
David.
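[Editor's note: as for why that particular level is chosen — `lm()` fits via pivoted QR, scanning the model-matrix columns left to right and returning `NA` for a coefficient whose column turns out to be a linear combination of the columns already processed. With the default level ordering, that is the *last* interaction column, regardless of which cell is actually empty. A sketch on synthetic 3 x 2 data with the (1, 2) cell empty, assumed stand-in for the posted dataset:]

```r
set.seed(1)
d <- expand.grid(treat = factor(1:3), group = factor(1:2))
d <- d[rep(seq_len(nrow(d)), each = 3), ]
d <- subset(d, !(treat == 1 & group == 2))   # (treat 1, group 2) is empty
d$y <- rnorm(nrow(d))

fit <- lm(y ~ treat * group, d)
coef(fit)
# The NA lands on treat3:group2 -- the last interaction column -- even
# though the empty cell is (treat 1, group 2): since
# group2 = treat2:group2 + treat3:group2, the QR reaches treat3:group2
# last among the dependent set and drops it.
fit$rank    # 5, one less than the 6 columns
```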

## Re: why NA coefficients

On Nov 8, 2011, at 1:19 AM, David Winsemius wrote:

> Well, it had to omit one of them, didn't it?
> (But I don't know why that level was chosen.)

But this output suggests there may be alligators in the swamp:

```r
> predict(lmod, newdata = data.frame(treat = 1, group = 2))
         1
0.09133691
Warning message:
In predict.lm(lmod, newdata = data.frame(treat = 1, group = 2)) :
  prediction from a rank-deficient fit may be misleading
```

--
David Winsemius, MD
West Hartford, CT
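[Editor's note: the warning — and why it is worth worrying about — can be made concrete on synthetic data with the same empty cell. `predict()` effectively treats the dropped (`NA`) coefficient as zero, so a prediction for the empty cell is an artifact of the coding, not something estimated from data.]

```r
set.seed(1)
d <- expand.grid(treat = factor(1:3), group = factor(1:2))
d <- d[rep(seq_len(nrow(d)), each = 3), ]
d <- subset(d, !(treat == 1 & group == 2))   # (treat 1, group 2) is empty
d$y <- rnorm(nrow(d))
fit <- lm(y ~ treat * group, d)

nd <- data.frame(treat = factor(1, levels = 1:3),
                 group = factor(2, levels = 1:2))
p <- suppressWarnings(predict(fit, nd))   # warns: rank-deficient fit
# With the aliased column dropped, the value is just
# intercept + group2 -- the model's claim about a cell with no data:
isTRUE(all.equal(unname(p),
                 unname(coef(fit)["(Intercept)"] + coef(fit)["group2"])))
```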

## Re: why NA coefficients

Sure it does, but I am still struggling with what is going on...

Thanks,
John

## Re: why NA coefficients

True, but why did it have to omit "treat 7, group 2"?

Thanks again,
John

## Re: why NA coefficients

On Nov 8, 2011, at 12:36 PM, array chip wrote:

> Sure it does, but still struggling with what is going on...

Have you considered redefining the implicit base level for "treat" so that it does not create the missing crossed category?

```r
> test$treat2_ <- factor(test$treat, levels = c(2:7, 1))
> lm(y ~ treat2_ * factor(group), test)

Call:
lm(formula = y ~ treat2_ * factor(group), data = test)

Coefficients:
            (Intercept)                 treat2_3                 treat2_4
              0.9292256               -0.1470106               -0.7047343
               treat2_5                 treat2_6                 treat2_7
             -0.3579398               -0.4558269               -0.5077571
               treat2_1           factor(group)2  treat2_3:factor(group)2
             -0.4999820               -0.5466405                0.0135963
treat2_4:factor(group)2  treat2_5:factor(group)2  treat2_6:factor(group)2
              1.0087628                0.4362479                0.5402821
treat2_7:factor(group)2  treat2_1:factor(group)2
              0.2087338                       NA
```

All the "group-less" coefficients are for group 1, so you now get a prediction for group=1:treat=2 (the intercept), group=1:treat=3, ..., a total of 7 values, and there are 6 predictions for group 2. The onus is obviously on you to check the predictions against the data; `aggregate` is a good function for that purpose.

--
david.
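[Editor's note: David's releveling trick can be checked on synthetic data. Making the treatment level that owns the empty cell the *last* level turns its interaction dummy into an all-zero column, so the single `NA` lands exactly on the empty cell and every remaining coefficient is estimable from observed data. Synthetic 3 x 2 stand-in, since test.txt is not attached here:]

```r
set.seed(1)
d <- expand.grid(treat = factor(1:3), group = factor(1:2))
d <- d[rep(seq_len(nrow(d)), each = 3), ]
d <- subset(d, !(treat == 1 & group == 2))   # (treat 1, group 2) is empty
d$y <- rnorm(nrow(d))

# Default level order: the NA falls on the last interaction term.
coef(lm(y ~ treat * group, d))

# Put the level that owns the empty cell last; its interaction dummy is
# then an all-zero column, so the NA lands exactly on the empty cell:
d$treat2_ <- factor(d$treat, levels = c(2, 3, 1))
fit2 <- lm(y ~ treat2_ * group, d)
coef(fit2)   # only treat2_1:group2 is NA
```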

## Re: why NA coefficients

 In reply to this post by array chip The cell mean mu_{12} is non-estimable because it has no data in the cell. How can you estimate something that's not there (at least without imputation :)? Every parametric function that involves mu_{12} will also be non-estimable - in particular,  the interaction term and the population marginal means . That's why you get the NA estimates and the warning. All this follows from the linear model theory described in, for example, Milliken and Johnson (1992), Analysis of Messy Data, vol. 1, ch. 13. Here's an example from Milliken and Johnson (1992) to illustrate:           B1         B2       B3 T1      2, 6                   8, 6 T2        3          14      12, 9 T3        6           9 Assume a cell means model E(Y_{ijk}) = \mu_{ij}, where the cell means are estimated by the cell averages. >From M & J (p. 173, whence this example is taken): "Whenever treatment combinations are missing, certain hypotheses cannot be tested without making some additional assumptions about the parameters in the model. Hypotheses involving parameters corresponding to the missing cells generally cannot be tested. For example, for the data [above] it is not possible to estimate any linear combinations (or to test any hypotheses) that involve parameters \mu_{12} and \mu_{33} unless one is willing to make some assumptions about them." They continue: "One common assumption is that there is no interactions between the levels of T and the levels of B. In our opinion, this assumption should not be made without some supporting experimental evidence." In other words, removing the interaction term makes the non-estimability problem disappear, but it's a copout unless there is some tangible scientific justification for an additive rather than an interaction model. 
For the above data, M & J note that it is not possible to estimate all of the expected marginal means - in particular, one cannot estimate the population marginal means $\bar{\mu}_{1.}$, $\bar{\mu}_{3.}$, $\bar{\mu}_{.2}$ or $\bar{\mu}_{.3}$. OTOH, $\bar{\mu}_{2.}$ and $\bar{\mu}_{.1}$ since these functions of the parameters involve terms associated with the means of the missing cells. Moreover, any hypotheses involving parametric functions that contain non-estimable cell means are not testable. In this example, the test of equal row population marginal means is not testable because $\bar{\mu}_{1.}$ and $\bar{\mu}_{3.}$ are not estimable. [Aside: if the term parametric function is not familiar, in this context it refers to linear combinations of model parameters.  In the M & J example, $\bar{\mu}_{1.} = \mu_{11} + \mu_{12} + \mu_{13}$ is a parametric function.] Hopefully this sheds some light on the situation. Dennis On Mon, Nov 7, 2011 at 10:17 PM, array chip <[hidden email]> wrote: > Hi Dennis, > The cell mean mu_12 from the model involves the intercept and factor 2: > Coefficients: >                   (Intercept)                 factor(treat)2 > factor(treat)3 >                      0.429244 > 0.499982                       0.352971 >                factor(treat)4                 factor(treat)5 > factor(treat)6 >                     -0.204752 > 0.142042                       0.044155 >                factor(treat)7                 factor(group)2 > factor(treat)2:factor(group)2 >                     -0.007775 > -0.337907                      -0.208734 > factor(treat)3:factor(group)2  factor(treat)4:factor(group)2 > factor(treat)5:factor(group)2 >                     -0.195138 > 0.800029                       0.227514 > factor(treat)6:factor(group)2  factor(treat)7:factor(group)2 >                      0.331548                             NA > So mu_12 = 0.429244-0.337907 = 0.091337. 
> This can be verified by:
>> predict(fit, data.frame(list(treat=1, group=2)))
>          1
> 0.09133691
> Warning message:
> In predict.lm(fit, data.frame(list(treat = 1, group = 2))) :
>   prediction from a rank-deficient fit may be misleading
>
> But as you can see, it gave a warning about a rank-deficient fit... why is this a rank-deficient fit?
> And "treat 1, group 2" has no cases, so why is it still estimable while, on the contrary, "treat 7, group 2", which has 2 cases, is not?
> Thanks
> John
>
> ________________________________
> From: Dennis Murphy <[hidden email]>
> To: array chip <[hidden email]>
> Sent: Monday, November 7, 2011 9:29 PM
> Subject: Re: [R] why NA coefficients
>
> Hi John:
>
> What is the estimate of the cell mean \mu_{12}? Which model effects involve that cell mean? With this data arrangement, the expected population marginal means of treatment 1 and group 2 are not estimable either, unless you're willing to assume a no-interaction model.
>
> Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data (vol. 1) cover this topic in some detail, but it assumes you're familiar with the matrix form of a linear statistical model. Both chapters cover the two-way model with interaction - Ch. 13 from the cell means model approach and Ch. 14 from the model effects approach. Because this was written in the mid 80s and republished in the early 90s, all the code used is in SAS.
>
> HTH,
> Dennis
>
> On Mon, Nov 7, 2011 at 7:07 PM, array chip <[hidden email]> wrote:
>> Thanks David. The only category that has no cases is "treat 1, group 2":
>>
>>> with(test, table(treat, group))
>>      group
>> treat 1 2
>>     1 8 0
>>     2 1 5
>>     3 5 5
>>     4 7 3
>>     5 7 4
>>     6 3 3
>>     7 8 2
>>
>> But why is the coefficient for "treat 7, group 2" not estimable?
>>
>> Thanks
>>
>> John
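David's diagnosis - an empty crossed category makes the design matrix singular - can be reproduced on a small made-up dataset. The data below are invented purely for illustration (they are not the poster's file); only the empty (treat 1, group 2) cell mirrors the situation in the thread:

```r
# Minimal sketch with made-up data: one empty cell in a 3 x 2 layout
# makes the interaction design matrix rank-deficient.
d <- expand.grid(treat = factor(1:3), group = factor(1:2))
d <- d[!(d$treat == 1 & d$group == 2), ]   # leave the (treat 1, group 2) cell empty
d <- d[rep(seq_len(nrow(d)), each = 2), ]  # two observations per remaining cell
set.seed(1)
d$y <- rnorm(nrow(d))

X <- model.matrix(~ treat * group, data = d)
ncol(X)      # 6 columns in the design matrix
qr(X)$rank   # but rank 5: the group2 column equals the sum of the two
             # interaction columns, since group 2 occurs only with treat 2 or 3
coef(lm(y ~ treat * group, data = d))      # one interaction coefficient comes back NA
```

With only 5 populated cells, only 5 cell means are identified, so one of the 6 coefficients must be aliased; lm() reports it as NA rather than refusing to fit.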

## Re: why NA coefficients

 Hi Dennis,

Thanks very much for the details. All those explanations about non-estimable mu_{12} when it has no data make sense to me!

Regarding my specific example data: mu_{12} should NOT be estimable in a linear model with interaction because its cell has no data, yet the model fit by lm() in R still CAN produce an estimate of mu_{12}, while on the other hand mu_{72} is NOT estimable from lm() even though that category does have data. Does this contradiction of the theory imply that the lm() fit on my specific example data is not reliable/trustworthy and should not be used?

Thanks

John
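Part of John's puzzle - why the NA lands on the treat 7 interaction even though that cell has data - is that lm() drops the last column involved in the linear dependency, so which coefficient is reported as NA is an artifact of column ordering and parameterization, not a statement about that particular cell (Bill Dunlap's contrasts demonstration below makes the same point). The dependency itself can be inspected with `alias()`; a sketch, assuming the thread's `test` data frame has been read in as in the original post:

```r
# Sketch, assuming `test` was read in as in the original post:
# test <- read.table("test.txt", sep = "\t", header = TRUE)
fit <- lm(y ~ factor(treat) * factor(group), data = test)

# alias() writes the dropped (NA) coefficient's column as a linear
# combination of the retained columns -- the dependency created by the
# empty (treat 1, group 2) cell:
alias(fit)$Complete
```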
 It might make the discussion easier to follow if you used a smaller dataset that anyone can make and did some experiments with contrasts. E.g.,

> D <- data.frame(expand.grid(X1=LETTERS[1:3], X2=letters[24:26])[-1,], Y=2^(1:8))
> D
  X1 X2   Y
2  B  x   2
3  C  x   4
4  A  y   8
5  B  y  16
6  C  y  32
7  A  z  64
8  B  z 128
9  C  z 256
> lm(data=D, Y ~ X1 * X2)

Call:
lm(formula = Y ~ X1 * X2, data = D)

Coefficients:
(Intercept)          X1B          X1C
       -188          190          192
        X2y          X2z      X1B:X2y
        196          252         -182
    X1C:X2y      X1B:X2z      X1C:X2z
       -168         -126           NA

> lm(data=D, Y ~ X1 * X2, contrasts=list(X2="contr.SAS"))

Call:
lm(formula = Y ~ X1 * X2, data = D, contrasts = list(X2 = "contr.SAS"))

Coefficients:
(Intercept)          X1B          X1C
         64           64          192
        X2x          X2y      X1B:X2x
       -252          -56          126
    X1C:X2x      X1B:X2y      X1C:X2y
         NA          -56         -168

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
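A practical takeaway from the thread: check the cross-tabulation for empty cells before fitting an interaction model, and compare the number of coefficients against the rank of the fit. A short diagnostic sketch, again assuming the poster's `test` data frame has been read in:

```r
# Check for empty cells before fitting the interaction model:
with(test, table(treat, group))   # the (treat 1, group 2) cell is 0

fit <- lm(y ~ factor(treat) * factor(group), data = test)
# One more coefficient than the rank of the fit => exactly one aliased term:
length(coef(fit))      # coefficients requested by the formula
fit$rank               # rank actually achieved by the design matrix
sum(is.na(coef(fit)))  # number of aliased (NA) coefficients
```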