Dear R-users,
I am using the R function "linearHypothesis" (from the car package) to test whether the sum of all parameters in a multiple linear regression, except the intercept, differs from zero. Is it statistically valid to use the linearHypothesis function for this? Below is a reproducible example in R, for the multiple regression

  y = beta0*t0 + beta1*t1 + beta2*t2 + beta3*t3 + beta4*t4

It seems to me that the linearHypothesis function performs the calculation as an F-test on the extra residual sum of squares obtained when going from the starting model to a 'subset' model, even though all variables in the 'subset' model differ from the variables in the starting model. I normally think of a subset model as a model built on the same input data as the starting model, but with one variable removed. Hence, is this a valid calculation?

Thanks in advance,
Johan

# R-code:
library(car)  # provides linearHypothesis()

y <- c(101133190, 96663050, 106866486, 97678429, 83212348, 75719714, 77861937,
       74018478, 82181104, 68667176, 64599495, 62414401, 63534709, 58571865,
       65222727, 60139788, 63355011, 57790610, 55214971, 55535484, 55759192,
       49450719, 48834699, 51383864, 51250871, 50629835, 52154608, 54636478,
       54942637)

data <- data.frame(y,
                   "t0" = 1,
                   "t1" = 1990:2018,
                   "t2" = c(rep(0, 12), 1:17),
                   "t3" = c(rep(0, 17), 1:12),
                   "t4" = c(rep(0, 23), 1:6))

model <- lm(y ~ t0 + t1 + t2 + t3 + t4 + 0, data = data)

linearHypothesis(model, "t1+t2+t3+t4=0", test = c("F"))

# Reproduce the result from linearHypothesis:
# beta1+beta2+beta3+beta4=0  ->  beta4 = -(beta1+beta2+beta3)  ->
# y = beta0 + beta1*t1 + beta2*t2 + beta3*t3 - (beta1+beta2+beta3)*t4
# y = beta0' + beta1'*(t1-t4) + beta2'*(t2-t4) + beta3'*(t3-t4)

data$t1 <- data$t1 - data$t4
data$t2 <- data$t2 - data$t4
data$t3 <- data$t3 - data$t4

model_reduced <- lm(y ~ t0 + t1 + t2 + t3 + 0, data = data)

anova(model_reduced, model)

--
Johan Lassen

"In the cities people live in time -
in the mountains people live in space" (Buddhist monk).
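The "F-test on the extra residuals" described in the message above can be written out by hand. Here is a minimal sketch, reusing the objects model and model_reduced created above; the names rss_full, F_stat, and p_value are introduced only for this illustration:

# R-code (sketch): extra-sum-of-squares F-test computed by hand
rss_full    <- sum(residuals(model)^2)          # residual SS of the full model
rss_reduced <- sum(residuals(model_reduced)^2)  # residual SS of the restricted model
q   <- 1                                        # number of linear restrictions imposed
df2 <- df.residual(model)                       # residual df of the full model
F_stat  <- ((rss_reduced - rss_full) / q) / (rss_full / df2)
p_value <- pf(F_stat, q, df2, lower.tail = FALSE)
c(F = F_stat, p = p_value)   # should match anova(model_reduced, model)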
Dear Johan,
On 2020-09-17 9:07 a.m., Johan Lassen wrote:
> Dear R-users,
>
> I am using the R function "linearHypothesis" to test whether the sum of
> all parameters in a multiple linear regression, except the intercept,
> differs from zero.
> Is it statistically valid to use the linearHypothesis function for this?

Yes, assuming of course that the hypothesis makes sense.

> Below is a reproducible example in R, for the multiple regression
> y = beta0*t0 + beta1*t1 + beta2*t2 + beta3*t3 + beta4*t4
>
> It seems to me that the linearHypothesis function performs the calculation
> as an F-test on the extra residual sum of squares obtained when going from
> the starting model to a 'subset' model, even though all variables in the
> 'subset' model differ from the variables in the starting model.
> I normally think of a subset model as a model built on the same input data
> as the starting model, but with one variable removed.
>
> Hence, is this a valid calculation?

First, linearHypothesis() doesn't literally fit alternative models, but
rather tests the linear hypothesis directly from the coefficient estimates
and their covariance matrix. The test is standard -- look at the references
in ?linearHypothesis or most texts on linear models.

Second, formulating the hypothesis using alternative models is also
legitimate, since the second model is a restricted version of the first.

> # R-code:
> model <- lm(y ~ t0 + t1 + t2 + t3 + t4 + 0, data = data)

You need not supply the constant regressor t0 explicitly and suppress the
intercept -- you'd get the same test from linearHypothesis() for
lm(y ~ t1 + t2 + t3 + t4, data = data).

> linearHypothesis(model, "t1+t2+t3+t4=0", test = c("F"))

test = "F" is the default.

> # Reproduce the result from linearHypothesis:
> # beta1+beta2+beta3+beta4=0  ->  beta4 = -(beta1+beta2+beta3)  ->
> # y = beta0 + beta1*t1 + beta2*t2 + beta3*t3 - (beta1+beta2+beta3)*t4
> # y = beta0' + beta1'*(t1-t4) + beta2'*(t2-t4) + beta3'*(t3-t4)
>
> data$t1 <- data$t1 - data$t4
> data$t2 <- data$t2 - data$t4
> data$t3 <- data$t3 - data$t4
>
> model_reduced <- lm(y ~ t0 + t1 + t2 + t3 + 0, data = data)
>
> anova(model_reduced, model)

Yes, this is equivalent to the test performed by linearHypothesis() using
the coefficients and their covariances from the original model.

I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
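The point that linearHypothesis() works directly from the coefficient estimates and their covariance matrix can also be sketched by hand. A minimal illustration, assuming the model object fitted earlier in the thread; the hypothesis matrix L and the other names below are introduced only for this sketch and are not taken from the car internals:

# R-code (sketch): Wald-type F-test of H0: beta1 + beta2 + beta3 + beta4 = 0,
# using only the fitted coefficients and their estimated covariance matrix.
b <- coef(model)                          # estimates, in the order (t0, t1, t2, t3, t4)
V <- vcov(model)                          # estimated covariance matrix of b
L <- matrix(c(0, 1, 1, 1, 1), nrow = 1)   # one restriction: sum of the t1..t4 coefficients
q <- nrow(L)                              # number of restrictions (here 1)
F_stat  <- drop(t(L %*% b) %*% solve(L %*% V %*% t(L)) %*% (L %*% b)) / q
p_value <- pf(F_stat, q, df.residual(model), lower.tail = FALSE)
c(F = F_stat, p = p_value)   # should match linearHypothesis(model, "t1+t2+t3+t4=0")

For a linear model fit by least squares, this Wald F-statistic is numerically identical to the extra-sum-of-squares F-statistic from the model comparison, which is why the two approaches agree in the example.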
Dear Johan,
It's generally a good idea to keep the conversation on r-help so that list
members can follow it, and so I'm cc'ing this response to the list.

I hope that it's clear that car::linearHypothesis() computes the test as a
Wald test of a linear hypothesis, and not as a likelihood-ratio test by
model comparison. As your example illustrates, however, the two tests are
the same for a linear model, but this is not true more generally. As I
mentioned, you can find the details in many sources, including in Section
5.3.5 of Fox and Weisberg, An R Companion to Applied Regression, 3rd
Edition, the book with which the car package is associated.

Best,
 John

On 2020-09-17 4:03 p.m., Johan Lassen wrote:
> Thank you, John - highly appreciated! Yes, you are right: the less complex
> model may be seen as a restricted version of the starting model, even
> though its set of variables is not literally a subset of the variables of
> the starting model. What confused me at first was that I think of a subset
> model as a model whose variables are a direct subset of those of the
> starting model. Even though that is not the case in this example, the test
> is still a test of a restricted model against the starting model.
> Thanks,
> Johan

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
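For this particular linear model, the agreement of the two routes can be checked directly. A small sketch, reusing the objects model and model_reduced defined earlier in the thread, and assuming that both functions return their usual anova-style data frames with an "F" column and the test in the second row:

# R-code (sketch): side-by-side F statistics from the two routes
library(car)                                      # provides linearHypothesis()
wald <- linearHypothesis(model, "t1+t2+t3+t4=0")  # Wald F from coef() and vcov()
cmp  <- anova(model_reduced, model)               # F-test by model comparison
c(wald_F = wald[2, "F"], model_comparison_F = cmp[2, "F"])  # identical for this linear model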