# multivariate regression and lm()

2 messages

## multivariate regression and lm()

Hello,

I would like to perform a multivariate regression analysis to model the relationship between m responses Y1, ..., Ym and a single set of predictor variables X1, ..., Xr. Each response is assumed to follow its own regression model, and the error terms of the models may be correlated.

Based on my reading of the R help archives and the R documentation, the function lm() should be able to perform this analysis. However, my tests of lm() suggest that it really just fits m separate regression models, without accounting for any possible correlation between the response variables.

I have attached an example below that demonstrates that the multivariate result produced by lm() is identical to running a separate regression analysis on each of the responses Y1, ..., Ym, which is not the desired behaviour.

Can anyone confirm whether lm() is indeed capable of performing multivariate regression analysis, and if so, how?

Thanks very much,

Ernest

PS: My post is based on an earlier (2005) post in which the same question was asked; the conclusion at that time was that lm() is capable of handling the multivariate analysis. However, my tests (shown below) indicate that lm() actually seems to perform separate regressions for each response variable without accounting for their possible correlation.

```
### multivariate analysis using lm() ###
########################################

> ex7.8 <- data.frame(z1 = c(0, 1, 2, 3, 4), y1 = c(1, 4, 3, 8, 9), y2 = c(-1, -1, 2, 3, 2))
>
> f.mlm <- lm(cbind(y1, y2) ~ z1, data = ex7.8)
> summary(f.mlm)
Response y1 :

Call:
lm(formula = y1 ~ z1, data = ex7.8)

Residuals:
 1  2  3  4  5
 0  1 -2  1  0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.0000     1.0954   0.913   0.4286
z1            2.0000     0.4472   4.472   0.0208 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.414 on 3 degrees of freedom
Multiple R-squared: 0.8696,     Adjusted R-squared: 0.8261
F-statistic:    20 on 1 and 3 DF,  p-value: 0.02084


Response y2 :

Call:
lm(formula = y2 ~ z1, data = ex7.8)

Residuals:
         1          2          3          4          5
-1.110e-16 -1.000e+00  1.000e+00  1.000e+00 -1.000e+00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.0000     0.8944  -1.118   0.3450
z1            1.0000     0.3651   2.739   0.0714 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.155 on 3 degrees of freedom
Multiple R-squared: 0.7143,     Adjusted R-squared: 0.619
F-statistic:   7.5 on 1 and 3 DF,  p-value: 0.07142


### separate linear regressions on y1 and y2 ###
################################################

> f.mlm1 = lm(y1 ~ z1, data=ex7.8); summary(f.mlm1)

Call:
lm(formula = y1 ~ z1, data = ex7.8)

Residuals:
 1  2  3  4  5
 0  1 -2  1  0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.0000     1.0954   0.913   0.4286
z1            2.0000     0.4472   4.472   0.0208 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.414 on 3 degrees of freedom
Multiple R-squared: 0.8696,     Adjusted R-squared: 0.8261
F-statistic:    20 on 1 and 3 DF,  p-value: 0.02084

> f.mlm2 = lm(y2 ~ z1, data=ex7.8); summary(f.mlm2)

Call:
lm(formula = y2 ~ z1, data = ex7.8)

Residuals:
         1          2          3          4          5
-1.110e-16 -1.000e+00  1.000e+00  1.000e+00 -1.000e+00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.0000     0.8944  -1.118   0.3450
z1            1.0000     0.3651   2.739   0.0714 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.155 on 3 degrees of freedom
Multiple R-squared: 0.7143,     Adjusted R-squared: 0.619
F-statistic:   7.5 on 1 and 3 DF,  p-value: 0.07142
```

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: multivariate regression and lm()

Dear Ernest,

The ML estimator for the multivariate linear model, assuming multinormal errors, is the same as equation-by-equation LS, but multivariate tests performed for all of the responses on the resulting multivariate linear model object (e.g., by anova(), or by Anova() in the car package) will take the correlations of the responses into account.

I hope that this helps,
 John

------------------------------------------------
John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Fri, 16 Mar 2012 07:32:02 -0400, Ernest Lo <[hidden email]> wrote:
> [snip: original message and example quoted in full]
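John's point can be illustrated with a short sketch on the data from the original post: the coefficient estimates are equation-by-equation, but anova() on the fitted object (which has class "mlm" and dispatches to stats::anova.mlm) performs a multivariate test that uses the estimated error covariance across the responses. Anova() from the car package, if that package is installed, works similarly.

```r
## Refit the multivariate model from the original post.
ex7.8 <- data.frame(z1 = c(0, 1, 2, 3, 4),
                    y1 = c(1, 4, 3, 8, 9),
                    y2 = c(-1, -1, 2, 3, 2))
f.mlm <- lm(cbind(y1, y2) ~ z1, data = ex7.8)  # object of class "mlm"

## Multivariate test for z1 across both responses jointly
## (Pillai's trace by default; see ?anova.mlm for other statistics).
anova(f.mlm)

## With the car package installed, Anova() gives multivariate tests too:
# library(car)
# Anova(f.mlm)
```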