multivariate regression and lm()

multivariate regression and lm()

treemake
Hello,

I would like to perform a multivariate regression analysis to model the
relationship between m responses Y1, ... Ym and a single set of predictor
variables X1, ..., Xr.  Each response is assumed to follow its own
regression model, and the error terms in each model can be correlated.

Based on my reading of the R-help archives and the R documentation, the
function lm() should be able to perform this analysis.  However, my tests of
lm() show that it simply fits m separate regression models, without
accounting for any possible correlation between the response variables.

I have attached an example below demonstrating that the multivariate
analysis result produced by lm() is identical to running separate
regression analyses on each of the Y1, ..., Ym response variables, which is
not the desired result.

Can anyone confirm whether lm() is indeed capable of performing multivariate
regression analysis and, if so, how?

Thanks very much,

Ernest

PS: my post is based on an earlier post from 2005 in which the same question
was asked; the conclusion at that time was that lm() is capable of handling
the multivariate analysis.  However, my tests (shown below) indicate that
lm() actually performs a separate regression for each response variable,
without accounting for their possible correlation.


### multivariate analysis using lm() ###
########################################

> ex7.8 <- data.frame(z1 = c(0, 1, 2, 3, 4), y1 = c(1, 4, 3, 8, 9),
+                     y2 = c(-1, -1, 2, 3, 2))
>
> f.mlm <- lm(cbind(y1, y2) ~ z1, data = ex7.8)
> summary(f.mlm)
Response y1 :

Call:
lm(formula = y1 ~ z1, data = ex7.8)

Residuals:
 1  2  3  4  5
 0  1 -2  1  0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.0000     1.0954   0.913   0.4286
z1            2.0000     0.4472   4.472   0.0208 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.414 on 3 degrees of freedom
Multiple R-squared: 0.8696,     Adjusted R-squared: 0.8261
F-statistic:    20 on 1 and 3 DF,  p-value: 0.02084


Response y2 :

Call:
lm(formula = y2 ~ z1, data = ex7.8)

Residuals:
         1          2          3          4          5
-1.110e-16 -1.000e+00  1.000e+00  1.000e+00 -1.000e+00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.0000     0.8944  -1.118   0.3450
z1            1.0000     0.3651   2.739   0.0714 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.155 on 3 degrees of freedom
Multiple R-squared: 0.7143,     Adjusted R-squared: 0.619
F-statistic:   7.5 on 1 and 3 DF,  p-value: 0.07142


### separate linear regressions on y1 and y2 ###
################################################

> f.mlm1 = lm(y1 ~ z1, data=ex7.8); summary(f.mlm1)

Call:
lm(formula = y1 ~ z1, data = ex7.8)

Residuals:
 1  2  3  4  5
 0  1 -2  1  0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.0000     1.0954   0.913   0.4286
z1            2.0000     0.4472   4.472   0.0208 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.414 on 3 degrees of freedom
Multiple R-squared: 0.8696,     Adjusted R-squared: 0.8261
F-statistic:    20 on 1 and 3 DF,  p-value: 0.02084

> f.mlm2 = lm(y2 ~ z1, data=ex7.8); summary(f.mlm2)

Call:
lm(formula = y2 ~ z1, data = ex7.8)

Residuals:
         1          2          3          4          5
-1.110e-16 -1.000e+00  1.000e+00  1.000e+00 -1.000e+00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.0000     0.8944  -1.118   0.3450
z1            1.0000     0.3651   2.739   0.0714 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.155 on 3 degrees of freedom
Multiple R-squared: 0.7143,     Adjusted R-squared: 0.619
F-statistic:   7.5 on 1 and 3 DF,  p-value: 0.07142
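
A quick check, not in the original post, that makes the equivalence of the
coefficient estimates explicit; it only reuses the objects already created in
the session above:

## Coefficients from the multivariate fit: one column per response.
coef(f.mlm)

## The same numbers, assembled from the two separate fits.
cbind(y1 = coef(f.mlm1), y2 = coef(f.mlm2))

Both calls should return the same matrix of estimates, which is exactly the
equivalence visible in the summaries above.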



Re: multivariate regression and lm()

John Fox
Dear Ernest,

The ML estimator for the multivariate linear model, assuming multinormal errors, is the same as equation-by-equation LS, but multivariate tests performed for all of the responses on the resulting multivariate linear model object (e.g., by anova(), or by Anova() in the car package) take the correlations among the responses into account (see the sketch below).

I hope that this helps,
 John

------------------------------------------------
John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
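
A minimal sketch of the approach described above (not part of the original
reply), refitting the model from the question; it assumes the car package is
installed:

## Refit the multivariate linear model from the question.
ex7.8 <- data.frame(z1 = c(0, 1, 2, 3, 4),
                    y1 = c(1, 4, 3, 8, 9),
                    y2 = c(-1, -1, 2, 3, 2))
f.mlm <- lm(cbind(y1, y2) ~ z1, data = ex7.8)

## Multivariate test of z1 across both responses (Pillai's trace by default).
## Unlike the per-response summaries, this uses the full residual covariance
## of (y1, y2).
anova(f.mlm)

## Type II MANOVA tests from the car package.
library(car)
Anova(f.mlm)

## Estimated error covariance matrix, i.e. the between-response correlation
## that the multivariate tests take into account.
estVar(f.mlm)

anova() on an object of class "mlm" reports Pillai's trace by default; Wilks,
Hotelling-Lawley, and Roy statistics can be requested through its test
argument.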
