how to run linear regression models at once

wangwallace
hey, folks,

I have two very simple questions. I am not quite sure if I can do this using R.

so, I am analyzing a large data frame with thousands of variables. For example:

Dependent variables: d1, d2, d3 (i.e., there are three dependent variables)

Independent variables: s1, s2, s3, ......s1000 (i.e., there are 1000 independent variables)

Now I want to run simple linear regressions of each dependent variable on each independent variable. The laborious way is to fit 1000 separate models for each dependent variable, which would take a lot of time. My questions are:

1) Is there an easy way to run all 1000 linear regression analyses for each dependent variable at once?
2) After running all 1000 linear regression analyses at once, how can I write the 3000 regression weights (1000 for each dependent variable) and their significance levels into a file (e.g., csv, Excel, etc.)?

Many thanks in advance!!

Re: how to run linear regression models at once

djmuseR
Hi:

On Fri, Jan 7, 2011 at 7:04 AM, wangwallace <[hidden email]> wrote:

>
> hey, folks,
>
> I have two very simple questions. I am not quite sure if I can do this
> using R.
>
> so, I am analyzing a large data frame with thousands of variables. For
> example:
>
> Dependent variables: d1, d2, d3 (i.e., there are three dependent variables)
>
> Independent variables: s1, s2, s3, ......s1000 (i.e., there are 1000
> independent variables)
>
> now I want to run simple linear regression analyses of dependent variables
> on independent variables. A laborious way to do this is running 1000 linear
> regression models for each dependent variable separately. This would take a
> lot of time. My question is:
>
> 1) is there an easy way to run all 1000 linear regression analyses for each
> dependent variable at once?
>

Yes.

> 2) after running all 1000 linear regression analyses at once, how can I
> write 3000 regression weights (1000 regression weights for each dependent
> variable) and their significance levels into a file (e.g., csv, excel, etc.).
>

Define 'weights'. Do you mean the parameter estimates?

Here's a simplified example, but the output is a separate list object for
each response. Each component of each list is an lm object, produced from
the lm() function.

# Generate some fake data: three responses and eight covariates
df <- data.frame(y1 = rnorm(50), y2 = rnorm(50), y3 = rnorm(50),
                 x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50),
                 x4 = rpois(50, 30), x5 = rpois(50, 20), x6 = rpois(50, 10),
                 x7 = runif(50), x8 = runif(50))

# Create a vector of covariate names
xs <- paste('x', 1:8, sep = '')
# Initialize one list per response, each with one slot per element of xs
rl1 <- vector('list', length(xs))
rl2 <- vector('list', length(xs))
rl3 <- vector('list', length(xs))

# The following loop regresses all three responses individually on a single
# covariate xs[i] and exports the results to separate lists for each response.
# The first three statements create formulas with the name of the i-th covariate;
# the last three do the regressions and assign each output to the i-th
# component of the list corresponding to the j-th response (j = 1, 2, 3).
for (i in seq_along(xs)) {
  fm1 <- as.formula(paste('y1', xs[i], sep = '~'))
  fm2 <- as.formula(paste('y2', xs[i], sep = '~'))
  fm3 <- as.formula(paste('y3', xs[i], sep = '~'))
  rl1[[i]] <- lm(fm1, data = df)
  rl2[[i]] <- lm(fm2, data = df)
  rl3[[i]] <- lm(fm3, data = df)
}

# The print method of lm() applied to the first component is
rl1[[1]]
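The three parallel lists can also be built in a single pass. This is a minimal sketch, not part of the original reply, assuming the response names sit in a hypothetical vector `ys` and the covariate names in `xs`; it uses base R's `reformulate()` instead of `paste()`/`as.formula()` and returns a nested list indexed first by response, then by covariate, so it scales to any number of responses without copying the loop body.

```r
# Simulated data (assumption): three responses, three covariates
set.seed(1)
df <- data.frame(y1 = rnorm(50), y2 = rnorm(50), y3 = rnorm(50),
                 x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))

ys <- c('y1', 'y2', 'y3')   # hypothetical vector of response names
xs <- c('x1', 'x2', 'x3')   # covariate names

# Outer lapply: one list per response; inner lapply: one lm fit per covariate
fits <- lapply(ys, function(y) {
  res <- lapply(xs, function(x) lm(reformulate(x, response = y), data = df))
  names(res) <- xs
  res
})
names(fits) <- ys

# fits$y2$x3 is the lm object for the model y2 ~ x3
coef(fits$y2$x3)
```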

Each component of each list will resemble the output object you would get
from running lm() on a single response with one explanatory variable at a
time. In each list, there are as many components as there are covariates.

You can extract all sorts of things from each component of the list; I
prefer using the ldply() function from package plyr, but there are other
ways with base functions. Here are some examples:

library(plyr)
# R^2 values:
ldply(rl1, function(x) summary(x)$r.squared)

# Model coefficients:
ldply(rl1, function(x) coef(x))

# p-values of significance tests for intercept and slope
ldply(rl1, function(x) summary(x)$coefficients[, 4])

# residuals from each model
res1 <- t(ldply(rl1, function(x) resid(x)))   # produces a matrix
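For the second part of the question (writing the estimates and their significance levels to a file), one base-R possibility is to collect one row per (response, covariate) pair and pass the result to `write.csv()`. This is a sketch assuming numeric covariates, so that row 2 of the coefficient table is the slope; a factor covariate would contribute more than one row. The output path in the session's temporary directory is hypothetical. With 3 responses and 1000 covariates this would produce the 3000 rows the question asks about.

```r
# Simulated data (assumption): two responses, three numeric covariates
set.seed(3)
df <- data.frame(y1 = rnorm(50), y2 = rnorm(50),
                 x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
ys <- c('y1', 'y2')
xs <- c('x1', 'x2', 'x3')

# One row per (response, covariate) pair
grid <- expand.grid(response = ys, covariate = xs, stringsAsFactors = FALSE)

# Fit each simple regression and keep the slope row of the coefficient table
# (columns: Estimate, Std. Error, t value, Pr(>|t|))
rows <- lapply(seq_len(nrow(grid)), function(k) {
  fit <- lm(reformulate(grid$covariate[k], response = grid$response[k]), data = df)
  ct <- summary(fit)$coefficients
  data.frame(response  = grid$response[k],
             covariate = grid$covariate[k],
             estimate  = ct[2, 1],
             std.error = ct[2, 2],
             p.value   = ct[2, 4])
})
results <- do.call(rbind, rows)

# Hypothetical output file in the temporary directory
out <- file.path(tempdir(), 'regression_results.csv')
write.csv(results, file = out, row.names = FALSE)
```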

Some comments:

1. If you want to run multivariate regression on the response matrix [y1 y2 y3],
   you only need one list object for output. Substitute 'cbind(y1, y2, y3)' for
   yi when creating the formula. Only one call is needed for multivariate
   regression rather than three for the univariate regressions. However, you
   would then need to be more careful about output structures and pulling them
   together over list components. For instance, in a multivariate regression,
   the coefficients are output as a matrix rather than a vector, so to combine
   them all, you would need to use a 3D array rather than a data frame.
2. The 'x' in the anonymous functions in ldply() corresponds to a generic list
   component. In this context, x is an lm object, so anything you could do with
   the output of an lm object could be mapped componentwise with ldply. This
   approach is very convenient if you want to pick off individual output pieces
   (e.g., R^2 or the root MSE) for all the regressions at once.
3. To associate the covariates with rows of output in each generated data
   frame above, you could use, e.g.,
     r2y1 <- ldply(rl1, function(x) summary(x)$r.squared)
     rownames(r2y1) <- xs

   In the residual example, the data frame is 8 x 50; using the t() function
   [for transpose], the result is a 50 x 8 matrix whose columns correspond to
   the individual covariates, so in this case
     colnames(res1) <- xs
4. Make sure you organize your code and desired output in advance; with 1000
   covariates, things could get messy if you're not careful.
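Comment 1 can be sketched as follows, assuming simulated data with three responses and two covariates. Each `cbind(y1, y2, y3) ~ x` fit returns a 2 x 3 coefficient matrix (intercept and slope by response), and `sapply(..., simplify = 'array')` stacks those matrices into the 3D array mentioned above. The row label 'slope' is an arbitrary choice made here because the slope row name differs from covariate to covariate.

```r
# Simulated data (assumption): three responses, two covariates
set.seed(4)
df <- data.frame(y1 = rnorm(50), y2 = rnorm(50), y3 = rnorm(50),
                 x1 = rnorm(50), x2 = rnorm(50))
xs <- c('x1', 'x2')

# One multivariate fit per covariate: cbind(y1, y2, y3) ~ x
mfits <- lapply(xs, function(x)
  lm(as.formula(paste('cbind(y1, y2, y3)', x, sep = '~')), data = df))

# coef() of a multivariate fit is a 2 x 3 matrix (terms by responses)
coef(mfits[[1]])

# Stack the matrices into a 2 x 3 x length(xs) array and label every dimension
cf <- sapply(mfits, coef, simplify = 'array')
dimnames(cf) <- list(c('(Intercept)', 'slope'), c('y1', 'y2', 'y3'), xs)
cf['slope', , ]   # slopes: responses in rows, covariates in columns
```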

HTH,
Dennis

> Many thanks in advance!!
> --
> View this message in context:
> http://r.789695.n4.nabble.com/how-to-run-linear-regression-models-at-once-tp3179256p3179256.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Re: how to run linear regression models at once

wangwallace
cool, it worked. thank you very much for your help!! :)

Re: how to run linear regression models at once

wangwallace
In reply to this post by djmuseR
hey, Dennis,

I applied your code to my data frame with different variable names. Specifically, I replaced y1 and y2 with ocbi and ocbo, and got the following error message. Could you please explain why this happens? Again, thanks!
 
> for(i in 1:5){fm1<-as.formula(paste('ocbi',ms[i],sep='~'))
+ fm2<-as.formula(paste('ocbo',ms[i],sep'~'))

Error: unexpected string constant in:
"for(i in 1:5){fm1<-as.formula(paste('ocbi',ms[i],sep='~'))
fm2<-as.formula(paste('ocbo',ms[i],sep'~'"

Re: how to run linear regression models at once

David Winsemius

On Jan 10, 2011, at 1:29 PM, wangwallace wrote:

>
> hey, Dennis,
>
> I applied your syntax into my data frame with different variable names.
> Specifically, I replaced y1 and y2, with ocbi and ocbo. I got the following
> error message. Could you please explain why would this happen? Again,
> thanks!
>
>> for(i in 1:5){fm1<-as.formula(paste('ocbi',ms[i],sep='~'))
> + fm2<-as.formula(paste('ocbo',ms[i],sep'~'))
>
> Error: unexpected string constant in:
> "for(i in 1:5){fm1<-as.formula(paste('ocbi',ms[i],sep='~'))

When you get a message like that, it is going to be a missing <something>,
usually right near the end of the string presented in the error message,
and in your case it is a missing "=" sign after sep:

> fm2<-as.formula(paste('ocbo',ms[i],sep'~'"
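For reference, here is the loop with `sep = '~'` restored in the second statement. The data are a simulated stand-in: the poster's real data frame was not shown, so the data frame `dat` and the covariate names m1..m5 stored in `ms` are hypothetical.

```r
# Hypothetical stand-in for the poster's data: responses ocbi and ocbo,
# five covariates whose names (m1..m5, made up here) are stored in ms
set.seed(5)
ms <- paste0('m', 1:5)
dat <- as.data.frame(matrix(rnorm(40 * 7), ncol = 7,
                            dimnames = list(NULL, c('ocbi', 'ocbo', ms))))

rl1 <- vector('list', length(ms))
rl2 <- vector('list', length(ms))
for (i in seq_along(ms)) {
  fm1 <- as.formula(paste('ocbi', ms[i], sep = '~'))
  fm2 <- as.formula(paste('ocbo', ms[i], sep = '~'))   # '=' restored after sep
  rl1[[i]] <- lm(fm1, data = dat)
  rl2[[i]] <- lm(fm2, data = dat)
}
```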

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: how to run linear regression models at once

wangwallace
I was too careless... thank you very much for pointing that out!

Re: how to run linear regression models at once

myjournalclub
This post has NOT been accepted by the mailing list yet.
In reply to this post by djmuseR
Dear Dennis,

I have tried your method, but it fails with the following error message:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
Do you have any advice? Thanks in advance, and apologies for any ignorance, as I am new.
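The post does not show the data, so this is only a likely explanation: lm() raises this error when a factor (or character) covariate has fewer than two distinct values among the rows it actually uses, for example a constant column, or one whose other values all sit in rows dropped for missing data. A minimal sketch, assuming simulated data in which the hypothetical column x2 has a single level, reproduces the message and then skips such covariates before fitting.

```r
# Simulated data (assumption): x2 is a factor with only one level
set.seed(6)
df <- data.frame(y1 = rnorm(20),
                 x1 = rnorm(20),
                 x2 = factor(rep('a', 20)),
                 x3 = factor(rep(c('a', 'b'), 10)))
xs <- c('x1', 'x2', 'x3')

# Fitting y1 ~ x2 reproduces the error; capture its message
err <- tryCatch(lm(y1 ~ x2, data = df), error = conditionMessage)
err

# Keep covariates with at least two distinct values among the rows lm()
# would use (rows where both y1 and the covariate are non-missing)
usable <- vapply(xs, function(x) {
  ok <- complete.cases(df[, c('y1', x)])
  length(unique(df[[x]][ok])) >= 2
}, logical(1))

fits <- lapply(xs[usable], function(x) lm(reformulate(x, response = 'y1'), data = df))
names(fits) <- xs[usable]
names(fits)
```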
