automating regression or correlations for many variables

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

automating regression or correlations for many variables

geral
Dear All,

I have a large data frame with 10 rows and 82 columns. I want to apply the same function to all of the columns with a single command. e.g. zl <- lm (snp$a_109909 ~ snp$lat) will fit a linear model to the values in lat and  a_109909. What I want to do is fit linear models for the values in each column against lat. I tried doing zl <- (snp[,2:82] ~ snp$lat[,1]) but got the following error message "Error in model.frame.default(formula = snp[, 2:82] ~ snp[, 1], drop.unused.levels = TRUE) :
  invalid type (list) for variable 'snp[, 2:82]'". Does this mean I cannot use a data frame to do this? Should I have my data in a matrix instead?

Thanks
Reply | Threaded
Open this post in threaded view
|

Re: automating regression or correlations for many variables

djmuseR
Hi:

Here's a small example:
> df <- data.frame(y1 = rnorm(10), y2 = rnorm(10), y3 = rnorm(10), lat =
rnorm(10))
> m <- lm(cbind(y1, y2, y3) ~ lat, data = df)
> summary(m)
<snip...provides summaries for each response>

The LHS of the model formula needs to be a matrix. In your case, something
like

m <- lm(as.matrix(snp[, -1])  ~ lat, data = snp)

ought to work.

HTH,
Dennis


On Mon, Apr 4, 2011 at 10:13 AM, geral <[hidden email]> wrote:

> Dear All,
>
> I have a large data frame with 10 rows and 82 columns. I want to apply the
> same function to all of the columns with a single command. e.g. zl <- lm
> (snp$a_109909 ~ snp$lat) will fit a linear model to the values in lat and
> a_109909. What I want to do is fit linear models for the values in each
> column against lat. I tried doing zl <- (snp[,2:82] ~ snp$lat[,1]) but got
> the following error message "Error in model.frame.default(formula = snp[,
> 2:82] ~ snp[, 1], drop.unused.levels = TRUE) :
>  invalid type (list) for variable 'snp[, 2:82]'". Does this mean I cannot
> use a data frame to do this? Should I have my data in a matrix instead?
>
> Thanks
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/automating-regression-or-correlations-for-many-variables-tp3426091p3426091.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: automating regression or correlations for many variables

geral
Thanks!

I must confess I am just a beginner, but I followed your suggestion and did 'm <- lm(as.matrix(snp[, -1])  ~ lat, data = snp) ' and it worked perfectly. I would like to understand what is being done here. as.matrix I understand makes my data frame be a matrix, but I don't understand the part snp[,-1]. I think I am saying use all rows of... but I don't understand the -1, does that mean, all columns but column 1?

I really appreciate the help! It was really really valuable.

A related question is, what is m? If I do is.object(m) I get true. so I guess it is an object. I now need to use each of the fitted models for more things, such as calculate anova, etc... How do I do that? When I do this column by column I just do anova(m), but it does not seem to work now...

Thanks!
AG
Reply | Threaded
Open this post in threaded view
|

Re: automating regression or correlations for many variables

djmuseR
Hi:

On Mon, Apr 4, 2011 at 1:33 PM, geral <[hidden email]> wrote:

> Thanks!
>
> I must confess I am just a beginner, but I followed your suggestion and did
> 'm <- lm(as.matrix(snp[, -1])  ~ lat, data = snp) ' and it worked
> perfectly.
> I would like to understand what is being done here. as.matrix I understand
> makes my data frame be a matrix, but I don't understand the part snp[,-1].
> I
> think I am saying use all rows of... but I don't understand the -1, does
> that mean, all columns but column 1?
>

Yes.

>
> I really appreciate the help! It was really really valuable.
>
> A related question is, what is m? If I do is.object(m) I get true. so I
> guess it is an object. I now need to use each of the fitted models for more
> things, such as calculate anova, etc... How do I do that? When I do this
> column by column I just do anova(m), but it does not seem to work now...
>

m is the object that gets returned from the model fits. It doesn't just
perform individual linear regressions, it also does multivariate regression.
anova(m) returns a summary of the multivariate tests. summary(m) will print
out the results of the individual fits. I'm fairly certain you want to pick
off individual pieces of output from each model, though, so here's one
approach using the plyr package again.

You'll have a bunch of model fits assigned to the model object m, so the
first step is to assign the summary of each individual model to a list
component. Afterward, it's fairly easy to cherry pick individual pieces.

library(plyr)
h <- llply(summary(m))        # decompose the summaries into a list, one
component for each model

# Significance tests of coefficients in each model
# Each of these components is a matrix, so an easy first step is to use
llply() [list -> list]

llply(h, function(x) coefficients(x))
$`Response y1`
             Estimate Std. Error   t value  Pr(>|t|)
(Intercept) 0.2213683   0.392951 0.5633484 0.5886349
lat         0.3510525   0.511839 0.6858652 0.5121807

$`Response y2`
              Estimate Std. Error    t value  Pr(>|t|)
(Intercept)  0.2862828  0.3803295  0.7527231 0.4731816
lat         -0.8794926  0.4953989 -1.7753222 0.1137593

$`Response y3`
              Estimate Std. Error   t value   Pr(>|t|)
(Intercept) -0.5229037  0.2773981 -1.885030 0.09615855
lat         -0.5023407  0.3613253 -1.390272 0.20190305

# This can be reshaped using the melt() and possibly cast() functions in the
reshape[2] package if desired.

# Everybody loves R^2, so let's create a model-by-model summary using
ldply() [list -> data frame]:
# Note that r.squared is a scalar in each model, so we can ship the result
to a data frame

ldply(h, function(x) x$r.squared)
          .id        V1
1 Response y1 0.0555358
2 Response y2 0.2826250
3 Response y3 0.1945923

Notice that in each case, we are using an 'anonymous function' to take a
model summary object as input (x) and then extracting some piece from it.
llply() and ldply() are convenience functions of sorts for lapply() and
do.call().

The pieces that can be extracted from the output of summary.lm() are
names(h[[1]])    # List components of the summary of an individual model
 [1] "call"          "terms"         "residuals"     "coefficients"
 [5] "aliased"       "sigma"         "df"            "r.squared"
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled"

HTH,
Dennis

Thanks!

> AG
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/automating-regression-or-correlations-for-many-variables-tp3426091p3426519.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: automating regression or correlations for many variables

geral
Thanks!

You are awesome! I am not sure I follow everything, but I am trying!

AG