Hi,
I am trying to recursively apply a function to a selection of columns in a dataframe. I've had a look around and from what I have read, I should be using some version of the apply function, but I'm really having some headaches with it.

Let me be more specific with an example.

Say I have a data frame similar to the following

A    x    y    z    r1   r2   r3   r4
0.1  0.2  0.1  ...
0.1  0.3  ...
0.2  ...

i.e., a number of columns, each of the same length, and all containing real numbers. Of these columns, I want to model one variable, say A, as a function of other variables, say x, y, z, and any one of my r1, r2, r3, ... variables.

i.e., I want to model

A ~ x + y + z + r1
A ~ x + y + z + r2
....
A ~ x + y + z + rn

But where the number of 'r' variables I will have will be large, and I don't know the specific number of these variables in advance.

My question first is, how can I select all the columns in a dataframe that have a heading that matches a string pattern?

And then related to this, what would be the best way of repeatedly applying my modelling function to the result?

Many thanks for any help for this occasional R amateur.

Claus

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Claus,
On Tue, Mar 15, 2011 at 9:33 AM, Claus O'Rourke <[hidden email]> wrote:
> Hi,
> I am trying to recursively apply a function to a selection of columns
> in a dataframe. I've had a look around and from what I have read, I
> should be using some version of the apply function, but I'm really
> having some headaches with it.

I would just do it in a loop (see below).

> Let me be more specific with an example.
>
> Say I have a data frame similar to the following
>
> A    x    y    z    r1   r2   r3   r4
> 0.1  0.2  0.1  ...
> 0.1  0.3  ...
> 0.2  ...
>
> i.e., a number of columns, each of the same length, and all containing
> real numbers. Of these columns, I want to model one variable, say A,
> as a function of other variables, say x, y, z, and any one of my r1,
> r2, r3, ... variables.
>
> i.e., I want to model
> A ~ x + y + z + r1
> A ~ x + y + z + r2
> ....
> A ~ x + y + z + rn
>
> But where the number of 'r' variables I will have will be large, and I
> don't know the specific number of these variables in advance.
>
> My question first is, how can I select all the columns in a dataframe
> that have a heading that matches a string pattern?

See ?grep.

> And then related to this, what would be the best way of repeatedly
> applying my modelling function to the result?

Well, I don't know about the "best" way. But why not just

set.seed(21)
dat <- as.data.frame(matrix(rnorm(100000), ncol = 100,
         dimnames = list(1:1000, c("A", "x", "y", "z", paste("r", 1:96, sep = "")))))

# Fit one model per r column. The anchored pattern matches only names of
# the form r<digits>, not every name that happens to contain an "r".
mods <- list()
for (i in grep("^r[0-9]+$", names(dat), value = TRUE)) {
  mods[[i]] <- lm(as.formula(paste("A ~ x + y + z +", i)), data = dat)
}

Note that you should be cautious about making any inferences based on this kind of method. In the example above 9 r variables are "significant" at the .05 level, even though the data was generated "randomly":

# Row 5 of the coefficient table is the r term; column 4 is its p-value.
sort(sapply(mods, function(x) coef(summary(x))[5, 4]))

Best,
Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
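Since the question asked about the apply family, the same idea can also be sketched with lapply() instead of a for loop. This is only one possible variant, not the method from the reply: the anchored pattern "^r[0-9]+$" assumes the columns of interest are named "r" followed by digits, reformulate() is used in place of pasting the formula together by hand, and Holm's method in p.adjust() is just one of several ways to account for the many tests the reply warns about.

```r
# Simulated pure-noise data with the same layout as the example in the thread.
set.seed(21)
dat <- as.data.frame(matrix(rnorm(100000), ncol = 100,
                            dimnames = list(1:1000,
                                            c("A", "x", "y", "z",
                                              paste0("r", 1:96)))))

# Select columns by name. The pattern is an assumption about the naming
# scheme: "r" followed only by digits.
r_vars <- grep("^r[0-9]+$", names(dat), value = TRUE)

# Fit one model per r variable; reformulate() builds A ~ x + y + z + <r var>.
mods <- lapply(r_vars, function(v) {
  lm(reformulate(c("x", "y", "z", v), response = "A"), data = dat)
})
names(mods) <- r_vars

# Pull out the p-value of each r term by name rather than by position.
p_raw <- sapply(r_vars, function(v) coef(summary(mods[[v]]))[v, "Pr(>|t|)"])

# Adjust for the number of tests performed (Holm's method).
p_adj <- p.adjust(p_raw, method = "holm")

sum(p_raw < 0.05)  # number of unadjusted "hits"
sum(p_adj < 0.05)  # number of hits after adjustment
```

Indexing the coefficient table by row and column name keeps the extraction correct even if the number of fixed predictors changes later.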
Brilliant - that was really useful!
On Tue, Mar 15, 2011 at 3:46 PM, Ista Zahn <[hidden email]> wrote:
> [...]