Newbie-ish question on iteratively applying function to dataframe

Claus O'Rourke
Hi,
I am trying to recursively apply a function to a selection of columns
in a dataframe. I've had a look around and from what I have read, I
should be using some version of the apply function, but I'm really
having some headaches with it.

Let me be more specific with an example.

Say I have a data frame similar to the following

A     x     y     z     r1    r2    r3    r4
0.1  0.2  0.1 ...
0.1  0.3 ...
0.2 ...

i.e., a number of columns, each of the same length, and all containing
real numbers. Of these columns, I want to model one variable, say A,
as a function of other variables, say x, y, z, and any one of my r1,
r2, r3, ... variables.

i.e., I want to model
A ~ x + y + z + r1
A ~ x + y + z + r2
....
A ~ x + y + z + rn

The number of 'r' variables will be large, and I don't know in advance
exactly how many there will be.

My question first is, how can I select all the columns in a dataframe
that have a heading that matches a string pattern?

And then related to this, what would be the best way of repeatedly
applying my modelling function to the result?

Many thanks for any help for this occasional R amateur.

Claus

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Newbie-ish question on iteratively applying function to dataframe

Ista Zahn-2
Hi Claus,

On Tue, Mar 15, 2011 at 9:33 AM, Claus O'Rourke <[hidden email]> wrote:
> Hi,
> I am trying to recursively apply a function to a selection of columns
> in a dataframe. I've had a look around and from what I have read, I
> should be using some version of the apply function, but I'm really
> having some headaches with it.

I would just do it in a loop (see below)

> My question first is, how can I select all the columns in a dataframe
> that have a heading that matches a string pattern?

?grep
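As a small illustration of the grep() hint, here is a minimal sketch of selecting columns by a name pattern. The toy data frame and its column names (including "year") are made up for the example; the anchored pattern "^r[0-9]+$" is one possible choice, assuming the columns of interest are named r1, r2, and so on.

```r
# Hypothetical toy data frame; "year" is included to show why anchoring matters
df <- data.frame(A = 1:3, x = 4:6, year = 7:9, r1 = 10:12, r2 = 13:15)

# An unanchored pattern matches any name containing "r", including "year"
loose <- grep("r", names(df), value = TRUE)

# An anchored pattern matches only names of the form r<digits>
r_cols <- grep("^r[0-9]+$", names(df), value = TRUE)

# Subset the data frame; drop = FALSE keeps a data frame even for one column
r_data <- df[, r_cols, drop = FALSE]

print(loose)
print(r_cols)
```

With value = TRUE, grep() returns the matching names themselves; without it, it returns their positions, and either form can be used to index the data frame.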

>
> And then related to this, what would be the best way of repeatedly
> applying my modelling function to the result?

Well, I don't know about the "best" way. But why not just

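set.seed(21)
# 1000 rows x 100 columns of random normal data: A, x, y, z, r1 ... r96
dat <- as.data.frame(matrix(rnorm(100000), ncol = 100,
    dimnames = list(1:1000, c("A", "x", "y", "z", paste("r", 1:96, sep = "")))))

# Fit one model per r column; "^r[0-9]+$" matches only names like r1, r2, ...
# (a bare "r" would also match any other column whose name contains an "r")
mods <- list()
for (i in grep("^r[0-9]+$", names(dat), value = TRUE)) {
    mods[[i]] <- lm(as.formula(paste("A ~ x + y + z +", i)), data = dat)
}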
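Since the original question asked about the apply family, the same idea can also be sketched with lapply(). This is only an alternative to the loop, not a better method; the smaller simulated data set and the helper name fit_one are assumptions made for the example. reformulate() from the stats package builds the formula from character strings, which avoids pasting a formula together by hand.

```r
set.seed(21)
# Smaller simulated data set: A, x, y, z plus six hypothetical r columns
dat <- as.data.frame(matrix(rnorm(2000), ncol = 10,
    dimnames = list(NULL, c("A", "x", "y", "z", paste0("r", 1:6)))))

r_cols <- grep("^r[0-9]+$", names(dat), value = TRUE)

# fit_one is a hypothetical helper: fits A ~ x + y + z + <one r column>
fit_one <- function(r) {
    lm(reformulate(c("x", "y", "z", r), response = "A"), data = dat)
}

# Naming the input vector makes lapply() return a list named by r column
mods <- lapply(setNames(r_cols, r_cols), fit_one)

print(names(mods))
print(coef(mods[["r3"]]))
```

The result is a named list of lm objects, so mods[["r3"]] retrieves the fit for r3 just as with the loop version.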

Note that you should be cautious about making any inferences based on
this kind of method. In the example above 9 r variables are
"significant" at the .05 level, even though the data was generated
"randomly":

sort(sapply(mods, function(x) coef(summary(x))[5, 4]))
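To make the caution concrete: with 96 tests at the .05 level, roughly 96 * 0.05, or about 5, "significant" results would be expected from noise alone. One common hedge is to adjust the p-values for multiple comparisons with p.adjust() from the stats package. The sketch below is an illustration only; the smaller simulated data set and the choice of the Holm method are assumptions, not something the thread prescribes.

```r
set.seed(21)
# Simulated noise only: A, x, y, z plus 20 hypothetical r columns
dat <- as.data.frame(matrix(rnorm(200 * 24), ncol = 24,
    dimnames = list(NULL, c("A", "x", "y", "z", paste0("r", 1:20)))))

r_cols <- grep("^r[0-9]+$", names(dat), value = TRUE)
mods <- lapply(setNames(r_cols, r_cols), function(r)
    lm(reformulate(c("x", "y", "z", r), response = "A"), data = dat))

# Look up each p-value by row and column name rather than by position [5, 4]
p_raw <- sapply(r_cols, function(r) coef(summary(mods[[r]]))[r, "Pr(>|t|)"])

# Holm adjustment controls the family-wise error rate across all the tests
p_holm <- p.adjust(p_raw, method = "holm")

print(head(sort(p_holm)))
```

Indexing the coefficient table by name keeps the extraction correct even if the number of fixed predictors in the formula changes.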

Best,
Ista


--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

Re: Newbie-ish question on iteratively applying function to dataframe

Claus O'Rourke
Brilliant - that was really useful!
