"by" question

"by" question

David Hugh-Jones-3
Hello all

I have a big data frame and I regularly want to break it down into subsets,
calculate some new data, and add it back to the data frame.

At the moment my technique seems a bit ugly and embarrassing. Something
like:

result <- by(mydata, mydata$some_factor, function (x) {
  # do something to create a vector v with length(v) == nrow(x)
  return(v)
})
# now result has a big list, argh... how do I put it neatly back into the
# mydata data frame?
for (i in unique(mydata$some_factor)) {
  mydata$newvar[mydata$some_factor == i] <- result[[i]]
}

What should I be doing instead of this?

David Hugh-Jones
Post-doctoral Researcher
Max Planck Institute of Economics, Jena
http://davidhughjones.googlepages.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: "by" question

jholtman
How about something like this:

> x
   id      data
1   1 0.7773207
2   3 0.9606180
3   2 0.4346595
4   3 0.7125147
5   2 0.3999944
6   2 0.3253522
7   2 0.7570871
8   3 0.2026923
9   3 0.7111212
10  2 0.1216919
> # compute running sum for each ID
> x$run <- ave(x$data, x$id, FUN=cumsum)
> x
   id      data       run
1   1 0.7773207 0.7773207
2   3 0.9606180 0.9606180
3   2 0.4346595 0.4346595
4   3 0.7125147 1.6731327
5   2 0.3999944 0.8346539
6   2 0.3253522 1.1600060
7   2 0.7570871 1.9170932
8   3 0.2026923 1.8758249
9   3 0.7111212 2.5869462
10  2 0.1216919 2.0387851
>
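
For anyone who wants to paste this in directly, here is a minimal self-contained sketch of the same idea. It rebuilds the data frame from the values printed above (rounded to seven decimals, so the last digit of a sum may differ slightly from the listing); nothing else is assumed.

```r
# Rebuild the example data frame from the values shown in the listing above.
x <- data.frame(
  id   = c(1, 3, 2, 3, 2, 2, 2, 3, 3, 2),
  data = c(0.7773207, 0.9606180, 0.4346595, 0.7125147, 0.3999944,
           0.3253522, 0.7570871, 0.2026923, 0.7111212, 0.1216919)
)

# ave() splits x$data by x$id, applies FUN to each piece, and puts the
# results back in the original row order, so the result can be assigned
# straight into a new column.
x$run <- ave(x$data, x$id, FUN = cumsum)

print(x)
```

The point of ave() here is that it returns a vector as long as its first argument, aligned row for row with the data frame, so no loop is needed to put the pieces back.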



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Re: "by" question

David Hugh-Jones-3
That seems to work. I should add that to make "ave" work like "by" one can
do:

mydata$newvar <- ave(seq_len(nrow(mydata)), mydata$some_factor, FUN = function (x) {
  x <- mydata[x, ]  # x arrives as row indices; replace it with those rows
  # ... etc...
})
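
A filled-in sketch of that row-index trick, in case it helps. The columns value and weight and the calculation inside the function are made up for illustration; only the pattern of passing row numbers to ave() and looking up the rows inside FUN comes from the post above.

```r
# Hypothetical data: some_factor defines the groups, value and weight are
# invented columns used only to have something to compute.
mydata <- data.frame(
  some_factor = c("a", "b", "a", "b", "a"),
  value       = c(1, 10, 2, 20, 3),
  weight      = c(1, 1, 2, 1, 3)
)

# Pass row numbers instead of a data column, so FUN can pull out the
# whole sub-data-frame for its group, much as by() would hand it over.
mydata$newvar <- ave(seq_len(nrow(mydata)), mydata$some_factor,
                     FUN = function(idx) {
  sub <- mydata[idx, ]                      # all columns for this group
  # Any vector of length nrow(sub) will do; this one uses two columns.
  sub$value * sub$weight / sum(sub$weight)
})

print(mydata)
```

The function has to return one value per row of the group (or a single value to be recycled), since ave() writes the result back into the positions it took the indices from.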

Thanks!
David

Re: "by" question

hadley wickham
You might also want to look at the plyr package,
http://had.co.nz/plyr.  In particular, ddply + transform makes these
tasks very easy.

library(plyr)
ddply(mtcars, "cyl", transform, pos = seq_along(cyl), mpg_avg = mean(mpg))
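
For comparison, a rough base-R sketch that adds the same two columns with ave(), for readers who do not have plyr installed. The name cars is just a working copy introduced here. As far as I know the ddply call returns its rows grouped by cyl, whereas this version keeps the original row order, so the two results are not identical row for row.

```r
# Work on a copy of the built-in mtcars data set.
cars <- mtcars

# Position of each row within its cyl group (1, 2, 3, ... per group).
cars$pos <- ave(seq_along(cars$cyl), cars$cyl, FUN = seq_along)

# Group mean of mpg repeated on every row of the group;
# ave() uses FUN = mean when FUN is not given.
cars$mpg_avg <- ave(cars$mpg, cars$cyl)

head(cars[, c("cyl", "mpg", "pos", "mpg_avg")])
```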

Hadley


--
http://had.co.nz/
