Data Table Subset Question


Bernstein, Elliot J

Is there a way to subset a data table by the result of a grouped aggregation without adding an interim column to the table? For example, if I want to select all rows for which the group mean value of x is less than 10, I can do the following:

 

library(data.table)

data <- data.table(x = 1:20, g = rep(c("a", "b"), each = 10))
data[, mean.x := mean(x), by = .(g)]
data[mean.x < 10,]

 

But I’m not really interested in “mean.x”. Can I do the same thing without adding it to the table?

 

Thanks.

 

- Elliot


_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Re: Data Table Subset Question

Frank Erickson-2
One idiom for testing group-level conditions is:

data[, if (mean(x) < 10) .SD, by=g]
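Applied to the sample table from the question, the idiom can be sketched as follows. The name res is only for illustration. The point to notice is that j returns .SD for groups that pass the test and NULL for the rest, so failing groups simply contribute no rows.

```r
library(data.table)

data <- data.table(x = 1:20, g = rep(c("a", "b"), each = 10))

# For each group, j evaluates to .SD when the condition holds and to
# NULL otherwise; groups returning NULL add no rows to the result.
res <- data[, if (mean(x) < 10) .SD, by = g]

print(res)          # rows of group "a" only (group means are 5.5 and 15.5)
print(names(res))   # the grouping column g comes first, then x
print(names(data))  # the original table still has only x and g
```

Note that the result lists the by column first, so the column order can differ from the original table.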

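This idiom might be slower than necessary in the special case of taking a mean, because an if () expression in j is evaluated group by group instead of going through data.table's optimized grouped mean. See ?GForce.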
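If speed matters, one possible alternative is to aggregate first and then join back, so that the grouped mean is written in the plain form that ?GForce describes. This is only a sketch: the names group_means, keep, m and res are made up for illustration, and whether it is actually faster depends on the data and should be measured.

```r
library(data.table)

data <- data.table(x = 1:20, g = rep(c("a", "b"), each = 10))

# Step 1: one row per group. A plain mean(x) in j together with by= is
# the kind of expression that should be eligible for GForce.
group_means <- data[, .(m = mean(x)), by = g]

# Step 2: keep only the groups that satisfy the condition.
keep <- group_means[m < 10, .(g)]

# Step 3: join back to the original table to get all rows of those groups.
res <- data[keep, on = "g"]

print(res)
```

The intermediate tables are separate objects, so no interim column is added to data itself.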

There's a request for an idiom like SQL HAVING over here: https://github.com/Rdatatable/data.table/issues/788
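Independently of that request, a HAVING-style row filter is sometimes written with .I, which holds the row numbers of the current group within the full table. The sketch below uses the hypothetical names rows and res; like the .SD version, it is still evaluated group by group.

```r
library(data.table)

data <- data.table(x = 1:20, g = rep(c("a", "b"), each = 10))

# Collect the row numbers of every group whose mean passes the test.
# An unnamed vector returned from j ends up in a column called V1.
rows <- data[, if (mean(x) < 10) .I, by = g]$V1

# Subset the original table by those row numbers.
res <- data[rows]

print(res)
```

Because the final step is an ordinary row subset, the result keeps the original column order.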

--Frank

On Wed, Aug 16, 2017 at 4:44 PM, Bernstein, Elliot J <[hidden email]> wrote:

> Is there a way to subset a data table by the result of a grouped aggregation without adding an interim column to the table? [...]

Re: Data Table Subset Question

Bernstein, Elliot J

Thanks!

 

- Elliot

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Frank Erickson
Sent: Wednesday, August 16, 2017 6:24 PM
To: Bernstein, Elliot J
Cc: [hidden email]
Subject: Re: [datatable-help] Data Table Subset Question

 

One idiom for testing group-level conditions is: [...]