All,

I’m curious, from a speed perspective, what the analog of apply is in data.table. I have a problem where, for each row, I want to take either the min or the max of several columns, depending on the value of another column. For example:

    test.dt <- data.table(ID=1:10, SCORE_1=rnorm(10), SCORE_2=rnorm(10),
                          SCORE_3=rnorm(10), MAX_OR_MIN=c(rep("Max", 5), rep("Min", 5)))

For each row I’d like the max of SCORE_1, SCORE_2, and SCORE_3 if MAX_OR_MIN is "Max", and the min of SCORE_1, SCORE_2, and SCORE_3 if MAX_OR_MIN is "Min".

It isn’t too difficult to come up with a “bulky” and slow solution, but I’m wondering if I’m missing a way in which data.table would make such an effort elegant and quick.

Any help greatly appreciated.

Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH 03821-0351
Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910
www.nciea.org

_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
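To make the question concrete, here is a minimal base-R sketch (not from the thread) that contrasts a row-by-row loop, standing in for the "bulky" approach, with a vectorised alternative built on pmax()/pmin() and ifelse(). The pmax/pmin approach is a swapped-in technique, not something proposed on the list; the seed, the data.frame copy test.df, and the names rowwise, vectorised, row_max and row_min are illustrative assumptions.

```r
# Illustrative data; set.seed() is added only to make this sketch reproducible.
set.seed(1)
test.df <- data.frame(ID = 1:10,
                      SCORE_1 = rnorm(10), SCORE_2 = rnorm(10), SCORE_3 = rnorm(10),
                      MAX_OR_MIN = c(rep("Max", 5), rep("Min", 5)),
                      stringsAsFactors = FALSE)
score_cols <- c("SCORE_1", "SCORE_2", "SCORE_3")

# Row-by-row baseline: one R-level function call per row (the "bulky" style).
scores <- as.matrix(test.df[score_cols])
rowwise <- vapply(seq_len(nrow(test.df)), function(i) {
  if (test.df$MAX_OR_MIN[i] == "Max") max(scores[i, ]) else min(scores[i, ])
}, numeric(1))

# Vectorised alternative: compute the parallel max and min over whole columns
# once, then pick element-wise according to MAX_OR_MIN.
row_max <- do.call(pmax, test.df[score_cols])
row_min <- do.call(pmin, test.df[score_cols])
vectorised <- ifelse(test.df$MAX_OR_MIN == "Max", row_max, row_min)

# Both approaches should agree row for row.
stopifnot(isTRUE(all.equal(rowwise, vectorised)))
print(cbind(test.df["ID"], RESULT = vectorised))
```

Inside data.table the same expression could presumably go straight into j, e.g. test.dt[, ifelse(MAX_OR_MIN == "Max", pmax(SCORE_1, SCORE_2, SCORE_3), pmin(SCORE_1, SCORE_2, SCORE_3))], with no by= needed because it works on whole columns at once. It has not been benchmarked here against the by=ID approach.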
Hi,
How about this:

    library(data.table)
    fns = c(max, min)   # a list of two functions, looked up by index
    test.dt <- data.table(ID=1:10, SCORE_1=rnorm(10), SCORE_2=rnorm(10),
                          SCORE_3=rnorm(10), fn=c(rep(1, 5), rep(2, 5)))   # 1 = max, 2 = min

    test.dt[, fns[[fn]](SCORE_1, SCORE_2, SCORE_3), by=ID]          # bug #1301 raised
    test.dt[, {fn; fns[[fn]](SCORE_1, SCORE_2, SCORE_3)}, by=ID]    # workaround

          ID         V1
     [1,]  1 -1.6788065
     [2,]  2 -1.4021021
     [3,]  3 -1.0469943
     [4,]  4 -1.2663419
     [5,]  5 -0.2765518
     [6,]  6  0.3511581
     [7,]  7  1.1809315
     [8,]  8  0.3570631
     [9,]  9  0.9680948
    [10,] 10  1.3025652

The bug is that the variable 'fn' is incorrectly not detected as being used by j, so it isn't being subset for each group. Maybe that's because it appears inside the [[ ]]. Using fn explicitly, as in the workaround, gets around that. I've raised bug #1301 to fix it.

Also, data.table could be enhanced to allow a column to contain a list of functions directly, rather than a lookup. That should be OK provided the column holds pointers to functions rather than the functions themselves repeated over and over. Might be quite useful; FR#1302 raised to do that. You can probably already create a data.frame or data.table with a list column containing functions, but I doubt operations on those columns work. Might not be very difficult to do, though.

Thanks for helping to discover a new bug and a new FR!

Matthew

On Sat, 2011-02-26 at 16:55 -0600, Damian Betebenner wrote:
> [...]
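Matthew's guess about the [[ ]] can be illustrated with base R's all.vars(). This sketch assumes, without checking it against the data.table source of the time, that column detection for j behaved like all.vars() with its default functions = FALSE. With that setting all.vars() skips whatever sits in the function position of a call, so the whole fns[[fn]] sub-call, and fn with it, is never visited; writing fn as a separate statement, as in the workaround, puts it in an argument position where it is found. The names j_original, j_workaround and the vars_* variables are hypothetical.

```r
# Hypothetical model of "which columns does j use?": assumes detection works
# like base R's all.vars(), which is not confirmed for data.table itself.
j_original   <- quote(fns[[fn]](SCORE_1, SCORE_2, SCORE_3))
j_workaround <- quote({ fn; fns[[fn]](SCORE_1, SCORE_2, SCORE_3) })

# functions = FALSE (the default) skips the function position of every call,
# so the fns[[fn]] sub-call is ignored entirely.
vars_original <- all.vars(j_original)
print(vars_original)     # c("SCORE_1", "SCORE_2", "SCORE_3")

# In the workaround, fn is a plain argument of `{`, so it is picked up.
vars_workaround <- all.vars(j_workaround)
print(vars_workaround)   # c("fn", "SCORE_1", "SCORE_2", "SCORE_3")

# Walking function positions as well also finds fn (and fns, and `[[`).
vars_with_functions <- all.vars(j_original, functions = TRUE)
print(vars_with_functions)
```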
Damian,
Bug #1301 fixed. The workaround in my earlier message is no longer needed.

Matthew

On Sun, 2011-02-27 at 17:14 +0000, Matthew Dowle wrote:
> [...]
