|
All,

Trying to use := well but get errors and warnings and am looking for an elegant way to subset and use := together when multiple variables are being created and factors are involved.

Here’s some code showing what I’m trying to do. Any help in doing this better greatly appreciated:

#########################################################
###
### Test of data.table and :=
###
#########################################################

require(data.table)

### Base data.table

test.dt <- data.table(ID=rep(1:10, 2), CONTENT_AREA=as.factor(rep(c("MATH", "READ"), each=10)), X=rnorm(10))
setkeyv(test.dt, c("ID", "CONTENT_AREA"))
test.dt

### Values to be looked up

my.lookup <- data.table(ID=1:5, CONTENT_AREA=as.factor("MATH"))
my.lookup

### Data table to be added to the original data.table

my.additional.table <- data.table(my.lookup,
    VALID_CASE=factor(1, levels=1:2, labels=c("VALID_CASE", "INVALID_CASE")),
    Y=as.factor(letters[1:5]), Z=101:105)
my.additional.table

### First attempt with error

test.dt[my.lookup, names(my.new.table) := my.additional.table, with=FALSE, mult="first"]

### Create the variables in test.dt using := (but gives warnings and is cumbersome to have to specify the class of the variables that are going to be created)
### NOTE:

for (i in c("VALID_CASE", "Y", "Z")) {
    test.dt[, i := NA_integer_, with=FALSE, mult="first"]
    class(test.dt[[i]]) <- class(my.additional.table[[i]])
    if (is.factor(test.dt[[i]])) levels(test.dt[[i]]) <- levels(my.additional.table[[i]])
}

### Successfully perform the variable creation on the rows indicated by my.lookup

test.dt[my.lookup, names(my.additional.table) := my.additional.table, with=FALSE, mult="first"]

Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH 03821-0351

Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910

_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help |
|
On Fri, 2012-05-11 at 13:15 -0500, Damian Betebenner wrote:
> All,
>
> Trying to use := well but get errors and warnings and am looking for
> an elegant way to subset and use := together when multiple variables
> are being created and factors are involved.
>
> Here’s some code showing what I’m trying to do. Any help in doing this
> better greatly appreciated:
>
> require(data.table)
>
> ### Base data.table
>
> test.dt <- data.table(ID=rep(1:10, 2),
> CONTENT_AREA=as.factor(rep(c("MATH", "READ"), each=10)), X=rnorm(10))
>
> setkeyv(test.dt, c("ID", "CONTENT_AREA"))
> test.dt
>
> ### Values to be looked up
>
> my.lookup <- data.table(ID=1:5, CONTENT_AREA=as.factor("MATH"))
> my.lookup
>
> ### Data table to be added to the original data.table
>
> my.additional.table <- data.table(my.lookup, VALID_CASE=factor(1,
> levels=1:2, labels=c("VALID_CASE", "INVALID_CASE")),
> Y=as.factor(letters[1:5]), Z=101:105)
> my.additional.table
>
> ### First attempt with error
>
> test.dt[my.lookup, names(my.new.table) := my.additional.table,
> with=FALSE, mult="first"]

I get :

Error in eval(expr, envir, enclos) : object 'my.new.table' not found

but assuming that was a typo, then with :

test.dt[my.lookup, names(my.additional.table) := my.additional.table, with=FALSE, mult="first"]

I get :

Error in `[.data.table`(test.dt, my.lookup, `:=`(names(my.additional.table), :
  Attempt to add new column(s) and set subset of rows at the same time. Create the new column(s) first, and then you'll be able to assign to a subset. If i is set to 1:nrow(x) then please remove that (no need, it's faster without).

That error was meant to say "for now", oops. Will try and implement that in 1.8.1 (automatic adding of new column, padding with NA where the sub assigning := doesn't touch). More comments below ...

> ### Create the variables in test.dt using := (but gives warnings and
> is cumbersome to have to specify the class of the variables that are
> going to be created)
>
> for (i in c("VALID_CASE", "Y", "Z")) {
> test.dt[, i := NA_integer_, with=FALSE, mult="first"]
> class(test.dt[[i]]) <- class(my.additional.table[[i]])
> if (is.factor(test.dt[[i]])) levels(test.dt[[i]]) <-
> levels(my.additional.table[[i]])
> }

Yes I get this warning (twice) too :

Warning messages:
1: In `[.data.table`(test.dt, , `:=`(i, NA_integer_), with = FALSE, :
  Invalid .internal.selfref detected and fixed by taking a copy of the whole table, so that := can add this new column by reference. At an earlier point, this data.table has been copied by R. Avoid key<-, names<- and attr<- which in R currently (and oddly) all copy the whole data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr(). If this message doesn't help, please report to datatable-help so the root cause can be fixed.

I guess that one or both of class<- and levels<- are copying the whole table. That is consistent with the first iteration working without warning, followed by warnings on the 2nd and 3rd.

Just for now until it's automatic, and it might be useful for other tasks, empty factor columns can be created with factor(NA), and := is factor level aware so you can add new levels just by assigning a character value to an item (:= modifies the factor levels by reference for you). So :

for (i in c("VALID_CASE", "Y", "Z"))
    test.dt[, i := if (is.factor(my.additional.table[[i]])) factor(NA) else NA_integer_, with=FALSE]   # No warnings

or,

for (i in c("VALID_CASE", "Y", "Z"))
    test.dt[, i := my.additional.table[[i]][NA], with=FALSE]

which copes with more types and also retains all levels.

> ### Successfully perform the variable creation on the rows indicated by
> my.lookup
>
> test.dt[my.lookup, names(my.additional.table) := my.additional.table,
> with=FALSE, mult="first"]
|
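The two-step approach from the reply above (create typed NA columns first, then assign to the matched rows) can be sketched as follows. This is only an illustration under assumptions, not the thread's own code: it assumes a current data.table release, where the parenthesised/named form of := and the on= argument stand in for the with=FALSE style used in 2012; new.cols is a hypothetical helper name; and X uses rnorm(20) so that no recycling is needed when building the table.

```r
library(data.table)

# Rebuild the example data from the thread (X is length 20 here, an assumption
# made so the constructor does not have to recycle a shorter vector).
test.dt <- data.table(ID = rep(1:10, 2),
                      CONTENT_AREA = as.factor(rep(c("MATH", "READ"), each = 10)),
                      X = rnorm(20))
setkeyv(test.dt, c("ID", "CONTENT_AREA"))

my.additional.table <- data.table(
  ID = 1:5,
  CONTENT_AREA = as.factor("MATH"),
  VALID_CASE = factor(1, levels = 1:2, labels = c("VALID_CASE", "INVALID_CASE")),
  Y = as.factor(letters[1:5]),
  Z = 101:105)

new.cols <- c("VALID_CASE", "Y", "Z")  # hypothetical helper name

# Step 1: add each new column as a single typed NA. x[NA][1] keeps the type
# (and, for factors, the levels) of the source column.
for (col in new.cols) {
  set(test.dt, j = col, value = my.additional.table[[col]][NA][1])
}

# Step 2: fill only the rows matched by the join; the i. prefix refers to
# columns of my.additional.table.
test.dt[my.additional.table, on = c("ID", "CONTENT_AREA"),
        `:=`(VALID_CASE = i.VALID_CASE, Y = i.Y, Z = i.Z)]

test.dt[CONTENT_AREA == "MATH"]
```

Rows that the join does not touch keep their NA values, and no class<- or levels<- call is involved, so the selfref warning discussed above should not be triggered.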
|
One last wrinkle to iron out:
Does assignment by reference work with a class that has a slot that is a data.table?

I have defined a new class where one of the slots is a data.table. However, when I apply:

for (i in c("VALID_CASE", "Y", "Z")) [hidden email][, i := my.additional.table[[i]][NA][1], with=FALSE]

Nothing "sticks". That is, none of the variables I'm attempting to assign by reference using := are created.

It does work when done outside of the class:

for (i in c("VALID_CASE", "Y", "Z")) test.dt[, i := my.additional.table[[i]][NA][1], with=FALSE]

Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH 03821-0351

Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910

[hidden email]
www.nciea.org

-----Original Message-----
From: Matthew Dowle [mailto:[hidden email]] On Behalf Of Matthew Dowle
Sent: Friday, May 11, 2012 9:33 PM
To: Damian Betebenner
Cc: [hidden email]
Subject: Re: := suggestions
|
|
I'm not too hot on S4 I'm afraid. In principle it should work I guess. If you run .Internal(inspect(my_object)) before and after, that should reveal what happened.

It seems that merely instantiating the class copies its arguments.

> setClass("test", representation(x="integer",y="data.table")
+ )
[1] "test"
> x = new("test", x=1:4, y=data.table(a=1:3,b=4:6))
> data.table:::selfrefok(x@y)
[1] 0    # i.e., it's been copied already, by new() I guess
> x@y
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
> x@y[,c:=7:9]
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Warning message:
In `[.data.table`(x@y, , `:=`(c, 7:9)) :
  Invalid .internal.selfref detected and fixed by taking a copy of the whole table, so that := can add this new column by reference. At an earlier point, this data.table has been copied by R. Avoid key<-, names<- and attr<- which in R currently (and oddly) all copy the whole data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr(). If this message doesn't help, please report to datatable-help so the root cause can be fixed.
> x
An object of class "test"
Slot "x":
[1] 1 2 3 4

Slot "y":
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6

> data.table:::selfrefok(x@y)
[1] 1
> x@y[,c:=7:9]
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> # no warning this time, but still hasn't updated by reference :
> x@y
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
>

If you'd like this then please raise a feature request. The first := on the slot would generate the warning about a previous copy but then assign back to the slot by reference. That warning could be switched off in the case of slots.

But if new() copies its arguments and there's no way to stop or avoid that, then I wonder if it makes sense to include a large data.table inside an S4 class at all? Where else does S4 copy?

Matthew
|
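Until := on a slot sticks by reference, one possible workaround is to give up on by-reference semantics for the slot and assign the modified table back explicitly. The sketch below reuses the class definition from the session above; obj is a hypothetical object name, and the code assumes a data.table release in which copy() returns a table with a valid self-reference. It does copy the table, so it does not address the cost concern raised for large data.tables.

```r
library(data.table)

# Same class layout as in the session above.
setClass("test", representation(x = "integer", y = "data.table"))
obj <- new("test", x = 1:4, y = data.table(a = 1:3, b = 4:6))

# Take a fresh copy of the slot, add the column to that copy by reference,
# and store the result back in the slot with an ordinary assignment.
obj@y <- copy(obj@y)[, c := 7:9]

names(obj@y)
```

The explicit `obj@y <- ...` is what makes the new column persist; the := itself only ever touches the temporary copy.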
