:= suggestions

Damian Betebenner-2

All,

I'm trying to use := well but get errors and warnings, and I'm looking for an elegant way to subset and use := together when multiple variables are being created and factors are involved.

Here’s some code showing what I'm trying to do. Any help in doing this better is greatly appreciated:

#########################################################
###
### Test of data.table and :=
###
#########################################################

require(data.table)

### Base data.table

test.dt <- data.table(ID=rep(1:10, 2), CONTENT_AREA=as.factor(rep(c("MATH", "READ"), each=10)), X=rnorm(10))
setkeyv(test.dt, c("ID", "CONTENT_AREA"))
test.dt

### Values to be looked up

my.lookup <- data.table(ID=1:5, CONTENT_AREA=as.factor("MATH"))
my.lookup

### Data table to be added to the original data.table

my.additional.table <- data.table(my.lookup, VALID_CASE=factor(1, levels=1:2, labels=c("VALID_CASE", "INVALID_CASE")), Y=as.factor(letters[1:5]), Z=101:105)
my.additional.table

### First attempt with error

test.dt[my.lookup, names(my.new.table) := my.additional.table, with=FALSE, mult="first"]

### Create the variables in test.dt using := (but gives warnings and is cumbersome to have to specify the class of the variables that are going to be created)
### NOTE:

for (i in c("VALID_CASE", "Y", "Z")) {
    test.dt[, i := NA_integer_, with=FALSE, mult="first"]
    class(test.dt[[i]]) <- class(my.additional.table[[i]])
    if (is.factor(test.dt[[i]])) levels(test.dt[[i]]) <- levels(my.additional.table[[i]])
}

### Successfully perform the variable creation on the rows indicated by my.lookup

test.dt[my.lookup, names(my.additional.table) := my.additional.table, with=FALSE, mult="first"]

Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH   03821-0351

Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910

[hidden email]
www.nciea.org


_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Re: := suggestions

Matthew Dowle
On Fri, 2012-05-11 at 13:15 -0500, Damian Betebenner wrote:

> All,
>
> Trying to use := well but get errors and warnings and am looking for
> an elegant way to subset and use := together when multiple variables
> are being created and factors are involved.
>
> Here’s some code showing what I’m trying to do. Any help in doing this
> better greatly appreciated:
>
> require(data.table)
>
> ### Base data.table
>
> test.dt <- data.table(ID=rep(1:10, 2),
> CONTENT_AREA=as.factor(rep(c("MATH", "READ"), each=10)), X=rnorm(10))
>
> setkeyv(test.dt, c("ID", "CONTENT_AREA"))
> test.dt
>
> ### Values to be looked up
>
> my.lookup <- data.table(ID=1:5,  CONTENT_AREA=as.factor("MATH"))
> my.lookup
>  
> ### Data table to be added to the original data.table
>
> my.additional.table <- data.table(my.lookup, VALID_CASE=factor(1,
> levels=1:2, labels=c("VALID_CASE", "INVALID_CASE")),
> Y=as.factor(letters[1:5]), Z=101:105)
> my.additional.table
>
> ### First attempt with error
>
> test.dt[my.lookup, names(my.new.table) := my.additional.table,
> with=FALSE, mult="first"]

I get:
Error in eval(expr, envir, enclos) : object 'my.new.table' not found

but assuming that was a typo, then with:

test.dt[my.lookup, names(my.additional.table) := my.additional.table, with=FALSE, mult="first"]

I get:

Error in `[.data.table`(test.dt, my.lookup,
`:=`(names(my.additional.table),  :
  Attempt to add new column(s) and set subset of rows at the same time.
Create the new column(s) first, and then you'll be able to assign to a
subset. If i is set to 1:nrow(x) then please remove that (no need, it's
faster without).

That error was meant to say "for now", oops. I'll try to implement that
in 1.8.1 (automatically adding the new column, padding with NA the rows
that the subset := doesn't touch). More comments below ...
 

>
> ### Create the variables in test.dt using := (but gives warnings and
> is cumbersome to have to specify the class of the variables that are
> going to be created)
>
> for (i in c("VALID_CASE", "Y", "Z")) {
>                 test.dt[, i := NA_integer_, with=FALSE, mult="first"]
>                 class(test.dt[[i]]) <- class(my.additional.table[[i]])
>                 if (is.factor(test.dt[[i]])) levels(test.dt[[i]]) <-
> levels(my.additional.table[[i]])
> }

Yes, I get this warning (twice) too:
Warning messages:
1: In `[.data.table`(test.dt, , `:=`(i, NA_integer_), with = FALSE,  :
  Invalid .internal.selfref detected and fixed by taking a copy of the
whole table, so that := can add this new column by reference. At an
earlier point, this data.table has been copied by R. Avoid key<-,
names<- and attr<- which in R currently (and oddly) all copy the whole
data.table. Use set* syntax instead to avoid copying: setkey(),
setnames() and setattr(). If this message doesn't help, please report to
datatable-help so the root cause can be fixed.

I guess that one or both of class<- and levels<- are copying the whole
table. That's consistent with the first iteration working without a
warning, followed by warnings on the 2nd and 3rd.
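The warning points to the set* functions because they change attributes in place rather than copying the table, and Matthew suspects `class<-` and `levels<-` are what copied it here. A minimal sketch of relabelling a factor column's levels by reference with data.table's `setattr()`, using `address()` to check that the column vector itself was not replaced; `DT` is a small hypothetical table, not one from the thread:

```r
library(data.table)

# Hypothetical table with one factor column.
DT <- data.table(ID = 1:3, Y = factor(c("a", "b", "a")))
addr.before <- address(DT$Y)

# setattr() modifies the "levels" attribute of the existing column vector
# in place; the replacement function levels<- is never called on DT.
setattr(DT$Y, "levels", c("A", "B"))

addr.after <- address(DT$Y)
identical(addr.before, addr.after)
```

Because this bypasses R's own replacement-function checks, the new levels must still be valid for the integer codes already stored in the column (two codes here, so two labels).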

Just for now, until it's automatic (and it might be useful for other
tasks): empty factor columns can be created with factor(NA), and := is
factor-level aware, so you can add new levels just by assigning a
character value to an item (:= modifies the factor levels by reference
for you). So:

for (i in c("VALID_CASE", "Y", "Z"))
    test.dt[, i := if (is.factor(my.additional.table[[i]])) factor(NA) else NA_integer_, with=FALSE]
# No warnings

or,

for (i in c("VALID_CASE", "Y", "Z"))
    test.dt[, i := my.additional.table[[i]][NA], with=FALSE]

which copes with more types and also retains all levels.
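Matthew's loops use the `with=FALSE` form of :=, which later data.table releases deprecated in favour of wrapping the column-name variable in parentheses. Below is a sketch of the same two steps under that newer syntax (an assumption about the reader's data.table version). It uses `[NA_integer_]` so each right-hand side is a single NA of the source column's class with its levels intact; indexing with a logical `NA`, as in `[NA]`, instead returns one NA per element of the source column (five here), which the 2012 code relied on being recycled and which is presumably why Damian's follow-up adds `[1]`. Step 2 fills the matched rows with data.table's `set()`, and the last line shows the factor-level-aware assignment Matthew describes. The names `new.cols`, `col` and `rows` are illustrative:

```r
library(data.table)

test.dt <- data.table(ID = rep(1:10, 2),
                      CONTENT_AREA = factor(rep(c("MATH", "READ"), each = 10)))
setkeyv(test.dt, c("ID", "CONTENT_AREA"))

my.additional.table <- data.table(ID = 1:5, CONTENT_AREA = factor("MATH"),
  VALID_CASE = factor(1, levels = 1:2, labels = c("VALID_CASE", "INVALID_CASE")),
  Y = factor(letters[1:5]),
  Z = 101:105)

new.cols <- c("VALID_CASE", "Y", "Z")

# Step 1: add each column as all-NA, carrying the class and levels of the
# source column (x[NA_integer_] is a length-1 NA of the same type).
for (col in new.cols)
  test.dt[, (col) := my.additional.table[[col]][NA_integer_]]

# Step 2: find the test.dt rows matched by the lookup keys (which = TRUE
# returns them in the order of my.additional.table's rows), then fill
# only those rows by reference.
rows <- test.dt[my.additional.table, on = c("ID", "CONTENT_AREA"), which = TRUE]
for (col in new.cols)
  set(test.dt, i = rows, j = col, value = my.additional.table[[col]])

# := is factor-level aware: assigning a new string adds it as a level.
test.dt[ID == 10L & CONTENT_AREA == "READ", Y := "z"]
```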

>  
>
> ### Sucessfully perform the variable creation on the rows indicated by
> my.lookup
>
> test.dt[my.lookup, names(my.additional.table) := my.additional.table,
> with=FALSE, mult="first"]
>  
>
>  
>
> Damian Betebenner
>
> Center for Assessment
>
> PO Box 351
>
> Dover, NH   03821-0351
>
>  
>
> Phone (office): (603) 516-7900
>
> Phone (cell): (857) 234-2474
>
> Fax: (603) 516-7910
>
>  
>
> [hidden email]
>
> www.nciea.org
>
>  




Re: := suggestions

Damian Betebenner-2
One last wrinkle to iron out:

Does assignment by reference work with a class that has a slot that is a data.table?

I have defined a new class where one of the slots is a data.table. However, when I apply:

for (i in c("VALID_CASE", "Y", "Z")) [hidden email][, i := my.additional.table[[i]][NA][1], with=FALSE]

Nothing "sticks". That is, none of the variables I'm attempting to assign by reference using := are created.

It does work when done outside of the class:

for (i in c("VALID_CASE", "Y", "Z")) test.dt[, i := my.additional.table[[i]][NA][1], with=FALSE]

Damian Betebenner




-----Original Message-----
From: Matthew Dowle [mailto:[hidden email]] On Behalf Of Matthew Dowle
Sent: Friday, May 11, 2012 9:33 PM
To: Damian Betebenner
Cc: [hidden email]
Subject: Re: := suggestions


Re: := suggestions

Matthew Dowle

I'm not too hot on S4, I'm afraid. In principle it should work, I guess.
Running .Internal(inspect(my_object)) before and after should reveal
what happened. It seems that merely instantiating the class copies its
arguments.

> setClass("test", representation(x="integer",y="data.table")
+ )
[1] "test"
> x = new("test", x=1:4, y=data.table(a=1:3,b=4:6))
> data.table:::selfrefok(x@y)
[1] 0    # i.e., it's been copied already, by new() I guess
> x@y
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
> x@y[,c:=7:9]
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Warning message:
In `[.data.table`(x@y, , `:=`(c, 7:9)) :
  Invalid .internal.selfref detected and fixed by taking a copy of the
whole table, so that := can add this new column by reference. At an
earlier point, this data.table has been copied by R. Avoid key<-,
names<- and attr<- which in R currently (and oddly) all copy the whole
data.table. Use set* syntax instead to avoid copying: setkey(),
setnames() and setattr(). If this message doesn't help, please report to
datatable-help so the root cause can be fixed.
> x
An object of class "test"
Slot "x":
[1] 1 2 3 4

Slot "y":
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6

> data.table:::selfrefok(x@y)
[1] 1
> x@y[,c:=7:9]
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> # no warning this time, but still hasn't updated by reference :
> x@y
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
>

If you'd like this, then please raise a feature request. The first := on
the slot would generate the warning about a previous copy, but would then
assign back to the slot by reference. That warning could be switched off
in the case of slots.

But if new() copies its arguments and there's no way to stop or avoid
that, then I wonder whether it makes sense to include a large data.table
inside an S4 class at all. Where else does S4 copy?

Matthew
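Matthew's transcript shows := on the slot's data.table printing the new column without it persisting in the object. One workaround, sketched here as an assumption rather than anything proposed in the thread, is to take an explicit `copy()` of the slot, extend that with := and assign it back to the slot. It costs a full copy per update, which is exactly the concern Matthew raises about large tables inside S4 objects. The class name `DTHolder` is hypothetical:

```r
library(methods)
library(data.table)

# Hypothetical S4 class with a data.table slot; data.table registers
# "data.table" as an S4 old class, so it can be used as a slot type.
setClass("DTHolder", slots = c(x = "integer", y = "data.table"))

obj <- new("DTHolder", x = 1:4, y = data.table(a = 1:3, b = 4:6))

# copy() returns a fresh data.table that := can extend in place;
# the result is then stored back in the slot explicitly.
y <- copy(obj@y)
y[, c := 7:9]
obj@y <- y

obj@y
```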

