
Is there any overhead to converting back and forth from a data.table to a data.frame?


caneff
I prefer data.tables for all the code processing I do.  But others on my team using my functions aren't comfortable with data.tables, so most of the libraries I write end with

 return(data.frame(DT))

Is there any copying or other overhead happening there? Since it inherits from data.frame, I think the answer is no.

Now, if I have a function that does such a return, but I wrap that itself in a data.table call:

data.table(func_that_returns_df())

Is there any inefficiency there?  Is there a difference between data.table() and as.data.table() here?
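One way to check this empirically is to time both calls on the same input. The following is only a rough sketch: `func_that_returns_df` is a hypothetical stand-in for a library function that ends with `data.frame(DT)`, the column names and row count are made up, and timings will differ by machine and data.table version.

```r
library(data.table)

# Hypothetical stand-in for a library function that ends with data.frame(DT).
func_that_returns_df <- function(n = 1e5) {
  DT <- data.table(id = seq_len(n), value = rnorm(n))
  data.frame(DT)
}

df <- func_that_returns_df()

# Time both routes back to a data.table on the same data.frame.
t_constructor <- system.time(dt1 <- data.table(df))
t_coercion    <- system.time(dt2 <- as.data.table(df))

# One row of timings per route; compare the "elapsed" column.
print(rbind(data.table = t_constructor, as.data.table = t_coercion))

# Check whether the two results agree.
print(all.equal(dt1, dt2))
```

For a single small call the difference may be lost in noise, so repeating the measurement on data of realistic size is more informative.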

_______________________________________________
datatable-help mailing list
[hidden email]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
Re: Is there any overhead to converting back and forth from a data.table to a data.frame?

Arunkumar Srinivasan

as.data.frame is an S3 generic with a data.table method, and it is definitely faster than data.frame(). But it still calls copy(.). data.frame(.) would also convert strings to factors by default (that is, when stringsAsFactors=TRUE).

The most efficient way to convert a data.table to a data.frame would be to do it by reference (in place). The code is already available in as.data.frame; just remove the copy(.):

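# convert data.table to data.frame by reference
setDF <- function(x) {
    if (!is.data.table(x))
        stop("x must be a data.table")
    # plain data.frame row names (compact 1..n form)
    setattr(x, "row.names", .set_row_names(nrow(x)))
    # drop the "data.table" class, keeping only "data.frame"
    setattr(x, "class", "data.frame")
    # remove data.table-specific attributes (key and self-reference)
    setattr(x, "sorted", NULL)
    setattr(x, ".internal.selfref", NULL)
}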

Now you have a function that converts a data.table to a data.frame by reference.

require(data.table)
dat <- data.table(x=1:5, y=6:10)
setDF(dat) # dat is now a data.frame

We should probably export this function as well, like setDT, so that users can switch between the two as they wish without a performance hit?
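To see the difference between copying and converting by reference, the memory addresses can be compared with data.table's address(). This is a minimal sketch, assuming an installed data.table version that exports its own setDF() (later releases do); the example data is made up.

```r
library(data.table)

DT <- data.table(x = c(1.5, 2.5, 3.5), y = c("a", "b", "c"))

# as.data.frame() returns a separate object, so DT itself is untouched.
DF_copy <- as.data.frame(DT)
copied_object <- address(DF_copy) != address(DT)
copied_column <- address(DF_copy$x) != address(DT$x)

# setDF() changes the attributes of the same object in place.
addr_before <- address(DT)
col_before  <- address(DT$x)
setDF(DT)
same_object <- identical(address(DT), addr_before)
same_column <- identical(address(DT$x), col_before)

print(c(copied_object = copied_object, copied_column = copied_column,
        same_object = same_object, same_column = same_column))
print(class(DT))
```

Because setDF() works in place, any other variable bound to the same data.table is converted too; as.data.frame() is the safer choice when the original must stay a data.table.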


Arun

In reply to the message from Chris Neff, April 7, 2014 at 5:32:47 PM

Re: Is there any overhead to converting back and forth from a data.table to a data.frame?

caneff
I would appreciate such a function, yes. Thanks for the explanation.



Re: Is there any overhead to converting back and forth from a data.table to a data.frame?

Kevin Ushey
I agree; this would be very useful.


Re: Is there any overhead to converting back and forth from a data.table to a data.frame?

Steve Lianoglou-2
In reply to this post by Arunkumar Srinivasan
+1 on exporting setDF




--
Steve Lianoglou
Computational Biologist
Genentech