stats::reshape quadratic in number of input columns

Toby Hocking-2
Hi R-core,

I have been performance testing R packages for wide-to-tall data reshaping
and for the most part I see they differ by constant factors.

However, in one test, which involves converting into multiple output
columns, I see that stats::reshape is in fact quadratic in the number of
input columns. For example, take the iris data, which has 4 input columns
to reshape, and for which the desired output has columns named
Species, Sepal, Petal, dimension (where dimension is either Length or
Width). Of course there is no performance issue with the N=4 input columns
in the original iris data, but I made larger versions of this reshaping
problem by making copies of the input columns. The results at
https://github.com/tdhock/nc-article#28-oct-2019 show that the quadratic
time complexity results in significant slowdowns after about N=10,000
input columns to reshape (e.g. several minutes for stats::reshape versus
several seconds for data.table::melt).
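
To make the reshaping problem concrete, here is a minimal sketch in base R.
It is not the benchmark code behind the linked results; the helper names
make_wide and reshape_copies, the ".1", ".2", ... column suffixes, and the
"id" column are assumptions made for illustration only.

```r
# Reshape the original iris data (N=4 input columns) into two output
# columns, Sepal and Petal, plus a dimension column (Length or Width).
tall <- stats::reshape(
  iris,
  direction = "long",
  varying = list(
    c("Sepal.Length", "Sepal.Width"),
    c("Petal.Length", "Petal.Width")),
  v.names = c("Sepal", "Petal"),
  timevar = "dimension",
  times = c("Length", "Width"),
  idvar = "id") # "id" does not exist in iris, so reshape creates it.

# Hypothetical helper: build a wider problem by copying the four
# measurement columns n.copies times, giving N = 4 * n.copies inputs.
make_wide <- function(n.copies) {
  measure <- iris[, 1:4]
  copies <- lapply(seq_len(n.copies), function(i) {
    setNames(measure, paste0(names(measure), ".", i))
  })
  data.frame(Species = iris$Species, do.call(cbind, copies))
}

# Hypothetical helper: the same reshape applied to the wider data.
reshape_copies <- function(n.copies) {
  wide <- make_wide(n.copies)
  sepal <- grep("^Sepal", names(wide), value = TRUE)
  petal <- grep("^Petal", names(wide), value = TRUE)
  stats::reshape(
    wide,
    direction = "long",
    varying = list(sepal, petal),
    v.names = c("Sepal", "Petal"),
    timevar = "dimension",
    times = sub("^Sepal[.]", "", sepal),
    idvar = "id")
}
```

Wrapping calls such as reshape_copies(250) and reshape_copies(2500) in
system.time() is one way to see how the run time grows with N.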

For a fix, I would suggest looking into how they implemented the same
operation in the data.table package, which in my test shows computation
times that seem to be linear.
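
For comparison, the same reshape can be written with data.table::melt
roughly as follows. This is a sketch which assumes the data.table package
is installed; it is not necessarily the exact call used in the linked
benchmark.

```r
# Only run the example if data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  iris.dt <- as.data.table(iris)
  # Each pattern selects the input columns for one output column.
  tall.dt <- melt(
    iris.dt,
    measure.vars = patterns("^Sepal", "^Petal"),
    value.name = c("Sepal", "Petal"),
    variable.name = "dimension")
  # Here dimension is a factor with levels "1" and "2" (the position
  # within each group of columns), not "Length" and "Width".
}
```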

Toby


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel