Hi R-core,

I have been performance testing R packages for wide-to-tall data reshaping,
and for the most part I see they differ by constant factors.

However, in one test, which involves reshaping into multiple output
columns, I see that stats::reshape takes time that is in fact quadratic in
the number of input columns. For example, take the iris data, which has 4
input columns to reshape; the desired output has columns named
Species,Sepal,Petal,dimension (where dimension is either Length or Width).
Of course there is no performance issue with N=4 input columns in the
original iris data, but I made larger versions of this reshaping problem
by making copies of the input columns. The results at
https://github.com/tdhock/nc-article#28-oct-2019 show that the quadratic
time complexity results in significant slowdowns after about N=10,000
input columns to reshape (e.g. several minutes for stats::reshape versus
several seconds for data.table::melt).
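
A minimal sketch of the kind of reshape described above, using only base
R. The helper names make_wide and reshape_wide are hypothetical, and the
way the input columns are copied and renamed (a numeric suffix per copy) is
an assumption made for illustration; the linked benchmark may construct its
inputs differently. The sizes timed here are deliberately small.

```r
# Hypothetical helper: build a wide table with 4 * n.copies measurement
# columns by copying the four iris measurement columns (assumed scheme).
make_wide <- function(n.copies) {
  measure <- iris[, 1:4]
  copies <- lapply(seq_len(n.copies), function(i) {
    setNames(measure, paste0(names(measure), i))
  })
  data.frame(Species = iris$Species, do.call(cbind, copies))
}

# Hypothetical helper: reshape all Sepal.* and Petal.* columns into two
# output columns, Sepal and Petal, plus a dimension column.
reshape_wide <- function(wide) {
  sepal <- grep("^Sepal", names(wide), value = TRUE)
  petal <- grep("^Petal", names(wide), value = TRUE)
  stats::reshape(
    wide,
    direction = "long",
    varying = list(sepal, petal),
    v.names = c("Sepal", "Petal"),
    timevar = "dimension",
    times = sub("^Sepal[.]", "", sepal))
}

# Time the reshape for a few small problem sizes.
for (n.copies in c(1, 10, 100)) {
  wide <- make_wide(n.copies)
  seconds <- system.time(reshape_wide(wide))[["elapsed"]]
  cat(sprintf("N=%d input columns: %.3f seconds\n", ncol(wide) - 1L, seconds))
}
```

With n.copies = 1 the dimension column takes the values Length1 and
Width1; larger values of n.copies add more input columns without changing
the set of output columns.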

For a fix, I would suggest looking into how the same operation is
implemented in the data.table package, which in my test shows computation
times that seem to be linear.
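
For comparison, the same N=4 reshape can be sketched with
data.table::melt, assuming the data.table package is installed. Passing a
list to measure.vars produces one output column per list element; the
relabelling of the dimension column at the end is an extra step added here
for illustration, since melt codes that column by position (1, 2) in this
usage.

```r
library(data.table)

iris.dt <- data.table(iris)

# One list element per output column; value.name gives the output names.
tall.dt <- melt(
  iris.dt,
  id.vars = "Species",
  measure.vars = list(
    c("Sepal.Length", "Sepal.Width"),
    c("Petal.Length", "Petal.Width")),
  value.name = c("Sepal", "Petal"),
  variable.name = "dimension")

# Replace the positional codes 1 and 2 with readable labels.
tall.dt[, dimension := c("Length", "Width")[as.integer(dimension)]]
```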

Toby


______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel