Hi,
I am not sure if this is the right mailing list, so apologies in advance if it is not. I found the following link/presentation: https://www.r-project.org/dsc/2016/slides/ParallelSort.pdf The implementation of fsort is interesting but incomplete (not sure why?) and can be improved or made faster (at least 25% I believe). I might be wrong but there are maybe a couple of bugs as well. My questions are: 1/ Is the R Core team interested in a faster sorting algo? (Multithread or even single threaded) 2/ I see an issue with the license, which is MPL-2.0, and hence not compatible with base R, Python and Julia. Is there an interest to change the license of fsort so all 3 languages (and all the people using these languages) can benefit from it? (Like suggested on the first page) Please let me know if there is an interest to address the above points, I would be happy to look into it (free of charge of course!). Thank you Best regards Morgan [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Isn’t the default method now “radix” which is the data.table sort, and
isn’t that already parallel using openmp where available? Avi On Mon, Mar 15, 2021 at 12:26 PM Morgan Morgan <[hidden email]> wrote: > Hi, > I am not sure if this is the right mailing list, so apologies in advance if > it is not. > > I found the following link/presentation: > https://www.r-project.org/dsc/2016/slides/ParallelSort.pdf > > The implementation of fsort is interesting but incomplete (not sure why?) > and can be improved or made faster (at least 25% I believe). I might be > wrong but there are maybe a couple of bugs as well. > > My questions are: > > 1/ Is the R Core team interested in a faster sorting algo? (Multithread or > even single threaded) > > 2/ I see an issue with the license, which is MPL-2.0, and hence not > compatible with base R, Python and Julia. Is there an interest to change > the license of fsort so all 3 languages (and all the people using these > languages) can benefit from it? (Like suggested on the first page) > > Please let me know if there is an interest to address the above points, I > would be happy to look into it (free of charge of course!). > > Thank you > Best regards > Morgan > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > Sent from Gmail Mobile [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Default method for sort is not radix(especially for character vector). You
might want to read the documentation of sort. For your second question, I invite you to look at the code of fsort. It is implemented only for positive finite double, and default to data.table:::forder ... when the types are different than positive double... Please read the pdf link I sent, everything is explained in it. Thank you Morgan On Mon, 15 Mar 2021, 16:52 Avraham Adler, <[hidden email]> wrote: > Isn’t the default method now “radix” which is the data.table sort, and > isn’t that already parallel using openmp where available? > > Avi > > On Mon, Mar 15, 2021 at 12:26 PM Morgan Morgan <[hidden email]> > wrote: > >> Hi, >> I am not sure if this is the right mailing list, so apologies in advance >> if >> it is not. >> >> I found the following link/presentation: >> https://www.r-project.org/dsc/2016/slides/ParallelSort.pdf >> >> The implementation of fsort is interesting but incomplete (not sure why?) >> and can be improved or made faster (at least 25% I believe). I might be >> wrong but there are maybe a couple of bugs as well. >> >> My questions are: >> >> 1/ Is the R Core team interested in a faster sorting algo? (Multithread or >> even single threaded) >> >> 2/ I see an issue with the license, which is MPL-2.0, and hence not >> compatible with base R, Python and Julia. Is there an interest to change >> the license of fsort so all 3 languages (and all the people using these >> languages) can benefit from it? (Like suggested on the first page) >> >> Please let me know if there is an interest to address the above points, I >> would be happy to look into it (free of charge of course!). >> >> Thank you >> Best regards >> Morgan >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > -- > Sent from Gmail Mobile > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
In reply to this post by MorganMorgan
In principle, I agree that faster ranking/sorting algorithms are
important, and should be a priority. But I can't help but feel that the paper focuses on textbook-oriented problems. Given that in real world problems, there's almost always some form of prior knowledge: Wouldn't it be better, from a management perspective, to focus on sorting algorithms, that incorporate that prior knowledge? I'm not sure whether that's an R-devel discussion, or for another forum... On Tue, Mar 16, 2021 at 5:25 AM Morgan Morgan <[hidden email]> wrote: > > Hi, > I am not sure if this is the right mailing list, so apologies in advance if > it is not. > > I found the following link/presentation: > https://www.r-project.org/dsc/2016/slides/ParallelSort.pdf > > The implementation of fsort is interesting but incomplete (not sure why?) > and can be improved or made faster (at least 25% I believe). I might be > wrong but there are maybe a couple of bugs as well. > > My questions are: > > 1/ Is the R Core team interested in a faster sorting algo? (Multithread or > even single threaded) > > 2/ I see an issue with the license, which is MPL-2.0, and hence not > compatible with base R, Python and Julia. Is there an interest to change > the license of fsort so all 3 languages (and all the people using these > languages) can benefit from it? (Like suggested on the first page) > > Please let me know if there is an interest to address the above points, I > would be happy to look into it (free of charge of course!). > > Thank you > Best regards > Morgan > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
In reply to this post by MorganMorgan
Those interested in faster sorting may want to look at the merge sort
implemented in pqR (see pqR-project.org). It's often used as the default, because it is stable, and does different collations, while being faster than shell sort (except for small vectors). Here are examples, with timings, for pqR-2020-07-23 and R-4.0.2, compiled identically: ----------------------------- pqR-2020-07-23 in C locale: > set.seed(1) > N <- 1000000 > x <- as.character (sample(N,N,replace=TRUE)) > print(system.time (os <- order(x,method="shell"))) user system elapsed 1.332 0.000 1.334 > print(system.time (or <- order(x,method="radix"))) user system elapsed 0.092 0.004 0.096 > print(system.time (om <- order(x,method="merge"))) user system elapsed 0.363 0.000 0.363 > print(identical(os,or)) [1] TRUE > print(identical(os,om)) [1] TRUE > > x <- c("a","~") > print(order(x,method="shell")) [1] 1 2 > print(order(x,method="radix")) [1] 1 2 > print(order(x,method="merge")) [1] 1 2 ----------------------------- R-4.0.2 in C locale: > set.seed(1) > N <- 1000000 > x <- as.character (sample(N,N,replace=TRUE)) > print(system.time (os <- order(x,method="shell"))) user system elapsed 2.381 0.004 2.387 > print(system.time (or <- order(x,method="radix"))) user system elapsed 0.138 0.000 0.137 > #print(system.time (om <- order(x,method="merge"))) > print(identical(os,or)) [1] TRUE > #print(identical(os,om)) > > x <- c("a","~") > print(order(x,method="shell")) [1] 1 2 > print(order(x,method="radix")) [1] 1 2 > #print(order(x,method="merge")) ------------------------------------ pqR-2020-07-23 in fr_CA.utf8 locale: > set.seed(1) > N <- 1000000 > x <- as.character (sample(N,N,replace=TRUE)) > print(system.time (os <- order(x,method="shell"))) utilisateur système écoulé 2.960 0.000 2.962 > print(system.time (or <- order(x,method="radix"))) utilisateur système écoulé 0.083 0.008 0.092 > print(system.time (om <- order(x,method="merge"))) utilisateur système écoulé 1.143 0.000 1.142 > print(identical(os,or)) [1] TRUE > print(identical(os,om)) [1] TRUE > > x <- c("a","~") > print(order(x,method="shell")) [1] 2 1 > print(order(x,method="radix")) [1] 1 2 > print(order(x,method="merge")) [1] 2 1 ------------------------------------ R-4.0.2 in fr_CA.utf8 locale: > set.seed(1) > N <- 1000000 > x <- as.character (sample(N,N,replace=TRUE)) > print(system.time (os <- order(x,method="shell"))) utilisateur système écoulé 4.222 0.016 4.239 > print(system.time (or <- order(x,method="radix"))) utilisateur système écoulé 0.136 0.000 0.137 > #print(system.time (om <- order(x,method="merge"))) > print(identical(os,or)) [1] TRUE > #print(identical(os,om)) > > x <- c("a","~") > print(order(x,method="shell")) [1] 2 1 > print(order(x,method="radix")) [1] 1 2 > #print(order(x,method="merge")) pqR is faster in all the tests, but more relevant to this discussion is that the "merge" method is substantially faster than "shell" for these character vectors with a million elements, while retaining the stability and collation properties of "shell" (whereas "radix" only does C collation). It would probably not be too hard to take the merge sort code from pqR and use it in R core's implementation. The merge sort in pqR doesn't exploit parallelism at the moment, but merge sort is potentially quite parallelizable (though I think the storage allocation strategy I use would have to be modified). Regards, Radford Neal ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Thank you Neal. This is interesting. I will have a look at pqR.
Indeed radix only does C collation, I believe that is why it is not the default choice for character ordering and sorting. Not sure but I believe it can help address the following bugzilla item: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17400 On the same topic of collation, there is an experimental sorting function "psort" in package kit that might help address this issue. > library(kit) Attaching kit 0.0.7 (OPENMP enabled using 1 thread) > x <- c("b","A","B","a","\xe4") > Encoding(x) <- "latin1" > identical(psort(x, c.locale=FALSE), sort(x)) [1] TRUE > identical(psort(x, c.locale=TRUE), sort(x, method="radix")) [1] TRUE Coming back to the topic of fsort, I have just finished the implementation for double, integer, factor and logical. The implementation takes into account NA, Inf.. values. Values can be sorted in a decreasing order or increasing order. Comparing benchmark with the current implementation in data.table, it is currently over 30% faster. There might bugs but I am sure performance can be further improved as I did not really try hard. If there is interest in both the implementation and cross community sharing, please let know.... Best regards, Morgan On Wed, 17 Mar 2021, 00:37 Radford Neal, <[hidden email]> wrote: > Those interested in faster sorting may want to look at the merge sort > implemented in pqR (see pqR-project.org). It's often used as the > default, because it is stable, and does different collations, while > being faster than shell sort (except for small vectors). > > Here are examples, with timings, for pqR-2020-07-23 and R-4.0.2, > compiled identically: > > ----------------------------- > pqR-2020-07-23 in C locale: > > > set.seed(1) > > N <- 1000000 > > x <- as.character (sample(N,N,replace=TRUE)) > > print(system.time (os <- order(x,method="shell"))) > user system elapsed > 1.332 0.000 1.334 > > print(system.time (or <- order(x,method="radix"))) > user system elapsed > 0.092 0.004 0.096 > > print(system.time (om <- order(x,method="merge"))) > user system elapsed > 0.363 0.000 0.363 > > print(identical(os,or)) > [1] TRUE > > print(identical(os,om)) > [1] TRUE > > > > x <- c("a","~") > > print(order(x,method="shell")) > [1] 1 2 > > print(order(x,method="radix")) > [1] 1 2 > > print(order(x,method="merge")) > [1] 1 2 > > ----------------------------- > R-4.0.2 in C locale: > > > set.seed(1) > > N <- 1000000 > > x <- as.character (sample(N,N,replace=TRUE)) > > print(system.time (os <- order(x,method="shell"))) > user system elapsed > 2.381 0.004 2.387 > > print(system.time (or <- order(x,method="radix"))) > user system elapsed > 0.138 0.000 0.137 > > #print(system.time (om <- order(x,method="merge"))) > > print(identical(os,or)) > [1] TRUE > > #print(identical(os,om)) > > > > x <- c("a","~") > > print(order(x,method="shell")) > [1] 1 2 > > print(order(x,method="radix")) > [1] 1 2 > > #print(order(x,method="merge")) > > ------------------------------------ > pqR-2020-07-23 in fr_CA.utf8 locale: > > > set.seed(1) > > N <- 1000000 > > x <- as.character (sample(N,N,replace=TRUE)) > > print(system.time (os <- order(x,method="shell"))) > utilisateur système écoulé > 2.960 0.000 2.962 > > print(system.time (or <- order(x,method="radix"))) > utilisateur système écoulé > 0.083 0.008 0.092 > > print(system.time (om <- order(x,method="merge"))) > utilisateur système écoulé > 1.143 0.000 1.142 > > print(identical(os,or)) > [1] TRUE > > print(identical(os,om)) > [1] TRUE > > > > x <- c("a","~") > > print(order(x,method="shell")) > [1] 2 1 > > print(order(x,method="radix")) > [1] 1 2 > > print(order(x,method="merge")) > [1] 2 1 > > ------------------------------------ > R-4.0.2 in fr_CA.utf8 locale: > > > set.seed(1) > > N <- 1000000 > > x <- as.character (sample(N,N,replace=TRUE)) > > print(system.time (os <- order(x,method="shell"))) > utilisateur système écoulé > 4.222 0.016 4.239 > > print(system.time (or <- order(x,method="radix"))) > utilisateur système écoulé > 0.136 0.000 0.137 > > #print(system.time (om <- order(x,method="merge"))) > > print(identical(os,or)) > [1] TRUE > > #print(identical(os,om)) > > > > x <- c("a","~") > > print(order(x,method="shell")) > [1] 2 1 > > print(order(x,method="radix")) > [1] 1 2 > > #print(order(x,method="merge")) > > > pqR is faster in all the tests, but more relevant to this discussion > is that the "merge" method is substantially faster than "shell" for > these character vectors with a million elements, while retaining the > stability and collation properties of "shell" (whereas "radix" only > does C collation). > > It would probably not be too hard to take the merge sort code from pqR > and use it in R core's implementation. > > The merge sort in pqR doesn't exploit parallelism at the moment, but > merge sort is potentially quite parallelizable (though I think the > storage allocation strategy I use would have to be modified). > > Regards, > > Radford Neal > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
I think it is "Professor Neal" :)
I also appreciate the pqR comparisons. On Wed, Mar 17, 2021 at 09:23:15AM +0000, Morgan Morgan wrote: >Thank you Neal. This is interesting. I will have a look at pqR. >Indeed radix only does C collation, I believe that is why it is not the >default choice for character ordering and sorting. >Not sure but I believe it can help address the following bugzilla item: >https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17400 > >On the same topic of collation, there is an experimental sorting function >"psort" in package kit that might help address this issue. > >> library(kit) >Attaching kit 0.0.7 (OPENMP enabled using 1 thread) >> x <- c("b","A","B","a","\xe4") >> Encoding(x) <- "latin1" >> identical(psort(x, c.locale=FALSE), sort(x)) >[1] TRUE >> identical(psort(x, c.locale=TRUE), sort(x, method="radix")) >[1] TRUE > >Coming back to the topic of fsort, I have just finished the implementation >for double, integer, factor and logical. >The implementation takes into account NA, Inf.. values. Values can be >sorted in a decreasing order or increasing order. >Comparing benchmark with the current implementation in data.table, it is >currently over 30% faster. >There might bugs but I am sure performance can be further improved as I did >not really try hard. >If there is interest in both the implementation and cross community >sharing, please let know.... > >Best regards, >Morgan > >On Wed, 17 Mar 2021, 00:37 Radford Neal, <[hidden email]> wrote: > >> Those interested in faster sorting may want to look at the merge sort >> implemented in pqR (see pqR-project.org). It's often used as the >> default, because it is stable, and does different collations, while >> being faster than shell sort (except for small vectors). >> >> Here are examples, with timings, for pqR-2020-07-23 and R-4.0.2, >> compiled identically: >> >> ----------------------------- >> pqR-2020-07-23 in C locale: >> >> > set.seed(1) >> > N <- 1000000 >> > x <- as.character (sample(N,N,replace=TRUE)) >> > print(system.time (os <- order(x,method="shell"))) >> user system elapsed >> 1.332 0.000 1.334 >> > print(system.time (or <- order(x,method="radix"))) >> user system elapsed >> 0.092 0.004 0.096 >> > print(system.time (om <- order(x,method="merge"))) >> user system elapsed >> 0.363 0.000 0.363 >> > print(identical(os,or)) >> [1] TRUE >> > print(identical(os,om)) >> [1] TRUE >> > >> > x <- c("a","~") >> > print(order(x,method="shell")) >> [1] 1 2 >> > print(order(x,method="radix")) >> [1] 1 2 >> > print(order(x,method="merge")) >> [1] 1 2 >> >> ----------------------------- >> R-4.0.2 in C locale: >> >> > set.seed(1) >> > N <- 1000000 >> > x <- as.character (sample(N,N,replace=TRUE)) >> > print(system.time (os <- order(x,method="shell"))) >> user system elapsed >> 2.381 0.004 2.387 >> > print(system.time (or <- order(x,method="radix"))) >> user system elapsed >> 0.138 0.000 0.137 >> > #print(system.time (om <- order(x,method="merge"))) >> > print(identical(os,or)) >> [1] TRUE >> > #print(identical(os,om)) >> > >> > x <- c("a","~") >> > print(order(x,method="shell")) >> [1] 1 2 >> > print(order(x,method="radix")) >> [1] 1 2 >> > #print(order(x,method="merge")) >> >> ------------------------------------ >> pqR-2020-07-23 in fr_CA.utf8 locale: >> >> > set.seed(1) >> > N <- 1000000 >> > x <- as.character (sample(N,N,replace=TRUE)) >> > print(system.time (os <- order(x,method="shell"))) >> utilisateur système écoulé >> 2.960 0.000 2.962 >> > print(system.time (or <- order(x,method="radix"))) >> utilisateur système écoulé >> 0.083 0.008 0.092 >> > print(system.time (om <- order(x,method="merge"))) >> utilisateur système écoulé >> 1.143 0.000 1.142 >> > print(identical(os,or)) >> [1] TRUE >> > print(identical(os,om)) >> [1] TRUE >> > >> > x <- c("a","~") >> > print(order(x,method="shell")) >> [1] 2 1 >> > print(order(x,method="radix")) >> [1] 1 2 >> > print(order(x,method="merge")) >> [1] 2 1 >> >> ------------------------------------ >> R-4.0.2 in fr_CA.utf8 locale: >> >> > set.seed(1) >> > N <- 1000000 >> > x <- as.character (sample(N,N,replace=TRUE)) >> > print(system.time (os <- order(x,method="shell"))) >> utilisateur système écoulé >> 4.222 0.016 4.239 >> > print(system.time (or <- order(x,method="radix"))) >> utilisateur système écoulé >> 0.136 0.000 0.137 >> > #print(system.time (om <- order(x,method="merge"))) >> > print(identical(os,or)) >> [1] TRUE >> > #print(identical(os,om)) >> > >> > x <- c("a","~") >> > print(order(x,method="shell")) >> [1] 2 1 >> > print(order(x,method="radix")) >> [1] 1 2 >> > #print(order(x,method="merge")) >> >> >> pqR is faster in all the tests, but more relevant to this discussion >> is that the "merge" method is substantially faster than "shell" for >> these character vectors with a million elements, while retaining the >> stability and collation properties of "shell" (whereas "radix" only >> does C collation). >> >> It would probably not be too hard to take the merge sort code from pqR >> and use it in R core's implementation. >> >> The merge sort in pqR doesn't exploit parallelism at the moment, but >> merge sort is potentially quite parallelizable (though I think the >> storage allocation strategy I use would have to be modified). >> >> Regards, >> >> Radford Neal >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > > [[alternative HTML version deleted]] > >______________________________________________ >[hidden email] mailing list >https://stat.ethz.ch/mailman/listinfo/r-devel > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
My apologies to Professor Neal.
Thank you for correcting me. Best regards Morgan On Mon, 22 Mar 2021, 05:05 , <[hidden email]> wrote: > I think it is "Professor Neal" :) > > I also appreciate the pqR comparisons. > > On Wed, Mar 17, 2021 at 09:23:15AM +0000, Morgan Morgan wrote: > >Thank you Neal. This is interesting. I will have a look at pqR. > >Indeed radix only does C collation, I believe that is why it is not the > >default choice for character ordering and sorting. > >Not sure but I believe it can help address the following bugzilla item: > >https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17400 > > > >On the same topic of collation, there is an experimental sorting function > >"psort" in package kit that might help address this issue. > > > >> library(kit) > >Attaching kit 0.0.7 (OPENMP enabled using 1 thread) > >> x <- c("b","A","B","a","\xe4") > >> Encoding(x) <- "latin1" > >> identical(psort(x, c.locale=FALSE), sort(x)) > >[1] TRUE > >> identical(psort(x, c.locale=TRUE), sort(x, method="radix")) > >[1] TRUE > > > >Coming back to the topic of fsort, I have just finished the implementation > >for double, integer, factor and logical. > >The implementation takes into account NA, Inf.. values. Values can be > >sorted in a decreasing order or increasing order. > >Comparing benchmark with the current implementation in data.table, it is > >currently over 30% faster. > >There might bugs but I am sure performance can be further improved as I > did > >not really try hard. > >If there is interest in both the implementation and cross community > >sharing, please let know.... > > > >Best regards, > >Morgan > > > >On Wed, 17 Mar 2021, 00:37 Radford Neal, <[hidden email]> wrote: > > > >> Those interested in faster sorting may want to look at the merge sort > >> implemented in pqR (see pqR-project.org). It's often used as the > >> default, because it is stable, and does different collations, while > >> being faster than shell sort (except for small vectors). > >> > >> Here are examples, with timings, for pqR-2020-07-23 and R-4.0.2, > >> compiled identically: > >> > >> ----------------------------- > >> pqR-2020-07-23 in C locale: > >> > >> > set.seed(1) > >> > N <- 1000000 > >> > x <- as.character (sample(N,N,replace=TRUE)) > >> > print(system.time (os <- order(x,method="shell"))) > >> user system elapsed > >> 1.332 0.000 1.334 > >> > print(system.time (or <- order(x,method="radix"))) > >> user system elapsed > >> 0.092 0.004 0.096 > >> > print(system.time (om <- order(x,method="merge"))) > >> user system elapsed > >> 0.363 0.000 0.363 > >> > print(identical(os,or)) > >> [1] TRUE > >> > print(identical(os,om)) > >> [1] TRUE > >> > > >> > x <- c("a","~") > >> > print(order(x,method="shell")) > >> [1] 1 2 > >> > print(order(x,method="radix")) > >> [1] 1 2 > >> > print(order(x,method="merge")) > >> [1] 1 2 > >> > >> ----------------------------- > >> R-4.0.2 in C locale: > >> > >> > set.seed(1) > >> > N <- 1000000 > >> > x <- as.character (sample(N,N,replace=TRUE)) > >> > print(system.time (os <- order(x,method="shell"))) > >> user system elapsed > >> 2.381 0.004 2.387 > >> > print(system.time (or <- order(x,method="radix"))) > >> user system elapsed > >> 0.138 0.000 0.137 > >> > #print(system.time (om <- order(x,method="merge"))) > >> > print(identical(os,or)) > >> [1] TRUE > >> > #print(identical(os,om)) > >> > > >> > x <- c("a","~") > >> > print(order(x,method="shell")) > >> [1] 1 2 > >> > print(order(x,method="radix")) > >> [1] 1 2 > >> > #print(order(x,method="merge")) > >> > >> ------------------------------------ > >> pqR-2020-07-23 in fr_CA.utf8 locale: > >> > >> > set.seed(1) > >> > N <- 1000000 > >> > x <- as.character (sample(N,N,replace=TRUE)) > >> > print(system.time (os <- order(x,method="shell"))) > >> utilisateur système écoulé > >> 2.960 0.000 2.962 > >> > print(system.time (or <- order(x,method="radix"))) > >> utilisateur système écoulé > >> 0.083 0.008 0.092 > >> > print(system.time (om <- order(x,method="merge"))) > >> utilisateur système écoulé > >> 1.143 0.000 1.142 > >> > print(identical(os,or)) > >> [1] TRUE > >> > print(identical(os,om)) > >> [1] TRUE > >> > > >> > x <- c("a","~") > >> > print(order(x,method="shell")) > >> [1] 2 1 > >> > print(order(x,method="radix")) > >> [1] 1 2 > >> > print(order(x,method="merge")) > >> [1] 2 1 > >> > >> ------------------------------------ > >> R-4.0.2 in fr_CA.utf8 locale: > >> > >> > set.seed(1) > >> > N <- 1000000 > >> > x <- as.character (sample(N,N,replace=TRUE)) > >> > print(system.time (os <- order(x,method="shell"))) > >> utilisateur système écoulé > >> 4.222 0.016 4.239 > >> > print(system.time (or <- order(x,method="radix"))) > >> utilisateur système écoulé > >> 0.136 0.000 0.137 > >> > #print(system.time (om <- order(x,method="merge"))) > >> > print(identical(os,or)) > >> [1] TRUE > >> > #print(identical(os,om)) > >> > > >> > x <- c("a","~") > >> > print(order(x,method="shell")) > >> [1] 2 1 > >> > print(order(x,method="radix")) > >> [1] 1 2 > >> > #print(order(x,method="merge")) > >> > >> > >> pqR is faster in all the tests, but more relevant to this discussion > >> is that the "merge" method is substantially faster than "shell" for > >> these character vectors with a million elements, while retaining the > >> stability and collation properties of "shell" (whereas "radix" only > >> does C collation). > >> > >> It would probably not be too hard to take the merge sort code from pqR > >> and use it in R core's implementation. > >> > >> The merge sort in pqR doesn't exploit parallelism at the moment, but > >> merge sort is potentially quite parallelizable (though I think the > >> storage allocation strategy I use would have to be modified). > >> > >> Regards, > >> > >> Radford Neal > >> > >> ______________________________________________ > >> [hidden email] mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-devel > >> > > > > [[alternative HTML version deleted]] > > > >______________________________________________ > >[hidden email] mailing list > >https://stat.ethz.ch/mailman/listinfo/r-devel > > > [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Free forum by Nabble | Edit this page |