big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Heinz Tuechler
Dear All,

Is it known that source works much faster in R 2.15.2 than in R 3.0.2?
In the example below I observe, e.g. for a data.frame with 10^7 rows, the
following timings:

R version 2.15.2 Patched (2012-11-29 r61184)
length: 1e+07
    user  system elapsed
   62.04    0.22   62.26

R version 3.0.2 Patched (2013-10-27 r64116)
length: 1e+07
    user  system elapsed
  388.63  176.42  566.41

Is there a way to speed R version 3.0.2 up to the performance of R
version 2.15.2?

best regards,

Heinz Tüchler


example:
sessionInfo()
# words to sample from
sample.vec <-
   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
     'named', 'file', 'or', 'URL', 'or', 'connection')
dmp.size <- c(10^(1:7))   # numbers of rows: 10, 100, ..., 1e7
set.seed(37)

for(i in dmp.size) {
   # one-column data.frame of i randomly sampled words
   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
   # write df0 out as R code, then time reading it back with source()
   dump('df0', file='testdump')
   cat('length:', i, '\n')
   print(system.time(source('testdump', keep.source = FALSE,
                            encoding='')))
}

output for R version 2.15.2 Patched (2012-11-29 r61184):
> sessionInfo()
R version 2.15.2 Patched (2012-11-29 r61184)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
> sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
    user  system elapsed
       0       0       0
length: 100
    user  system elapsed
       0       0       0
length: 1000
    user  system elapsed
       0       0       0
length: 10000
    user  system elapsed
    0.02    0.00    0.01
length: 1e+05
    user  system elapsed
    0.21    0.00    0.20
length: 1e+06
    user  system elapsed
    4.47    0.04    4.51
length: 1e+07
    user  system elapsed
   62.04    0.22   62.26
>


output for R version 3.0.2 Patched (2013-10-27 r64116):
> sessionInfo()
R version 3.0.2 Patched (2013-10-27 r64116)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
> sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
    user  system elapsed
       0       0       0
length: 100
    user  system elapsed
       0       0       0
length: 1000
    user  system elapsed
       0       0       0
length: 10000
    user  system elapsed
    0.01    0.00    0.01
length: 1e+05
    user  system elapsed
    0.36    0.06    0.42
length: 1e+06
    user  system elapsed
    6.02    1.86    7.88
length: 1e+07
    user  system elapsed
  388.63  176.42  566.41
>
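A minimal sketch for narrowing down where the time goes: source() essentially parses the file and then evaluates the resulting expressions, so timing parse() and eval() separately shows which stage is slow. It assumes a smaller 1e5-row version of the data.frame above and a temporary file instead of 'testdump'; the names tf, exprs, env, t_parse and t_eval are illustrative only.

```r
# Sketch: time the two stages of source() separately (size is illustrative).
set.seed(37)
sample.vec <- c('source', 'causes', 'R', 'to', 'accept', 'its', 'input',
                'from', 'the', 'named', 'file', 'or', 'URL', 'or', 'connection')
df0 <- data.frame(x = sample(sample.vec, 1e5, replace = TRUE))
tf <- tempfile()
dump('df0', file = tf)

# Stage 1: turn the dumped text into unevaluated R expressions.
t_parse <- system.time(exprs <- parse(file = tf, keep.source = FALSE))

# Stage 2: evaluate them in a fresh environment (this recreates df0 there).
env <- new.env()
t_eval <- system.time(eval(exprs, envir = env))

cat('parse:', t_parse[["elapsed"]], 's   eval:', t_eval[["elapsed"]], 's\n')
unlink(tf)
```

If the parse time dominates and grows much faster than the number of rows, the slowdown is in the parser rather than in building the data.frame.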



--
Heinz Tüchler +4317146261 / +436605653878

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Carl Witthoft
Did you run the identical code on the identical machine, and did you verify there were no other tasks running which might have limited the RAM available to R?  And equally important, did you run these tests in the reverse order (in case R was storing large objects from the first run, thus chewing up RAM)?
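Some of these checks can be built into the benchmark itself. The sketch below is one possible approach, not something proposed in the thread: repeat each measurement and report the median, relying on system.time()'s gcFirst argument to run a garbage collection before every run. median_elapsed is a hypothetical helper; it cannot detect other processes competing for RAM, so running each R version in a freshly started session remains the cleaner control.

```r
# Hypothetical helper: time f() `reps` times and return the median elapsed
# seconds. gcFirst = TRUE (also system.time's default) collects garbage
# before each measurement, so garbage left by earlier runs is cleared first.
median_elapsed <- function(f, reps = 3) {
  elapsed <- vapply(seq_len(reps), function(i) {
    system.time(f(), gcFirst = TRUE)[["elapsed"]]
  }, numeric(1))
  median(elapsed)
}

# Usage: time sourcing a dumped data.frame into a throwaway environment.
set.seed(37)
df0 <- data.frame(x = sample(c('source', 'causes', 'R'), 1e4, replace = TRUE))
tf <- tempfile()
dump('df0', file = tf)
res <- median_elapsed(function() source(tf, local = new.env(), keep.source = FALSE))
unlink(tf)
print(res)
```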



Re: big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Heinz Tuechler
Everything was run on the same machine, in independent sessions. I did not
restart Windows. I also tried 32-bit R 3.0.2, and it seemed slightly
faster than 64-bit.
Using Process Explorer v15.23
(http://technet.microsoft.com/de-de/sysinternals/bb896653), my impression
was that R 3.0.2 manages memory differently from R 2.15.2: when sourcing a
big file, the physical memory used by R 2.15.2 grows steadily, whereas in
R 3.0.2 it goes through repeated cycles of growing and shrinking.
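To look at memory from inside R rather than through Process Explorer, a rough sketch (an assumption about how one might investigate, not a diagnosis from this thread): gc(reset = TRUE) resets R's "max used" statistics, so a gc() call after parse() reports the peak number of cells used during parsing, and gcinfo(TRUE) prints a line for every garbage collection, which may help relate the grow/shrink cycles to R's collector. The 1e5-row size and the names tf, old and peak are illustrative.

```r
# Sketch: peak memory and GC activity while parsing a dumped data.frame.
set.seed(37)
df0 <- data.frame(x = sample(c('source', 'causes', 'R'), 1e5, replace = TRUE))
tf <- tempfile()
dump('df0', file = tf)
rm(df0)

invisible(gc(reset = TRUE))    # reset the "max used" statistics
old <- gcinfo(TRUE)            # report each garbage collection as it happens
exprs <- parse(file = tf, keep.source = FALSE)
gcinfo(old)                    # restore the previous reporting setting
peak <- gc()[, "max used"]     # peak Ncells/Vcells since the reset
print(peak)
unlink(tf)
```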

best,
Heinz

(Replying to Carl Witthoft's message of 30.10.2013 13:28.)


Re: big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

William Dunlap
In reply to this post by Carl Witthoft
I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
when it is parsing long vectors of numeric data. dump/source has never been an efficient
way of transferring data between different R sessions, but it is much worse
now for long vectors. In 2.15.2, doubling the size of the vector (for lengths
in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1.
In 3.0.2 that factor is more like 4.4.

       n elapsed-2.15.2 elapsed-3.0.2
    2048          0.003         0.018
    4096          0.006         0.065
    8192          0.013         0.254
   16384          0.025         1.067
   32768          0.050         4.114
   65536          0.100        16.236
  131072          0.219        66.013
  262144          0.808       291.883
  524288          2.022      1285.265
 1048576          4.918            NA
 2097152          9.857            NA
 4194304         22.916            NA
 8388608         49.671            NA
16777216        101.042            NA
33554432        512.719            NA

I tried this with 64-bit R on a Linux box. The NAs represent sizes that did not
finish while I was at a 1 1/2 hour dentist's appointment. Times are elapsed
seconds. The timing function was:
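  test <- function(n = 2^(11:25))
  {
      tf <- tempfile()
      on.exit(unlink(tf))          # remove the temporary file on exit
      t(sapply(n, function(n){
          # write a numeric vector of length n as R source text ...
          dput(log(seq_len(n)), file=tf)
          # ... and time only the parse step (user, system, elapsed)
          print(c(n=n, system.time(parse(file=tf))[1:3]))
      }))
  }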
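The doubling factors translate into a growth exponent: if the time grows like n^k, doubling n multiplies the time by 2^k, so a factor of about 2 means roughly linear time and a factor of about 4 roughly quadratic time. A small sketch that fits k by least squares on a log2 scale, using the elapsed times from the table above (2.15.2 restricted to lengths between 10^4 and 10^7, 3.0.2 to the sizes that finished); fit_exponent is a hypothetical helper name.

```r
# Fit k in time ~ c * n^k by regressing log2(time) on log2(n).
fit_exponent <- function(n, elapsed) {
  unname(coef(lm(log2(elapsed) ~ log2(n)))[2])
}

# Elapsed seconds copied from the table (R 2.15.2, n = 2^14 .. 2^23).
n_2152 <- 2^(14:23)
t_2152 <- c(0.025, 0.050, 0.100, 0.219, 0.808, 2.022, 4.918, 9.857,
            22.916, 49.671)

# Elapsed seconds copied from the table (R 3.0.2, n = 2^11 .. 2^19).
n_302 <- 2^(11:19)
t_302 <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013, 291.883,
           1285.265)

k_2152 <- fit_exponent(n_2152, t_2152)
k_302  <- fit_exponent(n_302, t_302)
print(round(c(R2.15.2 = k_2152, R3.0.2 = k_302), 2))
```

On these numbers the fitted exponents come out at roughly 1.3 for 2.15.2 and 2.0 for 3.0.2, i.e. close to linear versus close to quadratic growth of parse time.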

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf
> Of Carl Witthoft
> Sent: Wednesday, October 30, 2013 5:29 AM
> To: [hidden email]
> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Re: big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Heinz Tuechler
Many thanks for confirming my impression. I use dump for storing large
data.frames with a number of attributes on each variable. save/load is
much faster, but I am unsure whether such files will be readable by R
versions years from now.
What format/functions would you suggest for storing data and transferring
it between different (future) R versions?
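Whether a storage route keeps per-variable attributes can at least be checked in a few lines. A sketch, assuming a small data.frame with a hypothetical per-variable attribute named label, that round-trips it through both dump()/source() and save()/load() and compares each result with the original. It says nothing about how long the save() format will remain readable, which is the policy question this thread leaves open.

```r
# Hypothetical small data.frame with a per-variable attribute ("label" is an
# illustrative name, not something prescribed in the thread).
df0 <- data.frame(x = c("a", "b", "c"), n = 1:3, stringsAsFactors = FALSE)
attr(df0$x, "label") <- "a character variable"
attr(df0$n, "label") <- "an integer variable"

tf_dump <- tempfile(fileext = ".R")
tf_save <- tempfile(fileext = ".RData")

# Text route: dump() writes R code; source() re-parses and evaluates it.
dump("df0", file = tf_dump)
env_dump <- new.env()
source(tf_dump, local = env_dump)
via_dump <- get("df0", envir = env_dump)

# Binary route: save()/load() serialize the object directly, no parsing.
save(df0, file = tf_save)
env_save <- new.env()
load(tf_save, envir = env_save)
via_save <- get("df0", envir = env_save)

unlink(c(tf_dump, tf_save))
print(c(dump = identical(via_dump, df0), save = identical(via_save, df0)))
```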

best regards,
Heinz

(Replying to William Dunlap's message of 30.10.2013 20:11.)


Re: big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

William Dunlap
I have to defer to others for policy declarations like how long
the current format used by load and save should be readable.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: Heinz Tuechler [mailto:[hidden email]]
> Sent: Wednesday, October 30, 2013 1:43 PM
> To: William Dunlap
> Cc: Carl Witthoft; [hidden email]
> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
>
> Best thanks for confirming my impression. I use dump for storing large
> data.frames with a number of attributes for each variable. save/load is
> much faster, but I am unsure, if such files will be readable by R
> versions years later.
> What format/functions would you suggest for data storage/transfer
> between different (future) R versions?
>
> best regards,
> Heinz
>
> on/am 30.10.2013 20:11, William Dunlap wrote/hat geschrieben:
> > I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
> > when it is parsing long vectors of numeric data.  dump/source has never been an
> efficient
> > way of transferring data between different R session, but it is much worse
> > now for long vectors.   In 2.15.2 doubling the size of the vector (of lengths
> > in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1.
> > In 3.0.2 that factor is more like 4.4.
> >
> >         n elapsed-2.15.2 elapsed-3.0.2
> >      2048          0.003         0.018
> >      4096          0.006         0.065
> >      8192          0.013         0.254
> >     16384          0.025         1.067
> >     32768          0.050         4.114
> >     65536          0.100        16.236
> >    131072          0.219        66.013
> >    262144          0.808       291.883
> >    524288          2.022      1285.265
> >   1048576          4.918            NA
> >   2097152          9.857            NA
> >   4194304         22.916            NA
> >   8388608         49.671            NA
> > 16777216        101.042            NA
> > 33554432        512.719            NA
> >
> > I tried this with 64-bit R on a Linux box.  The NA's represent sizes that did not
> > finish while I was at a 1 1/2 hour dentist's apppointment.  The timing function
> > was:
> >    test <- function(n = 2^(11:25))
> >    {
> >        tf <- tempfile()
> >        on.exit(unlink(tf))
> >        t(sapply(n, function(n){
> >            dput(log(seq_len(n)), file=tf)
> >            print(c(n=n, system.time(parse(file=tf))[1:3]))
> >        }))
> >    }
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -----Original Message-----
> >> From: [hidden email] [mailto:[hidden email]] On
> Behalf
> >> Of Carl Witthoft
> >> Sent: Wednesday, October 30, 2013 5:29 AM
> >> To: [hidden email]
> >> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
> >>
> >> Did you run the identical code on the identical machine, and did you verify
> >> there were no other tasks running which might have limited the RAM available
> >> to R?  And equally important, did you run these tests in the reverse order
> >> (in case R was storing large objects from the first run, thus chewing up
> >> RAM)?
> >>
> >> --
> >> View this message in context: http://r.789695.n4.nabble.com/big-speed-difference-in-source-btw-R-2-15-2-and-R-3-0-2-tp4679314p4679346.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> [hidden email] mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
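Bill Dunlap's doubling factors in the quoted message (about 2.1 per doubling in 2.15.2, about 4.4 in 3.0.2) amount to parse time growing roughly linearly versus roughly quadratically in n. A small editorial sketch that checks this against the numbers from his table by fitting the slope of log(time) on log(n) over the sizes that finished; the helper names `growth()` and `exponent()` are made up for this illustration:

```r
# Elapsed parse() times in seconds, copied from Bill Dunlap's table.
# NA marks sizes that had not finished in 3.0.2.
n <- 2^(11:25)
elapsed_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
                  2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
elapsed_302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
                  291.883, 1285.265, rep(NA, 6))

# Factor by which the time grows from one size to the next (n doubles each step).
growth <- function(t) t[-1] / t[-length(t)]

# Exponent k in time ~ n^k: slope of a least-squares fit of log(time) on log(n),
# using only the sizes that finished. k near 1 is linear, k near 2 quadratic.
exponent <- function(n, t) {
  ok <- !is.na(t)
  x <- log(n[ok])
  y <- log(t[ok])
  unname(coef(lm(y ~ x))[2])
}

round(growth(elapsed_302), 1)
exponent(n, elapsed_2152)
exponent(n, elapsed_302)
```

On these numbers the fitted slope comes out close to 2 for 3.0.2 and much closer to 1 for 2.15.2, in line with the per-doubling factors quoted above. The fit only describes the timings; it does not say why the parser slowed down.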


Re: big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Prof Brian Ripley
On 30/10/2013 21:15, William Dunlap wrote:
> I have to defer to others for policy declarations like how long
> the current format used by load and save should be readable.

You could also ask how long R will last ....

R can still read (but not write) save() formats used in the 1990's.  We
would expect R to be able to read saves since R 1.0.0 for as long as R
exists.  And as R is Open Source, you would be able to compile it and
dump the objects you want for as long as suitable compilers and OSes
exist ....  And of course R is not the only application which will read
the format.

There is no guarantee that source() will be able to parse dumps from
earlier versions of R, and that has not always been true.

People commenting on parse() speed should note the NEWS for R-devel:

     • The parser has been modified to use less memory.
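Following Ripley's point that save() formats remain readable across R versions, here is a minimal sketch of storing a data frame with per-variable attributes in a binary format instead of with dump(). It assumes a current R (3.5.0 or later), where `version = 2` requests the older serialization format that earlier releases can also read; the data frame, its columns and the `label` attributes are invented for illustration.

```r
# Hypothetical data frame with per-variable attributes (names are illustrative).
df0 <- data.frame(x = c(2.5, 1, 7), g = c("a", "b", "a"),
                  stringsAsFactors = FALSE)
attr(df0$x, "label") <- "measured value"
attr(df0$g, "label") <- "group"

# saveRDS()/readRDS(): one object per file, restored under any name you choose.
# version = 2 asks for the pre-3.5.0 serialization format.
rds_file <- tempfile(fileext = ".rds")
saveRDS(df0, file = rds_file, version = 2)
df_rds <- readRDS(rds_file)

# save()/load(): restores objects under their original names; loading into a
# separate environment avoids overwriting anything in the workspace.
rda_file <- tempfile(fileext = ".RData")
save(df0, file = rda_file, version = 2)
restored <- new.env()
load(rda_file, envir = restored)

# Both round trips should reproduce the data and the extra attributes.
identical(df0, df_rds)
identical(df0, restored$df0)
attr(df_rds$x, "label")
```

saveRDS() suits a single object such as one data frame; save() is the more natural choice when several objects should travel together in one file.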


--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Heinz Tuechler
On 31.10.2013 09:12, Prof Brian Ripley wrote:

> On 30/10/2013 21:15, William Dunlap wrote:
>> I have to defer to others for policy declarations like how long
>> the current format used by load and save should be readable.
>
> You could also ask how long R will last ....
>
> R can still read (but not write) save() formats used in the 1990's.  We
> would expect R to be able to read saves since R 1.0.0 for as long as R
> exists.  And as R is Open Source, you would be able to compile it and
> dump the objects you want for as long as suitable compilers and OSes
> exist ....  And of course R is not the only application which will read
> the format.
>
> There is no guarantee that source() will be able to parse dumps from
> earlier versions of R, and that has not always been true.
>
> People commenting on parse() speed should note the NEWS for R-devel:
>
>      • The parser has been modified to use less memory.
>
>
Thank you for the hint.
It appears to me that source() in R-devel performs at about the same
speed as in R 2.15.2.
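To see how a particular R build behaves, and how much a binary file saves by skipping the parser altogether, one could time both reload paths on the same data. A minimal sketch, assuming only base R; `compare_reload()` is a hypothetical helper, and the word list and seed follow the example at the start of this thread:

```r
# Time reloading the same data frame from a dump() file and from an .rds file.
compare_reload <- function(n = 1e5, seed = 37) {
  set.seed(seed)
  words <- c("source", "causes", "R", "to", "accept", "its", "input", "from",
             "the", "named", "file", "or", "URL", "or", "connection")
  df0 <- data.frame(x = sample(words, n, replace = TRUE),
                    stringsAsFactors = FALSE)

  dump_file <- tempfile(fileext = ".R")
  rds_file  <- tempfile(fileext = ".rds")
  on.exit(unlink(c(dump_file, rds_file)))
  dump("df0", file = dump_file)     # text: must be re-parsed by source()
  saveRDS(df0, file = rds_file)     # binary: no parsing on the way back

  env <- new.env()
  t_source <- system.time(
    source(dump_file, local = env, keep.source = FALSE)
  )[["elapsed"]]
  t_rds <- system.time(df_rds <- readRDS(rds_file))[["elapsed"]]

  # Both copies should match the original before the timings mean anything.
  stopifnot(isTRUE(all.equal(env$df0, df0)), identical(df_rds, df0))
  c(source = t_source, readRDS = t_rds)
}
```

Calling, say, `compare_reload(1e6)` under two R versions, each in a fresh session as Carl Witthoft suggested, gives a rough side-by-side comparison; the timings depend on the machine and are not shown here.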
