dget() much slower in recent R versions

dget() much slower in recent R versions

Ista Zahn
Hello,

I've noticed that dget() is much slower in the current and devel R
versions than in previous versions. In 2.15 reading a 10000-row
data.frame takes less than half a second:

> (which.r <- R.Version()$version.string)
[1] "R version 2.15.2 (2012-10-26)"
> x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))
> dput(x, which.r)
> system.time(y <- dget(which.r))
   user  system elapsed
  0.546   0.033   0.586

In 3.1.0 and r-devel, by contrast, it takes around 7 seconds:

> (which.r <- R.Version()$version.string)
[1] "R version 3.1.0 (2014-04-10)"
> x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))
> dput(x, which.r)
> system.time(y <- dget(which.r))
   user  system elapsed
  6.920   0.060   7.074

> (which.r <- R.Version()$version.string)
[1] "R Under development (unstable) (2014-06-19 r65979)"
> x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))
> dput(x, which.r)
> system.time(y <- dget(which.r))
   user  system elapsed
  6.886   0.047   6.943
>

I know dput/dget is probably not the right tool for this job;
nevertheless, the slowdown is quite dramatic, so I thought it was worth
calling attention to.
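
For comparison, here is a minimal sketch of one binary alternative, saveRDS()/readRDS(). This route is not proposed anywhere in this thread; it is only an assumption about what a more suitable tool for storing a data frame might look like. The variable `path` is an illustrative temporary file name.

```r
# Same test data as in the timings above
x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))

# Illustrative temporary file; any writable path would do
path <- tempfile(fileext = ".rds")

# Serialize the object in R's binary format instead of as R source text
saveRDS(x, path)

# Reading it back does not go through parse() at all
print(system.time(y <- readRDS(path)))

# Check that the round trip preserved the data
print(isTRUE(all.equal(x, y)))
```

Because nothing is parsed as R code, this path should not be affected by the parser change discussed in this thread.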

Best,
Ista

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: dget() much slower in recent R versions

Prof Brian Ripley
On 20/06/2014 15:37, Ista Zahn wrote:

> Hello,
>
> I've noticed that dget() is much slower in the current and devel R
> versions than in previous versions. In 2.15 reading a 10000-row
> data.frame takes less than half a second:
>
>> (which.r <- R.Version()$version.string)
> [1] "R version 2.15.2 (2012-10-26)"
>> x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))
>> dput(x, which.r)
>> system.time(y <- dget(which.r))
>     user  system elapsed
>    0.546   0.033   0.586
>
> While in 3.1.0 and r-devel it takes around 7 seconds.
>
>> (which.r <- R.Version()$version.string)
> [1] "R version 3.1.0 (2014-04-10)"
>> x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))
>> dput(x, which.r)
>> system.time(y <- dget(which.r))
>     user  system elapsed
>    6.920   0.060   7.074
>
>> (which.r <- R.Version()$version.string)
> [1] "R Under development (unstable) (2014-06-19 r65979)"
>> x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))
>> dput(x, which.r)
>> system.time(y <- dget(which.r))
>     user  system elapsed
>    6.886   0.047   6.943
>>
>
> I know dput/dget is probably not the right tool for this job;
> nevertheless, the slowdown is quite dramatic, so I thought it was worth
> calling attention to.

This is completely the wrong way to do this. See ?dump.

dget() basically calls eval(parse()).  parse() is much slower in R >=
3.0 mainly because it keeps more information.  Using keep.source=FALSE
here speeds things up a lot.

 > system.time(y <- dget(which.r))
    user  system elapsed
   3.233   0.012   3.248
 > options(keep.source=FALSE)
 > system.time(y <- dget(which.r))
    user  system elapsed
   0.090   0.001   0.092
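
The same comparison can be sketched without changing the global option, assuming one calls eval(parse()) directly and passes keep.source per call. The names `path`, `t_src` and `t_nosrc` are illustrative and not from the original posts; timings will vary by machine and R version.

```r
# Same test data as in the original report
x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))

# Illustrative temporary file holding the dput() output
path <- tempfile(fileext = ".R")
dput(x, path)

# Parse and evaluate with source references kept
t_src <- system.time(y1 <- eval(parse(file = path, keep.source = TRUE)))

# Parse and evaluate with source references dropped
t_nosrc <- system.time(y2 <- eval(parse(file = path, keep.source = FALSE)))

cat("keep.source = TRUE :", t_src[["elapsed"]], "s elapsed\n")
cat("keep.source = FALSE:", t_nosrc[["elapsed"]], "s elapsed\n")

# Both calls should reconstruct the same object
print(identical(y1, y2))
```

Note that the keep.source option defaults to interactive(), so a script run through Rscript would normally already parse without source references.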


--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: dget() much slower in recent R versions

Ista Zahn
Makes sense, thanks for the explanation.

Best,
Ista


Re: dget() much slower in recent R versions

Hervé Pagès
On 06/21/2014 12:56 AM, Prof Brian Ripley wrote:

> This is completely the wrong way to do this. See ?dump.
>
> dget() basically calls eval(parse()).  parse() is much slower in R >=
> 3.0 mainly because it keeps more information.  Using keep.source=FALSE
> here speeds things up a lot.
>
>  > system.time(y <- dget(which.r))
>     user  system elapsed
>    3.233   0.012   3.248
>  > options(keep.source=FALSE)
>  > system.time(y <- dget(which.r))
>     user  system elapsed
>    0.090   0.001   0.092

Nice. But why add the 'keep.source' arg to dget() in R-devel rev 65990:

   dget <- function(file, keep.source = FALSE)
       eval(parse(file = file, keep.source = FALSE))

(Note that the 'keep.source' arg is actually ignored.)

Why not just:

   dget <- function(file)
       eval(parse(file = file, keep.source = FALSE))
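
A third possibility, sketched here only as an assumption about what the new argument may have been meant to do, is to forward it to parse(). The name `dget2` is hypothetical (chosen so that base dget() is not masked), and the one-line test file is made up for illustration.

```r
# Hypothetical variant, not part of R: the argument is passed on to parse()
dget2 <- function(file, keep.source = FALSE)
    eval(parse(file = file, keep.source = keep.source))

# Illustrative input: a file containing the source of a small function
path <- tempfile(fileext = ".R")
writeLines("function(x) x + 1", path)

f_plain <- dget2(path)                      # default: no source references
f_src   <- dget2(path, keep.source = TRUE)  # caller asks to keep them

# Inspect whether a "srcref" attribute was attached in each case
print(is.null(attr(f_plain, "srcref")))
print(is.null(attr(f_src, "srcref")))
```

With this form the default stays fast, while a caller who does want source references on a retrieved function can still request them.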

Cheers,

H.
