Parallel compression support for saving to rds/rdata files?


Kenny Bell
Hi,

I have tried to follow the instructions in the ``save`` documentation and
it doesn't seem to work (see below):

mydata <- do.call(rbind, rep(iris, 10000))
con <- pipe("pigz -p8 > fname.gz", "wb");
save(mydata, file = con); close(con) # This runs

R.utils::gunzip("fname.gz", "fname.RData", overwrite = TRUE)
load("fname.RData") # Error: error reading from connection

First question: Should the above work?

Second question: Is it possible to make this dummy-friendly by allowing
"pigz" as an option for ``compress`` in saveRDS and save, in such a way
that the decompression is hidden from the user as usual?

Thanks!
Kenny


--
Kendon Bell
Email: [hidden email]
Phone: (510) 612-3375

Ph.D. Candidate
Department of Agricultural & Resource Economics
University of California, Berkeley


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Parallel compression support for saving to rds/rdata files?

Simon Urbanek

> On Dec 15, 2016, at 12:08 AM, Kenny Bell <[hidden email]> wrote:
>
> Hi,
>
> I have tried to follow the instructions in the ``save`` documentation and
> it doesn't seem to work (see below):
>
> mydata <- do.call(rbind, rep(iris, 10000))
> con <- pipe("pigz -p8 > fname.gz", "wb");
> save(mydata, file = con); close(con) # This runs
>
> R.utils::gunzip("fname.gz", "fname.RData", overwrite = TRUE)
> load("fname.RData") # Error: error reading from connection
>
> First question: Should the above work?
>


Not really. gzip is a bad example because it doesn't really support parallel compression (a gzip stream cannot be chopped into blocks by design), but you can do it with bzip2:

mydata <- do.call(rbind, rep(iris, 10000))
con <- pipe("pbzip2 -p8 > fname.bz2", "wb")
save(mydata, file = con)
close(con)

load("fname.bz2")
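
No explicit decompression step is needed here because load() opens a file name through gzfile(), which detects bzip2-compressed input when reading. A minimal sketch of that behaviour, using R's built-in bzfile() in place of pbzip2 so it runs without external tools (the temporary file and the object x are illustrative only):

```r
# Write an .RData stream through a bzip2 connection (single-threaded
# stand-in for the pbzip2 pipe used above).
path <- tempfile(fileext = ".bz2")
x <- 1:10
con <- bzfile(path, "wb")
save(x, file = con)
close(con)

# bzip2 streams start with the bytes "BZh"; this only inspects the header.
magic <- rawToChar(readBin(path, "raw", n = 3L))
print(magic)

# load() is given the compressed file directly.
rm(x)
load(path)
print(x)
```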

You can also use a parallel read:

load(pipe("pbzip2 -dc fname.bz2"))
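
The same pipe pattern can be wrapped in small helpers so that the compression step is hidden from the caller, which is roughly what the second question asks for. This is only a sketch: save_rds_pbzip2(), read_rds_pbzip2() and have_pbzip2() are hypothetical names, not part of base R, and it assumes pbzip2 is on the PATH (falling back to the single-threaded bzfile() connection when it is not).

```r
# Hypothetical helpers, not part of base R.
have_pbzip2 <- function() nzchar(Sys.which("pbzip2"))

save_rds_pbzip2 <- function(object, path, threads = 8L) {
  con <- if (have_pbzip2()) {
    # Same pipe pattern as above; shQuote() protects paths with spaces.
    cmd <- sprintf("pbzip2 -p%d > %s", as.integer(threads), shQuote(path))
    pipe(cmd, "wb")
  } else {
    bzfile(path, "wb")  # single-threaded fallback, assumption of this sketch
  }
  on.exit(close(con))   # closing the pipe waits for pbzip2 to finish
  saveRDS(object, file = con)
  invisible(path)
}

read_rds_pbzip2 <- function(path) {
  # readRDS() on a file name detects bzip2 by itself.
  if (!have_pbzip2()) return(readRDS(path))
  con <- pipe(paste("pbzip2 -dc", shQuote(path)), "rb")
  on.exit(close(con))
  readRDS(con)
}

# Round trip on a small data set.
tmp <- tempfile(fileext = ".rds.bz2")
save_rds_pbzip2(iris, tmp)
print(identical(read_rds_pbzip2(tmp), iris))
```

When saveRDS() is given a connection rather than a file name, its own ``compress`` argument is not used, so all compression here comes from the pipe or the bzfile() connection.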

Cheers,
Simon



> Second question: Is it possible to make this dummy-friendly by allowing
> "pigz" as an option for ``compress`` in saveRDS and save, in such a way
> that the decompression is hidden from the user as usual?
>
> Thanks!
> Kenny