Compression

Compression

R devel mailing list
This is a CRAN question:

I have taken care to compress files in the data directory using "xz" (and checked that it
is the best).  Is there then any impact or use for the LazyDataCompression option in the
DESCRIPTION file?


--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
[hidden email]

"TERR-ree THUR-noh"



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Compression (really about LazyData)

Prof Brian Ripley
On 18/02/2021 18:30, Therneau, Terry M., Ph.D. via R-devel wrote:
> This is a CRAN question:
>
> I have taken care to compress files in the data directory using "xz" (and checked that it
> is the best).  Is there then any impact or use for the LazyDataCompression option in the
> DESCRIPTION file?
>

I have difficulty comprehending that, so I will try to answer my guess
at what you meant to ask.

What LazyDataCompression does is completely separate from the contents
of the data directory.  As the manual says:

<quote>
Some packages using ‘LazyData’ will benefit from using a form of
compression other than gzip in the installed lazy-loading database. This
can be selected by the --data-compress option to R CMD INSTALL or by
using the ‘LazyDataCompression’ field in the DESCRIPTION file. Useful
values are bzip2, xz and the default, gzip. The only way to discover
which is best is to try them all and look at the size of the
pkgname/data/Rdata.rdb file.
</quote>
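
As an illustration of the "try them all" advice quoted above, here is a minimal R sketch. The helper name rdb_sizes and the temporary library locations are hypothetical, and the sketch assumes that the package's DESCRIPTION enables LazyData but does not itself set LazyDataCompression (my understanding is that the field would otherwise take precedence over the command-line option).

```r
# Hypothetical helper (not part of R): install a source package once per
# compression type into a throwaway library and report the size in bytes
# of the installed lazy-loading database, pkgname/data/Rdata.rdb.
rdb_sizes <- function(pkg_src, types = c("gzip", "bzip2", "xz")) {
  # Read the package name from the DESCRIPTION file of the source tree.
  pkg_name <- read.dcf(file.path(pkg_src, "DESCRIPTION"),
                       fields = "Package")[1, 1]

  vapply(types, function(type) {
    lib <- tempfile("lib_")
    dir.create(lib)
    on.exit(unlink(lib, recursive = TRUE))

    # Equivalent to: R CMD INSTALL --data-compress=<type> --library=<lib> <src>
    status <- system2(file.path(R.home("bin"), "R"),
                      c("CMD", "INSTALL",
                        paste0("--data-compress=", type),
                        paste0("--library=", shQuote(lib)),
                        shQuote(pkg_src)),
                      stdout = FALSE, stderr = FALSE)
    if (status != 0) return(NA_real_)  # installation failed

    # NA if the package was not installed with LazyData.
    file.size(file.path(lib, pkg_name, "data", "Rdata.rdb"))
  }, numeric(1))
}
```

Calling rdb_sizes("path/to/pkg") on a source directory would return a named vector with one size per compression type; the smallest value suggests which setting to put in the LazyDataCompression field.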

When a package is installed with LazyData (and you neglected to tell us
if that is the case), the datasets in the data directory are loaded (and
hence decompressed), and stored in a database.  For a LazyData package
the compression used in the data directory only affects the source
package size (I guess your criterion for 'best') and how fast it is
installed (rarely a consideration but there have been LazyData packages
where installing the data takes most of the time).  At run-time only the
compression specified by LazyDataCompression is relevant.
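
For the other half of the picture, the files in the data directory, a rough sketch of checking which compression gives the smallest file might look like the following. The data frame dat is synthetic and only stands in for a real dataset; with actual package files, tools::checkRdaFiles() and tools::resaveRdaFiles() report and change the compression in place. As explained above, the result only affects the source package size and installation time, not the installed lazy-loading database.

```r
# Synthetic data standing in for a package dataset (an assumption made
# purely for illustration).
set.seed(1)
dat <- data.frame(
  id    = 1:20000,
  group = sample(letters, 20000, replace = TRUE),
  value = round(rnorm(20000), 2)
)

# Save the same object with each compression type and record the size
# of the resulting .rda file in bytes.
types <- c("gzip", "bzip2", "xz")
sizes <- vapply(types, function(type) {
  f <- tempfile(fileext = ".rda")
  on.exit(unlink(f))
  save(dat, file = f, compress = type)
  file.size(f)
}, numeric(1))

print(sizes)
```

Which type wins depends on the data, so the comparison is worth repeating for each dataset rather than assuming one answer.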

--
Brian D. Ripley,                  [hidden email]
Emeritus Professor of Applied Statistics, University of Oxford

Re: Compression (really about LazyData)

R devel mailing list
Thank you, Brian.  I had not quite grasped how the process works; now the descriptions and
usage make sense.

Terry


