Automatic Compression by Save Causes Check Warning


Automatic Compression by Save Causes Check Warning

Dario Strbenac-2
Good day,

save() sometimes chooses a compression method that causes a warning during package checking. For example:

measurements <- matrix(round(rnorm(2000*190), 2), nrow = 2000, ncol = 190)
classes <- factor(sample(LETTERS[1:2], 190, replace = TRUE))
save(measurements, classes, file = "data/experiment.RData")

then, when the package is checked,

* checking data for ASCII and uncompressed saves ... WARNING
 
  Note: significantly better compression could be obtained
        by using R CMD build --resave-data
                   old_size new_size compress
  experiment.RData    689Kb    447Kb    bzip2

Could save() and R CMD check consistently agree on a suitable compression scheme? Could R CMD check not emit warnings if the data file is already small and the alternative compression doesn't reduce the size much, as in this example? Perhaps it could emit warnings only when the data file is larger than 5 MB and the alternative scheme would produce a file at least 50% smaller than the existing one. Also, Section 1.1.6 Data in Packages of Writing R Extensions does not explain that compressing data files is implicitly mandatory for R packages to pass the checking process these days.

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Automatic Compression by Save Causes Check Warning

Tomas Kalibera
Dear Dario,

this question may be more suitable for the R-pkg-devel or perhaps the
R-help list; if you have follow-up questions, you might get better
advice there. In short, save() does not select a compression algorithm
automatically: it just uses the one specified, by default "gzip". For
automated selection, please use tools::resaveRdaFiles(). You can also use
"--resave-data" when building your package (see "Building package
tarballs" in R-exts) to have this done automatically for all your data
files. Finally, "R CMD check --as-cran" will report when it can find a
significantly better compression (note that CRAN policies ask for an
--as-cran check to be run locally before submitting a package anyway).
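
As a minimal sketch of the difference described above, the following recreates the data from the original post, saves it with the default settings, and then lets tools::resaveRdaFiles() pick a compression. The seed and the temporary file (standing in for "data/experiment.RData") are assumptions added for illustration; the sizes and the algorithm chosen may differ between R versions and machines.

```r
# Example data from the original post; the seed is added only for reproducibility.
set.seed(1)
measurements <- matrix(round(rnorm(2000 * 190), 2), nrow = 2000, ncol = 190)
classes <- factor(sample(LETTERS[1:2], 190, replace = TRUE))

# A temporary file stands in for "data/experiment.RData".
rda <- tempfile(fileext = ".RData")

# save() uses gzip unless another method is requested via 'compress'.
save(measurements, classes, file = rda)
before <- tools::checkRdaFiles(rda)

# compress = "auto" re-saves the file with several algorithms and keeps
# the one it judges best.
tools::resaveRdaFiles(rda, compress = "auto")
after <- tools::checkRdaFiles(rda)

# Compare the compression method and size (in bytes) before and after.
print(data.frame(
  stage    = c("save()", "resaveRdaFiles()"),
  compress = c(before$compress, after$compress),
  size     = c(before$size, after$size)
))
```

If the preferred algorithm is already known, it can also be requested directly, for example save(..., compress = "bzip2"), which avoids the resave step.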

The CRAN repository policy mentions that packages should be of minimum
necessary size and the checks are in line with that (and there are
already heuristics in place to avoid the warning if possible gains are
small). I don't think in principle things could be made any simpler than
they are now: "R CMD check --as-cran" will report a possible improvement
by resave, "R CMD build --resave-data" will do the resave. Note that the
detection of the best compression algorithm cannot be done without
actually compressing the data using different algorithms, which is what
resaveRdaFiles() does -- doing this in save() by default is not possible
due to the performance overhead.
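
To illustrate why the best algorithm cannot be known in advance, here is a rough sketch that saves the same object once per algorithm and records the file size and elapsed time. The helper try_compression() is a hypothetical name used only for this illustration, not part of R, and this is not how resaveRdaFiles() is implemented internally; timings and sizes will vary by machine.

```r
# Example matrix from the original post; the seed is added only for reproducibility.
set.seed(1)
measurements <- matrix(round(rnorm(2000 * 190), 2), nrow = 2000, ncol = 190)

# Hypothetical helper: save 'x' with one compression method and report
# the resulting file size (bytes) and the time the save took (seconds).
try_compression <- function(method, x) {
  path <- tempfile(fileext = ".RData")
  on.exit(unlink(path))
  elapsed <- system.time(save(x, file = path, compress = method))[["elapsed"]]
  data.frame(compress = method, size = file.size(path), seconds = elapsed)
}

methods <- c("gzip", "bzip2", "xz")
results <- do.call(rbind, lapply(methods, try_compression, x = measurements))
print(results)

# The smallest file is only known after every candidate has been written.
best <- results$compress[which.min(results$size)]
print(best)
```

Every candidate costs a full serialization and compression pass, which is the overhead that makes this unsuitable as default behaviour for save().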

Best
Tomas

On 06/18/2018 10:00 AM, Dario Strbenac wrote:

> [...]
