utils::tar() and files >= 2GB

Hervé Pagès
Hi,

The current implementation of utils::tar() seems to generate broken
tarballs when some of the files to include in the tarball are >= 2GB.

For example, when running 'R CMD build' on a big Bioconductor data
package, we see this warning:

   * checking for file ‘ChIPXpressData/DESCRIPTION’ ... OK
   * preparing ‘ChIPXpressData’:
   * checking DESCRIPTION meta-information ... OK
   * checking for LF line-endings in source and make files
   * checking for empty or unneeded directories
   * building ‘ChIPXpressData_0.99.1.tar.gz’
   Warning in sprintf("%011o", as.integer(size)) :
     NAs introduced by coercion

probably because 'size' was > .Machine$integer.max for one of the files
included in the tarball. This suggests that the resulting tarball might
be incorrect, which is confirmed by running 'tar ztvf' on that tarball
from the Unix command line:

   -rw-r--r-- biocbuild/phs_compbio 38 2012-12-11 09:37 ChIPXpressData/.BBSoptions
   -rw-r--r-- biocbuild/phs_compbio 442 2013-01-18 12:04 ChIPXpressData/DESCRIPTION
   -rw-r--r-- biocbuild/phs_compbio   0 2012-12-11 09:37 ChIPXpressData/NAMESPACE
   -rw-r--r-- biocbuild/phs_compbio  14 2012-12-11 09:37 ChIPXpressData/external_data_store.txt
   drwxr-xr-x biocbuild/phs_compbio   0 2012-12-20 13:00 ChIPXpressData/inst/
   drwxr-xr-x biocbuild/phs_compbio   0 2012-12-20 13:15 ChIPXpressData/inst/extdata/
   -rw-r--r-- biocbuild/phs_compbio 1601278008 2012-12-20 13:15 ChIPXpressData/inst/extdata/DB_GPL1261.bigmemory
   -rw-r--r-- biocbuild/phs_compbio     210497 2012-12-20 13:14 ChIPXpressData/inst/extdata/DB_GPL1261.bigmemory.desc
   tar: Archive contains `  ' where numeric off_t value expected
   -rw-r--r-- biocbuild/phs_compbio 18446744073709551615 2012-12-20 13:15 ChIPXpressData/inst/extdata/DB_GPL570.bigmemory
   tar: Skipping to next header
   -rw-r--r-- biocbuild/phs_compbio               193165 2012-12-20 13:14 ChIPXpressData/inst/extdata/DB_GPL570.bigmemory.desc
   drwxr-xr-x biocbuild/phs_compbio                    0 2012-12-11 09:37 ChIPXpressData/man/
   -rw-r--r-- biocbuild/phs_compbio                  863 2012-12-11 09:37 ChIPXpressData/man/ChIPXpressData-package.Rd
   -rw-r--r-- biocbuild/phs_compbio                 1891 2012-12-11 09:37 ChIPXpressData/man/DB_GPL1261.bigmemory.Rd
   -rw-r--r-- biocbuild/phs_compbio                 1888 2012-12-11 09:37 ChIPXpressData/man/DB_GPL570.bigmemory.Rd
   tar: Exiting with failure status due to previous errors
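
The suspected cause can be reproduced in isolation. The snippet below is
only a sketch: the size used is a hypothetical value of exactly 2GB, not
the real size of the affected file, and it mimics only the sprintf() call
quoted in the warning, not the rest of utils::tar().

```r
# Hypothetical file size of exactly 2 GiB, held as a double
# (file sizes reported by file.info() are doubles as well).
size <- 2^31

# One more than the largest value an R integer can hold.
size > .Machine$integer.max        # TRUE

# Coercion to integer overflows and yields NA (with a warning,
# suppressed here so the snippet runs quietly).
int_size <- suppressWarnings(as.integer(size))
is.na(int_size)                    # TRUE

# Formatting the NA with the call from the warning no longer
# produces an 11-digit octal string for the header's size field.
header_field <- sprintf("%011o", int_size)
grepl("^[0-7]{11}$", header_field) # FALSE
```

A size field that is not a valid octal number would be consistent with
both the 'numeric off_t value expected' complaint from tar and the
'invalid octal digit' error from 'R CMD check'.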

Also, not too surprisingly, 'R CMD check' on this tarball fails, but
with a somewhat obscure error message:

   Error in getOct(block, 124, 12) : invalid octal digit
   Execution halted

Would it make sense to support files that are >= 2GB, possibly with a
warning about portability issues, if there are any? If not, maybe
utils::tar() (and consequently 'R CMD build') should just fail, with
an informative error message.
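
To make the two options concrete, here is a minimal sketch of a size
formatter that works on doubles instead of integers. It assumes the
classic ustar header layout, where the size field holds 11 octal digits
(so at most 8^11 - 1 bytes, just under 8GB). The name format_tar_size()
is hypothetical and is not part of utils.

```r
# Hypothetical helper: format a file size as the 11-digit octal string
# used in a ustar header, without going through as.integer().
format_tar_size <- function(size) {
  if (!is.numeric(size) || length(size) != 1L || is.na(size) ||
      size < 0 || size != floor(size)) {
    stop("'size' must be a single non-negative whole number")
  }
  # 11 octal digits cannot represent 8^11 bytes (8 GiB) or more.
  if (size >= 8^11) {
    stop(sprintf("file size %.0f is too large for a ustar size field",
                 size))
  }
  # Extract the octal digits from least to most significant.
  # Doubles represent whole numbers in this range exactly.
  digits <- character(11L)
  for (i in 11:1) {
    digits[i] <- as.character(size %% 8)
    size <- size %/% 8
  }
  paste(digits, collapse = "")
}

format_tar_size(442)    # same result as sprintf("%011o", 442L)
format_tar_size(2^31)   # a 2 GiB file is still representable
```

With something along these lines, files between 2GB and 8GB could be
written, and anything larger would stop with a clear message instead of
silently producing a corrupt header.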

Thanks,
H.

 > sessionInfo()
R Under development (unstable) (2013-01-03 r61544)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel