I used `gzfile` and `gzcon` to read a compressed file, but `gzcon` gave me a
different result than `gzfile`. It seems that `gzcon` does not handle the
data correctly. I have posted an example below. In the example, a portion of
a compressed file is downloaded from Google Cloud as a raw vector, and the
data is saved to a temp file. If I use `gzfile` to read the file, it returns
the first 1000 lines successfully. However, if I wrap the raw vector in a
raw connection and use `gzcon` to read from that connection, it returns only
the first 884 lines along with a warning (see the output below).
> # install.packages("BiocManager")
> # BiocManager::install("GCSConnection", version = "devel")
> library(GCSConnection)
> ## Download data from cloud
> uri <-
> con <- gcs_connection(uri)
> data <- readBin(con, raw(), 4*1024*1024)
> ## Write the data to a file
> file_path <- tempfile()
> writeBin(data, file_path)
> ## Read the data using `gzfile`
> con1 <- gzfile(file_path)
> str(readLines(con1, 1000))
> ## Read the data using `gzcon`
> ## We create a raw connection from the raw vector
> con2 <- gzcon(rawConnection(data))
> str(readLines(con2, 1000))
> > str(readLines(con1, 1000))
> chr [1:1000] "##fileformat=VCFv4.2" "##hailversion=0.2.24-9cd88d97bedd"
> > str(readLines(con2, 1000))
> chr [1:884] "##fileformat=VCFv4.2" "##hailversion=0.2.24-9cd88d97bedd" ...
> Warning message:
> In readLines(con2, 1000) : incomplete final line found on 'gzcon(data)'
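For anyone who wants to compare the two readers without the cloud download, here is a minimal local sketch. It is only an approximation of the case above: the input lines are made up (hypothetical `##line...` strings, not the real VCF header), and the file is a complete, single gzip stream written by base R rather than a 4 MB slice of a larger file. It therefore shows how the two readers can be compared side by side, not what causes the difference I am seeing.

```r
## Hypothetical stand-in data; NOT the real VCF header from the example above
lines_in <- sprintf("##line%04d", 1:2000)

## Write a gzip-compressed temp file with base R
file_path <- tempfile(fileext = ".gz")
out <- gzfile(file_path, "w")
writeLines(lines_in, out)
close(out)

## Load the compressed bytes into a raw vector, mimicking the download step
data <- readBin(file_path, raw(), file.size(file_path))

## Reader 1: `gzfile` on the file
con1 <- gzfile(file_path)
res1 <- readLines(con1, 1000)
close(con1)

## Reader 2: `gzcon` wrapped around a raw connection
con2 <- gzcon(rawConnection(data))
res2 <- readLines(con2, 1000)
close(con2)

## Compare what the two readers returned
identical(res1, res2)
```

If the two readers agree on a file like this but disagree on the downloaded data, that would suggest the difference depends on how that particular file is compressed or truncated rather than on the raw connection itself.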
I am not sure whether this is caused by a bug in `gzcon` or by my misuse of
the function. The same result can be observed on R 4.0 and R 4.1-devel on
Windows. Here is my session info; I hope it is helpful. Any suggestions or
help would be appreciated.
R Under development (unstable) (2020-06-27 r78747)