readLines() behaves differently for gzfile connection

readLines() behaves differently for gzfile connection

Ben Heavner
When I read a .gz file with readLines() in 3.4.3, it returns text (and a
warning). In 3.5.0, it gives a warning, but no text. Is this expected
behavior or a bug?

3.4.3:
> source_file = "1k_annotation.gz"
> readfile_con <- gzfile(source_file, "r")
> readLines(readfile_con, n = 5)
[1] "#chr\tpos\tref\talt\t

<truncated output here>

Warning message:
In readLines(readfile_con, n = 5) :
  seek on a gzfile connection returned an internal error

> close(readfile_con)

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS:
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3

---------------------------------------------

3.5.0:
> source_file = "1k_annotation.gz"
> readfile_con <- gzfile(source_file, "r")
> readLines(readfile_con, n = 5)
[1] "" "" "" "" ""
Warning message:
In readLines(readfile_con, n = 5) :
  seek on a gzfile connection returned an internal error
> close(readfile_con)
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0

----------------------------------------
(Note: I'm running 3.5.0 in the rocker/tidyverse:3.5 Docker container and
3.4.3 on my Mac desktop machine.)

Thanks!
Ben Heavner
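
A self-contained way to compare the two installations without the original file might look like the sketch below. The temporary file and its three lines are made up for illustration and only stand in for 1k_annotation.gz; whether such a small file triggers the same warning or the empty strings is not established in this thread.

```r
# Hypothetical stand-in for 1k_annotation.gz: a tiny tab-delimited file.
tmp <- tempfile(fileext = ".gz")
out <- gzfile(tmp, "w")
writeLines(c("#chr\tpos\tref\talt", "1\t100\tA\tG", "1\t200\tC\tT"), out)
close(out)

# Same access pattern as in the report: open the connection first,
# then read a fixed number of lines from it.
con <- gzfile(tmp, "r")
first_lines <- readLines(con, n = 2)
close(con)

# Print what came back together with the R version, so runs on
# different machines can be compared side by side.
print(first_lines)
print(R.version.string)

unlink(tmp)
```

Running the same script under both R versions and comparing the printed lines would show whether the difference follows the R version or the platform.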

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: readLines() behaves differently for gzfile connection

Michael Lawrence-3
Would it be possible to get that file or a representative subset of it
somewhere so that I can reproduce this?

Thanks,
Michael

On Thu, May 10, 2018 at 3:31 PM, Ben Heavner <[hidden email]> wrote:

> [...]

Re: readLines() behaves differently for gzfile connection

Ben Heavner
You bet - it's available on GitHub at
https://github.com/UW-GAC/wgsaparsr/blob/master/tests/testthat/1k_annotation.gz

-Ben

On Thu, May 10, 2018 at 4:17 PM, Michael Lawrence <[hidden email]> wrote:

> [...]

Re: readLines() behaves differently for gzfile connection

Michael Lawrence-3
I haven't been able to reproduce the empty lines issue on my Mac or
Linux laptop, but I have yet to try that container.

The warning is because of a SEEK_SET to -1, which apparently is
unsupported by zlib. Maybe the zlib version in that container is
getting confused. I'm not sure why readLines() wants to seek to -1
instead of 0, but it only does that on non-blocking connections. The
compressed file connections are effectively blocking but are marked as
non-blocking. Marking them as blocking removes the warning. I will get
that into devel and release soon. Hopefully that fixes the empty lines
issue also.

Michael
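
Until a fix is available, two things may be worth checking; the sketch below is only an illustration under stated assumptions, not a confirmed workaround. It reports the zlib version the running R was built against (to compare the container with the other machines), and it reads the file in two ways that are assumed here to avoid the seek on an already-open gzfile connection: passing an unopened connection to readLines(), and decompressing through gzcon(). The temporary file is a made-up stand-in for 1k_annotation.gz.

```r
# zlib version used by this R build, for comparison across machines.
zlib_version <- extSoftVersion()[["zlib"]]
print(zlib_version)

# Hypothetical stand-in for the real file.
tmp <- tempfile(fileext = ".gz")
out <- gzfile(tmp, "w")
writeLines(c("#chr\tpos\tref\talt", "1\t100\tA\tG"), out)
close(out)

# Variant 1: hand readLines() an unopened connection; it opens the
# connection itself for the duration of the call and closes it afterwards.
lines_unopened <- readLines(gzfile(tmp), n = 2)

# Variant 2: wrap a binary file connection in gzcon() and read from that.
con <- gzcon(file(tmp, "rb"))
lines_gzcon <- readLines(con, n = 2)
close(con)

print(identical(lines_unopened, lines_gzcon))

unlink(tmp)
```

If both variants return the expected text in the container while the pre-opened gzfile(source_file, "r") connection does not, that would point at the seek described above rather than at the file itself.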

On Thu, May 10, 2018 at 4:21 PM, Ben Heavner <[hidden email]> wrote:

> [...]
