file.info() on WinXP/NTFS > 2Gb

Henrik Bengtsson-2
Hi,

on WinXP Pro SP2 with NTFS, I noticed that file.info() under
R v2.2.1pat (2006-02-09) does not report the correct file size if the
file is >= 2^31 bytes (2 GB).  Is this problem known?  Is this related
to the note in ?file.info:

"Some (broken) systems allow files of more than 2Gb to be created but
not accessed by the 'stat' system call.  Such files will show up as
non-readable (and very likely not be readable by any of R's input
functions)."

Example:

# Create a 2Gb(!) file, report file.info()$size, and remove the file.

fname <- "tmp.Rbin"
con <- file(fname, open="w+b")
# Pick one of the two sizes:
# size <- 2^31-1  # OK
size <- 2^31      # Not OK
# Seek to the last byte and write a single zero byte there.
seek(con, where=size-1, rw="write")
writeBin(as.integer(0), con=con, size=1)
close(con)
print(file.info(fname)$size)
file.remove(fname)

With size = c(2^31-1,2^31), file.info() reports file size
c(2147483647,-2147483648).  This looks to me like an unsigned-to-signed
integer conversion problem, i.e. the size wraps around at 2^31.
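
To make the wrap-around concrete, here is a minimal sketch of the arithmetic, assuming the size ends up in a signed 32-bit field somewhere on the way. The helper name wrap_int32() is made up for illustration and is not part of R:

```r
# Hypothetical helper: reinterpret a non-negative byte count as if it had
# been stored in a signed 32-bit integer (two's-complement wrap-around).
# Doubles are used throughout, so the arithmetic is exact for these values.
wrap_int32 <- function(x) ((x + 2^31) %% 2^32) - 2^31

sizes <- c(2^31 - 1, 2^31, 2^31 + 1)
wrapped <- wrap_int32(sizes)
print(format(wrapped, scientific = FALSE))
```

Under that assumption, the first two values match the sizes reported above; any size from 2^31 up to 2^32-1 would come out negative.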

Note that the above code does indeed create files >2Gb on WinXP with
NTFS; I've created files about 6Gb this way.

BTW, about seek() on Windows:  I noticed the warning about using
seek() on Windows (see ?seek): "We have found so many errors in the
Windows implementation of file positioning that users are advised to
use it only at their own risk, and asked not to waste the R
developers' time with bug reports on Windows' deficiencies.".  I am not
going to file a bug report, and I haven't noticed any problems so far.
What errors/symptoms are expected?  Is this regardless of the file
system used, e.g. FAT, NTFS?

Thanks

Henrik

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: file.info() on WinXP/NTFS > 2Gb

Brian Ripley
On Sat, 18 Feb 2006, Henrik Bengtsson wrote:

> Hi,
>
> on WinXP Pro SP2 with NTFS, I noticed that file.info() under
> Rv2.2.1pat (2006-02-09) does not report the correct file size if the
> file is >= 2^31 bytes (2GB).   Is this problem known?  Is this related
> to the note in ?file.info:
>
> "Some (broken) systems allow files of more than 2Gb to be created but
> not accessed by the 'stat' system call.  Such files will show up as
> non-readable (and very likely not be readable by any of R's input
> functions)."

Yes, no (although I have forgotten which file system that was).  The code
relies on the stat call.  On a modern OS this comes from POSIX and says

                   off_t         st_size;     /* total size, in bytes */

It looks like the corresponding MinGW/MSVCRT type is _off_t, which is
long, and on Windows that is a 32-bit type.  This can be avoided by
calling stati64 rather than stat: that is incorrectly documented in my
copy of MSDN library, but I have been able to find the correct
incantation so this will be fixed in R-devel soon.
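
The 32-bit limit can be illustrated from the R side as well. This is only a sketch under the assumption that file.info() hands the size back as a double, which the reported value -2147483648 suggests, since an R integer cannot hold that value:

```r
# R's own integer type is 32-bit signed, like the 32-bit 'long' above.
int_max <- .Machine$integer.max   # 2^31 - 1
print(int_max)

# 2^31 does not fit in an R integer: the coercion yields NA (with a warning).
too_big <- suppressWarnings(as.integer(2^31))
print(is.na(too_big))

# A double holds whole numbers exactly up to 2^53, so a 64-bit size from
# the OS can be passed through without loss for realistic file sizes.
six_gb <- 6 * 2^30
is_exact <- ((six_gb + 1) - six_gb) == 1
print(is_exact)
```

So once the C side obtains a 64-bit size, nothing on the R side needs to change to represent it.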

> Example:
>
> # Create a 2Gb(!) file, report file.info()$size, and remove the file.
>
> fname <- "tmp.Rbin";
> con <- file(fname, open="w+b");
> size <- 2^31-1; # OK
> size <- 2^31;    # Not OK
> seek(con, where=size-1, rw="write");
> writeBin(con=con, as.integer(0), size=1);
> close(con);
> print(file.info(fname)$size);
> file.remove(fname);
>
> With size = c(2^31-1,2^31), file.info() reports file size
> c(2147483647,-2147483648).  This looks to me like a non-signed to
> signed integer case.
>
> Note that the above code does indeed create files >2Gb on WinXP with
> NTFS; I've created files about 6Gb this way.
>
> BTW, about seek() on Windows:  I noticed the warning about using
> seek() on Windows (see ?seek): "We have found so many errors in the
> Windows implementation of file positioning that users are advised to
> use it only at their own risk, and asked not to waste the R
> developers' time with bug reports on Windows' deficiencies.".  I am not
> going to file a bug report, and I haven't noticed any problems so far.
> What errors/symptoms are expected?  Is this regardless of the file
> system used, e.g. FAT, NTFS?

Look in the bug repository.  There have been issues with text files and
with files > 2Gb, and others.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: file.info() on WinXP/NTFS > 2Gb

Henrik Bengtsson-2
On 2/18/06, Prof Brian Ripley <[hidden email]> wrote:

> On Sat, 18 Feb 2006, Henrik Bengtsson wrote:
>
> > Hi,
> >
> > on WinXP Pro SP2 with NTFS, I noticed that file.info() under
> > Rv2.2.1pat (2006-02-09) does not report the correct file size if the
> > file is >= 2^31 bytes (2GB).   Is this problem known?  Is this related
> > to the note in ?file.info:
> >
> > "Some (broken) systems allow files of more than 2Gb to be created but
> > not accessed by the 'stat' system call.  Such files will show up as
> > non-readable (and very likely not be readable by any of R's input
> > functions)."
>
> Yes, no (although I have forgotten which file system that was).  The code
> relies on the stat call.  On a modern OS this comes from POSIX and says
>
>                    off_t         st_size;     /* total size, in bytes */
>
> It looks like the corresponding MinGW/MSVCRT type is _off_t, which is
> long, and on Windows that is a 32-bit type.  This can be avoided by
> calling stati64 rather than stat: that is incorrectly documented in my
> copy of MSDN library, but I have been able to find the correct
> incantation so this will be fixed in R-devel soon.

Thank you very much for this.

Regards,

Henrik


--
Henrik Bengtsson
Mobile: +46 708 909208 (+1h UTC)
