Re: R Bug: write.table for matrix of more than 2,147,483,648 elements

Tomas Kalibera
Fixed in R-devel 74754.
Tomas

On 04/19/2018 12:15 PM, Tomas Kalibera wrote:

> On 04/19/2018 11:47 AM, Serguei Sokol wrote:
>> On 19/04/2018 at 09:30, Tomas Kalibera wrote:
>>> On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
>>>> On 18/04/2018 5:08 PM, Tousey, Colton wrote:
>>>>> Hello,
>>>>>
>>>>> I want to report a bug in R that prevents me from exporting a
>>>>> matrix with write.csv or write.table when it has over
>>>>> 2,147,483,648 elements (C's int limit). I found that this bug has
>>>>> already been reported:
>>>>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182.
>>>>> However, there appears to be no fix planned for upcoming R
>>>>> releases.
>>>>>
>>>>> The error message comes from the writetable part of the utils
>>>>> package, in the io.c source code
>>>>> (https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):
>>>>>     /* quick integrity check */
>>>>>     if(XLENGTH(x) != (R_len_t)nr * nc)
>>>>>         error(_("corrupt matrix -- dims not not match length"));
>>>>>
>>>>> The issue is that nr*nc is an integer and the size of my matrix,
>>>>> 2.8 billion elements, exceeds C's limit, so the check forces the
>>>>> code to fail.
>>>>
>>>> Yes, looks like a typo:  R_len_t is an int, and that's how nr was
>>>> declared.  It should be R_xlen_t, which is bigger on machines that
>>>> support big vectors.
>>>>
>>>> I haven't tested the change; there may be something else in that
>>>> function that assumes short vectors.
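
The cast issue described above can be reproduced outside R. The following is a minimal sketch, not the code in io.c: `r_len_t` and `r_xlen_t` are hypothetical stand-ins for R's `R_len_t` and `R_xlen_t`, with `int64_t` assumed as the wide type, and `dims_product` and `fits_in_int` are illustrative helper names. The point is that one operand has to be widened before the multiplication, so that the product is computed in 64 bits rather than in `int`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for R's typedefs (assumption: R_xlen_t is 64 bits wide). */
typedef int     r_len_t;   /* plays the role of R_len_t  */
typedef int64_t r_xlen_t;  /* plays the role of R_xlen_t */

/* Number of matrix elements, computed in the wide type.
 * Casting nr first promotes nc as well, so the multiplication
 * is done in 64 bits and cannot overflow for int-sized dims. */
static r_xlen_t dims_product(r_len_t nr, r_len_t nc)
{
    assert(nr >= 0 && nc >= 0);
    return (r_xlen_t)nr * nc;
}

/* True when an element count is still representable as an int,
 * i.e. when the matrix is not a "long vector". */
static int fits_in_int(r_xlen_t n)
{
    return n >= 0 && n <= INT_MAX;
}
```

With dimensions such as 70000 x 40000 (2.8 billion elements, chosen here only for illustration), `dims_product` returns the exact count and `fits_in_int` reports that it is out of `int` range, which is the case where an `int`-typed product cannot match `XLENGTH(x)`.
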
>>> Indeed, I think the function won't work for long vectors because of
>>> EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be
>>> changed, including their signatures.
>>
>> That would be a definitive fix, but before such a deep rewrite is
>> undertaken, perhaps the following small fix (in addition to
>> "(R_xlen_t)nr * nc") will be sufficient for cases where nr and nc are
>> in int range but their product can reach the long vector limit:
>>
>> replace
>>     tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
>>                     &strBuf, sdec);
>> by
>>     tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
>>                     quote_col[j], qmethod, &strBuf, sdec);
>
> Unfortunately we can't do that: x is a matrix of an atomic vector
> type. VECTOR_ELT takes elements of a generic vector, so it cannot
> be applied to "x". But even if we extracted a single element from "x"
> (e.g. via a type switch), we would not be able to pass it to
> EncodeElement0, which expects a full atomic vector (that is, including
> its header). Instead we would have to call functions like
> EncodeInteger, EncodeReal0, etc. on the individual elements, which is
> then the same as changing EncodeElement0 or implementing a new version
> of it. This does not seem that hard to fix; it is just not as trivial
> as changing the cast.
>
> Tomas
>
>
>>
>> Serguei
>
>
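
The per-element approach discussed in the thread (a type switch that formats one element at a time, with the index computed in a wide type) might look roughly like the sketch below. It is an illustration under stated assumptions and not the change committed to R-devel: `matrix_view`, `elt_type` and `encode_cell` are hypothetical names, plain C arrays stand in for R's atomic vectors, and `snprintf` stands in for R's EncodeInteger/EncodeReal0 family.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical element tags, standing in for R types such as INTSXP/REALSXP. */
typedef enum { ELT_INT, ELT_REAL } elt_type;

/* Hypothetical view of a column-major matrix of an atomic type. */
typedef struct {
    elt_type    type;
    int         nr, nc;  /* each dimension fits in int */
    const void *data;    /* nr * nc elements, column-major */
} matrix_view;

/* Format the element at row i, column j into buf and return buf.
 * Both terms of the index are widened before the arithmetic, so the
 * offset is valid even when nr * nc exceeds the int range. */
static const char *encode_cell(const matrix_view *m, int i, int j,
                               char *buf, size_t buflen)
{
    int64_t idx = (int64_t)i + (int64_t)j * m->nr;

    switch (m->type) {
    case ELT_INT:
        snprintf(buf, buflen, "%d", ((const int *)m->data)[idx]);
        break;
    case ELT_REAL:
        snprintf(buf, buflen, "%.15g", ((const double *)m->data)[idx]);
        break;
    }
    return buf;
}
```

The design choice mirrored here is the one Tomas describes: the encoder receives the whole vector plus a wide index and dispatches on the element type itself, rather than being handed an extracted element.
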

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel