strtoi output of empty string inconsistent across platforms

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

strtoi output of empty string inconsistent across platforms

Michael Chirico
Identified as root cause of a bug in data.table:

https://github.com/Rdatatable/data.table/issues/3267

On my machine, strtoi("", base = 2L) produces NA_integer_ (which seems
consistent with ?strtoi: "Values which cannot be interpreted as integers or
would overflow are returned as NA_integer_").

But on all the other machines I've seen, 0L is returned. This seems to be
consistent with the output of a simple C program using the underlying
strtol function (see data.table link for this program, and for full
sessionInfo() of some environments with differing output).

So, what is the correct output of strtoi("", base = 2L)? Is the
cross-platform inconsistency to be expected/documentable?

Michael Chirico

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: strtoi output of empty string inconsistent across platforms

Martin Maechler
>>>>> Michael Chirico
>>>>>     on Fri, 11 Jan 2019 14:36:17 +0800 writes:

    > Identified as root cause of a bug in data.table:
    > https://github.com/Rdatatable/data.table/issues/3267

    > On my machine, strtoi("", base = 2L) produces NA_integer_
    > (which seems consistent with ?strtoi: "Values which cannot
    > be interpreted as integers or would overflow are returned
    > as NA_integer_").

indeed consistent with R's documentation on strtoi().
What machine would that be?

    > But on all the other machines I've seen, 0L is
    > returned. This seems to be consistent with the output of a
    > simple C program using the underlying strtol function (see
    > data.table link for this program, and for full
    > sessionInfo() of some environments with differing output).

    > So, what is the correct output of strtoi("", base = 2L)?

    > Is the cross-platform inconsistency to be
    > expected/documentable?

The inconsistency is certainly undesirable.
The relevant utility function in R's source (<R>/src/main/character.c)
is

static int strtoi(SEXP s, int base)
{
    long int res;
    char *endp;

    /* strtol might return extreme values on error */
    errno = 0;

    if(s == NA_STRING) return(NA_INTEGER);
    res = strtol(CHAR(s), &endp, base); /* ASCII */
    if(errno || *endp != '\0') res = NA_INTEGER;
    if(res > INT_MAX || res < INT_MIN) res = NA_INTEGER;
    return (int) res;
}

and so it clearly is a platform-inconsistency in the underlying C
library's strtol().

I think we should make this cross-platform consistent... and
indeed it make much sense to ensure the result of

   strtoi("", base=2L)  to become  NA_integer_

but changes are that would break code that has relied on the
current behavior {on "all but your computer" ;-)}  ?

    > Michael Chirico

Thank you for the report,

Martin Maechler
ETH Zurich and R Core Team

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: strtoi output of empty string inconsistent across platforms

Martin Maechler
>>>>> Martin Maechler
>>>>>     on Fri, 11 Jan 2019 09:44:14 +0100 writes:

>>>>> Michael Chirico
>>>>>     on Fri, 11 Jan 2019 14:36:17 +0800 writes:

    >> Identified as root cause of a bug in data.table:
    >> https://github.com/Rdatatable/data.table/issues/3267

    >> On my machine, strtoi("", base = 2L) produces NA_integer_
    >> (which seems consistent with ?strtoi: "Values which
    >> cannot be interpreted as integers or would overflow are
    >> returned as NA_integer_").

    > indeed consistent with R's documentation on strtoi().
    > What machine would that be?

    >> But on all the other machines I've seen, 0L is
    >> returned. This seems to be consistent with the output of
    >> a simple C program using the underlying strtol function
    >> (see data.table link for this program, and for full
    >> sessionInfo() of some environments with differing
    >> output).

    >> So, what is the correct output of strtoi("", base = 2L)?

    >> Is the cross-platform inconsistency to be
    >> expected/documentable?

    > The inconsistency is certainly undesirable.  The relevant
    > utility function in R's source (<R>/src/main/character.c)
    > is

    > static int strtoi(SEXP s, int base) { long int res; char
    > *endp;

    >     /* strtol might return extreme values on error */
    > errno = 0;

    >     if(s == NA_STRING) return(NA_INTEGER); res =
    > strtol(CHAR(s), &endp, base); /* ASCII */ if(errno ||
    > *endp != '\0') res = NA_INTEGER; if(res > INT_MAX || res <
    > INT_MIN) res = NA_INTEGER; return (int) res; }

    > and so it clearly is a platform-inconsistency in the
    > underlying C library's strtol().

(corrected typos here: )

    > I think we should make this cross-platform consistent ...
    > and indeed it makes much sense to ensure the result of

    >     strtoi("", base=2L)    to become   NA_integer_

    > but chances are that would break code that has relied on
    > the current behavior {on "all but your computer" ;-)} ?

I still think that such a change should be done.

'make check all' on the R source (+ Recommended packages) seems
not to signal any error or warning with such a change, so I plan
to commit that change to "the trunk" / "R-devel" soon, unless
concerns are raised highly (and quickly enough).

Martin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: strtoi output of empty string inconsistent across platforms

Michael Chirico
Thanks Martin.

For what it's worth, this extremely representative, highly scientific
Twitter poll suggests the Mac/Linux split is pretty stark (NA on Mac, 0 on
Linux)

https://twitter.com/michael_chirico/status/1083649190117306369?s=17

On Sat, Jan 12, 2019, 2:00 AM Martin Maechler <[hidden email]
wrote:

> >>>>> Martin Maechler
> >>>>>     on Fri, 11 Jan 2019 09:44:14 +0100 writes:
>
> >>>>> Michael Chirico
> >>>>>     on Fri, 11 Jan 2019 14:36:17 +0800 writes:
>
>     >> Identified as root cause of a bug in data.table:
>     >> https://github.com/Rdatatable/data.table/issues/3267
>
>     >> On my machine, strtoi("", base = 2L) produces NA_integer_
>     >> (which seems consistent with ?strtoi: "Values which
>     >> cannot be interpreted as integers or would overflow are
>     >> returned as NA_integer_").
>
>     > indeed consistent with R's documentation on strtoi().
>     > What machine would that be?
>
>     >> But on all the other machines I've seen, 0L is
>     >> returned. This seems to be consistent with the output of
>     >> a simple C program using the underlying strtol function
>     >> (see data.table link for this program, and for full
>     >> sessionInfo() of some environments with differing
>     >> output).
>
>     >> So, what is the correct output of strtoi("", base = 2L)?
>
>     >> Is the cross-platform inconsistency to be
>     >> expected/documentable?
>
>     > The inconsistency is certainly undesirable.  The relevant
>     > utility function in R's source (<R>/src/main/character.c)
>     > is
>
>     > static int strtoi(SEXP s, int base) { long int res; char
>     > *endp;
>
>     >     /* strtol might return extreme values on error */
>     > errno = 0;
>
>     >     if(s == NA_STRING) return(NA_INTEGER); res =
>     > strtol(CHAR(s), &endp, base); /* ASCII */ if(errno ||
>     > *endp != '\0') res = NA_INTEGER; if(res > INT_MAX || res <
>     > INT_MIN) res = NA_INTEGER; return (int) res; }
>
>     > and so it clearly is a platform-inconsistency in the
>     > underlying C library's strtol().
>
> (corrected typos here: )
>
>     > I think we should make this cross-platform consistent ...
>     > and indeed it makes much sense to ensure the result of
>
>     >     strtoi("", base=2L)    to become   NA_integer_
>
>     > but chances are that would break code that has relied on
>     > the current behavior {on "all but your computer" ;-)} ?
>
> I still think that such a change should be done.
>
> 'make check all' on the R source (+ Recommended packages) seems
> not to signal any error or warning with such a change, so I plan
> to commit that change to "the trunk" / "R-devel" soon, unless
> concerns are raised highly (and quickly enough).
>
> Martin
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: strtoi output of empty string inconsistent across platforms

Martin Maechler
>>>>> Michael Chirico
>>>>>     on Sat, 12 Jan 2019 17:34:03 +0800 writes:

    > Thanks Martin.  For what it's worth, this extremely
    > representative, highly scientific Twitter poll suggests
    > the Mac/Linux split is pretty stark (NA on Mac, 0 on
    > Linux)

    > https://twitter.com/michael_chirico/status/1083649190117306369?s=17

    > On Sat, Jan 12, 2019, 2:00 AM Martin Maechler
    > <[hidden email] wrote:

    >> >>>>> Martin Maechler >>>>> on Fri, 11 Jan 2019 09:44:14
    >> +0100 writes:
    >>
    >> >>>>> Michael Chirico >>>>> on Fri, 11 Jan 2019 14:36:17
    >> +0800 writes:
    >>
    >> >> Identified as root cause of a bug in data.table: >>
    >> https://github.com/Rdatatable/data.table/issues/3267
    >>
    >> >> On my machine, strtoi("", base = 2L) produces
    >> NA_integer_ >> (which seems consistent with ?strtoi:
    >> "Values which >> cannot be interpreted as integers or
    >> would overflow are >> returned as NA_integer_").
    >>
    >> > indeed consistent with R's documentation on strtoi().
    >> > What machine would that be?
    >>
    >> >> But on all the other machines I've seen, 0L is >>
    >> returned. This seems to be consistent with the output of
    >> >> a simple C program using the underlying strtol
    >> function >> (see data.table link for this program, and
    >> for full >> sessionInfo() of some environments with
    >> differing >> output).
    >>
    >> >> So, what is the correct output of strtoi("", base =
    >> 2L)?
    >>
    >> >> Is the cross-platform inconsistency to be >>
    >> expected/documentable?
    >>
    >> > The inconsistency is certainly undesirable.  The
    >> relevant > utility function in R's source
    >> (<R>/src/main/character.c) > is
    >>
    >> > static int strtoi(SEXP s, int base) { long int res;
    >> char > *endp;
    >>
    >> > /* strtol might return extreme values on error */ >
    >> errno = 0;
    >>
    >> > if(s == NA_STRING) return(NA_INTEGER); res = >
    >> strtol(CHAR(s), &endp, base); /* ASCII */ if(errno || >
    >> *endp != '\0') res = NA_INTEGER; if(res > INT_MAX || res
    >> < > INT_MIN) res = NA_INTEGER; return (int) res; }
    >>
    >> > and so it clearly is a platform-inconsistency in the >
    >> underlying C library's strtol().
    >>
    >> (corrected typos here: )
    >>
    >> > I think we should make this cross-platform consistent
    >> ...  > and indeed it makes much sense to ensure the
    >> result of
    >>
    >> > strtoi("", base=2L) to become NA_integer_
    >>
    >> > but chances are that would break code that has relied
    >> on > the current behavior {on "all but your computer"
    >> ;-)} ?
    >>
    >> I still think that such a change should be done.
    >>
    >> 'make check all' on the R source (+ Recommended packages)
    >> seems not to signal any error or warning with such a
    >> change, so I plan to commit that change to "the trunk" /
    >> "R-devel" soon, unless concerns are raised highly (and
    >> quickly enough).

I've now committed  svn rev 75977  to R-devel  which does change
strtoi("", <base>)  to return NA_integer_   (for all <base>)
platform independently, not even calling lib C's strtol() in
that case.

Martin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel