Bug: floating point bug in nclass.FD can cause hist() to crash

Sietse Brouwer
Hello everybody,

This is a bug involving functions in three core R packages:
graphics::hist.default, grDevices::nclass.FD, and
base::pretty.default. It is not yet on Bugzilla. I cannot submit it
myself, as I do not have an account. Could somebody else add it for
me, perhaps? That would be much appreciated.

Kind regards,

Sietse
Sietse Brouwer


Summary
-------

Floating point errors can cause a data vector to have an ultra-small
inter-quartile range, which causes `grDevices::nclass.FD` to suggest
an absurdly large number of breaks to `graphics::hist(breaks="FD")`.
Because this large float becomes NA when converted to integer, hist's
call to `base::pretty` crashes.

How could nclass.FD, which has the job of suggesting a reasonable number of
classes, avoid suggesting an absurdly large number of classes when the
inter-quartile range is absurdly small compared to the range?


Steps to reproduce
------------------

    hist(c(1, 1, 1, 1 + 1e-15, 2), breaks="FD")


Observed behaviour
------------------

Running this code gives the following error message:

    Error in pretty.default(range(x), n = breaks, min.n = 1):
      invalid 'n' argument
    In addition: Warning message:
    In pretty.default(range(x), n = breaks, min.n = 1) :
      NAs introduced by coercion to integer range


Expected behaviour
------------------

That hist() should never crash when given valid numerical data. Specifically,
that it should be robust even to those rare datasets where (through floating
point inaccuracy) the inter-quartile range is tens of orders of magnitude
smaller than the range.
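
To make the expectation concrete, here is a minimal sketch of one possible guard. It is purely illustrative and not a patch to R: `capped_nclass_fd` and its `max_classes` argument are hypothetical names, the cap of 1e6 is an arbitrary assumption, and falling back to Sturges' rule is only one of several reasonable choices.

```r
# Hypothetical helper, NOT part of grDevices: Freedman-Diaconis class count
# with a sanity cap. Name, cap value, and fallback are all assumptions.
capped_nclass_fd <- function(x, max_classes = 1e6) {
  h <- stats::IQR(x)
  if (h <= 0) return(1L)                       # degenerate data: one class
  n <- ceiling(diff(range(x)) / (2 * h * length(x)^(-1/3)))
  if (!is.finite(n) || n > max_classes) {
    # The IQR is absurdly small relative to the range, so use Sturges' rule
    n <- grDevices::nclass.Sturges(x)
  }
  as.integer(n)
}

x <- c(1, 1, 1, 1 + 1e-15, 2)
n_breaks <- capped_nclass_fd(x)
print(n_breaks)                                # 4, from Sturges' rule
h_obj <- hist(x, breaks = n_breaks, plot = FALSE)
```

With a guard of this kind, the number handed to `pretty` always fits in an integer, so the coercion can no longer produce NA.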


Analysis
--------

Dramatis personae:

* graphics::hist.default
  https://svn.r-project.org/R/trunk/src/library/graphics/R/hist.R

* grDevices::nclass.FD
  https://svn.r-project.org/R/trunk/src/library/grDevices/R/calc.R

* base::pretty.default
  https://svn.r-project.org/R/trunk/src/library/base/R/pretty.R

`nclass.FD` examines the inter-quartile range of `x`, and gets a positive but
very small floating point value -- let's call it TINYFLOAT. This ultra-low IQR
goes into the denominator of the `nclass` calculation, which means `nclass`
becomes a huge number -- let's call it BIGFLOAT. `nclass.FD` then returns this
huge value to `hist`.
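
The arithmetic can be retraced by hand. The sketch below applies the standard Freedman-Diaconis formula directly instead of calling `nclass.FD`, since the function's internals may differ between R versions. The variable names are mine.

```r
x <- c(1, 1, 1, 1 + 1e-15, 2)

# With five points, the default quartiles are the 2nd and 4th order
# statistics, so the IQR is (1 + 1e-15) - 1: tiny, but not zero.
h <- stats::IQR(x)                             # TINYFLOAT

# Freedman-Diaconis: range divided by the bin width 2 * IQR * n^(-1/3)
n_classes <- ceiling(diff(range(x)) / (2 * h * length(x)^(-1/3)))  # BIGFLOAT

print(h)
print(n_classes)
print(n_classes > .Machine$integer.max)        # TRUE
```

The suggested class count is far beyond what fits in a 32-bit integer, which is what matters for the next step.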

Once `hist` has its 'number of breaks' suggestion, it feeds this
number to `pretty`:

    pretty(range(x), BIGFLOAT, min.n = 1)

`pretty`, in turn, calls

    .Internal(pretty(min(x), max(x), BIGFLOAT, min.n, shrink.sml,
        c(high.u.bias, u5.bias), eps.correct))

This fails with the error and warning shown at the start of this e-mail (invalid
'n' argument / NAs introduced by coercion to integer range). My reading is
that `.Internal` tried to coerce BIGFLOAT to integer and produced an NA,
and that (the C implementation of) `pretty`, in turn, choked when confronted
with NA.
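
The coercion step can be checked in isolation. This is a small sketch with a made-up stand-in value of roughly the magnitude computed for the example data. It shows only the double-to-integer conversion, not the internals of `pretty`.

```r
big_float <- 7.7e14                            # stand-in for BIGFLOAT (assumed value)

# R integers are 32-bit, so a double this large has no integer representation.
# as.integer() returns NA and warns about coercion to integer range.
n <- suppressWarnings(as.integer(big_float))

print(.Machine$integer.max)                    # 2147483647
print(n)                                       # NA
```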

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Bug: floating point bug in nclass.FD can cause hist() to crash

Spencer Graves-3
I just got the same error message with


> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils
[5] datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0

On 2017-05-18 3:50 PM, Sietse Brouwer wrote:

> Steps to reproduce
> ------------------
>
>      hist(c(1, 1, 1, 1 + 1e-15, 2), breaks="FD")
>
>
> Observed behaviour
> ------------------
>
> Running this code gives the following error message:
>
>      Error in pretty.default(range(x), n = breaks, min.n = 1):
>        invalid 'n' argument
>      In addition: Warning message:
>      In pretty.default(range(x), n = breaks, min.n = 1) :
>        NAs introduced by coercion to integer range


Re: Bug: floating point bug in nclass.FD can cause hist() to crash

Sietse Brouwer
Hi, all,

Sietse wrote:
> Floating point errors can cause a data vector to have an ultra-small
> inter-quartile range, which causes `grDevices::nclass.FD` to suggest
> an absurdly large number of breaks to `graphics::hist(breaks="FD")`.
> Because this large float becomes NA when converted to integer, hist's
> call to `base::pretty` crashes.

I have been provided with an account, and filed the bug at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17274

Discussion continues there.

Cheers,
Sietse
