Hello everybody,

This is a bug involving functions in core R packages:

graphics::hist.default, grDevices::nclass.FD, and

base::pretty.default. It is not yet on Bugzilla. I cannot submit it

myself, as I do not have an account. Could somebody else add it for

me, perhaps? That would be much appreciated.

Kind regards,

Sietse

Sietse Brouwer

Summary

-------

Floating point errors can cause a data vector to have an ultra-small

inter-quartile range, which causes `grDevices::nclass.FD` to suggest

an absurdly large number of breaks to `graphics::hist(breaks="FD")`.

Because this large float becomes NA when converted to integer, hist's

call to `base::pretty` crashes.

How could nclass.FD, which has the job of suggesting a reasonable number of

classes, avoid suggesting an absurdly large number of classes when the

inter-quartile range is absurdly small compared to the range?

Steps to reproduce

------------------

hist(c(1, 1, 1, 1 + 1e-15, 2), breaks="FD")

Observed behaviour

------------------

Running this code gives the following error message:

Error in pretty.default(range(x), n = breaks, min.n = 1):

invalid 'n' argument

In addition: Warning message:

In pretty.default(range(x), n = breaks, min.n = 1) :

NAs introduced by coercion to integer range

Expected behaviour

------------------

That hist() should never crash when given valid numerical data. Specifically,

that it should be robust even to those rare datasets where (through floating

point inaccuracy) the inter-quartile range is tens of orders of magnitude

smaller than the range.
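
One way such robustness could look, as a rough sketch only: the helper below computes the Freedman-Diaconis class count by hand and caps it before converting to integer. The name `nclass_fd_capped`, the `max_classes` parameter, and the cap of 1e6 are all hypothetical choices made for illustration; none of them are part of R.

```r
# Hypothetical helper, not part of R: Freedman-Diaconis class count
# with an upper cap, so the result always fits in an integer.
nclass_fd_capped <- function(x, max_classes = 1e6) {
  h <- stats::IQR(x)
  if (h <= 0) return(1L)  # degenerate spread: fall back to a single class
  # Freedman-Diaconis: range divided by bin width 2 * IQR * n^(-1/3)
  n <- ceiling(diff(range(x)) / (2 * h * length(x)^(-1/3)))
  as.integer(min(n, max_classes))  # cap first, then coerce
}

# The problematic vector from "Steps to reproduce" now yields the cap
# rather than a value that cannot be coerced to integer.
capped <- nclass_fd_capped(c(1, 1, 1, 1 + 1e-15, 2))
print(capped)
```

A cap is the bluntest possible remedy; whether capping, rounding the data first, or some other approach is the right fix is left open here.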

Analysis

--------

Dramatis personae:

* graphics::hist.default

https://svn.r-project.org/R/trunk/src/library/graphics/R/hist.R

* grDevices::nclass.FD

https://svn.r-project.org/R/trunk/src/library/grDevices/R/calc.R

* base::pretty.default

https://svn.r-project.org/R/trunk/src/library/base/R/pretty.R

`nclass.FD` examines the inter-quartile range of `x`, and gets a positive, but

very small floating point value -- let's call it TINYFLOAT. It inserts this

ultra-low IQR into the `nclass` denominator, which means `nclass`

becomes a huge number -- let's call it BIGFLOAT. `nclass.FD` then returns this

huge value to `hist`.
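
To make TINYFLOAT and BIGFLOAT concrete, the following sketch computes both by hand for the vector from "Steps to reproduce". It uses the textbook Freedman-Diaconis rule (bin width 2 * IQR * n^(-1/3)) rather than calling `nclass.FD` itself; that this matches the exact code of `nclass.FD` in any particular R version is an assumption. The variable names are illustrative only.

```r
x <- c(1, 1, 1, 1 + 1e-15, 2)

# Inter-quartile range: positive but tiny (TINYFLOAT in the text above).
tinyfloat <- stats::IQR(x)

# Freedman-Diaconis bin width, and the class count it implies (BIGFLOAT).
binwidth <- 2 * tinyfloat * length(x)^(-1/3)
bigfloat <- ceiling(diff(range(x)) / binwidth)

print(tinyfloat)
print(bigfloat)

# The class count is far beyond what an R integer can hold.
print(bigfloat > .Machine$integer.max)
```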

Once `hist` has its 'number of breaks' suggestion, it feeds this

number to `pretty`:

pretty(range(x), BIGFLOAT, min.n = 1)

`pretty`, in turn, calls

.Internal(pretty(min(x), max(x), BIGFLOAT, min.n, shrink.sml,

c(high.u.bias, u5.bias), eps.correct))

which fails with the error and warning shown earlier in this e-mail (invalid

'n' argument / NAs introduced by coercion to integer range). My reading is

that .Internal tried to coerce BIGFLOAT to integer range and produced an NA,

and that (the C implementation of) `pretty`, in turn, choked when confronted

with NA.
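
The coercion step in that reading can be imitated at the R level. This is only an analogy for what presumably happens inside `.Internal`, not a trace of the C code, and the value 1e15 is just a stand-in for BIGFLOAT.

```r
# Stand-in for BIGFLOAT: any double above .Machine$integer.max will do.
bigfloat <- 1e15

# Coercing an out-of-range double to integer gives NA, with a warning.
# The warning is captured here instead of being printed to the console.
warned <- FALSE
n <- withCallingHandlers(
  as.integer(bigfloat),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

print(is.na(n))
print(warned)
```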

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel