Dear developers,
the current implementation of hist.default() calculates 'density' (and
'intensities') as

    dens <- counts/(n*h)

where h has been calculated before as

    h <- diff(fuzzybreaks)

which results in 'fuzzy' values for the density, see e.g.

    > tmp <- hist(1:10,breaks=c(-2.5,2.5,7.5,12.5),plot=FALSE)
    > print(tmp$density,digits=15)
    [1] 0.0399999920000016 0.1000000000000000 0.0600000000000000

Since hist.default()$breaks are not the fuzzy breaks used for the
calculation of dens, the sum of the bins' area is significantly
different from 1 in many cases, see e.g.

    > print(sum(tmp$density*diff(tmp$breaks)),digits=15)
    [1] 0.999999960000008

Is this intended, or should the calculation of dens read

    dens <- counts/(n*diff(breaks))

instead (or should hist.default()$breaks return the fuzzy breaks)?

Best wishes
Martin

--
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 206
66123 Saarbruecken
Germany

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
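For readers who want to reproduce the numbers in the message above without depending on a particular R version, the following is a minimal sketch that redoes the two density calculations by hand. The size and direction of the fuzz (1e-7 times the median bin width, first break shifted down, the others shifted up) are an assumption that is not stated in the message; it is chosen because it reproduces the reported value 0.0399999920000016. The names diddle, dens_fuzzy and dens_exact are illustrative only.

```r
# Data and breaks from the message
x <- 1:10
breaks <- c(-2.5, 2.5, 7.5, 12.5)
n <- length(x)

# ASSUMPTION: fuzz of 1e-7 times the median bin width, with the first
# break moved down and all other breaks moved up (right-closed bins,
# lowest value included). This is not stated in the message.
diddle <- 1e-7 * median(diff(breaks))
fuzzybreaks <- breaks + c(-diddle, rep(diddle, length(breaks) - 1))

# Counts per right-closed bin, computed with cut() instead of hist()
counts <- as.vector(table(cut(x, breaks, include.lowest = TRUE)))

# Density as in the code quoted in the message: divide by fuzzy widths
dens_fuzzy <- counts / (n * diff(fuzzybreaks))

# Density as proposed in the message: divide by the returned breaks' widths
dens_exact <- counts / (n * diff(breaks))

print(dens_fuzzy, digits = 15)
print(sum(dens_fuzzy * diff(breaks)), digits = 15)  # total area, fuzzy widths
print(sum(dens_exact * diff(breaks)), digits = 15)  # total area, exact widths
```

Under that assumption only the first bin is affected here, because its width grows by twice the fuzz while the other two widths are unchanged; dividing by diff(breaks) removes the discrepancy.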
Dear developers,
since running 'example(hist)' produces

    ...
    hist> sum(r$density * diff(r$breaks)) # == 1
    [1] 0.9999999
    ...

I suppose that the current behaviour of hist() is not as intended (and
documented). So, please find attached (and inline below) a (trivial)
patch for hist.default().

Best wishes
Martin

Index: src/library/graphics/R/hist.R
===================================================================
--- src/library/graphics/R/hist.R (revision 51652)
+++ src/library/graphics/R/hist.R (working copy)
@@ -111,7 +111,7 @@
         stop("negative 'counts'. Internal Error in C-code for \"bincount\"")
     if (sum(counts) < n)
         stop("some 'x' not counted; maybe 'breaks' do not span range of 'x'")
-    dens <- counts/(n*h)
+    dens <- counts/(n*diff(breaks))
     mids <- 0.5 * (breaks[-1L] + breaks[-nB])
     r <- structure(list(breaks = breaks, counts = counts,
                         intensities = dens,

Martin Becker wrote:
> Dear developers,
>
> the current implementation of hist.default() calculates 'density' (and
> 'intensities') as
>     dens <- counts/(n*h)
> where h has been calculated before as
>     h <- diff(fuzzybreaks)
> which results in 'fuzzy' values for the density, see e.g.
>
> > tmp <- hist(1:10,breaks=c(-2.5,2.5,7.5,12.5),plot=FALSE)
> > print(tmp$density,digits=15)
> [1] 0.0399999920000016 0.1000000000000000 0.0600000000000000
>
> Since hist.default()$breaks are not the fuzzy breaks used for the
> calculation of dens, the sum of the bins' area is significantly
> different from 1 in many cases, see e.g.
>
> > print(sum(tmp$density*diff(tmp$breaks)),digits=15)
> [1] 0.999999960000008
>
> Is this intended, or should the calculation of dens read
>     dens <- counts/(n*diff(breaks))
> instead (or should hist.default()$breaks return the fuzzy breaks)?
>
> Best wishes
> Martin
>
>

--
Dr. Martin Becker
Statistics and Econometrics
Saarland University
Campus C3 1, Room 217
66123 Saarbruecken
Germany
