

# Can someone help with this simple frequency histogram problem (n = 15)?
# I use four classes: [90,95), [95,100), [100,105), [105,110).
# These limits coincide with the ones obtained by pretty {base}.
# The proper frequencies would be: (1,5,6,3).
# But hist {graphics} gives me a histogram showing frequencies (1,8,3,3),
# with or without the argument breaks = ...
# Reproducible code below. Thanks.
set.seed(123)
x <- rnorm(15, mean = 100, sd = 5); x <- as.integer(x)
x <- sort(x)
x
breaks <- seq(90, 110, by = 5); breaks
pretty(x, n = 5) # pretty {base}
x.cut <- cut(x, breaks, right = FALSE); x.cut
freq <- table(x.cut); cbind(freq)
hist(x, breaks = breaks) # hist {graphics}
hist(x)
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Never mind. Thanks.
I found that adding the argument right=FALSE to the hist() call fixes it.
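
The effect of `right` can be checked without drawing anything. The following is a minimal sketch that reuses `x` and `breaks` from the original post; the names `h_default` and `h_left` are introduced here for illustration only.

```r
set.seed(123)
x <- sort(as.integer(rnorm(15, mean = 100, sd = 5)))
breaks <- seq(90, 110, by = 5)

# Default (right = TRUE): bins are right-closed, (a, b], so an observation
# equal to 100 is counted in (95, 100]. The first bin also includes its
# left end because include.lowest = TRUE by default.
h_default <- hist(x, breaks = breaks, plot = FALSE)

# right = FALSE: bins are left-closed, [a, b), matching
# cut(x, breaks, right = FALSE) from the original post.
h_left <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)

h_default$counts                      # the post reports (1,8,3,3)
h_left$counts                         # the post reports (1,5,6,3)
table(cut(x, breaks, right = FALSE))  # same counts as h_left
```

The two sets of counts differ only because some observations sit exactly on a break, here the values equal to 100.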
On 2019/7/12 5:10 PM, Steven wrote:
> # Can someone help with this simple frequency histogram problem (n = 15)?
> [...]


On 12/07/2019 11:38 a.m., Steven wrote:
> Never mind. Thanks.
>
> I found that adding parameter right=F to the call fixes it.
Drawing a histogram of discrete data often leads to bad results.
Histograms are intended for continuous data, where no observations fall
on bin boundaries.
You often get a more faithful representation of discrete data using
something like
plot(table(x))
Duncan Murdoch
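
As a small sketch of the suggestion above, applied to the data from the original post: `table()` counts each distinct value, so no bin boundaries are involved. The name `tab` and the axis labels are choices made here for illustration.

```r
set.seed(123)
x <- sort(as.integer(rnorm(15, mean = 100, sd = 5)))

# One count per distinct value of x.
tab <- table(x)
tab

# plot() on a one-dimensional table draws one vertical line per distinct value.
plot(tab, xlab = "x", ylab = "Frequency")
```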


Also check out MASS::truehist or simply consider setting breaks so as not to coincide with data values. (hist() not doing something like this, but instead actively aiming for pretty breaks is something of a design bug in my book, but ancient history and not easy to change at this point in time.)
pd
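
One way to set breaks that do not coincide with data values, sketched here for the integer data from the original post, is to shift them by half a unit. The vector `breaks_shifted` and the choice of 89.5 as the starting point are assumptions made for this illustration, not something prescribed in the thread.

```r
set.seed(123)
x <- sort(as.integer(rnorm(15, mean = 100, sd = 5)))

# Breaks at half-integers: no integer observation can fall on a boundary.
breaks_shifted <- seq(89.5, 109.5, by = 5)

# With no observations on the boundaries, the choice of `right` no longer matters.
counts_right <- hist(x, breaks = breaks_shifted, plot = FALSE)$counts
counts_left  <- hist(x, breaks = breaks_shifted, right = FALSE, plot = FALSE)$counts

identical(counts_right, counts_left)
counts_right
```

For integer data, the bin from 89.5 to 94.5 holds the same values as [90, 95), so these counts correspond to the classes the original poster intended.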

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]


Also there is
@ARTICLE{JCGS180021,
author = {Denby, L. and Mallows, C.},
year = {2009},
title = {Variations on the histogram},
journal = {Journal of Computational and Graphical Statistics},
volume = {18},
number = {1},
pages = {21--31},
doi = {10.1198/jcgs.2009.0002},
abstract = {When constructing a histogram, it is common to make all bars the same
width. One could also choose to make them all have the same area.
These two options have complementary strengths and weaknesses; the
equal-width histogram oversmooths in regions of high density, and
is poor at identifying sharp peaks; the equal-area histogram oversmooths
in regions of low density, and so does not identify outliers. We
describe a compromise approach which avoids both of these defects.
We regard the histogram as an exploratory device, rather than as
an estimate of a density. We argue that relying on the asymptotics
of integrated mean squared error leads to inappropriate recommendations
for choosing binwidths. Datasets and R codes are available in the
online supplements.},
keywords = {diagonally-cut histogram; equal-area histogram; asymptotics;
IMSE.},
}
I have not looked at the site for a while, but I think it has some code (in S-PLUS, I think) which should work in R.
This follows a report of the same name, which had code but appears to be no longer available at the original site.
Regards
Duncan
Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2350
-----Original Message-----
From: R-help [mailto:[hidden email]] On Behalf Of peter dalgaard
Sent: Sunday, 14 July 2019 02:15
To: Duncan Murdoch
Cc: [hidden email]; Steven
Subject: Re: [R] hist{graphics}


>>>>> Duncan Murdoch
>>>>> on Sat, 13 Jul 2019 05:29:18 -0400 writes:
> On 12/07/2019 11:38 a.m., Steven wrote:
>> Never mind. Thanks.
>>
>> I found that adding parameter right=F to the call fixes it.
> Drawing a histogram of discrete data often leads to bad results.
> Histograms are intended for continuous data, where no observations fall
> on bin boundaries.
> You often get a more faithful representation of discrete data using
> something like
> plot(table(x))
> Duncan Murdoch
yes!!
including plot(<factor>)
[ if you really want, you can add something like 'lwd = 4' there ]
And relatedly, possibly more generally:
Many many people and hence useRs do
*NOT* distinguish between what R (and I think statistical graphics more
generally) calls *histograms* on one side vs
*bar plots* / *bar charts* / "spear charts"(?) etc on the other.
As Duncan said, this means visually distinguishing quantities that are
inherently (mostly/almost) continuous ["mostly/..": think of quantum physics]
from those that are inherently "integer-like" or categorical.
We (the R user community, notably the graphically oriented
subset) should really strive to keep these concepts and the
corresponding visualizations separate as well as possible
[and educate the consumers of our graphics if necessary ..]
Martin Maechler
ETH Zurich and R Core Team
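
To make the distinction concrete, here is a minimal sketch using the data from the original post, treating the integer values as categories. The variable `f` and the axis labels are illustrative choices; `lwd = 4` follows the suggestion above.

```r
set.seed(123)
x <- sort(as.integer(rnorm(15, mean = 100, sd = 5)))

# Treat the integer values as categories: plot() on a factor draws a bar plot.
f <- factor(x)
plot(f, xlab = "x", ylab = "Frequency")

# The same counts shown as vertical lines, made thicker with lwd.
plot(table(x), lwd = 4, ylab = "Frequency")
```

Neither display involves bins, so the question of which side of an interval is closed does not arise.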

