hist{graphics}

Steven Yen-2
# Can someone help with this simple frequency histogram problem (n = 15)?
# I use four classes: [90,95), [95,100), [100,105), [105,110).
# These limits coincide with those obtained by pretty {base}.
# The proper frequencies would be (1,5,6,3).
# But hist {graphics} gives me a histogram showing frequencies (1,8,3,3),
# with or without the argument breaks = ...
# Reproducible code below. Thanks.

set.seed(123)
x <- rnorm(15, mean = 100, sd = 5)
x <- as.integer(x)                       # truncate to whole numbers
x <- sort(x)
x
breaks <- seq(90, 110, by = 5); breaks
pretty(x, n = 5)                         # pretty {base}
x.cut <- cut(x, breaks, right = FALSE); x.cut   # left-closed classes [a,b)
freq <- table(x.cut); cbind(freq)        # the frequencies I expect
hist(x, breaks = breaks)                 # hist {graphics}
hist(x)

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: hist{graphics}

Steven Yen-2
Never mind. Thanks.

I found that adding the argument right=FALSE to the hist() call fixes it.
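
For anyone following along, the effect of right can be checked without drawing anything. The sketch below reuses x and breaks from the original post and reads the bin counts from hist(..., plot = FALSE); the names h_default and h_left are made up for illustration. By default hist() uses right = TRUE, so the bins are right-closed, (a, b], and the three observations equal to 100 are counted in the second bin rather than the third.

```r
# Data and breaks as in the original post
set.seed(123)
x <- sort(as.integer(rnorm(15, mean = 100, sd = 5)))
breaks <- seq(90, 110, by = 5)

# Default: right = TRUE, i.e. right-closed bins (a, b]
h_default <- hist(x, breaks = breaks, plot = FALSE)
h_default$counts

# right = FALSE: left-closed bins [a, b), as in cut(x, breaks, right = FALSE)
h_left <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)
h_left$counts
as.vector(table(cut(x, breaks, right = FALSE)))
```

The first set of counts should be the (1,8,3,3) reported in the post, and the last two should both be the expected (1,5,6,3).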

On 2019/7/12 05:10 PM, Steven wrote:

> # Proper frequencies would be: (1,5,6,3).
> # But hist{graphics} gives me a histogram showing frequencies (1,8,3,3),
> # with or without argument break = ...


Re: hist{graphics}

Duncan Murdoch-2
On 12/07/2019 11:38 a.m., Steven wrote:
> Never mind. Thanks.
>
> I found that adding parameter right=F to the call fixes it.

Drawing a histogram of discrete data often leads to bad results.
Histograms are intended for continuous data, where no observations fall
on bin boundaries.

You often get a more faithful representation of discrete data using
something like

plot(table(x))
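
A minimal sketch of that suggestion, applied to the data from the original post (the name tab and the axis labels are only illustrative). Because table() counts each distinct value separately, no binning is involved and the question of which side of a bin is closed never arises.

```r
# Data as in the original post
set.seed(123)
x <- sort(as.integer(rnorm(15, mean = 100, sd = 5)))

tab <- table(x)   # one count per distinct value
tab

# One vertical line per observed value, drawn at its numeric position
plot(tab, xlab = "x", ylab = "Frequency")
```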

Duncan Murdoch


Re: hist{graphics}

Peter Dalgaard-2
Also check out MASS::truehist(), or simply consider setting the breaks so that they do not coincide with data values. (That hist() does not do something like this, but instead actively aims for pretty breaks, is something of a design bug in my book, but it is ancient history and not easy to change at this point.)
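
A rough sketch of the second suggestion, using the data from the original post. The half-unit shift and the names breaks_shifted, counts_right and counts_left are assumptions chosen for integer data, not something prescribed in the thread. Once no observation can sit on a boundary, the right argument no longer changes the counts. The MASS::truehist() call is only attempted if MASS is installed; as far as its help page describes, its default origin x0 = -h/1000 nudges the bin edges slightly off round values.

```r
# Data as in the original post
set.seed(123)
x <- sort(as.integer(rnorm(15, mean = 100, sd = 5)))

# Shift the breaks by half a unit so that no integer can fall on a boundary
breaks_shifted <- seq(89.5, 109.5, by = 5)
stopifnot(!any(x %in% breaks_shifted))

counts_right <- hist(x, breaks = breaks_shifted, plot = FALSE)$counts
counts_left  <- hist(x, breaks = breaks_shifted, right = FALSE,
                     plot = FALSE)$counts
counts_right
counts_left   # identical to counts_right: 'right' no longer matters

# The function mentioned above, with bin width 5 (density scale by default)
if (requireNamespace("MASS", quietly = TRUE)) {
  MASS::truehist(x, h = 5)
}
```

For integer data the shifted bins 89.5 to 94.5, 94.5 to 99.5, and so on hold the same values as the left-closed classes [90,95), [95,100), ..., so both count vectors should match the (1,5,6,3) expected in the original post.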

-pd


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]


Re: hist{graphics}

Duncan Mackay-4
Also there is

@ARTICLE{JCGS18-0021,
  author = {Denby, L. and Mallows, C.},
  year = {2009},
  title = {Variations on the histogram},
  journal = {Journal of Computational and Graphical Statistics},
  volume = {18},
  number = {1},
  pages = {21--31},
  doi = {10.1198/jcgs.2009.0002},
  abstract = {When constructing a histogram, it is common to make all bars the same
        width. One could also choose to make them all have the same area.
        These two options have complementary strengths and weaknesses; the
        equal-width histogram oversmooths in regions of high density, and
        is poor at identifying sharp peaks; the equal-area histogram oversmooths
        in regions of low density, and so does not identify outliers. We
        describe a compromise approach which avoids both of these defects.
        We regard the histogram as an exploratory device, rather than as
        an estimate of a density. We argue that relying on the asymptotics
        of integrated mean squared error leads to inappropriate recommendations
        for choosing bin-widths. Datasets and R codes are available in the
        online supplements.},
  keywords = {diagonally-cut histogram; equal-area histogram; asymptotics;
        IMSE.},
}

I have not looked at the site for a while, but I think it has some code, possibly in S-PLUS, which should work in R.
The article follows a report of the same name, which included code but appears to be no longer available at its original site.

Regards

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2350


Re: hist{graphics}

Martin Maechler
In reply to this post by Duncan Murdoch-2
>>>>> Duncan Murdoch
>>>>>     on Sat, 13 Jul 2019 05:29:18 -0400 writes:

    > On 12/07/2019 11:38 a.m., Steven wrote:
    >> Never mind. Thanks.
    >>
    >> I found that adding parameter right=F to the call fixes it.

    > Drawing a histogram of discrete data often leads to bad results.
    > Histograms are intended for continuous data, where no observations fall
    > on bin boundaries.

    > You often get a more faithful representation of discrete data using
    > something like

    >   plot(table(x))

    > Duncan Murdoch

yes!!
        including   plot(<factor>)  
 
[ if you really want, you can add something like   'lwd = 4' there ]
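
A small sketch of both variants on the data from the original post. The factor levels spanning min(x):max(x) and the name f are an illustrative choice, made so that integers with no observations still get a (zero-height) bar.

```r
# Data as in the original post
set.seed(123)
x <- sort(as.integer(rnorm(15, mean = 100, sd = 5)))

# Thicker spikes for the table plot
plot(table(x), lwd = 4, ylab = "Frequency")

# A factor is drawn as a bar plot; listing every integer in the range
# as a level keeps values with zero observations visible
f <- factor(x, levels = min(x):max(x))
plot(f, xlab = "x", ylab = "Frequency")
```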

And relatedly, possibly more generally:

Many, many people, and hence useRs, do
*NOT* distinguish between what R (and, I think, statistical graphics more
generally) calls  *histograms*  on one side and
*bar plots* / *bar charts* / "spear charts"(?) etc. on the other.

As Duncan said, the point is to distinguish visually between quantities that are
inherently (mostly/almost) continuous ["mostly/..": think of quantum physics]
and those that are inherently "integer-like" or categorical.

We (the R user community, notably the graphically oriented
subset) should really strive to keep these concepts and the
corresponding visualizations separate as well as possible
[and educate the consumers of our graphics if necessary ..]

Martin Maechler
ETH Zurich and R Core Team
