Re-binning histogram data


Re-binning histogram data

cokelid
Hi,

Short Version:
Is there a function to re-bin a histogram to new, broader bins?

Long version: I'm trying to create a histogram, however my input-data is
itself in the form of a fine-grained histogram, i.e. numbers of counts
in regular one-second bins. I want to produce a histogram of, say,
10-minute bins (though possibly irregular bins also).

I suppose I could re-create a data set as expected by the hist() function
(i.e. if time t=3600 has 6 counts, add six entries of 3600 to a list)
however this seems neither elegant nor efficient (though I'd be pleased to
be mistaken!). I could then re-create a histogram as normal.

I'm guessing there's a better solution, however! Apologies if this is a basic
question - I'm rather new to R and trying to get up to speed.

Regards,

Justin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Re-binning histogram data

PIKAL Petr
Hi

try truehist() from the MASS package and look at its breaks or h arguments.

HTH
Petr




On 8 Jun 2006 at 10:46, Justin Ashmall wrote:

Date sent:       Thu, 8 Jun 2006 10:46:19 +0100 (BST)
From:           Justin Ashmall <[hidden email]>
To:             [hidden email]
Subject:         [R] Re-binning histogram data

Petr Pikal
[hidden email]


Re: Re-binning histogram data

cokelid

Thanks for the reply Petr,

It looks to me that truehist() needs a vector of data just like hist()?
Whereas I have histogram-style input data? Am I missing something?

Cheers,

Justin




Re: Re-binning histogram data

PIKAL Petr


On 8 Jun 2006 at 11:35, Justin Ashmall wrote:

Date sent:       Thu, 8 Jun 2006 11:35:46 +0100 (BST)
From:           Justin Ashmall <[hidden email]>
To:             Petr Pikal <[hidden email]>
Copies to:       [hidden email]
Subject:         Re: [R] Re-binning histogram data

>
> Thanks for the reply Petr,
>
> It looks to me that truehist() needs a vector of data just like
> hist()? Whereas I have histogram-style input data? Am I missing
> something?

Well, maybe you could use barplot(). Or, as you suggested, recreate the
original vector and call hist() or truehist() with other bins.

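> # hist() picks the breaks itself; this particular run gave 15 bins
> hhh <- hist(rnorm(1000))
> # merge adjacent bins in pairs (the last group takes three bins);
> # the grouping vector must be as long as hhh$counts, here 15
> grp <- c(rep(1:7, each = 2), 7)
> barplot(tapply(hhh$counts, grp, sum))
> tapply(hhh$mids, grp, mean)
    1     2     3     4     5     6     7
-3.00 -2.00 -1.00  0.00  1.00  2.00  3.25
> # or rebuild an approximate data vector from midpoints and counts
> hhh1 <- rep(hhh$mids, hhh$counts)
> plot(hhh, freq = FALSE)
> lines(density(hhh1))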
>
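
For the original question (counts already sitting in one-second bins), the tapply() idea above extends to any bin width without expanding the data: derive a group label from each fine bin's start time and sum the counts per label. What follows is a minimal sketch, not a definitive recipe. The vector names `t_sec` (bin start in seconds) and `n` (counts), the simulated data, and the irregular break points are all assumptions made up for illustration.

```r
set.seed(1)
# Hypothetical fine-grained input: one count per one-second bin over two hours
t_sec <- 0:7199
n     <- rpois(length(t_sec), lambda = 0.05)

# Regular 10-minute bins: integer-divide the bin start time by 600 s
grp10    <- t_sec %/% 600
counts10 <- tapply(n, grp10, sum)
barplot(counts10, names.arg = 10 * as.integer(names(counts10)),
        xlab = "bin start (minutes)", ylab = "count")

# Irregular bins: cut() assigns each fine bin to an interval [lo, hi)
breaks    <- c(0, 300, 900, 3600, 7200)   # seconds, made up for the example
counts_ir <- tapply(n, cut(t_sec, breaks, right = FALSE), sum)

# With unequal widths, plot count per second rather than raw count,
# so that bar area (not bar height) is proportional to the count
barplot(counts_ir / diff(breaks), width = diff(breaks), space = 0,
        ylab = "count per second")
```

Expanding the data with `hist(rep(t_sec, n), breaks = seq(0, 7200, by = 600), right = FALSE)` should give the same counts, at the cost of allocating a vector with one element per event.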

HTH
Petr







Petr Pikal
[hidden email]


Re: Re-binning histogram data

Bert Gunter
I would argue that histograms are outdated relics and that density plots
(whatever your favorite flavor is) should **always** be used instead these
days.

In this vein, I would appreciate critical rejoinders (public or private) to
the following proposition: Given modern computer power and software like R
on multi-GHz machines, statistical and graphical relics of the pre-computer
era (like histograms, low resolution printer-type plots, and perhaps even
method of moments EMS calculations) should be abandoned in favor of superior
but perhaps computation-intensive alternatives (like density plots, high
resolution plots, and likelihood or resampling or Bayes based methods).

NB: Please -- no pleadings that new methods would be mystifying to the
non-cognoscenti. Following that to its logical conclusion would mean that
we'd all have to give up our TV remotes and cell phones, and what kind of
world would that be?! :-)

-- Bert Gunter

 

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Petr Pikal
> Sent: Thursday, June 08, 2006 6:17 AM
> To: Justin Ashmall; [hidden email]
> Subject: Re: [R] Re-binning histogram data

Re: Re-binning histogram data

Ted.Harding
On 08-Jun-06 Berton Gunter wrote:

> I would argue that histograms are outdated relics and that density
> plots (whatever your favorite flavor is) should **always** be used
> instead these days.
>
> In this vein, I would appreciate critical rejoinders (public or
> private) to the following proposition: Given modern computer power
> and software like R on multi-GHz machines, statistical and graphical
> relics of the pre-computer era (like histograms, low resolution
> printer-type plots, and perhaps even method of moments EMS
> calculations) should be abandoned in favor of superior but perhaps
> computation-intensive alternatives (like density plots, high
> resolution plots, and likelihood or resampling or Bayes based methods).

While your head is above the parapet, Bert ...

Your general question could go in many directions, but there's a
lot to be said for that point of view (as well as some against).

However, my short answer is that it's a matter of horses for courses.

In particular, where the histogram is concerned, it has a straightforward
property that it exactly represents the information about the counts
within the bin-ranges. While usually the bars are not labelled with
count values, you can (and I quite often have, when it was the only
way) recover the counts using a ruler graduated in millimetres. And
at the same time it usually (if judiciously constructed) presents a
good blockwise representation of the implied underlying continuous
distribution.
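
This property can be checked directly in R. The sketch below uses simulated data (not taken from the thread): the object returned by hist() stores the bin counts, and they can be read back from the bar heights even on a density scale, whereas density() returns only a smoothed curve evaluated on a grid.

```r
set.seed(42)
x <- rnorm(200)                     # simulated data, for illustration only

h <- hist(x, plot = FALSE)
# Bar height on the density scale is count / (n * bin width),
# so the counts can be recovered from the bars
recovered <- h$density * length(x) * diff(h$breaks)
all.equal(recovered, as.numeric(h$counts))

# A kernel density estimate is a smooth curve on a grid
# (512 points by default); no bin counts are stored in it
d <- density(x)
length(d$x)
```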

A continuous density estimation may be a better and smoother (or
at least more appealing) representation of the distribution (though
you would need to be careful about local humps), but to recover the
data from it would take a combination of optical scanning, image
analysis software, and (if you don't know what smoothing method
was used) heuristic algorithm-inference software. Well within
your technological utopia, of course, but ...

> NB: Please -- no pleadings that new methods would be mystifying
> to the non-cognoscenti. Following that to its logical conclusion
> would mean that we'd all have to give up our TV remotes and cell
> phones, and what kind of world would that be?! :-)

One day, let me show you how to use my wooden plough-share.

Best wishes,
Ted.

PS Please bring your own horse.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jun-06                                       Time: 17:16:53
------------------------------ XFMail ------------------------------


Re: Re-binning histogram data

cokelid
In reply to this post by Bert Gunter
> histograms [...] should be abandoned in favor of [...] density plots.

I take your point Bert, but I think there is value in data that is simple
and can be intuitively understood.

For my application, 10 minutes is a good characteristic chunk of time, and
I have an intuitive feeling of how many events I would expect to see in a
10-minute period. By looking at a histogram with 10-minute bins I can
tell immediately if something looks amiss. I could not do this simply with
a pdf. Similarly histograms have the nice feature of compartmentalising
bad data. Perhaps this is practical-use vs mathematical-idealism?

Also it's a case of simple tools for simple jobs. If the handle is loose
on my kitchen cabinet, I tighten the screw with a screwdriver or even the
tip of a knife from the drawer. I don't need my power-drill with
torque-controlled screwdriver attachment, much as I love it.

Justin



Re: Re-binning histogram data

Spencer Graves
In reply to this post by Ted.Harding
Hi, Bert, Ted, et al.:

          Do you use normal probability plots?  They are the best tool I know
for identifying all kinds of nonnormality, including normal mixtures
with either outliers or multimodality as well as skewness.  I've
experimented with PP and nonnormal QQ plots, and not found them that
useful.  I prefer to transform the data to apparent normality, because
that seems to produce about the right amount of visual separation in the
tails:  A PP plot provides very poor resolution of tail behavior, and
the image in a QQ plot with longer than normal tails becomes for me so
overwhelmed by random tail behavior that I'm unable to make sense of it.

          Also, could someone explain the rationale behind the "datax=FALSE"
default?  I presume this default was established before research showed
that humans have better judgment about vertical and horizontal lines
than lines at other angles, and that 45 degree lines are more easily
judged than lines at other angles.  This research led to "the 45 degree
banking rule (see _Visualizing Data_ by William S. Cleveland for
details)", mentioned on the help page for xyplot{lattice}.

          In my experience, most normal probability plots come closer to
meeting this "45 degree banking rule" when datax=TRUE than FALSE.  With
a typical aspect ratio, normally distributed data will appear with an
angle less than 45 degrees.  An outlier with the default datax=FALSE
will reduce that angle further, making it harder to process visually.  By
contrast, with datax=TRUE, an outlier increases the banking, moving it
closer to (or even beyond) the 45 degree line that seems to facilitate
the best human visual processing.
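
For readers who have not met the argument: qqnorm() takes `datax`, which decides whether the data go on the vertical axis (the default, `datax = FALSE`) or on the horizontal axis. The side-by-side sketch below uses a simulated sample with one artificial outlier, so the effect on the slope of the point cloud can be judged by eye; the data are an assumption for illustration only.

```r
set.seed(3)
x <- c(rnorm(100), 8)            # simulated sample plus one artificial outlier

op <- par(mfrow = c(1, 2))       # two panels side by side
qqnorm(x, main = "datax = FALSE (default)")
qqline(x)
qqnorm(x, datax = TRUE, main = "datax = TRUE")
qqline(x, datax = TRUE)
par(op)                          # restore the previous layout
```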

          Beyond this, what do you think about combining a normal plot with
either a histogram or a density estimate on the bottom?  With multiple
lines on the normal probability plot, I've seen stacked-bar histograms
on the bottom that seemed intelligible.  Would you suggest replacing
stacked-bar histograms with overlapping plots of density estimates?  And
how many observations would you need in each group for that to make sense?

          What do you think?
          Best Wishes,
          Spencer Graves



Re: Re-binning histogram data

François Pinard
In reply to this post by Bert Gunter
[Berton Gunter]

> I would argue that histograms are outdated relics and that density  
> plots (whatever your favorite flavor is) should **always** be used
> instead these days.

When a now retired researcher paid us a visit, I showed him a density
plot produced by R from some data he had worked on a lot before he left.
I, too, find them rather sexy, and I wanted to impress him with some of
the pleasures of R, especially knowing he had been a dedicated user of
SAS in his time.  Yet this old and wise man _immediately_ caught that
the density curve was leaking a tiny bit through the extrema.
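
This kind of leak is easy to reproduce. By default density() evaluates the curve beyond the range of the data (its `cut` argument defaults to 3 bandwidths), so a variable that cannot be negative still receives some density below zero. A sketch with simulated data follows; the exponential sample is an assumption chosen only to make the effect visible.

```r
set.seed(7)
x <- rexp(500)                  # simulated, strictly positive data

d <- density(x)                 # default grid extends 3 bandwidths past the data
range(x)
range(d$x)                      # wider than range(x): the curve passes the extrema

# Restricting the grid stops the curve at the data limits; this only
# truncates what is drawn, it does not correct the estimate at the boundary
d0 <- density(x, from = min(x), to = max(x))

hist(x, freq = FALSE, main = "histogram vs. density estimate")
lines(d, lty = 2)               # default, extends below zero
lines(d0)                       # truncated at the extrema
```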

Not a big deal of course -- and he did like what he saw.  Nevertheless,
this reminded me that we should be careful not to dismiss too lightly
years of accumulated knowledge, experience and know-how, merely because
we give in to joyful enthusiasm for more recent things.

Let me make a comparison, looking at the R mailing lists themselves.  
Some would much like sending HTML email in here: they would get colours,
use various fonts, offer links, and have indentation which dynamically
adapts on the receiving end to the window size of the reader.  But
the collective wisdom is to stick to non-HTML email, which is quite
proven and still very functional, after all.  Some impatient people or
dubious tools use other things than fixed-width fonts while presenting
text/plain email, or merely ignore the usual 79-column limit and other
oldish etiquette issues while sending it: in the last analysis, they kibitz
the community more than they help it, and deep down, are a bit selfish.
There is a long way to go before HTML email is really ubiquitous and
correctly supported.  Consider the long time MIME took to establish
itself: even now, email readers correctly supporting MIME are hard to
find -- most are far fonder of gadgets than they are versed in standards.

Another comparison which pops to my mind is how some people fanatically
try to impose UTF-8 all around, saying that ASCII or ISO-8859-1 (and
many others) are part of the prehistory of computers.  When they are mere
users, they can always talk without doing too much damage.  But I've seen
a few maintainers going overboard on such matters, consciously breaking
software to force their convictions forward: "Crois ou meurs!" as we say
in French (approximately: "Believe or perish!").  Here, just like for
HTML mail or nicer bitmapped R graphics, Unicode does have technical
merit; the truth is that we are _far_ from mastering everything about
it, and there are lots of open issues that are not strictly technical.

Many proponents of these various things are tempted to say that they want
to clean out the planet of outdated relics (I liked your expression!)
and have the honest feeling they do trigger overall progress.  Moreover,
new good things do not necessarily make older things wrong.  In a word,
we should rather wait for progress with calm, and with respectful care
of what already exists.  Progress will impose itself slowly over time,
and is not so much in need of forceful evangelists. :-)

--
François Pinard   http://pinard.progiciels-bpi.ca


Re: Re-binning histogram data

Charles Annis, P.E.
In reply to this post by cokelid
Concerning the several comments on your note relating to histograms, an
informative and entertaining illustration, using Java, of how your
subjective assessment of the data can change with different histograms
constructed from the same data, is provided by R. Webster West, recently
with the Department of Statistics at the University of South Carolina, but
as of May 2006 with the Department of Statistics at Texas A & M University,
http://www.stat.sc.edu/~west/javahtml/Histogram.html  and
http://www.stat.tamu.edu/~west/ 


Charles Annis, P.E.

[hidden email]
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com
 


Re: Re-binning histogram data

Liaw, Andy
In reply to this post by cokelid
I'm not sure why you would consider density plots "modern".  Kernel density
estimators were proposed in 1956 and 1962, though they were not commonly
used until computing power caught up.
 
Also, I think the advantages of density plots over histograms far outweigh
their possible shortcomings.  If you are unwilling to go that far, at least
consider replacing histograms with ASH plots.
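
For readers who have not met them: an ASH (average shifted histogram)
averages several histograms that share a bin width but have shifted
origins. The following is a rough base-R sketch of that idea, not the
implementation from any package; the function name ash_sketch and its
arguments h (bin width) and m (number of shifts) are made up for
illustration.

```r
# Average shifted histogram in base R: a sketch, not a package implementation.
# h is the histogram bin width, m the number of shifted histograms averaged.
ash_sketch <- function(x, h, m = 5) {
  delta <- h / m                                  # width of the fine bins
  # Fine grid padded by m empty bins on each side so no weight is lost
  k <- (floor(min(x) / delta) - m):(ceiling(max(x) / delta) + m)
  breaks <- k * delta
  v <- hist(x, breaks = breaks, plot = FALSE)$counts
  w <- 1 - abs((1 - m):(m - 1)) / m               # triangular weights
  # Weighted moving sum of the fine counts (zero-padded at both ends)
  s <- stats::filter(c(rep(0, m - 1), v, rep(0, m - 1)), w, sides = 2)
  s <- as.numeric(s)[m:(length(v) + m - 1)]
  list(x = breaks[-1] - delta / 2,                # fine-bin midpoints
       y = s / (length(x) * h))                   # density estimate
}

set.seed(123)
x <- rnorm(500)                                   # illustrative data only
est <- ash_sketch(x, h = 0.5, m = 8)
hist(x, breaks = 20, freq = FALSE, main = "Histogram with ASH overlay")
lines(est$x, est$y, type = "s")
```

With m = 1 this reduces to an ordinary histogram on the density scale;
larger m smooths away most of the dependence on the bin origin.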
 
[I wouldn't go quite as far as Bert.  Occasionally I still look at
histograms, but _very_ rarely.]
 
Andy

  _____  

From: [hidden email] on behalf of François Pinard
Sent: Thu 6/8/2006 7:53 PM
To: Berton Gunter
Cc: [hidden email]
Subject: Re: [R] Re-binning histogram data [Broadcast]



[Berton Gunter]

> I would argue that histograms are outdated relics and that density  
> plots (whatever your favorite flavor is) should **always** be used
> instead these days.

When a now-retired researcher paid us a visit, I showed him a density
plot produced by R from some data he had worked on a lot before he left.
I, too, find them rather sexy, and I wanted to impress him with some of
the pleasures of R, especially knowing he had been a dedicated user of
SAS in his time.  Yet this old and wise man _immediately_ noticed that
the density curve was leaking a tiny bit beyond the extrema.

Not a big deal of course -- and he did like what he saw.  Nevertheless,
this reminded me that we should be careful not to dismiss too lightly
years of accumulated knowledge, experience and know-how, merely because
we give in to joyful enthusiasm for more recent things.

Let me make a comparison, looking at the R mailing lists themselves.  
Some would much like sending HTML email in here: they would get colours,
use various fonts, offer links, and have indentation which dynamically
adapts on the receiving end to the window size of the reading guy.  But
the collective wisdom is to stick to non-HTML email, which is quite
proven and still very functional, after all.  Some impatient people or
dubious tools use other things than fixed-width fonts while presenting
text/plain email, or merely ignore the usual 79-column limit and other
oldish etiquette issues while sending it: in last analysis, they kibitz
the community more than they help it, and deep down, are a bit selfish.  
There is a long way to go before HTML email is really ubiquitous and
correctly supported.  Consider the long time MIME took to establish
itself: even now, email readers correctly supporting MIME are hard to
find -- most are fonder of gadgets than they are respectful of standards.

Another comparison which pops to my mind is how some people fanatically
try to impose UTF-8 all around, saying that ASCII or ISO-8859-1 (and
many others) are part of the prehistory of computers.  When mere users,
they can always talk without making too much damage.  But I've seen
a few maintainers going overboard on such matters, consciously breaking
software to force their convictions forward: "Crois ou meurs!" as we say
in French (approximately: "Believe or perish!").  Here, just like for
HTML mail or nicer bitmapped R graphics, Unicode does have technical
merit; the truth is that we are _far_ from mastering everything about
it, and there are lots of open issues that are not strictly technical.

Many proponents of these various things are tempted to say that they want
to clean out the planet of outdated relics (I liked your expression!)
and have the honest feeling they do trigger overall progress.  Moreover,
new good things do not necessarily make older things wrong.  In a word,
we should rather wait for progress with calm, and with respectful care
of what already exists.  Progress will impose itself slowly over time,
and is not so much in need of forceful evangelists. :-)

--
François Pinard   http://pinard.progiciels-bpi.ca

Re: Re-binning histogram data

Chris Evans-4
In reply to this post by François Pinard
François Pinard sent the following  at 09/06/2006 00:53:

> [Berton Gunter]
>
>> I would argue that histograms are outdated relics and that density  
>> plots (whatever your favorite flavor is) should **always** be used
>> instead these days.
>
> When a now-retired researcher paid us a visit, I showed him a density
> plot produced by R from some data he had worked on a lot before he left.
> I, too, find them rather sexy, and I wanted to impress him with some of
> the pleasures of R, especially knowing he had been a dedicated user of
> SAS in his time.  Yet this old and wise man _immediately_ noticed that
> the density curve was leaking a tiny bit beyond the extrema.
>
> Not a big deal of course -- and he did like what he saw.  Nevertheless,

... rest snipped ...

I liked François's post very much, and I confess I'm not very familiar
with density plots and still use histograms a lot.  However, I'm not a
statistician, though I like to think I'm not a complete Luddite.

Rather naive question: doesn't this depend a bit on whether you see
yourself as describing the sample or describing the (inferred)
population?  Much as I think the developing graphical methods of data
exploration are wonderful, it intrigues me that the distinction between
sample and population is not made as clearly for graphical methods as
it perhaps would be if the presentation were textual.  Perhaps that's
because it's often implicitly pretty clear: for example, boxplots and
histograms, with their inevitable problems, describe samples, while at
least some density plots implicitly describe populations.

I know there's an argument that only the inferences (and their CIs)
about the population are statistics and the rest is accountancy, but I am
not happy with that idea!

I'd be interested to hear others' views even if we are rather OTT (Off
The Topic, not Over The Top) here.  Perhaps I'm completely wrong?

Thanks to all for their posts; as ever, I'm learning much.

Chris

--
Chris Evans <[hidden email]>
Hon. Professor of Psychotherapy, Nottingham University;
Consultant Psychiatrist in Psychotherapy, Rampton Hospital;
Research Programmes Director, Nottinghamshire NHS Trust;
Hon. SL Institute of Psychiatry, Hon. Con., Tavistock & Portman Trust
**If I am writing from one of those roles, it will be clear. Otherwise**

**my views are my own and not representative of those institutions    **


Re: Re-binning histogram data

Duncan Murdoch
In reply to this post by Bert Gunter
On 6/8/2006 11:51 AM, Berton Gunter wrote:
> I would argue that histograms are outdated relics and that density plots
> (whatever your favorite flavor is) should **always** be used instead these
> days.

But my favourite density plot is a histogram!

I agree that computational complexity should weigh much less in the
decision to do something than it used to.  But I'd say a histogram (with
more bins than the R default) is a good input to my mental density
estimator.   Adding a rug of points below it is helpful in small
datasets.  It is very easy to see how much smoothing has been done;
that's often hard to see in presentations of density plots produced in
other ways.  It's also easier to recognize discrete atoms in the
distribution:  they'll show up as isolated bars a lot higher than the usual.

For example, compare these two plots:

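  set.seed(123)
  par(mfrow=c(2,1))                          # two plots, stacked
  # 1000 continuous values plus 100 discrete ones (atoms at 0, 1, 2, 3)
  x <- c(rnorm(1000), rbinom(100, 3, 0.5))
  hist(x, breaks=60)                         # many more bins than the default
  plot(density(x))                           # default bandwidth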

This isn't a fair comparison, since I used the default bandwidth on the
smoother but not on the histogram (it would be fairer to compare to
plot(density(x,bw=0.05)) ), but I think it still illustrates my point:
in the latter density plot where the atoms are clearly visible, I still
need to read the text at the bottom to know the sample size and
bandwidth, whereas I can see those at a glance in the histogram.  And an
untrained user could get a lot of information out of the histogram,
whereas they'd have a lot of trouble getting anything out of the density
plots.
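
The fairer comparison mentioned above, together with the rug, might look
something like the sketch below. It reuses the same simulated mixture; the
plot titles and the final table() check are additions for illustration
only.

```r
set.seed(123)
x <- c(rnorm(1000), rbinom(100, 3, 0.5))   # same mixture as above

op <- par(mfrow = c(2, 1))
h <- hist(x, breaks = 60, main = "Histogram with rug")
rug(x)                                     # one tick per observation
d <- density(x, bw = 0.05)                 # bandwidth comparable to the bins
plot(d, main = "Kernel density, bw = 0.05")
par(op)

# The discrete atoms can also be checked directly, without any plot
atoms <- table(x[x %in% 0:3])
atoms
```

Both panels now show the spikes at the atoms; the histogram still displays
its bin width directly, while the bandwidth of the density has to be read
from the axis label.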

>
> In this vein, I would appreciate critical rejoinders (public or private) to
> the following proposition: Given modern computer power and software like R
> on multi ghz machines, statistical and graphical relics of the pre-computer
> era (like histograms, low resolution printer-type plots, and perhaps even
> method of moments EMS calculations) should be abandoned in favor of superior
> but perhaps computation-intensive alternatives (like density plots, high
> resolution plots, and likelihood or resampling or Bayes based methods).
>
> NB: Please -- no pleadings that new methods would be mystifying to the
> non-cognoscenti. Following that to its logical conclusion would mean that
> we'd all have to give up our TV remotes and cell phones, and what kind of
> world would that be?! :-)

Now, if you were to suggest that the stem() function is a bizarre
simulation of a stone-age tool on a modern computer, I might agree.

Duncan Murdoch

>
> -- Bert Gunter
>
>  
>
>> -----Original Message-----
>> From: [hidden email]
>> [mailto:[hidden email]] On Behalf Of Petr Pikal
>> Sent: Thursday, June 08, 2006 6:17 AM
>> To: Justin Ashmall; [hidden email]
>> Subject: Re: [R] Re-binning histogram data
>>
>>
>>
>> On 8 Jun 2006 at 11:35, Justin Ashmall wrote:
>>
>> Date sent:       Thu, 8 Jun 2006 11:35:46 +0100 (BST)
>> From:           Justin Ashmall <[hidden email]>
>> To:             Petr Pikal <[hidden email]>
>> Copies to:       [hidden email]
>> Subject:         Re: [R] Re-binning histogram data
>>
>> >
>> > Thanks for the reply Petr,
>> >
>> > It looks to me that truehist() needs a vector of data just like
>> > hist()? Whereas I have histogram-style input data? Am I missing
>> > something?
>>
>> Well, maybe you could use barplot. Or as you suggested recreate the
>> original vector and call hist or truehist with other bins.
>>
>> > hhh<-hist(rnorm(1000))
>> > barplot(tapply(hhh$counts, c(rep(1:7,each=2),7), sum))
>> > tapply(hhh$mids, c(rep(1:7,each=2),7), mean)
>>     1     2     3     4     5     6     7
>> -3.00 -2.00 -1.00  0.00  1.00  2.00  3.25
>> > hhh1<-rep(hhh$mids,hhh$counts)
>> > plot(hhh, freq=F)
>> > lines(density(hhh1))
>> >
>>
>> HTH
>> Petr
>>
>>
>>
>>
>>
>>
>> >
>> > Cheers,
>> >
>> > Justin
>> >
>> >

Re: Re-binning histogram data

RKoenker
On Jun 9, 2006, at 7:38 AM, Duncan Murdoch wrote:
>
> Now, if you were to suggest that the stem() function is a bizarre
> simulation of a stone-age tool on a modern computer, I might agree.
>

But as a stone-age (blackboard)  tool it is unsurpassed.  It is the only
bright spot in the usually depressing ritual  of returning exam
results.  Full disclosure of the distribution in a very concise  
encoding.

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    [hidden email]            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Champaign, IL 61820


Re: Re-binning histogram data

Bert Gunter
In reply to this post by Charles Annis, P.E.
Charles:

To be fair ... both histograms and density plots are nonparametric density
estimators whose appearance and effectiveness depend on various
parameters. Neither is immune to misleading through a poor choice of those
parameters. For histograms they are the bin boundaries; for KDEs and
friends it is some version of bandwidth.
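
A small sketch of that sensitivity, on simulated data that are not from
this thread: the two histograms differ only in where the bins start, and
the two density plots only in bandwidth. The particular values (0.5, 0.1,
0.8) are arbitrary choices for illustration.

```r
set.seed(1)
x <- rnorm(200)                        # illustrative data only

op <- par(mfrow = c(2, 2))
# Same bin width, different bin origins
h1 <- hist(x, breaks = seq(-5, 5, by = 0.5), main = "bins start at -5")
h2 <- hist(x, breaks = seq(-5.25, 5.25, by = 0.5), main = "bins start at -5.25")
# Same kernel, different bandwidths
d1 <- density(x, bw = 0.1)
d2 <- density(x, bw = 0.8)
plot(d1, main = "bw = 0.1")
plot(d2, main = "bw = 0.8")
par(op)
```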

-- Bert
 
 
