"Raw" histogram plots


Andre Nathan
Hello

I need to plot a histogram, but instead of using bars, I'd like to plot
the data points. I've been doing it like this so far:

  h <- hist(x, plot = FALSE)
  plot(y = h$counts / sum(h$counts),
       x = h$breaks[2:length(h$breaks)],
       type = "p", log = "xy")

Sometimes I want to have a look at the "raw" data (avoiding any kind of
binning). When x only contains integers, it's easy to just use bins of
size 1 when generating h with "breaks = seq(0, max(x))".
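
A minimal sketch of the integer case, using made-up Poisson data (the sample, the half-integer breaks and the axis labels are assumptions, not part of the original post). With breaks = seq(0, max(x)) and hist()'s default right-closed bins, a value of 0 would share the first bin with 1; half-integer breaks keep every integer in its own bin:

```r
set.seed(42)
x <- rpois(1000, lambda = 4)   # hypothetical integer-valued sample

# One bin per integer: breaks at k - 0.5 and k + 0.5
h <- hist(x, breaks = seq(min(x) - 0.5, max(x) + 0.5, by = 1), plot = FALSE)
rel <- h$counts / sum(h$counts)   # relative frequencies

# Log axes cannot display zeros, so drop empty bins and x = 0
keep <- h$counts > 0 & h$mids > 0
plot(h$mids[keep], rel[keep], type = "p", log = "xy",
     xlab = "x", ylab = "relative frequency")
```

Plotting against h$mids rather than the right-hand breaks places each point at the integer it counts.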

Is there any way to do something similar when x consists of fractional
data? What I'm doing is setting a small bin length (for example, "breaks
= seq(0, 1, by = 1e-6)"), but there's still a chance that several points
will be grouped in a single bin.

Is there a better way to do this kind of "raw histogram" plotting?

Thanks,
Andre

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: "Raw" histogram plots

RKoenker
take a look at

        ?stem

There is still a place for handtools in the age of integrated
circuits.  Of course, avoiding binning isn't really desirable.
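
For readers who have not used it, a small sketch of what stem() produces; the 50-point sample is invented for illustration, since a stem-and-leaf display is only practical for small data sets:

```r
set.seed(1)
x <- round(rnorm(50, mean = 10, sd = 2), 1)   # hypothetical small sample

# Text-only stem-and-leaf display, printed to the console
stem(x)

# 'scale' stretches the display to show more stems
stem(x, scale = 2)
```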

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    [hidden email]            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Champaign, IL 61820


On Feb 26, 2008, at 4:10 PM, Andre Nathan wrote:

> Is there a better way to do this kind of "raw histogram" plotting?

Re: "Raw" histogram plots

hadley wickham
In reply to this post by Andre Nathan
On Tue, Feb 26, 2008 at 4:10 PM, Andre Nathan <[hidden email]> wrote:
> Hello
>
>  I need to plot a histogram, but instead of using bars, I'd like to plot
>  the data points. I've been doing it like this so far:
>
>   h <- hist(x, plot = FALSE)
>   plot(y = h$counts / sum(h$counts),
>        x = h$breaks[2:length(h$breaks)],
>        type = "p", log = "xy")

Another approach would be to use ggplot2, where all statistical
transformations can be performed separately from their traditional
appearance:

install.packages("ggplot2")
qplot(x, stat="bin", geom="bar")
qplot(x, stat="bin")
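
qplot() has since been deprecated in ggplot2, so here is a hedged sketch of the same idea with the ggplot() interface. The data frame, the exponential sample and bins = 30 are assumptions for illustration; the point is only that the binning statistic can be drawn with points instead of bars:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rexp(1000))   # hypothetical sample

# Traditional appearance: binned counts drawn as bars
p_bar <- ggplot(df, aes(x)) + geom_histogram(bins = 30)

# Same binning statistic, drawn as points
p_pts <- ggplot(df, aes(x)) + geom_point(stat = "bin", bins = 30)

print(p_pts)
```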

Hadley

--
http://had.co.nz/


Re: "Raw" histogram plots

Andre Nathan
In reply to this post by RKoenker
I know about stem, but the data set has 1 million points, so it's not
very useful here. I want to avoid binning just to have an idea about the
shape of the distribution, before deciding how I'll bin it.

Andre

On Tue, 2008-02-26 at 16:20 -0600, roger koenker wrote:

> take a look at
>
> ?stem
>
> There is still a place for handtools in the age of integrated
> circuits.  Of course, avoiding binning isn't really desirable.

Re: "Raw" histogram plots

Peter Alspach
Andre

If I understand you correctly, you could try a barplot() on the result
of table().
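
A short sketch of this suggestion, with an invented sample rounded to one decimal so that values repeat (the data and labels are assumptions):

```r
set.seed(1)
x <- round(runif(200), 1)   # hypothetical fractional data with repeats

counts <- table(x)          # one count per distinct value, no binning
barplot(counts, xlab = "value", ylab = "count")
```

Note that barplot() treats the distinct values as categories and spaces the bars evenly, so the x axis is not a numeric scale.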

HTH ......

Peter Alspach
 

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Andre Nathan
> Sent: Wednesday, 27 February 2008 1:34 p.m.
> To: roger koenker
> Cc: r-help
> Subject: Re: [R] "Raw" histogram plots
>
> I know about stem, but the data set has 1 million points, so
> it's not very useful here. I want to avoid binning just to
> have an idea about the shape of the distribution, before
> deciding how I'll bin it.
>
> Andre

Re: "Raw" histogram plots

Marc Schwartz
If the goal is to get a sense of the 'shape' of the overall distribution
of 'x', then why not use:

   plot(density(x))

?

HTH,

Marc Schwartz


Peter Alspach wrote:

> Andre
>
> If I understand you correctly, you could try a barplot() on the result
> of table().
>
> HTH ......
>
> Peter Alspach

Re: "Raw" histogram plots

Prof Brian Ripley
In reply to this post by Andre Nathan
On Tue, 26 Feb 2008, Andre Nathan wrote:

> I know about stem, but the data set has 1 million points, so it's not
> very useful here. I want to avoid binning just to have an idea about the
> shape of the distribution, before deciding how I'll bin it.

Ideas:

1) use a much smaller sample of the data (1000 should suffice)
2) use a density plot (see ?density), perhaps on a sub-sample
(although, since density() bins the data on a fine grid anyway,
sub-sampling does not matter much).
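
Both ideas can be sketched together as follows; the log-normal data stands in for the real 1-million-point set and is purely an assumption:

```r
set.seed(1)
x <- rlnorm(1e6)                # hypothetical stand-in for the real data

# Idea 1: work with a random sub-sample of 1000 points
xs <- x[sample(length(x), 1000)]

# Idea 2: kernel density estimate of the sub-sample
d <- density(xs)
plot(d, main = "Density estimate from a 1000-point sub-sample")
rug(xs)                         # tick marks at the raw sub-sample values
```

Calling density(x) on the full vector should also be feasible, per the remark about binning on a grid.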


--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: "Raw" histogram plots

Andre Nathan
In reply to this post by Peter Alspach
On Wed, 2008-02-27 at 14:15 +1300, Peter Alspach wrote:
> If I understand you correctly, you could try a barplot() on the result
> of table().

Hmm, table() does the counting exactly the way I want, i.e., just
counting individual values. Is there a way to extract the counts vs. the
values from a table, so that I can pass them as the x and y arguments to
plot()?

Thanks,
Andre


Re: "Raw" histogram plots

Henrique Dallazuanna
If I understand:

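x <- rnorm(1e6)
# count the values sharing each integer ceiling (i.e. unit-width bins)
out <- tapply(x, ceiling(x), length)
# the group labels are stored as names, so convert them back to numbers
plot(as.numeric(names(out)), out)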

On 27/02/2008, Andre Nathan <[hidden email]> wrote:

> On Wed, 2008-02-27 at 14:15 +1300, Peter Alspach wrote:
>  > If I understand you correctly, you could try a barplot() on the result
>  > of table().
>
>
> Hmm, table() does the counting exactly the way I want, i.e., just
>  counting individual values. Is there a way to extract the counts vs. the
>  values from a table, so that I can pass them as the x and y arguments to
>  plot()?
>
>
>  Thanks,
>  Andre


--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: "Raw" histogram plots

Charilaos Skiadas-3
In reply to this post by Andre Nathan
On Feb 27, 2008, at 8:16 AM, Andre Nathan wrote:

> On Wed, 2008-02-27 at 14:15 +1300, Peter Alspach wrote:
>> If I understand you correctly, you could try a barplot() on the  
>> result
>> of table().
>
> Hmm, table() does the counting exactly the way I want, i.e., just
> counting individual values. Is there a way to extract the counts  
> vs. the
> values from a table, so that I can pass them as the x and y  
> arguments to
> plot()?
>

x <- table(rbinom(20,2,0.5))
plot(names(x),x)

should do it. You can also try just plot(x). Use prop.table() on the
table if you want the relative frequencies instead.
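
Applied to fractional data, the same recipe might look like the sketch below. The exponential sample, the rounding to two decimals and the log-log axes are assumptions chosen to mirror the original question; the explicit as.numeric() conversion avoids relying on plot() to coerce the character names:

```r
set.seed(1)
x <- round(rexp(1000, rate = 2), 2)   # hypothetical fractional data

tab  <- table(x)                      # counts per distinct value
vals <- as.numeric(names(tab))        # the distinct values, as numbers
rel  <- as.vector(prop.table(tab))    # relative frequencies

keep <- vals > 0                      # log axes cannot display zero
plot(vals[keep], rel[keep], type = "p", log = "xy",
     xlab = "value", ylab = "relative frequency")
```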

> Thanks,
> Andre

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College


Re: "Raw" histogram plots

Frank Harrell
In reply to this post by Andre Nathan
Andre Nathan wrote:

> On Wed, 2008-02-27 at 14:15 +1300, Peter Alspach wrote:
>> If I understand you correctly, you could try a barplot() on the result
>> of table().
>
> Hmm, table() does the counting exactly the way I want, i.e., just
> counting individual values. Is there a way to extract the counts vs. the
> values from a table, so that I can pass them as the x and y arguments to
> plot()?
>
> Thanks,
> Andre

Also take a look at the Hmisc package's spike histogram-related functions
such as histSpike and scat1d.

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: "Raw" histogram plots

Andre Nathan
In reply to this post by Charilaos Skiadas-3
On Wed, 2008-02-27 at 08:48 -0500, Charilaos Skiadas wrote:
> x <- table(rbinom(20,2,0.5))
> plot(names(x),x)
>
> should do it. You can also try just plot(x). Use prop.table on table  
> if you want the relative frequencies instead.

Yes, names() is what I needed :) Thanks for the prop.table hint. I looked
everywhere, but none of my searches hinted at table/prop.table. You guys'
help has been invaluable to me.

Thanks again,
Andre
