curve of density on histogram

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

curve of density on histogram

KOITA Lassana - STAC/ACE

Hi R users,
I would like to know why these following curve densities don't appear
correctly on the histograms.
Thank you for your help



library(lattice)
library(grid)
resp  <- rnorm(2000)
group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
histogram(~ resp | group, col="steelblue",
  panel = function(x, ...){
    std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else "NA"
    n <- length(x)
    m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else "NA"
    panel.histogram(x, ...)
    panel.mathdensity(dmath = dnorm, col = "green",
                                args = list(mean=mean(x),sd=sd(x)))
    panel.abline(v= mean(x), col = "red")
    panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
col='yellow' )

    x1 <- unit(1, "npc") - unit(2, "mm")
    y1 <- unit(1, "npc") - unit(2, "mm")
    grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
    grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
"lines"), just = "right")
    grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
unit(2, "lines"), just = "right")

})

#################################################""""""

Lassana KOITA
Chargé d'Etudes de Sécurité Aéroportuaire et d'Analyse Statistique /
Project Engineer Airport Safety Studies & Statistical analysis
Service Technique de l'Aviation Civile (STAC) / Civil Aviation Technical
Department
Direction Générale de l'Aviation Civile (DGAC) / French Civil Aviation
Authority
Tel: 01 49 56 80 60
Fax: 01 49 56 82 14
E-mail: [hidden email]
http://www.stac.aviation-civile.gouv.fr/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: curve of density on histogram

Ted.Harding
On 08-Mar-07 KOITA Lassana - STAC/ACE wrote:

>
> Hi R users,
> I would like to know why these following curve densities don't appear
> correctly on the histograms.
> Thank you for your help
>
>
>
> library(lattice)
> library(grid)
> resp  <- rnorm(2000)
> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
> histogram(~ resp | group, col="steelblue",
>   panel = function(x, ...){
>     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else
> "NA"
>     n <- length(x)
>     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else
> "NA"
>     panel.histogram(x, ...)
>     panel.mathdensity(dmath = dnorm, col = "green",
>                                 args = list(mean=mean(x),sd=sd(x)))
>     panel.abline(v= mean(x), col = "red")
>     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
> col='yellow' )
>
>     x1 <- unit(1, "npc") - unit(2, "mm")
>     y1 <- unit(1, "npc") - unit(2, "mm")
>     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just =
> "right")
>     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
> "lines"), just = "right")
>     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
> unit(2, "lines"), just = "right")
>
> })

The following is an approximation to what you want to do.
However, it needs one thing to be determined, which I have
not managed to work out how to do.

The reason for your bad density plots is that the density
function dnorm needs to be scaled (by the number of values
whose histogram is being lotted) and by the width of the
intervals.

Hence I first define a function sdnorm() (for "scaled dnorm")
and then change one line in your code, as follows:

library(lattice)
library(grid)
resp  <- rnorm(2000)
group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
#### New function sdnorm:
sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
histogram(~ resp | group, col="steelblue",
  panel = function(x, ...){
    std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else "NA"
    n <- length(x)
    m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else "NA"
    panel.histogram(x, ...)
#### Changed line:
    panel.mathdensity(dmath = sdnorm, col = "green",
          args = list(mean=mean(x),sd=sd(x),N=length(x),binwid=0.10))
    panel.abline(v= mean(x), col = "red")
    panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
col='yellow' )

    x1 <- unit(1, "npc") - unit(2, "mm")
    y1 <- unit(1, "npc") - unit(2, "mm")
    grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
    grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
"lines"), just = "right")
    grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
unit(2, "lines"), just = "right")
})

The argument N to sdnorm is readily available from the argument x,
as N = length(x).

However, I cannot work out from the documentation for these panel
functions how to determine the width of the histogram bins, which
is argument binwid to sdnorm(). Hence I have simply set binwid=0.l0
to illustrate the point, since this gives an approximately correct
plot. But it will only be really correct when binwid can somehow
be determined from the hosyogram being plotted, and it is this
which I cannot see!

I hope someone else will help us out of our difficulty.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-07                                       Time: 16:29:45
------------------------------ XFMail ------------------------------

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: curve of density on histogram

Duncan Murdoch
On 3/8/2007 11:29 AM, (Ted Harding) wrote:

> On 08-Mar-07 KOITA Lassana - STAC/ACE wrote:
>>
>> Hi R users,
>> I would like to know why these following curve densities don't appear
>> correctly on the histograms.
>> Thank you for your help
>>
>>
>>
>> library(lattice)
>> library(grid)
>> resp  <- rnorm(2000)
>> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
>> histogram(~ resp | group, col="steelblue",
>>   panel = function(x, ...){
>>     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else
>> "NA"
>>     n <- length(x)
>>     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else
>> "NA"
>>     panel.histogram(x, ...)
>>     panel.mathdensity(dmath = dnorm, col = "green",
>>                                 args = list(mean=mean(x),sd=sd(x)))
>>     panel.abline(v= mean(x), col = "red")
>>     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
>> col='yellow' )
>>
>>     x1 <- unit(1, "npc") - unit(2, "mm")
>>     y1 <- unit(1, "npc") - unit(2, "mm")
>>     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just =
>> "right")
>>     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
>> "lines"), just = "right")
>>     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
>> unit(2, "lines"), just = "right")
>>
>> })
>
> The following is an approximation to what you want to do.
> However, it needs one thing to be determined, which I have
> not managed to work out how to do.
>
> The reason for your bad density plots is that the density
> function dnorm needs to be scaled (by the number of values
> whose histogram is being lotted) and by the width of the
> intervals.
>
> Hence I first define a function sdnorm() (for "scaled dnorm")
> and then change one line in your code, as follows:
>
> library(lattice)
> library(grid)
> resp  <- rnorm(2000)
> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
> #### New function sdnorm:
> sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
> histogram(~ resp | group, col="steelblue",
>   panel = function(x, ...){
>     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else "NA"
>     n <- length(x)
>     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else "NA"
>     panel.histogram(x, ...)
> #### Changed line:
>     panel.mathdensity(dmath = sdnorm, col = "green",
>           args = list(mean=mean(x),sd=sd(x),N=length(x),binwid=0.10))
>     panel.abline(v= mean(x), col = "red")
>     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
> col='yellow' )
>
>     x1 <- unit(1, "npc") - unit(2, "mm")
>     y1 <- unit(1, "npc") - unit(2, "mm")
>     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
>     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
> "lines"), just = "right")
>     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
> unit(2, "lines"), just = "right")
> })
>
> The argument N to sdnorm is readily available from the argument x,
> as N = length(x).
>
> However, I cannot work out from the documentation for these panel
> functions how to determine the width of the histogram bins, which
> is argument binwid to sdnorm(). Hence I have simply set binwid=0.l0
> to illustrate the point, since this gives an approximately correct
> plot. But it will only be really correct when binwid can somehow
> be determined from the hosyogram being plotted, and it is this
> which I cannot see!

The breaks are one of the ... args passed to the panel function, so you
can get the binwidth from there.  But there's another problem:  the
panel.histogram function gives percent of total, so should integrate to
100, not to N.  I think this version gives what is wanted:

library(lattice)
library(grid)
resp  <- rnorm(2000)
group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
#### New function sdnorm:
sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
histogram(~ resp | group, col="steelblue",
   panel = function(x, ...){
     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else "NA"
     n <- length(x)
     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else "NA"
     panel.histogram(x, ...)


     breaks <- list(...)$breaks
     binwid <- breaks[2]-breaks[1]
     panel.mathdensity(dmath = sdnorm, col = "green",
           args =
list(mean=mean(x),sd=sd(x),N=100,binwid=breaks[2]-breaks[1]))


     panel.abline(v= mean(x), col = "red")
     panel.abline(h=5)
     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
col='yellow' )

     x1 <- unit(1, "npc") - unit(2, "mm")
     y1 <- unit(1, "npc") - unit(2, "mm")
     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
"lines"), just = "right")
     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
unit(2, "lines"), just = "right")
})

Duncan Murdoch

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: curve of density on histogram

Ted.Harding
On 08-Mar-07 Duncan Murdoch wrote:

>
> On 3/8/2007 11:29 AM, (Ted Harding) wrote:
>>
>> On 08-Mar-07 KOITA Lassana - STAC/ACE wrote:
>>>
>>> [snip]
>>
>> [snip]
>> The argument N to sdnorm is readily available from the argument x,
>> as N = length(x).
>>
>> However, I cannot work out from the documentation for these panel
>> functions how to determine the width of the histogram bins, which
>> is argument binwid to sdnorm(). Hence I have simply set binwid=0.l0
>> to illustrate the point, since this gives an approximately correct
>> plot. But it will only be really correct when binwid can somehow
>> be determined from the hosyogram being plotted, and it is this
>> which I cannot see!
>
>
> The breaks are one of the ... args passed to the panel function, so you
> can get the binwidth from there.  But there's another problem:  the
> panel.histogram function gives percent of total, so should integrate to
> 100, not to N.  I think this version gives what is wanted:
>
> library(lattice)
> library(grid)
> resp  <- rnorm(2000)
> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
>#### New function sdnorm:
> sdnorm
> <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
> histogram(~ resp | group, col="steelblue",
>    panel = function(x, ...){
>      std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else
> "NA"
>      n <- length(x)
>      m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else
> "NA"
>      panel.histogram(x, ...)
>
>
>      breaks <- list(...)$breaks
>      binwid <- breaks[2]-breaks[1]
>      panel.mathdensity(dmath = sdnorm, col = "green",
>            args =
> list(mean=mean(x),sd=sd(x),N=100,binwid=breaks[2]-breaks[1]))
>
>
>      panel.abline(v= mean(x), col = "red")
>      panel.abline(h=5)
>      panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
> col='yellow' )
>
>      x1 <- unit(1, "npc") - unit(2, "mm")
>      y1 <- unit(1, "npc") - unit(2, "mm")
>      grid.text(label = bquote(n == .(n)), x = x1, y = y1, just =
> "right")
>      grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
> "lines"), just = "right")
>      grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
> unit(2, "lines"), just = "right")
> })

Duncan, your statement that "But there's another problem:  the
panel.histogram function gives percent of total, so should integrate
to 100, not to N" got me challenged -- there must be a work-round!

Finally I found it, but you have to make the change in (to me)
an unexpected place.

The following code (your code above, with two changed lines commented
out, and followed by the changed version) really does do the intended
job of plotting a panel of histograms of *counts* with the appropriate
scaled normal densities superimposed.


library(lattice)
library(grid)
N<-20000
resp  <- rnorm(N)
group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = N/4)
#### New function sdnorm:
sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}

##Changed line
#### histogram(~ resp | group, col="steelblue",
histogram(~ resp | group, col="steelblue", type="count",
   panel = function(x, ...){
     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else
"NA"
     n <- length(x)
     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else
"NA"
     panel.histogram(x, ...)


     breaks <- list(...)$breaks
     binwid <- breaks[2]-breaks[1]
     panel.mathdensity(dmath = sdnorm, col = "green",
           args =
## Changed line
#### list(mean=mean(x),sd=sd(x),N=100,binwid=breaks[2]-breaks[1]))
list(mean=mean(x),sd=sd(x),N=length(x),binwid=breaks[2]-breaks[1]))

     panel.abline(v= mean(x), col = "red")
     panel.abline(h=5)
     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
col='yellow' )

     x1 <- unit(1, "npc") - unit(2, "mm")
     y1 <- unit(1, "npc") - unit(2, "mm")
     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
"lines"), just = "right")
     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
unit(2, "lines"), just = "right")
})


I first tried setting type="count" in the call to panel.histogram(),
as in

  panel.histogram(x, type="count", ...)

but this got me an error message

  Error in panel.histogram(x, type = "count", ...) :
        formal argument "type" matched by multiple actual arguments

from which I (slowly) deduced that it had already got that parameter
from somewhere else, after which it seemed natural that it got it
from the preceding call to

  histogram(~ resp | group, col="steelblue",
    [etc]

so I tried setting it there and (with the factor N<-length(x) for
counts, instead of 100 for percentage) this worked!

(And I confess, when I posted my first suggestion, that I had
omitted to read the vertically aligned small print at the side
of the panel which said "Percent of Total"!).

Ah well, it's been an interesting little tour!

Thanks for the breakthrough, Duncan!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Mar-07                                       Time: 19:14:39
------------------------------ XFMail ------------------------------

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.