Lattice Histogram Scaling

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Lattice Histogram Scaling

Roger Koenker-3
I am trying to do some comparisons of density estimators using lattice.
The code below attempts to plot the same histogram in each panel and
then overplots a kernel estimate with different bandwidths. Finding
packet.number() was a bit arduous, but seems to do what I want.  My
concern now is that close examination of the 4 histograms reveals that
they are different even though they use the same data, and use the
same binning.  Can someone explain this, or better yet suggest a fix?
Admittedly, the kernel estimates are rather silly, they are just standing
in for something else that I would like to think is less silly.

Many thanks,
Roger


require(REBayes)  # Source for the velo data
require(lattice)
x <- velo
x <- x[!(x == 0)]
bandwidths <- (1:4)*10
lp <- histogram(~x|bw, nint = 100, border = grey(.9), col = grey(.9),
                type = "density", panel = function(x, bw = bandwidths, ...){
            panel.histogram(x, ...)
            f <- density(x, bw[packet.number()])
            panel.lines(f$x, f$y, col = "blue", lwd = 1.5)
            })
print(lp)


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    [hidden email]            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Urbana, IL 61801

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Lattice Histogram Scaling

Richard M. Heiberger
Your example is not easily reproducible.
The REBayes requires Rmosek which requires a system command MOSEK.
Please try again with an example using data in base R.

Meanwhile, my guess is that you will need to do something like
explicitly specifying xlim and ylim so all panels have the same
limits.

On Mon, Aug 14, 2017 at 5:41 PM, Roger Koenker <[hidden email]> wrote:

> I am trying to do some comparisons of density estimators using lattice.
> The code below attempts to plot the same histogram in each panel and
> then overplots a kernel estimate with different bandwidths. Finding
> packet.number() was a bit arduous, but seems to do what I want.  My
> concern now is that close examination of the 4 histograms reveals that
> they are different even though they use the same data, and use the
> same binning.  Can someone explain this, or better yet suggest a fix?
> Admittedly, the kernel estimates are rather silly, they are just standing
> in for something else that I would like to think is less silly.
>
> Many thanks,
> Roger
>
>
> require(REBayes)  # Source for the velo data
> require(lattice)
> x <- velo
> x <- x[!(x == 0)]
> bandwidths <- (1:4)*10
> lp <- histogram(~x|bw, nint = 100, border = grey(.9), col = grey(.9),
>                 type = "density", panel = function(x, bw = bandwidths, ...){
>             panel.histogram(x, ...)
>             f <- density(x, bw[packet.number()])
>             panel.lines(f$x, f$y, col = "blue", lwd = 1.5)
>             })
> print(lp)
>
>
> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    [hidden email]            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Urbana, IL 61801
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Lattice Histogram Scaling

Roger Koenker-3
In reply to this post by Roger Koenker-3
My apologies,  the data can now be found at:

url <- "http://www.econ.uiuc.edu/~roger/research/ebayes/velo.d"
x <- scan(url,skip = 1)

If I could get each of the histograms to mimic what is produced by

hist(x, 100, freq = FALSE)

I’ve experimented with xlim, ylim, without success so far...

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    [hidden email]            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Urbana, IL 61801

> On Aug 14, 2017, at 8:36 PM, Richard M. Heiberger <[hidden email]> wrote:
>
> Your example is not easily reproducible.
> The REBayes requires Rmosek which requires a system command MOSEK.
> Please try again with an example using data in base R.
>
> Meanwhile, my guess is that you will need to do something like
> explicitly specifying xlim and ylim so all panels have the same
> limits.
>
> On Mon, Aug 14, 2017 at 5:41 PM, Roger Koenker <[hidden email]> wrote:
>> I am trying to do some comparisons of density estimators using lattice.
>> The code below attempts to plot the same histogram in each panel and
>> then overplots a kernel estimate with different bandwidths. Finding
>> packet.number() was a bit arduous, but seems to do what I want.  My
>> concern now is that close examination of the 4 histograms reveals that
>> they are different even though they use the same data, and use the
>> same binning.  Can someone explain this, or better yet suggest a fix?
>> Admittedly, the kernel estimates are rather silly, they are just standing
>> in for something else that I would like to think is less silly.
>>
>> Many thanks,
>> Roger
>>
>>
>> require(REBayes)  # Source for the velo data
>> require(lattice)
>> x <- velo
>> x <- x[!(x == 0)]
>> bandwidths <- (1:4)*10
>> lp <- histogram(~x|bw, nint = 100, border = grey(.9), col = grey(.9),
>>                type = "density", panel = function(x, bw = bandwidths, ...){
>>            panel.histogram(x, ...)
>>            f <- density(x, bw[packet.number()])
>>            panel.lines(f$x, f$y, col = "blue", lwd = 1.5)
>>            })
>> print(lp)
>>
>>
>> url:    www.econ.uiuc.edu/~roger            Roger Koenker
>> email    [hidden email]            Department of Economics
>> vox:     217-333-4558                University of Illinois
>> fax:       217-244-6678                Urbana, IL 61801
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Lattice Histogram Scaling

Richard M. Heiberger
The original was conditioning the entire histogram, not just the kernel
estimate, on bw.  I am assuming that the bw in the condition should be
bandwidths.

What you want is easily done by using the latticeExtra package.
This works from a new R session:


url <- "http://www.econ.uiuc.edu/~roger/research/ebayes/velo.d"
x <- scan(url,skip = 1)
bandwidths <- (1:4)*10

library(latticeExtra)

H <- histogram( ~ x, nint = 100, border = grey(.9), col = grey(.9),
               type = "density")

DD <- lapply(bandwidths, FUN=density, x=x)

DDplot <- lapply(DD, FUN=function(bw) xyplot(y ~ x, data=bw, type="l",
col="blue", lwd=1.5))
names(DDplot) <- bandwidths

do.call(c, DDplot) ## warning message that I am not chasing down

DDplot.together <- do.call(c, c(DDplot, list(x.same = TRUE, y.same = TRUE)))
DDplot.together + H  ## wrong plot on top

DDplot.together.H <- c(H+DDplot[[1]], H+DDplot[[2]], H+DDplot[[3]],
H+DDplot[[4]], x.same = TRUE, y.same = TRUE)
dimnames(DDplot.together.H)[[1]] <- as.character(bandwidths)
DDplot.together.H <- update(DDplot.together.H, strip=TRUE)
DDplot.together.H ## correct plot on top



On Tue, Aug 15, 2017 at 10:13 AM, Roger Koenker <[hidden email]> wrote:

> My apologies,  the data can now be found at:
>
> url <- "http://www.econ.uiuc.edu/~roger/research/ebayes/velo.d"
> x <- scan(url,skip = 1)
>
> If I could get each of the histograms to mimic what is produced by
>
> hist(x, 100, freq = FALSE)
>
> I’ve experimented with xlim, ylim, without success so far...
>
> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    [hidden email]            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Urbana, IL 61801
>
>> On Aug 14, 2017, at 8:36 PM, Richard M. Heiberger <[hidden email]> wrote:
>>
>> Your example is not easily reproducible.
>> The REBayes requires Rmosek which requires a system command MOSEK.
>> Please try again with an example using data in base R.
>>
>> Meanwhile, my guess is that you will need to do something like
>> explicitly specifying xlim and ylim so all panels have the same
>> limits.
>>
>> On Mon, Aug 14, 2017 at 5:41 PM, Roger Koenker <[hidden email]> wrote:
>>> I am trying to do some comparisons of density estimators using lattice.
>>> The code below attempts to plot the same histogram in each panel and
>>> then overplots a kernel estimate with different bandwidths. Finding
>>> packet.number() was a bit arduous, but seems to do what I want.  My
>>> concern now is that close examination of the 4 histograms reveals that
>>> they are different even though they use the same data, and use the
>>> same binning.  Can someone explain this, or better yet suggest a fix?
>>> Admittedly, the kernel estimates are rather silly, they are just standing
>>> in for something else that I would like to think is less silly.
>>>
>>> Many thanks,
>>> Roger
>>>
>>>
>>> require(REBayes)  # Source for the velo data
>>> require(lattice)
>>> x <- velo
>>> x <- x[!(x == 0)]
>>> bandwidths <- (1:4)*10
>>> lp <- histogram(~x|bw, nint = 100, border = grey(.9), col = grey(.9),
>>>                type = "density", panel = function(x, bw = bandwidths, ...){
>>>            panel.histogram(x, ...)
>>>            f <- density(x, bw[packet.number()])
>>>            panel.lines(f$x, f$y, col = "blue", lwd = 1.5)
>>>            })
>>> print(lp)
>>>
>>>
>>> url:    www.econ.uiuc.edu/~roger            Roger Koenker
>>> email    [hidden email]            Department of Economics
>>> vox:     217-333-4558                University of Illinois
>>> fax:       217-244-6678                Urbana, IL 61801
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.