Help to do this exercise

Pham Huong
N1 Consider the dataset "LakeHuron", containing the annual measurements
of the level (in feet) of Lake Huron, 1875-1972; see
https://stat.ethz.ch/R-manual/Rdevel/library/datasets/html/LakeHuron.html.
The general aim is to estimate the probability density of the level of the
lake.
(i) Construct the histogram estimator with the number of bins selected
by the Sturges rule. On the same plot display the graph of the density of
the normal distribution with estimated mean and standard
deviation (normal fit).
(ii) Among the histograms with the number of bins from 5 to 30, find
the histogram estimator which is closest to the normal fit. Comment on the
bias-variance tradeoff in this case.
(iii) Construct the kernel estimators with various kernels (apply all
kernels available in the R language). The bandwidth can be chosen by
default. Construct the kernel estimators under various choices of
bandwidth (apply all rules for bandwidth selection, which are implemented
in the R language, the kernel can be chosen by default).
Among all constructed kernel estimators, find the kernel estimator
which is closest to the normal fit.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [External Email] Help to do this exercise

Christopher W. Ryan
Homework questions are generally frowned upon on the R-help list. It is best
to discuss those questions with your instructor.

--Chris Ryan
SUNY Upstate Medical University Clinical Campus at Binghamton

On Mon, Feb 10, 2020 at 9:39 AM hương phạm <[hidden email]> wrote:

> [...]

Re: Help to do this exercise

Rui Barradas
Hello,

R-help has a no-homework policy.
Please find help somewhere else.

Rui Barradas


At 10:50 on 09/02/20, hương phạm wrote:

> [...]

Re: Help to do this exercise

Richard O'Keefe-2
Others have already commented on the "no homework" policy.
I'd like to make a different point.
When I was doing my MSc many years ago, a friend of mine was
really struggling with statistics.  He complained to me that when
studying the textbooks and looking at examples, he could never
figure out why the method chosen was the best method for the example.
When I looked into it, the answer was horribly simple.
The method was chosen first, and the example selected to illustrate.
More often than not, the method *wasn't* the best one for the example.

Now let's consider the Lake Huron dataset.
What is the single most obvious thing about it?
It's a TIME SERIES.
Because the events of a year make only a modest *change* to the level
of the lake, we expect a high autocorrelation.
> plot(LakeHuron)
> plot(diff(LakeHuron))
> acf(LakeHuron)

Autocorrelations of series ‘LakeHuron’, by lag

     0      1      2      3      4      5      6      7      8      9     10
 1.000  0.832  0.610  0.458  0.371  0.326  0.285  0.265  0.264  0.258  0.183
    11     12     13     14     15     16     17     18     19
 0.095  0.044  0.029  0.041  0.045  0.035  0.005 -0.033 -0.053

Yup, high autocorrelation is what we get.
This suggests that a model like level(t+1) = level(t) + shock(t)
might be a good first attempt, so look at the year-to-year differences:
> plot(diff(LakeHuron))
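
As a rough numerical companion to that plot, here is a minimal sketch that compares the lag-1 autocorrelation of the levels with that of the year-to-year changes. If the random-walk idea were adequate, the changes should show much less autocorrelation than the levels. The variable names (level, shock, r1_level, r1_shock) are only illustrative; this is a quick check, not a fitted model.

```r
# LakeHuron ships with base R (package 'datasets').
level <- LakeHuron
shock <- diff(level)            # level(t+1) - level(t)

# Sample autocorrelations, computed without drawing the plots.
acf_level <- acf(level, plot = FALSE)
acf_shock <- acf(shock, plot = FALSE)

# Element 1 of $acf is lag 0 (always 1), so element 2 is lag 1.
r1_level <- acf_level$acf[2]
r1_shock <- acf_shock$acf[2]

cat("lag-1 autocorrelation, levels: ", round(r1_level, 3), "\n")
cat("lag-1 autocorrelation, changes:", round(r1_shock, 3), "\n")
```

Whatever structure remains in the changes is what a more careful time series model would have to explain.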

What this says to me is that estimating the probability density of the level
is not a sensible thing to do.  We DON'T have a collection of independent
and identically distributed measurements.  We have a time series.
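
One way to put a number on the lack of independence is a portmanteau test. The sketch below uses Box.test() from the stats package with the Ljung-Box statistic; the choice of lag = 10 and the 0.05 cut-off are assumptions made here for illustration, not something the exercise prescribes.

```r
# Ljung-Box test of the null hypothesis that the series is
# uncorrelated at lags 1 through 10.
lb <- Box.test(LakeHuron, lag = 10, type = "Ljung-Box")
print(lb)

# A small p-value is evidence against treating the yearly
# levels as independent draws from one distribution.
reject_independence <- lb$p.value < 0.05
cat("reject independence at the 5% level:", reject_independence, "\n")
```

The test only addresses serial correlation; it says nothing about whether the marginal distribution is normal.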

So,
 + you CAN do what the homework says
 + so you CAN use this dataset to illustrate these methods
 - BUT these methods are NOT a sensible way to understand this dataset.

On Tue, 11 Feb 2020 at 03:40, hương phạm <[hidden email]> wrote:

> [...]