
Ellipse that Contains 95% of the Observed Data


Tom La Bone
I can take the results of a simulation with one random variable and generate an empirical interval that contains 95% of the observations, e.g.,

x <- rnorm(10000)
quantile(x,probs=c(0.025,0.975))

Is there an R function that can take the results from two random variables and generate an empirical ellipse that contains 95% of the observations, e.g.,  

x <- rnorm(10000)
y <- rnorm(10000)
?

I am specifically looking for an ellipse that does not assume normality.

Tom


Re: Ellipse that Contains 95% of the Observed Data

Ben Bolker
Tom La Bone <booboo <at> gforcecable.com> writes:

>
>
> I can take the results of a simulation with one random variable and generate
> an empirical interval that contains 95% of the observations, e.g.,
>
> x <- rnorm(10000)
> quantile(x,probs=c(0.025,0.975))
>
> Is there an R function that can take the results from two random variables
> and generate an empirical ellipse that contains 95% of the observations,
> e.g.,  
>
> x <- rnorm(10000)
> y <- rnorm(10000)
> ?
>
> I am specifically looking for an ellipse that does not assume normality.

  I'll be interested to hear what others come up with.
  I'm not sure the problem as you have stated it is well-posed, or
necessarily possible. Suppose there is a true unknown
bivariate probability distribution with a non-elliptical 95%
quantile region. Will you be able to draw an ellipse that
has the properties you want?

  Here's one possible alternative:

  library(coda)
  library(emdbook)
  plot(x,y)
  z = HPDregionplot(as.mcmc(cbind(x,y)),add=TRUE,col=2,lwd=2)

  This is not an ellipse, but it does contain (approximately) 95% of
the points.
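
A rough base-R illustration of the same idea (a highest-density region rather than an ellipse) is sketched below. It is not a description of how HPDregionplot works internally. It assumes a kernel density estimate from kde2d() in MASS, and the seed and grid size are arbitrary choices.

```r
library(MASS)  # for kde2d()

set.seed(1)    # arbitrary seed, for reproducibility only
x <- rnorm(10000)
y <- rnorm(10000)

# Kernel density estimate on a regular 100 x 100 grid
dens <- kde2d(x, y, n = 100)

# Estimated density for each observation, read off at the lower-left
# node of the grid cell that contains it
ix <- findInterval(x, dens$x, all.inside = TRUE)
iy <- findInterval(y, dens$y, all.inside = TRUE)
d.obs <- dens$z[cbind(ix, iy)]

# Density level that is exceeded by about 95% of the observations
lev <- quantile(d.obs, probs = 0.05)

plot(x, y, pch = ".")
contour(dens, levels = lev, add = TRUE, col = 2, lwd = 2, drawlabels = FALSE)

# Fraction of observations whose estimated density is at or above the level
inside.frac <- mean(d.obs >= lev)
```

The contour follows the estimated density, so for skewed or multimodal data it can be non-convex or even split into several pieces.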

  Convex hulls are another plausible approach.
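
One way to read the convex hull suggestion is "hull peeling": strip off successive convex hulls until one more peel would leave fewer than 95% of the points. The sketch below is an illustration under that assumption, using chull() from base R. The stopping rule is a choice made here, not something stated in the thread.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
x <- rnorm(10000)
y <- rnorm(10000)

pts    <- cbind(x, y)
keep   <- seq_len(nrow(pts))          # indices of points not yet peeled
target <- ceiling(0.95 * nrow(pts))   # never keep fewer points than this

repeat {
  hull <- chull(pts[keep, , drop = FALSE])   # positions within 'keep'
  if (length(keep) - length(hull) < target) break
  keep <- keep[-hull]
}

# Hull of the points that remain
hull.idx <- keep[chull(pts[keep, , drop = FALSE])]
plot(x, y, pch = ".")
polygon(pts[hull.idx, ], border = 2, lwd = 2)

# Fraction of all observations on or inside the final hull
hull.frac <- length(keep) / nrow(pts)
```

Each peel removes a whole layer at once, so the retained fraction is at least 95% but usually not exactly 95%.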

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Ellipse that Contains 95% of the Observed Data

Barry Rowlingson
On Mon, Mar 29, 2010 at 4:02 AM, Ben Bolker <[hidden email]> wrote:

>  I'll be interested to hear what others come up with.
>  I'm not sure the problem as you have stated it is well-posed, or
> necessarily possible. Suppose there is a true unknown
> bivariate probability distribution with a non-elliptical 95%
> quantile region. Will you be able to draw an ellipse that
> has the properties you want?

 I think the problem as posed doesn't produce a unique ellipse. You
could start with a circle of radius 0 centered on mean(x),mean(y) and
then increase the radius until it has 95% of the points in it. As long
as your points are in continuous space and with no coincident points
then you could do a simple bisection search on the radius.
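
The circle search described above can be sketched as follows. The centre (the sample means), the tolerance, and the seed are illustrative choices only.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
x <- rnorm(10000)
y <- rnorm(10000)

# Distance of every point from the centre (here the sample means)
r    <- sqrt((x - mean(x))^2 + (y - mean(y))^2)
need <- ceiling(0.95 * length(r))   # number of points the circle must hold

# Bisection on the radius: the circle of radius 'hi' always holds at least
# 'need' points, the circle of radius 'lo' always holds fewer
lo <- 0
hi <- max(r)
while (hi - lo > 1e-8) {
  mid <- (lo + hi) / 2
  if (sum(r <= mid) < need) lo <- mid else hi <- mid
}
radius <- hi

# Draw the points and the circle
theta <- seq(0, 2 * pi, length.out = 200)
plot(x, y, pch = ".", asp = 1)
lines(mean(x) + radius * cos(theta), mean(y) + radius * sin(theta),
      col = 2, lwd = 2)

circle.frac <- mean(r <= radius)
```

With no tied distances the search converges to the order statistic sort(r)[need], so the bisection is mainly a template for shapes where the answer cannot be read off a sorted vector.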

 Similarly you could start with an ellipse of any eccentricity
centered at the same point with fixed angle and do the same. And the
ellipse doesn't even need to be centered at the mean point - it could
be waaay over to the left and eventually as it gets bigger it will
gobble up 95% of the points.

 Obviously with bivariate normally-distributed points we tend to show
the ellipse that is numerically derived from the mean and correlation
of the two normals, but that's not the only ellipse that takes 95% of
the points.
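
To make the non-uniqueness concrete, here is a sketch of one particular choice among the many possible ellipses: take the centre and shape from the sample means and covariance matrix, then set the size empirically so that 95% of the observed points fall inside. The normal-theory size, qchisq(0.95, 2), is drawn for comparison. The helper ellipse.xy() is a hypothetical name introduced here for illustration.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
x <- rnorm(10000)
y <- rnorm(10000)

pts    <- cbind(x, y)
centre <- colMeans(pts)
S      <- cov(pts)

# Squared Mahalanobis distance of each point from the centre
d2 <- mahalanobis(pts, centre, S)

# Empirical size: the order statistic that leaves 95% of the points inside
c2.emp  <- sort(d2)[ceiling(0.95 * length(d2))]
# Normal-theory size, for comparison
c2.norm <- qchisq(0.95, df = 2)

# Boundary of the ellipse (p - centre)' S^{-1} (p - centre) = c2
ellipse.xy <- function(centre, S, c2, n = 200) {
  theta <- seq(0, 2 * pi, length.out = n)
  unit  <- cbind(cos(theta), sin(theta))
  sweep(sqrt(c2) * unit %*% chol(S), 2, centre, "+")
}

plot(x, y, pch = ".", asp = 1)
lines(ellipse.xy(centre, S, c2.emp),  col = 2, lwd = 2)
lines(ellipse.xy(centre, S, c2.norm), col = 4, lwd = 2, lty = 2)

ellipse.frac <- mean(d2 <= c2.emp)
```

The shape still comes from second moments, so this answers "which multiple of the covariance ellipse holds 95% of the data", not "which ellipse is best".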

 So ummm I'm not sure what you should do. What is the question you are
trying to answer?

Barry


Re: Ellipse that Contains 95% of the Observed Data

Jim Lemon
On 03/29/2010 07:17 PM, Barry Rowlingson wrote:

> ...
>   I think the problem as posed doesn't produce a unique ellipse. You
> could start with a circle of radius 0 centered on mean(x),mean(y) and
> then increase the radius until it has 95% of the points in it. As long
> as your points are in continuous space and with no coincident points
> then you could do a simple bisection search on the radius.
>
>   Similarly you could start with an ellipse of any eccentricity
> centered at the same point with fixed angle and do the same. And the
> ellipse doesn't even need to be centered at the mean point - it could
> be waaay over to the left and eventually as it gets bigger it will
> gobble up 95% of the points.
>
>   Obviously with bivariate normally-distributed points we tend to show
> the ellipse that is numerically derived from the mean and correlation
> of the two normals, but that's not the only ellipse that takes 95% of
> the points.
>
>   So ummm I'm not sure what you should do. What is the question you are
> trying to answer?

So, why not begin with a problem that is uniquely soluble and achieve
the viewpoint of its solution?

1) If we assume that the distribution has a barycenter, then that can be
calculated.

2) Calculate the distances of all points from the barycenter, flagging
the 5% most distant.

3) Divide the area covered by the points into an arbitrary number of
equal sectors, say 10.

4) Within each sector, find the most distant "inner" point and the least
distant "outer" point, placing a dot in the middle of the sector at 5%
of their difference beyond the radius of the most distant "inner" point.

5) Join the dots.

6) Look at it.
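
The steps above can be sketched as follows. Several details are assumptions made for this illustration: the barycenter is taken to be the pair of sample means, the sectors are equal angular sectors around it, and a sector with no "outer" point simply uses the radius of its most distant "inner" point.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
x <- rnorm(10000)
y <- rnorm(10000)

# Step 1: barycenter
cx <- mean(x)
cy <- mean(y)

# Step 2: distances from the barycenter; flag the 5% most distant as "outer"
r     <- sqrt((x - cx)^2 + (y - cy)^2)
outer <- r > sort(r)[ceiling(0.95 * length(r))]

# Step 3: ten equal angular sectors
n.sec  <- 10
angle  <- atan2(y - cy, x - cx)
breaks <- seq(-pi, pi, length.out = n.sec + 1)
sector <- cut(angle, breaks, include.lowest = TRUE, labels = FALSE)

# Step 4: one dot per sector, 5% of the way from the most distant inner
# point towards the least distant outer point
dot.r <- sapply(seq_len(n.sec), function(s) {
  r.in  <- max(r[sector == s & !outer])
  r.out <- r[sector == s & outer]
  if (length(r.out) == 0) return(r.in)   # no outer point in this sector
  r.in + 0.05 * (min(r.out) - r.in)
})
mid.angle <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Steps 5 and 6: join the dots and look at it
plot(x, y, pch = ".", asp = 1)
polygon(cx + dot.r * cos(mid.angle), cy + dot.r * sin(mid.angle),
        border = 2, lwd = 2)
```

Because the dots are joined by straight chords, the polygon need not contain exactly 95% of the points. More sectors give a smoother outline at the cost of fewer points per sector.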

Jim


Re: Ellipse that Contains 95% of the Observed Data

slre
In reply to this post by Tom La Bone

The bagplot at
http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112
gives a nonparametric 2-d view analogous to a boxplot.

S Ellison


Re: Ellipse that Contains 95% of the Observed Data

Tom La Bone
In reply to this post by Barry Rowlingson
Concisely, here is what I am trying to do:

#I take a random sample of 300 measurements. After I have the measurements
#I post stratify them to 80 type A measurements and 220 type B measurements.
#These measurements tend to be lognormally distributed so I fit them to
#determine the geometric mean and geometric standard deviation of each stratum.
#The question is: are the geometric mean and geometric standard deviation of
#the type A measurements the same as the geometric mean and geometric
#standard deviation of the type B measurements?


library(MASS)
library(car)
#bagplot() is loaded here from the aplpack package on CRAN; the original post
#sourced a local copy of bagplot.r downloaded from
#http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112
library(aplpack)

#Here are the data. Unknown to me, they are indeed drawn from the same
#lognormal distribution
X <- cbind(1:300,rlnorm(300,log(30),log(3)))
X.A <- X[1:80,2]
X.B <- X[81:300,2]

#These are the data I see. Fit the type A and type B measurements
fit.XA <- fitdistr(X.A,"lognormal")
fit.XB <- fitdistr(X.B,"lognormal")

#Fit 2000 random splits into samples of size 80 and 220 and calculate the
#difference in the fitted meanlog (log GM) and sdlog (log GSD) for each
x <- numeric(2000)
y <- numeric(2000)
for (i in 1:2000) {
        k <- sample(X[,1],80,replace=FALSE)
        x.a <- X[k,2]
        x.b <- X[setdiff(1:300,k),2]
        fit.xa <- fitdistr(x.a,"lognormal")
        fit.xb <- fitdistr(x.b,"lognormal")
        x[i] <- coef(fit.xa)[1] - coef(fit.xb)[1]
        y[i] <- coef(fit.xa)[2] - coef(fit.xb)[2]
}

#Create the bagplot and superimpose the 95% joint normal confidence ellipse.
#Does the difference in GM and GSD actually observed for the type A and type B
#measurements look like the result of the random draw?
#(data.ellipse() was renamed dataEllipse() in later versions of car.)
bagplot(x,y,show.whiskers=FALSE,approx.limit=1000)
dataEllipse(x,y,plot.points=FALSE,levels=c(0.95),col="black",center.cex=0)
box()
points(coef(fit.XA)[1]-coef(fit.XB)[1],coef(fit.XA)[2]-coef(fit.XB)[2],
                cex=1.5,col="black",pch=19)
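
A numerical companion to the plot is sketched below, as one possible way to summarise "does the observed difference look like a random draw". It is an illustration, not part of the original analysis. It fits the lognormal in closed form (the maximum-likelihood estimates are the mean and the divide-by-n standard deviation of the logs) and ranks the observed difference against the permutation cloud by Mahalanobis distance. The names fit.ln() and diff.AB() are hypothetical helpers, and the Mahalanobis ordering is itself an elliptical choice.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
v    <- rlnorm(300, log(30), log(3))
is.A <- rep(c(TRUE, FALSE), c(80, 220))   # first 80 are type A

# Closed-form lognormal maximum-likelihood fit on the log scale
fit.ln <- function(v) {
  lv <- log(v)
  c(meanlog = mean(lv), sdlog = sqrt(mean((lv - mean(lv))^2)))
}

# Difference (A minus B) in meanlog and sdlog for a given labelling
diff.AB <- function(a) fit.ln(v[a]) - fit.ln(v[!a])

obs  <- diff.AB(is.A)
perm <- t(replicate(2000, diff.AB(sample(is.A))))   # 2000 x 2 matrix

# Distance of each difference from the centre of the permutation cloud
d2.perm <- mahalanobis(perm, colMeans(perm), cov(perm))
d2.obs  <- mahalanobis(obs,  colMeans(perm), cov(perm))

# Share of random relabellings at least as far out as the observed split
p.perm <- mean(d2.perm >= d2.obs)
```

A small p.perm would suggest that the observed split is unusual relative to random relabelling. A depth-based ordering, as used by the bagplot, would avoid the elliptical assumption built into the Mahalanobis distance.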

       

Re: Ellipse that Contains 95% of the Observed Data

David Freedman 2
In reply to this post by Tom La Bone
For a picture of the bagplot, see http://www.statmethods.net/graphs/boxplot.html

Re: Ellipse that Contains 95% of the Observed Data

Bert Gunter
In reply to this post by Tom La Bone
Easy. See below.

Bert Gunter
Genentech Nonclinical Biostatistics
 
 -----Original Message-----
From: [hidden email] [mailto:[hidden email]] On
Behalf Of Tom La Bone
Sent: Monday, March 29, 2010 6:56 AM
To: [hidden email]
Subject: Re: [R] Ellipse that Contains 95% of the Observed Data


Concisely, here is what I am trying to do:

#I take a random sample of 300 measurements. After I have the measurements
#I post stratify them to 80 type A measurements and 220 type B measurements.
#These measurements tend to be lognormally distributed so I fit them to
#determine the geometric mean and geometric standard deviation of each stratum.
#The question is: are the geometric mean and geometric standard deviation of
#the type A measurements the same as the geometric mean and geometric
#standard deviation of the type B measurements?

-- No.

(So you probably need to

1. Ask a more statistically meaningful question.

2. Get a MUCH larger sample (you're talking about bivariate sd's here!!)

)

-- Bert


Re: Ellipse that Contains 95% of the Observed Data

Tom La Bone
I know what "get a bigger sample" means. I have no clue what "ask a more statistically meaningful question" means. Can you elaborate a bit?

Tom

Re: Ellipse that Contains 95% of the Observed Data

Bert Gunter
Tom:

You asked whether two groups have the same underlying population 1st and 2nd
moments. The answer is: no they don't. Nothing is ever exactly the same as
anything else (indeed, I think this is the Paul Exclusion Principle ;-)  ).

So quoting Jim Holtman: "What is the question?" That certainly requires
someone who knows something about the scientific issues (not me!). But maybe
it's something like: "Well, if these two **populations** are more different
than a, b, c, ... in population characteristics A, B, and C,...  then that
is scientifically meaningful."  So then you can ask: "Well how can I
measure/statistically characterize  A,B, and C,...? -- How much uncertainty
will there be in this characterization (depends on study design and how one
characterizes "uncertainty" ) and how much can I tolerate and still reach
scientifically useful conclusions."  

And so forth... all of which might be squeezed into Bayesian, or classical,
parametric, nonparametric, or whatever holes happen to satisfy your
particular "religious" convictions. Or, perhaps even better, be informed by
some good plots (horrors -- no P-Values! ...  but those are **my** religious
convictions). But those are mere statistical details, about which all I can
safely say is: The question is not "Are they the same?"

Cheers,

Bert Gunter
Genentech Nonclinical Biostatistics

P.S. Technical comment (because, alas, I **are** a statistician): You
probably want the ellipsoids you speak of to cover subsets of the
**populations** with some degree of certainty, not of the **data.**
 
Disclaimer: Bert Gunter's opinions only. Associate neither my company nor my
colleagues with my obstreperousness.





Re: Ellipse that Contains 95% of the Observed Data

Bert Gunter
Typo: "**Paul_i** Exclusion Principle"

Bert Gunter
Genentech Nonclinical Biostatistics
