I can take the results of a simulation with one random variable and generate an empirical interval that contains 95% of the observations, e.g.,
x < rnorm(10000) quantile(x,probs=c(0.025,0.975)) Is there an R function that can take the results from two random variables and generate an empirical ellipse that contains 95% of the observations, e.g., x < rnorm(10000) y < rnorm(10000) ? I am specifically looking for an ellipse that does not assume normality. Tom 
Tom La Bone <booboo <at> gforcecable.com> writes:
> > > I can take the results of a simulation with one random variable and generate > an empirical interval that contains 95% of the observations, e.g., > > x < rnorm(10000) > quantile(x,probs=c(0.025,0.975)) > > Is there an R function that can take the results from two random variables > and generate an empirical ellipse that contains 95% of the observations, > e.g., > > x < rnorm(10000) > y < rnorm(10000) > ? > > I am specifically looking for an ellipse that does not assume normality. I'll be interested to hear what others come up with. I'm not sure the problem as you have stated it is wellposed, or necessarily possible. Suppose there is a true unknown bivariate probability distribution with a nonelliptical 95% quantile region. Will you be able to draw an ellipse that has the properties you want? Here's one possible alternative: library(coda) library(emdbook) plot(x,y) z = HPDregionplot(as.mcmc(cbind(x,y)),add=TRUE,col=2,lwd=2) is not an ellipse, but does contain (approximately) 95% of the points. Convex hulls are another plausible approach. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/rhelp PLEASE do read the posting guide http://www.Rproject.org/postingguide.html and provide commented, minimal, selfcontained, reproducible code. 
On Mon, Mar 29, 2010 at 4:02 AM, Ben Bolker <[hidden email]> wrote:
> I'll be interested to hear what others come up with. > I'm not sure the problem as you have stated it is wellposed, or > necessarily possible. Suppose there is a true unknown > bivariate probability distribution with a nonelliptical 95% > quantile region. Will you be able to draw an ellipse that > has the properties you want? I think the problem as posed doesn't produce a unique ellipse. You could start with a circle of radius 0 centered on mean(x),mean(y) and then increase the radius until it has 95% of the points in it. As long as your points are in continuous space and with no coincident points then you could do a simple bisection search on the radius. Similarly you could start with an ellipse of any eccentricity centered at the same point with fixed angle and do the same. And the ellipse doesn't even need to be centered at the mean point  it could be waaay over to the left and eventually as it gets bigger it will gobble up 95% of the points. Obviously with bivariate normallydistributed points we tend to show the ellipse that is numerically derived from the mean and correlation of the two normals, but that's not the only ellipse that takes 95% of the points. So ummm I'm not sure what you should do. What is the question you are trying to answer? Barry ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/rhelp PLEASE do read the posting guide http://www.Rproject.org/postingguide.html and provide commented, minimal, selfcontained, reproducible code. 
On 03/29/2010 07:17 PM, Barry Rowlingson wrote:
> ... > I think the problem as posed doesn't produce a unique ellipse. You > could start with a circle of radius 0 centered on mean(x),mean(y) and > then increase the radius until it has 95% of the points in it. As long > as your points are in continuous space and with no coincident points > then you could do a simple bisection search on the radius. > > Similarly you could start with an ellipse of any eccentricity > centered at the same point with fixed angle and do the same. And the > ellipse doesn't even need to be centered at the mean point  it could > be waaay over to the left and eventually as it gets bigger it will > gobble up 95% of the points. > > Obviously with bivariate normallydistributed points we tend to show > the ellipse that is numerically derived from the mean and correlation > of the two normals, but that's not the only ellipse that takes 95% of > the points. > > So ummm I'm not sure what you should do. What is the question you are > trying to answer? So, why not begin with a problem that is uniquely soluble and achieve the viewpoint of its solution? 1) If we assume that the distribution has a barycenter, then that can be calculated. 2) Calculate the distances of all points from the barycenter, flagging the 5% most distant. 3) Divide the area covered by the points into an arbitrary number of equal sectors, say 10. 4) Within each sector, find the most distant "inner" point and the least distant "outer" point, placing a dot in the middle of the sector at 5% of their difference beyond the radius of the most distant "inner" point. 5) Join the dots. 6) Look at it. Jim ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/rhelp PLEASE do read the posting guide http://www.Rproject.org/postingguide.html and provide commented, minimal, selfcontained, reproducible code. 
In reply to this post by Tom La Bone
The bagplot at http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112 gives a nonparametric 2d view analagous to a boxplot. S Ellison > > I can take the results of a simulation with one random variable and generate > an empirical interval that contains 95% of the observations, e.g., > > x < rnorm(10000) > quantile(x,probs=c(0.025,0.975)) > > Is there an R function that can take the results from two random variables > and generate an empirical ellipse that contains 95% of the observations, > e.g., > > x < rnorm(10000) > y < rnorm(10000) > ? > > I am specifically looking for an ellipse that does not assume normality. I'll be interested to hear what others come up with. I'm not sure the problem as you have stated it is wellposed, or necessarily possible. Suppose there is a true unknown bivariate probability distribution with a nonelliptical 95% quantile region. Will you be able to draw an ellipse that has the properties you want? Here's one possible alternative: library(coda) library(emdbook) plot(x,y) z = HPDregionplot(as.mcmc(cbind(x,y)),add=TRUE,col=2,lwd=2) is not an ellipse, but does contain (approximately) 95% of the points. Convex hulls are another plausible approach. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/rhelp PLEASE do read the posting guide http://www.Rproject.org/postingguide.html and provide commented, minimal, selfcontained, reproducible code. ******************************************************************* This email and any attachments are confidential. Any use...{{dropped:8}} ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/rhelp PLEASE do read the posting guide http://www.Rproject.org/postingguide.html and provide commented, minimal, selfcontained, reproducible code. 
In reply to this post by Barry Rowlingson
Concisely, here is what I am trying to do:
#I take a random sample of 300 measurements. After I have the measurements #I post stratify them to 80 type A measurements and 220 type B measurements. #These measurements tend to be lognormally distributed so I fit them to #determine the geometric mean and geometric standard deviation of each stratum. #The question is: are the geometric mean and geometric standard deviation of #the type A measurements the same as the geometric mean and geometric #standard deviation of the type B measurements? library(MASS) library(car) setwd("C:/Documents and Settings/Tom/workspace/Work") source("bagplot.r") #http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112 #Here are the data. Unknown to me, they are indeed drawn from the same #lognormal distribution X < cbind(1:300,rlnorm(300,log(30),log(3))) X.A < X[1:80,2] X.B < X[81:300,2] #These are the data I see. Fit the type A and type B measurements fit.XA < fitdistr(X.A,"lognormal") fit.XB < fitdistr(X.B,"lognormal") #Fit 2000 random samples of size 80 and 220 and calculate the difference in #the GM and GSD for each x < numeric(2000) y < numeric(2000) for (i in 1:2000) { k < sample(X[,1],80,replace=FALSE) x.a < X[k,2] x.b < X[setdiff(1:300,k),2] fit.xa < fitdistr(x.a,"lognormal") fit.xb < fitdistr(x.b,"lognormal") x[i] < coef(fit.xa)[1]  coef(fit.xb)[1] y[i] < coef(fit.xa)[2]  coef(fit.xb)[2] } #Create the bagplot and superimpose the 95% joint normal confidence ellipse. #Does the difference in GM and GSD actually observed for the type A and type B #measurements look like the result of the random draw? bagplot(x,y,show.whiskers=FALSE,approx.limit=1000) data.ellipse(x,y,plot.points=FALSE,levels=c(0.95),col="black",center.cex=0) box() points(coef(fit.XA)[1]coef(fit.XB)[1],coef(fit.XA)[2]coef(fit.XB)[2], cex=1.5,col="black",pch=19) 
In reply to this post by Tom La Bone
for a picture of the bagplot, try going to http://www.statmethods.net/graphs/boxplot.html

In reply to this post by Tom La Bone
Easy. See below.
Bert Gunter Genentech Nonclinical Biostatistics Original Message From: [hidden email] [mailto:[hidden email]] On Behalf Of Tom La Bone Sent: Monday, March 29, 2010 6:56 AM To: [hidden email] Subject: Re: [R] Ellipse that Contains 95% of the Observed Data Concisely, here is what I am trying to do: #I take a random sample of 300 measurements. After I have the measurements #I post stratify them to 80 type A measurements and 220 type B measurements. #These measurements tend to be lognormally distributed so I fit them to #determine the geometric mean and geometric standard deviation of each stratum. #The question is: are the geometric mean and geometric standard deviation of #the type A measurements the same as the geometric mean and geometric #standard deviation of the type B measurements?  No. (So you probably need to 1. Ask a more statistically meaningful question. 2. Get a MUCH larger sample (you're talking about bivariate sd's here!!) )  Bert ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/rhelp PLEASE do read the posting guide http://www.Rproject.org/postingguide.html and provide commented, minimal, selfcontained, reproducible code. 
I know what "get a bigger sample means". I have no clue what "ask a more statistically meaningful question" means. Can you elaborate a bit?
Tom 
Tom:
You asked whether two groups have the same underlying population 1st and 2nd moments. The answer is: no they don't. Nothing is ever exactly the same as anything else (indeed, I think this is the Paul Exclusion Principle ;) ). So quoting Jim Holtman: "What is the question?" That certainly requires someone who knows something about the scientific issues (not me!). But maybe it's something like: "Well, if these two **populations** are more different than a, b, c, ... in population characteristics A, B, and C,... then that is scientifically meaningful." So then you can ask: "Well how can I measure/statistically characterize A,B, and C,...?  How much uncertainty will there be in this characterization (depends on study design and how one characterizes "uncertainty" ) and how much can I tolerate and still reach scientifically useful conclusions." And so forth... all of which might be squeezed into Bayesian, or classical, parametric, nonparametric, or whatever holes happen to satisfy your particular "religious" convictions. Or, perhaps even better, be informed by some good plots (horrors  no PValues! ... but those are **my** religious convictions). But those are mere statistical details, about which all I can safely say is: The question is not "Are they the same?" Cheers, Bert Gunter Genentech Nonclinical Biostatistics P.S. Technical comment (because, alas, I **are** a statistician): You probably want the ellipsoids you speak of to cover subsets of the **populations** with some degree of certainty, not of the **data.** Disclaimer: Bert Gunter's opinions only. Associate neither my company nor my colleagues with my obstreperousness. Original Message From: [hidden email] [mailto:[hidden email]] On Behalf Of Tom La Bone Sent: Monday, March 29, 2010 9:56 AM To: [hidden email] Subject: Re: [R] Ellipse that Contains 95% of the Observed Data I know what "get a bigger sample means". I have no clue what "ask a more statistically meaningful question" means. Can you elaborate a bit? Tom  View this message in context: http://n4.nabble.com/EllipsethatContains95oftheObservedDatatp1694538 p1695357.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/rhelp PLEASE do read the posting guide http://www.Rproject.org/postingguide.html and provide commented, minimal, selfcontained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/rhelp PLEASE do read the posting guide http://www.Rproject.org/postingguide.html and provide commented, minimal, selfcontained, reproducible code. 
Typo: "**Paul_i** Exclusion Principle"
Bert Gunter Genentech Nonclinical Biostatistics ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/rhelp PLEASE do read the posting guide http://www.Rproject.org/postingguide.html and provide commented, minimal, selfcontained, reproducible code. 
Free forum by Nabble  Edit this page 