# Ellipse that Contains 95% of the Observed Data


## Ellipse that Contains 95% of the Observed Data

I can take the results of a simulation with one random variable and generate an empirical interval that contains 95% of the observations, e.g.,

    x <- rnorm(10000)
    quantile(x, probs = c(0.025, 0.975))

Is there an R function that can take the results from two random variables, e.g.,

    x <- rnorm(10000)
    y <- rnorm(10000)

and generate an empirical ellipse that contains 95% of the observations? I am specifically looking for an ellipse that does not assume normality.

Tom

## Re: Ellipse that Contains 95% of the Observed Data

Tom La Bone writes:

> I can take the results of a simulation with one random variable and generate
> an empirical interval that contains 95% of the observations, e.g.,
>
> x <- rnorm(10000)
> quantile(x,probs=c(0.025,0.975))
>
> Is there an R function that can take the results from two random variables
> and generate an empirical ellipse that contains 95% of the observations,
> e.g.,
>
> x <- rnorm(10000)
> y <- rnorm(10000)
> ?
>
> I am specifically looking for an ellipse that does not assume normality.

I'll be interested to hear what others come up with.

I'm not sure the problem as you have stated it is well-posed, or necessarily possible. Suppose there is a true unknown bivariate probability distribution with a non-elliptical 95% quantile region. Will you be able to draw an ellipse that has the properties you want?

Here's one possible alternative:

    library(coda)
    library(emdbook)
    plot(x, y)
    z = HPDregionplot(as.mcmc(cbind(x, y)), add = TRUE, col = 2, lwd = 2)

This is not an ellipse, but it does contain (approximately) 95% of the points.

Convex hulls are another plausible approach.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
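The convex hull idea mentioned in this reply might be sketched as follows, using only base R. This is one possible reading of "convex hulls" (repeatedly peeling off the outermost hull until roughly 95% of the points remain), not necessarily what the poster had in mind. The function name `peel_hull` and its `keep` argument are made up for this illustration.

```r
# Convex hull peeling: strip outer hulls while at least `keep` of the
# points would still remain. `peel_hull` is a hypothetical helper name.
set.seed(1)
x <- rnorm(10000)
y <- rnorm(10000)
pts <- cbind(x, y)

peel_hull <- function(pts, keep = 0.95) {
  n_target <- ceiling(keep * nrow(pts))
  inside <- pts
  repeat {
    hull <- chull(inside)                 # row indices of the current outer hull
    # Stop if removing this layer would leave fewer than n_target points
    if (nrow(inside) - length(hull) < n_target) break
    inside <- inside[-hull, , drop = FALSE]
  }
  list(vertices = inside[chull(inside), , drop = FALSE],  # final hull polygon
       n_inside = nrow(inside))                           # points on or inside it
}

region <- peel_hull(pts)

plot(x, y, pch = ".")
polygon(region$vertices, border = 2, lwd = 2)
region$n_inside / nrow(pts)               # fraction retained, at least 0.95
```

The resulting region is a polygon rather than an ellipse, and it retains slightly more than 95% of the points because whole hull layers are removed at a time.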

## Re: Ellipse that Contains 95% of the Observed Data

In reply to this post by Tom La Bone

The bagplot at http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112 gives a nonparametric 2-d view analogous to a boxplot.

S Ellison

## Re: Ellipse that Contains 95% of the Observed Data

In reply to this post by Barry Rowlingson

Concisely, here is what I am trying to do:

    # I take a random sample of 300 measurements. After I have the measurements
    # I post-stratify them into 80 type A measurements and 220 type B measurements.
    # These measurements tend to be lognormally distributed, so I fit them to
    # determine the geometric mean and geometric standard deviation of each stratum.
    # The question is: are the geometric mean and geometric standard deviation of
    # the type A measurements the same as the geometric mean and geometric
    # standard deviation of the type B measurements?

    library(MASS)
    library(car)
    setwd("C:/Documents and Settings/Tom/workspace/Work")
    source("bagplot.r")  # http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112

    # Here are the data. Unknown to me, they are indeed drawn from the same
    # lognormal distribution
    X <- cbind(1:300, rlnorm(300, log(30), log(3)))
    X.A <- X[1:80, 2]
    X.B <- X[81:300, 2]

    # These are the data I see. Fit the type A and type B measurements
    fit.XA <- fitdistr(X.A, "lognormal")
    fit.XB <- fitdistr(X.B, "lognormal")

    # Fit 2000 random samples of size 80 and 220 and calculate the difference in
    # the GM and GSD for each (the coefficients are on the log scale:
    # meanlog = log(GM), sdlog = log(GSD))
    x <- numeric(2000)
    y <- numeric(2000)
    for (i in 1:2000) {
        k <- sample(X[, 1], 80, replace = FALSE)
        x.a <- X[k, 2]
        x.b <- X[setdiff(1:300, k), 2]
        fit.xa <- fitdistr(x.a, "lognormal")
        fit.xb <- fitdistr(x.b, "lognormal")
        x[i] <- coef(fit.xa)[1] - coef(fit.xb)[1]
        y[i] <- coef(fit.xa)[2] - coef(fit.xb)[2]
    }

    # Create the bagplot and superimpose the 95% joint normal confidence ellipse.
    # Does the difference in GM and GSD actually observed for the type A and type B
    # measurements look like the result of the random draw?
    bagplot(x, y, show.whiskers = FALSE, approx.limit = 1000)
    # Note: newer releases of car name this function dataEllipse()
    data.ellipse(x, y, plot.points = FALSE, levels = c(0.95), col = "black", center.cex = 0)
    box()
    points(coef(fit.XA)[1] - coef(fit.XB)[1], coef(fit.XA)[2] - coef(fit.XB)[2],
           cex = 1.5, col = "black", pch = 19)
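As a rough sketch of an ellipse that does not lean on normality for its size, one could keep the shape given by the sample covariance but set the radius from the empirical 95% quantile of the squared Mahalanobis distances, instead of the chi-squared value a joint normal ellipse uses. This is not a function from any package mentioned in the thread. The data below are skewed stand-ins for the simulated differences, and `obs` is a made-up observed point, both assumptions for illustration only.

```r
set.seed(42)
# Stand-in data (assumption): deliberately skewed, not bivariate normal
x <- rexp(2000) - 1
y <- 0.5 * x + rnorm(2000, sd = 0.5)
pts <- cbind(x, y)

ctr <- colMeans(pts)
S   <- cov(pts)
d2  <- mahalanobis(pts, ctr, S)            # squared distances from the centre
r2  <- quantile(d2, 0.95, names = FALSE)   # empirical cutoff, not qchisq(0.95, 2)

# Boundary: unit circle mapped through chol(S), scaled by sqrt(r2), then shifted
theta  <- seq(0, 2 * pi, length.out = 200)
circle <- cbind(cos(theta), sin(theta))
ell    <- sweep(sqrt(r2) * circle %*% chol(S), 2, ctr, "+")

coverage <- mean(d2 <= r2)                 # fraction of points inside the ellipse

plot(pts, pch = ".")
lines(ell, lwd = 2)

# Does a hypothetical observed point fall inside the ellipse?
obs <- c(0.1, -0.2)
obs_inside <- mahalanobis(obs, ctr, S) <= r2
points(obs[1], obs[2], pch = 19, cex = 1.5)
```

The ellipse still assumes the 95% region is roughly elliptical, which is the concern raised earlier in the thread; only its size is calibrated to the data.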

## Re: Ellipse that Contains 95% of the Observed Data

In reply to this post by Tom La Bone

For a picture of the bagplot, try going to http://www.statmethods.net/graphs/boxplot.html

## Re: Ellipse that Contains 95% of the Observed Data

In reply to this post by Tom La Bone

Easy. See below.

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Tom La Bone
Sent: Monday, March 29, 2010 6:56 AM
To: [hidden email]
Subject: Re: [R] Ellipse that Contains 95% of the Observed Data

> Concisely, here is what I am trying to do:
>
> #I take a random sample of 300 measurements. After I have the measurements
> #I post stratify them to 80 type A measurements and 220 type B measurements.
> #These measurements tend to be lognormally distributed so I fit them to
> #determine the geometric mean and geometric standard deviation of each stratum.
> #The question is: are the geometric mean and geometric standard deviation of
> #the type A measurements the same as the geometric mean and geometric
> #standard deviation of the type B measurements?

-- No. (So you probably need to

1. Ask a more statistically meaningful question.
2. Get a MUCH larger sample (you're talking about bivariate sd's here!!)

)

-- Bert

## Re: Ellipse that Contains 95% of the Observed Data

I know what "get a bigger sample" means. I have no clue what "ask a more statistically meaningful question" means. Can you elaborate a bit?

Tom