better example for multivariate data simulation question-please help if you can


Andras Farkas
Dear All,
 
A few weeks ago I posted a question on the R-help list that some of you answered with a great solution; thank you again for that. I am writing now about a related problem. I posted it a few days ago, but it was probably not clear enough, so I am trying again.

At times I have a multivariate example at hand with known means, SDs and medians for the variables, and the covariance matrix of those variables. Occasionally, these parameters are related strongly enough that a covariance matrix can be established. Please see the attached document for an example.

When we in medicine simulate such data (which is not to say that this is the best approach), we usually use a lognormal distribution to avoid generating negative values, because physiologic variables are almost never negative (unfortunately, we also do not really know better). For the most part I use other software that reproduces reasonable means, medians and SDs when I enter the covariance matrix, but it is not free (so I cannot share the solutions with others), nor does it offer anything like Sweave, which R provides for standard reports that can be distributed for free. Unfortunately, I am having a hard time working out the solution in R. I have tried the multivariate normal functions mvrnorm from the MASS package and rmvnorm from the mvtnorm package, but they simulate negative values, which I cannot afford. Also, at times the simulated means, medians and SDs are quite different from the ones I started with (which may be due to the assumption I make about the distribution of the data).

I was wondering whether anyone would be willing to share some thoughts on how to simulate in R a multivariate distribution with a given covariance matrix (using the attached data as an example) that yields means, medians and SDs reasonably close to the original values. Knowing the actual distribution of the data would probably be invaluable for reproducing it accurately (and for choosing a probability distribution to simulate from), but in the medical literature we often have only information like what I have attached (and we assume a lognormal distribution, as mentioned above). I would greatly appreciate your help,
 
Sincerely,
 
Andras
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: better example for multivariate data simulation question-please help if you can

Michael Weylandt
[Lightly edited for legibility.]

On Fri, Oct 12, 2012 at 7:39 PM, Andras Farkas <[hidden email]> wrote:
> Dear All,
>
> A few weeks ago I posted a question on the R-help list that some of you answered with a great solution; thank you again for that. I am writing now about a related problem. I posted it a few days ago, but it was probably not clear enough, so I am trying again.
>
> At times I have a multivariate example at hand with known means, SDs and medians for the variables, and the covariance matrix of those variables. Occasionally, these parameters are related strongly enough that a covariance matrix can be established. Please see the attached document for an example.
>
> When we in medicine simulate such data (which is not to say that this is the best approach), we usually use a lognormal distribution to avoid generating negative values, because physiologic variables are almost never negative (unfortunately, we also do not really know better). For the most part I use other software that reproduces reasonable means, medians and SDs when I enter the covariance matrix, but it is not free (so I cannot share the solutions with others), nor does it offer anything like Sweave, which R provides for standard reports that can be distributed for free. Unfortunately, I am having a hard time working out the solution in R. I have tried the multivariate normal functions mvrnorm from the MASS package and rmvnorm from the mvtnorm package, but they simulate negative values, which I cannot afford. Also, at times the simulated means, medians and SDs are quite different from the ones I started with (which may be due to the assumption I make about the distribution of the data).
>
> I was wondering whether anyone would be willing to share some thoughts on how to simulate in R a multivariate distribution with a given covariance matrix (using the attached data as an example) that yields means, medians and SDs reasonably close to the original values. Knowing the actual distribution of the data would probably be invaluable for reproducing it accurately (and for choosing a probability distribution to simulate from), but in the medical literature we often have only information like what I have attached (and we assume a lognormal distribution, as mentioned above). I would greatly appreciate your help,
>
> Sincerely,
>
> Andras
> ______________________________________________

Hi Andras,

It seems that your attachment did not make it through the mail server:
you probably need to include it inline as plain text if it's a
reasonable size.

Anyway, I believe your problem is that mvrnorm() et al. generate
multivariate _normals_, not multivariate lognormals. Perhaps have a
look at these functions:

http://rss.acs.unt.edu/Rdoc/library/compositions/html/rlnorm.html

You might also think about truncated normals.
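
To make the normal-versus-lognormal point concrete, here is a minimal sketch of one way to get multivariate lognormal draws using only mvrnorm(): convert the target means and covariance to the log scale by moment matching, simulate normals there, and exponentiate. The numbers and the names target_mean and target_cov are made up for illustration (the attachment did not come through), and this is not necessarily how the linked functions work internally. It assumes the data really are jointly lognormal.

```r
library(MASS)  # for mvrnorm()

# Hypothetical targets on the original (positive) scale; not from the attachment
target_mean <- c(CL = 5, V = 40)
target_cov  <- matrix(c( 4,  12,
                        12, 100), nrow = 2, byrow = TRUE)

# Moment matching for X = exp(Y), Y ~ N(log_mean, log_cov):
#   E[X_i]        = exp(log_mean_i + log_cov_ii / 2)
#   Cov(X_i, X_j) = E[X_i] * E[X_j] * (exp(log_cov_ij) - 1)
ratio <- target_cov / outer(target_mean, target_mean)
stopifnot(all(ratio > -1))            # otherwise log() below is undefined
log_cov  <- log(1 + ratio)
log_mean <- log(target_mean) - diag(log_cov) / 2

set.seed(1)
# mvrnorm() stops with an error if log_cov is not positive definite,
# which can happen for some target covariance matrices
log_draws <- mvrnorm(n = 1e5, mu = log_mean, Sigma = log_cov)
sim <- exp(log_draws)                 # strictly positive by construction

colMeans(sim)            # compare with target_mean
cov(sim)                 # compare with target_cov
apply(sim, 2, median)    # compare with exp(log_mean), the implied medians
```

Note that under this assumption the medians are not free parameters: they are fixed at exp(log_mean) once the means and variances are chosen. If the reported medians are far from those values, the lognormal assumption itself is probably the issue.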

Cheers,
Michael
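
As a footnote to the truncated-normal suggestion above, a minimal sketch using plain rejection sampling: draw from mvrnorm() and keep only the rows where every variable is positive. The helper rmvnorm_positive and all the numbers are hypothetical. Be aware that truncation shifts the moments, so the simulated means and SDs will not match the values passed in unless very little probability mass lies below zero.

```r
library(MASS)  # for mvrnorm()

# Hypothetical helper: n draws from N(mu, Sigma) conditioned on all values > 0
rmvnorm_positive <- function(n, mu, Sigma, max_batches = 1000) {
  p <- length(mu)
  kept <- matrix(numeric(0), nrow = 0, ncol = p)
  for (i in seq_len(max_batches)) {
    # matrix() guards against mvrnorm() returning a plain vector when n == 1
    draws <- matrix(mvrnorm(n, mu = mu, Sigma = Sigma), ncol = p)
    ok <- rowSums(draws <= 0) == 0     # rows with no non-positive entries
    kept <- rbind(kept, draws[ok, , drop = FALSE])
    if (nrow(kept) >= n) return(kept[seq_len(n), , drop = FALSE])
  }
  stop("acceptance rate too low: little probability mass in the positive region")
}

# Made-up parameters; the first variable has a noticeable chance of being negative
mu    <- c(2, 40)
Sigma <- matrix(c( 4,  12,
                  12, 100), nrow = 2, byrow = TRUE)

set.seed(1)
sim_trunc <- rmvnorm_positive(1e4, mu, Sigma)

colMeans(sim_trunc)   # the first mean ends up above 2 because of the truncation
range(sim_trunc)      # all values positive
```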
