On Sun, Apr 01, 2012 at 06:00:43PM -0700, Burak Aydin wrote:

> Hello Greg,

> Sorry for the confusion.

> Let's say I have a population with 6 variables that are correlated with
> each other. I can give you the Pearson, tetrachoric or polychoric
> correlation coefficients. 2 of them are continuous, 2 binary and 2
> categorical.

> Let's assume the following conditions:
> Co1 and Co2 are normally distributed continuous random variables:
> Co1 ~ N(0,1), Co2 ~ N(100,15).
> Ca1 and Ca2 are categorical variables with probabilities
> Ca1: c(.02,.18,.28,.22,.30) and Ca2: c(.06,.18,.76).
> Bi1 and Bi2 are binary, with marginal probabilities p = 0.4 for Bi1
> and p = 0.5 for Bi2.
> And, again, I have the correlations.

>
> When I try to simulate this population, I fail. If I keep the means and
> probabilities the same, I lose the correct correlations. When I keep the
> correlations, I lose precision in the means and frequencies/probabilities.

Hi.

One idea that occurred to me is the following. Formulate a model of the
joint distribution with some parameters and a criterion function that
measures how much the data generated from the model differ from the
required marginal distributions and the required correlations. Then run
an optimization over the parameters to minimize this difference.
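
A minimal sketch of this idea for just one pair, Bi1 and Bi2. The model is an assumption of this sketch, not something stated above: each binary is obtained by thresholding a latent standard normal, the thresholds fix the marginals at 0.4 and 0.5, and the only parameter is the latent correlation rho. The target correlation 0.3 is hypothetical, since the actual correlations were not posted. The same random draws are reused for every rho, so the criterion is a deterministic function that a golden-section search can minimize.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho, z1, z2, p1, p2):
    """Model: two binaries obtained by thresholding latent N(0,1) variables
    with correlation rho; thresholds give P(B1=1)=p1 and P(B2=1)=p2."""
    t1 = NormalDist().inv_cdf(p1)
    t2 = NormalDist().inv_cdf(p2)
    s = math.sqrt(1.0 - rho * rho)
    b1 = [1 if a < t1 else 0 for a in z1]
    b2 = [1 if rho * a + s * b < t2 else 0 for a, b in zip(z1, z2)]
    return b1, b2


def criterion(rho, z1, z2, p1, p2, target_r):
    """Squared difference between generated and required correlation."""
    b1, b2 = simulate(rho, z1, z2, p1, p2)
    return (pearson(b1, b2) - target_r) ** 2


def fit_rho(target_r, p1, p2, n=10000, seed=1):
    """Golden-section search for the latent rho in [-0.99, 0.99]."""
    rng = random.Random(seed)
    # Common random numbers: the same draws are reused for every rho.
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]

    def f(r):
        return criterion(r, z1, z2, p1, p2, target_r)

    lo, hi = -0.99, 0.99
    g = (math.sqrt(5) - 1) / 2
    for _ in range(30):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(a) < f(b):
            hi = b
        else:
            lo = a
    rho = (lo + hi) / 2
    return rho, simulate(rho, z1, z2, p1, p2)
```

For example, `rho, (b1, b2) = fit_rho(0.3, 0.4, 0.5)` returns a latent rho noticeably larger than 0.3, because thresholding attenuates correlation; this is one reason why keeping the marginals fixed loses the correlations. For all six variables, the criterion would sum the squared differences over all required marginals and correlations, and a multivariate optimizer would replace the one-dimensional search.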

If you have enough data, the model can be a table of estimated
probabilities for all 5*3*2*2 = 60 combinations of the discrete
variables, together with, for each of these combinations, the parameters
of the conditional distribution of the 2 continuous variables, which can
be a bivariate normal distribution. However, you probably do not have
enough data for this.
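
A minimal sketch of how such a cell-based model is laid out and sampled. The marginals are the ones from the question; the parameter values are hypothetical placeholders (cell probabilities set to the product of the marginals, i.e. independence, and the same bivariate normal in every cell). Each cell needs its probability plus 5 normal parameters (2 means, 2 standard deviations, 1 correlation), i.e. 59 + 60*5 = 359 free parameters, which illustrates why a lot of data would be needed.

```python
import itertools
import math
import random

# Marginal probabilities from the question.
CA1 = [0.02, 0.18, 0.28, 0.22, 0.30]  # categories 1..5
CA2 = [0.06, 0.18, 0.76]              # categories 1..3
BI1 = [0.6, 0.4]                      # P(Bi1=0), P(Bi1=1)
BI2 = [0.5, 0.5]                      # P(Bi2=0), P(Bi2=1)


def build_cells():
    """One cell per combination of (Ca1, Ca2, Bi1, Bi2): 5*3*2*2 = 60 cells.

    Hypothetical placeholder parameters: cell probability = product of the
    marginals, and every cell gets the same bivariate normal for (Co1, Co2)
    with no within-cell correlation. These would be estimated or tuned.
    """
    cells = []
    for c1, c2, b1, b2 in itertools.product(range(5), range(3), range(2), range(2)):
        prob = CA1[c1] * CA2[c2] * BI1[b1] * BI2[b2]
        normal = {"mu": (0.0, 100.0), "sd": (1.0, 15.0), "rho": 0.0}
        cells.append(((c1 + 1, c2 + 1, b1, b2), prob, normal))
    return cells


def draw(cells, n, seed=0):
    """Pick a cell by its probability, then draw (Co1, Co2) for that cell.

    Each row is (Ca1, Ca2, Bi1, Bi2, Co1, Co2)."""
    rng = random.Random(seed)
    picks = rng.choices(cells, weights=[c[1] for c in cells], k=n)
    rows = []
    for key, _, p in picks:
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        r = p["rho"]
        co1 = p["mu"][0] + p["sd"][0] * z1
        co2 = p["mu"][1] + p["sd"][1] * (r * z1 + math.sqrt(1 - r * r) * z2)
        rows.append(key + (co1, co2))
    return rows
```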

Another approach starts from the distribution of the continuous
variables; the model for the discrete variables can then be a logistic
model that uses the continuous variables as inputs.
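
A minimal sketch of this approach for the two binaries, assuming a hypothetical correlation r = 0.5 between Co1 and Co2 and hypothetical logistic slopes (1.0 on Co1 for Bi1, -0.8 on standardized Co2 for Bi2). The intercept of each logistic model is found by bisection so that the average predicted probability equals the required marginal (0.4 and 0.5); this calibration step is an addition of this sketch.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def calibrate_intercept(xs, slope, p, lo=-20.0, hi=20.0, iters=40):
    """Bisection for the intercept a with mean(sigmoid(a + slope*x)) = p.

    The mean is increasing in a, so bisection converges."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        m = sum(sigmoid(mid + slope * x) for x in xs) / len(xs)
        if m < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate(n=10000, r=0.5, slope1=1.0, slope2=-0.8, seed=2):
    rng = random.Random(seed)
    co1, co2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        co1.append(z1)                                                # Co1 ~ N(0, 1)
        co2.append(100 + 15 * (r * z1 + math.sqrt(1 - r * r) * z2))  # Co2 ~ N(100, 15)
    x2 = [(v - 100) / 15 for v in co2]  # standardized Co2 as logistic input
    a1 = calibrate_intercept(co1, slope1, 0.4)  # target P(Bi1=1) = 0.4
    a2 = calibrate_intercept(x2, slope2, 0.5)   # target P(Bi2=1) = 0.5
    bi1 = [1 if rng.random() < sigmoid(a1 + slope1 * x) else 0 for x in co1]
    bi2 = [1 if rng.random() < sigmoid(a2 + slope2 * x) else 0 for x in x2]
    return co1, co2, bi1, bi2
```

In the spirit of the first idea, the slopes would be the parameters an optimizer tunes to reach the required correlations, while the calibrated intercepts keep the marginals fixed. The categorical variables could be handled similarly with an ordinal (cumulative) logit model, which is not shown here.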

Another type of model that may be suitable is a Bayesian network. For
this, you need to choose only a subset of the most important
dependencies, so that the selected dependencies can be represented by a
directed acyclic graph.
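
A minimal sketch of ancestral sampling from such a network, assuming a hypothetical DAG over four of the six variables (Co1 -> Co2, Co1 -> Bi1, Bi1 -> Ca2). The conditional distributions and their numbers are made-up placeholders, not values from the thread; in practice they would be fitted, or tuned until the generated marginals and correlations match the required ones. Nodes are sampled in topological order, so every parent is drawn before its children.

```python
import math
import random
from graphlib import TopologicalSorter

# Hypothetical DAG: each node maps to its parents.
PARENTS = {
    "Co1": [],
    "Co2": ["Co1"],
    "Bi1": ["Co1"],
    "Ca2": ["Bi1"],
}


def sample_node(name, row, rng):
    """Draw one node given its already-sampled parents (hypothetical CPDs)."""
    if name == "Co1":
        return rng.gauss(0, 1)
    if name == "Co2":
        # Linear-Gaussian CPD; total SD stays 15 because 0.5**2 + 0.75 = 1.
        return 100 + 15 * (0.5 * row["Co1"] + math.sqrt(0.75) * rng.gauss(0, 1))
    if name == "Bi1":
        p = 1 / (1 + math.exp(-(-0.4 + 1.0 * row["Co1"])))  # logistic CPD
        return 1 if rng.random() < p else 0
    if name == "Ca2":
        # Conditional probability table P(Ca2 | Bi1); each row sums to 1.
        table = {0: [0.08, 0.20, 0.72], 1: [0.03, 0.15, 0.82]}
        return rng.choices([1, 2, 3], weights=table[row["Bi1"]])[0]
    raise KeyError(name)


def ancestral_sample(n, seed=3):
    """Sample n rows, visiting nodes so that parents come before children."""
    order = list(TopologicalSorter(PARENTS).static_order())
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for name in order:
            row[name] = sample_node(name, row, rng)
        rows.append(row)
    return order, rows
```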

Petr Savicky.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.