Random Normal Variable Correlated to an Existing Binomial Variable


Random Normal Variable Correlated to an Existing Binomial Variable

Shane Phillips
Hi, R-Helpers!

I have a dataframe that contains a binomial variable.  I need to add another random variable drawn from a normal distribution with a specific mean and standard deviation.  This variable also needs to be correlated with the existing binomial variable with a specific correlation (say .75).  Any ideas?

Thanks!

Shane
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Random Normal Variable Correlated to an Existing Binomial Variable

Petr Savicky-2
On Sun, Apr 24, 2011 at 07:00:26PM -0400, Shane Phillips wrote:
> Hi, R-Helpers!
>
> I have a dataframe that contains a binomial variable.  I need to add another random variable drawn from a normal distribution with a specific mean and standard deviation.  This variable also needs to be correlated with the existing binomial variable with a specific correlation (say .75).  Any ideas?

Hi.

If X, Y are dependent random variables and we want to generate y, so
that (x, y) is a pair from their joint distribution with known x,
then y should be generated from the conditional distribution P(Y|X=x).
If the probability P(X=x) is not too small, then this may be done by
rejection sampling: Generate pairs (X, Y) until the condition X=x is
satisfied and use the corresponding Y.
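
A minimal sketch of the rejection step, assuming some way to draw pairs
from the joint distribution. The sampler draw_pair() below is purely
hypothetical and only stands in for whatever joint distribution is
chosen; draw_y_given_x() and max_tries are illustrative names as well.

```r
set.seed(1)

# Hypothetical joint sampler, for illustration only:
# z is standard normal, x is a binomial obtained from z by inversion,
# y is a noisy copy of z (so y is again standard normal).
draw_pair <- function(size = 10, prob = 0.2) {
  z <- rnorm(1)
  x <- qbinom(pnorm(z), size = size, prob = prob)
  y <- 0.8 * z + 0.6 * rnorm(1)
  c(x = x, y = y)
}

# Rejection sampling: draw pairs until X equals the observed x,
# then keep the corresponding Y.
draw_y_given_x <- function(x_obs, max_tries = 1e5) {
  for (i in seq_len(max_tries)) {
    p <- draw_pair()
    if (p[["x"]] == x_obs) return(p[["y"]])
  }
  stop("no pair with X == x_obs found; P(X = x) may be too small")
}

x_existing <- c(2, 0, 3, 1, 2)   # stands in for the existing column
y_new <- vapply(x_existing, draw_y_given_x, numeric(1))
y_new
```

The expected number of tries per value is 1 / P(X = x), which is why the
approach is only practical when that probability is not too small.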

It remains to generate pairs (X, Y), where Y is a normal variable
and X a binomial one. The parameters of Y and the correlation of X
and Y are known; the parameters of X still have to be chosen, for
example by estimating them from the existing data. I suggest the
following. Compute the distribution of X as a
vector of probabilities p_0, ..., p_n (see ?dbinom). Find a nondecreasing
function f() from reals to {0, .., n} such that f(Y) has distribution
p_0, ..., p_n. The function may be determined by a sequence of
cutpoints a_1, ..., a_n defining f(y) as follows

  y              f(y)  
  (-infty, a_1)  0
  [a_1, a_2)     1
  ...
  [a_n, infty)   n

For each i, the cutpoint a_i is the (p_0 + ... + p_{i-1})-quantile of Y
(see ?qnorm). See ?cut for computing f().
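
The construction above can be sketched as follows. The parameter values
(size, prob, m, sig) are assumptions chosen only for illustration, and
pbinom() is used for the cumulative sums p_0 + ... + p_{i-1}.

```r
set.seed(1)
size <- 10; prob <- 0.2   # assumed binomial parameters
m <- 0.5; sig <- 2        # assumed mean and sd of Y

p <- dbinom(0:size, size = size, prob = prob)   # p_0, ..., p_n

# a_i is the (p_0 + ... + p_{i-1})-quantile of Y, for i = 1, ..., n
a <- qnorm(pbinom(0:(size - 1), size = size, prob = prob),
           mean = m, sd = sig)

# f(y): index of the interval (-Inf, a_1), [a_1, a_2), ..., [a_n, Inf)
f <- function(y) {
  cut(y, breaks = c(-Inf, a, Inf), right = FALSE, labels = FALSE) - 1L
}

# check: f(Y) should have approximately the binomial distribution
y <- rnorm(1e5, mean = m, sd = sig)
x <- f(y)
emp <- tabulate(x + 1L, nbins = size + 1L) / length(y)
round(rbind(empirical = emp, target = p), 3)
```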

The pair (f(Y), Y) has the required marginal distributions and, in my
opinion, the maximal possible correlation, because f() is nondecreasing.
If this correlation is lower than the requested one, then I think there
is no solution.
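
To see how large this maximal correlation is for given binomial
parameters, one can estimate it by simulation. The sketch below assumes
size = 10 and prob = 0.2 and works on the standard normal scale, since
the correlation does not depend on the mean and sd of Y. The closed
form is not from this thread; it is a derivation added here that the
simulation is meant to cross-check.

```r
set.seed(1)
size <- 10; prob <- 0.2   # assumed binomial parameters

# cutpoints on the standard normal scale
z_cut <- qnorm(pbinom(0:(size - 1), size = size, prob = prob))

# Monte Carlo estimate of cor(f(Y), Y)
z  <- rnorm(2e5)
fz <- findInterval(z, z_cut)   # number of cutpoints <= z, i.e. f(z)
rho_max_mc <- cor(fz, z)

# closed form (added derivation, worth double-checking):
# E[f(Z) * Z] = sum of dnorm at the cutpoints, and sd(X) = sqrt(n p (1 - p))
rho_max <- sum(dnorm(z_cut)) / sqrt(size * prob * (1 - prob))

c(monte_carlo = rho_max_mc, closed_form = rho_max)
```

If the requested correlation exceeds this value, the argument above
suggests that it cannot be attained with these marginals.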

If the correlation of (f(Y), Y) is at least the required one, then use
a mixture of the distribution (f(Y), Y) and (X, Y), where X has the
required marginal distribution of X, but is generated independently
from Y. The mixture parameter may be determined as a solution of an
equation with one variable.
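
A sketch of the mixture applied directly to an existing binomial column.
It rests on the assumption that both components share the same marginal
of X, so that, given X = x, the dependent component is used with
probability lambda, and that the correlation of the mixture is lambda
times the maximal one, which makes the equation linear. All parameter
values and the names lambda, u_dep and u_ind are illustrative; the
closed form for rho_max is a derivation added here, not something
stated in the thread.

```r
set.seed(1)
size <- 10; prob <- 0.2   # assumed binomial parameters
m <- 0.5; sig <- 2        # requested mean and sd of Y
rho <- 0.75               # requested linear correlation

x <- rbinom(5000, size = size, prob = prob)  # stands in for the existing column

# maximal correlation of (f(Y), Y) (added derivation, see lead-in)
z_cut   <- qnorm(pbinom(0:(size - 1), size = size, prob = prob))
rho_max <- sum(dnorm(z_cut)) / sqrt(size * prob * (1 - prob))
stopifnot(rho <= rho_max)

# mixture weight: cor(mixture) = lambda * rho_max
lambda <- rho / rho_max

# dependent component: Y given f(Y) = x is normal restricted to
# [a_x, a_{x+1}), i.e. a uniform between two binomial CDF values
u_dep <- runif(length(x),
               min = pbinom(x - 1, size = size, prob = prob),
               max = pbinom(x,     size = size, prob = prob))
# independent component: Y ignores x
u_ind <- runif(length(x))

use_dep <- runif(length(x)) < lambda
y <- qnorm(ifelse(use_dep, u_dep, u_ind), mean = m, sd = sig)

c(cor = cor(x, y), mean = mean(y), sd = sd(y))
```

Sampling the dependent component by inversion avoids the rejection loop,
which matters when some values of x are rare.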

Hope this helps.

Petr Savicky.


Re: Random Normal Variable Correlated to an Existing Binomial Variable

Enrico Schumann
In reply to this post by Shane Phillips


Hi,

Do you know the parameters of the binomial variate? If so, you could use
something like the code below. As Petr pointed out, it is not generally
guaranteed that you can create variates with any given linear correlation
(i.e., whether it is attainable depends on the parameters of the binomial).

n <- 100    # number of variates

# your binomial variate (example)
size <- 10; prob <- 0.2
vecB <- rbinom(n, size = size, prob = prob)

rho <- 0.75  # desired correlation
m   <- 0.5   # mean and sd of the Gaussian
sig <- 2

# a small correction: for a bivariate Gaussian, a rank (Spearman)
# correlation rho corresponds to a linear correlation 2*sin(rho*pi/6)
rho <- 2*sin(rho*pi/6)
C <- matrix(rho, nrow = 2, ncol = 2)
diag(C) <- 1; C <- chol(C)   # upper-triangular Cholesky factor

# (1) transform binomial to Gaussian
#     (X1 is Inf wherever vecB == size, which makes vecG Inf there)
X1 <- qnorm(pbinom(vecB, size = size, prob = prob))
# (2) create another Gaussian
X2 <- rnorm(n)
X <- cbind(X1, X2)
# (3) induce correlation (does not change X1)
X <- X %*% C
# (4) make uniforms
U <- pnorm(X)
# (5) ... and put them into the inverses
vecB1 <- qbinom(U[, 1], size = size, prob = prob)
vecG  <- qnorm(U[, 2], mean = m, sd = sig)

# check
plot(vecB1, vecG)
cor(vecB1, vecG)         # achieved linear correlation
all.equal(vecB1, vecB)   # the binomial should be unchanged
sd(vecG)

(Linear correlation is not affected by an increasing linear transformation,
so you can enforce exactly your desired mean and standard deviation for the
Gaussian by rescaling it at the end.)
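
A minimal sketch of that final rescaling, using a made-up pair of
variables (x and g below are illustrative and not taken from the code
above). Note that forcing the sample mean and sd to exact values means
the result is no longer a plain i.i.d. draw from the target normal.

```r
set.seed(1)
x <- rbinom(100, size = 10, prob = 0.2)   # illustrative binomial
g <- 0.5 * x + rnorm(100)                 # illustrative correlated variable

m <- 0.5; sig <- 2                        # desired sample mean and sd

# standardise, then shift and scale (sig > 0, so the map is increasing)
g2 <- m + sig * (g - mean(g)) / sd(g)

c(mean = mean(g2), sd = sd(g2))
c(before = cor(x, g), after = cor(x, g2))
```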


regards,
enrico

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On behalf of Shane Phillips
> Sent: Monday, 25 April 2011 01:00
> To: [hidden email]
> Subject: [R] Random Normal Variable Correlated to an Existing
> Binomial Variable
>
> Hi, R-Helpers!
>
> I have a dataframe that contains a binomial variable.  I need
> to add another random variable drawn from a normal
> distribution with a specific mean and standard deviation.  
> This variable also needs to be correlated with the existing
> binomial variable with a specific correlation (say .75).  Any ideas?
>
> Thanks!
>
> Shane