simulation data with dichotomous variables

thanoon younis
Dear R-users,
I need your help with a problem in the code below. I want to simulate
two different samples, R1 and R2, each with 10 variables and 1000
observations. The variables should be highly correlated within R1 and
within R2, with no correlation between R1 and R2. I also have a problem
with the correlation coefficient between two dichotomous variables: R's
cor() offers only the Pearson, Spearman, and Kendall coefficients.

Thanks a lot in advance.

Thanoon


ords <- 0:1
p <- 10
N <- 1000
percent_change <- 0.9   # fraction of rows kept unchanged

R1 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))
R2 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))
# for two 0/1 variables, Pearson's r is the phi coefficient
cor(R1, R2, method = "pearson")

# build a noisy copy of one variable so that the pair is strongly correlated
v1 <- R1[, 1, drop = FALSE]
# randomly choose which rows to retain
keep <- sample(nrow(v1), size = round(percent_change * nrow(v1)))
change <- setdiff(seq_len(nrow(v1)), keep)

# randomly choose new values for the rows being changed
new.change <- sample(ords, length(change), replace = TRUE)

# replace values in a copy of the original column
v1.samp <- v1
v1.samp[change, ] <- new.change

# closer correlation
cor(v1, v1.samp, method = "pearson")

# set the correlated column as one of the other columns of R1
R1[, 2] <- v1.samp
# repeat the same steps starting from R2[, 1] to get a correlated column
# for R2; reusing v1.samp in both samples would make R1 and R2 correlated
head(R1)
head(R2)
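
A side note on the Pearson comment in the code above: for two 0/1 variables, the value returned by cor() is the phi coefficient of their 2x2 table, so no separate function is needed for that particular coefficient. The sketch below checks this on made-up data; the variables x and y and the 10% flip rate are assumptions chosen only for illustration.

```r
set.seed(1)
n <- 1000
x <- rbinom(n, 1, 0.5)
# y copies x in about 90% of rows and flips it in the rest (illustrative choice)
y <- ifelse(runif(n) < 0.9, x, 1 - x)

tab <- table(x, y)
# convert the counts to numeric so the products below cannot overflow integers
n00 <- as.numeric(tab["0", "0"]); n01 <- as.numeric(tab["0", "1"])
n10 <- as.numeric(tab["1", "0"]); n11 <- as.numeric(tab["1", "1"])

# phi coefficient computed from the 2x2 table
phi <- (n11 * n00 - n10 * n01) /
  sqrt((n00 + n01) * (n10 + n11) * (n00 + n10) * (n01 + n11))

all.equal(phi, cor(x, y))   # the two computations agree
```

Phi is bounded by the marginal proportions of the two variables, which is one reason a tetrachoric correlation is sometimes preferred for dichotomous items.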

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: simulation data with dichotomous variables

William Revelle
Dear Thanoon,
 You might look at the various item simulation functions in the psych package.

In particular, for your problem:

library(psych)  # provides sim.irt, tetrachoric, lowerMat, cor.plot

R1 <- sim.irt(10, 1000, a = 3, low = -2, high = 2)
R2 <- sim.irt(10, 1000, a = 3, low = -2, high = 2)
R12 <- data.frame(R1$items, R2$items)
# This gives you 20 items with high correlations within the first 10 and
# within the second 10, and no correlation between the two sets.
rho <- tetrachoric(R12)$rho    # find the tetrachoric correlations between the items
lowerMat(rho)                  # show the correlations
cor.plot(rho, numbers = TRUE)  # show a heat map of the correlations
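
If psych is not available, the same idea can be sketched in base R: let every item in a sample depend on one shared latent score, dichotomize, and draw the two samples independently. This is only a rough sketch under stated assumptions, not how sim.irt works internally; the helper sim_block and its loading argument are hypothetical names introduced here.

```r
set.seed(42)
N <- 1000   # observations per sample
p <- 10     # items per sample

# Hypothetical helper (not part of psych): each item is 1 when a shared
# latent score plus item-specific noise is positive.
sim_block <- function(n, p, loading = 0.9) {
  theta <- rnorm(n)                                   # one latent score per row
  noise <- matrix(rnorm(n * p), nrow = n, ncol = p)   # item-specific noise
  latent <- loading * theta + sqrt(1 - loading^2) * noise
  items <- (latent > 0) * 1L                          # dichotomize at zero
  colnames(items) <- paste0("V", seq_len(p))
  as.data.frame(items)
}

R1 <- sim_block(N, p)
R2 <- sim_block(N, p)   # independent draw, so it should be unrelated to R1

within1 <- cor(R1)       # phi coefficients among the items of R1
between <- cor(R1, R2)   # phi coefficients between R1 items and R2 items

mean(within1[lower.tri(within1)])   # should be clearly positive
max(abs(between))                   # should be near zero, up to sampling noise
```

Raising loading toward 1 strengthens the within-sample correlations; the between-sample values stay near zero only because the two latent scores are drawn independently.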

Bill


On Aug 4, 2014, at 8:08 PM, thanoon younis <[hidden email]> wrote:

> [...]

William Revelle           http://personality-project.org/revelle.html
Professor           http://personality-project.org
Department of Psychology   http://www.wcas.northwestern.edu/psych/
Northwestern University   http://www.northwestern.edu/
Use R for psychology             http://personality-project.org/r
It is 5 minutes to midnight   http://www.thebulletin.org
