# Trying to speed up an if/else statement in simulations

## Trying to speed up an if/else statement in simulations

Dear R-help,

I am trying to write a function to simulate datasets of size n which contain two time-to-event outcome variables with associated 'Event'/'Censored' indicator variables (flag1 and flag2 respectively). One of these indicator variables needs to be dependent on the other, so I am creating the first and trying to use this to create the second using an if/else statement.

My data structure needs to follow this algorithm (for each row of the data):

    If flag1=1 then flag2 should be 1 with probability 0.95 and zero otherwise
    Else if flag1=0 then flag2 should be 1 with probability 0.5 and zero otherwise

I can set up this example quite simply using if else statements, but this is incredibly inefficient when running thousands of datasets:

    data<-as.data.frame(rbinom(10,1,0.5))
    colnames(data)<-'flag1'
    for (i in 1:n) {
      if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else {data$flag2[i]<-rbinom(1,1,0.5)}
    }

I think to speed up the simulations I would be better changing to vectorisation and using something like:

    ifelse(data$flag1==1,rbinom(1,1,0.95),rbinom(1,1,0.5))

but the rbinom statements here generate one value and repeat this draw for every element of flag2 that matches the 'if' statement on flag1.

Is there a way to assign flag2 to a new bernoulli draw for each subject in the data frame with flag1=1?

I hope my question is clear, and thank you in advance for your help.

Thanks,
Natalie
PhD student, Reading University

P.S. I am using R 2.12.1 on Windows 7.
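
The recycling problem described in the question can be reproduced with a small sketch. The seed and the standalone vectors `flag1` and `flag2` are illustrative choices, not part of the original post.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
flag1 <- rbinom(10, 1, 0.5)

# Each rbinom(1, ...) call returns a single value, and ifelse() recycles
# that one value across every element that takes the corresponding branch.
flag2 <- ifelse(flag1 == 1, rbinom(1, 1, 0.95), rbinom(1, 1, 0.5))

# Number of distinct flag2 values within each flag1 group: always 1,
# because every row in a group received the same single draw.
distinct_per_group <- tapply(flag2, flag1, function(x) length(unique(x)))
print(distinct_per_group)
```

Because each group shares one draw, flag2 carries no row-level randomness, which is why the draws need to be of length n rather than length 1.
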
## Re: Trying to speed up an if/else statement in simulations

inline below

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of nqf
> Sent: Monday, June 18, 2012 10:30 AM
> To: [hidden email]
> Subject: [R] Trying to speed up an if/else statement in simulations
>
> Dear R-help,
>
> I am trying to write a function to simulate datasets of size n which contain
> two time-to-event outcome variables with associated 'Event'/'Censored'
> indicator variables (flag1 and flag2 respectively). One of these indicator
> variables needs to be dependent on the other, so I am creating the first and
> trying to use this to create the second using an if/else statement.
>
> My data structure needs to follow this algorithm (for each row of the data):
> If flag1=1 then flag2 should be 1 with probability 0.95 and zero otherwise
> Else if flag1=0 then flag2 should be 1 with probability 0.5 and zero
> otherwise
>
> I can set up this example quite simply using if else statements, but this is
> incredibly inefficient when running thousands of datasets:
>
>     data<-as.data.frame(rbinom(10,1,0.5))
>     colnames(data)<-'flag1'
>     for (i in 1:n) {
>       if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else {data$flag2[i]<-rbinom(1,1,0.5)}
>     }

Do you mean

    n <- 10
    data <- data.frame(flag1 = rbinom(n, 1, 0.5))
    for(i in seq_len(n)) {
      if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else {data$flag2[i]<-rbinom(1,1,0.5)}
    }

?

> I think to speed up the simulations I would be better changing to
> vectorisation and using something like:
>
>     ifelse(data$flag1==1,rbinom(1,1,0.95),rbinom(1,1,0.5))
>
> but the rbinom statements here generate one value and repeat this draw for
> every element of flag2 that matches the 'if' statement on flag1.

Assuming that n is nrow(data), use

    data$flag2 <- ifelse(data$flag1==1, rbinom(n, 1, 0.95), rbinom(n, 1, 0.5))

so you get n independent draws from each distribution instead of 1.

I prefer to factor out the common arguments as in

    data$flag2 <- rbinom(n, 1, ifelse(data$flag1==1, 0.95, 0.5))
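
As a quick sanity check on the single-call form above, one might simulate a large sample and compare the empirical proportions with the target probabilities. This is only a sketch: the seed, the sample size of 1e5, and the name `p_hat` are assumptions made for illustration.

```r
set.seed(42)   # arbitrary seed for reproducibility
n <- 1e5       # large n so the empirical proportions are stable
data <- data.frame(flag1 = rbinom(n, 1, 0.5))

# One rbinom() call with a per-row success probability:
# 0.95 where flag1 == 1, 0.5 where flag1 == 0.
data$flag2 <- rbinom(n, 1, ifelse(data$flag1 == 1, 0.95, 0.5))

# Empirical P(flag2 = 1 | flag1); the entries named "0" and "1"
# should be close to 0.5 and 0.95 respectively.
p_hat <- tapply(data$flag2, data$flag1, mean)
print(p_hat)
```

The two-`rbinom` version inside `ifelse()` gives the same distribution but generates 2n draws and discards half of them, whereas this form generates exactly n.
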
## Re: Trying to speed up an if/else statement in simulations

On Jun 18, 2012, at 1:29 PM, nqf wrote:

> Dear R-help,
>
> I am trying to write a function to simulate datasets of size n which contain
> two time-to-event outcome variables with associated 'Event'/'Censored'
> indicator variables (flag1 and flag2 respectively). One of these indicator
> variables needs to be dependent on the other, so I am creating the first and
> trying to use this to create the second using an if/else statement.
>
> My data structure needs to follow this algorithm (for each row of the data):
> If flag1=1 then flag2 should be 1 with probability 0.95 and zero otherwise
> Else if flag1=0 then flag2 should be 1 with probability 0.5 and zero
> otherwise
>
> I can set up this example quite simply using if else statements, but this is
> incredibly inefficient when running thousands of datasets:
>
>     data<-as.data.frame(rbinom(10,1,0.5))
>     colnames(data)<-'flag1'
>     for (i in 1:n) {
>       if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else {data$flag2[i]<-rbinom(1,1,0.5)}
>     }
>
> I think to speed up the simulations I would be better changing to
> vectorisation and using something like:
>
>     ifelse(data$flag1==1,rbinom(1,1,0.95),rbinom(1,1,0.5))
>
> but the rbinom statements here generate one value and repeat this draw for
> every element of flag2 that matches the 'if' statement on flag1.
>
> Is there a way to assign flag2 to a new bernoulli draw for each subject in
> the data frame with flag1=1?

If the parameters for the Bernoulli draws stay the same, as they appear to do, then all you need to do is read the help page for `rbinom` and use the appropriate call to create vectors that are as long as data$flag1.

--
David Winsemius, MD
West Hartford, CT
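
Since the original goal was a function that is run for thousands of datasets, the hint above can be sketched as follows. The helper `simulate_flags()` and its argument names are hypothetical and do not come from the thread. The sketch relies on the `prob` argument of `rbinom()` accepting a vector, and uses an indexed lookup instead of `ifelse()`.

```r
# Hypothetical helper (not from the thread): returns one simulated dataset.
simulate_flags <- function(n, p_flag1 = 0.5, p_if_1 = 0.95, p_if_0 = 0.5) {
  flag1 <- rbinom(n, 1, p_flag1)
  # Probability lookup: flag1 = 0 selects element 1, flag1 = 1 selects element 2.
  prob2 <- c(p_if_0, p_if_1)[flag1 + 1]
  # rbinom() takes a vector 'prob', giving one independent draw per row.
  flag2 <- rbinom(n, 1, prob2)
  data.frame(flag1 = flag1, flag2 = flag2)
}

set.seed(2012)  # arbitrary seed for reproducibility
# 1000 datasets of 10 rows each, kept as a list of data frames.
sims <- replicate(1000, simulate_flags(10), simplify = FALSE)
length(sims)
```

Keeping the per-dataset work to two vectorised `rbinom()` calls avoids the row-by-row loop entirely.
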
## Re: Trying to speed up an if/else statement in simulations

I might try something like:

    data[data$flag1 == 1, "flag2"] <- runif(sum(data$flag1 == 1)) < 0.95

and similarly for the other case.

Hope this helps,
Michael

On Mon, Jun 18, 2012 at 12:29 PM, nqf <[hidden email]> wrote:

> Dear R-help,
>
> I am trying to write a function to simulate datasets of size n which contain
> two time-to-event outcome variables with associated 'Event'/'Censored'
> indicator variables (flag1 and flag2 respectively). One of these indicator
> variables needs to be dependent on the other, so I am creating the first and
> trying to use this to create the second using an if/else statement.
>
> My data structure needs to follow this algorithm (for each row of the data):
> If flag1=1 then flag2 should be 1 with probability 0.95 and zero otherwise
> Else if flag1=0 then flag2 should be 1 with probability 0.5 and zero
> otherwise
>
> I can set up this example quite simply using if else statements, but this is
> incredibly inefficient when running thousands of datasets:
>
>     data<-as.data.frame(rbinom(10,1,0.5))
>     colnames(data)<-'flag1'
>     for (i in 1:n) {
>       if (data$flag1[i]==1) {data$flag2[i]<-rbinom(1,1,0.95)} else {data$flag2[i]<-rbinom(1,1,0.5)}
>     }
>
> I think to speed up the simulations I would be better changing to
> vectorisation and using something like:
>
>     ifelse(data$flag1==1,rbinom(1,1,0.95),rbinom(1,1,0.5))
>
> but the rbinom statements here generate one value and repeat this draw for
> every element of flag2 that matches the 'if' statement on flag1.
>
> Is there a way to assign flag2 to a new bernoulli draw for each subject in
> the data frame with flag1=1?
>
> I hope my question is clear, and thank you in advance for your help.
>
> Thanks,
> Natalie
> PhD student, Reading University
>
> P.S. I am using R 2.12.1 on Windows 7.
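
The subset-assignment suggestion in this reply can be written out for both cases as a sketch. The seed, the sample size, the name `is_event`, and the `as.integer()` conversion (to store 0/1 rather than TRUE/FALSE) are additions made for illustration.

```r
set.seed(7)  # arbitrary seed for reproducibility
n <- 1000
data <- data.frame(flag1 = rbinom(n, 1, 0.5))
data$flag2 <- NA_integer_   # placeholder column, filled in below

is_event <- data$flag1 == 1

# runif(k) < p is TRUE with probability p, i.e. a Bernoulli(p) draw;
# as.integer() converts TRUE/FALSE to 1/0.
data[is_event, "flag2"]  <- as.integer(runif(sum(is_event))  < 0.95)
data[!is_event, "flag2"] <- as.integer(runif(sum(!is_event)) < 0.5)

# Cross-tabulate the two indicators to inspect the result.
print(table(flag1 = data$flag1, flag2 = data$flag2))
```

Each subset receives exactly as many uniform draws as it has rows, so every row gets its own independent draw.
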
