I am trying to calculate ratios with confidence intervals using Monte Carlo
simulation, but I cannot figure out how.
Here is an example of my data (see below). I want to calculate the ratios
dat$v1/dat$v3 and dat$v2/dat$v3 and their confidence intervals using 100
randomly selected data sets.
Could you please suggest how I can estimate the ratios with their CIs?
dat <- structure(list(v1 = c(NA, TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE,
NA, NA, TRUE, TRUE, TRUE, TRUE, NA, NA, TRUE, TRUE), v2 = c(TRUE,
NA, NA, NA, NA, TRUE, NA, NA, TRUE, TRUE, NA, TRUE, TRUE, NA,
NA, TRUE, TRUE, NA), v3 = c(TRUE, TRUE, NA, TRUE, TRUE, NA, NA,
TRUE, TRUE, NA, NA, TRUE, TRUE, TRUE, NA, NA, TRUE, NA)), .Names = c("v1",
"v2", "v3"), class = "data.frame", row.names = c(NA, 18L))
ratio1 <- length(which(dat$v1 == "TRUE"))/length(which(dat$v3 == "TRUE"))
ratio2 <- length(which(dat$v2 == "TRUE"))/length(which(dat$v3 == "TRUE"))
> ratio1 <- with(dat, sum(v1, na.rm = TRUE)/sum(v3, na.rm = TRUE))
> ratio1
[1] 1.2
It looks like you should spend some more time with an R tutorial or two.
This is basic stuff (if I understand what you wanted correctly).
Also, this is not how a "confidence interval" should be calculated, but
that is a separate, off-topic discussion for which stats.stackexchange.com is a
more appropriate venue.
Dear Bert,
Thank you very much for the response.
I did it manually, but I could not put the steps in a loop, so I created the
table by hand, selecting the rows randomly several times. Here is what I
have done so far. I want to create the table 100 times and
calculate the mean and CI from those 100 values. If anyone can give me a
hint on how to write the loop, that would be great. I am very grateful for your help.
Thanks,
library(dplyr)
library(plyr)
dat <- structure(list(v1 = c(NA, TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE,
NA, NA, TRUE, TRUE, TRUE, TRUE, NA, NA, TRUE, TRUE), v2 = c(TRUE,
NA, NA, NA, NA, TRUE, NA, NA, TRUE, TRUE, NA, TRUE, TRUE, NA,
NA, TRUE, TRUE, NA), v3 = c(TRUE, TRUE, NA, TRUE, TRUE, NA, NA,
TRUE, TRUE, NA, NA, TRUE, TRUE, TRUE, NA, NA, TRUE, NA)), .Names = c("v1",
"v2", "v3"), class = "data.frame", row.names = c(NA, 18L))
ratio1 <- with(dat, sum(v1, na.rm = TRUE)/sum(v3, na.rm = TRUE))
ratio2 <- with(dat, sum(v2, na.rm = TRUE)/sum(v3, na.rm = TRUE))
#
A1 <- sample_n(dat1, 16)  # create a table by selecting a sample of 16 rows
A1.ratio1 <- with(A1, sum(v1, na.rm = TRUE)/sum(v3, na.rm = TRUE))
A1.ratio2 <- with(A1, sum(v2, na.rm = TRUE)/sum(v3, na.rm = TRUE))
A1.Table <- data.frame(Ratio1 = A1.ratio1, Ratio2 = A1.ratio2)
#
A2 <- sample_n(dat1, 16)
A2.ratio1 <- with(A2, sum(v1, na.rm = TRUE)/sum(v3, na.rm = TRUE))
A2.ratio2 <- with(A2, sum(v2, na.rm = TRUE)/sum(v3, na.rm = TRUE))
A2.Table <- data.frame(Ratio1 = A2.ratio1, Ratio2 = A2.ratio2)
#
A3 <- sample_n(dat1, 16)
A3.ratio1 <- with(A3, sum(v1, na.rm = TRUE)/sum(v3, na.rm = TRUE))
A3.ratio2 <- with(A3, sum(v2, na.rm = TRUE)/sum(v3, na.rm = TRUE))
A3.Table <- data.frame(Ratio1 = A3.ratio1, Ratio2 = A3.ratio2)
#
##..............
# I was thinking to repeat this procedure 100 times and calculate the ratio
A100 <- sample_n(dat1, 16)
A100.ratio1 <- with(A100, sum(v1, na.rm = TRUE)/sum(v3, na.rm = TRUE))
A100.ratio2 <- with(A100, sum(v2, na.rm = TRUE)/sum(v3, na.rm = TRUE))
A100.Table <- data.frame(Ratio1 = A100.ratio1, Ratio2 = A100.ratio2)
#
Tab <- rbind(A1.Table, A2.Table, A3.Table, A100.Table)
# Compute the mean for each ratio
Ratio1 <- mean(Table1[,1])
Ratio2 <- mean(Table1[,2])
summary <- ddply(subset(Tab), c(""), summarise,
  N = length(Tab),
  mean.R1 = mean(Ratio1, na.rm=T),
  median.R1 = median(Ratio1, na.rm=T),
  sd.R1 = sd(Ratio1, na.rm=T),
  se.R1 = sd / sqrt(N),
  LCI.95.R1 = mean.R1 - 1.95*se.R1,
  UCI.95.R1 = mean.R1 + 1.95*se.R1,
  mean.R2 = mean(Ratio2, na.rm=T),
  median.R2 = median(Ratio2, na.rm=T),
  sd.R2 = sd(Ratio2, na.rm=T),
  se.R2 = sd / sqrt(N),
  LCI.95.R2 = mean.R2 - 1.95*se.R2,
  UCI.95.R2 = mean.R2 + 1.95*se.R2
)
summary
Do you really not know how to use a for loop? The tutorial recommendation seems apropos...
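For reference, a minimal sketch of what such a for loop could look like. It assumes the intent is to draw 16 of the 18 rows without replacement on each pass, as in the manual A1 ... A100 steps above. The names n_rep, A and den, the base-R sample() call in place of sample_n(), and the fixed seed are illustrative choices, not part of the original post.

```r
# Same data as in the original post, rebuilt with data.frame()
dat <- data.frame(
  v1 = c(NA, TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, NA,
         NA, TRUE, TRUE, TRUE, TRUE, NA, NA, TRUE, TRUE),
  v2 = c(TRUE, NA, NA, NA, NA, TRUE, NA, NA, TRUE,
         TRUE, NA, TRUE, TRUE, NA, NA, TRUE, TRUE, NA),
  v3 = c(TRUE, TRUE, NA, TRUE, TRUE, NA, NA, TRUE, TRUE,
         NA, NA, TRUE, TRUE, TRUE, NA, NA, TRUE, NA)
)

set.seed(1)    # assumption: fixed seed so the run is repeatable
n_rep <- 100   # number of random subsets to draw

# Pre-allocate the result table, one row per subset
Tab <- data.frame(Ratio1 = numeric(n_rep), Ratio2 = numeric(n_rep))

for (i in seq_len(n_rep)) {
  A <- dat[sample(nrow(dat), 16), ]        # 16 rows, without replacement
  den <- sum(A$v3, na.rm = TRUE)           # count of TRUE in v3
  Tab$Ratio1[i] <- sum(A$v1, na.rm = TRUE) / den
  Tab$Ratio2[i] <- sum(A$v2, na.rm = TRUE) / den
}

head(Tab)
```

Pre-allocating Tab and filling it by index avoids creating 100 separately named objects and binding them together afterwards.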
I second the vote on needing a tutorial. You need to learn about how R does things and get familiar with vectorization and the apply() family of functions. You defined dat but not dat1 in your code so I'll just use dat. First, to get the ratios:
(ratios <- colSums(dat[-3], na.rm=TRUE)/colSums(dat[3], na.rm=TRUE))
#  v1  v2
# 1.2 0.8
Then create a function for the Monte Carlo simulation that generates a sample and computes the ratios. Finally, use the function with replicate() to generate the 100 samples:
nratios <- function(x) {
  sdat <- x[sample.int(18, 16), ]
  colSums(sdat[-3], na.rm=TRUE)/colSums(sdat[3], na.rm=TRUE)
}
mcrat <- replicate(100, nratios(dat))
str(mcrat)
# num [1:2, 1:100] 1 0.8 1.222 0.778 1.111 ...
#  attr(*, "dimnames")=List of 2
# ..$ : chr [1:2] "v1" "v2"
# ..$ : NULL
100 values of ratio1 are stored as mcrat["v1", ] and 100 values of ratio2 are stored as mcrat["v2", ].
Now you can generate your summary statistics.
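One possible way to get those summary statistics is sketched below. It assumes that an empirical 2.5% to 97.5% percentile range of the 100 simulated ratios is an acceptable stand-in for the interval the original poster wants. That range describes the spread of the subsample ratios and is not necessarily a proper confidence interval, as noted earlier in the thread. The name mc_summary and the fixed seed are illustrative.

```r
# Same data as in the original post, rebuilt with data.frame()
dat <- data.frame(
  v1 = c(NA, TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, NA,
         NA, TRUE, TRUE, TRUE, TRUE, NA, NA, TRUE, TRUE),
  v2 = c(TRUE, NA, NA, NA, NA, TRUE, NA, NA, TRUE,
         TRUE, NA, TRUE, TRUE, NA, NA, TRUE, TRUE, NA),
  v3 = c(TRUE, TRUE, NA, TRUE, TRUE, NA, NA, TRUE, TRUE,
         NA, NA, TRUE, TRUE, TRUE, NA, NA, TRUE, NA)
)

# Draw 16 rows and return both ratios (v1/v3 and v2/v3) as a named vector
nratios <- function(x) {
  sdat <- x[sample.int(nrow(x), 16), ]
  colSums(sdat[-3], na.rm = TRUE) / colSums(sdat[3], na.rm = TRUE)
}

set.seed(1)  # assumption: fixed seed so the run is repeatable
mcrat <- replicate(100, nratios(dat))  # 2 x 100 matrix, rows "v1" and "v2"

# One row of summary statistics per ratio
mc_summary <- t(apply(mcrat, 1, function(r) {
  c(mean = mean(r), median = median(r), sd = sd(r),
    quantile(r, probs = c(0.025, 0.975)))
}))

mc_summary
```

The result has one row per ratio (v1 and v2) and columns mean, median, sd, 2.5% and 97.5%.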

