How to generate SE for the proportion value using a randomization process in R?

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

How to generate SE for the proportion value using a randomization process in R?

Marna Wagley
Hi All,
I was trying to estimate standard error (SE) for the proportion value using
some kind of randomization process (bootstrapping or jackknifing) in R, but
I could not figure it out.

Is there any way to generate SE for the proportion?

The example of the data and the code I am using is attached for your
reference. I would like to generate the value of proportion with a SE using
a 1000 times randomization.

dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
  mutate(Time1.but.not.in.Time2 = case_when(
            Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
Time2.but.not.in.Time1 = case_when(
            Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
 BothTimes = case_when(
            Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
 daT
 summary(daT)

cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
ProportionValue
standard error??

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to generate SE for the proportion value using a randomization process in R?

Rui Barradas
Hello,

Something like this, using base package boot?


library(boot)

bootprop <- function(data, index){
   d <- data[index, ]
   sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0     # original
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)


Hope this helps,

Rui Barradas

Às 21:57 de 22/01/21, Marna Wagley escreveu:

> Hi All,
> I was trying to estimate standard error (SE) for the proportion value using
> some kind of randomization process (bootstrapping or jackknifing) in R, but
> I could not figure it out.
>
> Is there any way to generate SE for the proportion?
>
> The example of the data and the code I am using is attached for your
> reference. I would like to generate the value of proportion with a SE using
> a 1000 times randomization.
>
> dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
> "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
> row.names = c(NA,
> -19L))
> daT<-data.frame(dat %>%
>    mutate(Time1.but.not.in.Time2 = case_when(
>              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> Time2.but.not.in.Time1 = case_when(
>              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
>   BothTimes = case_when(
>              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
>   daT
>   summary(daT)
>
> cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> "BothTimes")
> daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> summary(daT)
> ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
> ProportionValue
> standard error??
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to generate SE for the proportion value using a randomization process in R?

Marna Wagley
Dear Rui,
I was wondering whether we have to square root of SD to find SE, right?

bootprop <- function(data, index){
   d <- data[index, ]
   sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0     # original
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
sd(b$t)/sqrt(1000)
pandit*(1-pandit)

hist(b$t, freq = FALSE)




On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <[hidden email]> wrote:

> Hello,
>
> Something like this, using base package boot?
>
>
> library(boot)
>
> bootprop <- function(data, index){
>    d <- data[index, ]
>    sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> }
>
> R <- 1e3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0     # original
> sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> hist(b$t, freq = FALSE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> Às 21:57 de 22/01/21, Marna Wagley escreveu:
> > Hi All,
> > I was trying to estimate standard error (SE) for the proportion value
> using
> > some kind of randomization process (bootstrapping or jackknifing) in R,
> but
> > I could not figure it out.
> >
> > Is there any way to generate SE for the proportion?
> >
> > The example of the data and the code I am using is attached for your
> > reference. I would like to generate the value of proportion with a SE
> using
> > a 1000 times randomization.
> >
> > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label =
> c("id1",
> > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
> > row.names = c(NA,
> > -19L))
> > daT<-data.frame(dat %>%
> >    mutate(Time1.but.not.in.Time2 = case_when(
> >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> > Time2.but.not.in.Time1 = case_when(
> >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
> >   BothTimes = case_when(
> >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
> >   daT
> >   summary(daT)
> >
> > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> > "BothTimes")
> > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> > summary(daT)
> > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
> > ProportionValue
> > standard error??
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to generate SE for the proportion value using a randomization process in R?

Rui Barradas
Hello,

Inline.

Às 07:47 de 23/01/21, Marna Wagley escreveu:
> Dear Rui,
> I was wondering whether we have to square root of SD to find SE, right?

No, we don't. var already divides by n, don't divide again.
This is the code, that can be seen by running the function name at a
command line.


sd
#function (x, na.rm = FALSE)
#sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
#    na.rm = na.rm))
#<bytecode: 0x55f3ce900848>
#<environment: namespace:stats>



>
> bootprop <- function(data, index){
>     d <- data[index, ]
>     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> }
>
> R <- 1e3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0     # original
> sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> sd(b$t)/sqrt(1000)
> pandit*(1-pandit)
>
> hist(b$t, freq = FALSE)


Try plotting the normal densities for both cases, the red line is
clearly wrong.


f <- function(x, xbar, s){
   dnorm(x, mean = xbar, sd = s)
}

hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue",
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col =
"red", add = TRUE)


Hope this helps,

Rui Barradas

>
>
>
>
> On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hello,
>
>     Something like this, using base package boot?
>
>
>     library(boot)
>
>     bootprop <- function(data, index){
>         d <- data[index, ]
>         sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
>     }
>
>     R <- 1e3
>     set.seed(2020)
>     b <- boot(daT, bootprop, R)
>     b
>     b$t0     # original
>     sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
>     hist(b$t, freq = FALSE)
>
>
>     Hope this helps,
>
>     Rui Barradas
>
>     Às 21:57 de 22/01/21, Marna Wagley escreveu:
>      > Hi All,
>      > I was trying to estimate standard error (SE) for the proportion
>     value using
>      > some kind of randomization process (bootstrapping or jackknifing)
>     in R, but
>      > I could not figure it out.
>      >
>      > Is there any way to generate SE for the proportion?
>      >
>      > The example of the data and the code I am using is attached for your
>      > reference. I would like to generate the value of proportion with
>     a SE using
>      > a 1000 times randomization.
>      >
>      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
>      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
>     = c("id1",
>      > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
>      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
>      > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
>      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
>      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
>      > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
>     "data.frame",
>      > row.names = c(NA,
>      > -19L))
>      > daT<-data.frame(dat %>%
>      >    mutate(Time1.but.not.in.Time2 = case_when(
>      >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
>      > Time2.but.not.in.Time1 = case_when(
>      >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
>      >   BothTimes = case_when(
>      >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
>      >   daT
>      >   summary(daT)
>      >
>      > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>      > "BothTimes")
>      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
>      > summary(daT)
>      > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
>      > ProportionValue
>      > standard error??
>      >
>      >       [[alternative HTML version deleted]]
>      >
>      > ______________________________________________
>      > [hidden email] <mailto:[hidden email]> mailing list
>     -- To UNSUBSCRIBE and more, see
>      > https://stat.ethz.ch/mailman/listinfo/r-help
>      > PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>      > and provide commented, minimal, self-contained, reproducible code.
>      >
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to generate SE for the proportion value using a randomization process in R?

Marna Wagley
Yes Rui, I can see we don't need to divide by square root of sample size.
The example is great to understand it.
Thank you.
Marna


On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <[hidden email]> wrote:

> Hello,
>
> Inline.
>
> Às 07:47 de 23/01/21, Marna Wagley escreveu:
> > Dear Rui,
> > I was wondering whether we have to square root of SD to find SE, right?
>
> No, we don't. var already divides by n, don't divide again.
> This is the code, that can be seen by running the function name at a
> command line.
>
>
> sd
> #function (x, na.rm = FALSE)
> #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
> #    na.rm = na.rm))
> #<bytecode: 0x55f3ce900848>
> #<environment: namespace:stats>
>
>
>
> >
> > bootprop <- function(data, index){
> >     d <- data[index, ]
> >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> > }
> >
> > R <- 1e3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0     # original
> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> > sd(b$t)/sqrt(1000)
> > pandit*(1-pandit)
> >
> > hist(b$t, freq = FALSE)
>
>
> Try plotting the normal densities for both cases, the red line is
> clearly wrong.
>
>
> f <- function(x, xbar, s){
>    dnorm(x, mean = xbar, sd = s)
> }
>
> hist(b$t, freq = FALSE)
> curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue",
> add = TRUE)
> curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col =
> "red", add = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> >
> >
> >
> >
> > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <[hidden email]
> > <mailto:[hidden email]>> wrote:
> >
> >     Hello,
> >
> >     Something like this, using base package boot?
> >
> >
> >     library(boot)
> >
> >     bootprop <- function(data, index){
> >         d <- data[index, ]
> >         sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
> TRUE)
> >     }
> >
> >     R <- 1e3
> >     set.seed(2020)
> >     b <- boot(daT, bootprop, R)
> >     b
> >     b$t0     # original
> >     sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> >     hist(b$t, freq = FALSE)
> >
> >
> >     Hope this helps,
> >
> >     Rui Barradas
> >
> >     Às 21:57 de 22/01/21, Marna Wagley escreveu:
> >      > Hi All,
> >      > I was trying to estimate standard error (SE) for the proportion
> >     value using
> >      > some kind of randomization process (bootstrapping or jackknifing)
> >     in R, but
> >      > I could not figure it out.
> >      >
> >      > Is there any way to generate SE for the proportion?
> >      >
> >      > The example of the data and the code I am using is attached for
> your
> >      > reference. I would like to generate the value of proportion with
> >     a SE using
> >      > a 1000 times randomization.
> >      >
> >      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L,
> 16L,
> >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
> >     = c("id1",
> >      > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> >      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> >      > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> >      > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
> >     "data.frame",
> >      > row.names = c(NA,
> >      > -19L))
> >      > daT<-data.frame(dat %>%
> >      >    mutate(Time1.but.not.in.Time2 = case_when(
> >      >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> >      > Time2.but.not.in.Time1 = case_when(
> >      >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
> >      >   BothTimes = case_when(
> >      >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
> >      >   daT
> >      >   summary(daT)
> >      >
> >      > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> >      > "BothTimes")
> >      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> >      > summary(daT)
> >      > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1,
> na.rm=T)
> >      > ProportionValue
> >      > standard error??
> >      >
> >      >       [[alternative HTML version deleted]]
> >      >
> >      > ______________________________________________
> >      > [hidden email] <mailto:[hidden email]> mailing list
> >     -- To UNSUBSCRIBE and more, see
> >      > https://stat.ethz.ch/mailman/listinfo/r-help
> >      > PLEASE do read the posting guide
> >     http://www.R-project.org/posting-guide.html
> >      > and provide commented, minimal, self-contained, reproducible code.
> >      >
> >
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to generate SE for the proportion value using a randomization process in R?

Marna Wagley
Hi Rui,
I am sorry for asking you several questions.

In the given example, randomizations (reshuffle) were done 1000 times, and
its 1000 proportion values (results) are stored and it can be seen using
b$t; but I was wondering how the table was randomized (which rows have been
missed/or repeated in each randomizing procedure?).

Is there any way we can see the randomized table and its associated
results? Here in this example, I randomized (or bootstrapped) the table
into three times (R=3) so I would like to store these three tables and look
at them later to know which rows were repeated/missed. Is there any
possibility?
The example data and the code is given below.

Thank you for your help.

####
library(boot)
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
  mutate(Time1.but.not.in.Time2 = case_when(
            Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
Time2.but.not.in.Time1 = case_when(
            Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
 BothTimes = case_when(
            Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)

bootprop <- function(data, index){
   d <- data[index, ]
   sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}

R <- 3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0     # original
b$t
sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)

str(b)
b$data
b$seed
b$sim
b$strata
################


On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <[hidden email]>
wrote:

> Yes Rui, I can see we don't need to divide by square root of sample size.
> The example is great to understand it.
> Thank you.
> Marna
>
>
> On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <[hidden email]>
> wrote:
>
>> Hello,
>>
>> Inline.
>>
>> Às 07:47 de 23/01/21, Marna Wagley escreveu:
>> > Dear Rui,
>> > I was wondering whether we have to square root of SD to find SE, right?
>>
>> No, we don't. var already divides by n, don't divide again.
>> This is the code, that can be seen by running the function name at a
>> command line.
>>
>>
>> sd
>> #function (x, na.rm = FALSE)
>> #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
>> #    na.rm = na.rm))
>> #<bytecode: 0x55f3ce900848>
>> #<environment: namespace:stats>
>>
>>
>>
>> >
>> > bootprop <- function(data, index){
>> >     d <- data[index, ]
>> >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
>> > }
>> >
>> > R <- 1e3
>> > set.seed(2020)
>> > b <- boot(daT, bootprop, R)
>> > b
>> > b$t0     # original
>> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
>> > sd(b$t)/sqrt(1000)
>> > pandit*(1-pandit)
>> >
>> > hist(b$t, freq = FALSE)
>>
>>
>> Try plotting the normal densities for both cases, the red line is
>> clearly wrong.
>>
>>
>> f <- function(x, xbar, s){
>>    dnorm(x, mean = xbar, sd = s)
>> }
>>
>> hist(b$t, freq = FALSE)
>> curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue",
>> add = TRUE)
>> curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col =
>> "red", add = TRUE)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> >
>> >
>> >
>> >
>> > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <[hidden email]
>> > <mailto:[hidden email]>> wrote:
>> >
>> >     Hello,
>> >
>> >     Something like this, using base package boot?
>> >
>> >
>> >     library(boot)
>> >
>> >     bootprop <- function(data, index){
>> >         d <- data[index, ]
>> >         sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
>> TRUE)
>> >     }
>> >
>> >     R <- 1e3
>> >     set.seed(2020)
>> >     b <- boot(daT, bootprop, R)
>> >     b
>> >     b$t0     # original
>> >     sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
>> >     hist(b$t, freq = FALSE)
>> >
>> >
>> >     Hope this helps,
>> >
>> >     Rui Barradas
>> >
>> >     Às 21:57 de 22/01/21, Marna Wagley escreveu:
>> >      > Hi All,
>> >      > I was trying to estimate standard error (SE) for the proportion
>> >     value using
>> >      > some kind of randomization process (bootstrapping or jackknifing)
>> >     in R, but
>> >      > I could not figure it out.
>> >      >
>> >      > Is there any way to generate SE for the proportion?
>> >      >
>> >      > The example of the data and the code I am using is attached for
>> your
>> >      > reference. I would like to generate the value of proportion with
>> >     a SE using
>> >      > a 1000 times randomization.
>> >      >
>> >      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L,
>> 16L,
>> >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
>> >     = c("id1",
>> >      > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
>> >      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
>> >      > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
>> >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 =
>> c(1L,
>> >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
>> >      > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
>> >     "data.frame",
>> >      > row.names = c(NA,
>> >      > -19L))
>> >      > daT<-data.frame(dat %>%
>> >      >    mutate(Time1.but.not.in.Time2 = case_when(
>> >      >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
>> >      > Time2.but.not.in.Time1 = case_when(
>> >      >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
>> >      >   BothTimes = case_when(
>> >      >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
>> >      >   daT
>> >      >   summary(daT)
>> >      >
>> >      > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>> >      > "BothTimes")
>> >      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
>> >      > summary(daT)
>> >      > ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1,
>> na.rm=T)
>> >      > ProportionValue
>> >      > standard error??
>> >      >
>> >      >       [[alternative HTML version deleted]]
>> >      >
>> >      > ______________________________________________
>> >      > [hidden email] <mailto:[hidden email]> mailing list
>> >     -- To UNSUBSCRIBE and more, see
>> >      > https://stat.ethz.ch/mailman/listinfo/r-help
>> >      > PLEASE do read the posting guide
>> >     http://www.R-project.org/posting-guide.html
>> >      > and provide commented, minimal, self-contained, reproducible
>> code.
>> >      >
>> >
>>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to generate SE for the proportion value using a randomization process in R?

Rui Barradas
Hello,

I don't know why you would need to see the indices but rewrite the
function bootprop as

bootprop_ind <- function(data, index){
   d <- data[index, ]
   #sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
   index
}


and call in the same way. It will now return a matrix of indices with R
= 1000 rows and 19 columns.

Hope this helps,

Rui Barradas


Às 19:29 de 28/01/21, Marna Wagley escreveu:

> Hi Rui,
> I am sorry for asking you several questions.
>
> In the given example, randomizations (reshuffle) were done 1000 times,
> and its 1000 proportion values (results) are stored and it can be seen
> using b$t; but I was wondering how the table was randomized (which rows
> have been missed/or repeated in each randomizing procedure?).
>
> Is there any way we can see the randomized table and its associated
> results? Here in this example, I randomized (or bootstrapped) the table
> into three times (R=3) so I would like to store these three tables and
> look at them later to know which rows were repeated/missed. Is there any
> possibility?
> The example data and the code is given below.
>
> Thank you for your help.
>
> ####
> library(boot)
> dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
> "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
> row.names = c(NA,
> -19L))
> daT<-data.frame(dat %>%
>    mutate(Time1.but.not.in.Time2 = case_when(
>              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> Time2.but.not.in.Time1 = case_when(
>              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
>   BothTimes = case_when(
>              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
> cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> "BothTimes")
> daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> summary(daT)
>
> bootprop <- function(data, index){
>     d <- data[index, ]
>     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> }
>
> R <- 3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0     # original
> b$t
> sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> hist(b$t, freq = FALSE)
>
> str(b)
> b$data
> b$seed
> b$sim
> b$strata
> ################
>
>
> On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Yes Rui, I can see we don't need to divide by square root of sample
>     size. The example is great to understand it.
>     Thank you.
>     Marna
>
>
>     On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         Hello,
>
>         Inline.
>
>         Às 07:47 de 23/01/21, Marna Wagley escreveu:
>          > Dear Rui,
>          > I was wondering whether we have to square root of SD to find
>         SE, right?
>
>         No, we don't. var already divides by n, don't divide again.
>         This is the code, that can be seen by running the function name
>         at a
>         command line.
>
>
>         sd
>         #function (x, na.rm = FALSE)
>         #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
>         #    na.rm = na.rm))
>         #<bytecode: 0x55f3ce900848>
>         #<environment: namespace:stats>
>
>
>
>          >
>          > bootprop <- function(data, index){
>          >     d <- data[index, ]
>          >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
>         na.rm = TRUE)
>          > }
>          >
>          > R <- 1e3
>          > set.seed(2020)
>          > b <- boot(daT, bootprop, R)
>          > b
>          > b$t0     # original
>          > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
>          > sd(b$t)/sqrt(1000)
>          > pandit*(1-pandit)
>          >
>          > hist(b$t, freq = FALSE)
>
>
>         Try plotting the normal densities for both cases, the red line is
>         clearly wrong.
>
>
>         f <- function(x, xbar, s){
>             dnorm(x, mean = xbar, sd = s)
>         }
>
>         hist(b$t, freq = FALSE)
>         curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
>         "blue",
>         add = TRUE)
>         curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1,
>         col =
>         "red", add = TRUE)
>
>
>         Hope this helps,
>
>         Rui Barradas
>
>          >
>          >
>          >
>          >
>          > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
>         <[hidden email] <mailto:[hidden email]>
>          > <mailto:[hidden email] <mailto:[hidden email]>>>
>         wrote:
>          >
>          >     Hello,
>          >
>          >     Something like this, using base package boot?
>          >
>          >
>          >     library(boot)
>          >
>          >     bootprop <- function(data, index){
>          >         d <- data[index, ]
>          >         sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
>         na.rm = TRUE)
>          >     }
>          >
>          >     R <- 1e3
>          >     set.seed(2020)
>          >     b <- boot(daT, bootprop, R)
>          >     b
>          >     b$t0     # original
>          >     sd(b$t)  # bootstrapped estimate of the SE of the sample
>         prop.
>          >     hist(b$t, freq = FALSE)
>          >
>          >
>          >     Hope this helps,
>          >
>          >     Rui Barradas
>          >
>          >     Às 21:57 de 22/01/21, Marna Wagley escreveu:
>          >      > Hi All,
>          >      > I was trying to estimate standard error (SE) for the
>         proportion
>          >     value using
>          >      > some kind of randomization process (bootstrapping or
>         jackknifing)
>          >     in R, but
>          >      > I could not figure it out.
>          >      >
>          >      > Is there any way to generate SE for the proportion?
>          >      >
>          >      > The example of the data and the code I am using is
>         attached for your
>          >      > reference. I would like to generate the value of
>         proportion with
>          >     a SE using
>          >      > a 1000 times randomization.
>          >      >
>          >      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L,
>         14L, 15L, 16L,
>          >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
>         11L), .Label
>          >     = c("id1",
>          >      > "id10", "id11", "id12", "id13", "id14", "id15",
>         "id16", "id17",
>          >      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
>         "id7", "id8",
>          >      > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L,
>         0L, 0L,
>          >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L),
>         Time2 = c(1L,
>          >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
>         0L, 1L, 0L,
>          >      > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
>          >     "data.frame",
>          >      > row.names = c(NA,
>          >      > -19L))
>          >      > daT<-data.frame(dat %>%
>          >      >    mutate(Time1.but.not.in.Time2 = case_when(
>          >      >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
>          >      > Time2.but.not.in.Time1 = case_when(
>          >      >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
>          >      >   BothTimes = case_when(
>          >      >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
>          >      >   daT
>          >      >   summary(daT)
>          >      >
>          >      > cols.num <-
>         c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>          >      > "BothTimes")
>          >      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
>          >      > summary(daT)
>          >      > ProportionValue<-sum(daT$BothTimes,
>         na.rm=T)/sum(daT$Time1, na.rm=T)
>          >      > ProportionValue
>          >      > standard error??
>          >      >
>          >      >       [[alternative HTML version deleted]]
>          >      >
>          >      > ______________________________________________
>          >      > [hidden email] <mailto:[hidden email]>
>         <mailto:[hidden email] <mailto:[hidden email]>>
>         mailing list
>          >     -- To UNSUBSCRIBE and more, see
>          >      > https://stat.ethz.ch/mailman/listinfo/r-help
>          >      > PLEASE do read the posting guide
>          > http://www.R-project.org/posting-guide.html
>          >      > and provide commented, minimal, self-contained,
>         reproducible code.
>          >      >
>          >
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to generate SE for the proportion value using a randomization process in R?

Marna Wagley
Thank you Rui,
This is great. How about the following?

SimilatedData<-boot.array(b, indices=T)

seems it is giving the rows ID which are used in the calculation, isn't it?




On Thu, Jan 28, 2021 at 12:21 PM Rui Barradas <[hidden email]> wrote:

> Hello,
>
> I don't know why you would need to see the indices but rewrite the
> function bootprop as
>
> bootprop_ind <- function(data, index){
>    d <- data[index, ]
>    #sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
>    index
> }
>
>
> and call in the same way. It will now return a matrix of indices with R
> = 1000 rows and 19 columns.
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 19:29 de 28/01/21, Marna Wagley escreveu:
> > Hi Rui,
> > I am sorry for asking you several questions.
> >
> > In the given example, randomizations (reshuffle) were done 1000 times,
> > and its 1000 proportion values (results) are stored and it can be seen
> > using b$t; but I was wondering how the table was randomized (which rows
> > have been missed/or repeated in each randomizing procedure?).
> >
> > Is there any way we can see the randomized table and its associated
> > results? Here in this example, I randomized (or bootstrapped) the table
> > into three times (R=3) so I would like to store these three tables and
> > look at them later to know which rows were repeated/missed. Is there any
> > possibility?
> > The example data and the code is given below.
> >
> > Thank you for your help.
> >
> > ####
> > library(boot)
> > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label =
> c("id1",
> > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
> > row.names = c(NA,
> > -19L))
> > daT<-data.frame(dat %>%
> >    mutate(Time1.but.not.in.Time2 = case_when(
> >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> > Time2.but.not.in.Time1 = case_when(
> >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
> >   BothTimes = case_when(
> >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
> > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> > "BothTimes")
> > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> > summary(daT)
> >
> > bootprop <- function(data, index){
> >     d <- data[index, ]
> >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
> > }
> >
> > R <- 3
> > set.seed(2020)
> > b <- boot(daT, bootprop, R)
> > b
> > b$t0     # original
> > b$t
> > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> > hist(b$t, freq = FALSE)
> >
> > str(b)
> > b$data
> > b$seed
> > b$sim
> > b$strata
> > ################
> >
> >
> > On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <[hidden email]
> > <mailto:[hidden email]>> wrote:
> >
> >     Yes Rui, I can see we don't need to divide by square root of sample
> >     size. The example is great to understand it.
> >     Thank you.
> >     Marna
> >
> >
> >     On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <[hidden email]
> >     <mailto:[hidden email]>> wrote:
> >
> >         Hello,
> >
> >         Inline.
> >
> >         Às 07:47 de 23/01/21, Marna Wagley escreveu:
> >          > Dear Rui,
> >          > I was wondering whether we have to square root of SD to find
> >         SE, right?
> >
> >         No, we don't. var already divides by n, don't divide again.
> >         This is the code, that can be seen by running the function name
> >         at a
> >         command line.
> >
> >
> >         sd
> >         #function (x, na.rm = FALSE)
> >         #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
> >         #    na.rm = na.rm))
> >         #<bytecode: 0x55f3ce900848>
> >         #<environment: namespace:stats>
> >
> >
> >
> >          >
> >          > bootprop <- function(data, index){
> >          >     d <- data[index, ]
> >          >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
> >         na.rm = TRUE)
> >          > }
> >          >
> >          > R <- 1e3
> >          > set.seed(2020)
> >          > b <- boot(daT, bootprop, R)
> >          > b
> >          > b$t0     # original
> >          > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
> >          > sd(b$t)/sqrt(1000)
> >          > pandit*(1-pandit)
> >          >
> >          > hist(b$t, freq = FALSE)
> >
> >
> >         Try plotting the normal densities for both cases, the red line is
> >         clearly wrong.
> >
> >
> >         f <- function(x, xbar, s){
> >             dnorm(x, mean = xbar, sd = s)
> >         }
> >
> >         hist(b$t, freq = FALSE)
> >         curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
> >         "blue",
> >         add = TRUE)
> >         curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1,
> >         col =
> >         "red", add = TRUE)
> >
> >
> >         Hope this helps,
> >
> >         Rui Barradas
> >
> >          >
> >          >
> >          >
> >          >
> >          > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
> >         <[hidden email] <mailto:[hidden email]>
> >          > <mailto:[hidden email] <mailto:[hidden email]>>>
> >         wrote:
> >          >
> >          >     Hello,
> >          >
> >          >     Something like this, using base package boot?
> >          >
> >          >
> >          >     library(boot)
> >          >
> >          >     bootprop <- function(data, index){
> >          >         d <- data[index, ]
> >          >         sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
> >         na.rm = TRUE)
> >          >     }
> >          >
> >          >     R <- 1e3
> >          >     set.seed(2020)
> >          >     b <- boot(daT, bootprop, R)
> >          >     b
> >          >     b$t0     # original
> >          >     sd(b$t)  # bootstrapped estimate of the SE of the sample
> >         prop.
> >          >     hist(b$t, freq = FALSE)
> >          >
> >          >
> >          >     Hope this helps,
> >          >
> >          >     Rui Barradas
> >          >
> >          >     Às 21:57 de 22/01/21, Marna Wagley escreveu:
> >          >      > Hi All,
> >          >      > I was trying to estimate standard error (SE) for the
> >         proportion
> >          >     value using
> >          >      > some kind of randomization process (bootstrapping or
> >         jackknifing)
> >          >     in R, but
> >          >      > I could not figure it out.
> >          >      >
> >          >      > Is there any way to generate SE for the proportion?
> >          >      >
> >          >      > The example of the data and the code I am using is
> >         attached for your
> >          >      > reference. I would like to generate the value of
> >         proportion with
> >          >     a SE using
> >          >      > a 1000 times randomization.
> >          >      >
> >          >      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L,
> >         14L, 15L, 16L,
> >          >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
> >         11L), .Label
> >          >     = c("id1",
> >          >      > "id10", "id11", "id12", "id13", "id14", "id15",
> >         "id16", "id17",
> >          >      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
> >         "id7", "id8",
> >          >      > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L,
> >         0L, 0L,
> >          >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L),
> >         Time2 = c(1L,
> >          >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
> >         0L, 1L, 0L,
> >          >      > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"),
> class =
> >          >     "data.frame",
> >          >      > row.names = c(NA,
> >          >      > -19L))
> >          >      > daT<-data.frame(dat %>%
> >          >      >    mutate(Time1.but.not.in.Time2 = case_when(
> >          >      >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
> >          >      > Time2.but.not.in.Time1 = case_when(
> >          >      >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
> >          >      >   BothTimes = case_when(
> >          >      >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
> >          >      >   daT
> >          >      >   summary(daT)
> >          >      >
> >          >      > cols.num <-
> >         c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> >          >      > "BothTimes")
> >          >      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> >          >      > summary(daT)
> >          >      > ProportionValue<-sum(daT$BothTimes,
> >         na.rm=T)/sum(daT$Time1, na.rm=T)
> >          >      > ProportionValue
> >          >      > standard error??
> >          >      >
> >          >      >       [[alternative HTML version deleted]]
> >          >      >
> >          >      > ______________________________________________
> >          >      > [hidden email] <mailto:[hidden email]>
> >         <mailto:[hidden email] <mailto:[hidden email]>>
> >         mailing list
> >          >     -- To UNSUBSCRIBE and more, see
> >          >      > https://stat.ethz.ch/mailman/listinfo/r-help
> >          >      > PLEASE do read the posting guide
> >          > http://www.R-project.org/posting-guide.html
> >          >      > and provide commented, minimal, self-contained,
> >         reproducible code.
> >          >      >
> >          >
> >
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to generate SE for the proportion value using a randomization process in R?

Rui Barradas
Hello,

Yes, sorry for my previous post, I had forgotten about boot.array.
That's a much better solution for your problem.

Rui Barradas

Às 20:29 de 28/01/21, Marna Wagley escreveu:

> Thank you Rui,
> This is great. How about the following?
>
> SimilatedData<-boot.array(b, indices=T)
>
> seems it is giving the rows ID which are used in the calculation, isn't it?
>
>
>
>
> On Thu, Jan 28, 2021 at 12:21 PM Rui Barradas <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hello,
>
>     I don't know why you would need to see the indices but rewrite the
>     function bootprop as
>
>     bootprop_ind <- function(data, index){
>         d <- data[index, ]
>         #sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
>         index
>     }
>
>
>     and call in the same way. It will now return a matrix of indices with R
>     = 1000 rows and 19 columns.
>
>     Hope this helps,
>
>     Rui Barradas
>
>
>     Às 19:29 de 28/01/21, Marna Wagley escreveu:
>      > Hi Rui,
>      > I am sorry for asking you several questions.
>      >
>      > In the given example, randomizations (reshuffle) were done 1000
>     times,
>      > and its 1000 proportion values (results) are stored and it can be
>     seen
>      > using b$t; but I was wondering how the table was randomized
>     (which rows
>      > have been missed/or repeated in each randomizing procedure?).
>      >
>      > Is there any way we can see the randomized table and its associated
>      > results? Here in this example, I randomized (or bootstrapped) the
>     table
>      > into three times (R=3) so I would like to store these three
>     tables and
>      > look at them later to know which rows were repeated/missed. Is
>     there any
>      > possibility?
>      > The example data and the code is given below.
>      >
>      > Thank you for your help.
>      >
>      > ####
>      > library(boot)
>      > dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
>      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
>     = c("id1",
>      > "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
>      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
>      > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
>      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
>      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
>      > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
>     "data.frame",
>      > row.names = c(NA,
>      > -19L))
>      > daT<-data.frame(dat %>%
>      >    mutate(Time1.but.not.in.Time2 = case_when(
>      >              Time1 %in% "1" & Time2 %in% "0"  ~ "1"),
>      > Time2.but.not.in.Time1 = case_when(
>      >              Time1 %in% "0" & Time2 %in% "1"  ~ "1"),
>      >   BothTimes = case_when(
>      >              Time1 %in% "1" & Time2 %in% "1"  ~ "1")))
>      > cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>      > "BothTimes")
>      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
>      > summary(daT)
>      >
>      > bootprop <- function(data, index){
>      >     d <- data[index, ]
>      >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
>     TRUE)
>      > }
>      >
>      > R <- 3
>      > set.seed(2020)
>      > b <- boot(daT, bootprop, R)
>      > b
>      > b$t0     # original
>      > b$t
>      > sd(b$t)  # bootstrapped estimate of the SE of the sample prop.
>      > hist(b$t, freq = FALSE)
>      >
>      > str(b)
>      > b$data
>      > b$seed
>      > b$sim
>      > b$strata
>      > ################
>      >
>      >
>      > On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley
>     <[hidden email] <mailto:[hidden email]>
>      > <mailto:[hidden email] <mailto:[hidden email]>>>
>     wrote:
>      >
>      >     Yes Rui, I can see we don't need to divide by square root of
>     sample
>      >     size. The example is great to understand it.
>      >     Thank you.
>      >     Marna
>      >
>      >
>      >     On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas
>     <[hidden email] <mailto:[hidden email]>
>      >     <mailto:[hidden email] <mailto:[hidden email]>>>
>     wrote:
>      >
>      >         Hello,
>      >
>      >         Inline.
>      >
>      >         Às 07:47 de 23/01/21, Marna Wagley escreveu:
>      >          > Dear Rui,
>      >          > I was wondering whether we have to square root of SD
>     to find
>      >         SE, right?
>      >
>      >         No, we don't. var already divides by n, don't divide again.
>      >         This is the code, that can be seen by running the
>     function name
>      >         at a
>      >         command line.
>      >
>      >
>      >         sd
>      >         #function (x, na.rm = FALSE)
>      >         #sqrt(var(if (is.vector(x) || is.factor(x)) x else
>     as.double(x),
>      >         #    na.rm = na.rm))
>      >         #<bytecode: 0x55f3ce900848>
>      >         #<environment: namespace:stats>
>      >
>      >
>      >
>      >          >
>      >          > bootprop <- function(data, index){
>      >          >     d <- data[index, ]
>      >          >     sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
>      >         na.rm = TRUE)
>      >          > }
>      >          >
>      >          > R <- 1e3
>      >          > set.seed(2020)
>      >          > b <- boot(daT, bootprop, R)
>      >          > b
>      >          > b$t0     # original
>      >          > sd(b$t)  # bootstrapped estimate of the SE of the
>     sample prop.
>      >          > sd(b$t)/sqrt(1000)
>      >          > pandit*(1-pandit)
>      >          >
>      >          > hist(b$t, freq = FALSE)
>      >
>      >
>      >         Try plotting the normal densities for both cases, the red
>     line is
>      >         clearly wrong.
>      >
>      >
>      >         f <- function(x, xbar, s){
>      >             dnorm(x, mean = xbar, sd = s)
>      >         }
>      >
>      >         hist(b$t, freq = FALSE)
>      >         curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
>      >         "blue",
>      >         add = TRUE)
>      >         curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0,
>     to = 1,
>      >         col =
>      >         "red", add = TRUE)
>      >
>      >
>      >         Hope this helps,
>      >
>      >         Rui Barradas
>      >
>      >          >
>      >          >
>      >          >
>      >          >
>      >          > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
>      >         <[hidden email] <mailto:[hidden email]>
>     <mailto:[hidden email] <mailto:[hidden email]>>
>      >          > <mailto:[hidden email]
>     <mailto:[hidden email]> <mailto:[hidden email]
>     <mailto:[hidden email]>>>>
>      >         wrote:
>      >          >
>      >          >     Hello,
>      >          >
>      >          >     Something like this, using base package boot?
>      >          >
>      >          >
>      >          >     library(boot)
>      >          >
>      >          >     bootprop <- function(data, index){
>      >          >         d <- data[index, ]
>      >          >         sum(d[["BothTimes"]], na.rm =
>     TRUE)/sum(d[["Time1"]],
>      >         na.rm = TRUE)
>      >          >     }
>      >          >
>      >          >     R <- 1e3
>      >          >     set.seed(2020)
>      >          >     b <- boot(daT, bootprop, R)
>      >          >     b
>      >          >     b$t0     # original
>      >          >     sd(b$t)  # bootstrapped estimate of the SE of the
>     sample
>      >         prop.
>      >          >     hist(b$t, freq = FALSE)
>      >          >
>      >          >
>      >          >     Hope this helps,
>      >          >
>      >          >     Rui Barradas
>      >          >
>      >          >     Às 21:57 de 22/01/21, Marna Wagley escreveu:
>      >          >      > Hi All,
>      >          >      > I was trying to estimate standard error (SE)
>     for the
>      >         proportion
>      >          >     value using
>      >          >      > some kind of randomization process
>     (bootstrapping or
>      >         jackknifing)
>      >          >     in R, but
>      >          >      > I could not figure it out.
>      >          >      >
>      >          >      > Is there any way to generate SE for the proportion?
>      >          >      >
>      >          >      > The example of the data and the code I am using is
>      >         attached for your
>      >          >      > reference. I would like to generate the value of
>      >         proportion with
>      >          >     a SE using
>      >          >      > a 1000 times randomization.
>      >          >      >
>      >          >      > dat<-structure(list(Sample = structure(c(1L,
>     12L, 13L,
>      >         14L, 15L, 16L,
>      >          >      > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
>      >         11L), .Label
>      >          >     = c("id1",
>      >          >      > "id10", "id11", "id12", "id13", "id14", "id15",
>      >         "id16", "id17",
>      >          >      > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
>      >         "id7", "id8",
>      >          >      > "id9"), class = "factor"), Time1 = c(0L, 1L,
>     1L, 1L,
>      >         0L, 0L,
>      >          >      > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
>     0L),
>      >         Time2 = c(1L,
>      >          >      > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
>      >         0L, 1L, 0L,
>      >          >      > 1L, 1L)), .Names = c("Sample", "Time1",
>     "Time2"), class =
>      >          >     "data.frame",
>      >          >      > row.names = c(NA,
>      >          >      > -19L))
>      >          >      > daT<-data.frame(dat %>%
>      >          >      >    mutate(Time1.but.not.in.Time2 = case_when(
>      >          >      >              Time1 %in% "1" & Time2 %in% "0"  ~
>     "1"),
>      >          >      > Time2.but.not.in.Time1 = case_when(
>      >          >      >              Time1 %in% "0" & Time2 %in% "1"  ~
>     "1"),
>      >          >      >   BothTimes = case_when(
>      >          >      >              Time1 %in% "1" & Time2 %in% "1"  ~
>     "1")))
>      >          >      >   daT
>      >          >      >   summary(daT)
>      >          >      >
>      >          >      > cols.num <-
>      >         c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>      >          >      > "BothTimes")
>      >          >      > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
>      >          >      > summary(daT)
>      >          >      > ProportionValue<-sum(daT$BothTimes,
>      >         na.rm=T)/sum(daT$Time1, na.rm=T)
>      >          >      > ProportionValue
>      >          >      > standard error??
>      >          >      >
>      >          >      >       [[alternative HTML version deleted]]
>      >          >      >
>      >          >      > ______________________________________________
>      >          >      > [hidden email]
>     <mailto:[hidden email]> <mailto:[hidden email]
>     <mailto:[hidden email]>>
>      >         <mailto:[hidden email]
>     <mailto:[hidden email]> <mailto:[hidden email]
>     <mailto:[hidden email]>>>
>      >         mailing list
>      >          >     -- To UNSUBSCRIBE and more, see
>      >          >      > https://stat.ethz.ch/mailman/listinfo/r-help
>      >          >      > PLEASE do read the posting guide
>      >          > http://www.R-project.org/posting-guide.html
>      >          >      > and provide commented, minimal, self-contained,
>      >         reproducible code.
>      >          >      >
>      >          >
>      >
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.