Error Running Bootstrap Function within Wrapper Function

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Error Running Bootstrap Function within Wrapper Function

Kevin Egan
Hello,

I am currently trying to solve a problem with the boot package and writing
a function within a function in R. I have developed several functions to
perform the lasso but continue to receive an error when bootstrapping these
functions within a wrapper function. When I perform these methods using
tsboot outside the wrapper function, I do not get an error. However, when
placed within my function I continue to get the error "Error in t[r, ] <-
res[[r]] : number of items to replace is not a multiple of length."

I've attached an example of my functions as well as a data file of the data
I am using. I'm sorry the file is so large, but I do not get a problem with
a smaller number of observations.


library(boot)
library(glmnet)
library(np)
foo1 <- function(data,index){ #index is the bootstrap sample index
  x <- data[index, -1] %>%
    as.matrix() %>%
    unname()
  y <- data[index, 1] %>%
    scale(center = TRUE, scale = FALSE) %>%
    as.matrix() %>%
    unname()
  ols <- lm(y ~ x)
  # The intercept estimate should be dropped.
  ols.coef <- as.numeric(coef(ols))[-1]
  ols.coef[is.na(ols.coef)] <- 0
  ## The intercept estimate should be dropped.
  lasso <- cv.glmnet(x, y, alpha = 1,
                     penalty.factor = 1 / abs(ols.coef))
  # Select nonzero coefficients from bic.out
  coef <- as.vector(coef(lasso,
                         s = lasso$lambda.min))[-1]
  return(coef)
}
foo2 <- function(data, index){ #index is the bootstrap sample index
  x <- data[index, -1] %>%
    as.matrix() %>%
    unname()
  y <- data[index, 1] %>%
    scale(center = TRUE, scale = FALSE) %>%
    as.matrix() %>%
    unname()
  # ic.glmnet provides coefficients with lowest BIC
  ols <- lm(y ~ x)
  # The intercept estimate should be dropped.
  ols.coef <- as.numeric(coef(ols))[-1]
  ols.coef[is.na(ols.coef)] <- 0
  lasso <- cv.glmnet(x, y, alpha = 1,
                     penalty.factor = 1 / abs(ols.coef))
  # Select nonzero coefficients from bic.out
  coef <- as.vector(coef(lasso,
                         s = lasso$lambda.min))[-1]
  coef_nonzero <- coef != 0
  if(sum(coef_nonzero) > 0) {
    ls_obj <- lm(y ~ x[, coef_nonzero, drop = FALSE])
    ls_coef <- as.vector(coef(ls_obj))[-1]
    coef[coef_nonzero] <- ls_coef
  }
  return(coef)
}
foo3 <- function(data, num_samples) {
  bstar <- b.star(data[, 1], round = TRUE)
  # Select Block Length of circular block result
  blocklength <- bstar[, 2]
  init_boot_ts <- tsboot(tseries = data,
                         statistic = foo1,
                         R = num_samples, l = blocklength,
                         sim = "fixed")
  final_boot_ts <- tsboot(tseries = data,
                          statistic = foo2,
                          R = num_samples,
                          l = blocklength, sim = "fixed")
  # point estimates
  final_boot_t0 <- final_boot_ts$t0
  return(list(point_estimates = final_boot_t0))
}
num_samples <- 50
test.foo3 <- foo3(data, num_samples = num_samples)

Which *sometimes *works, however, sometimes I get the error: “Error in t[r,
] <- res[[r]] : number of items to replace is not a multiple of length".
I've also gotten the error "Error in x[, coef_nonzero, drop = FALSE]
:(subscript) logical subscript too long" at times, when running foo2 within
foo3. Which seems to be unclear as there should never be an index which is
greater than the number of columns in x.
As I stated, this error particularly occurs when I increase the number of
bootstrap samples I try to run, and only when I run foo3. When I run foo1
and foo2 separately, I don’t get an error at all. I’m wondering if there is
something I need to add to my function “foo3” to ensure that an error like
this doesn’t occur since I am calling several other functions within this
function, ie., tsboot, foo1, and foo2.

Lastly, although I have not provided an example here, I get an error for
this code when running with lapply for several dataframes similar to the
attached.

Again, I apologise for attaching such a large data frame but the function
seems to work with fewer observations. Perhaps it is a data issue?

Thanks
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Error Running Bootstrap Function within Wrapper Function

Stephen Ellison
Kevin,

I didn't have the data set (you might want to post a link to a downloadable file instead), but on a quick look at the code your foo1 function looks as if it is not guaranteed to return an array of the same length each time. It's testing for nonzero coefs in a fitted model and then dropping exact zero coefs. That need not (and often will not) return the same coefficients every time in a simulation.  foo2 does the same general kind of thing.

Could that be part of the problem?

S Ellison


> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Kevin Egan
> Sent: 05 January 2021 12:26
> To: r-help <[hidden email]>
> Subject: [R] Error Running Bootstrap Function within Wrapper Function
>
> ===============
>  EXTERNAL EMAIL
> ===============
>
> Hello,
>
> I am currently trying to solve a problem with the boot package and writing a
> function within a function in R. I have developed several functions to
> perform the lasso but continue to receive an error when bootstrapping these
> functions within a wrapper function. When I perform these methods using
> tsboot outside the wrapper function, I do not get an error. However, when
> placed within my function I continue to get the error "Error in t[r, ] <- res[[r]]
> : number of items to replace is not a multiple of length."
>
> I've attached an example of my functions as well as a data file of the data I
> am using. I'm sorry the file is so large, but I do not get a problem with a
> smaller number of observations.
>
>
> library(boot)
> library(glmnet)
> library(np)
> foo1 <- function(data,index){ #index is the bootstrap sample index
>   x <- data[index, -1] %>%
>     as.matrix() %>%
>     unname()
>   y <- data[index, 1] %>%
>     scale(center = TRUE, scale = FALSE) %>%
>     as.matrix() %>%
>     unname()
>   ols <- lm(y ~ x)
>   # The intercept estimate should be dropped.
>   ols.coef <- as.numeric(coef(ols))[-1]
>   ols.coef[is.na(ols.coef)] <- 0
>   ## The intercept estimate should be dropped.
>   lasso <- cv.glmnet(x, y, alpha = 1,
>                      penalty.factor = 1 / abs(ols.coef))
>   # Select nonzero coefficients from bic.out
>   coef <- as.vector(coef(lasso,
>                          s = lasso$lambda.min))[-1]
>   return(coef)
> }
> foo2 <- function(data, index){ #index is the bootstrap sample index
>   x <- data[index, -1] %>%
>     as.matrix() %>%
>     unname()
>   y <- data[index, 1] %>%
>     scale(center = TRUE, scale = FALSE) %>%
>     as.matrix() %>%
>     unname()
>   # ic.glmnet provides coefficients with lowest BIC
>   ols <- lm(y ~ x)
>   # The intercept estimate should be dropped.
>   ols.coef <- as.numeric(coef(ols))[-1]
>   ols.coef[is.na(ols.coef)] <- 0
>   lasso <- cv.glmnet(x, y, alpha = 1,
>                      penalty.factor = 1 / abs(ols.coef))
>   # Select nonzero coefficients from bic.out
>   coef <- as.vector(coef(lasso,
>                          s = lasso$lambda.min))[-1]
>   coef_nonzero <- coef != 0
>   if(sum(coef_nonzero) > 0) {
>     ls_obj <- lm(y ~ x[, coef_nonzero, drop = FALSE])
>     ls_coef <- as.vector(coef(ls_obj))[-1]
>     coef[coef_nonzero] <- ls_coef
>   }
>   return(coef)
> }
> foo3 <- function(data, num_samples) {
>   bstar <- b.star(data[, 1], round = TRUE)
>   # Select Block Length of circular block result
>   blocklength <- bstar[, 2]
>   init_boot_ts <- tsboot(tseries = data,
>                          statistic = foo1,
>                          R = num_samples, l = blocklength,
>                          sim = "fixed")
>   final_boot_ts <- tsboot(tseries = data,
>                           statistic = foo2,
>                           R = num_samples,
>                           l = blocklength, sim = "fixed")
>   # point estimates
>   final_boot_t0 <- final_boot_ts$t0
>   return(list(point_estimates = final_boot_t0)) } num_samples <- 50
> test.foo3 <- foo3(data, num_samples = num_samples)
>
> Which *sometimes *works, however, sometimes I get the error: “Error in t[r,
> ] <- res[[r]] : number of items to replace is not a multiple of length".
> I've also gotten the error "Error in x[, coef_nonzero, drop = FALSE]
> :(subscript) logical subscript too long" at times, when running foo2 within
> foo3. Which seems to be unclear as there should never be an index which is
> greater than the number of columns in x.
> As I stated, this error particularly occurs when I increase the number of
> bootstrap samples I try to run, and only when I run foo3. When I run foo1
> and foo2 separately, I don’t get an error at all. I’m wondering if there is
> something I need to add to my function “foo3” to ensure that an error like
> this doesn’t occur since I am calling several other functions within this
> function, ie., tsboot, foo1, and foo2.
>
> Lastly, although I have not provided an example here, I get an error for this
> code when running with lapply for several dataframes similar to the
> attached.
>
> Again, I apologise for attaching such a large data frame but the function
> seems to work with fewer observations. Perhaps it is a data issue?
>
> Thanks
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ==========================================================
> ====================================
> WARNING - EXTERNAL: This email originated from outside of LGC. Do not click
> any links or open any attachments unless you trust the sender and know that
> the content is safe
> ==========================================================
> ====================================


*******************************************************************
This email and any attachments are confidential. Any use, copying or
disclosure other than by the intended recipient is unauthorised. If
you have received this message in error, please notify the sender
immediately via +44(0)20 8943 7000 or notify [hidden email]
and delete this message and any copies from your computer and network.
LGC Limited. Registered in England 2991879.
Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Error Running Bootstrap Function within Wrapper Function

Kevin Egan
Hi Stephen,

Thanks for the advice, from my understanding, the function foo1 should
provide a vector of coefficients which are the same length as the columns
in the data frame provided. This is based on the results from glmnet. With
foo2, this should be the same case, I've provided an if statement that
performs OLS if the sum of coef_nonzero is greater than 0. Meaning there
should be at least one independent variable from the original data set to
perform OLS since it's coefficient from glmnet was nonzero.

I previously have not had any issues with this method for other tests I
have run, it seems to only occur as this data increases in observations.
Would it be possible for you to look at the attached file? I provide a new
foo1 function that generates the data I am using, the same 3 functions I
sent previously, and the function I am using for glmnet. I'm still not sure
why, but as the number of bootstrap samples increases, I get an error with
my methods, particularly with foo3.

At this point, I think the problem could be with the wrapper function or
with the data itself.

I appreciate any help I can get at this point. I've been trying to debug
this function for almost a month.

Thanks,

Kevin


https://www.dropbox.com/s/1sdidrnmzrqmukt/r-help%20example.R?dl=0

On Wed, Jan 6, 2021 at 12:48 PM Stephen Ellison <[hidden email]>
wrote:

> Kevin,
>
> I didn't have the data set (you might want to post a link to a
> downloadable file instead), but on a quick look at the code your foo1
> function looks as if it is not guaranteed to return an array of the same
> length each time. It's testing for nonzero coefs in a fitted model and then
> dropping exact zero coefs. That need not (and often will not) return the
> same coefficients every time in a simulation.  foo2 does the same general
> kind of thing.
>
> Could that be part of the problem?
>
> S Ellison
>
>
> > -----Original Message-----
> > From: R-help <[hidden email]> On Behalf Of Kevin Egan
> > Sent: 05 January 2021 12:26
> > To: r-help <[hidden email]>
> > Subject: [R] Error Running Bootstrap Function within Wrapper Function
> >
> > ===============
> >  EXTERNAL EMAIL
> > ===============
> >
> > Hello,
> >
> > I am currently trying to solve a problem with the boot package and
> writing a
> > function within a function in R. I have developed several functions to
> > perform the lasso but continue to receive an error when bootstrapping
> these
> > functions within a wrapper function. When I perform these methods using
> > tsboot outside the wrapper function, I do not get an error. However, when
> > placed within my function I continue to get the error "Error in t[r, ]
> <- res[[r]]
> > : number of items to replace is not a multiple of length."
> >
> > I've attached an example of my functions as well as a data file of the
> data I
> > am using. I'm sorry the file is so large, but I do not get a problem
> with a
> > smaller number of observations.
> >
> >
> > library(boot)
> > library(glmnet)
> > library(np)
> > foo1 <- function(data,index){ #index is the bootstrap sample index
> >   x <- data[index, -1] %>%
> >     as.matrix() %>%
> >     unname()
> >   y <- data[index, 1] %>%
> >     scale(center = TRUE, scale = FALSE) %>%
> >     as.matrix() %>%
> >     unname()
> >   ols <- lm(y ~ x)
> >   # The intercept estimate should be dropped.
> >   ols.coef <- as.numeric(coef(ols))[-1]
> >   ols.coef[is.na(ols.coef)] <- 0
> >   ## The intercept estimate should be dropped.
> >   lasso <- cv.glmnet(x, y, alpha = 1,
> >                      penalty.factor = 1 / abs(ols.coef))
> >   # Select nonzero coefficients from bic.out
> >   coef <- as.vector(coef(lasso,
> >                          s = lasso$lambda.min))[-1]
> >   return(coef)
> > }
> > foo2 <- function(data, index){ #index is the bootstrap sample index
> >   x <- data[index, -1] %>%
> >     as.matrix() %>%
> >     unname()
> >   y <- data[index, 1] %>%
> >     scale(center = TRUE, scale = FALSE) %>%
> >     as.matrix() %>%
> >     unname()
> >   # ic.glmnet provides coefficients with lowest BIC
> >   ols <- lm(y ~ x)
> >   # The intercept estimate should be dropped.
> >   ols.coef <- as.numeric(coef(ols))[-1]
> >   ols.coef[is.na(ols.coef)] <- 0
> >   lasso <- cv.glmnet(x, y, alpha = 1,
> >                      penalty.factor = 1 / abs(ols.coef))
> >   # Select nonzero coefficients from bic.out
> >   coef <- as.vector(coef(lasso,
> >                          s = lasso$lambda.min))[-1]
> >   coef_nonzero <- coef != 0
> >   if(sum(coef_nonzero) > 0) {
> >     ls_obj <- lm(y ~ x[, coef_nonzero, drop = FALSE])
> >     ls_coef <- as.vector(coef(ls_obj))[-1]
> >     coef[coef_nonzero] <- ls_coef
> >   }
> >   return(coef)
> > }
> > foo3 <- function(data, num_samples) {
> >   bstar <- b.star(data[, 1], round = TRUE)
> >   # Select Block Length of circular block result
> >   blocklength <- bstar[, 2]
> >   init_boot_ts <- tsboot(tseries = data,
> >                          statistic = foo1,
> >                          R = num_samples, l = blocklength,
> >                          sim = "fixed")
> >   final_boot_ts <- tsboot(tseries = data,
> >                           statistic = foo2,
> >                           R = num_samples,
> >                           l = blocklength, sim = "fixed")
> >   # point estimates
> >   final_boot_t0 <- final_boot_ts$t0
> >   return(list(point_estimates = final_boot_t0)) } num_samples <- 50
> > test.foo3 <- foo3(data, num_samples = num_samples)
> >
> > Which *sometimes *works, however, sometimes I get the error: “Error in
> t[r,
> > ] <- res[[r]] : number of items to replace is not a multiple of length".
> > I've also gotten the error "Error in x[, coef_nonzero, drop = FALSE]
> > :(subscript) logical subscript too long" at times, when running foo2
> within
> > foo3. Which seems to be unclear as there should never be an index which
> is
> > greater than the number of columns in x.
> > As I stated, this error particularly occurs when I increase the number of
> > bootstrap samples I try to run, and only when I run foo3. When I run foo1
> > and foo2 separately, I don’t get an error at all. I’m wondering if there
> is
> > something I need to add to my function “foo3” to ensure that an error
> like
> > this doesn’t occur since I am calling several other functions within this
> > function, ie., tsboot, foo1, and foo2.
> >
> > Lastly, although I have not provided an example here, I get an error for
> this
> > code when running with lapply for several dataframes similar to the
> > attached.
> >
> > Again, I apologise for attaching such a large data frame but the function
> > seems to work with fewer observations. Perhaps it is a data issue?
> >
> > Thanks
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> > ==========================================================
> > ====================================
> > WARNING - EXTERNAL: This email originated from outside of LGC. Do not
> click
> > any links or open any attachments unless you trust the sender and know
> that
> > the content is safe
> > ==========================================================
> > ====================================
>
>
> *******************************************************************
> This email and any attachments are confidential. Any u...{{dropped:12}}

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.