# Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?

7 messages

## Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?

Hello,

I'm trying to understand how to use the pbo package by working through its vignette, and I'm curious about the part of the vignette that creates simulated returns data. The package author transforms his simulated returns in a way I'm unfamiliar with, and I haven't found an explanation after searching around. I'd like to know whether I need to replicate the transformation with real returns.

For context, here is the vignette code, cleaned up a bit to make it reproducible (full vignette: https://cran.r-project.org/web/packages/pbo/vignettes/pbo.html):

```r
library(pbo)

# First, we assemble the trials into a TxN matrix where each column
# represents a trial and each trial has the same length T. This example
# is random data, so the backtest should be overfit.
set.seed(765)
n <- 100
t <- 2400
m <- data.frame(matrix(rnorm(n*t), nrow=t, ncol=n,
                       dimnames=list(1:t, 1:n)), check.names=FALSE)

sr_base <- 0
mu_base <- sr_base/(252.0)
sigma_base <- 1.00/(252.0)**0.5
for ( i in 1:n ) {
  m[,i] = m[,i] * sigma_base / sd(m[,i]) # re-scale
  m[,i] = m[,i] + mu_base - mean(m[,i]) # re-center
}

# We can use any performance evaluation function that can work with the
# reassembled sub-matrices during the cross validation iterations.
# Following the original paper we can use the Sharpe ratio as
sharpe <- function(x, rf=0.03/252) {
  sr <- apply(x, 2, function(col) {
    er = col - rf
    return(mean(er)/sd(er))
  })
  return(sr)
}

# Now that we have the trials matrix we can pass it to the pbo function
# for analysis.
my_pbo <- pbo(m, s=8, f=sharpe, threshold=0)
summary(my_pbo)
```

Here's the portion I'm curious about:

```r
sr_base <- 0
mu_base <- sr_base/(252.0)
sigma_base <- 1.00/(252.0)**0.5
for ( i in 1:n ) {
  m[,i] = m[,i] * sigma_base / sd(m[,i]) # re-scale
  m[,i] = m[,i] + mu_base - mean(m[,i]) # re-center
}
```

Why is the data transformed within the for loop, and does this kind of re-scaling and re-centering need to be done with real returns? Or is the author just doing this to make his simulated returns look more like the real thing?

Googling turned up some articles about scaling volatility by the square root of time, but the scaling in this code doesn't look quite like what I've seen. The re-scalings I've seen multiply a short-term (e.g. daily) measure of volatility by the square root of time, and this isn't quite that. Also, the package documentation (https://cran.r-project.org/web/packages/pbo/pbo.pdf) doesn't include this re-scaling and re-centering code.

So:

- Why is the data transformed in this way, and what is the result of this transformation?
- Is it only necessary for this simulated data, or do I need to similarly transform real returns?

I read in the posting guide that stats questions are acceptable under certain conditions; I hope this counts.

Thanks for reading,
-Joe

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
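A minimal base-R sketch of what the loop does to each column (it does not need pbo installed). It re-runs the vignette's simulation with a hypothetical smaller `n` of 5 trials, then measures the columns. Re-scaling forces every column's sample standard deviation to exactly `sigma_base = 1/sqrt(252)`, and re-centering then forces every sample mean to exactly `mu_base = sr_base/252` without changing the sd. Read through the square-root-of-time rule in reverse (assuming 252 trading days), that is a daily volatility whose annualised value is 1.00 and an annualised Sharpe ratio of exactly `sr_base`. The last lines apply the vignette's `sharpe()` to the full sample: since every column now shares the same mean and sd, every trial gets the same value, so differences between trials only show up inside the sub-matrices that `pbo()` builds during cross-validation.

```r
# Sketch in base R: reproduce the vignette's simulation and inspect what
# the re-scale / re-center loop does to each column.
set.seed(765)
n <- 5      # hypothetical: 5 trials instead of the vignette's 100
t <- 2400   # same series length as the vignette

m <- data.frame(matrix(rnorm(n * t), nrow = t, ncol = n,
                       dimnames = list(1:t, 1:n)),
                check.names = FALSE)

sr_base    <- 0                # target annualised Sharpe ratio
mu_base    <- sr_base / 252    # target daily mean
sigma_base <- 1 / sqrt(252)    # target daily sd, same as 1.00/(252.0)**0.5

for (i in 1:n) {
  m[, i] <- m[, i] * sigma_base / sd(m[, i])  # column sd is now sigma_base
  m[, i] <- m[, i] + mu_base - mean(m[, i])   # mean is now mu_base; sd unchanged
}

daily_sd   <- sapply(m, sd)
daily_mean <- sapply(m, mean)

# Square-root-of-time rule (252 trading days assumed):
annual_vol <- daily_sd * sqrt(252)                # annualised volatility
annual_sr  <- daily_mean / daily_sd * sqrt(252)   # annualised Sharpe, rf = 0

print(round(annual_vol, 10))  # 1 for every column
print(round(annual_sr, 10))   # 0 (= sr_base) for every column

# The vignette's sharpe() on the full sample (daily, rf = 0.03/252).
sharpe <- function(x, rf = 0.03 / 252) {
  apply(x, 2, function(col) {
    er <- col - rf
    mean(er) / sd(er)
  })
}
full_sr <- sharpe(as.matrix(m))
# Identical for every trial: (0 - 0.03/252) / (1/sqrt(252)) = -0.03/sqrt(252)
print(all.equal(unname(full_sr), rep(-0.03 / sqrt(252), n)))  # TRUE
```

Setting `sr_base <- 1` instead would, by the same arithmetic, give every column an annualised Sharpe ratio of 1 while leaving the annualised volatility at 1.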

## Re: Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?

Wrong list. Post on r-sig-finance instead.

Cheers,
Bert

On Nov 20, 2017 11:25 PM, "Joe O" <[hidden email]> wrote:

> Hello, I'm trying to understand how to use the pbo package by looking at a vignette. [...]

## Re: Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?


## Re: Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?
