# Vectorizing a for-loop for cross-validation in R

I'm trying to speed up a script that otherwise takes days to handle larger data sets. Is there a way to completely vectorize or parallelize the following script?

```r
# k-fold cross-validation
df <- trees  # the 'trees' data frame that ships with R
df <- df[sample(nrow(df)), ]  # randomly shuffle the rows
k <- 10  # number of folds; k = nrow(df) gives leave-one-out cross-validation
folds <- cut(seq(from=1, to=nrow(df)), breaks=k, labels=FALSE)  # fold numbers for k equally sized folds
df$ID <- folds  # add fold IDs
df[paste("pred", 1:3, sep="")] <- NA  # pre-allocate columns "pred1", "pred2", "pred3" to speed up the loop below

library(mgcv)
for(i in 1:k) {
  # fit the candidate models on all folds except fold i:
  m1 <- gam(Volume ~ s(Height), data=df, subset=(ID != i))
  m2 <- gam(Volume ~ s(Girth), data=df, subset=(ID != i))
  m3 <- gam(Volume ~ s(Girth) + s(Height), data=df, subset=(ID != i))
  # predict the held-out fold i:
  df[df$ID==i, "pred1"] <- predict(m1, df[df$ID==i, ], type="response")
  df[df$ID==i, "pred2"] <- predict(m2, df[df$ID==i, ], type="response")
  df[df$ID==i, "pred3"] <- predict(m3, df[df$ID==i, ], type="response")
}

# calculating residuals:
df$res1 <- with(df, Volume - pred1)
df$res2 <- with(df, Volume - pred2)
df$res3 <- with(df, Volume - pred3)

Model <- paste("m", 1:3, sep="")  # vector of model names
# vector of mean squared errors (MSE):
MSE <- with(df, c(
  sum(res1^2) / nrow(df),
  sum(res2^2) / nrow(df),
  sum(res3^2) / nrow(df)
))
model.mse <- data.frame(Model, MSE)  # data frame of model names and MSEs
model.mse <- model.mse[order(model.mse$MSE), ]  # sort by increasing MSE
```

I'd appreciate any help. This code takes several days when run on 30,000 or more different GAM models with 3 predictors. Could you please help me rewrite the script using sapply() or foreach()/doParallel?
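A minimal sketch of the restructuring asked about, assuming the same `trees` data and the same three formulas. Each GAM fit is an independent model fit, so the loop body cannot really be vectorized in the arithmetic sense; what helps is moving the per-fold work into a function so the for-loop becomes a single `lapply()` call that can be swapped for a parallel apply. The helper `cv_fold()`, the `formulas` list and the `set.seed(1)` call are hypothetical additions for illustration. The parallel variant (commented out) uses the base `parallel` package (`makeCluster()`/`parLapply()`) as a stand-in for foreach/doParallel, with an assumed 2-worker cluster.

```r
library(mgcv)

set.seed(1)                                   # assumption: fixed seed for reproducibility
df <- trees[sample(nrow(trees)), ]            # shuffle rows
k <- 10
df$ID <- cut(seq_len(nrow(df)), breaks = k, labels = FALSE)  # fold IDs

# Hypothetical list of candidate models; with thousands of models this
# list would be generated programmatically instead of typed by hand.
formulas <- list(
  m1 = Volume ~ s(Height),
  m2 = Volume ~ s(Girth),
  m3 = Volume ~ s(Girth) + s(Height)
)

# Hypothetical helper: fit every formula on all folds except fold i and
# predict fold i. Returns a matrix (rows = held-out observations,
# columns = models) whose row names match the rows of `data`.
cv_fold <- function(i, data, formulas) {
  train <- data[data$ID != i, ]
  test  <- data[data$ID == i, ]
  preds <- vapply(formulas, function(f) {
    fit <- gam(f, data = train)
    as.numeric(predict(fit, newdata = test, type = "response"))
  }, numeric(nrow(test)))
  # matrix() also covers a one-row test fold, where vapply returns a vector
  matrix(preds, nrow = nrow(test),
         dimnames = list(rownames(test), names(formulas)))
}

# Sequential version: one lapply() call replaces the for-loop.
fold_preds <- lapply(seq_len(k), cv_fold, data = df, formulas = formulas)

# Parallel version with the base 'parallel' package (instead of foreach/doParallel):
# cl <- parallel::makeCluster(2)
# parallel::clusterEvalQ(cl, library(mgcv))
# fold_preds <- parallel::parLapply(cl, seq_len(k), cv_fold, data = df, formulas = formulas)
# parallel::stopCluster(cl)

# Stack the fold results and restore df's row order.
pred <- do.call(rbind, fold_preds)[rownames(df), ]

# MSE for all models at once: Volume is recycled down each column of pred.
mse <- colMeans((df$Volume - pred)^2)
model.mse <- data.frame(Model = names(mse), MSE = unname(mse))
model.mse <- model.mse[order(model.mse$MSE), ]
print(model.mse)
```

With tens of thousands of formulas, parallelizing over the models (for example `parLapply()` over `formulas`) is likely to keep more workers busy than parallelizing over only 10 folds, since the number of folds caps the number of parallel tasks.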
Thanks,
Lexo

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.