I want to use the grplasso package on a data set where I want to fit a linear
model. My interest is in identifying significant beta coefficients. The
documentation is a bit cryptic, so I'd appreciate some help. I know this is a
strategy for large numbers of variables, but consider a simple case for
pedagogical purposes.

Say I have two 3-category predictors (2 dummies each), a binary predictor and
a continuous predictor, with a continuous outcome:

    y x1 x2 x3 x4 x5 x6
    rows of data here
    ......
    ......

Naturally, I want to select x1 and x2 as a group and x3 and x4 as another
group. The documentation has a couple of examples, but it's not clear how they
translate to the current problem. How do I specify my groups and run the lasso
regression?

It looks like this is the grouping part:

    index<-c(NA,)

but I'm not sure how to specify the df for the variables past the NA for the
intercept. Once that's defined, the penalty can be specified:

    lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                        model = LogReg()) * 0.5^(0:5)

In my case I'd use LinReg for the model. Then the model:

    fit <- grplasso(x, y = y, index = index, lambda = lambda,
                    model = LogReg(), penscale = sqrt,
                    control = grpl.control(update.hess = "lambda", trace = 0))

again using LinReg for the model.

This can be plotted against lambda, but when I do lasso regression in other
software I end up with a plot of the coefficients against the tuning parameter
with a cutpoint, or a table and graph that tells me what to include in the
model based on some selected criterion. It's not clear from the example whether
there's a cross-validation or some other procedure to determine which variables
to include. plot(fit) produces a graph of coefficients against lambda but
nothing to indicate what to include. What is used in the package, if anything,
to make that determination?
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
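For what it's worth, here is a minimal sketch of how the grouping in the post could be written down. It assumes that the index vector has one entry per column of the design matrix, with NA marking an unpenalized column (the intercept) and equal integers marking columns of the same group, which is how the package's own random-data example appears to use it. The data below are simulated and the names f1, f2, b and z are made up for illustration; only the column layout y, x1 ... x6 comes from the post.

```r
library(grplasso)

set.seed(1)
n <- 100

## Hypothetical raw predictors standing in for the poster's data:
## f1 and f2 are 3-level factors, b is binary, z is continuous.
f1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
f2 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
b  <- rbinom(n, 1, 0.5)
z  <- rnorm(n)

## model.matrix() returns an intercept column plus 2 dummies per factor,
## which matches the x1 ... x6 layout described in the post.
x <- model.matrix(~ f1 + f2 + b + z)
colnames(x) <- c("Intercept", paste0("x", 1:6))

## Simulated outcome (an assumption): only x1, x2 and x6 have an effect.
y <- 1 + 1.5 * x[, "x1"] - 1 * x[, "x2"] + 2 * x[, "x6"] + rnorm(n)

## One entry per column of x: NA = not penalized, same number = same group.
## x1,x2 form group 1; x3,x4 form group 2; x5 and x6 are groups of size one.
index <- c(NA, 1, 1, 2, 2, 3, 4)

## Lambda grid as in the post, with LinReg() swapped in for LogReg().
lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                    model = LinReg()) * 0.5^(0:5)

fit <- grplasso(x, y = y, index = index, lambda = lambda,
                model = LinReg(), penscale = sqrt,
                control = grpl.control(update.hess = "lambda", trace = 0))

## Rows are coefficients, columns are the lambda values.
round(fit$coefficients, 3)

## TRUE where a penalized coefficient is non-zero at a given lambda.
active <- fit$coefficients[-1, , drop = FALSE] != 0
active
```

Because the penalty acts on whole groups, x1 and x2 should enter or leave the model together, and likewise x3 and x4. Note that a non-zero coefficient here means "selected at this lambda", not "statistically significant"; the fit does not produce p-values.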
So does anyone use this package?
----- Original Message -----
From: Scott Raynaud <[hidden email]>
To: "[hidden email]" <[hidden email]>
Cc:
Sent: Tuesday, January 10, 2012 1:40 PM
Subject: grplasso
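On the question of how to decide what to include: as far as I can tell, grplasso does not ship a cross-validation helper, so the choice of lambda is left to the user. One common option is a hand-rolled K-fold cross-validation over the lambda grid, sketched below for the linear case. This is not something taken from the package documentation; the data are simulated, and K, fold, sse, cv_mse and best are names invented for the example. Predictions are computed directly as x %*% beta, which assumes the returned coefficients are on the scale of the original columns.

```r
library(grplasso)

set.seed(2)
n <- 120

## Hypothetical design: intercept, two groups of 2 columns, two single columns.
x <- cbind(1, matrix(rnorm(n * 6), nrow = n))
colnames(x) <- c("Intercept", paste0("x", 1:6))
index <- c(NA, 1, 1, 2, 2, 3, 4)

## Simulated outcome (an assumption): only x1, x2 and x6 have an effect.
y <- 1 + x[, "x1"] - x[, "x2"] + 2 * x[, "x6"] + rnorm(n)

lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                    model = LinReg()) * 0.5^(0:8)

K <- 5
fold <- sample(rep(1:K, length.out = n))   # random fold label per row
sse <- matrix(0, nrow = K, ncol = length(lambda))

for (k in 1:K) {
  train <- fold != k
  fit_k <- grplasso(x[train, ], y = y[train], index = index,
                    lambda = lambda, model = LinReg(), penscale = sqrt,
                    control = grpl.control(update.hess = "lambda", trace = 0))
  ## Held-out predictions, one column per lambda value.
  pred <- x[!train, ] %*% fit_k$coefficients
  ## Squared prediction error on the held-out fold, per lambda.
  sse[k, ] <- colSums((y[!train] - pred)^2)
}

## Cross-validated mean squared error for each lambda.
cv_mse <- colSums(sse) / n
best <- which.min(cv_mse)
lambda[best]

## Refit on all data and look at the coefficients at the chosen lambda.
fit <- grplasso(x, y = y, index = index, lambda = lambda,
                model = LinReg(), penscale = sqrt,
                control = grpl.control(update.hess = "lambda", trace = 0))
round(fit$coefficients[, best], 3)
```

Groups whose coefficients are non-zero at the chosen lambda are the ones this criterion would keep. Reusing the same lambda grid in every fold is a simplification, since each training fold has fewer observations than the full data, and other criteria (for example the one-standard-error rule or an information criterion) could lead to a different choice.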