In the package lasso2, there is a Prostate data set. To find the coefficients in the prostate cancer example, we could impose an L1 constraint on the parameters.

The code is:

data(Prostate)
p.mean <- apply(Prostate, 5, mean)
pros <- sweep(Prostate, 5, p.mean, "-")
p.std <- apply(pros, 5, var)
pros <- sweep(pros, 5, sqrt(p.std), "/")
pros[, "lpsa"] <- Prostate[, "lpsa"]
l1ce(lpsa ~ . , pros, bound = 0.44)

I can't figure out where 0.44 comes from. The paper says it was chosen by generalized cross-validation and that it is the optimal choice.

Paper: Regression Shrinkage and Selection via the Lasso
Author: Robert Tibshirani
Hi,
your code has errors: the apply function only takes 1 or 2 as the margin.

bound is used as the tuning parameter for the sum of the absolute values of the coefficients. The lasso is run on a grid of values of the tuning parameter for varying strength of shrinkage, so each tuning value may yield a different set of coefficients and values. Cross-validation is used to estimate the value of the tuning parameter that gives the smallest error (MSE or deviance) on the testing data.

Weidong Gu

--
View this message in context: http://r.789695.n4.nabble.com/lasso-constraint-tp4508998p4508998.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
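To make the grid-plus-cross-validation idea concrete, here is a minimal base R sketch. It is not the lasso2 code and not the generalized cross-validation used in the paper: it assumes simulated data (the names `x`, `y`, `lasso_cd`, `lambdas` and `folds` are made up for illustration), fits the lasso in its penalized form with a hand-written coordinate descent, and uses plain 5-fold cross-validation. Each penalty value corresponds to some bound on the sum of absolute coefficients, so the last lines report the selected fit as a fraction of the least-squares L1 norm, which, if I read the lasso2 documentation correctly, is the scale on which `bound` is interpreted by default.

```r
# Soft-thresholding operator used by coordinate descent.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Lasso in penalized form: minimize (1/(2n)) * RSS + lambda * sum(|beta|).
# No intercept is fitted, so x should be standardized and y centered.
lasso_cd <- function(x, y, lambda, n_iter = 100) {
  n <- nrow(x)
  beta <- rep(0, ncol(x))
  for (it in seq_len(n_iter)) {
    for (j in seq_len(ncol(x))) {
      # Residual with the contribution of coordinate j left out.
      r_j <- y - x[, -j, drop = FALSE] %*% beta[-j]
      beta[j] <- soft(sum(x[, j] * r_j) / n, lambda) / (sum(x[, j]^2) / n)
    }
  }
  beta
}

# Simulated data (an assumption; swap in the standardized Prostate data).
set.seed(1)
n <- 100
p <- 8
x <- scale(matrix(rnorm(n * p), n, p))
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)
y <- y - mean(y)

# Grid of penalty values, from heavy to light shrinkage.
lambdas <- exp(seq(log(1), log(0.001), length.out = 20))

# Assign each observation to one of 5 folds at random.
folds <- sample(rep(1:5, length.out = n))

# Mean held-out squared error for every value on the grid.
# Folds are not re-standardized; this keeps the sketch short.
cv_mse <- sapply(lambdas, function(lam) {
  mean(sapply(1:5, function(k) {
    train <- folds != k
    b <- lasso_cd(x[train, , drop = FALSE], y[train], lam)
    mean((y[!train] - x[!train, , drop = FALSE] %*% b)^2)
  }))
})

best_lambda <- lambdas[which.min(cv_mse)]
best_beta <- lasso_cd(x, y, best_lambda)

# Express the chosen fit as a relative bound: L1 norm of the lasso
# solution divided by the L1 norm of the least-squares solution.
ols_beta <- lm.fit(x, y)$coefficients
rel_bound <- sum(abs(best_beta)) / sum(abs(ols_beta))
print(c(lambda = best_lambda, relative_bound = rel_bound))
```

With lasso2 itself, the same loop would call `l1ce` for each candidate `bound` on the training folds instead of `lasso_cd`.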
Inline:
On Tue, Mar 27, 2012 at 10:00 AM, Weidong Gu <[hidden email]> wrote:
> Hi,
>
> your code has errors: apply function only has 1 or 2 as margin.

FALSE. Please re-read the Help files. It works as expected with arbitrary higher dim arrays.

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
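A small base R illustration of both points may help. The margin passed to apply can be any dimension the object actually has, so a 3-d array accepts margin 3, while a data frame has only rows and columns, so margin 5 fails and column-wise work needs margin 2. The data frame `toy` below is a made-up stand-in for Prostate, used only so the snippet runs without lasso2 installed.

```r
# A 3-d array: apply() accepts any margin from 1 to length(dim(arr)).
arr <- array(1:24, dim = c(2, 3, 4))
slice_sums <- apply(arr, 3, sum)   # one sum per 2 x 3 slice
print(slice_sums)                  # 21 57 93 129

# A data frame has two dimensions only, so margin 5 does not exist.
# 'toy' is a hypothetical stand-in for the Prostate data frame.
toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))
bad <- try(apply(toy, 5, mean), silent = TRUE)
print(inherits(bad, "try-error"))  # TRUE: the call fails

# Column-wise standardization uses margin 2 throughout.
col_mean <- apply(toy, 2, mean)
centered <- sweep(toy, 2, col_mean, "-")
col_sd <- sqrt(apply(centered, 2, var))
standardized <- sweep(centered, 2, col_sd, "/")
print(standardized)
```

Replacing 5 with 2 in the posted code gives the same centering and scaling steps for the Prostate columns.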
In reply to this post by yx78
Hi,
On Tue, Mar 27, 2012 at 10:35 AM, yx78 <[hidden email]> wrote:
> I can't figure out where 0.44 comes from. The paper says it was
> chosen by generalized cross-validation and that it is the optimal choice.

Yes, this is exactly how the "optimal" value for bound would be found. Using the lasso2 package, you'll likely have to do a grid search over possible values for `bound` in a cross-validation setting and pick the one that fits the model best on the held-out data over all your CV folds.

If I were you, I'd use the glmnet package, since it can calculate the entire regularization path w/o having to do a grid search over the bound (or lambda), making cross-validation easier.

If you're confused about how you might use cross-validation to find the optimal value of the parameter(s) of the model you are building, then it's time to pull yourself away from the keyboaRd and start doing some reading, or (as Bert will likely tell you) consult your local statistician.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
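For reference, a minimal sketch of the glmnet route, assuming the glmnet package is installed and using simulated data in place of Prostate (the objects `x`, `y` and `cvfit` are made up for illustration). `cv.glmnet` fits the whole path and cross-validates over its own sequence of lambda values; note that it selects a penalty lambda by K-fold cross-validation, not a bound by generalized cross-validation, so it will not reproduce 0.44 directly.

```r
# Runs only when glmnet is available; otherwise the block is skipped.
if (requireNamespace("glmnet", quietly = TRUE)) {
  # Simulated predictors and response (an assumption, not the Prostate data).
  set.seed(1)
  n <- 100
  p <- 8
  x <- matrix(rnorm(n * p), n, p)
  y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)

  # alpha = 1 requests the lasso penalty; nfolds sets the number of CV folds.
  cvfit <- glmnet::cv.glmnet(x, y, alpha = 1, nfolds = 10)

  # Lambda with the smallest cross-validated error, and its coefficients.
  print(cvfit$lambda.min)
  print(coef(cvfit, s = "lambda.min"))
}
```

With the real data, `x` would be the matrix of Prostate predictors and `y` the `lpsa` column.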
