Hello fellow R's,
I do apologize if this is a basic question. I'm doing some GAMs using the mgcv package, and I am wondering what is the most appropriate way to determine how much of the variability in the dependent variable is explained by each term in the model. The information provided by summary.gam() relates to the significance of each term (F, p-value) and to the "wiggliness" of the fitted smooth (edf), but (as far as I understand) there is no information on the proportion of variance explained.

One alternative may be to fit alternative models without each term, and calculate the reduction in deviance. For example:

m1=gam(y~s(x1) + s(x2)) # Full model
m2=gam(y~s(x2))
m3=gam(y~s(x1))

ddev1=deviance(m1)-deviance(m2)
ddev2=deviance(m1)-deviance(m3)

Here, ddev1 would measure the relative proportion of the variability in y explained by x1, and ddev2 would do the same for x2. Does this sound like an appropriate approach?

Julian

Julian Burgos
FAR lab
University of Washington

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

I think that your approach is reasonable, except that you should use the same smoothing parameters throughout, i.e. the reduced models should use the same smoothing parameters as the full model. Otherwise you get into trouble if x1 and x2 are correlated, since the smoothing parameters will then tend to change a lot when terms are dropped, as one smooth tries to `do the work' of the other. Here's an example (which can be modified to illustrate the problem with not fixing the sp's):

library(mgcv)
## simulate some data
set.seed(0)
n <- 400
x1 <- runif(n, 0, 1)
## to see problem with not fixing smoothing parameters
## remove the `##' from the next line, and the `sp'
## arguments from the `gam' calls generating b1 and b2.
x2 <- runif(n, 0, 1) ## *.1 + x1
f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
f <- f1(x1) + f2(x2)
e <- rnorm(n, 0, 2)
y <- f + e
## fit full and reduced models...
b <- gam(y~s(x1)+s(x2))
b1 <- gam(y~s(x1),sp=b$sp[1])
b2 <- gam(y~s(x2),sp=b$sp[2])
b0 <- gam(y~1)
## calculate proportions deviance explained...
(deviance(b1)-deviance(b))/deviance(b0) ## prop explained by s(x2)
(deviance(b2)-deviance(b))/deviance(b0) ## prop explained by s(x1)

On Monday 08 October 2007 20:19, Julian M Burgos wrote:
> [snip]

--
Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603 www.maths.bath.ac.uk/~sw283

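A minimal sketch of the modification described in the example's comments, assuming mgcv is installed: x2 is made strongly correlated with x1, and the reduced models are fitted twice, once with the smoothing parameters taken from the full model and once with them re-estimated. The names `dat`, `prop`, `res` and the `_fixed` / `_free` suffixes are introduced here for illustration only and do not appear in the original post. No particular output is claimed; the point is only to put the two sets of proportions side by side.

```r
library(mgcv)

## simulate data as in the example above, but with x2 correlated with x1
set.seed(0)
n  <- 400
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1) * 0.1 + x1
f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
y  <- f1(x1) + f2(x2) + rnorm(n, 0, 2)
dat <- data.frame(y = y, x1 = x1, x2 = x2)   # hypothetical container

## full and null models
b  <- gam(y ~ s(x1) + s(x2), data = dat)
b0 <- gam(y ~ 1, data = dat)

## reduced models with smoothing parameters fixed at the full-model values
b1_fixed <- gam(y ~ s(x1), sp = b$sp[1], data = dat)
b2_fixed <- gam(y ~ s(x2), sp = b$sp[2], data = dat)

## reduced models with smoothing parameters re-estimated
b1_free <- gam(y ~ s(x1), data = dat)
b2_free <- gam(y ~ s(x2), data = dat)

## proportion of null deviance attributed to the term that was dropped
prop <- function(reduced) (deviance(reduced) - deviance(b)) / deviance(b0)

res <- rbind(
  fixed_sp = c(s_x2 = prop(b1_fixed), s_x1 = prop(b2_fixed)),
  free_sp  = c(s_x2 = prop(b1_free),  s_x1 = prop(b2_free))
)
print(res)
```

If the two rows differ noticeably, that is the effect described above: with the smoothing parameters left free, the remaining smooth can change its wiggliness to take over part of the dropped term's work.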
Thanks again for your answer, Prof. Wood.

And my apologies to the list for my repeated message from yesterday. I am still trying to figure out what happened with my email software.

Julian

Dear Prof. Wood,

Just another quick question. I am doing model selection following Wood and Augustin (2002). One of the criteria for retaining a term is whether removing it causes an increase in the GCV score. When doing this, do I also need to fix the smoothing parameters?

Thanks,

Julian Burgos
Fisheries Acoustics Research Lab
School of Aquatic and Fishery Science
University of Washington
1122 NE Boat Street
Seattle, WA 98105

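For readers who want to see what the GCV comparison in this question looks like in code, here is a minimal sketch, assuming mgcv is installed and reusing the simulated data from earlier in the thread. The names `dat`, `full`, `drop_x1`, `drop_x2`, `gcv` and `increase` are hypothetical. The reduced models here are refitted with their smoothing parameters re-estimated, which is only one of the two options the question asks about; the sketch does not settle which is appropriate.

```r
library(mgcv)

## simulated data as in the earlier example (x1 and x2 independent)
set.seed(0)
n  <- 400
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
y  <- f1(x1) + f2(x2) + rnorm(n, 0, 2)
dat <- data.frame(y = y, x1 = x1, x2 = x2)

## full model and the two models with one term removed,
## all fitted by GCV smoothness selection
full    <- gam(y ~ s(x1) + s(x2), data = dat, method = "GCV.Cp")
drop_x1 <- gam(y ~ s(x2),         data = dat, method = "GCV.Cp")
drop_x2 <- gam(y ~ s(x1),         data = dat, method = "GCV.Cp")

## the minimized GCV score is stored in the gcv.ubre component of the fit
gcv <- c(full    = as.numeric(full$gcv.ubre),
         drop_x1 = as.numeric(drop_x1$gcv.ubre),
         drop_x2 = as.numeric(drop_x2$gcv.ubre))
print(gcv)

## TRUE where removing the term increases the GCV score
increase <- gcv[-1] > gcv["full"]
print(increase)
```

To try the fixed variant instead, the `sp` argument could be passed to the reduced fits as in the earlier example (`sp = full$sp[2]` for `drop_x1`, `sp = full$sp[1]` for `drop_x2`).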