# choose.k function in mgcv package


Hi everyone,

I am studying generalized additive models and using the 'mgcv' package developed by Professor Wood. However, I cannot understand the example given in the choose.k help page. For example:

    library(mgcv)
    set.seed(1)
    dat <- gamSim(1,n=400,scale=2)

    ## fit a GAM with quite low `k'
    b<-gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=6)+s(x3,k=6),data=dat)
    plot(b,pages=1,residuals=TRUE) ## hint of a problem in s(x2)

    ## the following suggests a problem with s(x2)
    gam.check(b)

    ## Another approach (see below for more obvious method)....
    ## check for residual pattern, removeable by increasing `k'
    ## typically `k', below, should be substantially larger than
    ## the original, `k' but certainly less than n/2.
    ## Note use of cheap "cs" shrinkage smoothers, and gamma=1.4
    ## to reduce chance of overfitting...
    rsd <- residuals(b)
    gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
    gam(rsd~s(x1,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
    gam(rsd~s(x2,k=40,bs="cs"),gamma=1.4,data=dat) ## `k' too low
    gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine

Why is the model not good for x2?

    > gam(rsd~s(x2,k=40,bs="cs"),gamma=1.4,data=dat) ## `k' too low

    Family: gaussian
    Link function: identity

    Formula:
    rsd ~ s(x2, k = 40, bs = "cs")

    Estimated degrees of freedom:
    9.0093  total = 10.00926

    GCV score: 4.494652

From these results we can see that the EDF is much less than k-1, so according to "If the effective degrees of freedom for a model term are estimated to be much less than k-1 then this is unlikely to be very worthwhile", I think the result is reasonable. Why is it marked as "`k' too low"?

Thanks in advance,
wanhai

## Re: choose.k function in mgcv package

The point is that you are checking the basis dimension used in the first model, b, where the basis dimension for s(x2) was set to 6. All the other model fits are about checking that first one.

On checking the residuals from model b you detect a pattern with respect to x2, with an estimated degrees of freedom of 9, which is bigger than the maximum possible in model b (k-1 = 5 for s(x2)). So model b is probably using too small a basis dimension for s(x2).

best,
Simon

On 06/21/2012 02:07 AM, ywh123 wrote:

> Hi everyone,
>
> I am studying generalized additive models and using the 'mgcv' package developed by Professor Wood. However, I cannot understand the example given in the choose.k help page. For example:
>
>     library(mgcv)
>     set.seed(1)
>     dat <- gamSim(1,n=400,scale=2)
>
>     ## fit a GAM with quite low `k'
>     b<-gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=6)+s(x3,k=6),data=dat)
>     plot(b,pages=1,residuals=TRUE) ## hint of a problem in s(x2)
>
>     ## the following suggests a problem with s(x2)
>     gam.check(b)
>
>     ## Another approach (see below for more obvious method)....
>     ## check for residual pattern, removeable by increasing `k'
>     ## typically `k', below, should be substantially larger than
>     ## the original, `k' but certainly less than n/2.
>     ## Note use of cheap "cs" shrinkage smoothers, and gamma=1.4
>     ## to reduce chance of overfitting...
>     rsd <- residuals(b)
>     gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
>     gam(rsd~s(x1,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
>     gam(rsd~s(x2,k=40,bs="cs"),gamma=1.4,data=dat) ## `k' too low
>     gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
>
> Why is the model not good for x2?
>
>     > gam(rsd~s(x2,k=40,bs="cs"),gamma=1.4,data=dat) ## `k' too low
>
>     Family: gaussian
>     Link function: identity
>
>     Formula:
>     rsd ~ s(x2, k = 40, bs = "cs")
>
>     Estimated degrees of freedom:
>     9.0093  total = 10.00926
>
>     GCV score: 4.494652
>
> From these results we can see that the EDF is much less than k-1, so according to "If the effective degrees of freedom for a model term are estimated to be much less than k-1 then this is unlikely to be very worthwhile", I think the result is reasonable. Why?
>
> Thanks in advance,
> wanhai
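
To make the comparison in the reply concrete, here is a minimal sketch that puts the two numbers side by side: the EDF of the residual smooth in x2 and the maximum EDF that s(x2) could have used in model b (k-1 = 5). The variable names (`k_orig`, `max_edf`, `edf_resid`, `b2`) and the choice of k = 20 for the refit are illustrative assumptions, not part of the original posts.

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400, scale = 2)

# Model under scrutiny: every smooth uses k = 6,
# so each term can have at most k - 1 = 5 effective degrees of freedom.
k_orig <- 6
b <- gam(y ~ s(x0, k = k_orig) + s(x1, k = k_orig) +
           s(x2, k = k_orig) + s(x3, k = k_orig), data = dat)
max_edf <- k_orig - 1

# Smooth the residuals of b against x2 using a much larger basis.
dat$rsd <- residuals(b)
r2 <- gam(rsd ~ s(x2, k = 40, bs = "cs"), gamma = 1.4, data = dat)
edf_resid <- summary(r2)$edf  # EDF of the residual smooth

cat("EDF of residual smooth in x2:   ", round(edf_resid, 2), "\n")
cat("Maximum EDF available to s(x2): ", max_edf, "\n")

# If the residual smooth needs more EDF than b could supply, one option is to
# refit with a larger basis for s(x2). k = 20 is an assumed, illustrative value.
b2 <- gam(y ~ s(x0, k = k_orig) + s(x1, k = k_orig) +
            s(x2, k = 20) + s(x3, k = k_orig), data = dat)

# Repeat the residual check on the refitted model.
dat$rsd2 <- residuals(b2)
r2b <- gam(rsd2 ~ s(x2, k = 40, bs = "cs"), gamma = 1.4, data = dat)
edf_resid2 <- summary(r2b)$edf
cat("EDF of residual smooth after refit:", round(edf_resid2, 2), "\n")
```

The k = 40 in the residual fits only sets an upper limit for the check itself. The quantity to compare the residual EDF against is the k-1 of the original term in b, not 39.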