choose.k function in mgcv packages

ywh123
Hi everyone,
I am studying generalized additive models using the 'mgcv' package developed by Professor Wood.
However, I cannot understand the example given on the choose.k help page.
For example,


library(mgcv)
set.seed(1)
dat <- gamSim(1,n=400,scale=2)

## fit a GAM with quite low `k'
b<-gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=6)+s(x3,k=6),data=dat)
plot(b,pages=1,residuals=TRUE) ## hint of a problem in s(x2)

## the following suggests a problem with s(x2)
gam.check(b)

## Another approach (see below for more obvious method)....
## check for residual pattern, removeable by increasing `k'
## typically `k', below, should be substantially larger than
## the original `k', but certainly less than n/2.
## Note use of cheap "cs" shrinkage smoothers, and gamma=1.4
## to reduce chance of overfitting...
rsd <- residuals(b)
gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
gam(rsd~s(x1,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
gam(rsd~s(x2,k=40,bs="cs"),gamma=1.4,data=dat) ## `k' too low
gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine

Why is the model not good for x2?

> gam(rsd~s(x2,k=40,bs="cs"),gamma=1.4,data=dat) ## `k' too low

Family: gaussian
Link function: identity

Formula:
rsd ~ s(x2, k = 40, bs = "cs")

Estimated degrees of freedom:
9.0093  total = 10.00926

GCV score: 4.494652

From these results we can see that the EDF (about 9) is much less than k-1 = 39, so according to
"If the effective degrees of freedom for a model term are estimated to be much less than k-1 then this is unlikely to be very worthwhile", I think the results are reasonable.

So why is `k' said to be too low?

Thanks in advance
wanhai

Re: check.k function in mgcv packages

Simon Wood-4
The point is that you are checking the basis dimension used in the first
model, b, where the basis dimension for s(x2) was set to 6. All the
other model fits are about checking that first one. On checking the
residuals from model b you detect a pattern with respect to x2, with an
estimated degrees of freedom of 9, which is bigger than the maximum
possible in model b (k-1 = 5 for s(x2)). So model b is probably using too
small a basis dimension for s(x2).
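
The comparison described above can be sketched in R roughly as follows. This is only an illustration, not part of the choose.k help page: the refit with k = 20 and the variable names (edf_x2_b, chk, b2 and so on) are assumptions chosen for the example, and the exact EDF values may differ between mgcv versions.

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400, scale = 2)

# Original model: each smooth has k = 6, so it can use at most k - 1 = 5
# degrees of freedom (one is lost to the identifiability constraint).
b <- gam(y ~ s(x0, k = 6) + s(x1, k = 6) + s(x2, k = 6) + s(x3, k = 6),
         data = dat)
edf_x2_b <- as.numeric(summary(b)$edf[3])   # EDF used by s(x2) in model b

# Residual check: smooth the residuals of b against x2 with a generous basis.
dat$rsd <- residuals(b)
chk <- gam(rsd ~ s(x2, k = 40, bs = "cs"), gamma = 1.4, data = dat)
edf_x2_resid <- as.numeric(summary(chk)$edf[1])

# If the residual smooth needs more EDF than model b could ever give s(x2),
# refit with a larger basis for that term (k = 20 is an arbitrary choice).
b2 <- gam(y ~ s(x0, k = 6) + s(x1, k = 6) + s(x2, k = 20) + s(x3, k = 6),
          data = dat)
edf_x2_b2 <- as.numeric(summary(b2)$edf[3])

print(c(model_b = edf_x2_b, residual_check = edf_x2_resid, refit = edf_x2_b2))
```

The quantity to compare with the residual-check EDF is the k-1 of the original term (5 here), not the k-1 = 39 of the checking model.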

best,
Simon

On 06/21/2012 02:07 AM, ywh123 wrote:

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: check.k function in mgcv packages

ywh123
Hi, thanks very much.
I have some further questions about GAM models.

First, are there any restrictions on the sample size? For example, I am studying
GDP and foreign direct investment in 29 provinces of China (N=29). Is N
too small? If so, could I use pooled data (N=29, T=5)?


Second, could I use the "mgcv" package to implement a panel data model by adding
individual-specific fixed effects and time fixed effects, as in a generalized mixed model (e.g. adding factor(s))?

For example,
the paper by Roberto Basile, "Regional economic growth in Europe: A semiparametric
spatial dependence approach", published in Papers in Regional Science, employs
a semiparametric spatial Durbin model to analyse the growth behavior of 155 European regions over the
period 1988-2000. I am not sure how to arrange the data for such a model.

Thanks very much in advance.


Best Regards,
Wanhai
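
As a rough illustration of the data layout asked about above, a pooled panel is usually arranged in "long" format, with one row per province-year, and the fixed effects entered as factor terms. The sketch below uses simulated data only: the column names (province, year, fdi, gdp), the years 2001-2005, the effect sizes and k = 8 are all hypothetical, and it does not reproduce the spatial Durbin model of the cited paper.

```r
library(mgcv)

set.seed(42)
n_prov <- 29   # N = 29 provinces, as in the question
n_year <- 5    # T = 5 periods, as in the question

# Long format: one row per province-year combination (29 * 5 = 145 rows).
panel <- expand.grid(province = factor(seq_len(n_prov)),
                     year = factor(2001:2005))

# Simulated covariate and response (purely hypothetical values).
panel$fdi <- runif(nrow(panel))
prov_eff <- rnorm(n_prov)
year_eff <- rnorm(n_year, sd = 0.5)
panel$gdp <- sin(2 * pi * panel$fdi) +
  prov_eff[as.integer(panel$province)] +
  year_eff[as.integer(panel$year)] +
  rnorm(nrow(panel), sd = 0.3)

# Province and time fixed effects as parametric factor terms,
# plus a smooth of the covariate of interest.
fit_fe <- gam(gdp ~ s(fdi, k = 8) + province + year,
              data = panel, method = "REML")

# Alternative: treat the same effects as random, via mgcv's "re" basis.
fit_re <- gam(gdp ~ s(fdi, k = 8) + s(province, bs = "re") + s(year, bs = "re"),
              data = panel, method = "REML")

summary(fit_fe)
```

With only N = 29 observations and no pooling, the basis dimension of each smooth would have to be kept small, since k cannot exceed the number of unique covariate values.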