random effects in library mgcv

random effects in library mgcv

Hi, I am working with gam models in the mgcv library. My response variable (Y) is binary (0/1), and my dataset contains repeated measures on 110 individuals (the same number of 0s and 1s within a given individual: e.g. 345 zeros and 345 ones for individual A, 226 zeros and 226 ones for individual B, etc.). The variable Factor separates the individuals into three groups according to mass (groups 0, 1, 2); Factor1 is a binary variable coding for individuals of group 1, and Factor2 is a binary variable coding for individuals of group 2.

I use gam models of this sort, with random effects coded using a s(..., bs="re") term:

gm <- gam(Y ~ Factor + te(x1,x2,by=Factor) + s(Individual,bs="re"),
          data=Data, family=binomial(link=logit), method="REML")

gm1 <- gam(Y ~ Factor + te(x1,x2) + te(x1,x2,by=Factor1) + te(x1,x2,by=Factor2) +
           s(Individual,bs="re"),
           data=Data, family=binomial(link=logit), method="REML")

1) First question: is it OK to use gam() to model a binary variable with random effects coded as a s(..., bs="re") term? I have read that the gamm4() function gives better performance than gamm() for binary variables when random effects are coded as random=~(1|Individual), but does that mean that binary variables should not be used as the response in gam() with random effects coded as bs="re"?

2) Second question: for some models I obtain p-value=NA and Chi-square=0 for the s(Individual) term, and for other models p-value=1 and a high Chi-square. The difference between a model that can estimate a p-value and one that cannot is very slight: for example, if I use a variable x3 instead of x2 in a model, it can change from p-value=NA to p-value=1. Does anyone know what can be happening?

3) Third question: not linked to random effects, but rather to what the two models gm and gm1 are actually testing. From my understanding, the first model creates a 2d-smooth for each level of my factor variable and tests whether those smooths are significantly different from a straight line.
The second model also creates three smooths: one for the reference level of my Factor variable (group 0), one showing the difference between the reference smooth and the smooth for group 1, and one showing the difference between the reference smooth and the smooth for group 2. The summary(gm1) gives p-values associated with each of those three smooths, which determine: if the reference smooth is different from 0, if the smooth for group 1 is different from the reference smooth, and if the smooth for group 2 is different from the reference smooth. Do I understand correctly what the models are testing?

The number of "edf" estimated for te(x1,x2):Factor2 in the gm1 model is 3.013, while it is 19.57 in the gm model. Does that mean that the difference between the reference smooth te(x1,x2) and the smooth for group 2 te(x1,x2,by=Factor2) is "small", so it can be modeled with only 3 degrees of freedom? Still, the associated p-value is highly significant? When comparing AIC between the gm and gm1 models, I sometimes find that gm1 has a lower AIC than gm. How can that be interpreted?

Thanks a lot if anyone can help...
Geraldine

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
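[Editor's note: a minimal, self-contained sketch of the first model described above, run on simulated data so it is reproducible. All names and values here are illustrative, not the poster's actual data; the poster's real calls additionally include the by=Factor1/Factor2 difference smooths.]

```r
## Sketch: binary response, tensor-product smooth by factor level, and a
## random intercept per individual via s(..., bs="re"). Individual must be
## a factor for the "re" basis to work.
library(mgcv)

set.seed(1)
n <- 600
Data <- data.frame(
  x1 = runif(n), x2 = runif(n),
  Factor = factor(sample(0:2, n, replace = TRUE)),
  Individual = factor(sample(paste0("id", 1:12), n, replace = TRUE))
)
eta <- with(Data, sin(2 * pi * x1) + as.numeric(Factor) - 2)
Data$Y <- rbinom(n, 1, plogis(eta))

gm <- gam(Y ~ Factor + te(x1, x2, by = Factor) + s(Individual, bs = "re"),
          data = Data, family = binomial(link = logit), method = "REML")
summary(gm)  # per-level te() smooths plus the s(Individual) variance component
```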
Re: random effects in library mgcv

On 25/04/12 14:02, Mabille, Geraldine wrote:
> 1) First question: is it OK to use gam() to model a binary variable
> with random effects coded as a s(..., bs="re") term? I have read that
> the gamm4() function gives better performance than gamm() for binary
> variables when random effects are coded as random=~(1|Individual), but
> does that mean that binary variables should not be used as the
> response in gam() with random effects coded as bs="re"?

It's fine to use 'gam'. The problem with gamm is that it uses PQL, which can perform poorly in the binary case (mostly related to estimating the scale parameter rather than treating it as fixed). Both gam and gamm4 use a Laplace approximation that behaves much better than PQL in the binary case.

> 2) Second question: for some models I obtain p-value=NA and
> Chi-square=0 for the s(Individual) term, and for other models
> p-value=1 and a high Chi-square. [...] Does anyone know what can be
> happening?

Basically both cases are providing the same information: that the corresponding term is not needed in the model. In the first case the term is estimated to be so close to zero that it makes no sense to compute a p-value, while in the second case the term has been estimated to be non-zero, but not *significantly* different from zero at any level.

> 3) Third question: [...] the first model creates a 2d-smooth for each
> level of my factor variable and tests whether those smooths are
> significantly different from a straight line.

From zero, if I understand the model right.

> [...] Do I understand correctly what the models are testing?

Sounds right to me.

> The number of "edf" estimated for te(x1,x2):Factor2 in the gm1 model
> is 3.013, while it is 19.57 in the gm model. Does that mean that the
> difference between the reference smooth te(x1,x2) and the smooth for
> group 2 te(x1,x2,by=Factor2) is "small", so it can be modeled with
> only 3 degrees of freedom?

"Smooth", rather than "small". The difference could be very large, but vary smoothly with x1, x2.

> Still, the associated p-value is highly significant?

... since the term is significantly different from zero.

> When comparing AIC between the gm and gm1 models, I sometimes find
> that gm1 has a lower AIC than gm. How can that be interpreted?

This is quite common if the differences between the factor levels can be represented rather smoothly, but the baseline smooth is quite complicated. In that case gm1 is much more parsimonious than gm for (approximately) the same fit, since gm requires a complicated smooth for each level (duplicating effort, in effect). So gm1 has a lower AIC because it's doing the same job with fewer coefficients.

best,
Simon

--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283
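[Editor's note: for comparison with the bs="re" approach discussed above, a sketch of the equivalent lme4-style random-intercept specification in gamm4, which also uses a Laplace approximation rather than PQL. Assumes the gamm4 package is installed; Data, Y, x1, x2, Factor and Individual are the poster's objects. Note that gamm4 does not support te() tensor products, so t2() is used in its place.]

```r
## Sketch: the same random-intercept model via gamm4. The random effect is
## given in lme4 syntax; t2() replaces te(), which gamm4 does not support.
library(gamm4)

gm4 <- gamm4(Y ~ Factor + t2(x1, x2, by = Factor),
             random = ~ (1 | Individual),
             data = Data, family = binomial(link = logit))

summary(gm4$gam)  # the smooth terms, as in a gam() fit
summary(gm4$mer)  # the lme4 side: random-intercept variance for Individual
```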
Re: random effects in library mgcv

Thanks Simon for your help again on those points. Just a last thing, regarding the third question: if the gm1 model has a lower AIC than the gm model, does that mean that we should "select" it? As you said, I think both models are doing the same job (i.e. testing that the three smooths are different from zero), so the proportion of deviance explained by both should be similar, and the difference in AIC we obtained is only due to the fact that modelling the difference from the reference smooth needs fewer degrees of freedom than creating a totally "new" smooth? Am I interpreting this right?

Cheers,
Geraldine
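[Editor's note: the comparison Geraldine reasons about here can be checked directly. A sketch, assuming gm and gm1 have been fitted as in the first message; these are standard mgcv accessors.]

```r
## Sketch: compare the two fitted models on AIC, deviance explained,
## and total effective degrees of freedom. If dev.expl is similar but
## sum(edf) is much smaller for gm1, the AIC difference is driven by
## parsimony, as discussed above.
AIC(gm, gm1)

summary(gm)$dev.expl   # proportion of deviance explained, model gm
summary(gm1)$dev.expl  # proportion of deviance explained, model gm1

sum(gm$edf)            # total effective degrees of freedom, model gm
sum(gm1$edf)           # total effective degrees of freedom, model gm1
```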