lasso based selection for mixed model


lasso based selection for mixed model

Dieter Menne
Dear useRs (most likely named Frank Harrell),

After preaching for years to my medical colleagues to be cautious
with stepwise selection procedures, I now hear them chant back asking
for an alternative when using mixed models.
There are half a dozen laXXX packages around for all types of linear
models, but as far as I can see there is none for mixed models such as
lme. Even boot.stepAIC (which I consider valid, but Frank objects) does
not work with lme.

I found one SPSS macro that comes close.

Any ideas?

Dieter


Re: lasso based selection for mixed model

dave fournier

I have been working on lasso-type model selection using AD Model
Builder's random effects module. I came across a few questions about
this topic on the list, so I thought this would be of interest.
The advantage of AD Model Builder is that it lets you formulate
general nonlinear random effects models, which I believe is necessary
for the lasso-type approach. I think the approach would work for any
kind of RE model, such as, say, a negative binomial mixed model, but
let's keep it simple for now.


Let the fixed effect parameters be a_i. The usual lasso penalty is then

        p * sum_i abs(a_i)

with a tunable penalty weight p.
Since AD Model Builder's random effects module can deal with arbitrary
nonlinear random effects models, I tried simply adding this penalty to
the negative log-likelihood. However, when the penalty is made large
enough to get the lasso effect, the estimation procedure simply makes
all the a_i too small and uses the RE's instead to fit the data. I
figured what was needed was a penalty term more encouraging of setting
most of the terms to 0 while making a few of them large. A square root
has this property, so I tried

             p * sum_i sqrt(abs(a_i))


Of course this is not differentiable at 0, so I smoothed it off a bit
there, but that is the general idea.
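For concreteness, here is a sketch of the two penalties in R (my own
addition; Dave worked in AD Model Builder and does not say exactly how
he smoothed the square root, so the quadratic smoothing constant eps
below is an assumption):

    # Plain lasso penalty: p * sum_i |a_i|
    lasso_pen <- function(a, p) p * sum(abs(a))

    # sqrt(abs()) penalty, smoothed near 0 so it stays differentiable:
    # (a^2 + eps)^(1/4) is close to sqrt(|a|) once |a| >> sqrt(eps)
    sqrtabs_pen <- function(a, p, eps = 1e-4) p * sum((a^2 + eps)^(1/4))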
To test it I generated data of the form

       Y_ij = a(1) * X_1i + u_i + e_ij

for 1 <= i <= 500, 1 <= j <= 5,

with a(1) = 2.0, std dev of u_i = 0.6, and std dev of e_ij = 1.5,

and the IV X_1i is standard normal.

For the other IVs I generated 9 more, X_ki for 2 <= k <= 10, with

       X_ki = X_1i + 0.1 * V_ki

where the V_ki's are standard normal.
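
In R, the data generation could be sketched like this (my addition;
Dave simulated in AD Model Builder, so the seed and random draws here
are arbitrary and the exact numbers will differ):

    set.seed(1)
    n <- 500; m <- 5                    # 500 groups, 5 obs per group
    X <- matrix(0, n, 10)
    colnames(X) <- paste0("X", 1:10)
    X[, 1] <- rnorm(n)                  # X_1i standard normal
    for (k in 2:10)                     # X_ki = X_1i + 0.1 * V_ki
      X[, k] <- X[, 1] + 0.1 * rnorm(n)
    u <- rnorm(n, sd = 0.6)             # random intercepts u_i
    id <- rep(1:n, each = m)
    y <- 2.0 * X[id, 1] + u[id] + rnorm(n * m, sd = 1.5)
    round(cor(X), 3)                    # a matrix like the one below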


The correlation matrix for them is

 1 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995
 0.995 1 0.99 0.99 0.989 0.988 0.99 0.989 0.99 0.99
 0.995 0.99 1 0.99 0.989 0.99 0.99 0.99 0.989 0.99
 0.995 0.99 0.99 1 0.99 0.989 0.99 0.99 0.99 0.99
 0.995 0.989 0.989 0.99 1 0.989 0.99 0.989 0.99 0.99
 0.995 0.988 0.99 0.989 0.989 1 0.99 0.99 0.989 0.99
 0.995 0.99 0.99 0.99 0.99 0.99 1 0.991 0.99 0.991
 0.995 0.989 0.99 0.99 0.989 0.99 0.991 1 0.988 0.99
 0.995 0.99 0.989 0.99 0.99 0.989 0.99 0.988 1 0.99
 0.995 0.99 0.99 0.99 0.99 0.99 0.991 0.99 0.99 1

So these are highly correlated explanatory variables.
Then I fit the model with no lasso penalty.

index   name   value      std dev
    1   a      3.1565e+00 1.2302e+00
    2   a      6.7390e-02 3.8737e-01
    3   a     -1.9154e-01 4.0204e-01
    4   a     -2.7920e-01 3.9847e-01
    5   a      3.0924e-02 3.9842e-01
    6   a      7.6508e-02 3.8698e-01
    7   a     -5.3902e-01 4.1590e-01
    8   a     -2.4941e-01 4.0141e-01
    9   a      2.5324e-01 3.8875e-01
   10   a     -2.9435e-01 4.1345e-01
   11   sig_u  8.2845e-01 3.0015e-02

So a(1) is rather badly estimated, and its estimated std dev is large.
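
(For R users: a roughly equivalent unpenalized fit, continuing from the
simulated data above, could be run with lme4. This is my addition, not
what Dave did; his fits were done in AD Model Builder.)

    library(lme4)
    dat <- data.frame(y = y, id = factor(id), X[id, ])
    fit <- lmer(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10
                + (1 | id), data = dat)
    summary(fit)  # expect unstable fixed effects with large std errors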

One way to try to deal with this is ridge regression, which amounts to
adding the following quadratic penalty:

      p * sum_i square(a_i)

Of course we know that this will have exactly the opposite effect to
what we want; that is, it tends to make all the estimated parameters
nonzero. With a value of p = 1.0 I got the following estimates.

 index   name   value      std dev
     1   a      8.7729e-01 5.9032e-01
     2   a      2.9461e-01 3.2352e-01
     3   a      8.3218e-02 3.3546e-01
     4   a      8.3569e-03 3.3438e-01
     5   a      2.7345e-01 3.2984e-01
     6   a      2.5172e-01 3.2822e-01
     7   a     -1.9834e-01 3.4721e-01
     8   a      2.7677e-02 3.3404e-01
     9   a      4.0135e-01 3.2855e-01
    10   a      5.7042e-03 3.4325e-01
    11   sig_u  8.3188e-01 3.0163e-02

So the std devs are smaller, but the results are really bad.
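
A toy one-dimensional comparison shows the qualitative difference
between the three penalties (my addition): shrink a least-squares
solution sitting at a = 1 with each penalty at weight p = 1.5 and see
where the penalized minimizer lands (grid search, to sidestep the
non-differentiable points):

    p <- 1.5
    a <- seq(-1, 1.5, by = 0.001)
    pens <- list(ridge   = function(a) p * a^2,
                 lasso   = function(a) p * abs(a),
                 sqrtabs = function(a) p * sqrt(abs(a)))
    sapply(pens, function(pen) a[which.min((a - 1)^2 + pen(a))])
    # ridge ~ 0.40 (shrunk but never exactly 0), lasso 0.25, sqrtabs ~ 0:
    # at this weight only the concave sqrt penalty zeroes the coefficient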

Now the fun part. With a penalty weight p = 10 and the
sqrt(abs()) penalty I got

index   name   value      std dev
    1   a      2.0191e+00 4.0619e-02
    2   a      6.7300e-06 1.2917e-03
    3   a      3.7718e-06 1.2915e-03
    4   a      3.0421e-06 1.2914e-03
    5   a      6.0954e-06 1.2917e-03
    6   a      5.9105e-06 1.2916e-03
    7   a      4.2511e-07 1.2914e-03
    8   a      2.4084e-06 1.2914e-03
    9   a      8.3686e-06 1.2919e-03
   10   a      2.7657e-06 1.2914e-03
   11   sig_u  8.3226e-01 3.0119e-02


So this looks like a promising approach.
Of course the real test is whether it
works for real data. If anyone wants to cooperate
on this, it could be fun.

   Dave



--
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8L 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com


Re: lasso based selection for mixed model

schell
In reply to this post by Dieter Menne
Dear Dieter

There has been some effort examining lasso-type estimators for mixed models. It is more involved than with other types of models.

We have studied Gaussian and generalized linear mixed models with a lasso-type penalty on the fixed effects, and we now provide two packages: lmmlasso and glmmlasso. Both are available from R-Forge (the former also from CRAN).

Andreas Groll provides the glmmLasso package, which is available from CRAN.
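
A minimal usage sketch for glmmLasso, reusing the simulated data frame
from Dave's post (the lambda value is an arbitrary placeholder; see
?glmmLasso for the documented arguments):

    library(glmmLasso)
    fit <- glmmLasso(fix = y ~ X1 + X2 + X3 + X4 + X5
                           + X6 + X7 + X8 + X9 + X10,
                     rnd = list(id = ~1),  # random intercept per group
                     data = dat,
                     lambda = 10,          # strength of the L1 penalty
                     family = gaussian(link = "identity"))
    summary(fit)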

Juerg Schelldorfer