I have been working on lasso-type model selection using AD Model Builder's random effects module. I came across a few questions about this topic on the list, so I thought this would be of interest. The advantage of AD Model Builder is that it lets you formulate general nonlinear random effects models, which I believe is necessary for the lasso-type approach. I think the approach would work for any kind of RE model, such as, say, a negative binomial mixed model, but let's keep it simple for now.

Let the fixed effect parameters be a_i. The usual lasso penalty is then

p * sum_i abs(a_i)

with a tunable penalty weight p.

Since AD Model Builder's random effects module can deal with arbitrary nonlinear random effects models, I tried simply adding this penalty to the negative log-likelihood. However, what happens is that by the time the penalty is large enough to produce the lasso effect, the estimation procedure has simply made all the a_i too small and uses the REs instead to fit the data. I figured what was needed was a penalty term that is more encouraging of setting most of the terms to 0 and making a few of them large. A square root has this property, so I tried

p * sum_i sqrt(abs(a_i))

Of course this is not differentiable at 0, so I smoothed it off a bit there, but that is the general idea.
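In plain R (a sketch only; my actual implementation is in ADMB) the two penalties look like this, where eps is my choice of smoothing constant and (a^2 + eps)^(1/4) is just one way to round off the kink at 0:

lasso_pen <- function(a) sum(abs(a))        # usual lasso penalty
sqrt_pen  <- function(a, eps = 1e-4)        # ~ sum(sqrt(abs(a))),
  sum((a^2 + eps)^0.25)                     # but differentiable at 0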

To test it I generated data of the form

Y_ij = a(1)*X_1i + u_i + e_ij,   1 <= i <= 500, 1 <= j <= 5

with a(1) = 2.0, std dev of u_i = 0.6, std dev of e_ij = 1.5, and the IV X_1i standard normal. For the other IVs I generated 9 more, X_ki for 2 <= k <= 10, with

X_ki = X_1i + 0.1*V_ki

where the V_ki are standard normal.
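In R the simulation looks like this (the seed is arbitrary; I generated the actual data with my own code):

set.seed(123)                                        # arbitrary seed
n_i <- 500; n_j <- 5
X1 <- rnorm(n_i)                                     # X_1i, std normal
X  <- cbind(X1, sapply(2:10, function(k) X1 + 0.1 * rnorm(n_i)))
u  <- rnorm(n_i, sd = 0.6)                           # random effects u_i
e  <- matrix(rnorm(n_i * n_j, sd = 1.5), n_i, n_j)   # errors e_ij
Y  <- 2.0 * X1 + u + e                               # vector recycles down the j columns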

The correlation matrix for them is

1     0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995
0.995 1     0.99  0.99  0.989 0.988 0.99  0.989 0.99  0.99
0.995 0.99  1     0.99  0.989 0.99  0.99  0.99  0.989 0.99
0.995 0.99  0.99  1     0.99  0.989 0.99  0.99  0.99  0.99
0.995 0.989 0.989 0.99  1     0.989 0.99  0.989 0.99  0.99
0.995 0.988 0.99  0.989 0.989 1     0.99  0.99  0.989 0.99
0.995 0.99  0.99  0.99  0.99  0.99  1     0.991 0.99  0.991
0.995 0.989 0.99  0.99  0.989 0.99  0.991 1     0.988 0.99
0.995 0.99  0.989 0.99  0.99  0.989 0.99  0.988 1     0.99
0.995 0.99  0.99  0.99  0.99  0.99  0.991 0.99  0.99  1

so these are highly correlated explanatory variables.
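These values agree with the theoretical correlations, cor(X_1i, X_ki) = 1/sqrt(1.01) ≈ 0.995 and cor(X_ki, X_li) = 1/1.01 ≈ 0.99, and can be checked on the simulated data with round(cor(X), 3).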

Then I fit the model with no lasso penalty:

index  name    value        std dev
    1  a       3.1565e+00   1.2302e+00
    2  a       6.7390e-02   3.8737e-01
    3  a      -1.9154e-01   4.0204e-01
    4  a      -2.7920e-01   3.9847e-01
    5  a       3.0924e-02   3.9842e-01
    6  a       7.6508e-02   3.8698e-01
    7  a      -5.3902e-01   4.1590e-01
    8  a      -2.4941e-01   4.0141e-01
    9  a       2.5324e-01   3.8875e-01
   10  a      -2.9435e-01   4.1345e-01
   11  sig_u   8.2845e-01   3.0015e-02

so a(1) is rather badly estimated and its estimated std dev is large.
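For reference, a rough R equivalent of this unpenalized fit (an assumption on my part; the numbers above come from ADMB, not lme4) is

library(lme4)
d <- data.frame(y  = as.vector(Y),
                id = factor(rep(seq_len(n_i), times = n_j)),
                X[rep(seq_len(n_i), times = n_j), ])
names(d)[3:12] <- paste0("x", 1:10)             # x1 ... x10
f <- reformulate(c(paste0("x", 1:10), "(1 | id)"), response = "y")
fit0 <- lmer(f, data = d)
fixef(fit0)                                     # compare with the a_i above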

One way to try to deal with this is ridge regression, which amounts to adding the quadratic penalty

p * sum_i square(a_i)

Of course we know that this will have exactly the opposite effect to what we want; that is, it tends to make all the estimated parameters non-zero. With a penalty weight of p = 1.0 I got the following estimates:

index  name    value        std dev
    1  a       8.7729e-01   5.9032e-01
    2  a       2.9461e-01   3.2352e-01
    3  a       8.3218e-02   3.3546e-01
    4  a       8.3569e-03   3.3438e-01
    5  a       2.7345e-01   3.2984e-01
    6  a       2.5172e-01   3.2822e-01
    7  a      -1.9834e-01   3.4721e-01
    8  a       2.7677e-02   3.3404e-01
    9  a       4.0135e-01   3.2855e-01
   10  a       5.7042e-03   3.4325e-01
   11  sig_u   8.3188e-01   3.0163e-02

So the std devs are smaller but the results are really bad.

Now the fun part. With a penalty weight p = 10 and the sqrt(abs()) penalty I got:

index  name    value        std dev
    1  a       2.0191e+00   4.0619e-02
    2  a       6.7300e-06   1.2917e-03
    3  a       3.7718e-06   1.2915e-03
    4  a       3.0421e-06   1.2914e-03
    5  a       6.0954e-06   1.2917e-03
    6  a       5.9105e-06   1.2916e-03
    7  a       4.2511e-07   1.2914e-03
    8  a       2.4084e-06   1.2914e-03
    9  a       8.3686e-06   1.2919e-03
   10  a       2.7657e-06   1.2914e-03
   11  sig_u   8.3226e-01   3.0119e-02

So this looks like a promising approach.
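A crude way to see the same contrast in R (my construction for illustration, not the ADMB estimation, and with penalty weights that are not comparable to the ones above) is penalized least squares on the subject means, which absorbs the u_i into the residual:

ridge_pen <- function(a) sum(a^2)
ybar <- rowMeans(Y)                    # subject means, absorbs u_i
obj  <- function(a, pen, p) sum((ybar - X %*% a)^2) / 2 + p * pen(a)
fit_pen <- function(pen, p)
  optim(rep(0, 10), obj, pen = pen, p = p, method = "BFGS")$par
round(fit_pen(ridge_pen, p = 50), 3)   # ridge: shrinks all a_i, none to ~0
round(fit_pen(sqrt_pen,  p = 50), 3)   # sqrt: tends to zero most a_i

Since the sqrt penalty is not convex, optim may only find a local minimum here, so treat this as a toy illustration.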

Of course the real test for this is whether it works for real data. If anyone wants to cooperate on this, it could be fun.

Dave

--
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8L 3S3
Canada
Phone/FAX 250-655-3364

http://otter-rsch.com