esthetics --- extending the lm command to fixed effects?

ivo welch-2
dear R wizards:

not important.  more a curiosity or esthetics question.

is there a way to extend the standard lm command, so that it takes a new
argument that handles fixed effects?   right now, I have (provided to me
from an expert---I would have never figured this one out):

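   diffid <- function(h,id) {
       # drop unused factor levels so that tapply() returns one mean per firm
       id <- as.factor(id)[, drop=TRUE]
       # subtract each firm's mean from every column of h
       apply(as.matrix(h), 2, function(x) x - tapply(x,id,mean)[id])
   }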

which is used as

     r= lm( diffid(y, firmid) ~ diffid(x, firmid ) )

it works, but it would be much nicer if I could just write

    r= lm( y ~ x + z, fixed.effects=firmid )

does this already exist as a package?  or has someone figured out how to
program this?

as I wrote---this is a curiosity question, not a substance question.

regards,

/iaw
----
Ivo Welch ([hidden email], [hidden email])

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: esthetics --- extending the lm command to fixed effects?

Thomas Lumley
On Thu, 20 May 2010, ivo welch wrote:

> dear R wizards:
>
> not important.  more a curiosity or esthetics question.
>
> is there a way to extend the standard lm command, so that it takes a new
> argument that handles fixed effects?   right now, I have (provided to me
> from an expert---I would have never figured this one out):
>
>   diffid <- function(h,id) {
>       id <- as.factor(id)[, drop=TRUE]
>       apply(as.matrix(h), 2, function(x) x - tapply(x,id,mean)[id])
>   }

Simpler would be

    diffid<-function(h,id){ h-ave(h,id)}
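
A quick way to check that the two versions agree is to run both on the same data. The sketch below is illustrative only: the names diffid_tapply and diffid_ave and the simulated data are assumptions, and the one-liner is taken to receive a plain numeric vector h (the apply-based version also accepts a matrix).

```r
# Original version, with the closing parenthesis restored.
diffid_tapply <- function(h, id) {
  id <- as.factor(id)[, drop = TRUE]
  apply(as.matrix(h), 2, function(x) x - tapply(x, id, mean)[id])
}

# ave() returns the group mean aligned with each observation.
diffid_ave <- function(h, id) h - ave(h, id)

# Made-up data: 30 observations spread over three firms.
set.seed(42)
firmid <- sample(c("a", "b", "c"), 30, replace = TRUE)
y <- rnorm(30)

# The apply-based version returns a one-column matrix with dimnames,
# so strip both before comparing.
out_tapply <- unname(drop(diffid_tapply(y, firmid)))
out_ave <- diffid_ave(y, firmid)

all.equal(out_tapply, out_ave)
```

After demeaning, the mean within each firm should be zero up to rounding error, which is what tapply(out_ave, firmid, mean) can be used to confirm.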

> which is used as
>
>     r= lm( diffid(y, firmid) ~ diffid(x, firmid ) )
>
> it works, but it would be much nicer if I could just write
>
>    r= lm( y ~ x + z, fixed.effects=firmid )
>
> does this already exist as a package?  or has someone figured out how to
> program this?

I would just have used lm(y~x+z+factor(firmid)).  Admittedly, you get a whole bunch of uninteresting coefficients in the output, but it's not that hard to subset them out.
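
As a rough sketch of what subsetting them out could look like, the example below fits the dummy-variable regression on simulated data, drops every coefficient whose name starts with factor(firmid), and compares the remaining slopes with the demeaned regression. The data, the helper demean, and all object names are assumptions made for illustration.

```r
# Simulated panel: 50 firms, 10 observations each, x correlated with the firm effect.
set.seed(1)
n_firms <- 50
n_per <- 10
firmid <- rep(seq_len(n_firms), each = n_per)
firm_effect <- rnorm(n_firms)[firmid]
x <- rnorm(n_firms * n_per) + firm_effect
z <- rnorm(n_firms * n_per)
y <- 1 + 2 * x - 0.5 * z + firm_effect + rnorm(n_firms * n_per)

# Dummy-variable regression; keep only the coefficients that are not firm dummies.
fit_dummy <- lm(y ~ x + z + factor(firmid))
cf <- coef(fit_dummy)
slopes_dummy <- cf[!grepl("^factor\\(firmid\\)", names(cf))]

# Demeaned ("within") regression for comparison.
demean <- function(v, id) v - ave(v, id)
fit_within <- lm(demean(y, firmid) ~ demean(x, firmid) + demean(z, firmid))
slopes_within <- unname(coef(fit_within)[-1])

all.equal(unname(slopes_dummy[c("x", "z")]), slopes_within)

# The residual degrees of freedom differ: the demeaned fit does not know
# that 50 firm means were estimated, so its standard errors need adjusting.
c(dummy = df.residual(fit_dummy), within = df.residual(fit_within))
```

The slopes agree, but the two fits report different residual degrees of freedom, so standard errors taken straight from the demeaned lm() call are too small unless they are corrected.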

There are two implementations of this in Bill Venables' course notes on advanced programming. I think they are also in 'S Programming', but I can't find my copy right now.  These were motivated by computational problems: the full design matrix for the linear model was too large for memory at the time (last century).


As a final note, I would strongly discourage
    r= lm( y ~ x + z, fixed.effects=firmid )
as a specification, and would argue for
    r= lm( y ~ x + z, fixed.effects=~firmid )

I think the ability to have some subset of the arguments in a modelling call silently treated as formulas was a bad decision, although it must have looked user-friendly at the time.
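
One way to get an interface of this shape without modifying lm itself is a small wrapper that takes the grouping variable as a one-sided formula. The function lm_fe below is hypothetical (it is not in base R or in any package mentioned in this thread), handles a single grouping variable, assumes the data contain no missing values, and returns only the slopes and the corrected residual degrees of freedom rather than a full "lm" object.

```r
# Hypothetical wrapper: absorb a grouping variable by demeaning, then fit
# the remaining slopes with lm.fit(). Assumes no missing values in `data`.
lm_fe <- function(formula, fixed.effects, data) {
  stopifnot(inherits(formula, "formula"), inherits(fixed.effects, "formula"))
  mf <- model.frame(formula, data)
  y <- model.response(mf)
  X <- model.matrix(formula, mf)
  # The intercept is absorbed by the group means, so drop its column.
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  # Evaluate the one-sided formula in `data` to obtain the grouping factor.
  id <- interaction(model.frame(fixed.effects, data), drop = TRUE)
  demean <- function(v) v - ave(v, id)
  fit <- lm.fit(apply(X, 2, demean), demean(y))
  list(coefficients = fit$coefficients,
       # n minus the slopes minus one estimated mean per group
       df.residual = length(y) - ncol(X) - nlevels(id))
}

# Made-up panel data for a usage example.
set.seed(1)
n_firms <- 50
n_per <- 10
panel <- data.frame(firmid = rep(seq_len(n_firms), each = n_per))
firm_effect <- rnorm(n_firms)[panel$firmid]
panel$x <- rnorm(nrow(panel)) + firm_effect
panel$z <- rnorm(nrow(panel))
panel$y <- 1 + 2 * panel$x - 0.5 * panel$z + firm_effect + rnorm(nrow(panel))

fit <- lm_fe(y ~ x + z, fixed.effects = ~firmid, data = panel)
fit$coefficients
```

Because fixed.effects is an explicit formula, it is evaluated in data like the main formula, and nothing in the call is silently reinterpreted.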

          -thomas

Thomas Lumley, Assoc. Professor, Biostatistics
University of Washington, Seattle
[hidden email]

Re: esthetics --- extending the lm command to fixed effects?

ivo welch
hi thomas---

thanks for the answer.  the problem with "+factor(firmid)" is not just
that it provides uninteresting coefficients and that it eats more memory,
but that it is also MUCH slower when there are (hundreds of) thousands of
fixed effects.

Does Bill Venables describe how to extend the lm() function?  I googled
"course notes on advanced programming Venables", but did not find them.  Do
you have a better link?   (hopefully, this is a short explanation---I know
the algorithm.  I want to learn how to coax it into an lm statement.)

regards,

/iaw
----
Ivo Welch ([hidden email], [hidden email])


Re: esthetics --- extending the lm command to fixed effects?

Achim Zeileis-4
On Thu, 20 May 2010, ivo welch wrote:

> hi thomas---
>
> thanks for the answer.  the problem with "+factor(firmid)" is not just
> that it provides uninteresting coefficients and that it eats more memory,
> but that it is also MUCH slower when there are (hundreds of) thousands of
> fixed effects.

There is also the plm() function in the "plm" package which provides fixed
effects models (among many other models for panel data).

However, if I recall correctly, it internally employs the full regressor
matrix, not the demeaned one. More details are explained in the vignette,
though:
   vignette("plm", package = "plm")
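
For orientation, a within (fixed effects) fit with plm() might look like the sketch below. The simulated data and object names are assumptions, and the plm call is wrapped in a check so the example still runs when the package is not installed; the vignette remains the authoritative description of the arguments.

```r
# Made-up panel: 50 firms, 10 observations each.
set.seed(1)
n_firms <- 50
n_per <- 10
panel <- data.frame(firmid = rep(seq_len(n_firms), each = n_per))
firm_effect <- rnorm(n_firms)[panel$firmid]
panel$x <- rnorm(nrow(panel)) + firm_effect
panel$y <- 1 + 2 * panel$x + firm_effect + rnorm(nrow(panel))

# Reference fit with explicit firm dummies.
fit_dummy <- lm(y ~ x + factor(firmid), data = panel)
slope_dummy <- unname(coef(fit_dummy)["x"])

# Fixed effects fit with plm, only if the package is available.
slope_plm <- NULL
if (requireNamespace("plm", quietly = TRUE)) {
  fit_plm <- plm::plm(y ~ x, data = panel, index = "firmid", model = "within")
  slope_plm <- unname(coef(fit_plm)["x"])
}

slope_dummy
slope_plm
```

When plm is installed, the two slopes should agree up to numerical tolerance.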

hth,
Z
