Discourage the weights= option of lm with summarized data

Arie ten Cate
The Details section of the lm (linear models) help page in the Reference
manual suggests using the weights= option for summarized data. This
should be discouraged rather than encouraged, for the following reason.

With summarized data, the standard errors should get smaller as the
number of underlying observations increases. However, the standard
errors in lm do not get smaller when, for instance, all weights are
multiplied by the same constant larger than one, since lm treats the
inverse weights as merely proportional to the error variances.
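
The scale invariance described above can be checked directly. This is a minimal sketch using the same small data set as the example in this post; the factor 100 and the names fit1, fit2, se1 and se2 are arbitrary choices for illustration.

```r
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)
w <- c(1, 1, 1, 10)

fit1 <- lm(y ~ x, weights = w)
fit2 <- lm(y ~ x, weights = 100 * w)  # same weights, scaled by a constant

# Extract the standard errors of both fits
se1 <- coef(summary(fit1))[, "Std. Error"]
se2 <- coef(summary(fit2))[, "Std. Error"]

# The standard errors and the residual degrees of freedom are unchanged
print(all.equal(se1, se2))
print(c(df.residual(fit1), df.residual(fit2)))
```

Only the relative sizes of the weights matter to lm, so the weights alone cannot tell it how many observations a row represents.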

Here is an example of the estimated standard errors being too large
with the weights= option. The p value and the number of degrees of
freedom are also wrong. The parameter estimates are correct.

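  n <- 10
  x <- c(1,2,3,4)
  y <- c(1,2,5,4)
  w <- c(1,1,1,n)           # the last row stands for n identical observations
  xb <- c(x,rep(x[4],n-1))  # restore the original data
  yb <- c(y,rep(y[4],n-1))
  print(summary(lm(yb ~ xb)))           # full data: 11 residual df
  print(summary(lm(y ~ x, weights=w)))  # summarized data: 2 residual df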

Compare with PROC REG in SAS, which has both a WEIGHT statement (which
behaves like weights= in R) and a FREQ statement (for summarized data).
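
For readers without SAS, frequency-style results can be recovered from the weighted fit by hand. The following is only a sketch, assuming the weights are integer counts of identical observations; the names df_freq, se_freq, t_freq and p_freq are made up for illustration and are not part of lm.

```r
n <- 10
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)
w <- c(1, 1, 1, n)

fit_w <- lm(y ~ x, weights = w)

# Residual degrees of freedom: as used by lm, and as implied by the counts
df_w    <- df.residual(fit_w)
df_freq <- sum(w) - length(coef(fit_w))

# The weighted residual sum of squares equals that of the expanded data,
# so only the divisor of the error variance estimate has to change
se_freq <- coef(summary(fit_w))[, "Std. Error"] * sqrt(df_w / df_freq)
t_freq  <- coef(fit_w) / se_freq
p_freq  <- 2 * pt(abs(t_freq), df = df_freq, lower.tail = FALSE)

# Check against the fit on the expanded data
xb <- rep(x, times = w)
yb <- rep(y, times = w)
fit_full <- lm(yb ~ xb)
print(cbind(se_freq, se_full = coef(summary(fit_full))[, "Std. Error"]))
```

When the counts are small enough, simply expanding the data with rep() as above is the more transparent route.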

    Arie

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Discourage the weights= option of lm with summarized data

Viechtbauer Wolfgang (SP)
Using 'weights' is not meant to indicate that the same observation is repeated 'n' times. It is meant to indicate different variances (or to be precise, that the variance of the last observation in 'x' is sigma^2 / n, while the first three observations have variance sigma^2).
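
One situation that matches this variance interpretation is a row that holds the mean of n raw observations rather than n copies of one value. The sketch below uses simulated data; the seed, the true coefficients and the names x_raw, y_raw, fit_raw and fit_mean are assumptions made purely for illustration.

```r
set.seed(1)
n <- 10

# Hypothetical raw data: one observation each at x = 1, 2, 3 and n at x = 4
x_raw <- c(1, 2, 3, rep(4, n))
y_raw <- 1 + x_raw + rnorm(length(x_raw))

# Summarized data: the n raw values at x = 4 are replaced by their mean,
# which has variance sigma^2 / n
x <- c(1, 2, 3, 4)
y <- c(y_raw[1:3], mean(y_raw[4:(3 + n)]))
w <- c(1, 1, 1, n)

fit_raw  <- lm(y_raw ~ x_raw)
fit_mean <- lm(y ~ x, weights = w)

# The coefficients agree; the residual degrees of freedom do not, because
# the variation within the group at x = 4 is no longer available
print(all.equal(unname(coef(fit_raw)), unname(coef(fit_mean))))
print(c(df.residual(fit_raw), df.residual(fit_mean)))
```

In this setting the weighted fit has only four data points from which to estimate sigma^2, so its two residual degrees of freedom reflect the information actually present in the summarized data.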

Best,
Wolfgang

-----Original Message-----
From: R-devel [mailto:[hidden email]] On Behalf Of Arie ten Cate
Sent: Saturday, 07 October, 2017 9:36
To: [hidden email]
Subject: [Rd] Discourage the weights= option of lm with summarized data
