lm equivalent of Welch-corrected t-test?


Paul Johnson
Long ago, when R's t.test had var.equal=TRUE by default, I wrote some
class notes showing that the result was equivalent to a one-predictor
regression model.  Because t.test no longer defaults to var.equal=TRUE,
I'm curious whether there is a way to specify weights in an lm to
obtain the same result as the Welch-adjusted values that t.test
currently reports.  Is there a WLS equivalent adjustment with lm?

Here's example code showing that lm gives the same result as t.test
with var.equal=TRUE.  The signs come out differently, but otherwise
the effect estimate, standard error, and t value are the same:


set.seed(234234)
dat <- data.frame(x = gl(2, 50, labels = c("F", "M")))
dat$err <- rnorm(100, 0, 1)
dat$y <- ifelse(dat$x == "F", 40 + dat$err, 44 + dat$err)
m1 <- lm(y ~ x, dat)
summary(m1)
m1.t <- t.test(y ~ x, dat, var.equal = TRUE)
m1.t
## diff matches regression coefficient
(m1.t.effect <- diff(m1.t$estimate))
## standard error matches regression se
m1.t.stderr <- m1.t.effect/m1.t$statistic

If you run that, you see lm output:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  39.9456     0.1180  338.65   <2e-16 ***
xM            3.9080     0.1668   23.43   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8341 on 98 degrees of freedom
Multiple R-squared:  0.8485,    Adjusted R-squared:  0.8469
F-statistic: 548.8 on 1 and 98 DF,  p-value: < 2.2e-16

and t.test:

> m1.t <- t.test(y ~ x, dat, var.equal = TRUE)
> m1.t

    Two Sample t-test

data:  y by x
t = -23.427, df = 98, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.239038 -3.576968
sample estimates:
mean in group F mean in group M
       39.94558        43.85358

> (m1.t.effect <- diff(m1.t$estimate))
mean in group M
       3.908003
> m1.t.effect/m1.t$statistic
mean in group M
     -0.1668129




--
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: lm equivalent of Welch-corrected t-test?

Peter Dalgaard


> On 13 Nov 2018, at 16:19 , Paul Johnson <[hidden email]> wrote:
>
> Long ago, when R's t.test had var.equal=TRUE by default, I wrote some
> class notes showing that the result was equivalent to a one predictor
> regression model.  Because t.test does not default to var.equal=TRUE
> these days, I'm curious to know if there is a way to specify weights
> in an lm to obtain the same result as the Welch-adjusted values
> reported by t.test at the current time.  Is there a WLS equivalent
> adjustment with lm?
>

The short answer is no. The long answer is to look into heteroscedasticity adjustments or lmer combined with pbkrtest.
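One concrete version of the heteroscedasticity-adjustment route (an editorial sketch, not part of the original reply; it assumes the sandwich and lmtest packages are installed): for a two-group comparison, HC2-type robust standard errors on the plain lm coincide with the Welch (unpooled) standard error, so the t statistic matches the Welch one, although the degrees of freedom shown are still the pooled OLS df rather than Satterthwaite's.

```r
## Sketch (assumes sandwich and lmtest are installed; not Peter's code):
## heteroscedasticity-consistent standard errors on the unweighted fit
library(sandwich)
library(lmtest)
m <- lm(extra ~ group, data = sleep)
## For a binary regressor, HC2 equals sqrt(s1^2/n1 + s2^2/n2), i.e. the
## Welch SE; the t value matches Welch's, but the df remain n - 2 = 18
coeftest(m, vcov = vcovHC(m, type = "HC2"))
```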

Well, you can do the weights in lm() and get the right t statistic, but not the Satterthwaite fractional degrees of freedom.  (The sleep data are actually paired, but ignore that here.)

variances <- tapply(sleep$extra, sleep$group, var)
sleep$wt <- 1/variances[sleep$group]  # weight each case by its group's precision
t.test(extra ~ group, sleep)
summary(lm(extra ~ group, weights = wt, data = sleep))
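To make the point explicit, here is a small base-R check (an editorial addition, not part of the original message): the 1/group-variance weights reproduce the Welch t statistic exactly, but the residual df of the weighted fit stay at the pooled n - 2 = 18 rather than Satterthwaite's 17.776.

```r
## Check: WLS with 1/group-variance weights matches the Welch t statistic
## but not the Welch degrees of freedom (editorial addition)
variances <- tapply(sleep$extra, sleep$group, var)
sleep$wt <- 1/variances[sleep$group]
tt  <- t.test(extra ~ group, sleep)           # Welch: t = -1.8608, df = 17.776
fit <- lm(extra ~ group, weights = wt, data = sleep)
t_wls <- coef(summary(fit))["group2", "t value"]
abs(t_wls) - abs(tt$statistic)                # essentially zero
fit$df.residual                               # 18, not 17.776
```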


The pbkrtest approach goes like this:

library(lme4)
library(pbkrtest)
sleep$gdummy <- as.numeric(sleep$group) - 1 # 1 in the group with the larger variance
fit1 <-  lmer(extra ~ group + (gdummy+0 | ID), sleep)
fit0 <-  lmer(extra ~ 1 + (gdummy+0 | ID), sleep)
KRmodcomp(fit0, fit1)

(This somewhat abuses the existing sleep$ID -- all you really need is something that has a level for each record where gdummy==1)

This ends up with

         stat     ndf     ddf F.scaling p.value  
Ftest  3.4626  1.0000 17.7765         1 0.07939 .

to be compared with the Welch test (note that sqrt(3.4626) = 1.8608 = |t|, and the denominator df 17.7765 matches Welch's 17.776):

t = -1.8608, df = 17.776, p-value = 0.07939


-pd



--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]
