t-statistic for independent samples


David Arnold
Hi,

Typical things you read when new to stats are cautions about using a t-statistic when comparing independent samples. You are steered toward a pooled test or Welch's approximation of the degrees of freedom in order to make the distribution a t-distribution. However, most texts do not explain why you have to do this.
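For reference, both variants are available through R's built-in t.test(). The following is only a minimal sketch on made-up data: the sample sizes, means, and standard deviations are illustrative assumptions, not values taken from any text.

```r
set.seed(1)                              # arbitrary seed, for reproducibility only
x1 <- rnorm(7,  mean = 100, sd = 25)     # illustrative sample 1 (assumed values)
x2 <- rnorm(14, mean = 100, sd = 10)     # illustrative sample 2 (assumed values)

pooled <- t.test(x1, x2, var.equal = TRUE)   # pooled-variance t-test
welch  <- t.test(x1, x2)                     # Welch test (the default)

pooled$parameter   # df = n1 + n2 - 2 = 19
welch$parameter    # Welch-Satterthwaite df, estimated from the data
```

The two tests differ in the standard error of the difference and in the degrees of freedom used for the reference t-distribution.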

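So I thought I'd try a little experiment, which is outlined here:

Distribution of differences of independent samples
<http://msemac.redwoods.edu/~darnold/math15/R/chapter11/DistributionForTwoIndependentSamplesPartII.html>

As you can see at the link above, the images give no evidence that a pooled or Welch's test is needed.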

Anyone care to comment? Or should I put this on Stack Exchange?

D.
Re: t-statistic for independent samples

Kevin E. Thorpe

On 04/17/2013 06:24 PM, David Arnold wrote:

> Hi,
>
> Typical things you read when new to stats are cautions about using a
> t-statistic when comparing independent samples. You are steered toward a
> pooled test or Welch's approximation of the degrees of freedom in order to
> make the distribution a t-distribution. However, most texts give no
> information why you have to do this.
>
> So I thought I'd try a little experiment which is outlined here.
>
> Distribution of differences of independent samples
> <http://msemac.redwoods.edu/~darnold/math15/R/chapter11/DistributionForTwoIndependentSamplesPartII.html>
>
> As you can see in the above link, I see no evidence why you need a pooled or
> Welch's in these images.
>
> Anyone care to comment? Or should I put this on Stack Exchange?
>
> D.

Admittedly, I just skimmed the page, but one thing stands out.  Your
standard deviations are really quite close to each other.  Try your
simulations again with variance ratios exceeding 2 and see what happens.


--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: [hidden email]  Tel: 416.864.5776  Fax: 416.864.3016

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: t-statistic for independent samples

Jay Kerns-2
In reply to this post by David Arnold
Dear David,

On Wed, Apr 17, 2013 at 6:24 PM, David Arnold <[hidden email]> wrote:
> Hi,

[snip]

>
> D.

Before posting to StackExchange, check out the Wikipedia entry for
"Behrens-Fisher problem".

Cheers,
Jay


--
G. Jay Kerns, Ph.D.
Youngstown State University
http://people.ysu.edu/~gkerns/

Re: t-statistic for independent samples

David Arnold
In reply to this post by Kevin E. Thorpe
OK, although the variance ratio was already 2.25 to 1, I tried sigma1=10, sigma2=25, which makes the ratio of the variances 6.25 to 1.

Still no change. See:  http://msemac.redwoods.edu/~darnold/math15/R/chapter11/DistributionForTwoIndependentSamplesPartII.html

D.
Re: t-statistic for independent samples

Thomas Lumley-2
I just looked more carefully at your code.

You are computing the unequal-variance (Welch) version of the t-test, so
that's why there isn't a problem.  Compare it with the equal-variance
t-test, using the pooled variance estimate, which does have a problem, as
below

    -thomas

tstat4 <- function() {
    # The smaller group gets the larger standard deviation
    n1 = 7
    mu1 = 100
    sigma1 = 25
    n2 = 14
    mu2 = 100
    sigma2 = 10

    x1 = rnorm(n1, mu1, sigma1)
    x1bar = mean(x1)
    s1 = sd(x1)
    x2 = rnorm(n2, mu2, sigma2)
    x2bar = mean(x2)
    s2 = sd(x2)

    # Unequal-variance (Welch) statistic
    t = ((x1bar - x2bar) - (mu1 - mu2))/sqrt(s1^2/n1 + s2^2/n2)
    # Equal-variance statistic, using the pooled variance estimate
    t2 = ((x1bar - x2bar) - (mu1 - mu2))/
        sqrt(((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2)*(1/n1+1/n2))

    return(c(t,t2))
}

tstats4 = replicate(10000, tstat4())

# Reference density: t with the pooled df, n1 + n2 - 2 = 7 + 14 - 2 = 19
x = seq(-4, 4, length = 200)
y = dt(x, df = 19)

# Row 1: Welch statistic
hist(tstats4[1,], breaks = "scott", prob = TRUE, xlim = c(-4, 4), ylim = c(0, 0.4))
lines(x, y, type = "l", col = "red")

# Row 2: pooled statistic
hist(tstats4[2,], breaks = "scott", prob = TRUE, xlim = c(-4, 4), ylim = c(0, 0.4))
lines(x, y, type = "l", col = "red")
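The histograms can be complemented by a numerical check. The sketch below is an addition, not part of the original post: it reuses the same sample sizes and standard deviations as the code above, but relies on R's built-in t.test() instead of the hand-written statistics, and estimates how often each test rejects a true null hypothesis at the nominal 5% level. The seed, the number of replications, and the helper name one_run are arbitrary choices.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# One simulated experiment under H0 (equal means), with the larger
# standard deviation in the smaller group, as in the code above.
# one_run is a hypothetical helper name used only for this sketch.
one_run <- function(n1 = 7, sigma1 = 25, n2 = 14, sigma2 = 10) {
    x1 <- rnorm(n1, 100, sigma1)
    x2 <- rnorm(n2, 100, sigma2)
    c(welch  = t.test(x1, x2)$p.value,
      pooled = t.test(x1, x2, var.equal = TRUE)$p.value)
}

# 2 x 5000 matrix of p-values, rows named "welch" and "pooled"
pvals <- replicate(5000, one_run())

# Fraction of runs rejecting at the nominal 5% level
reject <- rowMeans(pvals < 0.05)
print(reject)
```

If the pooled test has the problem described, its rejection rate should sit noticeably above 0.05, while the Welch rate should stay close to it.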


On Thu, Apr 18, 2013 at 12:28 PM, David Arnold <[hidden email]> wrote:

> OK,although the variance ratio was already 2.25 to 1,  tried sigma1=10,
> sigma2=25, which makes the ratios of the variances 6.25 to 1.
>
> Still no change. See:
>
> http://msemac.redwoods.edu/~darnold/math15/R/chapter11/DistributionForTwoIndependentSamplesPartII.html
>
> D.



--
Thomas Lumley
Professor of Biostatistics
University of Auckland

Re: t-statistic for independent samples

Peter Dalgaard-2

On Apr 18, 2013, at 05:35 , Thomas Lumley wrote:

> I just looked more carefully at your code.
>
> You are computing the unequal-variance (Welch) version of the t-test, so
> that's why there isn't a problem.  Compare it with the equal-variance
> t-test, using the pooled variance estimate, which does have a problem, as
> below
>
>    -thomas

In principle, there should be a problem because the DF are being computed from the equal-variance formula. However, with the values on the web page, the adjusted DF from the Welch test come out as 18.54 instead of 19, which is not likely to be discernible.

It should be more apparent if you put the larger variance in the smaller group. (I get DF=6.98 for that case.)

(Thomas' example also switches the variances, it seems.)
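The two DF values can be reproduced with the Welch-Satterthwaite formula. This is a sketch under stated assumptions: n1 = 7 and n2 = 14 are taken from Thomas' code, the standard deviations 10 and 25 from David's follow-up, and the population standard deviations are plugged in where a real test would use the sample ones. The function name welch_df is made up for this example.

```r
# Welch-Satterthwaite df evaluated at the population standard deviations
# (an assumption made here to get a single number; in practice the
# sample standard deviations are plugged in).
welch_df <- function(sigma1, n1, sigma2, n2) {
    v1 <- sigma1^2 / n1
    v2 <- sigma2^2 / n2
    (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

welch_df(10, 7, 25, 14)   # about 18.54: larger variance in the larger group
welch_df(25, 7, 10, 14)   # about 6.98: larger variance in the smaller group
```

Both values are to be compared with the equal-variance DF, n1 + n2 - 2 = 19.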

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]
