Discrepancy between confidence intervals from t.test and computed manually -- why?


David Rossiter
I am sure there is something simple here I am missing, so please bear
with me.

It concerns the computation of the confidence interval for a population
mean.

The data are 125 measurements of Cs137 radiation, a sample data set from
Davis "Statistics and Data Analysis in Geology" 3rd ed. (CROATRAD.TXT)
------------------
method 1: using textbook definitions: mean \pm se_mean * t-value

mu <- mean(Cs137); n <- length(Cs137)
se.mean <- sqrt(var(Cs137)/n)
# two-tail alphas
alpha <- c(1, 5, 10, 20)/100
# t-values for each tail
t.vals <- qt(1-(alpha/2), n-1)
# name them for the respective alpha
names(t.vals) <- alpha
# low and high ends of the confidence interval
round(ci.low <- mu - se.mean * t.vals, 2)
round(ci.hi <- mu + se.mean * t.vals, 2)

Output:
0.01 0.05  0.1  0.2
5.66 5.81 5.90 5.99

0.01 0.05  0.1  0.2
6.69 6.54 6.46 6.36

-----------------

So for the 95% confidence level I seem to get a CI of 5.81 .. 6.54

------------------
method 2: using t.test.  I am not really testing for any specific mean,
I just want the confidence interval of the mean, which t.test seems to
give to me:

Input:
t.test(Cs137)

Output:

        One Sample t-test

data:  Cs137
t = 11.5122, df = 124, p-value < 2.2e-16              <-- not relevant
alternative hypothesis: true mean is not equal to 0   <-- not relevant
95 percent confidence interval:
 5.115488 7.239712
sample estimates:
mean of x
   6.1776
------------------------------

So with t.test I seem to get a CI of 5.12 .. 7.24, which is considerably
wider than the directly computed interval 5.81 .. 6.54.  Perhaps I am
misunderstanding the CI which t.test is reporting?

Any help would be appreciated.

Thank you.

D G Rossiter
Senior University Lecturer
Department of Earth Systems Analysis (DESA)
International Institute for Geo-Information Science and Earth
Observation (ITC)
Hengelosestraat 99
PO Box 6, 7500 AA Enschede, The Netherlands
mailto:[hidden email],  Internet: http://www.itc.nl/personal/rossiter

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Discrepancy between confidence intervals from t.test and computed manually -- why?

Dimitris Rizopoulos
For me, your code works correctly with simulated data, e.g.,

Cs137 <- rexp(100, 1/6)

mu <- mean(Cs137)
n <- length(Cs137)
se.mean <- sqrt(var(Cs137)/n)
alpha <- c(1, 5, 10, 20)/100
t.vals <- qt(1 -(alpha/2), n-1)
names(t.vals) <- alpha
ci.low <- mu - se.mean * t.vals
ci.hi <- mu + se.mean * t.vals
######################
rbind(ci.low, ci.hi)

t.test(Cs137)

Maybe you overwrote the value of the vector Cs137 somewhere.
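
A quick way to check for that, as a minimal sketch: list every place on the search path that holds an object called Cs137. The data frame name hrrad and all the numbers below are made up for illustration; only find() from the utils package is doing the real work.

```r
# Hypothetical setup: a workspace vector and a data frame column share a name.
Cs137 <- c(1, 2, 3)                          # stray copy in the workspace
hrrad <- data.frame(Cs137 = c(10, 20, 30))   # made-up data frame
attach(hrrad)                                # R may warn that Cs137 is masked

# find() returns every search-path entry that contains an object with this
# name; more than one entry means one copy is masking another.
where <- utils::find("Cs137")
print(where)

detach(hrrad)
```

If more than one entry comes back, the first one listed is the copy that a bare Cs137 refers to at the prompt.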

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm



Re: Discrepancy between confidence intervals from t.test and computed manually -- why?

David Rossiter
In reply to this post by David Rossiter
Problem solved, thanks to Dimitris Rizopoulos. A classic beginner's
mistake, though I've been using R for four years: I had a local
variable named Cs137 and another of the same name in the active data
frame in the R Commander (I am testing Rcmdr for possible classroom
use). In Rcmdr the active frame is named explicitly, e.g.
t.test(hrrad$Cs137); at the command prompt I used t.test(Cs137) with
the frame hrrad attached, but of course the local variable took
precedence. Dumb of me, and sorry to bother you all.
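
For anyone who lands here with the same symptom, a small sketch of the masking described above. The values are invented and hrrad here is only a stand-in for the real data frame; the point is that a bare name resolves to the workspace copy before the attached frame, while hrrad$Cs137 and with() do not.

```r
# Made-up numbers; `hrrad` stands in for the data frame from the post.
hrrad <- data.frame(Cs137 = c(4.1, 5.3, 6.8, 7.2, 5.9))
Cs137 <- c(100, 200, 300)      # leftover workspace variable with the same name

attach(hrrad)                        # the attached column sits behind the workspace
mean_bare     <- mean(Cs137)         # resolves to the workspace copy
mean_explicit <- mean(hrrad$Cs137)   # always the data frame column
detach(hrrad)

# with() evaluates inside the data frame, so the column is found first there.
mean_with <- with(hrrad, mean(Cs137))

print(c(bare = mean_bare, explicit = mean_explicit, with = mean_with))
```

Naming the frame explicitly, or using with(), sidesteps the problem without needing attach() at all.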

David Rossiter



Re: Discrepancy between confidence intervals from t.test and computed manually -- why?

Prof Brian Ripley
In reply to this post by David Rossiter
Some user error it appears.  I googled, got the data file from

http://www3.interscience.wiley.com:8100/legacy/college/davis/0471172758/datafiles/data_index.html

and did

> temp <- read.table("c:/TEMP/CROATRAD.TXT", header=TRUE)
> Cs137 <- temp$X137Cs
> t.test(Cs137)$conf.int
[1] 5.115488 7.239712
attr(,"conf.level")
[1] 0.95

which agrees with your report

> mu <- mean(Cs137); n <- length(Cs137)
> se.mean <- sqrt(var(Cs137)/n)
> # two-tail alphas
> alpha <- c(1, 5, 10, 20)/100
> # t-values for each tail
> t.vals <- qt(1-(alpha/2), n-1)
> # name them for the respective alpha
> names(t.vals) <- alpha
> # low and high ends of the confidence interval
> round(ci.low <- mu - se.mean * t.vals, 2)
0.01 0.05  0.1  0.2
4.77 5.12 5.29 5.49
> round(ci.hi <- mu + se.mean * t.vals, 2)
0.01 0.05  0.1  0.2
7.58 7.24 7.07 6.87
> c(ci.low[2], ci.hi[2])
     0.05     0.05
5.115488 7.239712

which agrees with t.test, and not with what you reported.
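
The same comparison can be made at all four levels at once. This is only a sketch on simulated data (not the Davis file), and it assumes nothing beyond the conf.level argument of t.test() and the textbook formula quoted in the question:

```r
set.seed(1)                          # simulated stand-in, not the Davis data
x <- rexp(125, rate = 1/6)

alpha <- c(1, 5, 10, 20) / 100
n <- length(x)

# Textbook interval: mean +/- t-quantile * standard error of the mean.
half_width <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
manual <- rbind(low = mean(x) - half_width, high = mean(x) + half_width)

# t.test() takes the confidence level directly through conf.level.
from_ttest <- sapply(alpha, function(a) t.test(x, conf.level = 1 - a)$conf.int)

# Compare the two 2 x 4 matrices, ignoring row and column names.
agree <- isTRUE(all.equal(unname(manual), unname(from_ttest)))
print(agree)
```

If agree comes back FALSE on real data, the two calculations are almost certainly not seeing the same vector.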


--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: Discrepancy between confidence intervals from t.test and computed manually -- why?

Chuck Cleland
In reply to this post by David Rossiter
Your two methods agree for me:

 > Cs137 <- read.table("http://geomechanics.geol.pdx.edu/Courses/G423/Texts/Davis3/CROATRAD.TXT",
 +                     skip=1)[,5]

 > mu <- mean(Cs137); n <- length(Cs137)
 > se.mean <- sqrt(var(Cs137)/n)
 > # two-tail alphas
 > alpha <- c(1, 5, 10, 20)/100
 > # t-values for each tail
 > t.vals <- qt(1-(alpha/2), n-1)
 > # name them for the respective alpha
 > names(t.vals) <- alpha
 > # low and high ends of the confidence interval
 > round(ci.low <- mu - se.mean * t.vals, 2)
0.01 0.05  0.1  0.2
4.77 5.12 5.29 5.49
 > round(ci.hi <- mu + se.mean * t.vals, 2)
0.01 0.05  0.1  0.2
7.58 7.24 7.07 6.87

 > t.test(Cs137)

         One Sample t-test

data:  Cs137
t = 11.5122, df = 124, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  5.115488 7.239712
sample estimates:
mean of x
    6.1776


--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894
