On 20-Sep-10 13:54:56, A wrote:

> Dear all,
>
> I'm performing a t-test on two normal distributions with identical
> mean and standard deviation, and repeating this test a very large
> number of times to describe a representative p-value distribution
> in a null case. As part of this, the program bins these values
> into 10 evenly spaced bins between 0 and 1 and reports the number
> of observations in each bin. What I have noticed is that even after
> 500,000 replications the number in my lowest bin is consistently ~5%
> smaller than the number in all the other bins, which are similar
> to within about 1% of each other. Is there any reason, perhaps to do
> with random number generation in R or the nature of the normal
> distribution simulated by the rnorm function, that could explain
> this depletion?

>
> Here are two key parts of my code to show what functions I'm
> working with:
>
> # Calculating the p values
> while (i < numtests) {
>     Group1 <- rnorm(6, -0.0065, 0.0837)
>     Group2 <- rnorm(6, -0.0065, 0.0837)
>     PV <- t.test(Group1, Group2)$p.value
>     pscoresvector <- c(PV, pscoresvector)
>     i <- i + 1
> }
>
> # Binning the results
> freqtbl1 <- binning(pscoresvector, breaks = bins)
>
> Thanks in advance for any insights,
> Andrew

The issue lies in the t-test, not in the random number generation.

Look at '?t.test':

    t.test(x, y = NULL,
           alternative = c("two.sided", "less", "greater"),
           mu = 0, paired = FALSE, var.equal = FALSE,
           conf.level = 0.95, ...)

and note "var.equal = FALSE" as the default. Then:

    var.equal: a logical variable indicating whether to treat the two
               variances as being equal. If 'TRUE' then the pooled
               variance is used to estimate the variance, otherwise
               the Welch (or Satterthwaite) approximation to the
               degrees of freedom is used.

So the default t-test is not the t-test you were perhaps expecting!
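One way to see the difference (a sketch, not from the original thread): the pooled test always uses n1 + n2 - 2 degrees of freedom, whereas the Welch test's degrees of freedom come from the Satterthwaite formula and so vary with the observed sample variances:

```r
## Sketch: Satterthwaite degrees of freedom for two samples of size 6,
## computed by hand and compared with what t.test() reports.
x <- rnorm(6); y <- rnorm(6)
vx <- var(x)/6; vy <- var(y)/6              # squared standard errors
df.welch <- (vx + vy)^2 / (vx^2/5 + vy^2/5) # Satterthwaite formula
df.welch                 # lies between 5 and 10, varying by sample
t.test(x, y)$parameter   # t.test() reports this same value
## The pooled test (var.equal = TRUE) would use 6 + 6 - 2 = 10 throughout.
```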

I used a variant of your code (to be absolutely sure of what I was
doing):

    numtests <- 100000; PVals <- numeric(numtests)
    for (i in 1:numtests) {
        Group1 <- rnorm(6, -0.0065, 0.0837)
        Group2 <- rnorm(6, -0.0065, 0.0837)
        PVals[i] <- t.test(Group1, Group2)$p.value
    }
    hist(PVals, breaks = 0.1*(0:10))

and observed similar behaviour to what you report. But, with
"var.equal = TRUE":

    numtests <- 100000; PVals <- numeric(numtests)
    for (i in 1:numtests) {
        Group1 <- rnorm(6, -0.0065, 0.0837)
        Group2 <- rnorm(6, -0.0065, 0.0837)
        PVals[i] <- t.test(Group1, Group2, var.equal = TRUE)$p.value
    }
    hist(PVals, breaks = 0.1*(0:10))

all the bin-values were similar.

So what happens here is that the Welch/Satterthwaite approximation
does not produce uniformly distributed P-values when the Null
Hypothesis is true (at any rate for sample sizes as small as the
6 you are using).
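The depletion can also be quantified directly rather than read off a histogram. A quick check along these lines (a sketch, with an arbitrary seed) compares the fraction of null P-values falling below 0.1 under the two tests:

```r
## Sketch: fraction of null P-values below 0.1, Welch vs pooled.
set.seed(1)                         # arbitrary seed, for reproducibility
numtests <- 100000
welch  <- replicate(numtests,
                    t.test(rnorm(6), rnorm(6))$p.value)
pooled <- replicate(numtests,
                    t.test(rnorm(6), rnorm(6), var.equal = TRUE)$p.value)
mean(welch  < 0.1)   # expected to fall short of the nominal 0.1
mean(pooled < 0.1)   # expected to be close to 0.1
```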

Hoping this helps,

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Sep-10 Time: 16:08:41
------------------------------ XFMail ------------------------------

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.