Quantcast

null distribution of binom.test p values

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

null distribution of binom.test p values

Chris Wallace
Dear R-help,

I must be missing something very obvious, but I am confused as to why
the null distribution for p values generated by binom.test() appears to
be non-uniform.  The histogram generated below has a trend towards
values closer to 1 than 0.  I expected it to be flat.

hist(sapply(1:1000, function(i,n=100)
binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value))

This trend is more pronounced for small n, and the distribution appears
uniform for larger n, say n=1000.  I had expected the distribution to be
discrete for small n, but not skewed.  Can anyone explain why?

Many thanks,

Chris.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: null distribution of binom.test p values

Greg Snow-2
I believe that what you are seeing is due to the discrete nature of the binomial test.  When I run your code below I see the bar between 0.9 and 1.0 is about twice as tall as the bar between 0.0 and 0.1, but the bar between 0.8 and 0.9 is not there (height 0), if you average the top 2 bars (0.8-0.9 and 0.9-1.0) then the average height is similar to that of the lowest bar.  The bar between 0.5 and 0.6 is also 0, if you average that one with the next 2 (0.6-0.7 and 0.7-0.8) then they are also similar to the bars near 0.



--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
801.408.8111


> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-
> project.org] On Behalf Of Chris Wallace
> Sent: Thursday, January 26, 2012 5:44 AM
> To: [hidden email]
> Subject: [R] null distribution of binom.test p values
>
> Dear R-help,
>
> I must be missing something very obvious, but I am confused as to why
> the null distribution for p values generated by binom.test() appears to
> be non-uniform.  The histogram generated below has a trend towards
> values closer to 1 than 0.  I expected it to be flat.
>
> hist(sapply(1:1000, function(i,n=100)
> binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value))
>
> This trend is more pronounced for small n, and the distribution appears
> uniform for larger n, say n=1000.  I had expected the distribution to
> be
> discrete for small n, but not skewed.  Can anyone explain why?
>
> Many thanks,
>
> Chris.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: null distribution of binom.test p values

Chris Wallace
Greg, thanks for the reply.

Unfortunately, I remain unconvinced!

I ran a longer simulation, 100,000 reps.  The size of the test is
consistently too small (see below) and the histogram shows increasing
bars even within the parts of the histogram with even bar spacing.  See
https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png

y<-sapply(1:100000, function(i,n=100)
  binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value)
mean(y<0.01)
# [1] 0.00584
mean(y<0.05)
# [1] 0.03431
mean(y<0.1)
# [1] 0.08646

Can that really be due to the discreteness of the distribution?

C.

On 26/01/12 16:08, Greg Snow wrote:
> I believe that what you are seeing is due to the discrete nature of the binomial test.  When I run your code below I see the bar between 0.9 and 1.0 is about twice as tall as the bar between 0.0 and 0.1, but the bar between 0.8 and 0.9 is not there (height 0), if you average the top 2 bars (0.8-0.9 and 0.9-1.0) then the average height is similar to that of the lowest bar.  The bar between 0.5 and 0.6 is also 0, if you average that one with the next 2 (0.6-0.7 and 0.7-0.8) then they are also similar to the bars near 0.
>
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: null distribution of binom.test p values

Greg Snow-2
Yes that is due to the discreteness of the distribution, consider the following:

> binom.test(39,100,.5)

        Exact binomial test

data:  39 and 100
number of successes = 39, number of trials = 100, p-value = 0.0352
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2940104 0.4926855
sample estimates:
probability of success
                  0.39

> binom.test(40,100,.5)

        Exact binomial test

data:  40 and 100
number of successes = 40, number of trials = 100, p-value = 0.05689
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.3032948 0.5027908
sample estimates:
probability of success
                   0.4

(you can do the same for 60 and 61)

So notice that the probability of getting 39 or more extreme is 0.0352, but anything less extreme will result in not rejecting the null hypothesis (because the probability of getting a 40 or a 60 (dbinom(40,100,.5)) is about 1% each, so we see a 2% jump there).  So the size/probability of a type I error will generally not be equal to alpha unless n is huge or alpha is chosen to correspond to a jump in the distribution rather than using common round values.

I have seen suggestions that instead of the standard test we use a test that rejects the null for values 39 and more extreme, don't reject the null for 41 and less extreme, and if you see a 40 or 60 then you generate a uniform random number and reject if it is below a certain value (that value chosen to give an overall probability of type I error of 0.05).  This will correctly size the test, but becomes less reproducible (and makes clients nervous when they present their data and you pull out a coin, flip it, and tell them if they have significant results based on your coin flip (or more realistically a die roll)).  I think it is better in this case if you know your final sample size is going to be 100 to explicitly state that alpha will be 0.352 (but then you need to justify why you are not using the common 0.05 to reviewers).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
801.408.8111


> -----Original Message-----
> From: Chris Wallace [mailto:[hidden email]]
> Sent: Thursday, January 26, 2012 9:36 AM
> To: Greg Snow
> Cc: [hidden email]
> Subject: Re: [R] null distribution of binom.test p values
>
> Greg, thanks for the reply.
>
> Unfortunately, I remain unconvinced!
>
> I ran a longer simulation, 100,000 reps.  The size of the test is
> consistently too small (see below) and the histogram shows increasing
> bars even within the parts of the histogram with even bar spacing.  See
> https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png
>
> y<-sapply(1:100000, function(i,n=100)
>   binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value)
> mean(y<0.01)
> # [1] 0.00584
> mean(y<0.05)
> # [1] 0.03431
> mean(y<0.1)
> # [1] 0.08646
>
> Can that really be due to the discreteness of the distribution?
>
> C.
>
> On 26/01/12 16:08, Greg Snow wrote:
> > I believe that what you are seeing is due to the discrete nature of
> the binomial test.  When I run your code below I see the bar between
> 0.9 and 1.0 is about twice as tall as the bar between 0.0 and 0.1, but
> the bar between 0.8 and 0.9 is not there (height 0), if you average the
> top 2 bars (0.8-0.9 and 0.9-1.0) then the average height is similar to
> that of the lowest bar.  The bar between 0.5 and 0.6 is also 0, if you
> average that one with the next 2 (0.6-0.7 and 0.7-0.8) then they are
> also similar to the bars near 0.
> >
> >
> >

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: null distribution of binom.test p values

Thomas Lumley-2
In reply to this post by Chris Wallace
On Fri, Jan 27, 2012 at 5:36 AM, Chris Wallace
<[hidden email]> wrote:

> Greg, thanks for the reply.
>
> Unfortunately, I remain unconvinced!
>
> I ran a longer simulation, 100,000 reps.  The size of the test is
> consistently too small (see below) and the histogram shows increasing bars
> even within the parts of the histogram with even bar spacing.  See
> https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png
>
> y<-sapply(1:100000, function(i,n=100)
>  binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value)
> mean(y<0.01)
> # [1] 0.00584
> mean(y<0.05)
> # [1] 0.03431
> mean(y<0.1)
> # [1] 0.08646
>
> Can that really be due to the discreteness of the distribution?

Yes.  All so-called exact tests tend to be conservative due to
discreteness, and there's quite a lot of discreteness in the tails

The problem is far worse for Fisher's exact test, and worse still for
Fisher's other exact test (of Hardy-Weinberg equilibrium --
http://www.genetics.org/content/180/3/1609.full).

You don't need to rely on finite-sample simulations here: you can
evaluate the level exactly.  Using binom.test() you find that the
rejection regions are y<=39 and y>=61, so the level at nominal 0.05
is:
> pbinom(39,100,0.5)+pbinom(60,100,0.5,lower.tail=FALSE)
[1] 0.0352002
agreeing very well with your 0.03431

At nominal 0.01 the exact level is
> pbinom(36,100,0.5)+pbinom(63,100,0.5,lower.tail=FALSE)
[1] 0.006637121
and at 0.1 it is
> pbinom(41,100,0.5)+pbinom(58,100,0.5,lower.tail=FALSE)
[1] 0.08862608

Your result at nominal 0.01 is a bit low, but I think that's bad luck.
 When I ran your code I got 0.00659 for the estimated level at nominal
0.01, which matches the exact calculations very well


Theoreticians sweep this under the carpet by inventing randomized
tests, where you interpolate a random p-value between the upper and
lower values from a discrete distribution.  It's a very elegant idea
that I'm glad to say I haven't seen used in practice.

     -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: null distribution of binom.test p values

Chris Wallace
Greg, Thomas,

thank you dot the detailed and lucid replies.  I understand now.  I am
doing multiple tests and wanted to present an overview of results using
pp plots, which were looking very underdispersed.  Now I understand why,
I think I can generate a "correct" expected distribution which should at
least reassure the biologists when they view them.

Many thanks, Chris.

On 26/01/12 20:03, Thomas Lumley wrote:

> On Fri, Jan 27, 2012 at 5:36 AM, Chris Wallace
> <[hidden email]>  wrote:
>> Greg, thanks for the reply.
>>
>> Unfortunately, I remain unconvinced!
>>
>> I ran a longer simulation, 100,000 reps.  The size of the test is
>> consistently too small (see below) and the histogram shows increasing bars
>> even within the parts of the histogram with even bar spacing.  See
>> https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png
>>
>> y<-sapply(1:100000, function(i,n=100)
>>   binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value)
>> mean(y<0.01)
>> # [1] 0.00584
>> mean(y<0.05)
>> # [1] 0.03431
>> mean(y<0.1)
>> # [1] 0.08646
>>
>> Can that really be due to the discreteness of the distribution?
>
> Yes.  All so-called exact tests tend to be conservative due to
> discreteness, and there's quite a lot of discreteness in the tails
>
> The problem is far worse for Fisher's exact test, and worse still for
> Fisher's other exact test (of Hardy-Weinberg equilibrium --
> http://www.genetics.org/content/180/3/1609.full).
>
> You don't need to rely on finite-sample simulations here: you can
> evaluate the level exactly.  Using binom.test() you find that the
> rejection regions are y<=39 and y>=61, so the level at nominal 0.05
> is:
>> pbinom(39,100,0.5)+pbinom(60,100,0.5,lower.tail=FALSE)
> [1] 0.0352002
> agreeing very well with your 0.03431
>
> At nominal 0.01 the exact level is
>> pbinom(36,100,0.5)+pbinom(63,100,0.5,lower.tail=FALSE)
> [1] 0.006637121
> and at 0.1 it is
>> pbinom(41,100,0.5)+pbinom(58,100,0.5,lower.tail=FALSE)
> [1] 0.08862608
>
> Your result at nominal 0.01 is a bit low, but I think that's bad luck.
>   When I ran your code I got 0.00659 for the estimated level at nominal
> 0.01, which matches the exact calculations very well
>
>
> Theoreticians sweep this under the carpet by inventing randomized
> tests, where you interpolate a random p-value between the upper and
> lower values from a discrete distribution.  It's a very elegant idea
> that I'm glad to say I haven't seen used in practice.
>
>       -thomas
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Loading...