# Basis of fisher.test

5 messages

## Basis of fisher.test

I want to ascertain the basis of the table ranking, i.e. the meaning of "extreme", in Fisher's Exact Test as implemented in 'fisher.test', when applied to RxC tables which are larger than 2x2.

One can summarise a strategy for the test as

1) For each table compatible with the margins of the observed table, compute the probability of this table conditional on the marginal totals.

2) Rank the possible tables in order of a measure of discrepancy between the table and the null hypothesis of "no association".

3) Locate the observed table, and compute the sum of the probabilities, computed in (1), for this table and more "extreme" tables in the sense of the ranking in (2).

The question is: what "measure of discrepancy" is used in 'fisher.test' corresponding to stage (2)?

(There are in principle several possibilities, e.g. value of a Pearson chi-squared, large values being discrepant; the probability calculated in (1), small values being discrepant; ... )

"?fisher.test" says only:

     In the one-sided 2 by 2 cases, p-values are obtained
     directly using the hypergeometric distribution.
     Otherwise, computations are based on a C version of
     the FORTRAN subroutine FEXACT which implements the
     network developed by Mehta and Patel (1986) and
     improved by Clarkson, Fan & Joe (1993). The FORTRAN
     code can be obtained from
     .

I have had a look at this FORTRAN code, and cannot ascertain it from the code itself. However, there is a Comment to the effect:

     c     PRE    - Table p-value.  (Output)
     c              PRE is the probability of a more extreme table, where
     c              'extreme' is in a probabilistic sense.

which suggests that the tables are ranked in order of their probabilities as computed in (1).

Can anyone confirm definitively what goes on?

With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Jan-06                                       Time: 20:19:02
------------------------------ XFMail ------------------------------

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

## Re: Basis of fisher.test

(Ted Harding) <[hidden email]> writes:

> I want to ascertain the basis of the table ranking,
> i.e. the meaning of "extreme", in Fisher's Exact Test
> as implemented in 'fisher.test', when applied to RxC
> tables which are larger than 2x2.
>
> [...]
>
> The question is: what "measure of discrepancy" is
> used in 'fisher.test' corresponding to stage (2)?
>
> [...]
>
> Can anyone confirm definitively what goes on?

To my knowledge, it is the "table probability", according to the hypergeometric distribution, i.e. the probability of the table given the marginals, which can be translated to sampling a+b balls without replacement from a box with a+c white and b+d black balls. Playing around with dhyper should be instructive.

(You're right that the "two-sided" p values are obtained by summing all smaller or equal table probabilities. This is the traditional way, but there are alternatives, e.g. tail balancing.)

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
([hidden email])
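
As one way of "playing around with dhyper", the following is a minimal sketch for the 2x2 case. The table values are hypothetical (they do not come from the thread), and the small relative tolerance is an assumption added here to guard against floating-point ties. It sums the hypergeometric probabilities of all tables that are no more probable than the observed one and compares the result with `fisher.test`.

```r
# Hypothetical 2x2 table (illustrative data, not from the thread):
#            col1  col2
#   row1     a=3   b=1
#   row2     c=1   d=3
tab <- matrix(c(3, 1, 1, 3), nrow = 2, byrow = TRUE)

a_plus_b <- sum(tab[1, ])   # number of balls drawn   (row 1 total)
a_plus_c <- sum(tab[, 1])   # white balls in the box  (column 1 total)
b_plus_d <- sum(tab[, 2])   # black balls in the box  (column 2 total)

# All values of the top-left cell that are compatible with the margins.
lo <- max(0, a_plus_b - b_plus_d)
hi <- min(a_plus_b, a_plus_c)
support <- lo:hi

# Probability of each possible table, conditional on the margins.
p_tab <- dhyper(support, m = a_plus_c, n = b_plus_d, k = a_plus_b)
p_obs <- dhyper(tab[1, 1], m = a_plus_c, n = b_plus_d, k = a_plus_b)

# Two-sided p-value: sum the table probabilities that are less than or
# equal to the observed one (with an assumed tolerance for ties).
p_manual <- sum(p_tab[p_tab <= p_obs * (1 + 1e-7)])

p_manual
fisher.test(tab)$p.value
```

The two printed values should agree. Replacing the `<=` rule with a different ordering (for example, one based on a chi-squared statistic) would give one of the alternative rankings mentioned in the original question.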

## Re: Basis of fisher.test

On Thu, 12 Jan 2006 [hidden email] wrote:

> I want to ascertain the basis of the table ranking,
> i.e. the meaning of "extreme", in Fisher's Exact Test
> as implemented in 'fisher.test', when applied to RxC
> tables which are larger than 2x2.
>
> [...]
>
> "?fisher.test" says only:

[That following is not a quote from a current version of R.]

>     In the one-sided 2 by 2 cases, p-values are obtained
>     directly using the hypergeometric distribution.
>     Otherwise, computations are based on a C version of
>     the FORTRAN subroutine FEXACT which implements the
>     network developed by Mehta and Patel (1986) and
>     improved by Clarkson, Fan & Joe (1993). The FORTRAN
>     code can be obtained from
>     .

No, it *also* says

      Two-sided tests are based on the probabilities of the tables, and
      take as 'more extreme' all tables with probabilities less than or
      equal to that of the observed table, the p-value being the sum of
      such probabilities.

which answers the question (there are only two-sided tests for such tables).

Now, what does the posting guide say about stating the R version and updating before posting?

--
Brian D. Ripley, [hidden email]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
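
The rule quoted from the help page can be checked by brute force on a small RxC table. The sketch below is only an illustration under stated assumptions: the 2x3 data are hypothetical, `log_p` is a helper name invented here, and the full enumeration is a stand-in for (not a description of) the network algorithm that FEXACT uses. It lists every table with the observed margins, computes each table's conditional probability, and sums those that are less than or equal to the observed probability.

```r
# Hypothetical 2x3 table (illustrative data, not from the thread).
obs <- matrix(c(3, 1, 0,
                0, 2, 3), nrow = 2, byrow = TRUE)
rs <- rowSums(obs)
cs <- colSums(obs)

# Log-probability of a table conditional on its margins:
#   prod(row totals!) * prod(col totals!) / (N! * prod(cells!))
log_p <- function(tab) {
  sum(lfactorial(rowSums(tab))) + sum(lfactorial(colSums(tab))) -
    lfactorial(sum(tab)) - sum(lfactorial(tab))
}

# Enumerate all tables with these margins. For a 2x3 table the first
# row determines the second, and the third cell follows from the row total.
grid <- expand.grid(x1 = 0:cs[1], x2 = 0:cs[2])
grid$x3 <- rs[1] - grid$x1 - grid$x2
grid <- grid[grid$x3 >= 0 & grid$x3 <= cs[3], ]

# Step (1): probability of every compatible table.
p_all <- apply(grid, 1, function(r1) exp(log_p(rbind(r1, cs - r1))))
p_obs <- exp(log_p(obs))

# Steps (2) and (3): 'more extreme' means probability <= observed;
# the relative tolerance is an assumption to absorb rounding in ties.
p_manual <- sum(p_all[p_all <= p_obs * (1 + 1e-7)])

sum(p_all)                 # sanity check: probabilities sum to 1
p_manual
fisher.test(obs)$p.value
```

Full enumeration is only practical for very small tables, which is why a dedicated algorithm is used in practice.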

## Re: Basis of fisher.test

On 13-Jan-06 Prof Brian Ripley wrote:

> On Thu, 12 Jan 2006 [hidden email] wrote:
>>[...]
>> "?fisher.test" says only:
>
> [That following is not a quote from a current version of R.]
>
>>     In the one-sided 2 by 2 cases, p-values are obtained
>>     directly using the hypergeometric distribution.
>>     Otherwise, computations are based on a C version of
>>     the FORTRAN subroutine FEXACT which implements the
>>     network developed by Mehta and Patel (1986) and
>>     improved by Clarkson, Fan & Joe (1993). The FORTRAN
>>     code can be obtained from
>>     .
>
> No, it *also* says
>
>       Two-sided tests are based on the probabilities of the tables, and
>       take as 'more extreme' all tables with probabilities less than or
>       equal to that of the observed table, the p-value being the sum of
>       such probabilities.
>
> which answers the question (there are only two-sided tests for such
> tables).

Thanks for the above information, which is indeed the definitive straightforward answer to my question!

(Not sure that I quite agree with the "two-sided" terminology, though, since the ranking is unidirectional based on decreasing probability, and the P-value is that of the least-probability tail -- i.e. analogous to the "large (-2*loglik)" tail of a likelihood-ratio test -- which I've always visualised as a 1-tailed test (despite the fact that the "other tail" can on occasion be indicative of a fit "too good to be true").)

> Now, what does the posting guide say about stating the R version and
> updating before posting?

Well, I plead that in practice there is necessarily a grey area here! My quotation was from "?fisher.test" in R-2.1.0beta of 2004/04/08, the most recent version installed on any of my machines. Admittedly a bit behind the times, but not grossly; and that help page has not changed in this respect since the earliest version I have installed, which is R-1.2.3 of 2001/04/26.

Contents of help pages can change overnight as R evolves. While it is better to be up-to-date than behind the times (even slightly), there is a compromise to be struck between upgrading to the latest R every time one has a question which might be answered thereby, or going on-line to read the latest PDF documentation from CRAN, on the one hand, and on the other asking a straightforward question to the list.

Thanks again, and best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jan-06                                       Time: 08:55:11
------------------------------ XFMail ------------------------------

## Re: Basis of fisher.test

On Fri, 13 Jan 2006 [hidden email] wrote:

> On 13-Jan-06 Prof Brian Ripley wrote:
>> On Thu, 12 Jan 2006 [hidden email] wrote:
>>> [...]
>>
>> No, it *also* says
>>
>>       Two-sided tests are based on the probabilities of the tables, and
>>       take as 'more extreme' all tables with probabilities less than or
>>       equal to that of the observed table, the p-value being the sum of
>>       such probabilities.
>>
>> which answers the question (there are only two-sided tests for such
>> tables).
>
> Thanks for the above information, which is indeed the definitive
> straightforward answer to my question!
>
> (Not sure that I quite agree with the "two-sided" terminology, though,
> since the ranking is unidirectional based on decreasing probability,
> and the P-value is that of the least-probability tail -- i.e. analogous
> to the "large (-2*loglik)" tail of a likelihood-ratio test -- which
> I've always visualised as a 1-tailed test (despite the fact that
> the "other tail" can on occasion be indicative of a fit "too good to
> be true").)

As statistics is usually taught, significance tests are always one-tailed. The two-sided t-test is one-tailed, the test statistic being |T|. In any case, the 'two-sided' is part of the arguments given to the function, so this para is just using the already-established terminology.

>> Now, what does the posting guide say about stating the R version and
>> updating before posting?
>
> Well, I plead that in practice there is necessarily a grey area
> here! My quotation was from "?fisher.test" in R-2.1.0beta of
> 2004/04/08, the most recent version installed on any of my machines.
> Admittedly a bit behind the times, but not grossly; and that help
> page has not changed in this respect since the earliest version I
> have installed, which is R-1.2.3 of 2001/04/26.
>
> Contents of help pages can change overnight as R evolves.
> While it is better to be up-to-date than behind the times (even
> slightly), there is a compromise to be struck between upgrading
> to the latest R every time one has a question which might be
> answered thereby, or going on-line to read the latest PDF
> documentation from CRAN, on the one hand, and on the other asking
> a straightforward question to the list.

Well, if you had given the R version number the problem would have been much more obvious.

> Thanks again, and best wishes,
> Ted.

--
Brian D. Ripley, [hidden email]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
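
The remark that the two-sided t-test is one-tailed in |T| can be illustrated numerically. This is a small sketch with hypothetical values for the observed statistic and degrees of freedom: the usual two-sided p-value equals the single upper tail of |T|, which is the same as the upper tail of T^2 under an F distribution with 1 and `df` degrees of freedom.

```r
# Hypothetical observed t statistic and degrees of freedom.
t_obs <- 2.1
df <- 12

# Conventional "two-sided" p-value: both tails of the t distribution.
p_two_sided <- 2 * pt(-abs(t_obs), df)

# The same quantity as a single upper tail: P(|T| >= |t_obs|),
# computed through T^2, which follows an F(1, df) distribution.
p_upper_F <- pf(t_obs^2, df1 = 1, df2 = df, lower.tail = FALSE)

all.equal(p_two_sided, p_upper_F)
```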