Hi everybody,
somehow I don't get the Shapiro-Wilk test for normality. I just can't find what the H0 is.

I tried:

    shapiro.test(rnorm(5000))

            Shapiro-Wilk normality test

    data:  rnorm(5000)
    W = 0.9997, p-value = 0.6205

If normality is the H0, the test says it's probably not normal, doesn't it?

5000 is the biggest n allowed by the test...

Are there any other tests? (I know qqnorm already ;)

thanks in advance
matthias

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
At 11:30 AM 7/12/2008, Bunny, lautloscrew.com wrote:
>[...]
>If normality is the H0, the test says it's probably not normal, doesn't it ?
>
>5000 is the biggest n allowed by the test...
>
>are there any other test ? ( i know qqnorm already ;)

Yes, H0 is "normality". The P-value, as for other statistical tests, is the probability, calculated under H0, of getting a sample at least as discrepant from H0 as the one observed. 0.62 is a probability very compatible with H0. The typical rejection criterion would be a P-value < 0.05, which is not the case here.

The limitation to n = 5000 is not serious, as even a few hundred data points should take you to the asymptotic region. Use sample() to select the data at random from within your data set to avoid bias in using the test. E.g.,

    shapiro.test(sample(mydata, 1000, replace=TRUE))

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd.
URL: http://lcfltd.com/
"Vere scire est per causas scire"
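A minimal sketch of the two points in this reply, assuming a hypothetical vector mydata that is larger than the 5000-value limit (it is simulated here, so the numbers mean nothing beyond illustration). The subsample is drawn without replacement, which is a choice of this sketch rather than the call shown in the reply, so that no duplicated values are fed to the test.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility

# Data that really are normal: H0 is true, so large P-values are expected
x   <- rnorm(5000)
res <- shapiro.test(x)
res$p.value                       # compare with 0.05; above it, H0 is not rejected

# Hypothetical data set too large for shapiro.test() (assumption of this sketch)
mydata <- rnorm(20000)
sub    <- sample(mydata, 1000)    # random subsample, without replacement
shapiro.test(sub)
```

A P-value above 0.05 here only says that normality is not rejected; it does not show that the data are normal.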
In reply to this post by Bunny, lautloscrew.com
Hmm thanks,
But on the other hand it just says I can't reject normality, which doesn't really mean it is normal. Wouldn't it be nice to test for non-normality? If I could reject that at a high level, I could be pretty sure it's normal...??

thanks in advance
matthias

On 12.07.2008 at 18:10, Mark Leeds wrote:

> Hi: If normality is the H0, then the test below says don't reject
> (large p-value). Check out any multivariate text for what the null of the
> Shapiro test is. I don't know for sure but, from below, it sure looks like
> H0 is normality. Or google for it.
There might be a test that uses "not normal" as the H0, but I don't know of it. There has been a lot of discussion on this list in the past on the pitfalls associated with tests of normality in general, so maybe you can find it in the archives. I think you should figure out why you are testing for normality and then decide on the test you want to use, because a qqplot could be enough, and many of the procedures used in statistics are robust to departures from normality anyway. Others, much more fluent than I in this area, hopefully can give more specific advice.

-----Original Message-----
From: Bunny, lautloscrew.com [mailto:[hidden email]]
Sent: Saturday, July 12, 2008 12:20 PM
To: Mark Leeds
Cc: [hidden email]
Subject: Re: [R] shapiro wilk normality test

[...]
In reply to this post by Robert A LaBudde
At 12:48 PM 7/12/2008, Bunny, lautloscrew.com wrote:
>first of all thanks yall. it's always good to get it from people that
>know for sure.
>
>my bad, i meant to say it's compatible with normality. i just wanted
>to know if it wouldnt be better to test for non-normality in order to
>know for "sure".
>and if so, how can i do it?

Doing a significance test may seem complicated, but it's an almost trivial concept.

You assume some "null hypothesis" that specifies a unique distribution from which you can calculate probabilities. Then you use this distribution to calculate the probability of finding what you found in your data, or something more extreme. This is the P-value of the test. It is the probability of finding what you found (or more extreme), given that the null hypothesis is true.

You give up ("reject") the null hypothesis if this P-value is too unbelievably small. The conventional threshold for ordinary, repeatable experiments is 0.05. Sometimes a smaller value like 0.01 is more reasonable.

Doing what has been suggested, i.e., using a null hypothesis of "nonnormality", is unworkable. There are uncountably many ways to specify a "nonnormal" distribution. Is it discrete or continuous? Is it skewed or symmetric? Does it go from zero to infinity, from 0 to 1, from -infinity to infinity, or anything else? Does it have one mode or many? Is it continuous or differentiable? Etc.

In order to do a statistical test, you must be able to calculate the P-value. That usually means your null hypothesis must specify a single, unique probability distribution.

So "nonnormal" in testing means "reject normal as the distribution". "Nonnormal" is not defined other than as "not the normal distribution".

If you wish to test how the distribution is nonnormal, within some family of nonnormal distributions, you will have to specify such a null hypothesis and test for deviation from it. E.g., testing for coefficient of skewness = 0.

Robert A. LaBudde
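The recipe in this reply (fix a null distribution, compute a statistic, ask how often the null produces something at least as extreme) can be sketched for the skewness example as follows. This is only an illustration under stated assumptions: the data are simulated, the skewness() helper is defined here and is not a base R function, and the P-value is obtained by Monte Carlo simulation rather than from a named published test.

```r
# Sample coefficient of skewness (simple moment estimator; helper defined
# for this sketch only)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(42)
x   <- rexp(200)                  # hypothetical data, clearly right-skewed
obs <- skewness(x)

# Null distribution of the statistic: normal samples of the same size.
# Skewness does not depend on location or scale, so rnorm(n) is enough.
null_stats <- replicate(2000, skewness(rnorm(length(x))))

# Two-sided Monte Carlo P-value: share of null statistics at least as extreme
p_value <- (1 + sum(abs(null_stats) >= abs(obs))) / (1 + length(null_stats))
p_value
```

Note that the null hypothesis here is still "normal"; the statistic merely decides which kind of departure the test is sensitive to.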
In reply to this post by Bunny, lautloscrew.com
You may consider the nortest package.
http://cran.r-project.org/web/packages/nortest/index.html

Regards,

CH

--
CH Chan
Research Assistant - KWH
http://www.macgrass.com
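A hedged sketch of what trying the nortest package might look like. It assumes the package is installed (the code skips the tests otherwise) and uses simulated data; ad.test() and lillie.test() are two of the normality tests that package provides, and both take normality as H0, just like shapiro.test().

```r
set.seed(7)
x <- rnorm(300)                   # simulated data, for illustration only

ad_p <- NA_real_                  # stays NA if nortest is not installed
if (requireNamespace("nortest", quietly = TRUE)) {
  ad  <- nortest::ad.test(x)      # Anderson-Darling test
  lil <- nortest::lillie.test(x)  # Lilliefors (Kolmogorov-Smirnov) test
  print(ad)
  print(lil)
  ad_p <- ad$p.value
}
```

As with Shapiro-Wilk, a large P-value from these tests means only that normality was not rejected.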
In reply to this post by Bunny, lautloscrew.com
Hi!
Well, if you look at the output:

    shapiro.test(rnorm(5000))

            Shapiro-Wilk normality test

    data:  rnorm(5000)
    W = 0.9997, p-value = 0.6205

You can see that the p-value is 0.6205, so you can't reject the normality hypothesis.

H0: normal data   vs   H1: not normal

So the Shapiro-Wilk test is saying that your data are normal, and it's correct!

Bye
Marta
Marta Colombo wrote:
> Hi!
> Well, if you look at the output:
> [...]
> You can see that the p-value is 0.6205 so you can't refuse the normality hypotesis.
> H0: normal data vs H1: not normal
> So shapiro.wilk test is saying that your data are normal and it's correct!
> Bye
> Marta

A large P-value means nothing more than needing more data. No conclusion is possible. Please read the classic paper Absence of Evidence is not Evidence for Absence.

Your first sentence is correct, but not the second.

Why test for normality? What downstream method depends on it? If normality is in doubt, why not use a method that doesn't require it?

Frank Harrell

--
Frank E Harrell Jr, Professor and Chair
School of Medicine, Department of Biostatistics
Vanderbilt University
Frank Harrell
Department of Biostatistics, Vanderbilt University |
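The point that a large P-value may simply reflect too little data can be illustrated with a small simulation. This is a sketch under assumptions chosen here, not anything from the thread: the data come from a t distribution with 5 degrees of freedom (so H0 is false), and the sample sizes and number of repetitions are arbitrary.

```r
set.seed(123)

# Share of Shapiro-Wilk rejections at the 5% level for samples of size n
# drawn from a t distribution with 5 degrees of freedom (i.e. not normal)
rejection_rate <- function(n, reps = 500) {
  mean(replicate(reps, shapiro.test(rt(n, df = 5))$p.value < 0.05))
}

rates <- sapply(c(20, 100, 1000), rejection_rate)
names(rates) <- c("n=20", "n=100", "n=1000")
rates
```

The rejection rate is the estimated power at each sample size. Wherever it is low, a non-significant result is the expected outcome even though the data are not normal.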
On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
> [...]
> A large P-value means nothing more than needing more data. No
> conclusion is possible. Please read the classic paper Absence of
> Evidence is not Evidence for Absence.

Is that ironic, Frank, or is there really a "classic paper" with that title? If so, I'd be pleased to have a reference to it!

Thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Date: 13-Jul-08  Time: 15:55:35
http://www.bmj.com/cgi/content/full/311/7003/485
Charles Annis, P.E.
http://www.StatisticalEngineering.com
In reply to this post by Ted.Harding-2
G'day all,
On Sun, 13 Jul 2008 15:55:38 +0100 (BST), (Ted Harding) <[hidden email]> wrote:

> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
> > [...]
> > A large P-value means nothing more than needing more data. No
> > conclusion is possible.

I would have thought that "we need more data" would qualify as a conclusion. :)

> > Please read the classic paper Absence of Evidence is not Evidence
> > for Absence.
>
> Is that ironic, Frank, or is there really a "classic paper" with
> that title? If so, I'd be pleased to have a reference to it!

Of course, I do not know for sure which paper Frank has in mind, but Google and Google Scholar readily come up with papers/editorials that have a nearly identical title:

http://www.bmj.com/cgi/content/full/311/7003/485
http://bmj.bmjjournals.com/cgi/content/full/328/7438/476
  (see also http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=351831)
http://www.ncbi.nlm.nih.gov/pubmed/6829975

My money is on Frank having the first of these publications in mind.

Cheers,

Berwin

Berwin A Turlach
Dept of Statistics and Applied Probability, Faculty of Science
National University of Singapore
http://www.stat.nus.edu.sg/~statba
Many thanks to Berwin, and also to Charles Annis, for the references. They're good!

Ted.
In reply to this post by Ted.Harding-2
(Ted Harding) wrote:
> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>> [...]
>> A large P-value means nothing more than needing more data. No
>> conclusion is possible. Please read the classic paper Absence of
>> Evidence is not Evidence for Absence.
>
> Is that ironic, Frank, or is there really a "classic paper" with
> that title? If so, I'd be pleased to have a reference to it!
>
> Thanks,
> Ted.

It's real. Full text is available to all:

http://www.bmj.com/cgi/content/full/311/7003/485

It's one of the dozens of gems in the short statistics notes series in the British Medical Journal.

Frank
Frank Harrell
Department of Biostatistics, Vanderbilt University |
Frank E Harrell Jr <[hidden email]> [Sun, Jul 13, 2008 at 08:07:37PM CEST]:
> (Ted Harding) wrote:
>> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>>> [...]
>>> A large P-value means nothing more than needing more data. No
>>> conclusion is possible. Please read the classic paper Absence of
>>> Evidence is not Evidence for Absence.
>> [...]
>
> It's real. Full text is available to all:
> http://www.bmj.com/cgi/content/full/311/7003/485

The quotation is attributed to the late Carl Sagan, who seems to have used it as a strawman argument; see http://oyhus.no/AbsenceOfEvidence.html.

--
Johannes Hüsing
mailto:[hidden email]
http://derwisch.wikidot.com

There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact. (Mark Twain, "Life on the Mississippi")
On 13-Jul-08 19:53:47, Johannes Huesing wrote:
> [...]
> The quotation is attributed to the late Carl Sagan who
> seemed to have used it as a strawman argument , see
> http://oyhus.no/AbsenceOfEvidence.html.

This citation of Sagan, and the link therein to Sagan quotes:

http://en.wikiquote.org/wiki/Carl_Sagan

are interesting, as far as they go. However, I disagree with the proof ("by conditional probability") that absence of evidence is evidence of absence. Definition 1 is disputable. But, whether one agrees with it or not, Definition 2 does not correspond to my interpretation of "absence of evidence".

If A is evidence for B (in terms of P(B|A) etc.), this means that if we *know* that A is the case, or that not-A is the case, then we can say something about P(B). But "absence of evidence", in my interpretation (which I believe is right for the statistical context of "non-significant P-values"), means that we do not know about A: we do not have enough information. That proof needs to be discussed in terms of the available evidence for A!

The proof is, basically, given in terms of a 2-valued logic where every term is either TRUE or FALSE. In the real world we have at least a third possible value: UNKNOWN (or, as R would put it, NA). Even if you accept (Definition 1) that

  "A is evidence for B" == P(B|A) > P(B|not-A)

what can you possibly say about P(B|NA) (other than that it is NA itself)?

Best wishes to all,
Ted.
Ted Harding <[hidden email]> [Sun, Jul 13, 2008 at 10:59:21PM CEST]:
> [...]
> But "absence
> of evidence", in my interpretation (which I believe is right for
> the statistical context of "non-significant P-values"), means that
> we do not know about A: we do not have enough information.

What would the p-value have to be like, in your opinion, to make the null hypothesis look more likely after the experiment than before?

> The proof is, basically, given in terms of a 2-valued logic where
> every term is either TRUE or FALSE. In the real world we have at
> least a third possible value: UNKNOWN (or, as R would put it, NA).

How would the probability that A is NA be affected by the outcome of an experiment like this? If this probability is affected, how does this leave the probability that A is T or F unaffected?

Or do you assign the NA status to the data collected?

A high p-value does not always mean that you might as well have collected nothing but missing values.

Of course I buy into the notion that a point estimate with a measure of accuracy is much better suited to describe your data; but a high p-value as a result of a test procedure that can be claimed to be adequately powered may defensibly be taken as a hint that we can for now stick with the null hypothesis.

--
Johannes Hüsing
On 13-Jul-08 21:42:19, Johannes Huesing wrote:
> [...]
> Of course I buy into the notion that a point estimate with a measure
> of accuracy is much better suited to describe your data; but a
> high p-value as a result of a test procedure that can be claimed to
> be adequately powered may defensibly be taken as a hint that we
> can for now stick with the null hypothesis.

I shall perhaps try later to respond in more detail to specific points above.
But, for the moment, let me say that I think your statement

  "a high p-value as a result of a test procedure that can be claimed to be adequately powered may defensibly be taken as a hint that we can for now stick with the null hypothesis"

is the main key. The power function of a test (which of course depends on the design of the investigation and on its size, i.e. number of data gathered) is basically much the same (in my mind) as the amount of evidence. A high P-value with a very powerful test serves to exclude all alternatives to the Null Hypothesis except those which lie very close to the Null Hypothesis. In that sense, we do in fact have a lot of evidence against all hypotheses except those which are very similar to the Null. So we are not in an "absence of evidence" situation, and we do have "evidence of absence".

The basic logic of a Hypothesis Test (in its standard sense) is the generalisation, to a logic where certainty is at best probabilistic, of the classical-logic argument:

  Given (as a matter of fact): If A, then B
  Observed:   B is FALSE
  Conclusion: A is FALSE

Probabilistically:

  Given:      If A (H0), then B has high probability
  Observed:   B is FALSE
  Conclusion: An event (not-B) has occurred which has very small
              probability if A is TRUE.

Hence we (as George Barnard used to put it) apply "The Principle of Disbelief in Tall Stories" and disbelieve A to the extent that we disbelieve not-B as a possible outcome from A (H0).

In applications, the event B will be specified in terms of a set of possible values of a Test Statistic T, devised so as to represent an interesting measure of discrepancy between the data and the hypothesis H0 (e.g. the t-statistic for testing whether two samples are drawn from populations with equal means: if that is the case, then E(T) = 0, and the set of values {abs(T) > T0} will be a "discrepant set").
By choosing T0 to be such that Prob(abs(T) > T0) = p0 under H0, a small value which we choose to suit ourselves, we are defining the threshold at which we are prepared to deem that "the claim that abs(T) > T0 is compatible with H0" is too unlikely to be plausible.

The cleanest example in real life can be drawn from the basic principle in criminal law for concluding that an accused person is guilty, namely "The accused is deemed innocent until proved guilty beyond reasonable doubt". What constitutes "reasonable doubt" can become a very interesting question, but there are some crimes for which it has a definite statistical interpretation, typically exceeding some authorised limit (of speed in a vehicle, of alcohol content in the blood while driving a vehicle, of a factory plant exceeding permitted levels of polluting emissions [which in the UK, under the Environmental Protection Act, is a criminal offence]).

In the days when blood alcohol was determined by laboratory analysis of a blood sample, it was possible to determine that the "margin of error" corresponded to a P-value less than or equal to 0.001 (i.e. if the lab analysis yielded a result in excess of the legal limit + 3*SE, then the inevitable result was a conviction unless it could be independently proved in defence that the statutory procedures were carried out in a flawed manner). So, in that case, "beyond reasonable doubt" meant "The P-value of the data was less than 1/1000".

But, if the lab analysis gave 80mg/100ml (the legal limit in the UK), then at best you can conclude that the result equally favoured any two hypotheses equidistant on either side of the legal limit. But while this constitutes (in the sense explained) absence of evidence for guilt (i.e. alc > 80), it certainly does not exclude it (someone at 81, and therefore truly guilty, could be quite likely to give a result of 80). So the "80" result is not evidence of innocence -- it is merely lack of evidence of guilt.
It gets worse with the environmental pollution situation. For the blood alcohol and the lab analysis of a blood sample, the lab procedure is only legally valid if it consistently achieves an SE of determination of 2% or less (taken as 2mg/100ml for results below 100). Thus the power function has

  Power(alc) = 0.001 at alc=80,
  Power(alc) = 0.5 at alc=86,
  Power(alc) = 0.999 at alc=92.

Thus the innocent (alc <= 80) have a good protection against false conviction; the marginally guilty (alc < 86, say) are likely to get away with it; the seriously guilty (alc > 92) are almost certain to be convicted.

However, the kinds of measurement which can be made of, say, atmospheric pollution are subject to SEs which are more like 20% and are often higher (50% or more). To achieve the requisite "beyond reasonable doubt" (since it is a criminal offence) on the same criterion (3*SE above) means that the procedure is only effective when the emission is say twice the permitted level (or even more). Here we have lack of evidence in a very real sense (the procedure is weak). It would be quite possible for a polluter to emit well above the permitted level, yet for the sampling to give a result well below the permitted level. Hence, such absence of evidence is certainly not evidence of absence.

And, if I understand correctly, this is pretty much what Frank Harrell meant when he wrote "A large P-value means nothing more than needing more data. No conclusion is possible. Please read the classic paper Absence of Evidence is not Evidence for Absence." [Or "better data", one might add]. But it does need to be qualified (as I try to do above) by consideration of whereabouts on the "effect" scale the procedure becomes capable of doing its job, which in turn brings in issues about the importance (in real life) of the sort of departure from H0 that it is important to detect.
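The power figures quoted above can be reproduced with a short sketch in R. It assumes normally distributed measurement error and a conviction rule of "reading above limit + 3*SE"; the function name conviction.power is hypothetical, and the pollution part further assumes an SE that is a fixed 20% or 50% of a permitted level set arbitrarily at 100 units.

```r
# Assumptions: normal measurement error, SE = 2 mg/100ml, legal limit
# 80 mg/100ml, conviction when a single reading exceeds limit + 3*SE.
limit <- 80
se <- 2
cutoff <- limit + 3 * se          # 86 mg/100ml

# Hypothetical helper: probability that a reading exceeds the cutoff
# when the true level is 'level' and the measurement SE is 'se'.
conviction.power <- function(level, cutoff, se) {
  pnorm(cutoff, mean = level, sd = se, lower.tail = FALSE)
}

# Blood alcohol: rounds to 0.001, 0.5 and 0.999 at 80, 86 and 92.
blood.power <- round(conviction.power(c(80, 86, 92), cutoff, se), 3)

# Pollution (illustrative): permitted level 100, SE of 20% or 50% of it.
permitted <- 100
poll.se <- c(0.2, 0.5) * permitted
poll.cutoff <- permitted + 3 * poll.se   # 160 and 250

# True emission level at which a violation is detected only half the time
# is the cutoff itself: 1.6 and 2.5 times the permitted level.
half.power.ratio <- poll.cutoff / permitted
```

Under these assumptions the pollution procedure reaches even 50% power only at 1.6 to 2.5 times the permitted level, which is the sense in which it is weak.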
The blood-alcohol test does a reasonably good job (one is prepared to accept a relatively narrow "grey area" where any conclusion is unclear). The pollution test does not.

Mustn't go on too long!
Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 14-Jul-08 Time: 00:16:50
------------------------------ XFMail ------------------------------
In reply to this post by Bunny, lautloscrew.com
For those people who feel the need for a p-value to test normality on large sample sizes, I propose the following test/function:
SnowsPenultimateNormalityTest <- function(x){

    # the following function works for current implementations of R
    # to my knowledge, eventually it may need to be expanded
    is.rational <- function(x){
        rep( TRUE, length(x) )
    }

    tmp.p <- if( any(is.rational(x))) {
        0
    } else {
        # current implementation will not get here
        # this part is reserved for the ultimate test
        1
    }

    out <- list(
        p.value = tmp.p,
        alternative = strwrap(paste('The data does not come from a',
            'strict normal distribution (but may represent a distribution',
            'that is close enough)'), prefix="\n\t"),
        method = "Snow's Penultimate Normality Test",
        data.name = deparse(substitute(x))
    )
    class(out) <- 'htest'
    out
}

Now that the need for a p-value is satisfied, we can get onto the more useful questions mentioned in this thread and other places.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
(801) 408-8111

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Bunny, lautloscrew.com
> Sent: Saturday, July 12, 2008 10:20 AM
> To: Mark Leeds
> Cc: [hidden email]
> Subject: Re: [R] shapiro wilk normality test
>
> Hmm thanks,
> But on the other hand it just says I can't reject normality,
> which doesn't really mean it is normal. Wouldn't it be nice to
> test for non-normality? If I'd reject that at a high level I
> could be pretty sure it's normal...??
>
> thanks in advance
>
> matthias
>
> On 12.07.2008 at 18:10, Mark Leeds wrote:
>
> > Hi: If normality is the H0, then the test below says don't reject
> > (large p value). Check out any multivariate text for what the null of
> > the shapiro test is. I don't know for sure but, from below, it sure
> > looks like H0 is normality. Or google for it.
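One reading of the joke above is that real data are never exactly normal, so with enough observations a normality test will flag departures that may not matter in practice. A rough sketch in R, assuming Student-t data with 10 degrees of freedom as a stand-in for "close to normal" data; the distribution, seed and sample sizes are illustrative assumptions, not from the thread.

```r
# Illustrative assumption: mildly heavy-tailed data, t with 10 df.
set.seed(42)
big <- rt(5000, df = 10)     # n at the limit accepted by shapiro.test()
small <- big[1:50]           # a small subsample of the same data

res.big <- shapiro.test(big)
res.small <- shapiro.test(small)

# With n = 5000 the mild departure is detected very reliably; with n = 50
# the same distribution will often pass unnoticed.
p.big <- res.big$p.value
p.small <- res.small$p.value
```

The P-value therefore says as much about the sample size as about how far the data are from normal, which is why a plot such as qqnorm() is usually the more useful check for large samples.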
This one should (I am tempted to write "must") make its way to fortune()...

Thankyouthankyouthankyou ...

Emmanuel Charpentier

On Mon, 14 Jul 2008 14:58:13 -0600, Greg Snow wrote:
> For those people who feel the need for a p-value to test normality on
> large sample sizes, I propose the following test/function:

[ Snip ... ]