# shapiro wilk normality test

19 messages

## shapiro wilk normality test

Hi everybody,

somehow I don't get the Shapiro-Wilk test for normality. I just can't find what the H0 is.

I tried:

    shapiro.test(rnorm(5000))

            Shapiro-Wilk normality test

    data:  rnorm(5000)
    W = 0.9997, p-value = 0.6205

If normality is the H0, the test says it's probably not normal, doesn't it?

5000 is the biggest n allowed by the test...

Are there any other tests? (I know qqnorm already ;)

Thanks in advance,
matthias

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: shapiro wilk normality test

At 11:30 AM 7/12/2008, Bunny, lautloscrew.com wrote:

> [...]
> If normality is the H0, the test says it's probably not normal, doesn't it?
>
> 5000 is the biggest n allowed by the test...
>
> are there any other test ? ( i know qqnorm already ;)

Yes, H0 is "normality". The P-value, as for other statistical tests, is the probability, calculated under H0, of a test statistic at least as extreme as the one observed. A P-value of 0.62 is very compatible with H0. The typical rejection criterion would be a P-value < 0.05, which is not the case here.

The limitation to n = 5000 is not serious, as even a few hundred data points should take you to the asymptotic region. Use sample() to select the data at random from within your data set to avoid bias in using the test. E.g.,

    shapiro.test(sample(mydata, 1000, replace=FALSE))

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd.
824 Timberlake Drive
Virginia Beach, VA 23464-3239
e-mail: [hidden email]
URL: http://lcfltd.com/
Tel: 757-467-0954
Fax: 757-467-2947
"Vere scire est per causas scire"
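The advice in this reply can be tried out directly. The following is a minimal sketch, not part of the original post: `mydata` is a hypothetical simulated data set, the seed is arbitrary, and drawing the subsample without replacement is an assumption made here so that no tied values are introduced.

```r
set.seed(2008)  # arbitrary seed, only so the sketch is reproducible

# Normal data at the largest sample size shapiro.test() accepts.
x <- rnorm(5000)
res <- shapiro.test(x)
res$p.value                      # a large value gives no grounds to reject H0 (normality)
reject_h0 <- res$p.value < 0.05  # the conventional rejection criterion

# Hypothetical larger data set: more than 5000 values makes shapiro.test() stop
# with an error, which is captured here as a message string.
mydata <- rnorm(20000)
too_big <- tryCatch(shapiro.test(mydata),
                    error = function(e) conditionMessage(e))

# Random subsample of 1000 values, drawn without replacement (assumption).
sub <- sample(mydata, 1000)
res_sub <- shapiro.test(sub)
res_sub$p.value
```

Because the data are simulated, the exact P-values depend on the seed; the point is only how the result is read against the 0.05 criterion.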

## Re: shapiro wilk normality test

In reply to this post by Bunny, lautloscrew.com

Hmm thanks,

But on the other hand it just says I can't reject normality, which doesn't really mean it is normal. Wouldn't it be nice to test for non-normality? If I rejected that at a high level I could be pretty sure it's normal...??

thanks in advance
matthias

On 12.07.2008 at 18:10, Mark Leeds wrote:

> Hi: If normality is the H0, then the test below says don't reject (large p-value). Check out any multivariate text for what the null of the Shapiro test is. I don't know for sure but, from below, it sure looks like H0 is normality. Or google for it.

## Re: shapiro wilk normality test

There might be a test that uses "not normal" as the H0 but I don't know of it. There's been a lot of discussion on this list in the past on the pitfalls associated with tests of normality in general, so maybe you can find it in the archives.

I think you should figure out why you are testing for normality and then decide on the test you want to use (a qqplot could be enough), because many of the procedures done in statistics can be robust to departures from normality anyway.

Others, much more fluent than I in this area, hopefully can give more specific advice.

## Re: shapiro wilk normality test

In reply to this post by Robert A LaBudde

At 12:48 PM 7/12/2008, Bunny, lautloscrew.com wrote:

> first of all thanks y'all. it's always good to get it from people that know for sure.
>
> my bad, i meant to say it's compatible with normality. i just wanted to know if it wouldn't be better to test for non-normality in order to know for "sure".
> and if so, how can i do it?

Doing a significance test may seem complicated, but it's an almost trivial concept.

You assume some "null hypothesis" that specifies a unique distribution that you can use to calculate probabilities from. Then use this distribution to calculate the probability of finding what you found in your data, or more extreme. This is the P-value of the test. It is the probability of finding what you found, or something more extreme, given that the null hypothesis is true.

You give up ("reject") the null hypothesis if this P-value is too unbelievably small. The conventional measure for ordinary, repeatable experiments is 0.05. Sometimes a smaller value like 0.01 is more reasonable.

Doing what has been suggested, i.e., using a null hypothesis of "nonnormality", is unworkable. There are uncountably many ways to specify a "nonnormal" distribution. Is it discrete or continuous? Is it skewed or symmetric? Does it go from zero to infinity, from 0 to 1, from -infinity to infinity, or anything else? Does it have one mode or many? Is it continuous or differentiable? Etc.

In order to do a statistical test, you must be able to calculate the P-value. That usually means your null hypothesis must specify a single, unique probability distribution.

So "nonnormal" in testing means "reject normal as the distribution". "Nonnormal" is not defined other than as "not the normal distribution".

If you wish to test how the distribution is nonnormal, within some family of nonnormal distributions, you will have to specify such a null hypothesis and test for deviation from it. E.g., testing for coefficient of skewness = 0.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS
Least Cost Formulations, Ltd.
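The recipe described in this reply (fix a null distribution, then compute the probability of a result at least as extreme as the observed one) can be sketched with the skewness example it ends on. This is an illustration under stated assumptions, not code from the thread: `skewness()` and `mc_skew_test()` are hypothetical helper names, the null distribution of the statistic is approximated by simulation from normal samples rather than derived analytically, and the seed and sample sizes are arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Sample coefficient of skewness (moment estimator).
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Monte Carlo test of H0: the data are normal, using skewness as the statistic.
# Skewness does not change under shifts and rescaling, so standard normal
# samples of the same size represent the whole normal family under H0.
mc_skew_test <- function(x, n_sim = 2000) {
  obs <- skewness(x)
  null_stats <- replicate(n_sim, skewness(rnorm(length(x))))
  # Two-sided P-value: share of simulated statistics at least as extreme as
  # the observed one (the +1 terms keep the estimate away from exactly zero).
  p <- (1 + sum(abs(null_stats) >= abs(obs))) / (n_sim + 1)
  list(statistic = obs, p.value = p)
}

res_norm <- mc_skew_test(rnorm(200))  # data that really are normal
res_skew <- mc_skew_test(rexp(200))   # strongly right-skewed data

res_norm$p.value
res_skew$p.value
```

The test only works because H0 pins down a distribution to simulate from; there is no corresponding way to simulate "from nonnormality", which is the point made in the reply.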

## Re: shapiro wilk normality test

In reply to this post by Bunny, lautloscrew.com

You may consider the nortest package.

http://cran.r-project.org/web/packages/nortest/index.html

Regards,
CH

--
CH Chan
Research Assistant - KWH
http://www.macgrass.com
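A short sketch of what using nortest might look like follows. It assumes the package is installed from CRAN, so the calls are guarded and the code falls back to a message otherwise; the simulated data and the seed are arbitrary choices for illustration. Like `shapiro.test()`, these tests take normality as H0.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
x <- rnorm(500)

# Base R reference point.
sw <- shapiro.test(x)

# nortest is an add-on package, so only call it when it is available.
if (requireNamespace("nortest", quietly = TRUE)) {
  ad  <- nortest::ad.test(x)      # Anderson-Darling test
  lil <- nortest::lillie.test(x)  # Lilliefors (Kolmogorov-Smirnov) test
  print(c(ShapiroWilk = sw$p.value,
          AndersonDarling = ad$p.value,
          Lilliefors = lil$p.value))
} else {
  message("nortest is not installed; try install.packages('nortest')")
}
```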

## Re: shapiro wilk normality test

In reply to this post by Bunny, lautloscrew.com

Hi!

Well, if you look at the output:

    shapiro.test(rnorm(5000))

            Shapiro-Wilk normality test

    data:  rnorm(5000)
    W = 0.9997, p-value = 0.6205

you can see that the p-value is 0.6205, so you can't reject the normality hypothesis.

H0: normal data vs. H1: not normal

So the Shapiro-Wilk test is saying that your data are normal, and it's correct!

Bye
Marta

## Re: shapiro wilk normality test


## Re: shapiro wilk normality test

On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:

> [...]
> A large P-value means nothing more than needing more data. No conclusion is possible. Please read the classic paper Absence of Evidence is not Evidence for Absence.

Is that ironic, Frank, or is there really a "classic paper" with that title? If so, I'd be pleased to have a reference to it!

Thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jul-08 Time: 15:55:35
------------------------------ XFMail ------------------------------

## Re: shapiro wilk normality test

http://www.bmj.com/cgi/content/full/311/7003/485

Charles Annis, P.E.
[hidden email]
phone: 561-352-9699
eFax: 614-455-3265
http://www.StatisticalEngineering.com

## Re: shapiro wilk normality test


## Re: shapiro wilk normality test


## Re: shapiro wilk normality test

In reply to this post by Ted.Harding-2

(Ted Harding) wrote:

> Is that ironic, Frank, or is there really a "classic paper" with that title? If so, I'd be pleased to have a reference to it!

It's real. Full text is available to all:

http://www.bmj.com/cgi/content/full/311/7003/485

It's one of the dozens of gems in the short statistics notes series in the British Medical Journal.

Frank

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University

## Re: shapiro wilk normality test

Frank E Harrell Jr <[hidden email]> [Sun, Jul 13, 2008 at 08:07:37PM CEST]:

> It's real. Full text is available to all:
> http://www.bmj.com/cgi/content/full/311/7003/485

The quotation is attributed to the late Carl Sagan, who seems to have used it as a strawman argument; see http://oyhus.no/AbsenceOfEvidence.html

--
Johannes Hüsing
mailto:[hidden email]
http://derwisch.wikidot.com

"There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact." (Mark Twain, "Life on the Mississippi")

## Re: shapiro wilk normality test

On 13-Jul-08 19:53:47, Johannes Huesing wrote:

> The quotation is attributed to the late Carl Sagan who seemed to have used it as a strawman argument, see
> http://oyhus.no/AbsenceOfEvidence.html

This citation of Sagan, and the link therein to Sagan quotes:

  http://en.wikiquote.org/wiki/Carl_Sagan

are interesting, as far as they go. However, I disagree with the proof ("by conditional probability") that absence of evidence is evidence of absence. Definition 1 is disputable. But, whether one agrees with it or not, Definition 2 does not correspond to my interpretation of "absence of evidence".

If A is evidence for B (in terms of P(B|A) etc.), this means that if we *know* that A is the case, or that not-A is the case, then we can say something about P(B). But "absence of evidence", in my interpretation (which I believe is right for the statistical context of "non-significant P-values"), means that we do not know about A: we do not have enough information. That proof needs to be discussed in terms of the available evidence for A!

The proof is, basically, given in terms of a 2-valued logic where every term is either TRUE or FALSE. In the real world we have at least a third possible value: UNKNOWN (or, as R would put it, NA).

Even if you accept (Definition 1) that

  "A is evidence for B" == P(B|A) > P(B|not-A)

what can you possibly say about P(B|NA) (other than that it is NA itself)?

Best wishes to all,
Ted.

E-Mail: (Ted Harding) <[hidden email]>
Date: 13-Jul-08 Time: 21:59:16
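For readers without the linked page to hand, an argument of the kind disputed here can be written out. This is a sketch under an assumption: the linked proof is not reproduced in the thread, so the steps below use only Definition 1 as quoted in the post, together with the extra assumption that 0 < P(A) < 1.

```latex
\begin{align*}
P(B) &= P(B \mid A)\,P(A) + P(B \mid \neg A)\,\bigl(1 - P(A)\bigr)
  && \text{(law of total probability)} \\
P(B \mid A) > P(B \mid \neg A)
  &\;\Longrightarrow\; P(B \mid \neg A) < P(B) < P(B \mid A)
  && \text{(a weighted average lies strictly between its terms)} \\
  &\;\Longrightarrow\; P(\neg B \mid \neg A) = 1 - P(B \mid \neg A) > 1 - P(B) = P(\neg B).
\end{align*}
```

Every step conditions on A or not-A being known to hold. The derivation therefore says nothing about the case raised in the post, where the status of A itself is unknown.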

## Re: shapiro wilk normality test

Ted Harding <[hidden email]> [Sun, Jul 13, 2008 at 10:59:21PM CEST]:

> [...]
> But "absence of evidence", in my interpretation (which I believe is right for the statistical context of "non-significant P-values"), means that we do not know about A: we do not have enough information.

What would the p-value have to be like, in your opinion, to make the null hypothesis look more likely after the experiment than before?

> The proof is, basically, given in terms of a 2-valued logic where every term is either TRUE or FALSE. In the real world we have at least a third possible value: UNKNOWN (or, as R would put it, NA).

How would the probability that A is NA be affected by the outcome of an experiment like this? If this probability is affected, how does this leave the probability that A is T or F unaffected?

Or do you assign the NA status to the data collected? A high p-value does not always mean that you might as well have collected nothing but missing values.

Of course I buy into the notion that a point estimate with a measure of accuracy is much better suited to describe your data; but a high p-value as a result of a test procedure that can be claimed to be adequately powered may defensibly be taken as a hint that we can for now stick with the null hypothesis.

--
Johannes Hüsing
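Whether a normality test is "adequately powered" can be checked by simulation for a given alternative. The sketch below is an illustration, not something from the thread: the helper `reject_rate()` is hypothetical, and the t distribution with 5 degrees of freedom, the sample sizes, the number of simulations and the seed are all arbitrary assumptions.

```r
set.seed(123)  # arbitrary seed, only so the sketch is reproducible

# Share of simulated samples of size n for which shapiro.test() rejects
# normality at level alpha. rgen is a function that draws n observations.
reject_rate <- function(rgen, n, n_sim = 500, alpha = 0.05) {
  mean(replicate(n_sim, shapiro.test(rgen(n))$p.value < alpha))
}

# Heavy-tailed alternative (assumption): Student t with 5 degrees of freedom.
r_t5 <- function(n) rt(n, df = 5)

power_small <- reject_rate(r_t5, n = 20)    # power with little data
power_large <- reject_rate(r_t5, n = 1000)  # power with a lot of data
size_check  <- reject_rate(rnorm, n = 20)   # rejection rate when H0 is true

c(power_small = power_small, power_large = power_large, size_check = size_check)
```

With little data the test misses this alternative much more often than with a large sample, so a high p-value from a small sample says less than the same p-value from a large one.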