
shapiro wilk normality test


Bunny, lautloscrew.com
Hi everybody,

Somehow I don't get the Shapiro-Wilk test for normality. I just can't
find what the H0 is.

I tried:

  shapiro.test(rnorm(5000))

        Shapiro-Wilk normality test

data:  rnorm(5000)
W = 0.9997, p-value = 0.6205


If normality is the H0, the test says it's probably not normal, doesn't it?

5000 is the biggest n allowed by the test...

Are there any other tests? (I know qqnorm already ;)

thanks in advance

matthias
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: shapiro wilk normality test

Robert A LaBudde
In reply to this post by Bunny, lautloscrew.com (sent 7/12/2008, 11:30 AM)

Yes, H0 is "normality". The P-value, as for other
statistical tests, is the probability, computed under H0,
of getting a sample at least as extreme as this one.

0.62 is a probability very compatible with H0.
The typical rejection criterion would be a
P-value < 0.05, which is not the case here.
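A quick way to see what the P-value does when H0 is true is to simulate it. The R sketch below is illustrative only: the sample size, the number of simulations and the exponential alternative are arbitrary choices and are not part of the original reply.

```r
# Illustrative simulation (sizes and the alternative are arbitrary choices)
set.seed(1)                    # arbitrary seed, for reproducibility
n_sims <- 2000                 # number of simulated data sets
n <- 100                       # observations per data set

# P-values when H0 (normality) is true
p_normal <- replicate(n_sims, shapiro.test(rnorm(n))$p.value)

# P-values when the data are exponential, i.e. strongly skewed (H0 false)
p_expon <- replicate(n_sims, shapiro.test(rexp(n))$p.value)

# Under H0 the P-value is approximately uniform on (0, 1), so about 5% of
# them fall below 0.05; under this alternative almost all of them do.
rate_normal <- mean(p_normal < 0.05)
rate_expon <- mean(p_expon < 0.05)
cat("rejection rate, normal data:     ", rate_normal, "\n")
cat("rejection rate, exponential data:", rate_expon, "\n")
```

A P-value of 0.62 from normal data is therefore exactly the kind of value one expects to see most of the time.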

The limitation to n = 5000 is not serious, as
even a few hundred data should take you to the
asymptotic region. Use sample() to select the
data at random from within your data set, without
replacement, to avoid bias in using the test. E.g.,

shapiro.test(sample(mydata, 1000))   # replace = FALSE is the default
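A minimal sketch of the subsampling advice, assuming `mydata` is a numeric vector with more than 5000 values (simulated here as a hypothetical stand-in). Sampling without replacement uses each observation at most once, so it does not introduce the artificial ties that `replace = TRUE` would.

```r
set.seed(42)
mydata <- rnorm(20000)   # hypothetical stand-in for a large real data set

# shapiro.test() only accepts 3 to 5000 observations, so the full data
# set triggers an error; tryCatch() turns that error into NULL here.
too_big <- tryCatch(shapiro.test(mydata), error = function(e) NULL)

# Random subsample of 1000 values, drawn without replacement
sub <- sample(mydata, 1000)
res <- shapiro.test(sub)
print(res)
```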






================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
================================================================


Re: shapiro wilk normality test

Bunny, lautloscrew.com
In reply to this post by Bunny, lautloscrew.com
Hmm, thanks.
But on the other hand it just says I can't reject normality, which
doesn't really mean it is normal. Wouldn't it be nice to test for
non-normality? If I could reject that at a high level, I could be
pretty sure it's normal... ??

thanks in advance

matthias
On 12.07.2008, at 18:10, Mark Leeds wrote:

> Hi: If normality is the H0, then the test below says don't reject
> (large p value). Check out any multivariate text for what the null of
> the Shapiro test is. I don't know for sure but, from below, it sure
> looks like H0 is normality. Or google for it.


Re: shapiro wilk normality test

markleeds
There might be a test that uses "not normal" as the H0, but I don't know
of it. There has been a lot of discussion on this list in the past about
the pitfalls of tests of normality in general, so maybe you can find it
in the archives.

I think you should figure out why you are testing for normality and then
decide on the test you want to use (a qqplot could be enough), because
many of the procedures done in statistics can be robust to departures from
normality anyway. Others, much more fluent than
I in this area, hopefully can give more specific advice.
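Mark's suggestion that a Q-Q plot may be enough takes two base R calls. A minimal sketch, with simulated skewed data as a hypothetical stand-in for real data and the plot written to a temporary PDF file so that the example runs without a screen device (both are assumptions made only to keep it self-contained):

```r
set.seed(1)
x <- rexp(200)                             # hypothetical, clearly skewed data
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)                              # draw into a file, not on screen
qq <- qqnorm(x, main = "Normal Q-Q plot")  # invisibly returns the plotted coordinates
qqline(x, col = "red")                     # reference line through the 1st and 3rd quartiles
invisible(dev.off())
# Points bending away from the line in the upper tail indicate right skew.
```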




Re: shapiro wilk normality test

Robert A LaBudde
In reply to this post by Robert A LaBudde
At 12:48 PM 7/12/2008, Bunny, lautloscrew.com wrote:
>First of all, thanks, y'all. It's always good to get it from people who
>know for sure.
>
>My bad, I meant to say it's compatible with normality. I just wanted
>to know if it wouldn't be better to test for non-normality in order to
>know for "sure".
>And if so, how can I do it?

Doing a significance test may seem complicated,
but it's an almost trivial concept.

You assume some "null hypothesis" that specifies
a unique distribution that you can use to
calculate probabilities from. Then use this
distribution to calculate the probability of
finding what you found in your data, or more
extreme. This is the P-value of the test. It is
the probability of finding what you found, given
that the null hypothesis is true. You give up
("reject") the null hypothesis if this P-value is
too small to be believable. The conventional cutoff
for ordinary, repeatable experiments is 0.05.
Sometimes a smaller value like 0.01 is more reasonable.
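The recipe above (assume H0, then compute the probability of what you found or something more extreme) can be carried out literally by Monte Carlo. The R sketch below does this for the Shapiro-Wilk statistic W; the observed data are a hypothetical skewed sample, and the number of simulations is an arbitrary choice. Small W counts as "more extreme".

```r
set.seed(123)
x <- rexp(50)                          # hypothetical observed data (skewed)
w_obs <- shapiro.test(x)$statistic     # observed W

# Distribution of W under H0. W is invariant to location and scale,
# so standard normal samples of the same size are enough.
w_null <- replicate(5000, shapiro.test(rnorm(length(x)))$statistic)

# Proportion of null W values at least as extreme (as small) as observed;
# the +1 terms keep the estimate away from exactly zero.
p_mc <- (sum(w_null <= w_obs) + 1) / (length(w_null) + 1)
p_builtin <- shapiro.test(x)$p.value   # R's approximation, for comparison
cat("Monte Carlo P-value:", p_mc, "  shapiro.test P-value:", p_builtin, "\n")
```

The two P-values should be of similar size; the Monte Carlo one is limited in resolution by the number of simulations.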

Doing what has been suggested, i.e., using a null
hypothesis of "nonnormality", is unworkable.
There are uncountably many ways to specify a
"nonnormal" distribution. Is it discrete or
continuous? Is it skewed or symmetric? Does it go
from zero to infinity, from 0 to 1, from
-infinity to infinity, or anything else? Does it
have one mode or many? Is it continuous or differentiable? Etc.

In order to do a statistical test, you must be
able to calculate the P-value. That usually means
your null hypothesis must specify a single, unique probability distribution.

So "nonnormal" in testing means "reject normal as
the distribution". "Nonnormal" is not defined
other than it's not the normal distribution.

If you wish to test how the distribution is
nonnormal, within some family of nonnormal
distributions, you will have to specify such a
null hypothesis and test for deviation from it.

E.g., testing whether the coefficient of skewness is 0.
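A rough base R sketch of such a skewness test. It assumes the simple moment estimator g1 and the normal-theory standard error sqrt(6/n), which is a large-sample approximation; more refined versions exist. The helper name `skew_test` is hypothetical.

```r
# Large-sample z-test of H0: skewness = 0 (approximate; for large n only)
skew_test <- function(x) {
  n <- length(x)
  m <- mean(x)
  g1 <- mean((x - m)^3) / mean((x - m)^2)^1.5   # sample skewness
  se <- sqrt(6 / n)                              # approx. SE under normality
  z <- g1 / se
  p <- 2 * pnorm(-abs(z))                        # two-sided P-value
  list(skewness = g1, z = z, p.value = p)
}

set.seed(7)
skew_test(rnorm(1000))$p.value   # symmetric data: usually not small
skew_test(rexp(1000))$p.value    # exponential data (skewness 2): tiny
```

Note that this null hypothesis only concerns symmetry: a heavy-tailed but symmetric distribution would not be detected.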


Re: shapiro wilk normality test

C.H.
In reply to this post by Bunny, lautloscrew.com
You may consider the nortest package.

http://cran.r-project.org/web/packages/nortest/index.html
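A hedged sketch of the nortest suggestion, assuming the package is installed (`install.packages("nortest")`). `ad.test()`, `cvm.test()` and `lillie.test()` are its Anderson-Darling, Cramér-von Mises and Lilliefors (Kolmogorov-Smirnov) tests; all use normality as H0 and, unlike `shapiro.test()`, accept more than 5000 observations. The simulated data are a stand-in.

```r
set.seed(2008)
x <- rnorm(10000)   # more than the 5000 that shapiro.test() accepts

if (requireNamespace("nortest", quietly = TRUE)) {
  pvals <- c(
    anderson_darling = nortest::ad.test(x)$p.value,
    cramer_von_mises = nortest::cvm.test(x)$p.value,
    lilliefors_ks    = nortest::lillie.test(x)$p.value
  )
  print(pvals)
} else {
  pvals <- NULL
  message("nortest is not installed")
}
```

These tests share the limitation discussed in this thread: a large P-value does not show that the data are normal.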

Regards,

CH

--
CH Chan
Research Assistant - KWH
http://www.macgrass.com

Re: shapiro wilk normality test

Marta Colombo-2
In reply to this post by Bunny, lautloscrew.com
Hi!
Well, if you look at the output:
  shapiro.test(rnorm(5000))

        Shapiro-Wilk normality test

data:  rnorm(5000)
W = 0.9997, p-value = 0.6205

You can see that the p-value is 0.6205, so you can't reject the normality hypothesis.
H0: normal data    vs H1: not normal
So the Shapiro-Wilk test is saying that your data are normal, and it's correct!
Bye
Marta



Re: shapiro wilk normality test

Frank Harrell
Marta Colombo wrote:

> Hi!
> Well, if you look at the output:
> shapiro.test(rnorm(5000))
>>         Shapiro-Wilk normality test
>>
>> data:  rnorm(5000)
>> W = 0.9997, p-value = 0.6205
>
> You can see that the p-value is 0.6205, so you can't reject the normality hypothesis.
> H0: normal data    vs H1: not normal
> So the Shapiro-Wilk test is saying that your data are normal, and it's correct!
> Bye
> Marta

A large P-value means nothing more than needing more data.  No
conclusion is possible.  Please read the classic paper Absence of
Evidence is not Evidence for Absence.

Your first sentence is correct, but not the second.

Why test for normality?  What downstream method depends on it?  If
normality is in doubt why not use a method that doesn't require it?
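The claim that a large P-value mostly signals too little data can be illustrated by simulating the test's power. This R sketch assumes a t distribution with 10 degrees of freedom as a mild, arbitrarily chosen departure from normality; the exact rejection rates depend on that choice and on the simulation size.

```r
set.seed(99)
# Proportion of simulated samples in which shapiro.test() rejects at 5%
rejection_rate <- function(n, n_sims = 1000) {
  mean(replicate(n_sims, shapiro.test(rt(n, df = 10))$p.value < 0.05))
}
sizes <- c(20, 100, 500, 2000)
rej <- sapply(sizes, rejection_rate)
print(data.frame(n = sizes, power = rej))
```

At small n the data are non-normal but the test rarely notices; a non-significant result there is absence of evidence, not evidence of normality.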

Frank Harrell



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: shapiro wilk normality test

Ted.Harding-2
On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
> [...]
> A large P-value means nothing more than needing more data.  No
> conclusion is possible.  Please read the classic paper Absence of
> Evidence is not Evidence for Absence.

Is that ironic, Frank, or is there really a "classic paper" with
that title? If so, I'd be pleased to have a reference to it!

Thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jul-08                                       Time: 15:55:35
------------------------------ XFMail ------------------------------


Re: shapiro wilk normality test

Charles Annis, P.E.
http://www.bmj.com/cgi/content/full/311/7003/485

Charles Annis, P.E.

[hidden email]
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com
 


Re: shapiro wilk normality test

Berwin A Turlach
In reply to this post by Ted.Harding-2
G'day all,

On Sun, 13 Jul 2008 15:55:38 +0100 (BST)
(Ted Harding) <[hidden email]> wrote:

> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
> > [...]
> > A large P-value means nothing more than needing more data.  No
> > conclusion is possible.  

I would have thought that "we need more data" would qualify as a
conclusion. :)

> > Please read the classic paper Absence of Evidence is not Evidence
> > for Absence.
>
> Is that ironic, Frank, or is there really a "classic paper" with
> that title? If so, I'd be pleased to have a reference to it!

Of course, I do not know for sure which paper Frank has in mind, but
Google and Google Scholar readily come up with papers/editorials that
have a nearly identical title:

http://www.bmj.com/cgi/content/full/311/7003/485
http://bmj.bmjjournals.com/cgi/content/full/328/7438/476
(see also
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=351831)
http://www.ncbi.nlm.nih.gov/pubmed/6829975

My money is on Frank having the first of these publications in mind.

Cheers,

        Berwin

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919      
National University of Singapore    
6 Science Drive 2, Blk S16, Level 7          e-mail: [hidden email]
Singapore 117546                    http://www.stat.nus.edu.sg/~statba


Re: shapiro wilk normality test

Ted.Harding-2
Many thanks to Berwin, and also to Charles Annis, for the
references. They're good!
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jul-08                                       Time: 18:01:51
------------------------------ XFMail ------------------------------


Re: shapiro wilk normality test

Frank Harrell
In reply to this post by Ted.Harding-2
(Ted Harding) wrote:

> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>> [...]
>> A large P-value means nothing more than needing more data.  No
>> conclusion is possible.  Please read the classic paper Absence of
>> Evidence is not Evidence for Absence.
>
> Is that ironic, Frank, or is there really a "classic paper" with
> that title? If so, I'd be pleased to have a reference to it!
>
> Thanks,
> Ted.

It's real.  Full text is available to all:
http://www.bmj.com/cgi/content/full/311/7003/485

It's one of the dozens of gems in the short statistics notes series in
the British Medical Journal.

Frank


Re: shapiro wilk normality test

Johannes Huesing
Frank E Harrell Jr <[hidden email]> [Sun, Jul 13, 2008 at 08:07:37PM CEST]:
> (Ted Harding) wrote:
>> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>>> [...]
>>> A large P-value means nothing more than needing more data.  No  
>>> conclusion is possible.  Please read the classic paper Absence of  
>>> Evidence is not Evidence for Absence.
>>
[...]
>
> It's real.  Full text is available to all:  
> http://www.bmj.com/cgi/content/full/311/7003/485

The quotation is attributed to the late Carl Sagan, who
seems to have used it as a straw-man argument; see
http://oyhus.no/AbsenceOfEvidence.html.

--
Johannes Hüsing               There is something fascinating about science.
                              One gets such wholesale returns of conjecture
mailto:[hidden email]  from such a trifling investment of fact.                
http://derwisch.wikidot.com         (Mark Twain, "Life on the Mississippi")


Re: shapiro wilk normality test

Ted.Harding-2
On 13-Jul-08 19:53:47, Johannes Huesing wrote:

> Frank E Harrell Jr <[hidden email]> [Sun, Jul 13, 2008 at
> 08:07:37PM CEST]:
>> (Ted Harding) wrote:
>>> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>>>> [...]
>>>> A large P-value means nothing more than needing more data.  No  
>>>> conclusion is possible.  Please read the classic paper Absence of  
>>>> Evidence is not Evidence for Absence.
>>>
> [...]
>>
>> It's real.  Full text is available to all:  
>> http://www.bmj.com/cgi/content/full/311/7003/485
>
> The quotation is attributed to the late Carl Sagan who
> seemed to have used it as a strawman argument , see
> http://oyhus.no/AbsenceOfEvidence.html.

This citation of Sagan, and the link therein to Sagan quotes:

  http://en.wikiquote.org/wiki/Carl_Sagan

are interesting, as far as they go. However, I disagree with the
proof ("by conditional probability") that absence of evidence is
evidence of absence.

Definition 1 is disputable. But, whether one agrees with it or not,
Definition 2 does not correspond to my interpretation of "absence
of evidence". If A is evidence for B (in terms of P(B|A) etc.),
this means that if we *know* that A is the case, or that not-A
is the case, then we can say something about P(B). But "absence
of evidence", in my interpretation (which I believe is right for
the statistical context of "non-significant P-values"), means that
we do not know about A: we do not have enough information.

That proof needs to be discussed in terms of the available evidence
for A!

The proof is, basically, given in terms of a 2-valued logic where
every term is either TRUE or FALSE. In the real world we have at
least a third possible value: UNKNOWN (or, as R would put it, NA).

Even if you accept (Definition 1) that

  "A is evidence for B" == P(B|A) > P(B|not-A)

what can you possibly say about P(B|NA) (other than that it is NA
itself)?
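For readers who want to check the arithmetic behind Definition 1, here is a small R sketch with made-up probabilities. By the law of total probability, P(B) is a weighted average of P(B|A) and P(B|not-A), so if A is evidence for B, observing not-A lowers P(B). It only covers the two-valued case in which A or not-A is actually observed; it does not address Ted's third case, where A is unknown.

```r
p_A          <- 0.3   # hypothetical P(A)
p_B_given_A  <- 0.8   # hypothetical P(B | A)
p_B_given_nA <- 0.4   # hypothetical P(B | not-A); smaller, so A is "evidence for" B

# Law of total probability
p_B <- p_A * p_B_given_A + (1 - p_A) * p_B_given_nA
cat("P(B) =", p_B, "\n")   # 0.3*0.8 + 0.7*0.4 = 0.52

# P(B|not-A) < P(B) < P(B|A): learning not-A moves P(B) down
```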

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jul-08                                       Time: 21:59:16
------------------------------ XFMail ------------------------------


Re: shapiro wilk normality test

Johannes Huesing
Ted Harding <[hidden email]> [Sun, Jul 13, 2008 at 10:59:21PM CEST]:
> On 13-Jul-08 19:53:47, Johannes Huesing wrote:
> > Frank E Harrell Jr <[hidden email]> [Sun, Jul 13, 2008 at
> > 08:07:37PM CEST]:
> >> (Ted Harding) wrote:
> >>> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
> >>>> [...]
> >>>> A large P-value means nothing more than needing more data.  No  
> >>>> conclusion is possible.  
[...]

> But "absence
> of evidence", in my interpretation (which I believe is right for
> the statistical context of "non-significant P-values"), means that
> we do not know about A: we do not have enough information.
>

What would the p-value have to be like in your opinion to make the
null hypothesis look more likely after the experiment than before?

> The proof is, basically, given in terms of a 2-valued logic where
> every term is either TRUE or FALSE. In the real world we have at
> least a third possible value: UNKNOWN (or, as R would put it, NA).

How would the probabilities that A is NA be affected by the outcome
of an experiment like this? If this probability is affected, how
does this leave the probability that A is T or F unaffected?

Or do you assign the NA status to the data collected?

A high p-value does not always mean that you might as well have
collected nothing but missing values.

Of course I buy into the notion that a point estimate with a measure
of accuracy is much better suited to describe your data; but a
high p-value as a result of a test procedure that can be claimed to
be adequately powered may defensibly be taken as a hint that we
can for now stick with the null hypothesis.
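As a sketch of the "point estimate with a measure of accuracy" alternative applied to the normality question, the code below reports the sample skewness with a percentile bootstrap confidence interval. The helper `skewness()`, the simulated data and the number of resamples are illustrative assumptions; resampling with replacement is correct here because this is a bootstrap, not a subsample for a test.

```r
# Moment estimator of skewness (hypothetical helper, base R only)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(314)
x <- rnorm(200)                                    # hypothetical data set
boot_skew <- replicate(2000, skewness(sample(x, replace = TRUE)))
est <- skewness(x)
ci <- quantile(boot_skew, c(0.025, 0.975))         # percentile interval
cat("skewness:", round(est, 3), "  95% CI:", round(ci, 3), "\n")
```

A wide interval says the data cannot pin down the shape; a narrow one around 0 is positive information about symmetry.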

Re: shapiro wilk normality test

Ted.Harding-2
See at end.

On 13-Jul-08 21:42:19, Johannes Huesing wrote:

> Ted Harding <[hidden email]> [Sun, Jul 13, 2008 at
> 10:59:21PM CEST]:
>> On 13-Jul-08 19:53:47, Johannes Huesing wrote:
>> > Frank E Harrell Jr <[hidden email]> [Sun, Jul 13, 2008 at
>> > 08:07:37PM CEST]:
>> >> (Ted Harding) wrote:
>> >>> On 13-Jul-08 13:29:13, Frank E Harrell Jr wrote:
>> >>>> [...]
>> >>>> A large P-value means nothing more than needing more data.  No  
>> >>>> conclusion is possible.  
> [...]
>
>> But "absence
>> of evidence", in my interpretation (which I believe is right for
>> the statistical context of "non-significant P-values"), means that
>> we do not know about A: we do not have enough information.
>>
>
> What would the p-value have to be like in your opinion to make the
> null hypothesis look more likely after the experiment than before?
>
>> The proof is, basically, given in terms of a 2-valued logic where
>> every term is either TRUE or FALSE. In the real world we have at
>> least a third possible value: UNKNOWN (or, as R would put it, NA).
>
> How would the probabilities that A is NA be affected by the outcome
> of an experiment like this? If this probability is affected, how
> does this leave the probability that A is T or F unaffected?
>
> Or do you assign the NA status to the data collected?
>
> A high p-value does not always mean that you might as well have
> collected nothing but missing values.
>
> Of course I buy into the notion that a point estimate with a measure
> of accuracy is much better suited to describe your data; but a
> high p-value as a result of a test procedure that can be claimed to
> be adequately powered may defensibly be taken as a hint that we
> can for now stick with the null hypothesis.
> --
> Johannes Hüsing

I shall perhaps try later to respond in more detail to specific
points above. But, for the moment, let me say that I think your
statement "a high p-value as a result of a test procedure that
can be claimed to be adequately powered may defensibly be taken
as a hint that we can for now stick with the null hypothesis"
is the main key.

The power function of a test (which of course depends on the
design of the investigation and on its size, i.e. number of
data gathered) is basically much the same (in my mind) as the
amount of evidence.

A high P-value with a very powerful test serves to exclude
all alternatives to the Null Hypothesis except those which
lie very close to the Null Hypothesis.

In that sense, we do in fact have a lot of evidence against
all hypotheses except those which are very similar to the Null.
So we are not in an "absence of evidence" situation, and we
do have "evidence of absence".
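To make the "powerful test" point concrete, here is a minimal Python sketch (standard library only; the thread itself uses R). It assumes the simplest setting, a two-sided z-test of H0: mu = 0 with known sigma, and the helper names are hypothetical. It shows how power against a fixed small alternative grows with n, and how narrow the band of alternatives that survive a non-significant result becomes.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_sided(delta, n, sigma=1.0, alpha=0.05):
    """Power of a two-sided z-test of H0: mu = 0 when the true mean is delta."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)

def excluded_beyond(n, sigma=1.0, alpha=0.05):
    """A non-significant result has |xbar| < h, with h = z_crit * sigma / sqrt(n).
    Every alternative mu with |mu| >= 2h is then itself rejected at level alpha,
    so only alternatives with |mu| < 2h remain compatible with the data."""
    h = Z.inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return 2 * h

for n in (10, 100, 1000, 10000):
    print(f"n={n:6d}  power at delta=0.1: {power_two_sided(0.1, n):.3f}  "
          f"alternatives left: |mu| < {excluded_beyond(n):.4f}")
```

With small n a high P-value leaves a wide band of alternatives standing (absence of evidence); with large n it rules out everything except alternatives very close to H0.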

The basic logic of a Hypothesis Test (in its standard sense)
is the generalisation, to a logic where certainty is at best
probabilistic, of the classical-logic argument:

Given (as a matter of fact): If A, then B
Observed: B is FALSE
Conclusion: A is FALSE

Probabilistically:
Given: If A (H0), then B has high probability
Observed: B is FALSE
Conclusion: An event (not-B) has occurred which has very
small probability if A is TRUE. Hence we (as George Barnard
used to put it) apply "The Principle of Disbelief in Tall Stories"
and disbelieve A to the extent that we disbelieve not-B as
a possible outcome from A (H0).

In applications, the event B will be specified in terms of
a set of possible values of a Test Statistic T, devised so
as to represent an interesting measure of discrepancy between
the data and the hypothesis H0 (e.g. the t-statistic for
testing whether two samples are drawn from populations with
equal means -- if that is the case, then E(T) = 0, and the
set of values {abs(T) > T0} will be a "discrepant set").

By choosing T0 to be such that Prob(abs(T) > T0) = p0, a small
value which we choose to suit ourselves, we are defining the
threshold at which we are prepared to deem that "the claim
that abs(T) > T0 is compatible with H0" is too unlikely to
be plausible.
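The threshold construction above can be sketched in Python as follows. This assumes a large-sample setting in which T is approximately standard normal under H0 (for a small-sample t-statistic the t distribution would replace the normal); the function names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # assumed null distribution of T: standard normal

def discrepancy_threshold(p0):
    """T0 chosen so that Prob(abs(T) > T0) = p0 when H0 is true."""
    return Z.inv_cdf(1 - p0 / 2)

def p_value(t):
    """Prob(abs(T) >= abs(t)) under H0: the tail area beyond the observed statistic."""
    return 2 * (1 - Z.cdf(abs(t)))

def in_discrepant_set(t, p0=0.05):
    """True when t falls in {abs(T) > T0}, i.e. H0 is disbelieved at level p0."""
    return abs(t) > discrepancy_threshold(p0)

print(discrepancy_threshold(0.05), discrepancy_threshold(0.001))
print(p_value(2.5), in_discrepant_set(2.5), in_discrepant_set(1.0))
```

Landing in the discrepant set is equivalent to the P-value falling below p0; the choice of p0 is the "suit ourselves" step.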

The cleanest example in real life can be drawn from the basic
principle in criminal law for concluding that an accused person
is guilty, namely "The accused is deemed innocent until proved
guilty beyond reasonable doubt".

What constitutes "reasonable doubt" can become a very interesting
question, but there are some crimes for which it has a definite
statistical interpretation, typically exceeding some authorised
limit (of speed in a vehicle, of alcohol content in the blood
while driving a vehicle, or of polluting emissions from a factory
plant [which in the UK, under the Environmental Protection Act,
is a criminal offence]).

In the days when blood alcohol was determined by laboratory
analysis of a blood sample, it was possible to determine that
the "margin of error" corresponded to a P-value less than or
equal to 0.001 (i.e. if the lab analysis yielded a result in
excess of the legal limit + 3*SE, then the inevitable result
was a conviction unless it could be independently proved in
defence that the statutory procedures were carried out in a
flawed manner).

So, in that case, "beyond reasonable doubt" meant "The P-value
of the data was less than 1/1000".

But, if the lab analysis gave 80mg/100ml (the legal limit in
the UK), then at best you can conclude that the result equally
favoured any two hypotheses equidistant on either side of the
legal limit. But while this constitutes (in the sense explained)
absence of evidence for guilt (i.e. alc > 80), it certainly
does not exclude it (someone at 81, and therefore truly guilty,
could be quite likely to give a result of 80). So the "80" result
is not evidence of innocence -- it is merely lack of evidence of
guilt.
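The "80" argument can be checked numerically with a short Python sketch. It assumes the lab error is normal and centred on the true level, with the SE of 2mg/100ml quoted for this procedure; the variable names are illustrative.

```python
from statistics import NormalDist

SE = 2.0        # mg/100ml, the SE quoted for results below 100
reading = 80.0  # observed lab result, equal to the UK legal limit

# Likelihood of the reading under two true levels equidistant from the limit
like_79 = NormalDist(79.0, SE).pdf(reading)
like_81 = NormalDist(81.0, SE).pdf(reading)
print("likelihood ratio 81 vs 79:", like_81 / like_79)

# Probability that a truly guilty driver at 81 reads 80 or lower
p_low_given_81 = NormalDist(81.0, SE).cdf(reading)
print("P(reading <= 80 | true 81):", round(p_low_given_81, 3))
```

The likelihood ratio is 1, so the reading favours neither side, and roughly 3 in 10 drivers truly at 81 would read 80 or below.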

It gets worse with the environmental pollution situation. For
the blood alcohol and the lab analysis of a blood sample, the
lab procedure is only legally valid if it consistently achieves
an SE of determination of 2% or less (taken as 2mg/100ml for
results below 100).

So, with a conviction threshold of 80 + 3*SE = 86, the power
function has Power(alc) = 0.001 at alc=80, Power(alc) = 0.5 at
alc=86, Power(alc) = 0.999 at alc=92.
Thus the innocent (alc <= 80) have a good protection against
false conviction; the marginally guilty (alc < 86, say)
are likely to get away with it; the seriously guilty (alc > 92)
are almost certain to be convicted.
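The power figures above follow from a conviction threshold of 80 + 3*SE = 86 with SE = 2. A minimal Python sketch, assuming normal, unbiased measurement error, reproduces them (the exact tails are about 0.0013 and 0.9987, which the text rounds to 0.001 and 0.999):

```python
from statistics import NormalDist

LIMIT = 80.0                  # UK legal limit, mg/100ml
SE = 2.0                      # lab SE, mg/100ml
THRESHOLD = LIMIT + 3 * SE    # convict only if the reading exceeds 86

def power(alc):
    """Probability that a driver whose true level is `alc` reads above THRESHOLD."""
    return 1 - NormalDist(alc, SE).cdf(THRESHOLD)

for alc in (80, 86, 92):
    print(alc, round(power(alc), 4))
```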

However, the kinds of measurement which can be made of, say,
atmospheric pollution are subject to SEs which are more like 20%
and are often higher (50% or more). To achieve the requisite
"beyond reasonable doubt" (since it is a criminal offence) on
the same criterion (3*SE above) means that the procedure is only
effective when the emission is say twice the permitted level
(or even more). Here we have lack of evidence in a very real
sense (the procedure is weak). It would be quite possible for
a polluter to emit well above the permitted level, yet the sampling
give a result well below the permitted level. Hence, such absence
of evidence is certainly not evidence of absence.
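The same calculation with pollution-sized errors shows why the procedure is weak. In this Python sketch both the true emission and the SE are expressed as multiples of the permitted limit; treating the SE as constant and the error as normal are simplifying assumptions, and `power_relative` is a hypothetical helper.

```python
from statistics import NormalDist

def power_relative(true_ratio, se_ratio, k=3.0):
    """Chance of a reading above limit * (1 + k * se_ratio), i.e. k SEs above the
    limit, when the true emission is true_ratio times the limit (limit = 1)."""
    threshold = 1 + k * se_ratio
    return 1 - NormalDist(true_ratio, se_ratio).cdf(threshold)

for se in (0.2, 0.5):
    for ratio in (1.2, 1.5, 2.0, 3.0):
        print(f"SE={se:.0%}  emission={ratio}x limit  "
              f"P(conviction)={power_relative(ratio, se):.3f}")
```

With a 20% SE a polluter at 1.2 times the limit is convicted only about 2% of the time, and reliable detection needs roughly twice the limit; with a 50% SE even twice the limit is caught only about 16% of the time.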

And, if I understand correctly, this is pretty much what Frank
Harrell meant when he wrote "A large P-value means nothing more
than needing more data. No conclusion is possible.  Please read
the classic paper 'Absence of Evidence is not Evidence for Absence'."
[Or "better data", one might add]. But it does need to be qualified
(as I try to do above) by consideration of whereabouts on the
"effect" scale the procedure becomes capable of doing its job,
which in turn brings in issues about the importance (in real life)
of the sort of departure from H0 that it is important to detect.
The blood-alcohol test does a reasonably good job (one is prepared
to accept a relatively narrow "grey area" where any conclusion
is unclear). The pollution test does not.

Mustn't go on too long!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 14-Jul-08                                       Time: 00:16:50
------------------------------ XFMail ------------------------------


Re: shapiro wilk normality test

Greg Snow-2
In reply to this post by Bunny, lautloscrew.com
For those people who feel the need for a p-value to test normality on large sample sizes, I propose the following test/function:

SnowsPenultimateNormalityTest <- function(x){

        # the following function works for current implementations of R
        # to my knowledge, eventually it may need to be expanded
        is.rational <- function(x){
                rep( TRUE, length(x) )
        }

        tmp.p <- if( any(is.rational(x))) {
                0
        } else {
                # current implementation will not get here
                # this part is reserved for the ultimate test
                1
        }

        out <- list(
                p.value = tmp.p,
                alternative = strwrap(paste('The data does not come from a',
        'strict normal distribution (but may represent a distribution',
        'that is close enough)'), prefix="\n\t"),
                method = "Snow's Penultimate Normality Test",
                data.name = deparse(substitute(x))
        )

        class(out) <- 'htest'
        out
}


Now that the need for a p-value is satisfied, we can get onto the more useful questions mentioned in this thread and other places.
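Why large samples make such a p-value uninformative can be sketched with one component of non-normality. Under normality the sample skewness has an approximate large-sample SE of sqrt(6/n). This Python sketch plugs in a hypothetical population skewness of 0.05, a departure that would rarely matter in practice, and reports the p-value at the expected z. It is not the Shapiro-Wilk test and ignores sampling variability; it only shows the sqrt(n) growth.

```python
from math import sqrt
from statistics import NormalDist

def expected_skew_z(true_skew, n):
    """Approximate z-score of the sample skewness, using the normal-theory SE sqrt(6/n)."""
    return true_skew / sqrt(6.0 / n)

def approx_p(true_skew, n):
    """Two-sided p-value evaluated at the expected z-score."""
    z = expected_skew_z(true_skew, n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (100, 5000, 100000, 1000000):
    print(n, round(expected_skew_z(0.05, n), 2), approx_p(0.05, n))
```

The same slight skewness is invisible at n = 5000 but overwhelmingly "significant" at n = 1,000,000.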

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
(801) 408-8111



> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Bunny,
> lautloscrew.com
> Sent: Saturday, July 12, 2008 10:20 AM
> To: Mark Leeds
> Cc: [hidden email]
> Subject: Re: [R] shapiro wilk normality test
>
> Hmm thanks,
> But on the other hand it just says I can't reject normality,
> which doesn't really mean it is normal. Wouldn't it be nice to
> test for non-normality? If I'd reject that at a high level I
> could be pretty sure it's normal... ??
>
> thanks in advance
>
> matthias
> Am 12.07.2008 um 18:10 schrieb Mark Leeds:
>
> > Hi: If normality is the H0, then the test below says don't reject
> > (large p-value). Check out any multivariate text for what the null
> > of the Shapiro test is. I don't know for sure but, from below, it
> > sure looks like H0 is normality. Or google for it.
> >
> >
> >
> > -----Original Message-----
> > From: [hidden email]
> > [mailto:[hidden email]
> > ] On
> > Behalf Of Bunny, lautloscrew.com
> > Sent: Saturday, July 12, 2008 11:30 AM
> > To: [hidden email]
> > Subject: [R] shapiro wilk normality test
> >
> > Hi everybody,
> >
> > somehow I don't get the Shapiro-Wilk test for normality. I just
> > can't find what the H0 is.
> >
> > i tried :
> >
> >  shapiro.test(rnorm(5000))
> >
> >       Shapiro-Wilk normality test
> >
> > data:  rnorm(5000)
> > W = 0.9997, p-value = 0.6205
> >
> >
> > If normality is the H0, the test says it's probably not normal,
> > doesn't it?
> >
> > 5000 is the biggest n allowed by the test...
> >
> > are there any other tests? (I know qqnorm already ;)
> >
> > thanks in advance
> >
> > matthias

Re: shapiro wilk normality test

Emmanuel Charpentier
This one should (I am tempted to write "must") make its way to
fortune()...

Thankyouthankyouthankyou ...

                                        Emmanuel Charpentier

On Mon, 14 Jul 2008 14:58:13 -0600, Greg Snow wrote :

> For those people who feel the need for a p-value to test normality on
> large sample sizes, I propose the following test/function:

[ Snip ... ]
