
normality tests


normality tests

gatemaze
Hi all,

apologies for seeking advice on a general stats question. I've run
normality tests using 8 different methods:
- Lilliefors
- Shapiro-Wilk
- Robust Jarque Bera
- Jarque Bera
- Anderson-Darling
- Pearson chi-square
- Cramer-von Mises
- Shapiro-Francia

All of them show that the null hypothesis that the data come from a normal
distribution cannot be rejected. Great. However, I don't think it looks nice
to report the values of 8 different tests in a report. One note is
that my sample size is really tiny (fewer than 20 independent cases).
Without wanting to start a flame war, is there any advice on which
one(s) would be more appropriate to report (along with
a Q-Q plot)? Thank you.

Regards,

--
yianni
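
(For reference, all eight tests are available in R roughly along the lines
below. This is only a minimal sketch; the nortest, tseries and lawstat
packages are an assumption on my part, since the post does not say which
implementations were used.)

## Normality tests on a small sample (illustrative data, n < 20 as above)
library(nortest)    # lillie.test, ad.test, cvm.test, pearson.test, sf.test
library(tseries)    # jarque.bera.test
library(lawstat)    # rjb.test (robust Jarque-Bera)

set.seed(1)
x <- rnorm(18)

shapiro.test(x)        # Shapiro-Wilk (base R)
sf.test(x)             # Shapiro-Francia
lillie.test(x)         # Lilliefors
ad.test(x)             # Anderson-Darling
cvm.test(x)            # Cramer-von Mises
pearson.test(x)        # Pearson chi-square
jarque.bera.test(x)    # Jarque-Bera
rjb.test(x)            # robust Jarque-Bera

qqnorm(x); qqline(x)   # the Q-Q plot mentioned above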


Re: normality tests

Frank Harrell
[hidden email] wrote:

> All show that the null hypothesis that the data come from a normal
> distro cannot be rejected. [...]

Wow - I have so many concerns with that approach that it's hard to know
where to begin.  But first of all, why care about normality?  Why not
use distribution-free methods?

You should examine the power of the tests for n=20.  You'll probably
find it's not good enough to reach a reliable conclusion.

Frank


--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
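
(One rough way to check the power point above is a small simulation; a
sketch only, and the alternative distributions below are arbitrary examples
rather than anything from the thread.)

## Approximate power of the Shapiro-Wilk test at n = 20 against a few
## example alternatives
set.seed(42)
power.sw <- function(rdist, n = 20, nsim = 5000, alpha = 0.05) {
  mean(replicate(nsim, shapiro.test(rdist(n))$p.value < alpha))
}
power.sw(rexp)                        # strongly skewed alternative
power.sw(function(n) rt(n, df = 4))   # heavy-tailed alternative
power.sw(runif)                       # short-tailed alternative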


Re: normality tests

gatemaze
On 25/05/07, Frank E Harrell Jr <[hidden email]> wrote:

> Wow - I have so many concerns with that approach that it's hard to know
> where to begin.  But first of all, why care about normality?  Why not
> use distribution-free methods?
>
> You should examine the power of the tests for n=20.  You'll probably
> find it's not good enough to reach a reliable conclusion.

And wouldn't it be even worse if I used non-parametric tests?


--
yianni


Re: normality tests [Broadcast]

Liaw, Andy
From: [hidden email]

>
> > Wow - I have so many concerns with that approach that it's hard to know
> > where to begin.  But first of all, why care about normality?  Why not
> > use distribution-free methods?
> >
> > You should examine the power of the tests for n=20.  You'll probably
> > find it's not good enough to reach a reliable conclusion.
>
> And wouldn't it be even worse if I used non-parametric tests?

I believe what Frank meant was that it's probably better to use a
distribution-free procedure for the real test of interest (if there is
one), rather than first testing for normality and then using a test that
assumes normality.

I guess the question is: what exactly do you want to do with the outcome
of the normality tests?  If they are going to be used as the basis for
deciding which test(s) to do next, then I concur with Frank's
reservation.

Generally speaking, I do not find goodness-of-fit tests for distributions
very useful, mostly because failure to reject the null is no evidence in
favor of the null.  It's difficult for me to imagine why "there is
insufficient evidence to show that the data did not come from a normal
distribution" would be interesting.

Andy
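
(A small illustration of that suggestion, with made-up data: run the test
of interest directly, rather than pre-testing for normality.)

## Hypothetical two-group comparison on skewed data
set.seed(7)
x <- rexp(15)
y <- rexp(15) + 0.5
t.test(x, y)        # relies on approximate normality of the sample means
wilcox.test(x, y)   # distribution-free (Mann-Whitney) alternative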

 




Re: normality tests [Broadcast]

Lucke, Joseph F
Most standard tests, such as t-tests and ANOVA, are fairly resistant to
non-normality for significance testing. It's the sample means that have
to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
for normality prior to choosing a test statistic is generally not a good
idea.
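
(One can look at that claim directly by simulation; exponential data are
used below purely as an example of a skewed parent distribution.)

## Sampling distribution of the mean for skewed data at n = 20
set.seed(1)
means <- replicate(10000, mean(rexp(20)))
hist(means, breaks = 50)       # roughly bell-shaped, still somewhat right-skewed
qqnorm(means); qqline(means)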


Re: normality tests [Broadcast]

gatemaze
Thank you all for your replies; they have been most useful. In my case I
have chosen to do some parametric tests (more precisely, correlation and
linear regression among some variables), so it would be nice to have a
bit of extra support for my decisions. If I understood your replies
correctly, I shouldn't pay so much attention to the normality tests, so
it wouldn't matter which one(s) I report, but should rather focus on
issues such as the power of the test.

Thanks again.

On 25/05/07, Lucke, Joseph F <[hidden email]> wrote:

>  Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> non-normality for significance testing. It's the sample means that have
> to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
> for normality prior to choosing a test statistic is generally not a good
> idea.


--
yianni


Re: normality tests [Broadcast]

Frank Harrell
In reply to this post by Lucke, Joseph F
Lucke, Joseph F wrote:
>  Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> non-normality for significance testing. It's the sample means that have
> to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
> for normality prior to choosing a test statistic is generally not a good
> idea.

I beg to differ, Joseph.  I have had many datasets in which the CLT was
of no use whatsoever, i.e., where bootstrap confidence limits were
asymmetric because the data were so skewed, and where symmetric
normality-based confidence intervals had bad coverage in both tails
(though correct on the average).  I see this the opposite way:
nonparametric tests work fine if normality holds.

Note that the CLT helps with type I error but not so much with type II
error.

Frank
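
(The kind of comparison described above can be sketched as follows, with
illustrative lognormal data and a simple percentile bootstrap rather than
anything more refined.)

## Bootstrap vs. normal-theory interval for the mean of a small skewed sample
set.seed(3)
x <- rlnorm(20, sdlog = 1.5)
boot.means <- replicate(10000, mean(sample(x, replace = TRUE)))
quantile(boot.means, c(0.025, 0.975))   # asymmetric percentile interval
t.test(x)$conf.int                      # symmetric normal-theory interval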



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: normality tests [Broadcast]

Frank Harrell
In reply to this post by gatemaze
[hidden email] wrote:
> Thank you all for your replies.... they have been more useful... well
> in my case I have chosen to do some parametric tests (more precisely
> correlation and linear regressions among some variables)... [...]

If doing regression, I assume your normality tests were on the residuals
rather than on the raw data.

Frank
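
(For the record, checking residuals rather than raw data looks like this;
the variables x and y below are hypothetical.)

## Normality checks on the residuals of a fitted model
set.seed(2)
x <- rnorm(18)
y <- 1 + 2 * x + rnorm(18)
fit <- lm(y ~ x)
shapiro.test(residuals(fit))                     # test the residuals, not y
qqnorm(residuals(fit)); qqline(residuals(fit))   # Q-Q plot of the residuals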



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: normality tests [Broadcast]

wssecn
In reply to this post by gatemaze
The normality of the residuals is important for the inference procedures of the classical linear regression model, and normality is very important in correlation analysis (a second-moment measure)...

Washington S. Silva
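
(If normality is in doubt, a rank-based correlation is a common
distribution-free alternative; a sketch with made-up skewed data.)

## Pearson vs. Spearman correlation on skewed data
set.seed(4)
x <- rexp(18)
y <- x + rexp(18)
cor.test(x, y, method = "pearson")
cor.test(x, y, method = "spearman")   # rank-based, no normality assumption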



Re: normality tests [Broadcast]

Cody Hamilton
In reply to this post by gatemaze

You can also try validating your regression model via the bootstrap (the
validate() function in the Design library is very helpful).  To my mind
that would be much more reassuring than normality tests performed on twenty
residuals.
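
(A minimal sketch of that kind of call, on hypothetical data; the Design
package is assumed here as in the message above, and its successor in
current R is the rms package, where the same calls work.)

## Bootstrap validation of a small linear model with Design/rms
library(Design)
set.seed(5)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$y <- 1 + 0.5 * d$x1 + rnorm(20)
fit <- ols(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)   # x=TRUE, y=TRUE needed by validate()
validate(fit, B = 200)                                  # bootstrap estimates of optimism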

By the way, be careful with the correlation test - it's only good at
detecting linear relationships between two variables (i.e. not helpful for
detecting non-linear relationships).
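
(A quick illustration of that caution, with artificial data.)

## A strong but purely non-linear relationship that a correlation test misses
set.seed(6)
x <- seq(-3, 3, length.out = 50)
y <- x^2 + rnorm(50, sd = 0.5)
cor.test(x, y)   # Pearson correlation near zero, large p-value
plot(x, y)       # the relationship is obvious in a plot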

Regards,
   -Cody

Cody Hamilton, PhD
Edwards Lifesciences


                                                                           

Re: normality tests [Broadcast]

Cody Hamilton
In reply to this post by Frank Harrell

Following up on Frank's thought, why is it that parametric tests are so
much more popular than their non-parametric counterparts?  As
non-parametric tests require fewer assumptions, why aren't they the
default?  The asymptotic relative efficiency of the Wilcoxon test compared
to the t-test, when the data really are normal, is 0.955, and yet I still
see t-tests in the medical literature all the time.  Granted, the Wilcoxon
still requires the assumption of symmetry (I'm curious why the Wilcoxon is
so often used when asymmetry is suspected, given that assumption), but
that's less stringent than requiring normally distributed data.  In a
similar vein, one usually sees the mean and standard deviation reported as
summary statistics for a continuous variable; these are not very
informative unless you assume the variable is normally distributed.
However, clinicians often insist that I include these figures in reports.

Cody Hamilton, PhD
Edwards Lifesciences
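
(A rough simulation of the comparison behind that 0.955 figure, under
normal data; the shift and sample size below are arbitrary choices for
illustration only.)

## Power of the one-sample t-test vs. the Wilcoxon signed-rank test
## when the data really are normal
set.seed(8)
nsim <- 5000
reject.t <- reject.w <- logical(nsim)
for (i in seq_len(nsim)) {
  x <- rnorm(30, mean = 0.5)
  reject.t[i] <- t.test(x)$p.value < 0.05
  reject.w[i] <- wilcox.test(x)$p.value < 0.05
}
c(t = mean(reject.t), wilcoxon = mean(reject.w))   # powers are very close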



                                                                           

Re: normality tests [Broadcast]

Frank Harrell
[hidden email] wrote:

> Granted, the Wilcoxon still requires the assumption of symmetry
> (I'm curious as to why the Wilcoxon is often used when asymmetry is
> suspected, since the Wilcoxon assumes symmetry), but that's less stringent
> than requiring normally distributed data. [...]
>
> Cody Hamilton, PhD
> Edwards Lifesciences

Well said, Cody. I just want to add that the Wilcoxon does not assume
symmetry if you are interested in testing for stochastic ordering and not
just for a mean.

Frank
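
(As an aside, the two-sample Mann-Whitney form of the Wilcoxon is often
used exactly this way on skewed data; a sketch with made-up samples.)

## Wilcoxon as a test of stochastic ordering between two skewed samples
set.seed(9)
a <- rexp(20)
b <- rexp(20, rate = 0.5)   # tends to be larger than a
wilcox.test(a, b)           # no symmetry assumption needed for this use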



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: normality tests

John C Frain
In reply to this post by gatemaze
For small samples I would think that the Shapiro-Wilk test is probably
the most powerful.  Chapter 7 of Thode (2002), "Testing for Normality",
Marcel Dekker, contains a good summary of research in this area.  If
you have a specific alternative in view you might find a better test.

Regards

John



--
John C Frain
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:[hidden email]


Re: normality tests [Broadcast]

Martin Maechler
In reply to this post by Lucke, Joseph F
>>>>> "LuckeJF" == Lucke, Joseph F <[hidden email]>
>>>>>     on Fri, 25 May 2007 12:29:49 -0500 writes:

    LuckeJF>  Most standard tests, such as t-tests and ANOVA,
    LuckeJF> are fairly resistant to non-normality for
    LuckeJF> significance testing. It's the sample means that
    LuckeJF> have to be normal, not the data.  The CLT kicks in
    LuckeJF> fairly quickly.

Even though such statements appear in too many (text)books,
that's just plain wrong practically:

Even though the *level* of the t-test is resistant to non-normality,
its power is not at all!  And that makes the t-test NON-robust!
It's an easy exercise to see that the t-statistic tends to 1 as
one observation goes to infinity, i.e., the t-test will never
reject when you have one extreme outlier; a simple "proof" in R:

> t.test(11:20)

        One Sample t-test

data:  c(11:20)
t = 16.1892, df = 9, p-value = 5.805e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 13.33415 17.66585
sample estimates:
mean of x
     15.5

##   ---> unknown mean highly significantly different from 0
##   But

> t.test(c(11:20, 1000))

        One Sample t-test

data:  c(11:20, 1000)
t = 1.1731, df = 10, p-value = 0.2679
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -94.42776 304.42776
sample estimates:
mean of x
      105
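
For comparison, a quick sketch along the same lines: a rank-based test is not
swamped by the single extreme value.

wilcox.test(c(11:20, 1000))   # one-sample Wilcoxon signed-rank test against mu = 0
## all 11 observations are positive, so the exact two-sided p-value is
## 2 * (1/2^11), about 0.001 -- the outlier does not wipe out the rejection
## the way it does for t.test()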




    LuckeJF> Testing for normality prior to choosing a test
    LuckeJF> statistic is generally not a good idea.

Definitely. Or even: It's a very bad idea ...

Martin Maechler, ETH Zurich




Re: normality tests [Broadcast]

Thomas Lumley
On Mon, 28 May 2007, Martin Maechler wrote:

>>>>>> "LuckeJF" == Lucke, Joseph F <[hidden email]>
>>>>>>     on Fri, 25 May 2007 12:29:49 -0500 writes:
>
>    LuckeJF>  Most standard tests, such as t-tests and ANOVA,
>    LuckeJF> are fairly resistant to non-normality for
>    LuckeJF> significance testing. It's the sample means that
>    LuckeJF> have to be normal, not the data.  The CLT kicks in
>    LuckeJF> fairly quickly.
>
> Even though such statements appear in too many (text)books,
> that's just plain wrong practically:
>
> Even though *level* of the t-test is resistant to non-normality,
> the power is not at all!!  And that makes the t-test NON-robust!

While it is true that this makes the t-test non-robust, it doesn't mean
that the statement is just plain wrong practically.

The issue really is more complicated than a lot of people claim (not you
specifically, Martin, but upthread and previous threads).

Starting with the demonstrable mathematical facts:
  - lots of rank tests are robust in the sense of Huber
  - rank tests are optimal for specific location-shift testing problems.
  - lots of rank tests have excellent power for location shift alternatives
     over a wide range of underlying distributions.
  - rank tests fail to be transitive when stochastic ordering is not
    assumed (they are not consistent with any ordering on all distributions)
  - rank tests do not lead to confidence intervals unless a location shift
    or similar one-dimensional family is assumed (see the sketch after this list)
  - No rank test is uniformly more powerful than any parametric test or
    vice versa (if we rule out pathological cases)
  - there is no rank test that is consistent precisely against a difference
    in means
  - the t-test (and essentially all tests) can be made distribution-free in
    large samples (for small values of 'large', usually)
  - being distribution-free does not guarantee robustness of power (for the
    t-test or for any other test)
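
On the confidence-interval point in the list above, a minimal sketch (the data
and the 0.8 shift are arbitrary placeholders): under an assumed location-shift
model, wilcox.test() returns a Hodges-Lehmann estimate and interval; without
that model the interval has no clear interpretation.

set.seed(2)
x <- rnorm(25)
y <- rnorm(25) + 0.8
wilcox.test(x, y, conf.int = TRUE)   # shift estimate and CI, valid under the shift model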


Now, if we assume stochastic ordering, is the Wilcoxon rank-sum test more
or less powerful than the t-test?  Everyone knows that this depends on the
distribution under the null hypothesis.  Fewer people seem to know that it
also depends on the alternative, especially in large samples.

Suppose the alternative of interest is not that the values are uniformly
larger by 1 unit, but that 5% of them are about 20 units larger.  The
Wilcoxon test -- precisely because it gives less weight to outliers --
will have lower power.  For example (ObR)

## Simulate two samples of size n: x is standard normal, y adds a shift of
## 'delta' units to a random fraction 'pct' of its observations.
one.sim <- function(n, pct, delta){
  x <- rnorm(n)
  y <- rnorm(n) + delta * rbinom(n, 1, pct)
  list(x = x, y = y)
}

## Empirical power at the 5% level over 100 replicates.
## Alternative 1: roughly 5% of observations shifted by 20 units -- the Wilcoxon,
## because it down-weights the few large values, has lower power here.
mean(replicate(100, {d <- one.sim(100, .05, 20); t.test(d$x, d$y)$p.value}) < 0.05)
mean(replicate(100, {d <- one.sim(100, .05, 20); wilcox.test(d$x, d$y)$p.value}) < 0.05)

## Alternative 2: half of the observations shifted by 1 unit.
mean(replicate(100, {d <- one.sim(100, .5, 1); t.test(d$x, d$y)$p.value}) < 0.05)
mean(replicate(100, {d <- one.sim(100, .5, 1); wilcox.test(d$x, d$y)$p.value}) < 0.05)


Since both relatively uniform shifts and large shifts of small fractions
are genuinely important alternatives in real problems it is true in
practice as well as in theory that neither the Wilcoxon nor the t-test is
uniformly superior.

This is without even considering violations of stochastic ordering --
which are not just esoteric pathologies, since it is quite plausible for a
treatment to benefit some people and harm others. For example, I've seen
one paper in which a Wilcoxon test on medical cost data was statistically
significant in the *opposite direction* to the difference in means.

This has been a long rant, but I keep encountering statisticians who think
anyone who ever recommends a t-test just needs to have the number 0.955 (the
asymptotic relative efficiency, 3/pi, of the Wilcoxon relative to the t-test
under normality) quoted to them.

<snip>
>
>    LuckeJF> Testing for normality prior to choosing a test
>    LuckeJF> statistic is generally not a good idea.
>
> Definitely. Or even: It's a very bad idea ...
>

I think that's something we can all agree on.

  -thomas


RODBC

Bill Szkotnicki
Hello,

I have installed R 2.5.0 from source (x86_64)
and added the package RODBC,
and now I am trying to connect to a MySQL database.
In Windows R, after installing the 3.51 driver
and creating the DSN by specifying server, user, and password,
it is easy to connect with
channel <- odbcConnect("dsn")

Does anyone know what needs to be done to make this work from linux?

Thanks, Bill


Re: RODBC

Prof Brian Ripley
On Mon, 28 May 2007, Bill Szkotnicki wrote:

> Hello,
>
> I have installed R2.5.0 from sources ( x86_64 )
> and added the package RODBC
> and now I am trying to connect to a mysql database
> In windows R after installing the 3.51 driver
> and creating the dsn by specifying server, user, and password
> it is easy to connect with
> channel <- odbcConnect("dsn")
>
> Does anyone know what needs to be done to make this work from linux?

Did you not read the RODBC README file?  It is described in some detail
with reference to tutorials.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: RODBC

Bill Szkotnicki
I have now read the README file, which I should have done before. :-[
Sorry.
To summarize:
- Install the MySQL Connector/ODBC driver (3.51)
- Set up the DSN in the file .odbc.ini
- It works beautifully, and RODBC is super!
A sketch of the two pieces is below.
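
For anyone else landing here, a minimal sketch of the two pieces, assuming
unixODBC and the MySQL Connector/ODBC 3.51 driver; the DSN name, driver path,
server, database and credentials below are placeholders only:

# ~/.odbc.ini  (user DSN definition -- example values)
[mydsn]
Driver   = /usr/lib/libmyodbc3.so
Server   = localhost
Database = testdb
User     = someuser
Password = somepassword

Then, in R:

library(RODBC)
channel <- odbcConnect("mydsn")   # same call as on Windows
sqlTables(channel)                # quick check that the connection works
odbcClose(channel)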


Prof Brian Ripley wrote:

> On Mon, 28 May 2007, Bill Szkotnicki wrote:
>
>> Hello,
>>
>> I have installed R2.5.0 from sources ( x86_64 )
>> and added the package RODBC
>> and now I am trying to connect to a mysql database
>> In windows R after installing the 3.51 driver
>> and creating the dsn by specifying server, user, and password
>> it is easy to connect with
>> channel <- odbcConnect("dsn")
>>
>> Does anyone know what needs to be done to make this work from linux?
>
> Did you not read the RODBC README file?  It is described in some detail
> with reference to tutorials.
>

--
Bill Szkotnicki
Department of Animal and Poultry Science
University of Guelph
[hidden email]
(519)824-4120 Ext 52253


Re: normality tests [Broadcast]

Bert Gunter
In reply to this post by wssecn
False. Box proved, circa 1952, that standard inferences in the linear regression
model are robust to non-normality, at least for (nearly) balanced designs.
The **crucial** assumption is independence, which I suspect partially
motivated his time-series work on ARIMA modeling. More recently, work on
hierarchical models (e.g. repeated-measures/mixed-effects models) has also
dealt with lack of independence.
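
A small numerical illustration of the level-robustness point (a sketch with one
arbitrary skewed error distribution, not Box's argument): with independent
observations the usual slope t-test stays close to its nominal size even though
the errors are far from normal.

set.seed(1)
pvals <- replicate(2000, {
  x <- rnorm(30)
  y <- 1 + 0 * x + (rexp(30) - 1)   # true slope 0; skewed, independent, mean-zero errors
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})
mean(pvals < 0.05)                  # empirical size, close to the nominal 0.05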


Bert Gunter
Genentech Nonclinical Statistics


-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of wssecn
Sent: Friday, May 25, 2007 2:59 PM
To: r-help
Subject: Re: [R] normality tests [Broadcast]

 The normality of the residuals is important in the inference procedures for
the classical linear regression model, and normality is very important in
correlation analysis (second moment)...

Washington S. Silva

> Thank you all for your replies.... they have been more useful... well
> in my case I have chosen to do some parametric tests (more precisely
> correlation and linear regressions among some variables)... so it
> would be nice if I had an extra bit of support on my decisions... If I
> understood well from all your replies... I shouldn't pay soooo much
> attention on the normality tests, so it wouldn't matter which one/ones
> I use to report... but rather focus on issues such as the power of the
> test...
>
> Thanks again.
>
> On 25/05/07, Lucke, Joseph F <[hidden email]> wrote:
> >  Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> > non-normality for significance testing. It's the sample means that have
> > to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
> > for normality prior to choosing a test statistic is generally not a good
> > idea.
> >
> > -----Original Message-----
> > From: [hidden email]
> > [mailto:[hidden email]] On Behalf Of Liaw, Andy
> > Sent: Friday, May 25, 2007 12:04 PM
> > To: [hidden email]; Frank E Harrell Jr
> > Cc: r-help
> > Subject: Re: [R] normality tests [Broadcast]
> >
> >
> > I believe what Frank meant was that it's probably better to use a
> > distribution-free procedure to do the real test of interest (if there is
> > one) instead of testing for normality, and then use a test that assumes
> > normality.
> >
> > I guess the question is, what exactly do you want to do with the outcome
> > of the normality tests?  If those are going to be used as basis for
> > deciding which test(s) to do next, then I concur with Frank's
> > reservation.
> >
> > Generally speaking, I do not find goodness-of-fit for distributions very
> > useful, mostly for the reason that failure to reject the null is no
> > evidence in favor of the null.  It's difficult for me to imagine why
> > "there's insufficient evidence to show that the data did not come from a
> > normal distribution" would be interesting.
> >
> > Andy
