Hi all,

Apologies for seeking advice on a general stats question. I've run
normality tests using 8 different methods:

- Lilliefors
- Shapiro-Wilk
- Robust Jarque-Bera
- Jarque-Bera
- Anderson-Darling
- Pearson chi-square
- Cramer-von Mises
- Shapiro-Francia

All show that the null hypothesis that the data come from a normal
distribution cannot be rejected. Great. However, I don't think it looks
nice to report the values of 8 different tests in a report. One note:
my sample size is really tiny (fewer than 20 independent cases).
Without wanting to start a flame war, is there any advice on which
one(s) would be more appropriate to report (along with a Q-Q plot)?
Thank you.

Regards,
--
yianni

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
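For reference, a sketch of how such tests might be run in R. `shapiro.test()` is in base stats; several of the others live in the `nortest` package (assumed installed here), and the Jarque-Bera variants in packages such as `tseries` and `lawstat` (not shown).

```r
set.seed(1)
x <- rnorm(18)            # a small sample, n < 20, as in the question

sw <- shapiro.test(x)     # Shapiro-Wilk, base R
print(sw$p.value)

# The remaining tests come from the 'nortest' package, if available:
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::lillie.test(x)$p.value)   # Lilliefors
  print(nortest::ad.test(x)$p.value)       # Anderson-Darling
  print(nortest::cvm.test(x)$p.value)      # Cramer-von Mises
  print(nortest::pearson.test(x)$p.value)  # Pearson chi-square
  print(nortest::sf.test(x)$p.value)       # Shapiro-Francia
}

# The Q-Q plot usually says more than any single p-value:
qqnorm(x); qqline(x)
```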
[hidden email] wrote:
> All show that the null hypothesis that the data come from a normal
> distribution cannot be rejected. Great. However, I don't think it
> looks nice to report the values of 8 different tests in a report.
> One note: my sample size is really tiny (fewer than 20 independent
> cases).

Wow - I have so many concerns with that approach that it's hard to know
where to begin. But first of all, why care about normality? Why not use
distribution-free methods?

You should examine the power of the tests for n=20. You'll probably
find it's not good enough to reach a reliable conclusion.

Frank

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
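Frank's suggestion to examine power at n=20 is easy to check by simulation. A minimal sketch, using the Shapiro-Wilk test against a modestly heavy-tailed alternative (a t distribution with 5 df); the alternative is an assumption for illustration, and power depends heavily on what you test against.

```r
set.seed(42)
n    <- 20
nsim <- 2000

# Power: how often does Shapiro-Wilk reject at 5% when data are t(5)?
power <- mean(replicate(nsim, shapiro.test(rt(n, df = 5))$p.value < 0.05))
cat("Estimated power vs t(5) at n =", n, ":", round(power, 3), "\n")

# Sanity check: under a true normal the rejection rate should be ~5%.
alpha <- mean(replicate(nsim, shapiro.test(rnorm(n))$p.value < 0.05))
cat("Empirical type I error rate:", round(alpha, 3), "\n")
```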
On 25/05/07, Frank E Harrell Jr <[hidden email]> wrote:
> Wow - I have so many concerns with that approach that it's hard to
> know where to begin. But first of all, why care about normality?
> Why not use distribution-free methods?
>
> You should examine the power of the tests for n=20. You'll probably
> find it's not good enough to reach a reliable conclusion.

And wouldn't it be even worse if I used non-parametric tests?

--
yianni
From: [hidden email]
> And wouldn't it be even worse if I used non-parametric tests?

I believe what Frank meant was that it's probably better to use a
distribution-free procedure to do the real test of interest (if there
is one) instead of testing for normality and then using a test that
assumes normality.

I guess the question is, what exactly do you want to do with the
outcome of the normality tests? If those are going to be used as the
basis for deciding which test(s) to do next, then I concur with
Frank's reservation.

Generally speaking, I do not find goodness-of-fit tests for
distributions very useful, mostly for the reason that failure to
reject the null is no evidence in favor of the null. It's difficult
for me to imagine why "there's insufficient evidence to show that the
data did not come from a normal distribution" would be interesting.

Andy
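Andy's point that failure to reject is no evidence for the null can be demonstrated directly: draw many small samples from a clearly non-normal distribution and count how often a normality test fails to reject. A sketch using exponential (strongly skewed) data:

```r
set.seed(7)
n    <- 18     # same order of sample size as in the question
nsim <- 2000

# Proportion of clearly non-normal samples that "pass" Shapiro-Wilk:
not_rejected <- mean(replicate(nsim,
                               shapiro.test(rexp(n))$p.value >= 0.05))
cat("Exponential samples not rejected as non-normal:",
    round(not_rejected, 3), "\n")
```

With n this small, a sizable fraction of these skewed samples is indistinguishable from normal by the test.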
Most standard tests, such as t-tests and ANOVA, are fairly resistant to
non-normality for significance testing. It's the sample means that have
to be normal, not the data. The CLT kicks in fairly quickly. Testing
for normality prior to choosing a test statistic is generally not a
good idea.

-----Original Message-----
From: [hidden email] On Behalf Of Liaw, Andy
Sent: Friday, May 25, 2007 12:04 PM
Subject: Re: [R] normality tests [Broadcast]

> Generally speaking, I do not find goodness-of-fit tests for
> distributions very useful, mostly for the reason that failure to
> reject the null is no evidence in favor of the null.
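The CLT claim above is easy to illustrate: even for skewed (exponential) data, the distribution of sample means is close to normal at quite modest sample sizes. A minimal sketch:

```r
set.seed(123)
n <- 30                               # per-sample size
means <- replicate(5000, mean(rexp(n, rate = 1)))

# Raw exponential data are strongly skewed; the means are nearly
# symmetric and close to the CLT predictions:
cat("mean of sample means:", round(mean(means), 3), "(theory: 1)\n")
cat("sd of sample means:  ", round(sd(means), 3),
    "(theory: 1/sqrt(30) ~", round(1 / sqrt(30), 3), ")\n")

# A Q-Q plot of the means is close to a straight line:
qqnorm(means); qqline(means)
```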
Thank you all for your replies... they have been most useful. Well, in
my case I have chosen to do some parametric tests (more precisely
correlation and linear regressions among some variables), so it would
be nice if I had an extra bit of support for my decisions. If I
understood your replies well, I shouldn't pay so much attention to the
normality tests - so it wouldn't matter which one(s) I report - but
rather focus on issues such as the power of the test.

Thanks again.

On 25/05/07, Lucke, Joseph F <[hidden email]> wrote:
> Most standard tests, such as t-tests and ANOVA, are fairly resistant
> to non-normality for significance testing. It's the sample means that
> have to be normal, not the data. The CLT kicks in fairly quickly.
> Testing for normality prior to choosing a test statistic is generally
> not a good idea.

--
yianni
In reply to this post by Lucke, Joseph F
Lucke, Joseph F wrote:
> Most standard tests, such as t-tests and ANOVA, are fairly resistant
> to non-normality for significance testing. It's the sample means that
> have to be normal, not the data. The CLT kicks in fairly quickly.
> Testing for normality prior to choosing a test statistic is generally
> not a good idea.

I beg to differ, Joseph. I have had many datasets in which the CLT was
of no use whatsoever, i.e., where bootstrap confidence limits were
asymmetric because the data were so skewed, and where symmetric
normality-based confidence intervals had bad coverage in both tails
(though correct on average). I see this the opposite way:
nonparametric tests work fine if normality holds.

Note that the CLT helps with type I error but not so much with type II
error.

Frank

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
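Frank's point about asymmetric bootstrap limits can be sketched with a percentile bootstrap on a small, strongly skewed sample (hypothetical data; a hand-rolled resampling loop rather than the `boot` package, for self-containedness):

```r
set.seed(99)
x <- rlnorm(20, meanlog = 0, sdlog = 1.5)   # small, very skewed sample

# Percentile bootstrap interval for the mean:
boot_means <- replicate(5000, mean(sample(x, replace = TRUE)))
perc_ci <- quantile(boot_means, c(0.025, 0.975))

# Symmetric normal-theory (t-based) interval for comparison:
norm_ci <- mean(x) + c(-1, 1) * qt(0.975, length(x) - 1) *
           sd(x) / sqrt(length(x))

cat("sample mean:          ", round(mean(x), 3), "\n")
cat("percentile bootstrap: ", round(perc_ci, 3), "\n")
cat("normal-theory (t):    ", round(norm_ci, 3), "\n")
# For data this skewed the bootstrap interval is typically much longer
# above the sample mean than below it; the t interval cannot be.
```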
In reply to this post by gatemaze
[hidden email] wrote:
> Thank you all for your replies... they have been most useful. Well,
> in my case I have chosen to do some parametric tests (more precisely
> correlation and linear regressions among some variables)...

If doing regression I assume your normality tests were on residuals
rather than raw data.

Frank

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
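Frank's point in code: for regression it is the residuals, not the raw response, whose distribution the normality assumption concerns. A minimal sketch on hypothetical data:

```r
set.seed(5)
x <- runif(18, 0, 10)
y <- 2 + 0.5 * x + rnorm(18, sd = 1)   # hypothetical data, n < 20

fit <- lm(y ~ x)
res <- residuals(fit)

# Test and plot the residuals, not y itself:
print(shapiro.test(res))
qqnorm(res); qqline(res)
```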
In reply to this post by gatemaze
The normality of the residuals is important in the inference procedures
for the classical linear regression model, and normality is very
important in correlation analysis (second moment)...

Washington S. Silva

> in my case I have chosen to do some parametric tests (more precisely
> correlation and linear regressions among some variables)...
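On the correlation side of Silva's remark: the usual Pearson test relies on normality for its t-based inference, while a rank-based alternative such as Spearman's rho does not. A sketch on hypothetical skewed data:

```r
set.seed(11)
x <- rexp(18)                    # skewed "predictor"
y <- 0.6 * x + rexp(18)          # skewed "response", related to x

pearson  <- cor.test(x, y)                       # normality-based inference
spearman <- cor.test(x, y, method = "spearman")  # rank-based alternative

cat("Pearson r:   ", round(pearson$estimate, 3),
    " p =", round(pearson$p.value, 4), "\n")
cat("Spearman rho:", round(spearman$estimate, 3),
    " p =", round(spearman$p.value, 4), "\n")
```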
In reply to this post by gatemaze
You can also try validating your regression model via the bootstrap (the validate() function in the Design library is very helpful). To my mind that would be much more reassuring than normality tests performed on twenty residuals. By the way, be careful with the correlation test: it is only good at detecting linear relationships between two variables, so it will not help you detect non-linear ones.

Regards,
-Cody

Cody Hamilton, PhD
Edwards Lifesciences
[hidden email]

From: [hidden email] (via r-help-bounces@stat.math.ethz.ch)
To: "Lucke, Joseph F" <[hidden email]>
Cc: r-help <[hidden email]>
Date: 05/25/2007 11:23 AM
Subject: Re: [R] normality tests [Broadcast]

Thank you all for your replies; they have been most useful. In my case I have chosen to do some parametric tests (more precisely, correlations and linear regressions among some variables), so it would be nice to have an extra bit of support for my decisions. If I understood your replies correctly, I shouldn't pay so much attention to the normality tests (so it wouldn't matter which one or ones I report), but rather focus on issues such as the power of the test.

Thanks again.

On 25/05/07, Lucke, Joseph F <[hidden email]> wrote:
> Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> non-normality for significance testing. It's the sample means that have
> to be normal, not the data. The CLT kicks in fairly quickly. Testing
> for normality prior to choosing a test statistic is generally not a good
> idea.
>
> <snip: earlier messages in the thread, quoted in full>

--
yianni

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
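Cody's bootstrap-validation suggestion above can be sketched by hand in base R. This is a minimal illustration (mine, not from the thread) of the optimism correction that validate() in the Design (now rms) package automates; the model and data here are invented:

```r
## Hand-rolled bootstrap optimism correction for R^2 -- a sketch of what
## validate() in the Design/rms package automates. Data are invented.
set.seed(1)
n <- 40
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + rnorm(n)

rsq <- function(fit, data) cor(predict(fit, newdata = data), data$y)^2

apparent <- rsq(lm(y ~ x1 + x2, data = d), d)   # in-sample R^2 (optimistic)
optimism <- mean(replicate(200, {
  b  <- d[sample(n, replace = TRUE), ]          # bootstrap resample
  fb <- lm(y ~ x1 + x2, data = b)               # refit on the resample
  rsq(fb, b) - rsq(fb, d)                       # training R^2 minus R^2 on original data
}))
corrected <- apparent - optimism                # optimism-corrected R^2
c(apparent = apparent, corrected = corrected)
```

The corrected value estimates how the model would perform on new data, which is the reassurance Cody is after.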
In reply to this post by Frank Harrell
Following up on Frank's thought: why are parametric tests so much more popular than their non-parametric counterparts? Since non-parametric tests require fewer assumptions, why aren't they the default? The asymptotic relative efficiency of the Wilcoxon test compared to the t-test is 0.955 (under normality), and yet I still see t-tests in the medical literature all the time. Granted, the Wilcoxon still requires an assumption of symmetry (I'm curious why the Wilcoxon is often recommended when asymmetry is suspected, given that it assumes symmetry), but that is less stringent than requiring normally distributed data. In a similar vein, one usually sees the mean and standard deviation reported as summary statistics for a continuous variable; these are not very informative unless you assume the variable is normally distributed. Yet clinicians often insist that I include these figures in reports.

Cody Hamilton, PhD
Edwards Lifesciences

From: Frank E Harrell Jr <f.harrell@vanderbilt.edu> (via r-help-bounces@stat.math.ethz.ch)
To: "Lucke, Joseph F" <[hidden email]>
Cc: r-help <[hidden email]>
Date: 05/25/2007 02:42 PM
Subject: Re: [R] normality tests [Broadcast]

Lucke, Joseph F wrote:
> Most standard tests, such as t-tests and ANOVA, are fairly resistant to
> non-normality for significance testing. It's the sample means that have
> to be normal, not the data. The CLT kicks in fairly quickly. Testing
> for normality prior to choosing a test statistic is generally not a good
> idea.

I beg to differ Joseph. I have had many datasets in which the CLT was of no use whatsoever, i.e., where bootstrap confidence limits were asymmetric because the data were so skewed, and where symmetric normality-based confidence intervals had bad coverage in both tails (though correct on the average). I see this the opposite way: nonparametric tests work fine if normality holds.

Note that the CLT helps with type I error but not so much with type II error.

Frank

> <snip: remainder of quoted thread>

--
Frank E Harrell Jr   Professor and Chair          School of Medicine
                     Department of Biostatistics  Vanderbilt University |
[hidden email] wrote:
> Following up on Frank's thought, why is it that parametric tests are so
> much more popular than their non-parametric counterparts? As
> non-parametric tests require fewer assumptions, why aren't they the
> default?
>
> <snip: rest of Cody's message, quoted in full above>

Well said Cody. I just want to add that the Wilcoxon does not assume symmetry if you are interested in testing for stochastic ordering and not just for a mean.

Frank

> <snip: remainder of quoted thread>

--
Frank E Harrell Jr   Professor and Chair          School of Medicine
                     Department of Biostatistics  Vanderbilt University
Frank Harrell
Department of Biostatistics, Vanderbilt University |
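Frank's point about stochastic ordering can be illustrated with a small simulation (my own sketch, not from the thread): both samples below are heavily skewed, so no symmetry assumption could hold, yet the Wilcoxon rank-sum test readily detects that one sample is stochastically larger.

```r
## Sketch: the Wilcoxon rank-sum test under stochastic ordering, no symmetry.
set.seed(42)
x <- rexp(50)        # skewed
y <- 4 * rexp(50)    # equally skewed, but stochastically larger
wilcox.test(x, y)    # detects departures from P(X < Y) = 1/2; small p here
```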
In reply to this post by gatemaze
For small samples I would think that the Shapiro-Wilk test is probably the most powerful. Chapter 7 of Thode (2002), "Testing for Normality", Marcel Dekker, contains a good summary of research in this area. If you have a specific alternative in view you might find a better test.

Regards

John

On 25/05/07, [hidden email] <[hidden email]> wrote:
> On 25/05/07, Frank E Harrell Jr <[hidden email]> wrote:
> > You should examine the power of the tests for n=20. You'll probably
> > find it's not good enough to reach a reliable conclusion.
>
> And wouldn't it be even worse if I used non-parametric tests?
>
> <snip: remainder of quoted thread>

--
John C Frain
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:[hidden email]
mailto:[hidden email] |
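John's suggestion is easy to carry out in R. A minimal sketch on invented data (n = 18, to mirror the original poster's "fewer than 20 cases"), reporting Shapiro-Wilk alongside the Q-Q plot:

```r
## Sketch: Shapiro-Wilk plus Q-Q plot for a small sample (invented data).
set.seed(7)
x <- rnorm(18)         # stand-in for the poster's ~18 observations
shapiro.test(x)        # W statistic and p-value to report
qqnorm(x); qqline(x)   # the plot is usually more informative than the test
```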
In reply to this post by Lucke, Joseph F
>>>>> "LuckeJF" == Lucke, Joseph F <[hidden email]>
>>>>>     on Fri, 25 May 2007 12:29:49 -0500 writes:

    LuckeJF> Most standard tests, such as t-tests and ANOVA, are fairly
    LuckeJF> resistant to non-normality for significance testing. It's the
    LuckeJF> sample means that have to be normal, not the data. The CLT
    LuckeJF> kicks in fairly quickly.

Even though such statements appear in too many (text)books, that's just plain wrong practically: even though the *level* of the t-test is resistant to non-normality, the power is not at all!! And that makes the t-test NON-robust!

It's an easy exercise to see that the T-statistic ---> 1 when one observation goes to infinity, i.e., the t-test will never reject when you have one extreme outlier; a simple "proof" with R:

> t.test(11:20)

        One Sample t-test

data:  c(11:20)
t = 16.1892, df = 9, p-value = 5.805e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 13.33415 17.66585
sample estimates:
mean of x
     15.5

## ---> unknown mean highly significantly different from 0
## But:

> t.test(c(11:20, 1000))

        One Sample t-test

data:  c(11:20, 1000)
t = 1.1731, df = 10, p-value = 0.2679
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -94.42776 304.42776
sample estimates:
mean of x
      105

    LuckeJF> Testing for normality prior to choosing a test
    LuckeJF> statistic is generally not a good idea.

Definitely. Or even: it's a very bad idea ...

Martin Maechler, ETH Zurich

    LuckeJF> <snip: remainder of quoted thread> |
On Mon, 28 May 2007, Martin Maechler wrote:
> >>>>> "LuckeJF" == Lucke, Joseph F <[hidden email]>
> >>>>>     on Fri, 25 May 2007 12:29:49 -0500 writes:
>
>     LuckeJF> Most standard tests, such as t-tests and ANOVA, are fairly
>     LuckeJF> resistant to non-normality for significance testing. It's the
>     LuckeJF> sample means that have to be normal, not the data. The CLT
>     LuckeJF> kicks in fairly quickly.
>
> Even though such statements appear in too many (text)books,
> that's just plain wrong practically:
>
> Even though the *level* of the t-test is resistant to non-normality,
> the power is not at all!! And that makes the t-test NON-robust!

While it is true that this makes the t-test non-robust, it doesn't mean that the statement is just plain wrong practically. The issue really is more complicated than a lot of people claim (not you specifically, Martin, but upthread and in previous threads).

Starting with the demonstrable mathematical facts:

- lots of rank tests are robust in the sense of Huber
- rank tests are optimal for specific location-shift testing problems
- lots of rank tests have excellent power for location-shift alternatives over a wide range of underlying distributions
- rank tests fail to be transitive when stochastic ordering is not assumed (they are not consistent with any ordering on all distributions)
- rank tests do not lead to confidence intervals unless a location shift or similar one-dimensional family is assumed
- no rank test is uniformly more powerful than any parametric test, or vice versa (if we rule out pathological cases)
- there is no rank test that is consistent precisely against a difference in means
- the t-test (and essentially all tests) can be made distribution-free in large samples (for small values of 'large', usually)
- being distribution-free does not guarantee robustness of power (for the t-test or for any other test)

Now, if we assume stochastic ordering, is the Wilcoxon rank-sum test more or less powerful than the t-test?
Everyone knows that this depends on the null hypothesis distribution. Fewer people seem to know that it also depends on the alternative, especially in large samples. Suppose the alternative of interest is not that the values are uniformly larger by 1 unit, but that 5% of them are about 20 units larger. The Wilcoxon test -- precisely because it gives less weight to outliers -- will have lower power. For example (ObR):

one.sim <- function(n, pct, delta){
  x <- rnorm(n)
  y <- rnorm(n) + delta * rbinom(n, 1, pct)
  list(x = x, y = y)
}

mean(replicate(100, {d <- one.sim(100, .05, 20); t.test(d$x, d$y)$p.value}) < 0.05)
mean(replicate(100, {d <- one.sim(100, .05, 20); wilcox.test(d$x, d$y)$p.value}) < 0.05)

mean(replicate(100, {d <- one.sim(100, .5, 1); t.test(d$x, d$y)$p.value}) < 0.05)
mean(replicate(100, {d <- one.sim(100, .5, 1); wilcox.test(d$x, d$y)$p.value}) < 0.05)

Since both relatively uniform shifts and large shifts of small fractions are genuinely important alternatives in real problems, it is true in practice as well as in theory that neither the Wilcoxon nor the t-test is uniformly superior.

This is without even considering violations of stochastic ordering -- which are not just esoteric pathologies, since it is quite plausible for a treatment to benefit some people and harm others. For example, I've seen one paper in which a Wilcoxon test on medical cost data was statistically significant in the *opposite direction* to the difference in means.

This has been a long rant, but I keep encountering statisticians who think anyone who ever recommends a t-test just needs to have the number 0.955 quoted to them.

<snip>

>     LuckeJF> Testing for normality prior to choosing a test
>     LuckeJF> statistic is generally not a good idea.
>
> Definitely. Or even: It's a very bad idea ...

I think that's something we can all agree on.
-thomas |
Hello,
I have installed R 2.5.0 from source (x86_64) and added the RODBC package, and now I am trying to connect to a MySQL database. In Windows R, after installing the 3.51 driver and creating the DSN (specifying server, user, and password), it is easy to connect with

channel <- odbcConnect("dsn")

Does anyone know what needs to be done to make this work from Linux?

Thanks,
Bill |
On Mon, 28 May 2007, Bill Szkotnicki wrote:
> Hello,
>
> I have installed R 2.5.0 from source (x86_64) and added the RODBC
> package, and now I am trying to connect to a MySQL database. In Windows R,
> after installing the 3.51 driver and creating the DSN (specifying server,
> user, and password), it is easy to connect with
>
> channel <- odbcConnect("dsn")
>
> Does anyone know what needs to be done to make this work from Linux?

Did you not read the RODBC README file? It is described in some detail, with reference to tutorials.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595 |
I have now read the README file, which I should have done before. :-[ Sorry.

To summarize:
- Install the ODBC connector driver (3.51).
- Set up the DSN in the file .odbc.ini.
- It works beautifully, and RODBC is super!

Prof Brian Ripley wrote:
> On Mon, 28 May 2007, Bill Szkotnicki wrote:
>> Does anyone know what needs to be done to make this work from linux?
>
> Did you not read the RODBC README file? It is described in some detail
> with reference to tutorials.

--
Bill Szkotnicki
Department of Animal and Poultry Science
University of Guelph
[hidden email]
(519)824-4120 Ext 52253 |
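For readers hitting the same problem, Bill's second step looks roughly like the fragment below. This is a sketch only: the DSN name matches his example, but the driver path, database, and credentials are assumptions that depend on your unixODBC/MyODBC installation (see the RODBC README for the authoritative details).

```ini
; ~/.odbc.ini -- minimal sketch; the Driver path is installation-specific
[dsn]
Driver   = /usr/lib64/libmyodbc3.so
Server   = localhost
Database = test
User     = bill
Password = secret
```

With a DSN entry in place, channel <- odbcConnect("dsn") works from Linux just as it does on Windows.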
In reply to this post by wssecn
False. Box proved ~ca 1952 that standard inferences in the linear regression
model are robust to nonnormality, at least for (nearly) balanced designs. The **crucial** assumption is independence, which I suspect partially motivated his time series work on arima modeling. More recently, work on hierarchical models (e.g. repeated measures/mixed effect models) has also dealt with lack of independence. Bert Gunter Genentech Nonclinical Statistics -----Original Message----- From: [hidden email] [mailto:[hidden email]] On Behalf Of wssecn Sent: Friday, May 25, 2007 2:59 PM To: r-help Subject: Re: [R] normality tests [Broadcast] The normality of the residuals is important in the inference procedures for the classical linear regression model, and normality is very important in correlation analysis (second moment)... Washington S. Silva > Thank you all for your replies.... they have been more useful... well > in my case I have chosen to do some parametric tests (more precisely > correlation and linear regressions among some variables)... so it > would be nice if I had an extra bit of support on my decisions... If I > understood well from all your replies... I shouldn't pay soooo much > attntion on the normality tests, so it wouldn't matter which one/ones > I use to report... but rather focus on issues such as the power of the > test... > > Thanks again. > > On 25/05/07, Lucke, Joseph F <[hidden email]> wrote: > > Most standard tests, such as t-tests and ANOVA, are fairly resistant to > > non-normalilty for significance testing. It's the sample means that have > > to be normal, not the data. The CLT kicks in fairly quickly. Testing > > for normality prior to choosing a test statistic is generally not a good > > idea. 
> >
> > -----Original Message-----
> > From: [hidden email]
> > [mailto:[hidden email]] On Behalf Of Liaw, Andy
> > Sent: Friday, May 25, 2007 12:04 PM
> > To: [hidden email]; Frank E Harrell Jr
> > Cc: r-help
> > Subject: Re: [R] normality tests [Broadcast]
> >
> > From: [hidden email]
> > >
> > > On 25/05/07, Frank E Harrell Jr <[hidden email]> wrote:
> > > > [hidden email] wrote:
> > > > > [original question about which of the 8 normality tests to
> > > > > report -- quoted in full earlier in the thread]
> > > >
> > > > Wow - I have so many concerns with that approach that it's hard
> > > > to know where to begin. But first of all, why care about
> > > > normality? Why not use distribution-free methods?
> > > >
> > > > You should examine the power of the tests for n=20. You'll
> > > > probably find it's not good enough to reach a reliable conclusion.
> > >
> > > And wouldn't it be even worse if I used non-parametric tests?
> >
> > I believe what Frank meant was that it's probably better to use a
> > distribution-free procedure to do the real test of interest (if there
> > is one) instead of testing for normality and then using a test that
> > assumes normality.
> >
> > I guess the question is, what exactly do you want to do with the
> > outcome of the normality tests? If those are going to be used as the
> > basis for deciding which test(s) to do next, then I concur with
> > Frank's reservation.
> >
> > Generally speaking, I do not find goodness-of-fit tests for
> > distributions very useful, mostly because failure to reject the null
> > is no evidence in favor of the null. It's difficult for me to imagine
> > why "there's insufficient evidence to show that the data did not come
> > from a normal distribution" would be interesting.
> >
> > Andy
> >
> > > > Frank
> > > >
> > > > --
> > > > Frank E Harrell Jr   Professor and Chair      School of Medicine
> > > >                      Department of Biostatistics
> > > >                      Vanderbilt University
> > >
> > > --
> > > yianni
>
> --
> yianni
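For readers in the original poster's situation, the distribution-free route that Frank and Andy recommend is a one-liner in base R. The data below are made up purely for illustration.

```r
## Rank-based correlation needs no normality assumption: replace the
## default Pearson method with Spearman's rho. x and y are toy data.
set.seed(1)
x <- rnorm(15)
y <- x + rnorm(15)
cor.test(x, y, method = "spearman")
```

The same idea extends to the regression setting via rank-based or other robust procedures, rather than screening with normality tests first.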