# Problems with normality req. for ANOVA

## Problems with normality req. for ANOVA

I am conducting an experiment with four independent variables, each of which has three or more factor levels. The sample size is quite large, i.e., several thousand. The dependent variable does not pass a normality test but "visually" looks close to normal. Is there a way to compute the effect this would have on the p-value for ANOVA, or is there a way to perform a nonparametric test in R that will handle this many independent variables? Simply saying ANOVA is robust to small departures from normality is not going to be good enough for my client. I need to compute an error amount for ANOVA or find a nonparametric equivalent.

Thanks,
William

## Re: Problems with normality req. for ANOVA

On Aug 2, 2010, at 9:33 AM, wwreith wrote:

> The dependent variable data does not pass a normality test but "visually" looks close to normal [...] Simply saying ANOVA is robust to small departures from normality is not going to be good enough for my client.

The statistical assumption of normality for linear models does not apply to the distribution of the dependent variable, but rather to the residuals after a model is estimated. Furthermore, it is the homoskedasticity assumption that is more commonly violated and also a greater threat to validity. (And if you don't already know both of these points, then you desperately need to review your basic modeling practices.)

> I need to compute an error amount for ANOVA or find a nonparametric equivalent.

You might get a better answer if you expressed the first part of that question in unambiguous terminology. What is "error amount"?

For the second part, there is an entire Task View on Robust Statistical Methods.

--
David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
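David's first point is easy to see in a small simulation. The sketch below uses only the Python standard library, and the two groups, their means, and the helper names `group_means` and `residuals` are hypothetical. It builds a one-way layout where the raw response is strongly bimodal just because the group means differ, while the residuals from the fitted group means are essentially the normal noise that was simulated. A normality test on the raw response would fail here even though the model assumption holds.

```python
import random
import statistics
from collections import defaultdict

def group_means(groups, y):
    """Fitted values of a one-way ANOVA model are the group means."""
    by_group = defaultdict(list)
    for g, v in zip(groups, y):
        by_group[g].append(v)
    return {g: statistics.fmean(vs) for g, vs in by_group.items()}

def residuals(groups, y):
    """Residual = observation minus the fitted mean of its group."""
    means = group_means(groups, y)
    return [v - means[g] for g, v in zip(groups, y)]

rng = random.Random(1)
# Two hypothetical groups whose means differ by 10 error standard deviations.
groups = ["a"] * 500 + ["b"] * 500
y = [rng.gauss(0, 1) for _ in range(500)] + [rng.gauss(10, 1) for _ in range(500)]
res = residuals(groups, y)

# The raw response is bimodal with SD near 5; the residuals have SD near 1.
print(round(statistics.stdev(y), 2), round(statistics.stdev(res), 2))
```

The same logic carries over to a multi-factor model: check the residuals of the fitted model, not the response.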

## Re: Problems with normality req. for ANOVA

In reply to this post by David Winsemius (Mon, 2 Aug 2010)

To add to David's note, the Kruskal-Wallis test is the nonparametric counterpart to one-way ANOVA. You can get a series of K-W tests for several grouping or continuous independent variables (but note these are SEPARATE analyses) using the Hmisc package's spearman2 function. The generalization of K-W to the case of multiple independent variables is the proportional odds ordinal logistic model (see e.g. the rms package lrm function).

Frank E Harrell Jr
Professor and Chairman, School of Medicine, Department of Biostatistics
Vanderbilt University
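To make the Kruskal-Wallis statistic concrete, here is a minimal standard-library Python sketch of the tie-corrected H statistic; the function names are hypothetical (in R, `kruskal.test` in the base stats package computes this). For large samples H is compared with a chi-square distribution on k - 1 degrees of freedom. Like the spearman2 approach Frank mentions, it handles one grouping factor at a time, not the four-factor design in the question.

```python
from collections import Counter

def midranks(values):
    """Rank values 1..N, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(*samples):
    """Tie-corrected Kruskal-Wallis H for k independent samples.

    Undefined (division by zero) if every value in the pooled data is tied.
    """
    pooled = [v for s in samples for v in s]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for s in samples:
        rank_sum = sum(ranks[start:start + len(s)])
        total += rank_sum ** 2 / len(s)
        start += len(s)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tie groups.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))

print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 6))  # 7.2
print(round(kruskal_wallis_h([1, 1, 2], [2, 3, 3]), 6))             # 3.333333
```

Because only ranks enter H, the test is unaffected by the shape of the response distribution, which is why it is the usual fallback for a single factor.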

## Re: Problems with normality req. for ANOVA

In reply to this post by David Winsemius (Mon, Aug 2, 2010 at 10:32 AM)

David et al.:

I take issue with this. It is the lack of independence that is the major issue. In particular, clustering, split-plotting, and so forth due to "convenience order" experimentation, lack of randomization, and exogenous effects like the systematic effects due to measurement method/location have the major effect on inducing bias and distorting inference. Normality and unequal variances typically pale to insignificance compared to this.

Obviously, IMHO.

Note 1: George Box noted this at least 50 years ago, in the early '60s, when he and Jenkins developed ARIMA modeling.

Note 2: If you can, have a look at Jack Youden's classic paper "Enduring Values", which comments to some extent on these issues, here: http://www.jstor.org/pss/1266913

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
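Bert's point about lack of independence can be illustrated with a small simulation. In this hypothetical setup, all sizes and variances are illustrative assumptions. Each treatment is run in whole batches that share a random batch effect, as in convenience-order experimentation. There is no true treatment difference, and a naive two-sample z-test treats every observation as independent. With 200 runs per arm, the normal cutoff 1.96 stands in for a t quantile.

```python
import random

def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def naive_rejection_rate(batch_sd, n_batches=10, batch_size=20, reps=1000, seed=42):
    """Share of null experiments in which a naive z-test reports p < 0.05.

    Each arm has n_batches batches of batch_size runs. Every batch shares one
    N(0, batch_sd^2) shift; within-batch noise is N(0, 1). There is no true
    treatment effect, so a valid test should reject about 5% of the time.
    """
    rng = random.Random(seed)

    def arm():
        obs = []
        for _ in range(n_batches):
            shift = rng.gauss(0, batch_sd)  # common to every run in the batch
            obs.extend(shift + rng.gauss(0, 1) for _ in range(batch_size))
        return obs

    rejections = 0
    for _ in range(reps):
        (m1, v1), (m2, v2) = mean_var(arm()), mean_var(arm())
        n = n_batches * batch_size
        z = (m1 - m2) / ((v1 + v2) / n) ** 0.5  # ignores the batch structure
        rejections += abs(z) > 1.96
    return rejections / reps

independent = naive_rejection_rate(batch_sd=0.0)  # close to 0.05
clustered = naive_rejection_rate(batch_sd=0.5)    # far above 0.05
print(independent, clustered)
```

With a batch SD of 0.5 and a unit within-batch SD, the intraclass correlation is 0.25 / 1.25 = 0.2. The naive test then rejects far more often than 5%, even though the errors here are exactly normal with equal variances.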

## RE: Problems with normality req. for ANOVA

In reply to this post by David Winsemius

I am testing normality on the studentized residuals that are generated after performing ANOVA, and yes, I used Levene's test to see if the variances can be assumed equal. They in fact are not, but I have found a formula for determining whether the p-value for ANOVA will become larger or smaller as a result of unequal variances and unequal sample sizes. Fortunately, it turns out the p-value is greater. Despite this, the ANOVA test is still significant, with p reported as .000 (i.e., p < .001).

The problem I have is that I am expected, by my client, to find a similar formula that states which way the p-value would be pushed by a lack of normality. Despite numerous citations that ANOVA is robust to departures from normality, my client does not care. They want numerical proof. This led to looking for a method for estimating the effects non-normality would have on the p-value for ANOVA. In other words, can I build a confidence interval for the p-value? Hence the error term I am speaking of would be the margin of error for a p-value confidence interval.

William
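William mentions Levene's test. For reference, its statistic is simply the one-way ANOVA F computed on the absolute deviations of each observation from its group's center. That center is the group mean in Levene's original version and the group median in the Brown-Forsythe variant, which is the default `center = median` in the car package's `leveneTest` for R. Under equal variances it is referred to an F distribution on k - 1 and N - k degrees of freedom. The minimal standard-library Python sketch below uses hypothetical function names.

```python
import statistics

def one_way_f(samples):
    """Classical one-way ANOVA F statistic for a list of samples."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    ss_between = ss_within = 0.0
    for s in samples:
        m = statistics.fmean(s)
        ss_between += len(s) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def levene_w(samples, center=statistics.median):
    """Levene-type W: one-way F on |y - group center|.

    center=statistics.median gives Brown-Forsythe; statistics.fmean gives
    Levene's original mean-centered version.
    """
    deviations = []
    for s in samples:
        c = center(s)
        deviations.append([abs(v - c) for v in s])
    return one_way_f(deviations)

print(one_way_f([[1, 2, 3], [4, 5, 6]]))          # 13.5
print(round(levene_w([[1, 2, 3], [2, 4, 6]]), 6))  # 0.8
```

The median-centered version is generally preferred because it stays closer to its nominal level when the data are skewed.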

## Re: Problems with normality req. for ANOVA

In reply to this post by Bert Gunter (Aug 2, 2010, 2:05 PM)

In the general situation of observational studies, your point is undoubtedly true, and apparently you believe it to be true even in the setting of designed experiments. Perhaps I should have confined myself to my first sentence.

-- David

David Winsemius, MD
West Hartford, CT

## Re: Problems with normality req. for ANOVA

Hi,

simulating would still require you to operationalize the "lack of normality". Are the tails too heavy? Is the distribution skewed? Does it have multiple peaks? I suspect that the specific choices you would make here would *strongly* influence the result.

My condolences on the client you are facing.

Good luck,
Stephan (ex-BAH)

Bert Gunter wrote:
> You could try sensitivity analyses via simulation, though.
>
> On Mon, Aug 2, 2010 at 11:31 AM, wwreith wrote:
>> The problem I have is that I am expected, by my client, to find a similar formula that states which way the p-value would be pushed by a lack of normality.
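Bert's suggestion and Stephan's caveat can be combined in a small sketch. It simulates the one-way ANOVA F test with no true effect under several assumed error distributions and estimates how often the test rejects at the 5% level. Everything here is an illustrative assumption: three groups of 30, the particular skewed and heavy-tailed distributions, and the replication counts. It is also one-way only, not the four-factor design in the question. To stay within Python's standard library, the 5% critical value is itself estimated from the normal-error simulation rather than taken from the F distribution.

```python
import random
import statistics

def one_way_f(samples):
    """Classical one-way ANOVA F statistic."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    ss_between = ss_within = 0.0
    for s in samples:
        m = statistics.fmean(s)
        ss_between += len(s) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical error distributions, each with mean 0 (their scales differ,
# but F is unchanged when all observations are rescaled).
ERRORS = {
    "normal": lambda rng: rng.gauss(0, 1),
    "skewed (exponential)": lambda rng: rng.expovariate(1.0) - 1.0,
    "heavy-tailed (10% contaminated)": lambda rng: rng.gauss(0, 5 if rng.random() < 0.1 else 1),
}

def simulate_f(error, reps, k=3, n=30, seed=0):
    """F statistics from `reps` null experiments (k groups of n, no true effect)."""
    rng = random.Random(seed)
    return [one_way_f([[error(rng) for _ in range(n)] for _ in range(k)])
            for _ in range(reps)]

# Estimate the 5% critical value from the normal-error null distribution.
null_f = sorted(simulate_f(ERRORS["normal"], reps=4000, seed=1))
critical = null_f[int(0.95 * len(null_f))]

rates = {}
for name, error in ERRORS.items():
    f_values = simulate_f(error, reps=2000, seed=2)
    rates[name] = sum(f > critical for f in f_values) / len(f_values)
    print(f"{name:32s} type I error ~ {rates[name]:.3f}")
```

As Stephan says, the resulting rates depend heavily on which departures from normality are simulated, as well as on group sizes and balance. Any such table is only as convincing as the scenarios chosen.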

## RE: Problems with normality req. for ANOVA

In reply to this post by wwreith

I have been struggling to make sense of permutation tests for weeks. It seems they will work for you.
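Here is a minimal sketch of a permutation test for a single grouping factor, using the Python standard library. The data and function names are hypothetical, and permutation tests for four crossed factors with interactions are considerably harder to set up than this. Group labels are shuffled, and the p-value is the share of shuffles whose F statistic is at least as large as the observed one. Because that p-value is itself a Monte Carlo estimate, it comes with a binomial standard error. That error measures simulation noise only, not the effect of non-normality on the parametric ANOVA p-value.

```python
import random
import statistics

def one_way_f(samples):
    """Classical one-way ANOVA F statistic."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    ss_between = ss_within = 0.0
    for s in samples:
        m = statistics.fmean(s)
        ss_between += len(s) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_anova(samples, n_perm=2000, seed=0):
    """Monte Carlo permutation test for a one-way layout, using F as the statistic."""
    rng = random.Random(seed)
    sizes = [len(s) for s in samples]
    pooled = [v for s in samples for v in s]
    observed = one_way_f(samples)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        groups, start = [], 0
        for size in sizes:
            groups.append(pooled[start:start + size])
            start += size
        hits += one_way_f(groups) >= observed
    # Count the observed labelling as one permutation so p is never 0.
    p = (hits + 1) / (n_perm + 1)
    se = (p * (1 - p) / n_perm) ** 0.5  # binomial Monte Carlo standard error
    return observed, p, se

# Hypothetical data: three groups of five measurements.
a = [4.1, 5.0, 5.2, 4.8, 5.5]
b = [6.3, 6.9, 7.1, 6.0, 6.8]
c = [5.1, 5.9, 6.2, 5.4, 5.7]
f, p, se = permutation_anova([a, b, c])
print(f"F = {f:.2f}, permutation p = {p:.4f} (Monte Carlo SE {se:.4f})")
```

Using F as the test statistic keeps the result directly comparable to the parametric ANOVA. Only the reference distribution changes, from the F distribution to the shuffled data.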