|
I am conducting an experiment with four independent variables, each of which has three or more factor levels. The sample size is quite large, i.e. several thousand. The dependent variable data does not pass a normality test but "visually" looks close to normal. Is there a way to compute the effect this would have on the p-value for ANOVA, or is there a way to perform a nonparametric test in R that will handle this many independent variables? Simply saying ANOVA is robust to small departures from normality is not going to be good enough for my client. I need to compute an error amount for ANOVA or find a nonparametric equivalent.
Thanks, William |
|
On Aug 2, 2010, at 9:33 AM, wwreith wrote:
> The dependent variable data does not pass a normality test but
> "visually" looks close to normal [...]
> Simply saying ANOVA is robust to small departures from normality is
> not going to be good enough for my client.

The statistical assumption of normality for linear models does not apply to the distribution of the dependent variable, but rather to the residuals after a model is estimated. Furthermore, it is the homoskedasticity assumption that is more commonly violated and that is also the greater threat to validity. (And if you don't already know both of these points, then you desperately need to review your basic modeling practices.)

> I need to compute an error amount for
> ANOVA or find a nonparametric equivalent.

You might get a better answer if you expressed the first part of that question in unambiguous terminology. What is "error amount"?

For the second part, there is an entire Task View on Robust Statistical Methods.

--
David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
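A minimal sketch of the first point, on made-up data rather than the poster's: when group means differ, the raw dependent variable can look far from normal even though the residuals of the fitted model are exactly normal. The variable names and the choice of `shapiro.test` are assumptions for illustration only.

```r
# Simulated one-factor example (hypothetical data, not from the thread).
set.seed(123)
g <- factor(rep(c("low", "mid", "high"), each = 200))

# Normal errors around three well-separated group means.
group_means <- c(low = 0, mid = 5, high = 10)
y <- unname(group_means[as.character(g)]) + rnorm(length(g))

fit <- lm(y ~ g)

# The marginal distribution of y is trimodal, so a normality test on y rejects.
p_raw <- shapiro.test(y)$p.value

# The assumption concerns the residuals, which here come from a normal model.
p_res <- shapiro.test(residuals(fit))$p.value

print(c(raw_y = p_raw, residuals = p_res))
```

The comparison only shows which quantity the assumption is about; it says nothing on how the poster's own residuals behave.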
|
To add to David's note, the Kruskal-Wallis test is the nonparametric counterpart to one-way ANOVA. You can get a series of K-W tests for several grouping or continuous independent variables (but note these are SEPARATE analyses) using the Hmisc package's spearman2 function. The generalization of K-W to the case of multiple independent variables is the proportional odds ordinal logistic model (see e.g. the rms package lrm function).
Frank Harrell
Department of Biostatistics, Vanderbilt University |
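A minimal sketch of the "separate analyses" caveat, using base R's `kruskal.test` on simulated data instead of the Hmisc function named above. The data frame `d`, its factor names, and the effect size are hypothetical.

```r
# Hypothetical data: two factors, a skewed response, and a shift for level a3.
set.seed(2010)
n <- 400
d <- data.frame(
  A = factor(sample(c("a1", "a2", "a3"), n, replace = TRUE)),
  B = factor(sample(c("b1", "b2", "b3", "b4"), n, replace = TRUE))
)
d$y <- rexp(n) + 0.5 * (d$A == "a3")

# One Kruskal-Wallis test per factor. Each test ignores the other factor,
# so these are separate one-way analyses, not a joint multi-factor model.
kw <- lapply(c(A = "A", B = "B"), function(v) kruskal.test(d$y, d[[v]]))
p_values <- sapply(kw, function(res) res$p.value)

print(p_values)
```

A joint analysis of all factors would need a model such as the proportional odds fit mentioned in the post; that step is not shown here.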
|
In reply to this post by David Winsemius
David et al.:

I take issue with this. It is the lack of independence that is the major issue. In particular, clustering, split-plotting, and so forth due to "convenience order" experimentation, lack of randomization, and exogenous effects like the systematic effects due to measurement method/location have the major effect on inducing bias and distorting inference. Normality and unequal variances typically pale to insignificance compared to this.

Obviously, IMHO.

Note 1: George Box noted this at least 50 years ago in the early '60s when he and Jenkins developed ARIMA modeling.

Note 2: If you can, have a look at Jack Youden's classic paper "Enduring Values", which comments to some extent on these issues, here: http://www.jstor.org/pss/1266913

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics |
|
In reply to this post by David Winsemius
I am testing normality on the studentized residuals that are generated after performing ANOVA, and yes, I used Levene's test to see if the variances can be assumed equal. They in fact are not, but I have found a formula for determining whether the p-value for ANOVA will become larger or smaller as a result of unequal variances and unequal sample sizes. Fortunately it turns out the p-value is greater. Despite this the ANOVA test is still significant (reported as p = .000, i.e. p < .001).

The problem I have is that I am expected, by my client, to find a similar formula that states which way the p-value would be pushed by a lack of normality. Despite numerous citations that ANOVA is robust to departures from normality, my client does not care. They want numerical proof. This led to looking for a method for estimating the effects non-normality would have on the p-value for ANOVA. In other words, can I build a confidence interval for the p-value? Hence the error term I am speaking of would be the margin of error for a p-value confidence interval.

William |
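A minimal sketch of the checks described in this post, on simulated data with deliberately unequal variances. The Levene-type test is written out in base R as an ANOVA on absolute deviations from cell medians (the Brown-Forsythe variant); the data, factor names, and standard deviations are assumptions, not the poster's.

```r
# Hypothetical two-factor data where the spread grows with the level of A.
set.seed(7)
d <- data.frame(
  A = factor(rep(1:3, each = 100)),
  B = factor(rep(rep(1:2, each = 50), times = 3))
)
d$y <- rnorm(nrow(d), mean = as.integer(d$A), sd = c(1, 1.5, 2.5)[d$A])

fit <- aov(y ~ A + B, data = d)

# Studentized residuals; drop plot.it = FALSE to draw the normal Q-Q plot.
r <- rstudent(fit)
qq <- qqnorm(r, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)  # close to 1 when the Q-Q points fall on a line

# Levene-type test: ANOVA on absolute deviations from each cell's median.
cell <- interaction(d$A, d$B)
abs_dev <- abs(d$y - ave(d$y, cell, FUN = median))
bf_p <- anova(lm(abs_dev ~ cell))["cell", "Pr(>F)"]

# Welch's one-way test does not assume equal variances (one factor only).
welch <- oneway.test(y ~ A, data = d, var.equal = FALSE)

print(c(qq_correlation = qq_cor, levene_type_p = bf_p, welch_p = welch$p.value))
```

The Welch test handles a single factor, so it is only a partial check for a four-factor design.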
|
In reply to this post by Bert Gunter
In a general situation of observational studies, your point is undoubtedly true, and apparently you believe it to be true even in the setting of designed experiments. Perhaps I should have confined myself to my first sentence.

--
David.

David Winsemius, MD
West Hartford, CT |
|
In reply to this post by wwreith
My sympathies, but I don't think it's the business of list contributors to facilitate stupidity.

"Confidence interval for the p-value" is nonsense. You could try sensitivity analyses via simulation, though.

Cheers,

Bert Gunter
Genentech Nonclinical Biostatistics |
|
Hi,

simulating would still require you to operationalize the "lack of normality". Are the tails too heavy? Is the distribution skewed? Does it have multiple peaks? I suspect that the specific choices you would make here would *strongly* influence the result.

My condolences on the client you are facing.

Good luck,
Stephan (ex-BAH)

Bert Gunter wrote:
> You could try sensitivity analyses via simulation, though.
>
> On Mon, Aug 2, 2010 at 11:31 AM, wwreith <[hidden email]> wrote:
>> The problem I have is that I am expected, by my client, to find a
>> similar formula that states which way the p-value would be pushed
>> by a lack of normality. |
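A minimal sketch of such a sensitivity analysis, assuming three hypothetical error shapes (normal, heavy-tailed t with 3 df, and a centred exponential) and a small balanced two-factor layout. It estimates how often the F test for one factor rejects at the 5% level when no true effect exists. The design, the distributions, and the number of replicates are all illustrative choices, and as noted above the answer depends on them.

```r
# Balanced 3 x 3 layout with 20 replicates per cell (hypothetical design).
set.seed(1)
n_per_cell <- 20
d <- expand.grid(A = factor(1:3), B = factor(1:3), rep = seq_len(n_per_cell))

# Candidate error distributions, one per way of "lacking normality".
gen <- list(
  normal = function(n) rnorm(n),
  heavy  = function(n) rt(n, df = 3),
  skewed = function(n) rexp(n) - 1
)

# Fraction of null simulations in which the F test for A has p < alpha.
reject_rate <- function(rerr, n_sim = 500, alpha = 0.05) {
  p <- replicate(n_sim, {
    d$y <- rerr(nrow(d))  # no real effects: y is pure noise
    anova(lm(y ~ A + B, data = d))["A", "Pr(>F)"]
  })
  mean(p < alpha)
}

rates <- sapply(gen, reject_rate)
print(rates)
```

Rates near the nominal 0.05 would suggest the F test holds its level under that error shape for this layout; the same loop could be rerun with errors resampled from the observed residuals.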
|
In reply to this post by wwreith
I have been struggling to make sense of permutation tests for weeks. It seems one would work for you.
|
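A minimal sketch of a permutation test for a single factor, written in base R on simulated skewed data. The group labels, sample sizes, and the 999 permutations are assumptions; with several factors, deciding what to permute is less straightforward and is not covered here.

```r
# Hypothetical one-factor data with skewed errors and a shift in group "c".
set.seed(42)
g <- factor(rep(c("a", "b", "c"), each = 30))
y <- rexp(90) + ifelse(g == "c", 0.8, 0)

# F statistic of the one-way ANOVA for a given response vector.
f_stat <- function(y, g) anova(lm(y ~ g))["g", "F value"]

obs <- f_stat(y, g)

# Under the null of no group effect the labels are exchangeable,
# so shuffling y breaks any association with g.
perm <- replicate(999, f_stat(sample(y), g))

# Permutation p-value, counting the observed statistic as one permutation.
p_perm <- (1 + sum(perm >= obs)) / (1 + length(perm))
print(p_perm)
```

The reference distribution comes from the data themselves, so no normality assumption is needed for the p-value.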
|
In reply to this post by Bert Gunter
In addition the poster did not tell us what is wrong with a nonparametric test.
Frank Harrell
Department of Biostatistics, Vanderbilt University |
|
In reply to this post by David Winsemius
As a matter of fact, I would say both Bert and I encounter "designed experiments" a lot more than "observational studies", yet we speak from experience that those things that Bert mentioned happen on a daily basis. When you talk to experimenters, ask your questions carefully and you'll see these things crop up.

Andy |
|
On Aug 3, 2010, at 2:41 PM, Liaw, Andy wrote:
> As a matter of fact, I would say both Bert and I encounter "designed
> experiments" a lot more than "observational studies", yet we speak from
> experience that those things that Bert mentioned happen on a daily
> basis. When you talk to experimenters, ask your questions carefully and
> you'll see these things crop up.

Yes. I think the most egregious example I have seen involved getting an F test wrong by a factor of 7. This sort of error comes about extremely easily if you divide by the wrong sum of squares in an ANOVA table, and since it often requires dealing with difficult terms like "random interaction", researchers are typically much more prone to collect data in complicated designs than they are to analyze them correctly afterwards.

However, it obviously depends on your perspective, epidemiologists usually have different complications from econometricians, and a clinical trial is typically less complicated than a lab experiment.

> Andy
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> On Behalf Of David Winsemius
> Sent: Monday, August 02, 2010 3:35 PM
> To: Bert Gunter
> Cc: [hidden email]; wwreith
> Subject: Re: [R] Problems with normality req. for ANOVA
>
> In a general situation of observational studies, your point is
> undoubtedly true, and apparently you believe it to be true even in the
> setting of designed experiments. Perhaps I should have confined myself
> to my first sentence.
>
> --
> David.
>
> On Aug 2, 2010, at 2:05 PM, Bert Gunter wrote:
>
>> David et. al:
>>
>> I take issue with this. It is the lack of independence that is the
>> major issue. In particular, clustering, split-plotting, and so forth
>> due to "convenience order" experimentation, lack of randomization,
>> exogenous effects like the systematic effects due to measurement
>> method/location have the major effect on inducing bias and
>> distorting inference. Normality and unequal variances typically pale
>> to insignificance compared to this.
>>
>> Obviously, IMHO.
>>
>> Note 1: George Box noted this at least 50 years ago in the early
>> '60's when he and Jenkins developed arima modeling.
>>
>> Note 2: If you can, have a look at Jack Youden's classic paper
>> "Enduring Values", which comments to some extent on these issues,
>> here: http://www.jstor.org/pss/1266913
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email] Priv: [hidden email] |
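The "wrong sum of squares" point can be illustrated with a small simulation. This is only a sketch under assumed conditions, not anyone's actual analysis from the thread: the layout (a treatment `A` applied to whole plots, with several measurements per plot), the variable names, and the standard deviations are all hypothetical.

```r
# Hypothetical layout: treatment A is applied to whole plots and each
# plot is measured several times. All names and numbers are assumptions.
set.seed(1)
n_plots  <- 8    # whole plots, 4 per level of A
n_within <- 7    # repeated measurements inside each plot

d <- data.frame(
  plot = factor(rep(seq_len(n_plots), each = n_within)),
  A    = factor(rep(c("a1", "a2"), each = n_plots / 2 * n_within))
)
plot_effect <- rnorm(n_plots, sd = 2)   # plot-to-plot variation
# The response has no true A effect: only plot effects plus noise.
d$y <- plot_effect[as.integer(d$plot)] + rnorm(nrow(d), sd = 1)

# Naive analysis: treats all observations as independent, so A is
# tested against a residual dominated by within-plot variation.
naive <- summary(aov(y ~ A, data = d))[[1]]

# Stratified analysis: A is tested against the between-plot mean square.
strat <- summary(aov(y ~ A + Error(plot), data = d))
strat_plot <- strat[["Error: plot"]][[1]]

F_naive <- naive[["F value"]][1]
F_strat <- strat_plot[["F value"]][1]
print(c(naive = F_naive, stratified = F_strat))
```

Both analyses use the same sum of squares for `A`; they differ only in the denominator, which is why the naive F statistic can be several times too large when plots differ from one another.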
|
In reply to this post by wwreith
At 19:31 02/08/2010, wwreith wrote:
>I am testing normality on the studentized residuals that are
>generated after performing ANOVA and yes I used Levene's test to see
>if the variances can be assumed equal. They in fact are not, but I
>have found a formula for determining whether the p-value for ANOVA
>will become larger or smaller as a result of unequal variances and
>unequal sample sizes. Fortunately it turns out the p-value is
>greater. Despite this the ANOVA test is still significant with p=.000.

Well perhaps this is starting a new hare but surely if you have shown the variances are unequal you have already shown the treatment(s) made a difference?

>The problem I have is that I am expected, by my client, to find a
>similar formula that states which way the p-value would be pushed
>by a lack of normality. Despite numerous citations that ANOVA is
>robust to departures from normality my client does not care. They want
>numerical proof. This led to looking for a method for estimating
>the effects non-normality would have on the p-value for ANOVA. In
>other words can I build a confidence interval for the p-value? Hence
>the error term I am speaking of would be the margin of error for
>the p-value confidence interval.
>
>William W. Reith III
>Business Analytics

Michael Dewey
http://www.aghmed.fsnet.co.uk |
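One way to approach the "numerical proof" requested above is a simulation of how often the F test rejects a true null when the errors are non-normal. The following is a rough sketch, not a method proposed by anyone in the thread: the two-factor design, the cell size, and the centred exponential errors are illustrative assumptions standing in for the real data.

```r
# Sketch: empirical size of the ANOVA F test under skewed errors.
# Design (two factors with 3 levels each, 20 replicates per cell) and
# the error distribution are assumptions chosen for illustration.
set.seed(42)
design <- expand.grid(A = factor(1:3), B = factor(1:3), rep = 1:20)
n_sim  <- 2000

p_A <- replicate(n_sim, {
  y   <- rexp(nrow(design)) - 1        # skewed errors with mean 0; null is true
  fit <- lm(y ~ A + B, data = design)
  anova(fit)["A", "Pr(>F)"]            # nominal p-value for factor A
})

rejection_rate <- mean(p_A < 0.05)     # compare with the nominal 0.05
mc_se <- sqrt(rejection_rate * (1 - rejection_rate) / n_sim)
print(c(rate = rejection_rate, mc_se = mc_se))
```

A rejection rate within a couple of Monte Carlo standard errors of 0.05 would suggest the nominal p-values are trustworthy for that error shape. To tie the check to a particular data set, one option is to replace `rexp()` with errors resampled from the fitted model's residuals, for example `sample(resid(fit), replace = TRUE)`, and to use the actual design in place of the hypothetical one.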
