Problems with normality req. for ANOVA

wwreith
I am conducting an experiment with four independent variables, each of which has three or more factor levels. The sample size is quite large, i.e. several thousand. The dependent variable does not pass a normality test but "visually" looks close to normal. Is there a way to compute the effect this would have on the p-value for ANOVA, or is there a way to perform a nonparametric test in R that will handle this many independent variables? Simply saying ANOVA is robust to small departures from normality is not going to be good enough for my client. I need to compute an error amount for ANOVA or find a nonparametric equivalent.

Thanks,

William

Re: Problems with normality req. for ANOVA

David Winsemius

On Aug 2, 2010, at 9:33 AM, wwreith wrote:

> I am conducting an experiment with four independent variables, each of
> which has three or more factor levels. The sample size is quite large,
> i.e. several thousand. The dependent variable does not pass a normality
> test but "visually" looks close to normal. Is there a way to compute the
> effect this would have on the p-value for ANOVA, or is there a way to
> perform a nonparametric test in R that will handle this many independent
> variables? Simply saying ANOVA is robust to small departures from
> normality is not going to be good enough for my client.

The statistical assumption of normality for linear models does not apply
to the distribution of the dependent variable, but rather to the
residuals after a model is estimated. Furthermore, it is the
homoskedasticity assumption that is more commonly violated and is also
the greater threat to validity. (And if you don't already know both of
these points, then you desperately need to review your basic modeling
practices.)

>  I need to compute an error amount for
> ANOVA or find a nonparametric equivalent.

You might get a better answer if you expressed the first part of that  
question in unambiguous terminology.  What is "error amount"?

For the second part, there is an entire Task View on Robust  
Statistical Methods.
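
To make the residuals-versus-response point concrete, here is a minimal sketch in base R. Everything in it is an assumption made for illustration: the data frame `d`, the factor names A to D, and the effect sizes are simulated and have nothing to do with the poster's data. It builds a response whose marginal distribution is clearly non-normal (the group means differ) while the errors are exactly normal, then runs the same normality test on the raw response and on the studentized residuals.

```r
set.seed(42)

# Hypothetical design: four factors with three levels each, 20 replicates per cell
d <- expand.grid(A = factor(1:3), B = factor(1:3),
                 C = factor(1:3), D = factor(1:3), rep = 1:20)

# A large effect of A makes the marginal distribution of y trimodal,
# even though the errors themselves are exactly normal
d$y <- c(0, 10, 20)[d$A] + rnorm(nrow(d))

fit <- aov(y ~ A + B + C + D, data = d)
res <- rstudent(fit)   # studentized residuals from the fitted model

# Same normality test, applied to the raw response and to the residuals
p_raw <- shapiro.test(d$y)$p.value
p_res <- shapiro.test(res)$p.value
c(raw_response = p_raw, residuals = p_res)

# Visual checks (uncomment when working interactively):
# qqnorm(res); qqline(res)        # normality of residuals
# plot(fitted(fit), res)          # spread of residuals against fitted values
```

With several thousand observations a formal normality test will flag even trivial departures, so the plots are usually more informative than the p-value.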

--

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Problems with normality req. for ANOVA

Frank Harrell
To add to David's note, the Kruskal-Wallis test is the nonparametric
counterpart to one-way ANOVA.  You can get a series of K-W tests for
several grouping or continuous independent variables (but note these
are SEPARATE analyses) using the Hmisc package's spearman2 function.
The generalization of K-W to the case of multiple independent
variables is the proportional odds ordinal logistic model (see e.g.
the rms package lrm function).
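
As a rough illustration of the "separate analyses" caveat, the sketch below uses only base R's `kruskal.test` on simulated data; the data frame, factor names, and effect size are hypothetical. It does not call `spearman2` or `lrm`, so it shows the one-factor-at-a-time approach only, not the proportional odds model.

```r
set.seed(1)

# Hypothetical data: one response, two three-level factors, skewed errors
d <- data.frame(A = gl(3, 100, labels = c("a1", "a2", "a3")),
                B = factor(rep(c("b1", "b2", "b3"), times = 100)))
d$y <- 0.5 * as.integer(d$A) + rexp(nrow(d))   # only A has a real effect

# One Kruskal-Wallis test per factor; each test ignores the other factor
kw_A <- kruskal.test(y ~ A, data = d)
kw_B <- kruskal.test(y ~ B, data = d)

c(A = kw_A$p.value, B = kw_B$p.value)
```

Because each test ignores the other factors, this does not adjust for them or assess interactions, which is the gap the ordinal model is meant to fill.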

Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University

Re: Problems with normality req. for ANOVA

Bert Gunter
In reply to this post by David Winsemius
David et al.:

I take issue with this. It is the lack of independence that is the major
issue. In particular, clustering, split-plotting, and so forth due to
"convenience order" experimentation, lack of randomization, and exogenous
effects like the systematic effects due to measurement method/location have
the major effect on inducing bias and distorting inference. Non-normality and
unequal variances typically pale into insignificance compared to this.

Obviously, IMHO.

Note 1: George Box noted this at least 50 years ago in the early '60s when
he and Jenkins developed ARIMA modeling.

Note 2: If you can, have a look at Jack Youden's classic paper "Enduring
Values", which comments to some extent on these issues, here:
http://www.jstor.org/pss/1266913

Cheers,
Bert


Bert Gunter
Genentech Nonclinical Biostatistics



RE: Problems with normality req. for ANOVA

wwreith
In reply to this post by David Winsemius
I am testing normality on the studentized residuals that are generated after performing ANOVA, and yes, I used Levene's test to see if the variances can be assumed equal. They in fact are not, but I have found a formula for determining whether the p-value for ANOVA will become larger or smaller as a result of unequal variances and unequal sample sizes. Fortunately, it turns out the p-value is greater. Despite this, the ANOVA test is still significant with p=.000.

The problem I have is that I am expected, by my client, to find a similar formula that states which way the p-value would be pushed by a lack of normality. Despite numerous citations that ANOVA is robust to departures from normality, my client does not care. They want numerical proof. This led me to look for a method for estimating the effect that non-normality would have on the p-value for ANOVA. In other words, can I build a confidence interval for the p-value? Hence the error term I am speaking of would be the margin of error for a p-value confidence interval.
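
For readers following along, the variance check described above can be sketched in base R roughly as follows. This is an assumption-laden illustration, not the analysis actually run: the data are simulated, there is a single factor `g` instead of four, and the median-centred (Brown-Forsythe) variant of Levene's test is computed by hand as an ANOVA on absolute deviations. `oneway.test` then compares the classical F test with Welch's correction for unequal variances.

```r
set.seed(7)

# Hypothetical one-factor data with unequal variances and unequal group sizes
g <- factor(rep(c("g1", "g2", "g3"), times = c(200, 100, 50)))
y <- rnorm(length(g), mean = 0, sd = c(1, 2, 4)[g])

# Brown-Forsythe variant of Levene's test: one-way ANOVA on the absolute
# deviations from each group's median
abs_dev  <- abs(y - ave(y, g, FUN = median))
levene_p <- anova(lm(abs_dev ~ g))[["Pr(>F)"]][1]

# Classical F test (assumes equal variances) versus Welch's correction
p_equal <- oneway.test(y ~ g, var.equal = TRUE)$p.value
p_welch <- oneway.test(y ~ g, var.equal = FALSE)$p.value

c(levene = levene_p, classical = p_equal, welch = p_welch)
```

In this simulated layout the smallest group has the largest variance, which is the pairing under which the classical F test tends to reject too often; the opposite pairing tends to make it conservative.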

William

Re: Problems with normality req. for ANOVA

David Winsemius
In reply to this post by Bert Gunter
In the general setting of observational studies, your point is
undoubtedly true, and apparently you believe it to be true even in the
setting of designed experiments. Perhaps I should have confined myself
to my first sentence.

--
David.



Re: Problems with normality req. for ANOVA

Bert Gunter
In reply to this post by wwreith
My sympathies, but I don't think it's the business of list
contributors to facilitate stupidity.

"Confidence interval for the p-value" is nonsense. You could try
sensitivity analyses via simulation, though.

Cheers,

Bert Gunter
Genentech Nonclinical Biostatistics


Re: Problems with normality req. for ANOVA

Stephan Kolassa
Hi,

Simulating would still require you to operationalize the "lack of
normality". Are the tails too heavy? Is the distribution skewed? Does it
have multiple peaks? I suspect that the specific choices you would make
here would *strongly* influence the result.
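
A sensitivity analysis of this kind might look like the base R sketch below. All of the specifics are assumptions chosen for illustration: a single three-level factor, 30 observations per group, no true effect, and four hand-picked error shapes. It estimates how often the ANOVA F test rejects at the 5% level under each shape, so the numbers describe these particular choices and nothing more.

```r
set.seed(123)

# Hypothetical setup: one three-level factor and no true effect, so every
# rejection at the 5% level is a false positive
n_per_group <- 30
g <- gl(3, n_per_group)

# Candidate departures from normality, each with mean 0
error_shapes <- list(
  normal       = function(n) rnorm(n),
  heavy_tailed = function(n) rt(n, df = 3),
  skewed       = function(n) rexp(n) - 1,
  bimodal      = function(n) rnorm(n, mean = sample(c(-2, 2), n, replace = TRUE))
)

# Estimated type I error rate of the ANOVA F test for one error shape
rejection_rate <- function(rerr, n_sim = 1000) {
  p <- replicate(n_sim, {
    y <- rerr(length(g))
    anova(lm(y ~ g))[["Pr(>F)"]][1]
  })
  mean(p < 0.05)
}

rates <- sapply(error_shapes, rejection_rate)
round(rates, 3)
```

Swapping in other shapes, group sizes, or a design closer to the real one is a matter of editing `error_shapes` and `g`, which is exactly where the choices flagged above enter.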

My condolences on the client you are facing.

Good luck,
Stephan (ex-BAH)



Bert Gunter wrote:
>
> You could try sensitivity analyses via simulation, though.
>
> On Mon, Aug 2, 2010 at 11:31 AM, wwreith <[hidden email]>
> wrote:
>>
>> The problem I have is that I am expected, by my client, to find a
>> similiar formula that states which way the p-value would be pushed
>> by a lack of normality.


RE: Problems with normality req. for ANOVA

Wu Gong
In reply to this post by wwreith
I have been struggling to make sense of permutation tests for weeks. It seems one would work for you.
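
For what it is worth, a permutation version of the one-way F test can be sketched in a few lines of base R. The data, the single factor `g`, and the helper `f_stat` are hypothetical and introduced only for this example; a design with four crossed factors would need a more careful (restricted) permutation scheme than the simple shuffle used here.

```r
set.seed(2010)

# Hypothetical data: one three-level factor with skewed errors
g <- gl(3, 40)
y <- 0.8 * (g == "3") + rexp(length(g))

# F statistic from a one-way layout
f_stat <- function(y, g) anova(lm(y ~ g))[["F value"]][1]

f_obs <- f_stat(y, g)

# Reference distribution: shuffle the responses across groups, which is
# valid if observations are exchangeable under the null hypothesis
f_perm <- replicate(1999, f_stat(sample(y), g))

# Permutation p-value, counting the observed statistic as one permutation
p_perm <- (1 + sum(f_perm >= f_obs)) / (1 + length(f_perm))
p_perm
```

The test makes no normality assumption, but it still relies on exchangeability, so it does not rescue an analysis in which observations are clustered or otherwise dependent.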

Re: Problems with normality req. for ANOVA

Frank Harrell
In reply to this post by Bert Gunter
In addition, the poster did not tell us what is wrong with a
nonparametric test.

Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: Problems with normality req. for ANOVA

Liaw, Andy
In reply to this post by David Winsemius
As a matter of fact, I would say both Bert and I encounter "designed
experiments" a lot more than "observational studies", yet we speak from
experience that those things that Bert mentioned happen on a daily
basis.  When you talk to experimenters, ask your questions carefully and
you'll see these things crop up.

Andy
 


Re: Problems with normality req. for ANOVA

Peter Dalgaard-2

On Aug 3, 2010, at 2:41 PM, Liaw, Andy wrote:

> As a matter of fact, I would say both Bert and I encounter "designed
> experiments" a lot more than "observational studies", yet we speak from
> experience that those things that Bert mentioned happen on a daily
> basis.  When you talk to experimenters, ask your questions carefully and
> you'll see these things crop up.

Yes. I think the most egregious example I have seen involved getting an F test wrong by a factor of 7. This sort of error comes about extremely easily if you divide by the wrong sum of squares in an ANOVA table, and since it often requires dealing with difficult terms like "random interaction", researchers are typically much more prone to collect data in complicated designs than they are to analyze them correctly afterwards.
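
A small simulated case shows how the wrong denominator arises. The sketch below is hypothetical throughout: a treatment `A` is applied to whole plots, ten measurements are taken inside each plot, and there is no true treatment effect. The naive analysis treats all measurements as independent, while the appropriate one tests `A` against plot-to-plot variation, here done both via plot means and via an explicit `Error()` stratum in `aov`.

```r
set.seed(99)

# Hypothetical design: treatment A applied to whole plots (4 plots per
# level), with 10 measurements taken inside each plot. No true A effect.
n_plots  <- 8
n_within <- 10
d <- data.frame(plot_id = factor(rep(1:n_plots, each = n_within)))
d$A <- factor(ifelse(as.integer(d$plot_id) <= 4, "ctrl", "trt"))
plot_effect <- rnorm(n_plots, sd = 2)              # plot-to-plot variation
d$y <- plot_effect[d$plot_id] + rnorm(nrow(d), sd = 1)

# Wrong: treats all 80 measurements as independent replicates of A
f_naive <- anova(lm(y ~ A, data = d))[["F value"]][1]

# Appropriate denominator: A is tested against variation between plots.
# For balanced data this is equivalent to analysing the plot means.
plot_means <- aggregate(y ~ plot_id + A, data = d, FUN = mean)
f_plot <- anova(lm(y ~ A, data = plot_means))[["F value"]][1]

# The same analysis expressed with an explicit error stratum
print(summary(aov(y ~ A + Error(plot_id), data = d)))

c(naive = f_naive, plot_level = f_plot, ratio = f_naive / f_plot)
```

The size of the inflation depends on how large the between-plot variance is relative to the within-plot variance, so the ratio printed here is specific to these simulated settings.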

However, it obviously depends on your perspective: epidemiologists usually have different complications from econometricians, and a clinical trial is typically less complicated than a lab experiment.

>
> Andy
>
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> On Behalf Of David Winsemius
> Sent: Monday, August 02, 2010 3:35 PM
> To: Bert Gunter
> Cc: [hidden email]; wwreith
> Subject: Re: [R] Problems with normality req. for ANOVA
>
> In a general situation of observational studies, your point is  
> undoubtedly true, and apparently you believe it to be true even in the  
> setting of designed experiments. Perhaps I should have confined myself  
> to my first sentence.
>
> --
> David.
>
>
> On Aug 2, 2010, at 2:05 PM, Bert Gunter wrote:
>
>> David et al.:
>>
>> I take issue with this. It is the lack of independence that is the  
>> major issue. In particular, clustering, split-plotting, and so forth  
>> due to "convenience order" experimentation, lack of randomization,  
>> exogenous effects like the systematic effects due to measurement  
>> method/location have the major effect on inducing bias and  
>> distorting inference. Normality and unequal variances typically pale  
>> to insignificance compared to this.
>>
>> Obviously, IMHO.
>>
>> Note 1: George Box noted this at least 50 years ago in the early
>> '60s when he and Jenkins developed ARIMA modeling.
>>
>> Note 2: If you can, have a look at Jack Youden's classic paper  
>> "Enduring Values", which comments to some extent on these issues,  
>> here: http://www.jstor.org/pss/1266913
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]


Re: Problems with normality req. for ANOVA

Michael Dewey
In reply to this post by wwreith
At 19:31 02/08/2010, wwreith wrote:

>I am testing normality on the studentized residuals that are
>generated after performing ANOVA and yes I used Levene's test to see
>if the variances can be assumed equal. They in fact are not, but I
>have found a formula for determining whether the p-value for ANOVA
>will become larger or smaller as a result of unequal variances and
>unequal sample sizes. Fortunately it turns out the p-value is
>greater. Despite this the ANOVA test is still significant with p=.000.

Well perhaps this is starting a new hare but surely if you have shown
the variances are unequal you have already shown the treatment(s)
made a difference?
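
The direction of the distortion from unequal variances combined with unequal sample sizes, which the quoted message says was obtained from a formula, can also be checked by simulation. The following is a minimal sketch in Python (standard library only), not the poster's method: the group sizes, standard deviations, seed, and function names are made-up assumptions, and the 5% critical value is estimated by simulating the equal-variance case rather than taken from F tables.

```python
import random
from statistics import fmean

def f_stat(groups):
    """One-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def simulate_f(sizes, sds, reps, rng):
    """F statistics for `reps` normal data sets whose true means are all equal."""
    return [
        f_stat([[rng.gauss(0, sd) for _ in range(n)] for n, sd in zip(sizes, sds)])
        for _ in range(reps)
    ]

rng = random.Random(2010)
sizes = (10, 20, 40)  # assumed unequal group sizes
reps = 4000

# Reference case: equal variances, where the textbook F test is valid.
null_f = sorted(simulate_f(sizes, (1, 1, 1), reps, rng))
critical = null_f[int(0.95 * reps)]  # empirical 5% critical value

def rejection_rate(sds):
    """Share of null data sets whose F exceeds the equal-variance critical value."""
    fs = simulate_f(sizes, sds, reps, rng)
    return sum(f > critical for f in fs) / reps

rate_small_group_noisy = rejection_rate((4, 2, 1))  # largest variance in smallest group
rate_large_group_noisy = rejection_rate((1, 2, 4))  # largest variance in largest group
print(rate_small_group_noisy, rate_large_group_noisy)
```

In this setup the rejection rate under a true null typically lands above 5% when the smallest group has the largest variance and below 5% in the opposite pairing; the exact figures depend on the assumed sizes and variances.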


>The problem I have is that I am expected, by my client, to find a
>similar formula that states which way the p-value would be pushed
>by a lack of normality. Despite numerous citations that ANOVA is
>robust to departures from normality my client does not care. They want
>numerical proof. This led to looking for a method for estimating
>the effects non-normality would have on the p-value for ANOVA. In
>other words can I build a confidence interval for the p-value? Hence
>the error term I am speaking of would be the margin of error for
>the p-value confidence interval.
>
>William W. Reith III
>
>Business Analytics
>J9 SAC (757)-203-3400  Best Contact From 7:00am-4:00pm
>J9 Office (757)-203-3772
>Booz Office (757) 466-3253
>Mobile (434)-989-7948
>
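
For the request quoted above for numerical evidence on non-normality, one option is a permutation test, which obtains the reference distribution of the F statistic by reshuffling group labels instead of assuming normal residuals. The following is a minimal sketch in Python (standard library only) for a single factor; the data are simulated from a skewed distribution, and all names and settings are illustrative assumptions. It relies on exchangeability under the null hypothesis, so it does not by itself fix unequal variances, and a four-factor design would need a restricted permutation scheme that is not shown here.

```python
import random
from statistics import fmean

def f_stat(values, labels):
    """One-way ANOVA F statistic for responses `values` with group `labels`."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    k, n_total = len(groups), len(values)
    grand = fmean(values)
    ss_between = ss_within = 0.0
    for g in groups.values():
        m = fmean(g)
        ss_between += len(g) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def permutation_p_value(values, labels, n_perm, rng):
    """Share of label shufflings whose F is at least as large as the observed F."""
    observed = f_stat(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_stat(values, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation, so p is never zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical skewed data: exponential noise, with group C shifted upward.
rng = random.Random(42)
labels = [g for g in "ABC" for _ in range(30)]
shift = {"A": 0.0, "B": 0.0, "C": 1.5}
values = [rng.expovariate(1.0) + shift[g] for g in labels]

p_perm = permutation_p_value(values, labels, n_perm=1999, rng=rng)
print(f"permutation p-value: {p_perm:.4f}")
```

Comparing this value with the p-value from the ordinary F test on the same data gives a direct, data-specific indication of how much the normality assumption matters; it is a comparison of two tests, not a confidence interval for the p-value.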

Michael Dewey
http://www.aghmed.fsnet.co.uk
