Testing for normality in categorical data

Nancy Felix
Hello
I have data in which both the independent and the dependent variables are
categorical, each with more than 3 levels. How can I check the normality of my data?

I have tried the Shapiro-Wilk example given for the levels of a factor:

## data
summary(chickwts)

## linear model and ANOVA
fm <- lm(weight ~ feed, data = chickwts)
anova(fm)

## QQ plot for residuals + Shapiro-Wilk test
shapiro.test(residuals(fm))

## separate tests for all groups of observations
## (with some formatting)
do.call("rbind", with(chickwts, tapply(weight, feed,
   function(x) unlist(shapiro.test(x)[c("statistic", "p.value")]))))
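The `## QQ plot for residuals` comment in the example above is not followed by any plotting call. A minimal sketch of the full residual check on the built-in chickwts data, using only base R functions (`lm()`, `residuals()`, `qqnorm()`, `qqline()`, `shapiro.test()`); it works here only because `weight` is a numeric response:

```r
## fit the one-way ANOVA model on the built-in chickwts data
fm <- lm(weight ~ feed, data = chickwts)
res <- residuals(fm)

## QQ plot of the residuals against a normal distribution
qqnorm(res, main = "Normal QQ plot of ANOVA residuals")
qqline(res)  # reference line through the first and third quartiles

## Shapiro-Wilk test on the same residuals
sw <- shapiro.test(res)
sw$p.value
```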

But I ended up with an error message saying that x should be numeric, plus
the other messages shown below.
I hope to get some help with this.

Thanks,
Nancy

> ## linear model and ANOVA
> fm <- lm(retaliation ~ occupation, data = kazi)
Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
> anova(fm)
Error in if (ssr < 1e-10 * mss) warning("ANOVA F-tests on an essentially perfect fit are unreliable") :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(object$residuals, 2) : ‘^’ not meaningful for factors
> ## QQ plot for residuals + Shapiro-Wilk test
> shapiro.test(residuals(fm))
Error in class(y) <- oldClass(x) :
  adding class "factor" to an invalid object
> ## separate tests for all groups of observations
> ## (with some formatting)
> do.call("rbind", with(kazi, tapply(retaliation, occupation,
+     function(x) unlist(shapiro.test(x)[c("statistic", "p.value")]))))

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Testing for normality in categorical data

Bert Gunter-2
Categorical data cannot be normally distributed. What you are doing is
statistical nonsense, as your error messages suggest. You need to consult
a local statistician for help.

Furthermore, statistical questions are generally off-topic on this list,
which is about R programming.
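Bert's first point can be checked directly in R. A minimal sketch using a small made-up factor; the name `y` and its values are hypothetical stand-ins for the `retaliation` column of the `kazi` data, which are not available here. A factor is not numeric, its natural summary is a table of counts per level, and `shapiro.test()` stops with an error before computing anything:

```r
## hypothetical categorical variable with four levels
## (stand-in for `retaliation`; not the real data)
y <- factor(c("none", "low", "medium", "high",
              "low", "none", "high", "medium"))

is.numeric(y)  # FALSE: a factor holds level codes, not measurements
table(y)       # counts per level

## shapiro.test() requires numeric input, so a factor triggers an error;
## tryCatch() captures the error message instead of halting the script
msg <- tryCatch(shapiro.test(y),
                error = function(e) conditionMessage(e))
msg
```

The `'-' not meaningful for factors` and `'^' not meaningful for factors` warnings in the transcript have the same root cause: `lm()` and `anova()` try to do arithmetic on a factor response.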

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Sat, Oct 5, 2019 at 6:19 AM Nancy Felix <[hidden email]> wrote:

> Hello
> I have data that are categorical both independent variable and dependent as
> well having levels more than 3. How can i check the normality of my data?
>
> I have tried the example given of Shapiro-Wilk for levels of factors
>
> data
> summary(chickwts)
>
> ## linear model and ANOVA
> fm <- lm(weight ~ feed, data = chickwts)
> anova(fm)
>
> ## QQ plot for residuals + Shapiro-Wilk test
> shapiro.test(residuals(fm))
>
> ## separate tests for all groups of observations
> ## (with some formatting)
> do.call("rbind", with(chickwts, tapply(weight, feed,
>    function(x) unlist(shapiro.test(x)[c("statistic", "p.value")]))))
>
> But ended up with Error message that x should be numeric and more comments
> see below.
> Hope to get some help on this
>
> Thanks,
> Nancy
>
> ## linear model and ANOVA
> > fm <- lm(retaliation ~ occupation, data = kazi)
> Warning messages:
> 1: In model.response(mf, "numeric") :
>   using type = "numeric" with a factor response will be ignored
> 2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
> > anova(fm)
> Error in if (ssr < 1e-10 * mss) warning("ANOVA F-tests on an essentially
> perfect fit are unreliable") :
>   missing value where TRUE/FALSE needed
> In addition: Warning message:
> In Ops.factor(object$residuals, 2) : ‘^’ not meaningful for factors
> > ## QQ plot for residuals + Shapiro-Wilk test
> > shapiro.test(residuals(fm))
> Error in class(y) <- oldClass(x) :
>   adding class "factor" to an invalid object
> > ## separate tests for all groups of observations
> > ## (with some formatting)
> > do.call("rbind", with(kazi, tapply(retaliation, occupation,
> +                                        function(x)
> unlist(shapiro.test(x)[c("statistic", "p.value")]))))
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Testing for normality in categorical data

Jim Lemon-4
In reply to this post by Nancy Felix
Hi Nancy,
The chickwts dataset contains one sort-of continuous variable (weight)
and one categorical variable (feed). Two things that will help you
understand what you are trying to do are plots that let you "eyeball"
the weight data:

# this shows you the rough distribution of chick weights
hist(chickwts$weight)
# this shows you how well the distribution of weights fits a normal distribution
qqnorm(chickwts$weight)
qqline(chickwts$weight)  # reference line; points close to it suggest approximate normality
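A complementary way to eyeball the same data, sketched with base R only: draw the histogram on the density scale and overlay a normal curve with the sample mean and standard deviation (`w` is just a short local name introduced for illustration). This pools all six feed groups, so it is only a rough first look:

```r
w <- chickwts$weight

## histogram on the density scale so a density curve can be overlaid
hist(w, freq = FALSE, main = "chickwts: weight", xlab = "weight")

## normal density using the sample mean and standard deviation of w
curve(dnorm(x, mean = mean(w), sd = sd(w)), add = TRUE, lwd = 2)
```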

For the Shapiro-Wilk statistic on the distribution of all of the weights:

shapiro.test(chickwts$weight)

And if you really want to test normality within the feed groups:

by(chickwts$weight, chickwts$feed, shapiro.test)

Because the p-values returned are all fairly large, there is no evidence
against the null hypothesis of normality (strictly, you fail to reject it
rather than accept it).
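To see these p-values without reading through the printed `by()` output, a minimal sketch that keeps just the p-values in a named vector. It uses `split()` and `sapply()` instead of `by()`, a swap made only for compactness; the name `p_by_feed` is illustrative:

```r
## Shapiro-Wilk test within each feed group, keeping only the p-values
p_by_feed <- sapply(split(chickwts$weight, chickwts$feed),
                    function(x) shapiro.test(x)$p.value)
round(p_by_feed, 3)

## group sizes: each group is small, so the test has little power
table(chickwts$feed)
```

A large p-value here means only that the test found no evidence against normality in that group; with small groups it says little in favour of normality.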
As Bert has noted, it looks like you are just throwing the data into
the functions without really knowing what you are doing. Hopefully,
the above will get you started.

Jim

On Sat, Oct 5, 2019 at 11:19 PM Nancy Felix <[hidden email]> wrote:
