A goodness of fit test for two discrete distributions with unequal variance?

Serena De Stefani
I have a computer simulation in which a virtual agent ends up in different
areas of a layout depending on several factors. There are 18 conditions in
total.
If I collapse the data points into bins, where each bin is one of the areas,
the data look like this:

    x0 <- c(3,3,5,5,2) # computer simulation

Now I would like to validate this model by having human subjects go through
the same conditions, but I run into two sets of issues:

 1. The first issue is that the dataset is discrete and small (there may be
fewer than 5 counts in a bin, which is a problem for a Chi-Square Goodness
of Fit test); there may also be ties. After some online digging I found two
options:
- a permutation test
- a Cramer-von Mises test of goodness-of-fit (see this paper:
  https://journal.r-project.org/archive/2011/RJ-2011-016/RJ-2011-016.pdf)
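
A minimal sketch of the resampling idea, assuming the vectors in this post are counts per area (each sums to 18, the number of conditions) and that the simulation's proportions are treated as a fixed null distribution. Strictly speaking this is a Monte Carlo chi-square goodness-of-fit test from base R rather than a permutation test; it is shown because it does not rely on the large-sample approximation that breaks down with small bin counts. The names `p0` and `mc` are illustrative only.

```r
# Counts per area (five bins, 18 trials each), taken from the post
x0 <- c(3, 3, 5, 5, 2)  # computer simulation
x1 <- c(4, 2, 5, 4, 3)  # subject 1 (made-up data)

# Assumption: the simulation's bin proportions are the fixed null distribution
p0 <- x0 / sum(x0)

set.seed(1)  # only so that the simulated p-value is reproducible
# The p-value comes from B tables simulated under p0, not from the
# chi-square approximation
mc <- chisq.test(x1, p = p0, simulate.p.value = TRUE, B = 9999)

mc$statistic
mc$p.value
```

With only 18 trials per subject, a test like this has little power, so a large p-value should not be read as evidence that the model fits.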

I thought the Cramer-von Mises goodness-of-fit test could work, so
I ran it with made-up data for *one human subject* and got the following
result:

    x0 <- c(3,3,5,5,2) # computer simulation
    x1 <- c(4,2,5,4,3) # subject 1

    library(goftest)

    cvm.test(x0, ecdf(x1))

    > Cramer-von Mises test of goodness-of-fit
    > Null hypothesis: distribution ‘ecdf(x1)’
    >
    > data:  x0
    > omega2 = 0.14667, p-value = 0.4106
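
A hedged alternative to the call above: as far as I can tell, `goftest::cvm.test` is meant for a continuous null distribution, whereas the linked paper describes the `dgof` package, whose `cvm.test` accepts a step-function null such as an `ecdf`. The sketch below rests on two assumptions that are not stated in the post: that the vectors are counts per area, and that the five areas have a meaningful order (Cramer-von Mises statistics depend on that order). The names `areas`, `obs0`, `obs1` and `res` are illustrative.

```r
# Requires the dgof package (assumed to be installed from CRAN)
x0 <- c(3, 3, 5, 5, 2)  # computer simulation, counts per area
x1 <- c(4, 2, 5, 4, 3)  # subject 1, counts per area

# Assumption: areas are ordered 1..5; expand counts into one label per trial
areas <- seq_along(x0)
obs0 <- rep(areas, times = x0)  # 18 simulated trials
obs1 <- rep(areas, times = x1)  # 18 trials for subject 1

# Discrete Cramer-von Mises test of the subject's trials against the
# simulation's empirical distribution, treated as a fixed null
res <- dgof::cvm.test(obs1, ecdf(obs0), type = "W2")

res$statistic
res$p.value
```

If the areas are purely nominal, an order-free statistic such as chi-square is probably the safer choice.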

So far so good. But now suppose I have more than one human subject, say
four of them. These are the results from the additional subjects:

    x2 <- c(3,3,5,2,5) # subject 2
    x3 <- c(2,2,5,6,3) # subject 3
    x4 <- c(3,2,5,6,2) # subject 4

Now I run into the second set of issues:

2. On one side I have a single computer simulation; on the other I have
data from four subjects. Should I take the mean of the results for the
human subjects? Would my data then still be “discrete”? Or should I run my
simulation four times? But I would always get the same results, so the
variance of the two datasets would differ.
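
One possible way to look at this, sketched under stated assumptions: summing the subjects' counts (rather than averaging them) keeps the data as integer counts, and a deterministic simulation can be treated as a fixed set of null proportions instead of as a second random sample. The code assumes the vectors are counts per area and ignores any dependence between trials from the same subject; `subjects`, `pooled`, `pooled_test` and `homog` are illustrative names.

```r
x0 <- c(3, 3, 5, 5, 2)  # computer simulation, counts per area

# One row per human subject (made-up data from the post)
subjects <- rbind(
  s1 = c(4, 2, 5, 4, 3),
  s2 = c(3, 3, 5, 2, 5),
  s3 = c(2, 2, 5, 6, 3),
  s4 = c(3, 2, 5, 6, 2)
)

# Summing keeps integer counts: 4 subjects x 18 trials = 72 trials
pooled <- colSums(subjects)

# Assumption: the simulation's proportions are the fixed null distribution
p0 <- x0 / sum(x0)

set.seed(1)  # only so that the simulated p-values are reproducible
pooled_test <- chisq.test(pooled, p = p0, simulate.p.value = TRUE, B = 9999)

# Optional check that the subjects resemble each other before pooling
homog <- chisq.test(subjects, simulate.p.value = TRUE, B = 9999)

pooled_test$p.value
homog$p.value
```

Whether pooling is appropriate depends on what the validation is meant to show; if differences between subjects matter, each subject could be tested against `p0` separately.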

Any ideas? Maybe I should change the design and have more levels for my
factors, so that I have more trials and the bins get bigger?

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: A goodness of fit test for two discrete distributions with unequal variance?

David Winsemius

On 8/23/19 2:52 PM, Serena De Stefani wrote:

> Any ideas? Maybe I should change the design and have more levels for my
> factors, so that I have more trials and the bins get bigger?


Statistics questions, especially those from people who have failed to
heed the advice of the Posting Guide to post in plain text, are
off-topic on R-help and should be posted to a forum where statistics
questions are welcomed. (My suspicion is that this question will be
greeted with further requests for clarification of goals, since asking
what you "should" do requires a careful explanation of what your
standards of evidence are and what you are attempting to demonstrate.)


--

David.
