I have a computer simulation in which a virtual agent ends up in different areas of a layout depending on several factors. There are 18 conditions in total.

If I collapse the data points into bins, where each bin is one of the areas, the data look like this:

x0 <- c(3,3,5,5,2) # computer simulation
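The binned vector holds counts per area (3 + 3 + 5 + 5 + 2 = 18, one per condition). A small base-R sketch, assuming each entry is the number of conditions in which the agent ended up in that area, shows how to move between the counts and one area label per condition. Goodness-of-fit tools that expect raw observations need the labels, not the counts.

```r
# Counts per area from the simulation; assumed to be the number of the
# 18 conditions in which the agent ended up in each of the 5 areas.
x0 <- c(3, 3, 5, 5, 2)

# Expand the bin counts into one area label (1..5) per condition.
areas0 <- rep(seq_along(x0), times = x0)
length(areas0)  # 18 conditions

# Collapse the labels back into counts per area: 3 3 5 5 2
tabulate(areas0, nbins = length(x0))
```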

Now I would like to validate this model by having human subjects go through the same conditions, but I run into two sets of issues:

1. The first issue is that the dataset is discrete and small: some bins may have fewer than 5 counts, which is a problem for a chi-square goodness-of-fit test, and there may also be ties. After some online digging I found two options:

- a permutation test

- a Cramer-von Mises goodness-of-fit test (see this paper: https://journal.r-project.org/archive/2011/RJ-2011-016/RJ-2011-016.pdf)
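The permutation-test option could look roughly like the following base-R sketch. It is a minimal illustration under stated assumptions, not the method of the linked paper: the helper `perm_test_areas` is hypothetical, the counts are expanded to one area label per condition, the statistic is the Pearson chi-square of the area-by-source table, and the simulation's 18 outcomes are treated as a sample on the same footing as the subject's. Under the null hypothesis that both sources share one area distribution, the source labels are exchangeable, so shuffling them gives the reference distribution.

```r
# Hypothetical helper (not from any package): two-sample permutation test
# of whether two sets of area counts come from the same area distribution.
perm_test_areas <- function(counts_a, counts_b, n_perm = 9999, seed = 1) {
  stopifnot(length(counts_a) == length(counts_b))
  set.seed(seed)
  k <- length(counts_a)
  # One area label per condition, both sources pooled together
  labels <- factor(c(rep(seq_len(k), counts_a), rep(seq_len(k), counts_b)),
                   levels = seq_len(k))
  group <- factor(rep(c("a", "b"), c(sum(counts_a), sum(counts_b))))
  # Pearson chi-square statistic of the area-by-source contingency table
  chisq_stat <- function(g) {
    tab <- table(labels, g)
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    # areas empty in both sources give 0/0 = NaN; drop them
    sum((tab - expected)^2 / expected, na.rm = TRUE)
  }
  observed <- chisq_stat(group)
  # Shuffle the source labels; area labels and group sizes stay fixed
  perm <- replicate(n_perm, chisq_stat(sample(group)))
  # Small tolerance so numerically equal statistics count as ties
  p_value <- (sum(perm >= observed - 1e-12) + 1) / (n_perm + 1)
  list(statistic = observed, p.value = p_value)
}

x0 <- c(3, 3, 5, 5, 2)  # computer simulation
x1 <- c(4, 2, 5, 4, 3)  # subject 1
res <- perm_test_areas(x0, x1)
res
```

With identical count vectors the statistic is 0 and the p-value is 1. With only 18 outcomes per source, the test is likely to have limited power against moderate differences.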

I thought the Cramer-von Mises goodness-of-fit test could work, so I ran it on made-up data for *one human subject* and got the following result:

library(goftest)
x0 <- c(3,3,5,5,2) # computer simulation
x1 <- c(4,2,5,4,3) # subject 1
cvm.test(x0, ecdf(x1))

>Cramer-von Mises test of goodness-of-fit
>Null hypothesis: distribution ‘ecdf(x1)’
>data: x0
>omega2 = 0.14667, p-value = 0.4106
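As written, `cvm.test(x0, ecdf(x1))` treats the five bin counts in `x0` as five observations and compares them with the empirical distribution of the five counts in `x1`. It therefore asks whether the count values look alike, not whether the agent and the subject ended up in the same areas. If the simulation's area proportions `x0 / sum(x0)` are taken as the fixed null distribution (an assumption), one base-R alternative is a Pearson chi-square goodness-of-fit test on subject 1's counts with a Monte Carlo p-value. This is a Monte Carlo test, not a permutation test: it simulates multinomial tables under the null instead of relying on the large-sample chi-square approximation that the 5-count rule of thumb protects.

```r
x0 <- c(3, 3, 5, 5, 2)  # computer simulation
x1 <- c(4, 2, 5, 4, 3)  # subject 1

# Null (assumption): subject 1's 18 outcomes are multinomial with the
# simulation's area proportions. simulate.p.value = TRUE draws B tables
# under that null instead of using the chi-square approximation.
set.seed(1)
gof1 <- chisq.test(x1, p = x0 / sum(x0), simulate.p.value = TRUE, B = 9999)
gof1$statistic  # X-squared = 41/30, about 1.367
gof1$p.value
```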

So far so good. But now suppose I want more than one human subject, say four in total. These are the results for the three additional subjects:

x2 <- c(3,3,5,2,5) # subject 2
x3 <- c(2,2,5,6,3) # subject 3
x4 <- c(3,2,5,6,2) # subject 4
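With several subjects, it can help to store the counts as a subject-by-area matrix and first check whether the subjects plausibly share one area distribution. The sketch below does this with base R's chi-square test of homogeneity and a Monte Carlo p-value (row and column totals held fixed). It assumes each subject's row contains counts over the same five areas, each summing to 18 conditions.

```r
x1 <- c(4, 2, 5, 4, 3)  # subject 1
x2 <- c(3, 3, 5, 2, 5)  # subject 2
x3 <- c(2, 2, 5, 6, 3)  # subject 3
x4 <- c(3, 2, 5, 6, 2)  # subject 4

# One row per subject, one column per area
subjects <- rbind(x1, x2, x3, x4)
colnames(subjects) <- paste0("area", seq_len(ncol(subjects)))
rowSums(subjects)  # 18 conditions per subject

# Homogeneity across subjects; the simulated tables keep the margins fixed
set.seed(1)
homog <- chisq.test(subjects, simulate.p.value = TRUE, B = 9999)
homog
```

A small p-value here would suggest that the subjects differ among themselves, in which case pooling them is questionable.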

Now I run into the second set of issues:

2. On the one hand I have a single computer simulation; on the other I have data from four subjects. Should I take the mean of the human subjects' results? Would my data then still be “discrete”? Or should I run my simulation four times? Since it would always give the same results, the two datasets would have different variances.
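On the averaging question: the per-area means are no longer counts (for example 2.25 in area 2), so count-based tests cannot use them, while the per-area sums stay counts. If the simulation is deterministic, rerunning it adds no information, so one option is to treat its proportions as a fixed null and test the pooled human counts against it. This sketch assumes the subjects are homogeneous and that the 72 pooled outcomes can be treated as exchangeable draws; that ignores which condition produced which outcome, which may or may not be acceptable for this design.

```r
x0 <- c(3, 3, 5, 5, 2)  # computer simulation
subjects <- rbind(
  c(4, 2, 5, 4, 3),  # subject 1
  c(3, 3, 5, 2, 5),  # subject 2
  c(2, 2, 5, 6, 3),  # subject 3
  c(3, 2, 5, 6, 2)   # subject 4
)

colMeans(subjects)           # 3.00 2.25 5.00 4.50 3.25: no longer counts
pooled <- colSums(subjects)  # 12 9 20 18 13: 72 outcomes, still counts
pooled

# Simulation proportions as the fixed null; expected counts are
# 72 * x0 / 18 = 12 12 20 20 8, all at least 5
set.seed(1)
gof_pooled <- chisq.test(pooled, p = x0 / sum(x0),
                         simulate.p.value = TRUE, B = 9999)
gof_pooled
```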

Any ideas? Maybe I should change the design and add more levels to my factors, so that I have more trials and the bins get bigger?
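On how many trials would be enough: the usual rule of thumb for the chi-square approximation concerns *expected* counts, not observed ones. Under the simulation's proportions the rarest area has probability 2/18, so with n pooled outcomes its expected count is n * 2/18, and reaching 5 needs n >= 5 * 18 / 2 = 45, that is, 3 subjects at 18 conditions each. A short sketch of that arithmetic, assuming the simulation proportions as the null and a threshold of 5:

```r
x0 <- c(3, 3, 5, 5, 2)  # computer simulation
min_expected <- 5       # common rule-of-thumb threshold (assumption)

# Smallest n with n * min(x0) / sum(x0) >= min_expected, computed from
# the integer counts to avoid floating-point rounding in 2/18
n_needed <- ceiling(min_expected * sum(x0) / min(x0))
n_needed         # 45 pooled outcomes
subjects_needed <- ceiling(n_needed / sum(x0))
subjects_needed  # 3 subjects at 18 conditions each
```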


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.