Hello,

I am trying to fit gamma, negative exponential and inverse power functions
to a dataset, and then test whether each curve fits well. To do this I have
been advised to calculate predicted values for bins of data (I have grouped
a continuous range of distances into 1 km bins), and then apply a
chi-squared test. Example:

> data <- data.frame(distance=c(1,2,3,4,5,6,7), observed=c(43,13,10,6,2,1,1),
+                    predicted=c(28,18,10,5,3,1,1))
> chisq.test(data$observed, data$predicted)

Which gives:

        Pearson's Chi-squared test

data:  data$observed and data$predicted
X-squared = 35, df = 25, p-value = 0.0882

Warning message:
In chisq.test(data$observed, data$predicted) :
  Chi-squared approximation may be incorrect

I understand the warning is due to having observed/predicted values of less
than five. However, I would like to know, firstly, why R uses such a large
number of degrees of freedom (by my understanding there should be only
4 df), and secondly, whether the following manual calculation is therefore
inappropriate:

> X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
> 1-pchisq(X2,4)
[1] 0.04114223
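
For comparison, here is a minimal sketch of the goodness-of-fit form of the same test. It assumes the observed vector has seven values ending in 1, and that the predicted counts can be rescaled to proportions. The names `obs`, `pred`, `tab` and `gof` are illustrative only. As far as I can tell, passing two vectors as `x` and `y` makes chisq.test cross-tabulate them as factors, which would explain the 25 df, whereas passing the predictions through `p` treats them as expected proportions.

```r
obs  <- c(43, 13, 10, 6, 2, 1, 1)   # assumed observed counts per 1 km bin
pred <- c(28, 18, 10, 5, 3, 1, 1)   # predicted counts per 1 km bin

# chisq.test(obs, pred) cross-tabulates the two vectors as factors:
# 6 distinct values in each, so a 6 x 6 table and (6-1)*(6-1) = 25 df
tab <- table(obs, pred)
dim(tab)

# Goodness-of-fit form: predictions rescaled to sum to 1 and passed as p.
# The small-expected-count warning still applies, so it is suppressed here.
gof <- suppressWarnings(chisq.test(x = obs, p = pred / sum(pred)))
gof$parameter   # number of bins - 1
```

Note that this form rescales the expected counts to the observed total, so its statistic will not match the manual X2 above exactly, and chisq.test does not subtract any degrees of freedom for parameters estimated when fitting the curve.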

If chi-squared is unsuitable, what other test can I use to determine
whether my observed and predicted data come from the same distribution? The
frequently recommended Fisher's exact test doesn't seem to be any more
appropriate, as it requires values greater than 5 for contingency tables
larger than 2 x 2.
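
One possibility, sketched here only as an assumption about what might suit small expected counts, is the Monte Carlo p-value built into chisq.test via `simulate.p.value`. The vectors again assume seven observed values ending in 1, and `sim` is an illustrative name.

```r
obs  <- c(43, 13, 10, 6, 2, 1, 1)   # assumed observed counts per 1 km bin
pred <- c(28, 18, 10, 5, 3, 1, 1)   # predicted counts per 1 km bin

set.seed(1)  # make the simulated p-value reproducible

# The p-value comes from B simulated samples drawn under the predicted
# proportions, not from the chi-squared approximation.
sim <- chisq.test(x = obs, p = pred / sum(pred),
                  simulate.p.value = TRUE, B = 10000)
sim$p.value
```

As with the asymptotic version, this does not account for parameters estimated when fitting the curve, so I am unsure whether it is valid here.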

Thanks for your help.

Louise

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.