# Numerical stability in chisq.test

4 messages
Open this post in threaded view
|

## Numerical stability in chisq.test

 The chisq.test on line 57 contains following code:         STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE)) However, based on book "Accuracy and stability of numerical algorithms" available from:         http://ftp.demec.ufpr.br/CFD/bibliografia/Higham_2002_Accuracy%20and%20Stability%20of%20Numerical%20Algorithms.pdfTable 4.1 on page 89, it is better to sort the data in increasing order than in decreasing order, when the data are non-negative. An example:         x = matrix(c(rep(1.1, 10000)), 10^16, nrow = 10001, ncol = 1)    # We have a vector with 10000*1.1 and 1*10^16         c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE))) The result:         10000000000010996 10000000000011000 When we sort the data in the increasing order, we get the correct result. If we sort the data in the decreasing order, we get a result that is off by 4. Shouldn't the sort be in the increasing order rather than in the decreasing order? Best regards, Jan Motl PS: This post is based on discussion on https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation and the response from the post to [hidden email]. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: Numerical stability in chisq.test

 >>>>> Jan Motl writes: > The chisq.test on line 57 contains following code: > STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE)) The preceding 2 lines seem relevant:             ## Sorting before summing may look strange, but seems to be             ## a sensible way to deal with rounding issues (PR#3486):             STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE)) -k > However, based on book "Accuracy and stability of numerical algorithms" available from: > http://ftp.demec.ufpr.br/CFD/bibliografia/Higham_2002_Accuracy%20and%20Stability%20of%20Numerical%20Algorithms.pdf> Table 4.1 on page 89, it is better to sort the data in increasing order than in decreasing order, when the data are non-negative. > An example: > x = matrix(c(rep(1.1, 10000)), 10^16, nrow = 10001, ncol = 1)    # We have a vector with 10000*1.1 and 1*10^16 > c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE))) > The result: > 10000000000010996 10000000000011000 > When we sort the data in the increasing order, we get the correct result. If we sort the data in the decreasing order, we get a result that is off by 4. > Shouldn't the sort be in the increasing order rather than in the decreasing order? > Best regards, > Jan Motl > PS: This post is based on discussion on https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation and the response from the post to [hidden email]. > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel