Dear all,

This is the first time I am posting to the r-devel list. On

StackOverflow, they suggested that the strange behaviour of integrate()

was more bug-like. I am providing a short version of the question (full

one with plots:

https://stackoverflow.com/q/55639401).

Suppose one wants integrate a function that is just a product of two

density functions (like gamma). The support of the random variable is

(-Inf, 0]. The scale parameter of the distribution is quite small

(around 0.01), so often, the standard integration routine would fail to

integrate a function that is non-zero on a very small section of the

negative line (like [-0.02, -0.01], where it takes huge values, and

almost 0 everywhere else). R’s integrate would often return the machine

epsilon as a result. So I stretch the function around the zero by an

inverse of the scale parameter, compute the integral, and then divide it

by the scale. Sometimes, this re-scaling also failed, so I did both if

the first result was very small.

Today when integration of the rescaled function suddenly yielded a value

of 1.5 instead of 3.5 (not even zero). The MWE is below.

cons <- -0.020374721416129591

sc <- 0.00271245601724757383

sh <- 5.704

f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh,

scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab

curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")

curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")

sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 # Checking by summation: 3.575294

sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294

str(integrate(f, -Inf, 0)) # Gives 3.575294

# $ value : num 3.58

# $ abs.error : num 1.71e-06

# $ subdivisions: int 10

str(integrate(f, -Inf, 0, numstab = sc))

# $ value : num 1.5 # What?!

# $ abs.error : num 0.000145 # What?!

# $ subdivisions: int 2

It stop at just two subdivisions! The problem is, I cannot try various

stabilising multipliers for the function because I have to compute this

integral thousands of times for thousands of parameter values on

thousands of sample windows for hundreds on models, so even in the

super-computer cluster, this takes weeks. Besides that, reducing the

rel.tol just to 1e-5 or 1e-6, helped a bit, but I am not sure whether

this guarantees success (and reducing it to 1e-7 slowed down the

computations in some cases). And I have looked at the Fortran code of

the quadrature just to see the integration rule, and was wondering.

How can I make sure that the integration routine will not produce such

wrong results for such a function, and the integration will still be fast?

Yours sincerely,

Andreï V. Kostyrka

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel