[r-devel] integrate over an infinite region produces wrong results depending on scaling

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

[r-devel] integrate over an infinite region produces wrong results depending on scaling

Andreï V. Kostyrka-2
Dear all,

This is the first time I am posting to the r-devel list. On
StackOverflow, they suggested that the strange behaviour of integrate()
was more bug-like. I am providing a short version of the question (full
one with plots: https://stackoverflow.com/q/55639401).

Suppose one wants integrate a function that is just a product of two
density functions (like gamma). The support of the random variable is
(-Inf, 0]. The scale parameter of the distribution is quite small
(around 0.01), so often, the standard integration routine would fail to
integrate a function that is non-zero on a very small section of the
negative line (like [-0.02, -0.01], where it takes huge values, and
almost 0 everywhere else). R’s integrate would often return the machine
epsilon as a result. So I stretch the function around the zero by an
inverse of the scale parameter, compute the integral, and then divide it
by the scale. Sometimes, this re-scaling also failed, so I did both if
the first result was very small.

Today when integration of the rescaled function suddenly yielded a value
of 1.5 instead of 3.5 (not even zero). The MWE is below.

cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh,
scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab

curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")

sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 #  Checking by summation: 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
str(integrate(f, -Inf, 0)) # Gives 3.575294
# $ value       : num 3.58
# $ abs.error   : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value       : num 1.5 # What?!
# $ abs.error   : num 0.000145 # What?!
# $ subdivisions: int 2

It stop at just two subdivisions! The problem is, I cannot try various
stabilising multipliers for the function because I have to compute this
integral thousands of times for thousands of parameter values on
thousands of sample windows for hundreds on models, so even in the
super-computer cluster, this takes weeks. Besides that, reducing the
rel.tol just to 1e-5 or 1e-6, helped a bit, but I am not sure whether
this guarantees success (and reducing it to 1e-7 slowed down the
computations in some cases). And I have looked at the Fortran code of
the quadrature just to see the integration rule, and was wondering.

How can I make sure that the integration routine will not produce such
wrong results for such a function, and the integration will still be fast?

Yours sincerely,
Andreï V. Kostyrka

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: [r-devel] integrate over an infinite region produces wrong results depending on scaling

R devel mailing list
integrate(f, xmin, xmax) will have problems when f(x) is 0 over large parts
of (xmin,xmax).  It doesn't have any clues to where the non-zero regions
are.  It computes f(x) at 21 points at each step and if all of those are
zero (or some other constant?) for a few steps, it calls it a day.  If you
can narrow down the integration interval to the interesting part of the
function's domain you will get better results.

By the way, here is a way to see where integrate(f) evaluates f()  (the
keep.xy=TRUE argument doesn't seem to do anything).

> debugIntegrate <- function(f)
{
    n_calls <- 0
    x_args <- list()
    other_args <- list()
    value <- list()
    function(x, ...) {
        n_calls <<- n_calls + 1
        x_args[[n_calls]] <<- x
        other_args[[n_calls]] <<- list(...)
        v <- f(x, ...)
        value[[n_calls]] <<- v
        v
    }
}

> str(integrate(DF <- debugIntegrate(f), -Inf, 0, numstab = sc))
List of 5
 $ value       : num 1.5
 $ abs.error   : num 0.000145
 $ subdivisions: int 2
 $ message     : chr "OK"
 $ call        : language integrate(f = DF <- debugIntegrate(f), lower =
-Inf, upper = 0, numstab = sc)
 - attr(*, "class")= chr "integrate"
> curve(f(x, sc), min(unlist(environment(DF)$x_args)), 0, n = 501, main =
"Scaled f", bty = "n")
> with(environment(DF), points(unlist(x_args), unlist(value)))

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sun, Apr 14, 2019 at 5:13 AM Andreï V. Kostyrka <[hidden email]>
wrote:

> Dear all,
>
> This is the first time I am posting to the r-devel list. On
> StackOverflow, they suggested that the strange behaviour of integrate()
> was more bug-like. I am providing a short version of the question (full
> one with plots: https://stackoverflow.com/q/55639401).
>
> Suppose one wants integrate a function that is just a product of two
> density functions (like gamma). The support of the random variable is
> (-Inf, 0]. The scale parameter of the distribution is quite small
> (around 0.01), so often, the standard integration routine would fail to
> integrate a function that is non-zero on a very small section of the
> negative line (like [-0.02, -0.01], where it takes huge values, and
> almost 0 everywhere else). R’s integrate would often return the machine
> epsilon as a result. So I stretch the function around the zero by an
> inverse of the scale parameter, compute the integral, and then divide it
> by the scale. Sometimes, this re-scaling also failed, so I did both if
> the first result was very small.
>
> Today when integration of the rescaled function suddenly yielded a value
> of 1.5 instead of 3.5 (not even zero). The MWE is below.
>
> cons <- -0.020374721416129591
> sc <- 0.00271245601724757383
> sh <- 5.704
> f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh,
> scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab
>
> curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
> curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")
>
> sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 #  Checking by summation: 3.575294
> sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
> str(integrate(f, -Inf, 0)) # Gives 3.575294
> # $ value       : num 3.58
> # $ abs.error   : num 1.71e-06
> # $ subdivisions: int 10
> str(integrate(f, -Inf, 0, numstab = sc))
> # $ value       : num 1.5 # What?!
> # $ abs.error   : num 0.000145 # What?!
> # $ subdivisions: int 2
>
> It stop at just two subdivisions! The problem is, I cannot try various
> stabilising multipliers for the function because I have to compute this
> integral thousands of times for thousands of parameter values on
> thousands of sample windows for hundreds on models, so even in the
> super-computer cluster, this takes weeks. Besides that, reducing the
> rel.tol just to 1e-5 or 1e-6, helped a bit, but I am not sure whether
> this guarantees success (and reducing it to 1e-7 slowed down the
> computations in some cases). And I have looked at the Fortran code of
> the quadrature just to see the integration rule, and was wondering.
>
> How can I make sure that the integration routine will not produce such
> wrong results for such a function, and the integration will still be fast?
>
> Yours sincerely,
> Andreï V. Kostyrka
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel