Integral of PDF

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Integral of PDF

Doran, Harold
The integral of any probability density from -Inf to Inf should equal 1, correct? I don't understand last result below.

> integrate(function(x) dnorm(x, 0,1), -Inf, Inf)
1 with absolute error < 9.4e-05

> integrate(function(x) dnorm(x, 100,10), -Inf, Inf)
1 with absolute error < 0.00012

> integrate(function(x) dnorm(x, 500,50), -Inf, Inf)
8.410947e-11 with absolute error < 1.6e-10

> all.equal(integrate(function(x) dnorm(x, 500,50), -Inf, Inf)$value, 0)
[1] TRUE

> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] statmod_1.4.6      mlmRev_0.99875-1   lme4_0.999375-35   Matrix_0.999375-33 lattice_0.17-26

loaded via a namespace (and not attached):
[1] grid_2.10.1   nlme_3.1-96   stats4_2.10.1 tools_2.10.1

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

Hans W Borchers
You can dive into the thread "puzzle with integrate over infinite range"
from September this year. The short answer appears to be: Increase the
error tolerance.

    integrate(function(x) dnorm(x, 500,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)
    # 1 with absolute error < 1.1e-12

Hans Werner


Doran, Harold wrote
The integral of any probability density from -Inf to Inf should equal 1, correct? I don't understand last result below.

> integrate(function(x) dnorm(x, 0,1), -Inf, Inf)
1 with absolute error < 9.4e-05

> integrate(function(x) dnorm(x, 100,10), -Inf, Inf)
1 with absolute error < 0.00012

> integrate(function(x) dnorm(x, 500,50), -Inf, Inf)
8.410947e-11 with absolute error < 1.6e-10

> all.equal(integrate(function(x) dnorm(x, 500,50), -Inf, Inf)$value, 0)
[1] TRUE

> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] statmod_1.4.6      mlmRev_0.99875-1   lme4_0.999375-35   Matrix_0.999375-33 lattice_0.17-26

loaded via a namespace (and not attached):
[1] grid_2.10.1   nlme_3.1-96   stats4_2.10.1 tools_2.10.1

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

jones-2
In reply to this post by Doran, Harold
To really understaand it you will have to look at the fortran code
underlying integrate.  I tracked it back through a couple of layers
(dqagi, dqagie, ...  just use google, these are old netlib
subroutines) then decided I ought to get back to grading papers :-)

It looks like the integral is split into the sum of two integrals,
(-Inf,0] and [0,Inf).  


> integrate(function(x) dnorm(x, 350,50), 0, Inf)
1 with absolute error < 1.5e-05
> integrate(function(x) dnorm(x, 400,50), 0, Inf)
1.068444e-05 with absolute error < 2.1e-05
> integrate(function(x) dnorm(x, 500,50), 0, Inf)
8.410947e-11 with absolute error < 1.6e-10
> integrate(function(x) dnorm(x, 500,50), 0, 1000)
1 with absolute error < 7.4e-05

The integral from 0 to Inf is the lim_{x -> Inf} of the integral
over [0,x].  I suspect that the algorithm is picking an interval
[0,x], evaluating the integral over that interval, and then performing
some test to decide whether to expand the interval.  When the initial
interval doesn't contain much, expanding a little won't add enough to
trigger another iteration.  

albyn

On Thu, Dec 02, 2010 at 04:21:34PM -0500, Doran, Harold wrote:

> The integral of any probability density from -Inf to Inf should equal 1, correct? I don't understand last result below.
>
> > integrate(function(x) dnorm(x, 0,1), -Inf, Inf)
> 1 with absolute error < 9.4e-05
>
> > integrate(function(x) dnorm(x, 100,10), -Inf, Inf)
> 1 with absolute error < 0.00012
>
> > integrate(function(x) dnorm(x, 500,50), -Inf, Inf)
> 8.410947e-11 with absolute error < 1.6e-10
>
> > all.equal(integrate(function(x) dnorm(x, 500,50), -Inf, Inf)$value, 0)
> [1] TRUE
>
> > sessionInfo()
> R version 2.10.1 (2009-12-14)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] statmod_1.4.6      mlmRev_0.99875-1   lme4_0.999375-35   Matrix_0.999375-33 lattice_0.17-26
>
> loaded via a namespace (and not attached):
> [1] grid_2.10.1   nlme_3.1-96   stats4_2.10.1 tools_2.10.1
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Albyn Jones
Reed College
[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

William Dunlap
In reply to this post by Doran, Harold
You can use trace() to see what is happening

> trace(dnorm, quote(print(round(x))))
> integrate(function(x) dnorm(x, 500,50), -Inf, Inf)
Tracing dnorm(x, 500, 50) on entry
 [1]   1 233   0  38   0  14   0   7   0   4   0   2   0   2   1
Tracing dnorm(x, 500, 50) on entry
 [1]   -1 -233    0  -38    0  -14    0   -7    0   -4    0   -2    0
-2   -1
Tracing dnorm(x, 500, 50) on entry
 [1]   3 467   1  78   1  29   1  14   1   9   2   6   2   4   2
Tracing dnorm(x, 500, 50) on entry
 [1]   -3 -467   -1  -78   -1  -29   -1  -14   -1   -9   -2   -6   -2
-4   -2
Tracing dnorm(x, 500, 50) on entry
 [1] 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0
Tracing dnorm(x, 500, 50) on entry
 [1]  0 -1  0 -1  0 -1  0 -1  0 -1  0 -1  0  0  0
Tracing dnorm(x, 500, 50) on entry
 [1]   7 935   3 156   3  58   3  30   4  18   4  12   5   9   6
Tracing dnorm(x, 500, 50) on entry
 [1]   -7 -935   -3 -156   -3  -58   -3  -30   -4  -18   -4  -12   -5
-9   -6
Tracing dnorm(x, 500, 50) on entry
 [1] 2 3 1 3 1 3 1 3 1 2 1 2 1 2 1
Tracing dnorm(x, 500, 50) on entry
 [1] -2 -3 -1 -3 -1 -3 -1 -3 -1 -2 -1 -2 -1 -2 -1
8.410947e-11 with absolute error < 1.6e-10

You are asking integrate to find the relatively tiny portion of
the the floating point real line (from c. -10^300 to 10^300)
on which dnorm(x) is bigger than c. 10^-300.  It found a few
points where it was that big, but apparently not enough to home
on the answer.  You need to give it better hints abouts dnorm's
support region.  E.g.,

> integrate(function(x) dnorm(x, 500,50), -100, 900)
Tracing dnorm(x, 500, 50) on entry
 [1] 400 -87 887 -33 833  60 740 183 617 326 474 -98 898 -65 865  10 790
119 681
[20] 253 547
Tracing dnorm(x, 500, 50) on entry
 [1] 150 -93 393 -66 366 -20 320  42 258 113 187 -99 399 -83 383 -45 345
9 291
[20]  76 224
Tracing dnorm(x, 500, 50) on entry
 [1] 650 407 893 434 866 480 820 542 758 613 687 401 899 417 883 455 845
509 791
[20] 576 724
Tracing dnorm(x, 500, 50) on entry
 [1] 525 403 647 417 633 440 610 471 579 506 544 401 649 409 641 427 623
455 595
[20] 488 562
Tracing dnorm(x, 500, 50) on entry
 [1] 775 653 897 667 883 690 860 721 829 756 794 651 899 659 891 677 873
705 845
[20] 738 812
1 with absolute error < 4.0e-07

Making the limits +-10^4 works ok also, but +-Inf is apparently
asking for too much.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Doran, Harold
> Sent: Thursday, December 02, 2010 1:22 PM
> To: [hidden email]
> Subject: [R] Integral of PDF
>
> The integral of any probability density from -Inf to Inf
> should equal 1, correct? I don't understand last result below.
>
> > integrate(function(x) dnorm(x, 0,1), -Inf, Inf)
> 1 with absolute error < 9.4e-05
>
> > integrate(function(x) dnorm(x, 100,10), -Inf, Inf)
> 1 with absolute error < 0.00012
>
> > integrate(function(x) dnorm(x, 500,50), -Inf, Inf)
> 8.410947e-11 with absolute error < 1.6e-10
>
> > all.equal(integrate(function(x) dnorm(x, 500,50), -Inf,
> Inf)$value, 0)
> [1] TRUE
>
> > sessionInfo()
> R version 2.10.1 (2009-12-14)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252  
> LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] statmod_1.4.6      mlmRev_0.99875-1   lme4_0.999375-35  
> Matrix_0.999375-33 lattice_0.17-26
>
> loaded via a namespace (and not attached):
> [1] grid_2.10.1   nlme_3.1-96   stats4_2.10.1 tools_2.10.1
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

Ravi Varadhan
In reply to this post by Hans W Borchers
But that only postpones the misery, Hans Werner!  You can always make the
mean large enough to get a wrong answer from `integrate'.  

integrate(function(x) dnorm(x, 700,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)

integrate(function(x) dnorm(x, -700,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)

The problem is that the quadrature algorithm is not smart enough to
recognize that the center of mass is at `mean'.  The QUADPACK algorithm
(Gauss-Kronrod quadratutre) is adaptive, but it does not look to identify
regions of high mass.  Algorithms which can locate regions of highest
density, such as those developed by Alan Genz, will do much better in
problems like this.

Genz and Kass (1997).  J Computational Graphics and Statistics.

There is a FORTRAN routine DCHURE that you might want to try for infinite
regions.  I don't think this has been ported to R (although I may be wrong).
There may be other more recent ones.

A simple solution is to locate the mode of the integrand, which should be
quite easy to do, and then do a coordinate shift to that point and then
integrate the mean-shifted integrand using `integrate'.

Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: [hidden email]


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On
Behalf Of Hans W Borchers
Sent: Thursday, December 02, 2010 5:16 PM
To: [hidden email]
Subject: Re: [R] Integral of PDF


You can dive into the thread "puzzle with integrate over infinite range"
from September this year. The short answer appears to be: Increase the
error tolerance.

    integrate(function(x) dnorm(x, 500,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)
    # 1 with absolute error < 1.1e-12

Hans Werner



Doran, Harold wrote:

>
> The integral of any probability density from -Inf to Inf should equal 1,
> correct? I don't understand last result below.
>
>> integrate(function(x) dnorm(x, 0,1), -Inf, Inf)
> 1 with absolute error < 9.4e-05
>
>> integrate(function(x) dnorm(x, 100,10), -Inf, Inf)
> 1 with absolute error < 0.00012
>
>> integrate(function(x) dnorm(x, 500,50), -Inf, Inf)
> 8.410947e-11 with absolute error < 1.6e-10
>
>> all.equal(integrate(function(x) dnorm(x, 500,50), -Inf, Inf)$value, 0)
> [1] TRUE
>
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] statmod_1.4.6      mlmRev_0.99875-1   lme4_0.999375-35  
> Matrix_0.999375-33 lattice_0.17-26
>
> loaded via a namespace (and not attached):
> [1] grid_2.10.1   nlme_3.1-96   stats4_2.10.1 tools_2.10.1
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
View this message in context:
http://r.789695.n4.nabble.com/Integral-of-PDF-tp3070243p3070315.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

Ted Harding
In reply to this post by jones-2
Albyn's reply is in line with an hypothesis I was beginning to
formulate (without looking at the underlying FoRTRAN code),
prompted by the hint in '?integrate':

  Details:
     If one or both limits are infinite, the infinite range
     is mapped onto a finite interval.
     For a finite interval, globally adaptive interval
     subdivision is used in connection with extrapolation
     by the Epsilon algorithm.

as a result of some numerical experiments. Following up on
Harold Doran's original dnorm(x,mean.mean/10) pattern (and
quoting a small subset of the experiments ... ):

  integrate(function(x) dnorm(x, 100,10), -Inf, Inf)
  # 1 with absolute error < 0.00012
  integrate(function(x) dnorm(x, 200,20), -Inf, Inf)
  # 1.429508e-08 with absolute error < 2.8e-08

  integrate(function(x) dnorm(x, 140,14), -Inf, Inf)
  # 1 with absolute error < 2.2e-06
  integrate(function(x) dnorm(x, 150,15), -Inf, Inf)
  # 2.261582e-05 with absolute error < 4.4e-05

  integrate(function(x) dnorm(x, 144,14.4), -Inf, Inf)
  # 1 with absolute error < 1.7e-06
  integrate(function(x) dnorm(x, 145,14.5), -Inf, Inf)
  # 5.447138e-05 with absolute error < 0.00011

  integrate(function(x) dnorm(x, 150,15), -33000, 33300)
  # 1 with absolute error < 0.00012
  integrate(function(x) dnorm(x, 150,15), -34000, 34300)
  # 5.239671e-05 with absolute error < 0.00010

  integrate(function(x) dnorm(x, 150,15),
            -33768.260234277, 34068.2602334277)
  # 0.5000307 with absolute error < 6.1e-05
  integrate(function(x) dnorm(x, 150,15),
            -33768.260234278, 34068.2602334278)
  # 6.139415e-05 with absolute error < 0.00012

So it seems that, depending on how integrate() "maps to a finite
interval", and on how the algorithm goes about its "adaptive
interval subdivision", there are critical points where integrate()
switches from one to another of the following:

[A] The whole of a finite interval containing all but the extreme
      outer tails of dnorm() is integrated over;
[B] The whole of a finite interval containing one half of the
      distribution of dnorm() is integrated over;
[C] The whole of a finite interval lying in the extreme of one
      tail of the dnorm distribution is integrated over.

with result: [A] close to 1; [B] close to 0.5; [C] close to 0.

This is compatible with Albyn's conclusion that the integral is
split into the sum of (-Inf,0) and (0,Inf), with the algorithm
ignoring one, or the other, or both, or neither!

This must be one of the nastiest exemplar's of "rounding error" ever!!!
Ted.

On 02-Dec-10 22:39:37, Albyn Jones wrote:

> To really understaand it you will have to look at the fortran code
> underlying integrate.  I tracked it back through a couple of layers
> (dqagi, dqagie, ...  just use google, these are old netlib
> subroutines) then decided I ought to get back to grading papers :-)
>
> It looks like the integral is split into the sum of two integrals,
> (-Inf,0] and [0,Inf).  
>
>
>> integrate(function(x) dnorm(x, 350,50), 0, Inf)
> 1 with absolute error < 1.5e-05
>> integrate(function(x) dnorm(x, 400,50), 0, Inf)
> 1.068444e-05 with absolute error < 2.1e-05
>> integrate(function(x) dnorm(x, 500,50), 0, Inf)
> 8.410947e-11 with absolute error < 1.6e-10
>> integrate(function(x) dnorm(x, 500,50), 0, 1000)
> 1 with absolute error < 7.4e-05
>
> The integral from 0 to Inf is the lim_{x -> Inf} of the integral
> over [0,x].  I suspect that the algorithm is picking an interval
> [0,x], evaluating the integral over that interval, and then performing
> some test to decide whether to expand the interval.  When the initial
> interval doesn't contain much, expanding a little won't add enough to
> trigger another iteration.  
>
> albyn
>
> On Thu, Dec 02, 2010 at 04:21:34PM -0500, Doran, Harold wrote:
>> The integral of any probability density from -Inf to Inf should equal
>> 1, correct? I don't understand last result below.
>>
>> > integrate(function(x) dnorm(x, 0,1), -Inf, Inf)
>> 1 with absolute error < 9.4e-05
>>
>> > integrate(function(x) dnorm(x, 100,10), -Inf, Inf)
>> 1 with absolute error < 0.00012
>>
>> > integrate(function(x) dnorm(x, 500,50), -Inf, Inf)
>> 8.410947e-11 with absolute error < 1.6e-10
>>
>> > all.equal(integrate(function(x) dnorm(x, 500,50), -Inf, Inf)$value,
>> > 0)
>> [1] TRUE
>>
>> > sessionInfo()
>> R version 2.10.1 (2009-12-14)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>> States.1252
>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] statmod_1.4.6      mlmRev_0.99875-1   lme4_0.999375-35  
>> Matrix_0.999375-33 lattice_0.17-26
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.10.1   nlme_3.1-96   stats4_2.10.1 tools_2.10.1
>>
>>      [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Albyn Jones
> Reed College
> [hidden email]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Dec-10                                       Time: 23:26:44
------------------------------ XFMail ------------------------------

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

jones-2
In reply to this post by Ravi Varadhan
On Thu, Dec 02, 2010 at 06:23:45PM -0500, Ravi Varadhan wrote:

> A simple solution is to locate the mode of the integrand, which should be
> quite easy to do, and then do a coordinate shift to that point and then
> integrate the mean-shifted integrand using `integrate'.
>
> Ravi.

Translation:  think before you compute!

albyn

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

Hans W Borchers
In reply to this post by Ravi Varadhan
That does not remedy the situation in any case, take the following function

        fun <- function(x) dnorm(x, -500, 50) + dnorm(x, 500, 50)

that has a 'mode' of 0 again. Interestingly, if I transform it by 1/x, to
integrate I again have to reduce the error tolerance to at least 1e-10:

        f <- function(x) fun(1/x)/x^2
        integrate(f, 0, 1)             #-> 0, should be 1

while cubature::adaptIntegrate does not require to change the tolerance:

        require(cubature)
        adaptIntegrate(f, 0, 1)        #-> 1

What I find *dangerous* about 'integrate' is not that it returns a wrong
result, but that it --- w/o warning --- gives an error estimate that leads
completely astray.

Hans Werner

<quote author="Ravi Varadhan">
But that only postpones the misery, Hans Werner!  You can always make the
mean large enough to get a wrong answer from `integrate'.  

integrate(function(x) dnorm(x, 700,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)

integrate(function(x) dnorm(x, -700,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)

The problem is that the quadrature algorithm is not smart enough to
recognize that the center of mass is at `mean'.  The QUADPACK algorithm
(Gauss-Kronrod quadratutre) is adaptive, but it does not look to identify
regions of high mass.  Algorithms which can locate regions of highest
density, such as those developed by Alan Genz, will do much better in
problems like this.

Genz and Kass (1997).  J Computational Graphics and Statistics.

There is a FORTRAN routine DCHURE that you might want to try for infinite
regions.  I don't think this has been ported to R (although I may be wrong).
There may be other more recent ones.

A simple solution is to locate the mode of the integrand, which should be
quite easy to do, and then do a coordinate shift to that point and then
integrate the mean-shifted integrand using `integrate'.

Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: rvaradhan@jhmi.edu
That does not remedy the situation in any case, take the following function

        fun <- function(x) dnorm(x, -500, 50) + dnorm(x, 500, 50)

that has a 'mode' of 0 again. Interestingly, if I transform it by 1/x, to
integrate I again have to reduce the error tolerance to at least 1e-10:

        f <- function(x) fun(1/x)/x^2
        integrate(f, 0, 1)             #-> 0, should be 1

while cubature::adaptIntegrate does not require to change the tolerance:

        require(cubature)
        adaptIntegrate(f, 0, 1)        #-> 1

What I find *dangerous* about 'integrate' is not that it returns a wrong
result, but that it --- w/o warning --- gives an error estimate that leads
completely astray.

Hans Werner


Ravi Varadhan wrote
But that only postpones the misery, Hans Werner!  You can always make the
mean large enough to get a wrong answer from `integrate'.  

integrate(function(x) dnorm(x, 700,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)

integrate(function(x) dnorm(x, -700,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)

The problem is that the quadrature algorithm is not smart enough to
recognize that the center of mass is at `mean'.  The QUADPACK algorithm
(Gauss-Kronrod quadratutre) is adaptive, but it does not look to identify
regions of high mass.  Algorithms which can locate regions of highest
density, such as those developed by Alan Genz, will do much better in
problems like this.

Genz and Kass (1997).  J Computational Graphics and Statistics.

There is a FORTRAN routine DCHURE that you might want to try for infinite
regions.  I don't think this has been ported to R (although I may be wrong).
There may be other more recent ones.

A simple solution is to locate the mode of the integrand, which should be
quite easy to do, and then do a coordinate shift to that point and then
integrate the mean-shifted integrand using `integrate'.

Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: rvaradhan@jhmi.edu
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

Samuel Le
In reply to this post by Doran, Harold
Hi,

You need to be careful with -Inf and Inf in R, I suppose they are some large numbers arbitrarily chosen by R, but not infinite quantities. So the function integrate can return errors if the function to integrate doesn't have negligible values beyond those large numbers.
I ran the following to convince myself:

> integrate(function(x){return(ifelse(x<500,0,1/(x^2)))},-Inf,Inf)
0 with absolute error < 0
> integrate(function(x){return(ifelse(x<500,0,1/(x^2)))},500,Inf)
0.001999999 with absolute error < 0.000011

The second result shouldn't be greater that the first one but...

And to solve your problem:
> integrate(function(x) dnorm(x, 500,50), 500-1000, 500+1000)
1 with absolute error < 0.000074

I centred the interval of integration on 500 instead of on 0 if you integrate between -Inf and Inf.

HTH,

Samuel

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Doran, Harold
Sent: 02 December 2010 21:22
To: [hidden email]
Subject: [R] Integral of PDF

The integral of any probability density from -Inf to Inf should equal 1, correct? I don't understand last result below.

> integrate(function(x) dnorm(x, 0,1), -Inf, Inf)
1 with absolute error < 9.4e-05

> integrate(function(x) dnorm(x, 100,10), -Inf, Inf)
1 with absolute error < 0.00012

> integrate(function(x) dnorm(x, 500,50), -Inf, Inf)
8.410947e-11 with absolute error < 1.6e-10

> all.equal(integrate(function(x) dnorm(x, 500,50), -Inf, Inf)$value, 0)
[1] TRUE

> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] statmod_1.4.6      mlmRev_0.99875-1   lme4_0.999375-35   Matrix_0.999375-33 lattice_0.17-26

loaded via a namespace (and not attached):
[1] grid_2.10.1   nlme_3.1-96   stats4_2.10.1 tools_2.10.1

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__________ Information from ESET NOD32 Antivirus, version of virus signature database 5668 (20101202) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com



__________ Information from ESET NOD32 Antivirus, version of virus signature database 5669 (20101203) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

Ravi Varadhan
In reply to this post by Hans W Borchers
Your function does NOT have a mode at zero.  It is bimodal with 2 modes:
-500 and 500.  So, my approach still works.  Zero is a stationary point
where the gradient is zero, but it is not a mode (the second derivative at
zero is not negative but it is zero).

Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: [hidden email]


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On
Behalf Of Hans W Borchers
Sent: Friday, December 03, 2010 3:30 AM
To: [hidden email]
Subject: Re: [R] Integral of PDF


That does not remedy the situation in any case, take the following function

        fun <- function(x) dnorm(x, -500, 50) + dnorm(x, 500, 50)

that has a 'mode' of 0 again. Interestingly, if I transform it by 1/x, to
integrate I again have to reduce the error tolerance to at least 1e-10:

        f <- function(x) fun(1/x)/x^2
        integrate(f, 0, 1)             #-> 0, should be 1

while cubature::adaptIntegrate does not require to change the tolerance:

        require(cubature)
        adaptIntegrate(f, 0, 1)        #-> 1

What I find *dangerous* about 'integrate' is not that it returns a wrong
result, but that it --- w/o warning --- gives an error estimate that leads
completely astray.

Hans Werner


But that only postpones the misery, Hans Werner!  You can always make the
mean large enough to get a wrong answer from `integrate'.  

integrate(function(x) dnorm(x, 700,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)

integrate(function(x) dnorm(x, -700,50), -Inf, Inf,
              subdivisions=500, rel.tol=1e-11)

The problem is that the quadrature algorithm is not smart enough to
recognize that the center of mass is at `mean'.  The QUADPACK algorithm
(Gauss-Kronrod quadratutre) is adaptive, but it does not look to identify
regions of high mass.  Algorithms which can locate regions of highest
density, such as those developed by Alan Genz, will do much better in
problems like this.

Genz and Kass (1997).  J Computational Graphics and Statistics.

There is a FORTRAN routine DCHURE that you might want to try for infinite
regions.  I don't think this has been ported to R (although I may be wrong).
There may be other more recent ones.

A simple solution is to locate the mode of the integrand, which should be
quite easy to do, and then do a coordinate shift to that point and then
integrate the mean-shifted integrand using `integrate'.

Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: [hidden email]
That does not remedy the situation in any case, take the following function

        fun <- function(x) dnorm(x, -500, 50) + dnorm(x, 500, 50)

that has a 'mode' of 0 again. Interestingly, if I transform it by 1/x, to
integrate I again have to reduce the error tolerance to at least 1e-10:

        f <- function(x) fun(1/x)/x^2
        integrate(f, 0, 1)             #-> 0, should be 1

while cubature::adaptIntegrate does not require to change the tolerance:

        require(cubature)
        adaptIntegrate(f, 0, 1)        #-> 1

What I find *dangerous* about 'integrate' is not that it returns a wrong
result, but that it --- w/o warning --- gives an error estimate that leads
completely astray.

Hans Werner



Ravi Varadhan wrote:

>
> But that only postpones the misery, Hans Werner!  You can always make the
> mean large enough to get a wrong answer from `integrate'.  
>
> integrate(function(x) dnorm(x, 700,50), -Inf, Inf,
>               subdivisions=500, rel.tol=1e-11)
>
> integrate(function(x) dnorm(x, -700,50), -Inf, Inf,
>               subdivisions=500, rel.tol=1e-11)
>
> The problem is that the quadrature algorithm is not smart enough to
> recognize that the center of mass is at `mean'.  The QUADPACK algorithm
> (Gauss-Kronrod quadratutre) is adaptive, but it does not look to identify
> regions of high mass.  Algorithms which can locate regions of highest
> density, such as those developed by Alan Genz, will do much better in
> problems like this.
>
> Genz and Kass (1997).  J Computational Graphics and Statistics.
>
> There is a FORTRAN routine DCHURE that you might want to try for infinite
> regions.  I don't think this has been ported to R (although I may be
> wrong).
> There may be other more recent ones.
>
> A simple solution is to locate the mode of the integrand, which should be
> quite easy to do, and then do a coordinate shift to that point and then
> integrate the mean-shifted integrand using `integrate'.
>
> Ravi.
>
> -------------------------------------------------------
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology School of Medicine Johns
> Hopkins University
>
> Ph. (410) 502-2619
> email: [hidden email]
>
--
View this message in context:
http://r.789695.n4.nabble.com/Integral-of-PDF-tp3070243p3070768.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Integral of PDF

Ravi Varadhan
In reply to this post by jones-2
Yes, Albyn.  I do not think that this is a dangerous behavior of the tool
(`integrate').  It is certainly a "dangerous use" of the tool.  One will be
hard-pressed to find a numerical algorithm/software that is fool-proof in
the sense that it always gives you either the correct results or warns you
that the results could be wrong for your problem.  We can always devise a
problem to defeat even the most robust algorithm.  So, I would argue that
the ultimate responsibility lies with the user, via careful thinking/prior
analysis/ and multiple types of checks, to ensure that the results are
reasonably accurate.  

This, of course, does not mean that `integrate' gets away scot free.  I am
sure it could be improved, but I am also quite certain that there are better
quadrature algorithms.  So, one needs to choose appropriate algorithms to
suit their problem needs.  Perhaps, the simplest thing to do for `integrate'
might be to print a generic warning like:  "The error estimate is not always
reliable".  Of course, this does not help the user since they would not know
when it is NOT reliable. The next thing to try might be to examine how it
picks the initial interval [0, x], and see if this procedure can be
improved.  

Best,
Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: [hidden email]


-----Original Message-----
From: Albyn Jones [mailto:[hidden email]]
Sent: Thursday, December 02, 2010 6:41 PM
To: Ravi Varadhan
Cc: 'Hans W Borchers'; [hidden email]
Subject: Re: [R] Integral of PDF

On Thu, Dec 02, 2010 at 06:23:45PM -0500, Ravi Varadhan wrote:

> A simple solution is to locate the mode of the integrand, which should be
> quite easy to do, and then do a coordinate shift to that point and then
> integrate the mean-shifted integrand using `integrate'.
>
> Ravi.

Translation:  think before you compute!

albyn

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.