

Dear list,
I'm calculating the integral of a Gaussian function from 0 to
infinity. I understand from ?integrate that it's usually better to
specify Inf explicitly as a limit rather than an arbitrary large
number, as in this case integrate() performs a trick to do the
integration better.
However, I do not understand the following, if I shift the Gauss
function by some amount the integral should not be affected,
shiftedGauss < function(x0=500){
integrate(function(x) exp((xx0)^2/100^2), 0, Inf)$value
}
shift < seq(500, 800, by=10)
plot(shift, sapply(shift, shiftedGauss))
Suddenly, just after 700, the value of the integral drops to nearly 0
when it should be constant all the way. Any clue as to what's going on
here? I guess it's suddenly missing the important part of the range
where the integrand is nonzero, but how could this be overcome?
Regards,
baptiste
sessionInfo()
R version 2.11.1 (20100531)
x86_64appledarwin9.8.0
locale:
[1] en_GB.UTF8/en_GB.UTF8/C/C/en_GB.UTF8/en_GB.UTF8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] inline_0.3.5 RcppArmadillo_0.2.6 Rcpp_0.8.6
statmod_1.4.6
loaded via a namespace (and not attached):
[1] tools_2.11.1
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


There is nothing mysterious. You need to increase the accuracy of
quadrature by decreasing the error tolerance:
# I scaled your function to a proper Gaussian density
shiftedGauss < function(x0=500){
integrate(function(x) 1/sqrt(2*pi * 100^2) * exp((xx0)^2/(2*100^2)), 0,
Inf, rel.tol=1.e07)$value }
shift < seq(500, 800, by=10)
plot(shift, sapply(shift, shiftedGauss))
Hope this helps,
Ravi.
Original Message
From: [hidden email] [mailto: [hidden email]] On
Behalf Of baptiste auguie
Sent: Tuesday, September 21, 2010 8:38 AM
To: rhelp
Subject: [R] puzzle with integrate over infinite range
Dear list,
I'm calculating the integral of a Gaussian function from 0 to
infinity. I understand from ?integrate that it's usually better to
specify Inf explicitly as a limit rather than an arbitrary large
number, as in this case integrate() performs a trick to do the
integration better.
However, I do not understand the following, if I shift the Gauss
function by some amount the integral should not be affected,
shiftedGauss < function(x0=500){
integrate(function(x) exp((xx0)^2/100^2), 0, Inf)$value
}
shift < seq(500, 800, by=10)
plot(shift, sapply(shift, shiftedGauss))
Suddenly, just after 700, the value of the integral drops to nearly 0
when it should be constant all the way. Any clue as to what's going on
here? I guess it's suddenly missing the important part of the range
where the integrand is nonzero, but how could this be overcome?
Regards,
baptiste
sessionInfo()
R version 2.11.1 (20100531)
x86_64appledarwin9.8.0
locale:
[1] en_GB.UTF8/en_GB.UTF8/C/C/en_GB.UTF8/en_GB.UTF8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] inline_0.3.5 RcppArmadillo_0.2.6 Rcpp_0.8.6
statmod_1.4.6
loaded via a namespace (and not attached):
[1] tools_2.11.1
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi,
Thanks for the tip, but it's still mysterious to me. Reading ?integrate did not give me much hint as to what "relative accuracy" means in this context. I looked at the source of integrate.c but it's still not clear to me how the default value of rel.tol (10^4 for me) is not enough to prevent a completely wrong answer (the error is much larger than this).
Obviously, I'm worried now that I may not always choose a good value of ref.tol, if picked arbitrarily without my understanding of what it means.
Thanks,
baptiste
On Sep 21, 2010, at 3:38 PM, Ravi Varadhan wrote:
> There is nothing mysterious. You need to increase the accuracy of
> quadrature by decreasing the error tolerance:
>
> # I scaled your function to a proper Gaussian density
> shiftedGauss < function(x0=500){
> integrate(function(x) 1/sqrt(2*pi * 100^2) * exp((xx0)^2/(2*100^2)), 0,
> Inf, rel.tol=1.e07)$value }
>
> shift < seq(500, 800, by=10)
> plot(shift, sapply(shift, shiftedGauss))
>
>
> Hope this helps,
> Ravi.
>
> Original Message
> From: [hidden email] [mailto: [hidden email]] On
> Behalf Of baptiste auguie
> Sent: Tuesday, September 21, 2010 8:38 AM
> To: rhelp
> Subject: [R] puzzle with integrate over infinite range
>
> Dear list,
>
> I'm calculating the integral of a Gaussian function from 0 to
> infinity. I understand from ?integrate that it's usually better to
> specify Inf explicitly as a limit rather than an arbitrary large
> number, as in this case integrate() performs a trick to do the
> integration better.
>
> However, I do not understand the following, if I shift the Gauss
> function by some amount the integral should not be affected,
>
> shiftedGauss < function(x0=500){
> integrate(function(x) exp((xx0)^2/100^2), 0, Inf)$value
> }
>
> shift < seq(500, 800, by=10)
> plot(shift, sapply(shift, shiftedGauss))
>
> Suddenly, just after 700, the value of the integral drops to nearly 0
> when it should be constant all the way. Any clue as to what's going on
> here? I guess it's suddenly missing the important part of the range
> where the integrand is nonzero, but how could this be overcome?
>
> Regards,
>
> baptiste
>
>
> sessionInfo()
> R version 2.11.1 (20100531)
> x86_64appledarwin9.8.0
>
> locale:
> [1] en_GB.UTF8/en_GB.UTF8/C/C/en_GB.UTF8/en_GB.UTF8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] inline_0.3.5 RcppArmadillo_0.2.6 Rcpp_0.8.6
> statmod_1.4.6
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.1
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


You could try pnorm also:
shiftedGaussR < function(x0 = 500) {
sd < 100/sqrt(2)
int < pnorm(0, x0, sd, lower.tail=FALSE, log.p=TRUE)
exp(int + log(sd) + 0.5 * log(2*pi))
}
> shiftedGaussR(500)
[1] 177.2454
> shiftedGauss(500)
[1] 177.2454
Matt
On Tue, 20100921 at 09:38 0400, Ravi Varadhan wrote:
> There is nothing mysterious. You need to increase the accuracy of
> quadrature by decreasing the error tolerance:
>
> # I scaled your function to a proper Gaussian density
> shiftedGauss < function(x0=500){
> integrate(function(x) 1/sqrt(2*pi * 100^2) * exp((xx0)^2/(2*100^2)), 0,
> Inf, rel.tol=1.e07)$value }
>
> shift < seq(500, 800, by=10)
> plot(shift, sapply(shift, shiftedGauss))
>
>
> Hope this helps,
> Ravi.
>
> Original Message
> From: [hidden email] [mailto: [hidden email]] On
> Behalf Of baptiste auguie
> Sent: Tuesday, September 21, 2010 8:38 AM
> To: rhelp
> Subject: [R] puzzle with integrate over infinite range
>
> Dear list,
>
> I'm calculating the integral of a Gaussian function from 0 to
> infinity. I understand from ?integrate that it's usually better to
> specify Inf explicitly as a limit rather than an arbitrary large
> number, as in this case integrate() performs a trick to do the
> integration better.
>
> However, I do not understand the following, if I shift the Gauss
> function by some amount the integral should not be affected,
>
> shiftedGauss < function(x0=500){
> integrate(function(x) exp((xx0)^2/100^2), 0, Inf)$value
> }
>
> shift < seq(500, 800, by=10)
> plot(shift, sapply(shift, shiftedGauss))
>
> Suddenly, just after 700, the value of the integral drops to nearly 0
> when it should be constant all the way. Any clue as to what's going on
> here? I guess it's suddenly missing the important part of the range
> where the integrand is nonzero, but how could this be overcome?
>
> Regards,
>
> baptiste
>
>
> sessionInfo()
> R version 2.11.1 (20100531)
> x86_64appledarwin9.8.0
>
> locale:
> [1] en_GB.UTF8/en_GB.UTF8/C/C/en_GB.UTF8/en_GB.UTF8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] inline_0.3.5 RcppArmadillo_0.2.6 Rcpp_0.8.6
> statmod_1.4.6
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.1
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.

Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


You are dealing with functions that are nonzero over a very small interval,
so you have to be very careful. There is no method that is going to be
totally foolproof. Having said that, I have always felt that the default
tolerance in integrate is too liberal (i.e. too large). I always use
rel.tol of 1.e08 (roughly, sqrt(machine epsilon)) in my computations, and I
also increase subdivisions to 500.
Ravi.
From: baptiste Auguié [mailto: [hidden email]]
Sent: Tuesday, September 21, 2010 9:58 AM
To: Ravi Varadhan
Cc: 'baptiste auguie'; 'rhelp'
Subject: Re: [R] puzzle with integrate over infinite range
Hi,
Thanks for the tip, but it's still mysterious to me. Reading ?integrate did
not give me much hint as to what "relative accuracy" means in this context.
I looked at the source of integrate.c but it's still not clear to me how the
default value of rel.tol (10^4 for me) is not enough to prevent a
completely wrong answer (the error is much larger than this).
Obviously, I'm worried now that I may not always choose a good value of
ref.tol, if picked arbitrarily without my understanding of what it means.
Thanks,
baptiste
On Sep 21, 2010, at 3:38 PM, Ravi Varadhan wrote:
There is nothing mysterious. You need to increase the accuracy of
quadrature by decreasing the error tolerance:
# I scaled your function to a proper Gaussian density
shiftedGauss < function(x0=500){
integrate(function(x) 1/sqrt(2*pi * 100^2) * exp((xx0)^2/(2*100^2)), 0,
Inf, rel.tol=1.e07)$value }
shift < seq(500, 800, by=10)
plot(shift, sapply(shift, shiftedGauss))
Hope this helps,
Ravi.
Original Message
From: [hidden email] [mailto: [hidden email]] On
Behalf Of baptiste auguie
Sent: Tuesday, September 21, 2010 8:38 AM
To: rhelp
Subject: [R] puzzle with integrate over infinite range
Dear list,
I'm calculating the integral of a Gaussian function from 0 to
infinity. I understand from ?integrate that it's usually better to
specify Inf explicitly as a limit rather than an arbitrary large
number, as in this case integrate() performs a trick to do the
integration better.
However, I do not understand the following, if I shift the Gauss
function by some amount the integral should not be affected,
shiftedGauss < function(x0=500){
integrate(function(x) exp((xx0)^2/100^2), 0, Inf)$value
}
shift < seq(500, 800, by=10)
plot(shift, sapply(shift, shiftedGauss))
Suddenly, just after 700, the value of the integral drops to nearly 0
when it should be constant all the way. Any clue as to what's going on
here? I guess it's suddenly missing the important part of the range
where the integrand is nonzero, but how could this be overcome?
Regards,
baptiste
sessionInfo()
R version 2.11.1 (20100531)
x86_64appledarwin9.8.0
locale:
[1] en_GB.UTF8/en_GB.UTF8/C/C/en_GB.UTF8/en_GB.UTF8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] inline_0.3.5 RcppArmadillo_0.2.6 Rcpp_0.8.6
statmod_1.4.6
loaded via a namespace (and not attached):
[1] tools_2.11.1
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Thanks, I'll do that too from now on.
It strikes me that in a case such as this one it may be safer to use a truncated, finite interval around the region where the integrand is nonzero, rather than following the advice of ?integrate to use Inf as integration limit. At least one wouldn't risk to get an entirely wrong result depending on a choice of rel.tol. Regarding this parameter, is there a simple interpretation of how it affected the result in the context of my example?
Thanks again,
baptiste
On Sep 21, 2010, at 4:04 PM, Ravi Varadhan wrote:
> You are dealing with functions that are nonzero over a very small interval, so you have to be very careful. There is no method that is going to be totally foolproof. Having said that, I have always felt that the default tolerance in integrate is too liberal (i.e. too large). I always use rel.tol of 1.e08 (roughly, sqrt(machine epsilon)) in my computations, and I also increase subdivisions to 500.
>
> Ravi.
>
> From: baptiste Auguié [mailto: [hidden email]]
> Sent: Tuesday, September 21, 2010 9:58 AM
> To: Ravi Varadhan
> Cc: 'baptiste auguie'; 'rhelp'
> Subject: Re: [R] puzzle with integrate over infinite range
>
> Hi,
>
> Thanks for the tip, but it's still mysterious to me. Reading ?integrate did not give me much hint as to what "relative accuracy" means in this context. I looked at the source of integrate.c but it's still not clear to me how the default value of rel.tol (10^4 for me) is not enough to prevent a completely wrong answer (the error is much larger than this).
> Obviously, I'm worried now that I may not always choose a good value of ref.tol, if picked arbitrarily without my understanding of what it means.
>
> Thanks,
>
> baptiste
>
>
> On Sep 21, 2010, at 3:38 PM, Ravi Varadhan wrote:
>
>
> There is nothing mysterious. You need to increase the accuracy of
> quadrature by decreasing the error tolerance:
>
> # I scaled your function to a proper Gaussian density
> shiftedGauss < function(x0=500){
> integrate(function(x) 1/sqrt(2*pi * 100^2) * exp((xx0)^2/(2*100^2)), 0,
> Inf, rel.tol=1.e07)$value }
>
> shift < seq(500, 800, by=10)
> plot(shift, sapply(shift, shiftedGauss))
>
>
> Hope this helps,
> Ravi.
>
> Original Message
> From: [hidden email] [mailto: [hidden email]] On
> Behalf Of baptiste auguie
> Sent: Tuesday, September 21, 2010 8:38 AM
> To: rhelp
> Subject: [R] puzzle with integrate over infinite range
>
> Dear list,
>
> I'm calculating the integral of a Gaussian function from 0 to
> infinity. I understand from ?integrate that it's usually better to
> specify Inf explicitly as a limit rather than an arbitrary large
> number, as in this case integrate() performs a trick to do the
> integration better.
>
> However, I do not understand the following, if I shift the Gauss
> function by some amount the integral should not be affected,
>
> shiftedGauss < function(x0=500){
> integrate(function(x) exp((xx0)^2/100^2), 0, Inf)$value
> }
>
> shift < seq(500, 800, by=10)
> plot(shift, sapply(shift, shiftedGauss))
>
> Suddenly, just after 700, the value of the integral drops to nearly 0
> when it should be constant all the way. Any clue as to what's going on
> here? I guess it's suddenly missing the important part of the range
> where the integrand is nonzero, but how could this be overcome?
>
> Regards,
>
> baptiste
>
>
> sessionInfo()
> R version 2.11.1 (20100531)
> x86_64appledarwin9.8.0
>
> locale:
> [1] en_GB.UTF8/en_GB.UTF8/C/C/en_GB.UTF8/en_GB.UTF8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] inline_0.3.5 RcppArmadillo_0.2.6 Rcpp_0.8.6
> statmod_1.4.6
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.1
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
>
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On Tue, 21 Sep 2010, baptiste Auguié wrote:
> Thanks, I'll do that too from now on.
> It strikes me that in a case such as this one it may be safer to use a truncated, finite interval around the region where the integrand is nonzero, rather than following the advice of ?integrate to use Inf as integration limit. At least one wouldn't risk to get an entirely wrong result depending on a choice of rel.tol. Regarding this parameter, is there a simple interpretation of how it affected the result in the context of my example?
>
Not really.
If you know where the integrand is nonzero then you can shift it so that integrate() can handle it. If you don't know then you can't get the truncated interval right.
The truncation approach works well for the Normal density because it it is nonnegative, symmetric, and has nearly bounded support. The truncation error goes down extremely fast and if the mode of the density is in the center of the interval then all the mass can easily be found. If you have a function with multiple modes and heavier tails it is harder to get an interval that is large enough to make the truncation error small, and still allows the integrate() function to find all the mass.
thomas
Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


I see, thank you.
I'm still worried by the very dramatic error I obtained just from
shifting so slightly the support of the integrand, it took me a while
to figure what happened even with this basic example (I knew the
integral couldn't be so small!).
For a general integration in [0, infty), there must be a link between
the number of quadrature points, the transformation applied to the
integrand, and the region (position, width) where the measurable
integrand needs to be in order to be sampled by the quadrature. I
guess this kind of info lies deep inside the source code though.
Thanks,
baptiste
On 21 September 2010 19:00, Thomas Lumley < [hidden email]> wrote:
> On Tue, 21 Sep 2010, baptiste Auguié wrote:
>
>> Thanks, I'll do that too from now on.
>> It strikes me that in a case such as this one it may be safer to use a
>> truncated, finite interval around the region where the integrand is
>> nonzero, rather than following the advice of ?integrate to use Inf as
>> integration limit. At least one wouldn't risk to get an entirely wrong
>> result depending on a choice of rel.tol. Regarding this parameter, is there
>> a simple interpretation of how it affected the result in the context of my
>> example?
>>
>
> Not really.
>
> If you know where the integrand is nonzero then you can shift it so that
> integrate() can handle it. If you don't know then you can't get the
> truncated interval right.
>
> The truncation approach works well for the Normal density because it it is
> nonnegative, symmetric, and has nearly bounded support. The truncation
> error goes down extremely fast and if the mode of the density is in the
> center of the interval then all the mass can easily be found. If you have
> a function with multiple modes and heavier tails it is harder to get an
> interval that is large enough to make the truncation error small, and still
> allows the integrate() function to find all the mass.
>
> thomas
>
> Thomas Lumley
> Professor of Biostatistics
> University of Washington, Seattle
>

____________________
Dr. Baptiste Auguié
Departamento de Química Física,
Universidade de Vigo,
Campus Universitario, 36310, Vigo, Spain
tel: +34 9868 18617
http://webs.uvigo.es/coloides______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On 21/09/2010 1:29 PM, baptiste auguie wrote:
> I see, thank you.
>
> I'm still worried by the very dramatic error I obtained just from
> shifting so slightly the support of the integrand, it took me a while
> to figure what happened even with this basic example (I knew the
> integral couldn't be so small!).
>
> For a general integration in [0, infty), there must be a link between
> the number of quadrature points, the transformation applied to the
> integrand, and the region (position, width) where the measurable
> integrand needs to be in order to be sampled by the quadrature. I
> guess this kind of info lies deep inside the source code though.
You probably do have to look at the source code to find out the rule,
but it is very easy to find out how it applies for any particular set of
parameters: Just have your objective function print or save all the
points where it is asked to evaluate itself. Plotting those will be
very informative.
Duncan Murdoch
> Thanks,
>
> baptiste
>
>
>
> On 21 September 2010 19:00, Thomas Lumley< [hidden email]> wrote:
> > On Tue, 21 Sep 2010, baptiste Auguié wrote:
> >
> >> Thanks, I'll do that too from now on.
> >> It strikes me that in a case such as this one it may be safer to use a
> >> truncated, finite interval around the region where the integrand is
> >> nonzero, rather than following the advice of ?integrate to use Inf as
> >> integration limit. At least one wouldn't risk to get an entirely wrong
> >> result depending on a choice of rel.tol. Regarding this parameter, is there
> >> a simple interpretation of how it affected the result in the context of my
> >> example?
> >>
> >
> > Not really.
> >
> > If you know where the integrand is nonzero then you can shift it so that
> > integrate() can handle it. If you don't know then you can't get the
> > truncated interval right.
> >
> > The truncation approach works well for the Normal density because it it is
> > nonnegative, symmetric, and has nearly bounded support. The truncation
> > error goes down extremely fast and if the mode of the density is in the
> > center of the interval then all the mass can easily be found. If you have
> > a function with multiple modes and heavier tails it is harder to get an
> > interval that is large enough to make the truncation error small, and still
> > allows the integrate() function to find all the mass.
> >
> > thomas
> >
> > Thomas Lumley
> > Professor of Biostatistics
> > University of Washington, Seattle
> >
>
>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

