Hello,
I was interested in trying to write an R script to calculate a UCL for a lognormal distribution using the Chebyshev Inequality MVUE Approach (based on EPAs guidance found in http://www.epa.gov/oswer/riskassessment/pdf/ucl.pdf). This looks like it should be straight forward, but I am need to calculate an MVUE for the population mean and an MVUE for the population variance, which requires a value (g_n) from a table A7, found in Aitchison and Brown (1969): The lognormal distribution. I have looked across the RSiteSearch and can not seem to find a function that will give me g_n or the MVUE for mean and variance of lognormal distribution. Is there an R function that will give me g_n or will calculate an MVUE for the population mean and variance for the lognormal distribution? VR Jim James T. Durant, MSPH CIH Emergency Response Coordinator US Agency for Toxic Substances and Disease Registry Atlanta, GA 30341 770-378-1695 [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
On 10-Jul-11 16:27:04, Durant, James T. (ATSDR/DTEM/PRMSB) wrote:
> Hello, > I was interested in trying to write an R script to calculate a > UCL for a lognormal distribution using the Chebyshev Inequality > -- MVUE Approach (based on EPAs guidance found in > http://www.epa.gov/oswer/riskassessment/pdf/ucl.pdf). > This looks like it should be straight forward, but I am need to > calculate an MVUE for the population mean and an MVUE for the > population variance, which requires a value (g_n) from a table A7, > found in Aitchison and Brown (1969): The lognormal distribution. > I have looked across the RSiteSearch and can not seem to find a > function that will give me g_n or the MVUE for mean and variance > of lognormal distribution. > > Is there an R function that will give me g_n or will calculate > an MVUE for the population mean and variance for the lognormal > distribution? > > VR > Jim > James T. Durant, MSPH CIH > Emergency Response Coordinator > US Agency for Toxic Substances and Disease Registry > Atlanta, GA 30341 > 770-378-1695 Some quick comments. I will try to repond more fully later. 1. The Chebyshev inequality is usually very conservative. As a simple example, consider X with a negative exponential distribution with density exp(x), so that the population mean is 1 and the population variance is also 1. Then, for a factor K, Chebyshev says that Prob(|X-1] > K*1) < 1/(K^2). This is only informative if K>1. So (e.g.) take K=2. Then the Chebyshev result is that this Prob < 1/4. HOwever, because X is positive, the event in question is X > 1 + 2 = 3 so Prob is exp(-3) = 0.0498 < 1/20. The reference you cite suggests ("Exhibit 5") applying the method to log-transformed data, which for lognormal data would be normally distributed. So apply Chebyshev to N(0,1) (mean=0, var=1). Then Prob(|X-0| > K*1) < 1/(K^2) as before. Now take K=2 again (i.e. outside +/- 2 SDs, so Prob approx=0.05). But Chebyshev still says "Prob < 1/4 = 0.25". So, as a first comment, I am seriously wondering about the wisdom of basing an approach on Chebyshev's inequality. Note also the comments in your reference at the end of that section (bottom of page 12) headed "Caveats about the Chebyshev method.", which is essentially a warning on similar lines to the above. 2. The function in the reference you cite is not "g_n" but "psi_n", and the Table cited from Aitchison and Brown is not A7 but A2. On page 45 of Aitchison and Brown (1969), section 5.41 "The Method of Maximum Likelihood", the function psi_n is defined (Eqn 5.38) so as to be applicable to the sufficient statistics mean(log(X)) and var(log(X)) to yield unbiased estimators of the population mean of X and the population variance of X (Eqns (5.40) and (5.42)). psi_n is defined as an infinite series which, according to A&B (page 46) "converges only slowly", and they exhibit a finite-form asymptotic approximation to it (Eqn (5.43)) which is accurate asyn=mptotically to O(1/(n^3)). This fairly simple expression would be easy to define as a function in R: psi <- function(t,n){ exp(t)*(1 - t*(t+1)/n + (t^2)*(3(t^2) + 22*t + 21)/(6*(n^2))) } Hoping this helps. As I say, I hope to find time later to look at this in more detail. Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <[hidden email]> Fax-to-email: +44 (0)870 094 0861 Date: 10-Jul-11 Time: 19:49:39 ------------------------------ XFMail ------------------------------ ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
On Jul 10, 2011, at 2:49 PM, (Ted Harding) wrote: > On 10-Jul-11 16:27:04, Durant, James T. (ATSDR/DTEM/PRMSB) wrote: >> Hello, >> I was interested in trying to write an R script to calculate a >> UCL for a lognormal distribution using the Chebyshev Inequality >> -- MVUE Approach (based on EPA’s guidance found in >> http://www.epa.gov/oswer/riskassessment/pdf/ucl.pdf). >> This looks like it should be straight forward, but I am need to >> calculate an MVUE for the population mean and an MVUE for the >> population variance, which requires a value (g_n) from a table A7, >> found in Aitchison and Brown (1969): The lognormal distribution. >> I have looked across the RSiteSearch and can not seem to find a >> function that will give me g_n or the MVUE for mean and variance >> of lognormal distribution. >> >> Is there an R function that will give me g_n or will calculate >> an MVUE for the population mean and variance for the lognormal >> distribution? >> >> VR >> Jim >> James T. Durant, MSPH CIH >> Emergency Response Coordinator >> US Agency for Toxic Substances and Disease Registry >> Atlanta, GA 30341 >> 770-378-1695 > > Some quick comments. I will try to repond more fully later. > > 1. The Chebyshev inequality is usually very conservative. > As a simple example, consider X with a negative exponential > distribution with density exp(x), so that the population > mean is 1 and the population variance is also 1. > > Then, for a factor K, Chebyshev says that > > Prob(|X-1] > K*1) < 1/(K^2). > > This is only informative if K>1. So (e.g.) take K=2. Then the > Chebyshev > result is that this Prob < 1/4. HOwever, because X is positive, the > event in question is X > 1 + 2 = 3 so Prob is exp(-3) = 0.0498 < 1/20. > > The reference you cite suggests ("Exhibit 5") applying the method to > log-transformed data, which for lognormal data would be normally > distributed. So apply Chebyshev to N(0,1) (mean=0, var=1). Then > > Prob(|X-0| > K*1) < 1/(K^2) as before. > > Now take K=2 again (i.e. outside +/- 2 SDs, so Prob approx=0.05). > But Chebyshev still says "Prob < 1/4 = 0.25". > > So, as a first comment, I am seriously wondering about the wisdom > of basing an approach on Chebyshev's inequality. Note also the > comments in your reference at the end of that section (bottom of > page 12) headed "Caveats about the Chebyshev method.", which is > essentially a warning on similar lines to the above. > > 2. The function in the reference you cite is not "g_n" but "psi_n", > and the Table cited from Aitchison and Brown is not A7 but A2. > > On page 45 of Aitchison and Brown (1969), section 5.41 "The Method > of Maximum Likelihood", the function psi_n is defined (Eqn 5.38) > so as to be applicable to the sufficient statistics mean(log(X)) > and var(log(X)) to yield unbiased estimators of the population > mean of X and the population variance of X (Eqns (5.40) and (5.42)). > > psi_n is defined as an infinite series which, according to A&B > (page 46) "converges only slowly", and they exhibit a finite-form > asymptotic approximation to it (Eqn (5.43)) which is accurate > asyn=mptotically to O(1/(n^3)). This fairly simple expression > would be easy to define as a function in R: > > psi <- function(t,n){ > exp(t)*(1 - t*(t+1)/n + (t^2)*(3(t^2) + 22*t + 21)/(6*(n^2))) > } ITYM: psi <- function(t,n){ exp(t)*(1 - t*(t+1)/n + (t^2)*(3*(t^2) + 22*t + 21)/(6*(n^2))) } I was doing a bit of searching an found some VB code that whuber (and am wondering if it's the same whuber as frequently makes cogent posts on stats.stackexchange.com ?) had posted in an Excel macro about ten years ago that claimed to have reproduced the A9 Table in Gilbert. http://www.quantdec.com/envstats/software/ln_mvue.xls His macro was named Finney and I transposed it into R: Finney <- function(m , z){ aTol <- 0.0000000001 iterMax <- 1000 if (m <= -1) {# issue an error error("Finney = 0#")} x <- z * m * m / (m + 1) if (abs(x) < aTol) { return(Finney = 1L)} # This is the correct answer. iMax = abs(trunc(z) + 1) + 20 if (iMax > iterMax) {error("iMax > iterMax")} # Init a = 1L g = a # Lead terms for ( i in seq(iMax) ) { # Test for convergence if (abs(a) <= aTol * abs(g)) { break()} # Compute the next term a <- a * x / (m + 2 * (i - 1)) / i #' #' Accumulate terms #' g = g + a} # Next return(g) } The order of the arguments is reversed but they seem to offer similar results: > psi(2, 30) [1] 6.332695 > Finney(30, 2) [1] 6.254139 > Finney(60, 1.5) [1] 4.230381 > psi(1.5, 60) [1] 4.229944 I would think that a conservative statistical method _should_ be used when assessing toxic risks as the OP might to be doing, given his address and title. -- David. > > Hoping this helps. As I say, I hope to find time later to look > at this in more detail. > > Ted. > > -------------------------------------------------------------------- > E-Mail: (Ted Harding) <[hidden email]> > Fax-to-email: +44 (0)870 094 0861 > Date: 10-Jul-11 Time: 19:49:39 > ------------------------------ XFMail ------------------------------ > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Ah, thanks so much.
I found the excel spreadsheet almost right after I posted to the r group. I had concerns about using Chebyshev and wanted to reproduce Dr. Whubers simulation to see for myself how it performs. To be clear I normally use Lands exact or bootstrap for the UCL (or sometimes take a Bayesian approach with an uninformed prior). Someone wanted me to use proUCL from EPA for something and I had never seen a Chebyshev inequality until then, so I was curious about its performance. Thanks to you both! You have made my day. Jim ----- Original Message ----- From: David Winsemius [mailto:[hidden email]] Sent: Sunday, July 10, 2011 03:14 PM To: [hidden email] <[hidden email]> Cc: Durant, James T. (ATSDR/DTEM/PRMSB); [hidden email] <[hidden email]> Subject: Re: [R] Chebyshev Inequality — MVUE On Jul 10, 2011, at 2:49 PM, (Ted Harding) wrote: > On 10-Jul-11 16:27:04, Durant, James T. (ATSDR/DTEM/PRMSB) wrote: >> Hello, >> I was interested in trying to write an R script to calculate a >> UCL for a lognormal distribution using the Chebyshev Inequality >> -- MVUE Approach (based on EPA’s guidance found in >> http://www.epa.gov/oswer/riskassessment/pdf/ucl.pdf). >> This looks like it should be straight forward, but I am need to >> calculate an MVUE for the population mean and an MVUE for the >> population variance, which requires a value (g_n) from a table A7, >> found in Aitchison and Brown (1969): The lognormal distribution. >> I have looked across the RSiteSearch and can not seem to find a >> function that will give me g_n or the MVUE for mean and variance >> of lognormal distribution. >> >> Is there an R function that will give me g_n or will calculate >> an MVUE for the population mean and variance for the lognormal >> distribution? >> >> VR >> Jim >> James T. Durant, MSPH CIH >> Emergency Response Coordinator >> US Agency for Toxic Substances and Disease Registry >> Atlanta, GA 30341 >> 770-378-1695 > > Some quick comments. I will try to repond more fully later. > > 1. The Chebyshev inequality is usually very conservative. > As a simple example, consider X with a negative exponential > distribution with density exp(x), so that the population > mean is 1 and the population variance is also 1. > > Then, for a factor K, Chebyshev says that > > Prob(|X-1] > K*1) < 1/(K^2). > > This is only informative if K>1. So (e.g.) take K=2. Then the > Chebyshev > result is that this Prob < 1/4. HOwever, because X is positive, the > event in question is X > 1 + 2 = 3 so Prob is exp(-3) = 0.0498 < 1/20. > > The reference you cite suggests ("Exhibit 5") applying the method to > log-transformed data, which for lognormal data would be normally > distributed. So apply Chebyshev to N(0,1) (mean=0, var=1). Then > > Prob(|X-0| > K*1) < 1/(K^2) as before. > > Now take K=2 again (i.e. outside +/- 2 SDs, so Prob approx=0.05). > But Chebyshev still says "Prob < 1/4 = 0.25". > > So, as a first comment, I am seriously wondering about the wisdom > of basing an approach on Chebyshev's inequality. Note also the > comments in your reference at the end of that section (bottom of > page 12) headed "Caveats about the Chebyshev method.", which is > essentially a warning on similar lines to the above. > > 2. The function in the reference you cite is not "g_n" but "psi_n", > and the Table cited from Aitchison and Brown is not A7 but A2. > > On page 45 of Aitchison and Brown (1969), section 5.41 "The Method > of Maximum Likelihood", the function psi_n is defined (Eqn 5.38) > so as to be applicable to the sufficient statistics mean(log(X)) > and var(log(X)) to yield unbiased estimators of the population > mean of X and the population variance of X (Eqns (5.40) and (5.42)). > > psi_n is defined as an infinite series which, according to A&B > (page 46) "converges only slowly", and they exhibit a finite-form > asymptotic approximation to it (Eqn (5.43)) which is accurate > asyn=mptotically to O(1/(n^3)). This fairly simple expression > would be easy to define as a function in R: > > psi <- function(t,n){ > exp(t)*(1 - t*(t+1)/n + (t^2)*(3(t^2) + 22*t + 21)/(6*(n^2))) > } ITYM: psi <- function(t,n){ exp(t)*(1 - t*(t+1)/n + (t^2)*(3*(t^2) + 22*t + 21)/(6*(n^2))) } I was doing a bit of searching an found some VB code that whuber (and am wondering if it's the same whuber as frequently makes cogent posts on stats.stackexchange.com ?) had posted in an Excel macro about ten years ago that claimed to have reproduced the A9 Table in Gilbert. http://www.quantdec.com/envstats/software/ln_mvue.xls His macro was named Finney and I transposed it into R: Finney <- function(m , z){ aTol <- 0.0000000001 iterMax <- 1000 if (m <= -1) {# issue an error error("Finney = 0#")} x <- z * m * m / (m + 1) if (abs(x) < aTol) { return(Finney = 1L)} # This is the correct answer. iMax = abs(trunc(z) + 1) + 20 if (iMax > iterMax) {error("iMax > iterMax")} # Init a = 1L g = a # Lead terms for ( i in seq(iMax) ) { # Test for convergence if (abs(a) <= aTol * abs(g)) { break()} # Compute the next term a <- a * x / (m + 2 * (i - 1)) / i #' #' Accumulate terms #' g = g + a} # Next return(g) } The order of the arguments is reversed but they seem to offer similar results: > psi(2, 30) [1] 6.332695 > Finney(30, 2) [1] 6.254139 > Finney(60, 1.5) [1] 4.230381 > psi(1.5, 60) [1] 4.229944 I would think that a conservative statistical method _should_ be used when assessing toxic risks as the OP might to be doing, given his address and title. -- David. > > Hoping this helps. As I say, I hope to find time later to look > at this in more detail. > > Ted. > > -------------------------------------------------------------------- > E-Mail: (Ted Harding) <[hidden email]> > Fax-to-email: +44 (0)870 094 0861 > Date: 10-Jul-11 Time: 19:49:39 > ------------------------------ XFMail ------------------------------ > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
The formula from Finley is reproduced in Johnson, Kotz, and
Balakrishnan's "Continuous Distributions: Vol. 1" in the beginning of their Log Normal chapter. I am not clear that the recursive formula in W. Huber's spreadsheet is a correct representation of the iterative version there, but cannot claim to have disproven it either. (I am suggesting checking it against references.) It certainly seems to deliver results close to the results of the psi() approximation, which is also reproduced in that text. (And on a three year-old computer, the "converges slowly" comment does not seem to have meaning.) Results have no discernible time lag, so I think speed of convergence in 1941 (at the end of the mechanical computation era) may have quite a different meaning in the electronic era. -- David. On Jul 10, 2011, at 7:42 PM, Durant, James T. (ATSDR/DTEM/PRMSB) wrote: > Ah, thanks so much. > > I found the excel spreadsheet almost right after I posted to the r > group. I had concerns about using Chebyshev and wanted to reproduce > Dr. Whubers simulation to see for myself how it performs. To be > clear I normally use Lands exact or bootstrap for the UCL (or > sometimes take a Bayesian approach with an uninformed prior). > Someone wanted me to use proUCL from EPA for something and I had > never seen a Chebyshev inequality until then, so I was curious about > its performance. > > Thanks to you both! You have made my day. > > > Jim > > > ----- Original Message ----- > From: David Winsemius [mailto:[hidden email]] > Sent: Sunday, July 10, 2011 03:14 PM > To: [hidden email] <[hidden email]> > Cc: Durant, James T. (ATSDR/DTEM/PRMSB); [hidden email] <[hidden email] > > > Subject: Re: [R] Chebyshev Inequality — MVUE > > > On Jul 10, 2011, at 2:49 PM, (Ted Harding) wrote: > >> On 10-Jul-11 16:27:04, Durant, James T. (ATSDR/DTEM/PRMSB) wrote: >>> Hello, >>> I was interested in trying to write an R script to calculate a >>> UCL for a lognormal distribution using the Chebyshev Inequality >>> -- MVUE Approach (based on EPA’s guidance found in >>> http://www.epa.gov/oswer/riskassessment/pdf/ucl.pdf). >>> This looks like it should be straight forward, but I am need to >>> calculate an MVUE for the population mean and an MVUE for the >>> population variance, which requires a value (g_n) from a table A7, >>> found in Aitchison and Brown (1969): The lognormal distribution. >>> I have looked across the RSiteSearch and can not seem to find a >>> function that will give me g_n or the MVUE for mean and variance >>> of lognormal distribution. >>> >>> Is there an R function that will give me g_n or will calculate >>> an MVUE for the population mean and variance for the lognormal >>> distribution? >>> >>> VR >>> Jim >>> James T. Durant, MSPH CIH >>> Emergency Response Coordinator >>> US Agency for Toxic Substances and Disease Registry >>> Atlanta, GA 30341 >>> 770-378-1695 >> >> Some quick comments. I will try to repond more fully later. >> >> 1. The Chebyshev inequality is usually very conservative. >> As a simple example, consider X with a negative exponential >> distribution with density exp(x), so that the population >> mean is 1 and the population variance is also 1. >> >> Then, for a factor K, Chebyshev says that >> >> Prob(|X-1] > K*1) < 1/(K^2). >> >> This is only informative if K>1. So (e.g.) take K=2. Then the >> Chebyshev >> result is that this Prob < 1/4. HOwever, because X is positive, the >> event in question is X > 1 + 2 = 3 so Prob is exp(-3) = 0.0498 < >> 1/20. >> >> The reference you cite suggests ("Exhibit 5") applying the method to >> log-transformed data, which for lognormal data would be normally >> distributed. So apply Chebyshev to N(0,1) (mean=0, var=1). Then >> >> Prob(|X-0| > K*1) < 1/(K^2) as before. >> >> Now take K=2 again (i.e. outside +/- 2 SDs, so Prob approx=0.05). >> But Chebyshev still says "Prob < 1/4 = 0.25". >> >> So, as a first comment, I am seriously wondering about the wisdom >> of basing an approach on Chebyshev's inequality. Note also the >> comments in your reference at the end of that section (bottom of >> page 12) headed "Caveats about the Chebyshev method.", which is >> essentially a warning on similar lines to the above. >> >> 2. The function in the reference you cite is not "g_n" but "psi_n", >> and the Table cited from Aitchison and Brown is not A7 but A2. >> >> On page 45 of Aitchison and Brown (1969), section 5.41 "The Method >> of Maximum Likelihood", the function psi_n is defined (Eqn 5.38) >> so as to be applicable to the sufficient statistics mean(log(X)) >> and var(log(X)) to yield unbiased estimators of the population >> mean of X and the population variance of X (Eqns (5.40) and (5.42)). >> >> psi_n is defined as an infinite series which, according to A&B >> (page 46) "converges only slowly", and they exhibit a finite-form >> asymptotic approximation to it (Eqn (5.43)) which is accurate >> asyn=mptotically to O(1/(n^3)). This fairly simple expression >> would be easy to define as a function in R: >> >> psi <- function(t,n){ >> exp(t)*(1 - t*(t+1)/n + (t^2)*(3(t^2) + 22*t + 21)/(6*(n^2))) >> } > > ITYM: > psi <- function(t,n){ > exp(t)*(1 - t*(t+1)/n + (t^2)*(3*(t^2) + 22*t + 21)/(6*(n^2))) > } > > I was doing a bit of searching an found some VB code that whuber (and > am wondering if it's the same whuber as frequently makes cogent posts > on stats.stackexchange.com ?) had posted in an Excel macro about ten > years ago that claimed to have reproduced the A9 Table in Gilbert. > > http://www.quantdec.com/envstats/software/ln_mvue.xls > > His macro was named Finney and I transposed it into R: > > Finney <- function(m , z){ > aTol <- 0.0000000001 > iterMax <- 1000 > if (m <= -1) {# issue an error > error("Finney = 0#")} > x <- z * m * m / (m + 1) > if (abs(x) < aTol) { return(Finney = 1L)} > # This is the correct answer. > iMax = abs(trunc(z) + 1) + 20 > if (iMax > iterMax) {error("iMax > iterMax")} > # Init > a = 1L > g = a # Lead terms > > for ( i in seq(iMax) ) { > # Test for convergence > if (abs(a) <= aTol * abs(g)) { > break()} # Compute the next term > a <- a * x / (m + 2 * (i - 1)) / i > #' > #' Accumulate terms > #' > g = g + a} # Next > return(g) > } > > The order of the arguments is reversed but they seem to offer similar > results: > >> psi(2, 30) > [1] 6.332695 >> Finney(30, 2) > [1] 6.254139 > >> Finney(60, 1.5) > [1] 4.230381 >> psi(1.5, 60) > [1] 4.229944 > > I would think that a conservative statistical method _should_ be used > when assessing toxic risks as the OP might to be doing, given his > address and title. > > -- > David. >> >> Hoping this helps. As I say, I hope to find time later to look >> at this in more detail. >> >> Ted. >> >> -------------------------------------------------------------------- >> E-Mail: (Ted Harding) <[hidden email]> >> Fax-to-email: +44 (0)870 094 0861 >> Date: 10-Jul-11 Time: 19:49:39 >> ------------------------------ XFMail ------------------------------ >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > David Winsemius, MD West Hartford, CT ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Powered by Nabble | Edit this page |