Hi all,
I recently noticed that `qnbinom()` can take a long time to calculate a result if the `size` argument is very small. For example qnbinom(0.5, mu = 3, size = 1e-10) takes ~30 seconds on my computer. I used gdb to step through the qnbinom.c implementation and noticed that in line 106 (https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106) `y` becomes a very large negative number. Later in the function `y` is (as far as I can see) only used as input for `pnbinom()` which is why I would assume that it should be a non-negative integer. I was wondering if this behavior could be considered a bug and should be reported on the bugzilla? I read the instructions at https://www.r-project.org/bugs.html and wasn't quite sure, so I decided to ask here first :) Best, Constantin PS: I tested the code with R 4.0.0 on macOS and the latest unstable version using docker (https://github.com/wch/r-debug). The session info is > sessionInfo() R Under development (unstable) (2020-08-06 r78973) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04 LTS Matrix products: default BLAS: /usr/local/RD/lib/R/lib/libRblas.so LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.1.0 ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
I can reproduce this on
R Under development (unstable) (2020-07-24 r78910) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Pop!_OS 18.04 LTS In my opinion this is worth reporting, but discussing it here first was a good idea. Many more people read this list than watch the bug tracker, so it will get more attention here; once the excitement has died down here (which might be almost immediately!), if no-one has already volunteered to post it to the bug tracker, request an account (as specified at https://www.r-project.org/bugs.html ) Thanks! Ben Bolker For what it's worth it doesn't seem to be a threshold effect: approximately log10(time[seconds]) ~ -8 - log10(-size) over the range from 1e-6 to 1e-9 ff <- function(x) { system.time(qnbinom(0.5, mu=3, size=10^x))[["elapsed"]] } svec <- seq(-5,-9,by=-0.2) res <- lapply(svec, function(x) { cat(x,"\n") replicate(10,ff(x)) }) dd <- data.frame(size=rep(svec,each=10), time=unlist(res)) boxplot(log10(time)~size, dd) summary(lm(log10(time)~size, data=dd, subset=time>0)) On 8/7/20 2:01 PM, Constantin Ahlmann-Eltze via R-devel wrote: > Hi all, > > I recently noticed that `qnbinom()` can take a long time to calculate > a result if the `size` argument is very small. > For example > qnbinom(0.5, mu = 3, size = 1e-10) > takes ~30 seconds on my computer. > > I used gdb to step through the qnbinom.c implementation and noticed > that in line 106 > (https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106) > `y` becomes a very large negative number. Later in the function `y` is > (as far as I can see) only used as input for `pnbinom()` which is why > I would assume that it should be a non-negative integer. > > I was wondering if this behavior could be considered a bug and should > be reported on the bugzilla? I read the instructions at > https://www.r-project.org/bugs.html and wasn't quite sure, so I > decided to ask here first :) > > Best, > Constantin > > > > > PS: I tested the code with R 4.0.0 on macOS and the latest unstable > version using docker (https://github.com/wch/r-debug). The session > info is >> sessionInfo() > R Under development (unstable) (2020-08-06 r78973) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Ubuntu 20.04 LTS > > Matrix products: default > BLAS: /usr/local/RD/lib/R/lib/libRblas.so > LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > loaded via a namespace (and not attached): > [1] compiler_4.1.0 > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Thanks Ben for verifying the issue. It is always reassuring to hear
when others can reproduce the problem. I wrote a small patch that fixes the issue (https://github.com/r-devel/r-svn/pull/11): diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c index b313ce56b2..d2e8d98759 100644 --- a/src/nmath/qnbinom.c +++ b/src/nmath/qnbinom.c @@ -104,6 +104,7 @@ double qnbinom(double p, double size, double prob, int lower_tail, int log_p) /* y := approx.value (Cornish-Fisher expansion) : */ z = qnorm(p, 0., 1., /*lower_tail*/TRUE, /*log_p*/FALSE); y = R_forceint(mu + sigma * (z + gamma * (z*z - 1) / 6)); + y = fmax2(0.0, y); z = pnbinom(y, size, prob, /*lower_tail*/TRUE, /*log_p*/FALSE); I used the https://github.com/r-devel/r-svn repo and its continuous integration tools to check that it doesn't break any existing tests: https://github.com/r-devel/r-svn/actions/runs/201327042 I have also requested a Bugzilla-account, but haven't heard anything back yet. Best, Constantin Am Fr., 7. Aug. 2020 um 21:41 Uhr schrieb Ben Bolker <[hidden email]>: > > I can reproduce this on > > R Under development (unstable) (2020-07-24 r78910) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Pop!_OS 18.04 LTS > > In my opinion this is worth reporting, but discussing it here first > was a good idea. Many more people read this list than watch the bug > tracker, so it will get more attention here; once the excitement has > died down here (which might be almost immediately!), if no-one has > already volunteered to post it to the bug tracker, request an account > (as specified at https://www.r-project.org/bugs.html ) > > Thanks! > > Ben Bolker > > > For what it's worth it doesn't seem to be a threshold effect: approximately > > log10(time[seconds]) ~ -8 - log10(-size) > > over the range from 1e-6 to 1e-9 > > > ff <- function(x) { > system.time(qnbinom(0.5, mu=3, size=10^x))[["elapsed"]] > } > svec <- seq(-5,-9,by=-0.2) > res <- lapply(svec, function(x) { > cat(x,"\n") > replicate(10,ff(x)) > }) > > dd <- data.frame(size=rep(svec,each=10), > time=unlist(res)) > boxplot(log10(time)~size, dd) > summary(lm(log10(time)~size, data=dd, subset=time>0)) > > > > > On 8/7/20 2:01 PM, Constantin Ahlmann-Eltze via R-devel wrote: > > > Hi all, > > > > I recently noticed that `qnbinom()` can take a long time to calculate > > a result if the `size` argument is very small. > > For example > > qnbinom(0.5, mu = 3, size = 1e-10) > > takes ~30 seconds on my computer. > > > > I used gdb to step through the qnbinom.c implementation and noticed > > that in line 106 > > (https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106) > > `y` becomes a very large negative number. Later in the function `y` is > > (as far as I can see) only used as input for `pnbinom()` which is why > > I would assume that it should be a non-negative integer. > > > > I was wondering if this behavior could be considered a bug and should > > be reported on the bugzilla? I read the instructions at > > https://www.r-project.org/bugs.html and wasn't quite sure, so I > > decided to ask here first :) > > > > Best, > > Constantin > > > > > > > > > > PS: I tested the code with R 4.0.0 on macOS and the latest unstable > > version using docker (https://github.com/wch/r-debug). The session > > info is > >> sessionInfo() > > R Under development (unstable) (2020-08-06 r78973) > > Platform: x86_64-pc-linux-gnu (64-bit) > > Running under: Ubuntu 20.04 LTS > > > > Matrix products: default > > BLAS: /usr/local/RD/lib/R/lib/libRblas.so > > LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so > > > > locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > loaded via a namespace (and not attached): > > [1] compiler_4.1.0 > > > > ______________________________________________ > > [hidden email] mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
>>>>> Constantin Ahlmann-Eltze via R-devel
>>>>> on Mon, 10 Aug 2020 10:05:36 +0200 writes: > Thanks Ben for verifying the issue. It is always reassuring to hear > when others can reproduce the problem. > I wrote a small patch that fixes the issue > (https://github.com/r-devel/r-svn/pull/11): > diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c > index b313ce56b2..d2e8d98759 100644 > --- a/src/nmath/qnbinom.c > +++ b/src/nmath/qnbinom.c > @@ -104,6 +104,7 @@ double qnbinom(double p, double size, double prob, > int lower_tail, int log_p) > /* y := approx.value (Cornish-Fisher expansion) : */ > z = qnorm(p, 0., 1., /*lower_tail*/TRUE, /*log_p*/FALSE); > y = R_forceint(mu + sigma * (z + gamma * (z*z - 1) / 6)); > + y = fmax2(0.0, y); > z = pnbinom(y, size, prob, /*lower_tail*/TRUE, /*log_p*/FALSE); > I used the https://github.com/r-devel/r-svn repo and its continuous > integration tools to check that it doesn't break any existing tests: > https://github.com/r-devel/r-svn/actions/runs/201327042 > I have also requested a Bugzilla-account, but haven't heard anything back yet. > Best, > Constantin Thank you for the report, and Ben for his experiment. And, indeed in this case, this returns 0 much more quickly. Note that this could be even much more quickly: The Cornish-Fisher expansion is not really of much use here, ... and quick check would just see that pnbinom(0, size, prob) > Note however, that in other cases, results for small 'size' are *still* not good (and *not* influenced by your patch !!), e.g., ## Other examples, not giving 0, are fast already but *in*accurate: qnbinom(.9999, mu=3, size=1e-4) ## [1] 8044 ## but str(ur1 <- uniroot(function(q) pnbinom(q, mu=3, size=1e-4) - 0.9999, c(7000,8000))) ## List of 5 ## $ root : num 7942 ## $ f.root : num 1.52e-09 ## $ iter : int 18 ## $ init.it : int NA ## $ estim.prec: num 6.49e-05 ## and this of course does not change when asking for more precision : str(ur2 <- uniroot(function(q) pnbinom(q, mu=3, size=1e-4) - 0.9999, c(7000,8000), tol=1e-12)) ## List of 5 ## $ root : num 7942 <<< correct is 7942, not 8044 !!! ## $ f.root : num 1.52e-09 ## $ iter : int 47 ## $ init.it : int NA ## $ estim.prec: num 7.28e-12 ---------- so, in principle the C-internal search() function really should be improved for such ( somewhat extreme!! ) cases. or ... ?? ... a different approximation should be used for such extreme small 'size' (and prob := size/(size+mu) ) ... Martin Maechler ETH Zurich and R Core team ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Hi Martin,
thanks for verifying. I agree that the Cornish-Fisher seems to struggle with the small size parameters, but I also don't have a good idea how to replace it. But I think fixing do_search() is possible: I think the problem is that when searching to the left y is decremented only if `pnbinom(y - incr, n, pr, /*l._t.*/TRUE, /*log_p*/FALSE)) < p` is FALSE. I think the solution is to move the update of y before the if. However, I need to make this slightly awkward check if incr == 1, so that the return in line 123 and the do-while block at the end of qnbinom() do not need to be modified. diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c index b313ce56b2..16845d9373 100644 --- a/src/nmath/qnbinom.c +++ b/src/nmath/qnbinom.c @@ -49,10 +49,18 @@ do_search(double y, double *z, double p, double n, double pr, double incr) { if(*z >= p) { /* search to the left */ for(;;) { + y = fmax2(0, y - incr); if(y == 0 || - (*z = pnbinom(y - incr, n, pr, /*l._t.*/TRUE, /*log_p*/FALSE)) < p) - return y; - y = fmax2(0, y - incr); + (*z = pnbinom(y, n, pr, /*l._t.*/TRUE, /*log_p*/FALSE)) < p){ + if(incr == 1){ + // we know that the search is stopped if incr == 1 + // and we know that the correct result is just right + // of the current y + return y + 1; + }else{ + return y; + } + } } } else { /* search to the right */ With this patch, we get the expected result > qnbinom(0.9999, mu = 3, size = 1e-4) [1] 7942 I have updated the pull request at https://github.com/r-devel/r-svn/pull/11 and it is currently checking if the change breaks anything. Best, Constantin Am 20.08.20 um 22:27 schrieb Martin Maechler: Constantin Ahlmann-Eltze via R-devel on Mon, 10 Aug 2020 10:05:36 +0200 writes: > Thanks Ben for verifying the issue. It is always reassuring to hear > when others can reproduce the problem. > I wrote a small patch that fixes the issue > (https://github.com/r-devel/r-svn/pull/11): > diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c > index b313ce56b2..d2e8d98759 100644 > --- a/src/nmath/qnbinom.c > +++ b/src/nmath/qnbinom.c > @@ -104,6 +104,7 @@ double qnbinom(double p, double size, double prob, > int lower_tail, int log_p) > /* y := approx.value (Cornish-Fisher expansion) : */ > z = qnorm(p, 0., 1., /*lower_tail*/TRUE, /*log_p*/FALSE); > y = R_forceint(mu + sigma * (z + gamma * (z*z - 1) / 6)); > + y = fmax2(0.0, y); > z = pnbinom(y, size, prob, /*lower_tail*/TRUE, /*log_p*/FALSE); > I used the https://github.com/r-devel/r-svn repo and its continuous > integration tools to check that it doesn't break any existing tests: > https://github.com/r-devel/r-svn/actions/runs/201327042 > I have also requested a Bugzilla-account, but haven't heard anything back yet. > Best, > Constantin Thank you for the report, and Ben for his experiment. And, indeed in this case, this returns 0 much more quickly. Note that this could be even much more quickly: The Cornish-Fisher expansion is not really of much use here, ... and quick check would just see that pnbinom(0, size, prob) > Note however, that in other cases, results for small 'size' are *still* not good (and *not* influenced by your patch !!), e.g., ## Other examples, not giving 0, are fast already but *in*accurate: qnbinom(.9999, mu=3, size=1e-4) ## [1] 8044 ## but str(ur1 <- uniroot(function(q) pnbinom(q, mu=3, size=1e-4) - 0.9999, c(7000,8000))) ## List of 5 ## $ root : num 7942 ## $ f.root : num 1.52e-09 ## $ iter : int 18 ## $ init.it : int NA ## $ estim.prec: num 6.49e-05 ## and this of course does not change when asking for more precision : str(ur2 <- uniroot(function(q) pnbinom(q, mu=3, size=1e-4) - 0.9999, c(7000,8000), tol=1e-12)) ## List of 5 ## $ root : num 7942 <<< correct is 7942, not 8044 !!! ## $ f.root : num 1.52e-09 ## $ iter : int 47 ## $ init.it : int NA ## $ estim.prec: num 7.28e-12 ---------- so, in principle the C-internal search() function really should be improved for such ( somewhat extreme!! ) cases. or ... ?? ... a different approximation should be used for such extreme small 'size' (and prob := size/(size+mu) ) ... Martin Maechler ETH Zurich and R Core team [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
>>>>> Constantin Ahlmann-Eltze
>>>>> on Fri, 21 Aug 2020 11:51:13 +0200 writes: > Hi Martin, thanks for verifying. I agree that the > Cornish-Fisher seems to struggle with the small size > parameters, but I also don't have a good idea how to > replace it. > But I think fixing do_search() is possible: Yes, but your fix was not correct.. (and I used time to find out) Note that do_search() is used in all three of qnbinom() qbinom() qpois() If a change was correct and an improvement, it will be an improvement for all three cases; if it is only sometimes good, it may also be sometimes bad for the other two cases. Note that running the checks on the current R sources makes almost no sense: Of course the new bugs (your: "slow"; mine: inaccurate when result is quite larger than 0) are *NOT* yet in the regression tests. And it's really not worth doing experiments on the cloud ("QI" is really increasing the climate change, by each time installing tons (=> bandwidth) and running 1.5 hours ..): I don't know what big a part of tree is killed by the bandwith (==> electricity of internet routers, cooling, ...) and the 1.5 hour run (==> electricity of the CPUs; cooling; ...) but I'd rather do test such small changes on my small desktop where I don't need to install anything etc (*and* I know that the Swiss electricity is almost exclusively produced from no-Carbon-emission plants). -------- I'm busy for another 1-3 days (taking & grading exams), but really do want to address this myself, rather than trying these proposals (and having you burn electricity on routers and CPUs and coolers) ... Martin ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel |
Free forum by Nabble | Edit this page |