qnbinom with small size is slow

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

qnbinom with small size is slow

R devel mailing list
Hi all,

I recently noticed that `qnbinom()` can take a long time to calculate
a result if the `size` argument is very small.
For example
   qnbinom(0.5, mu = 3, size = 1e-10)
takes ~30 seconds on my computer.

I used gdb to step through the qnbinom.c implementation and noticed
that in line 106
(https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106)
`y` becomes a very large negative number. Later in the function `y` is
(as far as I can see) only used as input for `pnbinom()` which is why
I would assume that it should be a non-negative integer.

I was wondering if this behavior could be considered a bug and should
be reported on the bugzilla? I read the instructions at
https://www.r-project.org/bugs.html and wasn't quite sure, so I
decided to ask here first :)

Best,
Constantin




PS: I tested the code with R 4.0.0 on macOS and the latest unstable
version using docker (https://github.com/wch/r-debug). The session
info is
> sessionInfo()
R Under development (unstable) (2020-08-06 r78973)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/local/RD/lib/R/lib/libRblas.so
LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.0

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: qnbinom with small size is slow

bbolker
    I can reproduce this on

R Under development (unstable) (2020-07-24 r78910)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 18.04 LTS

   In my opinion this is worth reporting, but discussing it here first
was a good idea.  Many more people read this list than watch the bug
tracker, so it will get more attention here; once the excitement has
died down here (which might be almost immediately!), if no-one has
already volunteered to post it to the bug tracker, request an account
(as specified at https://www.r-project.org/bugs.html )

   Thanks!

    Ben Bolker


For what it's worth it doesn't seem to be a threshold effect: approximately

log10(time[seconds]) ~ -8 - log10(-size)

over the range from 1e-6 to 1e-9


ff <- function(x) {
    system.time(qnbinom(0.5, mu=3, size=10^x))[["elapsed"]]
}
svec <- seq(-5,-9,by=-0.2)
res <- lapply(svec, function(x) {
     cat(x,"\n")
     replicate(10,ff(x))
     })

dd <- data.frame(size=rep(svec,each=10),
                  time=unlist(res))
boxplot(log10(time)~size, dd)
summary(lm(log10(time)~size, data=dd, subset=time>0))




On 8/7/20 2:01 PM, Constantin Ahlmann-Eltze via R-devel wrote:

> Hi all,
>
> I recently noticed that `qnbinom()` can take a long time to calculate
> a result if the `size` argument is very small.
> For example
>     qnbinom(0.5, mu = 3, size = 1e-10)
> takes ~30 seconds on my computer.
>
> I used gdb to step through the qnbinom.c implementation and noticed
> that in line 106
> (https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106)
> `y` becomes a very large negative number. Later in the function `y` is
> (as far as I can see) only used as input for `pnbinom()` which is why
> I would assume that it should be a non-negative integer.
>
> I was wondering if this behavior could be considered a bug and should
> be reported on the bugzilla? I read the instructions at
> https://www.r-project.org/bugs.html and wasn't quite sure, so I
> decided to ask here first :)
>
> Best,
> Constantin
>
>
>
>
> PS: I tested the code with R 4.0.0 on macOS and the latest unstable
> version using docker (https://github.com/wch/r-debug). The session
> info is
>> sessionInfo()
> R Under development (unstable) (2020-08-06 r78973)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04 LTS
>
> Matrix products: default
> BLAS:   /usr/local/RD/lib/R/lib/libRblas.so
> LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.0
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: qnbinom with small size is slow

R devel mailing list
Thanks Ben for verifying the issue. It is always reassuring to hear
when others can reproduce the problem.

I wrote a small patch that fixes the issue
(https://github.com/r-devel/r-svn/pull/11):

diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c
index b313ce56b2..d2e8d98759 100644
--- a/src/nmath/qnbinom.c
+++ b/src/nmath/qnbinom.c
@@ -104,6 +104,7 @@ double qnbinom(double p, double size, double prob,
int lower_tail, int log_p)
     /* y := approx.value (Cornish-Fisher expansion) :  */
     z = qnorm(p, 0., 1., /*lower_tail*/TRUE, /*log_p*/FALSE);
     y = R_forceint(mu + sigma * (z + gamma * (z*z - 1) / 6));
+    y = fmax2(0.0, y);

     z = pnbinom(y, size, prob, /*lower_tail*/TRUE, /*log_p*/FALSE);

I used the https://github.com/r-devel/r-svn repo and its continuous
integration tools to check that it doesn't break any existing tests:
https://github.com/r-devel/r-svn/actions/runs/201327042

I have also requested a Bugzilla-account, but haven't heard anything back yet.

Best,
Constantin

Am Fr., 7. Aug. 2020 um 21:41 Uhr schrieb Ben Bolker <[hidden email]>:

>
>     I can reproduce this on
>
> R Under development (unstable) (2020-07-24 r78910)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Pop!_OS 18.04 LTS
>
>    In my opinion this is worth reporting, but discussing it here first
> was a good idea.  Many more people read this list than watch the bug
> tracker, so it will get more attention here; once the excitement has
> died down here (which might be almost immediately!), if no-one has
> already volunteered to post it to the bug tracker, request an account
> (as specified at https://www.r-project.org/bugs.html )
>
>    Thanks!
>
>     Ben Bolker
>
>
> For what it's worth it doesn't seem to be a threshold effect: approximately
>
> log10(time[seconds]) ~ -8 - log10(-size)
>
> over the range from 1e-6 to 1e-9
>
>
> ff <- function(x) {
>     system.time(qnbinom(0.5, mu=3, size=10^x))[["elapsed"]]
> }
> svec <- seq(-5,-9,by=-0.2)
> res <- lapply(svec, function(x) {
>      cat(x,"\n")
>      replicate(10,ff(x))
>      })
>
> dd <- data.frame(size=rep(svec,each=10),
>                   time=unlist(res))
> boxplot(log10(time)~size, dd)
> summary(lm(log10(time)~size, data=dd, subset=time>0))
>
>
>
>
> On 8/7/20 2:01 PM, Constantin Ahlmann-Eltze via R-devel wrote:
>
> > Hi all,
> >
> > I recently noticed that `qnbinom()` can take a long time to calculate
> > a result if the `size` argument is very small.
> > For example
> >     qnbinom(0.5, mu = 3, size = 1e-10)
> > takes ~30 seconds on my computer.
> >
> > I used gdb to step through the qnbinom.c implementation and noticed
> > that in line 106
> > (https://github.com/wch/r-source/blob/f8d4d7d48051860cc695b99db9be9cf439aee743/src/nmath/qnbinom.c#L106)
> > `y` becomes a very large negative number. Later in the function `y` is
> > (as far as I can see) only used as input for `pnbinom()` which is why
> > I would assume that it should be a non-negative integer.
> >
> > I was wondering if this behavior could be considered a bug and should
> > be reported on the bugzilla? I read the instructions at
> > https://www.r-project.org/bugs.html and wasn't quite sure, so I
> > decided to ask here first :)
> >
> > Best,
> > Constantin
> >
> >
> >
> >
> > PS: I tested the code with R 4.0.0 on macOS and the latest unstable
> > version using docker (https://github.com/wch/r-debug). The session
> > info is
> >> sessionInfo()
> > R Under development (unstable) (2020-08-06 r78973)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Ubuntu 20.04 LTS
> >
> > Matrix products: default
> > BLAS:   /usr/local/RD/lib/R/lib/libRblas.so
> > LAPACK: /usr/local/RD/lib/R/lib/libRlapack.so
> >
> > locale:
> >   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] compiler_4.1.0
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: qnbinom with small size is slow

Martin Maechler
>>>>> Constantin Ahlmann-Eltze via R-devel
>>>>>     on Mon, 10 Aug 2020 10:05:36 +0200 writes:

    > Thanks Ben for verifying the issue. It is always reassuring to hear
    > when others can reproduce the problem.

    > I wrote a small patch that fixes the issue
    > (https://github.com/r-devel/r-svn/pull/11):

    > diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c
    > index b313ce56b2..d2e8d98759 100644
    > --- a/src/nmath/qnbinom.c
    > +++ b/src/nmath/qnbinom.c
    > @@ -104,6 +104,7 @@ double qnbinom(double p, double size, double prob,
    > int lower_tail, int log_p)
    > /* y := approx.value (Cornish-Fisher expansion) :  */
    > z = qnorm(p, 0., 1., /*lower_tail*/TRUE, /*log_p*/FALSE);
    > y = R_forceint(mu + sigma * (z + gamma * (z*z - 1) / 6));
    > +    y = fmax2(0.0, y);

    > z = pnbinom(y, size, prob, /*lower_tail*/TRUE, /*log_p*/FALSE);

    > I used the https://github.com/r-devel/r-svn repo and its continuous
    > integration tools to check that it doesn't break any existing tests:
    > https://github.com/r-devel/r-svn/actions/runs/201327042

    > I have also requested a Bugzilla-account, but haven't heard anything back yet.

    > Best,
    > Constantin

Thank you for the report, and Ben for his experiment.

And, indeed in this case, this returns  0  much more  quickly.

Note that this could be even much more quickly: The
Cornish-Fisher expansion is not really of much use here, ...
and quick check would just see that pnbinom(0, size, prob) >

Note however, that in other cases, results for  small 'size'
are *still* not good  (and *not* influenced by your patch !!),
e.g.,

## Other examples, not giving 0, are fast already but  *in*accurate:
qnbinom(.9999, mu=3, size=1e-4)
## [1] 8044

## but
str(ur1 <- uniroot(function(q) pnbinom(q, mu=3, size=1e-4) - 0.9999, c(7000,8000)))
## List of 5
##  $ root      : num 7942
##  $ f.root    : num 1.52e-09
##  $ iter      : int 18
##  $ init.it   : int NA
##  $ estim.prec: num 6.49e-05

## and this of course does not change when asking for more precision :

str(ur2 <- uniroot(function(q) pnbinom(q, mu=3, size=1e-4) - 0.9999, c(7000,8000), tol=1e-12))
## List of 5
##  $ root      : num 7942  <<< correct is 7942, not 8044 !!!
##  $ f.root    : num 1.52e-09
##  $ iter      : int 47
##  $ init.it   : int NA
##  $ estim.prec: num 7.28e-12

----------

so, in principle the C-internal  search() function really should be
improved for such  ( somewhat extreme!! )  cases.
or ... ?? ... a different approximation should be used for such
extreme small 'size' (and  prob := size/(size+mu)  ) ...

Martin Maechler
ETH Zurich   and  R Core team

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: qnbinom with small size is slow

R devel mailing list
Hi Martin,

thanks for verifying. I agree that the Cornish-Fisher seems to struggle
with the small size parameters, but I also don't have a good idea how to
replace it.

But I think fixing do_search() is possible:

I think the problem is that when searching to the left y is decremented
only if `pnbinom(y - incr, n, pr, /*l._t.*/TRUE, /*log_p*/FALSE)) < p` is
FALSE. I think the solution is to move the update of y before the if.
However, I need to make this slightly awkward check if incr == 1, so that
the return in line 123 and the do-while block at the end of qnbinom() do
not need to be modified.

diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c
index b313ce56b2..16845d9373 100644
--- a/src/nmath/qnbinom.c
+++ b/src/nmath/qnbinom.c
@@ -49,10 +49,18 @@ do_search(double y, double *z, double p, double n,
double pr, double incr)
 {
     if(*z >= p) { /* search to the left */
  for(;;) {
+        y = fmax2(0, y - incr);
     if(y == 0 ||
-       (*z = pnbinom(y - incr, n, pr, /*l._t.*/TRUE, /*log_p*/FALSE)) < p)
- return y;
-    y = fmax2(0, y - incr);
+       (*z = pnbinom(y, n, pr, /*l._t.*/TRUE, /*log_p*/FALSE)) < p){
+            if(incr == 1){
+                // we know that the search is stopped if incr == 1
+                // and we know that the correct result is just right
+                // of the current y
+                return y + 1;
+            }else{
+                return y;
+            }
+        }
  }
     }
     else { /* search to the right */


With this patch, we get the expected result

> qnbinom(0.9999, mu = 3, size = 1e-4)
[1] 7942

I have updated the pull request at https://github.com/r-devel/r-svn/pull/11
and it is currently checking if the change breaks anything.

Best,
Constantin


Am 20.08.20 um 22:27 schrieb Martin Maechler:

Constantin Ahlmann-Eltze via R-devel
    on Mon, 10 Aug 2020 10:05:36 +0200 writes:

    > Thanks Ben for verifying the issue. It is always reassuring to hear
    > when others can reproduce the problem.

    > I wrote a small patch that fixes the issue
    > (https://github.com/r-devel/r-svn/pull/11):

    > diff --git a/src/nmath/qnbinom.c b/src/nmath/qnbinom.c
    > index b313ce56b2..d2e8d98759 100644
    > --- a/src/nmath/qnbinom.c
    > +++ b/src/nmath/qnbinom.c
    > @@ -104,6 +104,7 @@ double qnbinom(double p, double size, double prob,
    > int lower_tail, int log_p)
    > /* y := approx.value (Cornish-Fisher expansion) :  */
    > z = qnorm(p, 0., 1., /*lower_tail*/TRUE, /*log_p*/FALSE);
    > y = R_forceint(mu + sigma * (z + gamma * (z*z - 1) / 6));
    > +    y = fmax2(0.0, y);

    > z = pnbinom(y, size, prob, /*lower_tail*/TRUE, /*log_p*/FALSE);

    > I used the https://github.com/r-devel/r-svn repo and its continuous
    > integration tools to check that it doesn't break any existing tests:
    > https://github.com/r-devel/r-svn/actions/runs/201327042

    > I have also requested a Bugzilla-account, but haven't heard
anything back yet.

    > Best,
    > Constantin

Thank you for the report, and Ben for his experiment.

And, indeed in this case, this returns  0  much more  quickly.

Note that this could be even much more quickly: The
Cornish-Fisher expansion is not really of much use here, ...
and quick check would just see that pnbinom(0, size, prob) >

Note however, that in other cases, results for  small 'size'
are *still* not good  (and *not* influenced by your patch !!),
e.g.,

## Other examples, not giving 0, are fast already but  *in*accurate:
qnbinom(.9999, mu=3, size=1e-4)
## [1] 8044

## but
str(ur1 <- uniroot(function(q) pnbinom(q, mu=3, size=1e-4) - 0.9999,
c(7000,8000)))
## List of 5
##  $ root      : num 7942
##  $ f.root    : num 1.52e-09
##  $ iter      : int 18
##  $ init.it   : int NA
##  $ estim.prec: num 6.49e-05

## and this of course does not change when asking for more precision :

str(ur2 <- uniroot(function(q) pnbinom(q, mu=3, size=1e-4) - 0.9999,
c(7000,8000), tol=1e-12))
## List of 5
##  $ root      : num 7942  <<< correct is 7942, not 8044 !!!
##  $ f.root    : num 1.52e-09
##  $ iter      : int 47
##  $ init.it   : int NA
##  $ estim.prec: num 7.28e-12

----------

so, in principle the C-internal  search() function really should be
improved for such  ( somewhat extreme!! )  cases.
or ... ?? ... a different approximation should be used for such
extreme small 'size' (and  prob := size/(size+mu)  ) ...

Martin Maechler
ETH Zurich   and  R Core team

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: qnbinom with small size is slow

Martin Maechler
>>>>> Constantin Ahlmann-Eltze
>>>>>     on Fri, 21 Aug 2020 11:51:13 +0200 writes:

    > Hi Martin, thanks for verifying. I agree that the
    > Cornish-Fisher seems to struggle with the small size
    > parameters, but I also don't have a good idea how to
    > replace it.

    > But I think fixing do_search() is possible:

Yes,  but your fix was not correct..
(and I used time to find out)

Note that do_search()  is used in all three of

qnbinom()
 qbinom()
  qpois()

If a change was correct and an improvement, it will be an
improvement for all three cases; if it is only sometimes good,
it may also be sometimes bad for the other two cases.

Note that running the checks on the current R sources makes
almost no sense: Of course the new bugs (your: "slow"; mine:
inaccurate when result is quite larger than 0) are *NOT* yet in
the regression tests.

And it's really not worth doing experiments on the cloud ("QI"
is really increasing the climate change, by each time installing
tons (=> bandwidth) and running 1.5 hours ..):

I don't know what big a part of tree is killed by the bandwith
  (==> electricity of internet routers, cooling, ...)
and the 1.5 hour run
  (==> electricity of the CPUs; cooling; ...)
but I'd rather do test such small changes on my small desktop
where I don't need to install anything etc
(*and* I know that the Swiss electricity is almost exclusively
  produced from no-Carbon-emission plants).

--------

I'm busy for another 1-3 days (taking & grading exams),
but really do want to address this myself, rather than trying
these proposals (and having you burn electricity on routers and
CPUs and coolers) ...

Martin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel