Request: Increasing MAX_NUM_DLLS in Rdynload.c

Request: Increasing MAX_NUM_DLLS in Rdynload.c

Steve Bronder
This is a request to increase MAX_NUM_DLLS in Rdynload.c from 100 to 500.

On line 131 of Rdynload.c, changing

#define MAX_NUM_DLLS 100

 to

#define MAX_NUM_DLLS 500


In development of the mlr package, there have been several episodes in the
past where we have had to break up unit tests because of the "maximum
number of DLLs reached" error. This error has been an inconvenience that is
going to keep happening as the package continues to grow. Is there more
than meets the eye with this error or would everything be okay if the above
line changes? Would that have a larger effect in other parts of R?

As R grows, we are likely to see more 'meta-packages' such as the
Hadley-verse, caret, mlr, etc. need an increasing number of DLLs loaded at
any point in time to conduct effective unit tests. If MAX_NUM_DLLS is set
to 100 for a very particular reason then I apologize, but if it is possible
to increase MAX_NUM_DLLS it would at least make the testing at mlr much
easier.
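
For readers wondering why a single #define produces this error, here is a
minimal sketch of a fixed-capacity table. It is an illustration only, not
the code in Rdynload.c: the names DllEntry, loaded, count_loaded and add_dll
are hypothetical, and only MAX_NUM_DLLS and the error text come from the
message above.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the limit discussed above; not R's actual source. */
#define MAX_NUM_DLLS 100

typedef struct {
    char name[64];
} DllEntry;

/* Fixed-size table: its capacity is decided at compile time. */
static DllEntry loaded[MAX_NUM_DLLS];
static int count_loaded = 0;

/* Records one more DLL. Returns 0 on success, -1 when the table is full. */
int add_dll(const char *name)
{
    assert(name != NULL);
    if (count_loaded == MAX_NUM_DLLS) {
        fprintf(stderr, "maximum number of DLLs reached\n");
        return -1;
    }
    /* snprintf truncates over-long names and always NUL-terminates. */
    snprintf(loaded[count_loaded].name, sizeof loaded[count_loaded].name,
             "%s", name);
    count_loaded++;
    return 0;
}
```

In a layout like this, raising the constant only enlarges a static array, so
the cost would be a slightly larger fixed memory footprint.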

I understand you are all very busy and thank you for your time.


Regards,

Steve Bronder
Website: stevebronder.com
Phone: 412-719-1282
Email: [hidden email]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

Henrik Bengtsson-5
One reason for hitting the MAX_NUM_DLLS (= 100) limit is that some
packages don't unload their DLLs when they are being unloaded themselves.
In other words, there may be left-over DLLs just sitting there doing
nothing but occupying space.  You can remove these using:

   R.utils::gcDLLs()

Maybe that will help you get through your tests (as long as you're
unloading packages).  gcDLLs() will look at base::getLoadedDLLs() and
its content and compare to loadedNamespaces() and unregister any
"stray" DLLs that remain after corresponding packages have been
unloaded.
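
The comparison described above amounts to a set difference. The sketch
below shows that idea in C rather than R, to stay in the language of
Rdynload.c. It is not the gcDLLs() implementation: find_stray_dlls and
contains are hypothetical names, and it assumes a DLL can be matched to its
package by name alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears in `set`, else 0. */
static int contains(const char *const *set, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0) return 1;
    return 0;
}

/* Writes into `strays` every DLL name that has no loaded namespace of the
   same name, and returns how many were found. `strays` must have room for
   n_dlls entries. */
size_t find_stray_dlls(const char *const *dlls, size_t n_dlls,
                       const char *const *namespaces, size_t n_ns,
                       const char **strays)
{
    assert(dlls != NULL && strays != NULL);
    size_t n = 0;
    for (size_t i = 0; i < n_dlls; i++)
        if (!contains(namespaces, n_ns, dlls[i]))
            strays[n++] = dlls[i];
    return n;
}
```

Only the entries returned here would be candidates for unregistering; DLLs
whose package is still loaded are left alone.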

I think it would be useful if R CMD check would also check that DLLs
are unregistered when a package is unloaded
(https://github.com/HenrikBengtsson/Wishlist-for-R/issues/29), but of
course, someone needs to write the code / a patch for this to happen.

/Henrik

Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

Steve Bronder
Thanks Henrik, this is very helpful! I will try this out on our tests and
see if gcDLLs() has a positive effect.

mlr currently has tests broken down by learner type such as classification,
regression, forecasting, clustering, etc. There are 83 classifiers alone
so even when loading and unloading across learner types we can still hit
the MAX_NUM_DLLS error, meaning we'll have to break them down further (or
maybe we can be clever with gcDLLs()?). I'm CC'ing Lars Kotthoff and Bernd
Bischl to make sure I am representing the issue well.



Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

Jeroen Ooms
On Tue, Dec 20, 2016 at 7:04 AM, Henrik Bengtsson
<[hidden email]> wrote:
> One reason for hitting the MAX_NUM_DLLS (= 100) limit is that some
> packages don't unload their DLLs when they are being unloaded themselves.

I am surprised by this. Why does R not do this automatically? What is
the case for keeping the DLL loaded after the package has been
unloaded? What happens if you reload another version of the same
package from a different library after unloading?

Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

R devel mailing list
It's not always clear when it's safe to remove the DLL.

The main problem that I'm aware of is that native objects with
finalizers might still exist (created by R_RegisterCFinalizer etc).
Even if there are no live references to such objects (which would be
hard to verify), it still wouldn't be safe to unload the DLL until a
full garbage collection has been done.

If the DLL is unloaded, then the function pointer that was registered
now becomes a pointer into the memory where the DLL was, leading to an
almost certain crash when such objects get garbage collected.
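
To make the hazard concrete, here is a small simulation. It is a sketch
under stated assumptions, not R's garbage collector: the Finalizer struct,
the owner_dll tag and the functions safe_to_unload, run_all_finalizers and
count_finalized are all hypothetical. R itself keeps no such per-DLL
bookkeeping as far as this thread indicates, which is exactly why the check
is hard.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*finalizer_fn)(void *obj);

typedef struct {
    void *obj;
    finalizer_fn fn;  /* address inside the code of the owning DLL */
    int owner_dll;    /* hypothetical id of the DLL that registered it */
    int pending;      /* 1 until the collector has run the finalizer */
} Finalizer;

/* Example finalizer: counts how many objects were finalized. */
void count_finalized(void *obj)
{
    (*(int *)obj)++;
}

/* Returns 1 only if no pending finalizer still points into `dll_id`.
   Unloading earlier would leave `fn` dangling. */
int safe_to_unload(const Finalizer *fins, size_t n, int dll_id)
{
    assert(fins != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (fins[i].pending && fins[i].owner_dll == dll_id) return 0;
    return 1;
}

/* Simulates a full collection: runs and clears every pending finalizer. */
void run_all_finalizers(Finalizer *fins, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fins[i].pending) {
            fins[i].fn(fins[i].obj);
            fins[i].pending = 0;
        }
    }
}
```

In this model a DLL becomes safe to unload only after the simulated full
collection has run, which mirrors the ordering constraint described above.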

A better approach would be to just remove the limit on the number of
DLLs, dynamically expanding the array if/when needed.


Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

Martin Maechler
>>>>> Steve Bronder <[hidden email]>
>>>>>     on Tue, 20 Dec 2016 01:34:31 -0500 writes:

    > Thanks Henrik, this is very helpful! I will try this out on our tests and
    > see if gcDLLs() has a positive effect.

    > mlr currently has tests broken down by learner type such as classification,
    > regression, forecasting, clustering, etc. There are 83 classifiers alone
    > so even when loading and unloading across learner types we can still hit
    > the MAX_NUM_DLLS error, meaning we'll have to break them down further (or
    > maybe we can be clever with gcDLLs()?). I'm CC'ing Lars Kotthoff and Bernd
    > Bischl to make sure I am representing the issue well.

This came up *here* in May 2015
and then May 2016 ... did you not find it when googling?

Hint:  Use
       site:stat.ethz.ch MAX_NUM_DLLS
as search string in Google, so it will basically only search the
R mailing list archives.

Here's the start of that thread :

  https://stat.ethz.ch/pipermail/r-devel/2016-May/072637.html

There was not a clear conclusion back then, notably as
Prof Brian Ripley noted that 100 had already been an increase
and that a large number of loaded DLLs decreases lookup speed.

OTOH (I think others have noted that) a large number of DLLs
only penalizes those who *do* load many, and we should probably
increase it.
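
A rough sketch of why the cost falls only on sessions with many DLLs,
assuming symbol lookup is a linear scan over the loaded entries. That is an
assumption made for illustration; the thread only says that lookup gets
slower. The Dll struct and find_symbol are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *symbol;  /* one exported symbol per DLL, to keep this short */
} Dll;

/* Linear scan over the loaded entries only. Returns the index of the DLL
   exporting `symbol`, or -1. `*visited` receives the number of entries
   inspected, which is bounded by `n`, not by the table's capacity. */
int find_symbol(const Dll *table, size_t n, const char *symbol,
                size_t *visited)
{
    assert(visited != NULL);
    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (strcmp(table[i].symbol, symbol) == 0) return (int)i;
    }
    return -1;
}
```

Under this model the work depends on how many DLLs are actually loaded, so a
higher ceiling alone would not slow down a session that loads only a few.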

Your use case of "hyper packages" which load many others
simultaneously is somewhat convincing to me... in so far as the
general feeling is that memory should be cheap and limits should
not be low.

(In spite of Brian Ripley's good reasons against it, I'd still
 aim for a *dynamic*, i.e. automatically increased list here).

Martin Maechler

Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

Dirk Eddelbuettel

On 20 December 2016 at 17:40, Martin Maechler wrote:
| (In spite of Brian Ripley's good reasons against it, I'd still
|  aim for a *dynamic*, i.e. automatically increased list here).

Yes.  Start with 10 or 20, add 10 as needed.  Still fast in the 'small N'
case and no longer a road block for the 'big N' case required by mlr et al.
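
The "start with 10 or 20, add 10 as needed" idea could look roughly like the
following in C. This is a sketch, not a patch against Rdynload.c: DllList,
dll_list_add, dll_list_free and the two constants are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define INITIAL_CAPACITY 20  /* "start with 10 or 20" */
#define GROWTH_STEP 10       /* "add 10 as needed" */

typedef struct {
    const char **names;
    size_t count;
    size_t capacity;
} DllList;

/* Appends `name`, growing the array when it is full.
   Returns 0 on success, -1 if the allocation fails. */
int dll_list_add(DllList *list, const char *name)
{
    assert(list != NULL);
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity == 0 ? INITIAL_CAPACITY
                                             : list->capacity + GROWTH_STEP;
        const char **grown = realloc(list->names, new_cap * sizeof *grown);
        if (grown == NULL) return -1;  /* the old array is still valid */
        list->names = grown;
        list->capacity = new_cap;
    }
    list->names[list->count++] = name;
    return 0;
}

/* Releases the array and resets the list to its empty state. */
void dll_list_free(DllList *list)
{
    free(list->names);
    list->names = NULL;
    list->count = list->capacity = 0;
}
```

A zero-initialized DllList is a valid empty list. Growth here is additive,
as suggested above; std::vector implementations typically grow geometrically
instead, which trades some memory for fewer reallocations.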

As a C++ programmer, I am now going to hug my std::vector and quietly retreat.

Dirk

 
--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]

Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

Spencer Graves-3
Hi, Dirk:


On 12/20/2016 10:56 AM, Dirk Eddelbuettel wrote:

> As a C++ programmer, I am now going to hug my std::vector and quietly retreat.


May I humbly request a translation of "std::vector" for people like me
who are not familiar with C++?


I got the following:


 > install.packages('std')
Warning in install.packages :
   package ‘std’ is not available (for R version 3.3.2)


       Thanks,
       Spencer Graves

Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

Steve Bronder
See inline.


On Tue, Dec 20, 2016 at 12:14 PM, Spencer Graves <
[hidden email]> wrote:

>> | This came up *here* in May 2015
>> | and then May 2016 ... did you not find it when googling?
>> |
>> | Hint:  Use
>> |        site:stat.ethz.ch MAX_NUM_DLLS
>> | as search string in Google, so it will basically only search the
>> | R mailing list archives.

I did not know this and apologize. I starred this email so I can use it
next time I have a question or request. I did find (and leave a comment on)
the Stack Overflow question where you posted an answer:
http://stackoverflow.com/a/37021455/2269255

>> | Here's the start of that thread :
>> |
>> |   https://stat.ethz.ch/pipermail/r-devel/2016-May/072637.html
>> |
>> | There was not a clear conclusion back then, notably as
>> | Prof Brian Ripley noted that 100 had already been an increase
>> | and that a large number of loaded DLLs decreases lookup speed.
>> |
>> | OTOH (I think others have noted that) a large number of DLLs
>> | only penalizes those who *do* load many, and we should probably
>> | increase it.

Am I correct in understanding that the decrease in lookup speed only
happens when a large number of DLLs are loaded? If so, this is an expected
cost of having many DLLs and one that I, and I would guess other
developers, would be willing to pay to have more DLLs available. If
increasing MAX_NUM_DLLS would increase R's fixed memory footprint by a
significant amount, then I think that's a reasonable argument against the
increase in MAX_NUM_DLLS.


>> | Your use case of "hyper packages" which load many others
>> | simultaneously is somewhat convincing to me... in so far as the
>> | general feeling is that memory should be cheap and limits should
>> | not be low.

It should also be pointed out that even in the case of "hyper packages"
like mlr, this is only an issue during unit testing. I wonder if there is
some middle ground here? Would it be difficult to have a compile flag that
would change the value of MAX_NUM_DLLS when compiling R from source? I
believe this would allow us to increase MAX_NUM_DLLS when testing in Travis
and Jenkins while keeping the same footprint for regular users.
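
In plain C, such a compile-time override is usually written with an #ifndef
guard. The snippet below is a generic sketch of that pattern and does not
claim that R's build system honors such a flag; max_num_dlls is a
hypothetical helper added only so the value can be inspected.

```c
#include <assert.h>

/* Keep 100 as the default unless the build overrides it,
   e.g. by passing -DMAX_NUM_DLLS=500 to the compiler. */
#ifndef MAX_NUM_DLLS
#define MAX_NUM_DLLS 100
#endif

/* Reject nonsensical overrides at compile time (C11 static_assert). */
static_assert(MAX_NUM_DLLS > 0, "MAX_NUM_DLLS must be positive");

/* Reports the limit this translation unit was compiled with. */
int max_num_dlls(void)
{
    return MAX_NUM_DLLS;
}
```

With a guard like this, a CI build could raise the limit through its
compiler flags while default builds keep the existing value and footprint.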


>> | (In spite of Brian Ripley's good reasons against it, I'd still
>> |  aim for a *dynamic*, i.e. automatically increased list here).
>>
>> Yes.  Start with 10 or 20, add 10 as needed.  Still fast in the 'small N'
>> case and no longer a road block for the 'big N' case required by mlr et
>> al.

This would be nice! Though my concern is the R-core team's time. This is
the best answer, but I don't feel comfortable requesting it because I can't
help with this and do not want to take up R-core's time without a very
significant reason.

Unit testing for a meta-package is a particular case, though I think an
important one which will impact R over the long term. The answers from
least to most complex are something like:
1. Do nothing
2. Increase MAX_NUM_DLLS
3. Compiler flag for MAX_NUM_DLLS (I actually have no reference for how
difficult this would be)
4. Change to a dynamically sized list of DLLs
I'm requesting (2) because I think it's a simple short-term answer until
someone has time to sit down and work out (4).

>
>> As a C++ programmer, I am now going to hug my
>> ​​
>> std::vector and quietly retreat.
>>
>
>
> May I humbly request a translation of "std::vector" for people like me who
> are not familiar with C++?
>
>
> I got the following:
>
>
> > install.packages('std')
> Warning in install.packages :
>   package ‘std’ is not available (for R version 3.3.2)
>
>
>       Thanks,
>       Spencer Graves
>
>
>> Dirk
>>
>>   | Martin Maechler

Steve Bronder


Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

Henrik Bengtsson-5
In reply to this post by R devel mailing list
On Tue, Dec 20, 2016 at 7:39 AM, Karl Millar <[hidden email]> wrote:

> It's not always clear when it's safe to remove the DLL.
>
> The main problem that I'm aware of is that native objects with
> finalizers might still exist (created by R_RegisterCFinalizer etc).
> Even if there are no live references to such objects (which would be
> hard to verify), it still wouldn't be safe to unload the DLL until a
> full garbage collection has been done.
>
> If the DLL is unloaded, then the function pointer that was registered
> now becomes a pointer into the memory where the DLL was, leading to an
> almost certain crash when such objects get garbage collected.

Very good point.

Does base::gc() perform such a *full* garbage collection and thereby
trigger all remaining finalizers to be called?  In other words, do you
think an explicit call to base::gc() prior to cleaning out left-over
DLLs (e.g. R.utils::gcDLLs()) would be sufficient?

/Henrik

>
> A better approach would be to just remove the limit on the number of
> DLLs, dynamically expanding the array if/when needed.
>
>
> On Tue, Dec 20, 2016 at 3:40 AM, Jeroen Ooms <[hidden email]> wrote:
>> On Tue, Dec 20, 2016 at 7:04 AM, Henrik Bengtsson
>> <[hidden email]> wrote:
>>> One reason for hitting the MAX_NUM_DLLS (= 100) limit is that some
>>> packages don't unload their DLLs when they are being unloaded themselves.
>>
>> I am surprised by this. Why does R not do this automatically? What is
>> the case for keeping the DLL loaded after the package has been
>> unloaded? What happens if you reload another version of the same
>> package from a different library after unloading?
>>

Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

R devel mailing list
It does, but you'd still be relying on the R code ensuring that all of
these objects are dead prior to unloading the DLL, otherwise they'll
survive the GC.  Maybe if the package counted how many such objects
exist, it could work out when it's safe to remove the DLL.  I'm not
sure that it can be done automatically.
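
A minimal sketch of that counting idea, in plain C so it stands alone. The
names (Handle, handle_create, handle_finalize, safe_to_unload) are
hypothetical. In a real package the finalizer would be registered through
R_RegisterCFinalizer and called by the garbage collector rather than by
hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of native objects created by this (hypothetical) package that
 * have not been finalized yet. */
static int live_objects = 0;

/* Hypothetical native object owned by the package. */
typedef struct {
    double value;
} Handle;

/* Creates an object and counts it as live. Returns NULL on failure. */
Handle *handle_create(double value)
{
    Handle *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->value = value;
    live_objects++;
    return h;
}

/* Stand-in for the finalizer that the garbage collector would invoke. */
void handle_finalize(Handle *h)
{
    assert(h != NULL);
    free(h);
    live_objects--;
}

/* The DLL may only be unloaded once no finalizer can still be pending. */
int safe_to_unload(void)
{
    return live_objects == 0;
}
```

This only tells the package when unloading is safe. It still relies on every
object being unreachable and collected first.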

What could be done is to keep the DLL loaded, but remove it from
R's table of loaded DLLs.  That way, there's no risk of dangling
function pointers and a new DLL of the same name could be loaded.  You
could still run into issues though as some DLLs assume that the
associated namespace exists.

Currently what I do is to never unload DLLs.  If I need to replace
one, then I just restart R.  It's less convenient, but it's always
correct.




Re: Request: Increasing MAX_NUM_DLLS in Rdynload.c

Dirk Eddelbuettel

On 21 December 2016 at 09:42, Karl Millar via R-devel wrote:
| Currently what I do is to never unload DLLs.  If I need to replace
| one, then I just restart R.  It's less convenient, but it's always
| correct.

Same here. Ever since we built littler in 2006 (!!) I have been doing tests
at the command-line with fresh 'r' processes.  No surprises, no side effects.

Dirk

PS Spencer, if you are still reading, std::vector is described, inter alia, here:
http://en.cppreference.com/w/cpp/container/vector  My point in bringing it up
was a deeper one: that (really widely used) data structure grows as
needed. No pointers, no malloc, no horror stories you may have heard from C.

--
http://dirk.eddelbuettel.com | @eddelbuettel | [hidden email]
