Portability and Memory Issues for R-package

KNygren
I have an upcoming JASA paper with an iid sampling algorithm for Bayesian generalized linear models (e.g., logit, Poisson regression, and conditional logit models with multivariate normal priors). I have implemented the algorithms in C and hope to make the functions and the corresponding source code available through an R package. I have successfully built and installed a package containing most of the functions on my local machine (using R CMD check, R CMD build, and R CMD INSTALL). Because my code makes extensive use of the GSL matrix library, however, I have some questions about the portability of the package. I am also running into memory issues when making repeated calls to my functions, which I hope to fix before formally distributing the package. The issues are the following:

I. Portability-

Since my C code makes extensive use of the gsl library, I have the library installed on my local machine (within the MinGW directory, so it is on the path). Within the package, I include a Makevars file with the following line in order to link to the gsl library:

PKG_LIBS=-lgsl -lgslcblas

I also know that there is an R package (gsl) that uses some gsl functions and contains a Makevars.win file with the following code:

PKG_LIBS=-LF:/MinGW/usr/local/lib -lgsl -lgslcblas
# CPPFLAGS=-I$(R_HOME)/include -IF:/MinGW/usr/local/include
PKG_CPPFLAGS=-IF:/MinGW/usr/local/include

For my package to install properly on other machines, however, I take it that they would need to have the gsl library files already installed in the proper location (or am I mistaken here?). To make the package fully portable, it thus seems that I would need either to include instructions for installing the gsl library beforehand (which would have to be platform-specific), or to somehow have the gsl library files installed during the R package installation. Is the latter even possible? If so, how could it be done (the key files are likely the two library files)? I believe the gsl package requires the user to have the gsl library preinstalled.

I guess that, long-term, an option is for me to rework my C code to eliminate the dependence on the gsl library. This could, however, be a time-consuming effort. In the meantime, would it be possible to contribute the package with the existing dependence (as I think is the case for the gsl package)?
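
To give a rough sense of what removing the GSL dependence could involve, here is a minimal sketch of a plain-C matrix type. Everything in it (mat, mat_alloc, mat_free, mat_get, mat_set, mat_vec) is a hypothetical name chosen for illustration, not part of GSL or R, and it covers only allocation, element access, and a matrix-vector product. It stores elements column-major, which is the layout R uses for matrices, whereas gsl_matrix is row-major, so a rewrite along these lines would also change the indexing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal matrix type; not part of GSL or R. */
typedef struct {
    size_t nrow;
    size_t ncol;
    double *data;   /* column-major: element (i, j) is data[i + j * nrow] */
} mat;

/* Allocate a zero-filled nrow x ncol matrix (both must be positive).
   Returns NULL if either allocation fails. */
mat *mat_alloc(size_t nrow, size_t ncol)
{
    assert(nrow > 0 && ncol > 0);
    mat *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->data = calloc(nrow * ncol, sizeof *m->data);
    if (m->data == NULL) {
        free(m);
        return NULL;
    }
    m->nrow = nrow;
    m->ncol = ncol;
    return m;
}

/* Release a matrix obtained from mat_alloc; NULL is accepted. */
void mat_free(mat *m)
{
    if (m != NULL) {
        free(m->data);
        free(m);
    }
}

double mat_get(const mat *m, size_t i, size_t j)
{
    assert(i < m->nrow && j < m->ncol);
    return m->data[i + j * m->nrow];
}

void mat_set(mat *m, size_t i, size_t j, double x)
{
    assert(i < m->nrow && j < m->ncol);
    m->data[i + j * m->nrow] = x;
}

/* y = A x, where x has a->ncol entries and y has a->nrow entries. */
void mat_vec(const mat *a, const double *x, double *y)
{
    for (size_t i = 0; i < a->nrow; i++)
        y[i] = 0.0;
    /* Walk the matrix column by column to follow the storage order. */
    for (size_t j = 0; j < a->ncol; j++)
        for (size_t i = 0; i < a->nrow; i++)
            y[i] += a->data[i + j * a->nrow] * x[j];
}
```

Whether this is less work than keeping GSL depends on how many GSL routines beyond basic matrix storage the code relies on, which this sketch does not try to cover.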

II.  Memory Issue-

The functions in my package are generally fast and seem to work well if I make a limited number of calls to them from my R code. If I try to use them as part of an R MCMC implementation (say, updating each Gibbs block 10,000 times in an R loop), I run into memory issues. Although my underlying C code frees the memory for all pointers, Windows does not seem to recognize that the memory has been freed. This is apparent because the Mem Usage for RGUI.exe in the Windows Task Manager keeps growing throughout the loop, and the code slows down and eventually makes virtually no progress. I have noticed similar issues in the past when calling WinBUGS repeatedly using Gelman's functions, so the problem is likely not coming just from my code.
I suspect that the memory issues could have something to do with the fact that my C code makes repeated use of the gsl_matrix_alloc and gsl_matrix_free functions rather than the R_alloc function (I suspect that the memory is not garbage collected). I searched the web and found the following suggestion from Brian Gough in response to a similar question posted on the gsl discussion forum:
"If you want to return an R object containing a gsl_matrix which can be garbage collected then you could use a C++ wrapper, as the C++ interface in R allows the use of separate constructors and destructors."
Would this be a possible solution? If so, where can I find information on how to write such wrapper functions for gsl matrices? I must admit that I am not familiar with how the use of separate constructors and destructors would work. If that is not the solution, would anyone have other ideas as to how I can solve the memory issues?
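
One way to test the assumption that every allocation is matched by a free, without involving R or GSL, is to count live blocks through thin wrappers. The sketch below is only an illustration: tracked_alloc, tracked_free, tracked_live, and gibbs_step are hypothetical names, and plain malloc/free stand in for gsl_matrix_alloc/gsl_matrix_free. GSL allocates with malloc rather than R_alloc, so R's garbage collector does not manage that memory. It is released only by the matching free call, which means a free skipped on any code path accumulates over a 10,000-iteration loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of blocks currently allocated through the wrappers below. */
static long live_blocks = 0;

/* malloc wrapper that counts successful allocations. */
void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* free wrapper that counts releases; NULL is accepted and ignored. */
void tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

long tracked_live(void)
{
    return live_blocks;
}

/* Hypothetical stand-in for one sampler update using two work buffers.
   Returns 0 on success and -1 on failure. */
int gibbs_step(size_t n)
{
    assert(n > 0);
    double *work = tracked_alloc(n * sizeof *work);
    if (work == NULL)
        return -1;
    double *draw = tracked_alloc(n * sizeof *draw);
    if (draw == NULL) {
        tracked_free(work);   /* error paths must release earlier buffers */
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        work[i] = (double) i;
        draw[i] = 2.0 * work[i];
    }
    int ok = (draw[n - 1] == 2.0 * (double) (n - 1)) ? 0 : -1;
    tracked_free(draw);
    tracked_free(work);
    return ok;
}
```

If the value of tracked_live() after a long loop of gibbs_step calls differs from its value before the loop, some path is not freeing what it allocated.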
Kjell Nygren

Kjell Nygren,  Ph.D.
Director Pricing and Advanced Analytics
Statistical Services
IMS Health®
960 Harvest Drive, Building A
Blue Bell, PA 19422 USA
voice: 610.832.5586 *  fax: 610.832.5850
email: <mailto:[hidden email]>      
www.imshealth.com
 
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Portability and Memory Issues for R-package

KNygren
I was able to resolve the memory issues, so there is no need to post a response in that regard. As for the portability issues, I would still like to understand how best to deal with the gsl library.

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]]On Behalf Of Nygren, Kjell (Union
Meeting)
Sent: Sunday, December 25, 2005 2:35 PM
To: [hidden email]
Subject: [Rd] Portability and Memory Issues for R-package


Re: Portability and Memory Issues for R-package

Duncan Murdoch
In reply to this post by KNygren
On 12/25/2005 2:35 PM, [hidden email] wrote:
> Since my C code makes extensive use of the gsl library, I have the library installed on my local machine (within the MinGW directory, so it is on the path). Within the package, I include a Makevars file with the following line in order to link to the gsl library:
>
> PKG_LIBS=-lgsl -lgslcblas
>
> I also know that there is an R package (gsl) that uses some gsl functions and contains a Makevars.win file with the following code:

This package requires manual handling to build for Windows, and probably
for some other platforms if they don't come with gsl by default.

My recommendation would be to work with its author (Robin Hankin, see
the DESCRIPTION file for contact information) to add whatever functions
are not already there, and then just make your package depend on the R
package, rather than on the GSL library directly.

This will mean that all the manual work that has been done to get gsl to
build will not need to be repeated by anyone who wants to install your
package.

Duncan Murdoch

Re: Portability and Memory Issues for R-package

KNygren
In reply to this post by KNygren
My guess is that the key step for a user to be able to use my package would still be to install the gsl library first, so that it can be accessed during the build. I am not sure whether Robin has a set of instructions for platform-specific installation of his package (which would likely include preinstalling the gsl library). I may follow up with him about this and to see whether it makes sense to link to his library. I will also look into the possibility of adding a configure script (as per Jan's suggestion). I know that using the gsl library is not ideal, and I may eventually try to replace the gsl-dependent code, perhaps by making use of the R matrix package (though I don't know whether it has all the features I am currently using).


Kjell Nygren
 
Re: Portability and Memory Issues for R-package

Duncan Murdoch
On 12/27/2005 3:44 PM, [hidden email] wrote:
> My guess is that the key step for a user to be able to use my package still would be to install the gsl library first so it can be accessed during the build. I am not sure if Robin has a set of instructions for platform specific installation of his package (which would likely include the pre-installation of the gsl library).

This is not necessary on Windows, where most users install binary builds
of packages, because Brian Ripley has done the work to put together a
binary build that includes the necessary GSL routines.  I would expect
that if you require users to install GSL and compile your package
themselves, you'll get almost no Windows users.  I don't know what is
involved in installing the package on other platforms.

Duncan Murdoch

Re: Portability and Memory Issues for R-package

KNygren
In reply to this post by KNygren
Not getting users was one of my main concerns. So let me make sure I understand the suggestions correctly.

A. I should check whether the GSL routines I use are part of Brian's binary build. If not, I should look into having the required routines added to that build (going through Robin, or perhaps Brian?).

B. If the required routines are included in the binary build of the gsl package, I can then link my package to the gsl package, and it should work fine on Windows for any user who has installed the gsl package. I take it the binary build also eliminates the need for each user to do the manual handling required to build on Windows?

Kjell Nygren      

-----Original Message-----
From: Duncan Murdoch [mailto:[hidden email]]
Sent: Tuesday, December 27, 2005 4:58 PM
To: Nygren, Kjell (Union Meeting)
Cc: [hidden email]
Subject: Re: [Rd] Portability and Memory Issues for R-package

