Cluster: Various GCC, how important is consistency?


PaulJohnson32gmail
On a cluster that is based on RedHat 6.2, we are updating to R-3.3.1.
I have, from time to time, run into problems with various R packages
and some older versions of GCC. I wish we had newer Linux in the
cluster, but with 1000s of nodes running 1000s of jobs, well, they
don't want a restart.

The administrator suggested I try to build with the GCC that is provided
with the nodes, which is gcc-4.4.7.  To my surprise, R-3.3.1 compiled
with that.  After that, I got quite far, with many hundreds of packages
compiled, but then I hit a snag: RcppArmadillo explicitly refuses
to build with anything older than gcc-4.6.  The OpenMx and
emplik packages also refuse to compile with the old gcc.

The cluster uses a module system, so it is easy enough to swap in various
gcc versions to see what compiles.
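
Checking the active compiler before a build can be sketched as a small shell helper. This is only an illustration: the helper name version_ge is made up here, the 4.6 threshold is the one RcppArmadillo reported above, and it assumes GNU sort with the -V (version sort) option is available.

```shell
#!/bin/sh
# Sketch: succeed if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum taken from the RcppArmadillo refusal described above.
required="4.6"
# Falls back to 0 if no gcc is on the PATH.
current="$(gcc -dumpversion 2>/dev/null || echo 0)"

if version_ge "$current" "$required"; then
    echo "gcc $current is new enough (>= $required)"
else
    echo "gcc $current is too old (< $required); try: module load gcc/4.9.2"
fi
```

Running this before and after "module load gcc/4.9.2" shows which compiler a package install would pick up in the current shell.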

I did succeed in compiling RcppArmadillo with gcc 4.9.2. Rcpp, however, is not
picky; it compiled with gcc-4.4.7.

I worry...

1) Will reliance on various GCC versions make the packages incompatible with
R, or with each other?

I logged out, logged back in, with R 3.3.1 I can run

library(RcppArmadillo)
library(Rcpp)

with no errors so far. But I'm not stress testing it much.

Should I rebuild everything?

I expect that if I were to use gcc-6 on one package, it would not be
compatible with binaries built with 4.4.7.  But is there a zone of
tolerance allowing 4.4.7 and 4.9 packages to coexist?

2) If I build with non-default GCC, are all of the R users going to
hit trouble if they don't have the same GCC I use?  Unless I make some
extraordinary effort, they are getting GCC 4.4.7. If they try to
install a package, they are getting that GCC, not the one I use to
build RcppArmadillo or the other trouble cases (or everything, if you
say I need to go back and rebuild).

From an administrative point of view, should I tie R-3.3.1 to a
particular version of GCC? I think I could learn how to do that.

On the cluster, they use the module framework. There are about 50
versions of GCC.  It is easy enough to ask for a newer one:

$ module load gcc/4.9.2

It puts the gcc 4.9.2 binaries and shared libraries at the front of the PATHs.

pj


--
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write me directly, address me at pauljohn at ku.edu.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Cluster: Various GCC, how important is consistency?

Simon Urbanek
There are many issues with different gcc versions, but they can at least be minimized by using static linking, i.e. you should at the very least use -static-libstdc++ -static-libgcc to make sure you don't mix runtime versions. We run into the same problem, since C++11 compilers are rare on production machines, but as long as you can isolate the packages away from the dynamically loaded code it often works, since R only works at the symbol level as long as you have a self-contained binary. The only other thing to worry about is ABI changes, but unless you use Fortran they tend to be compatible enough.
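
One way the static-linking flags might be passed to package builds is through a personal Makevars file. The sketch below is an assumption-laden illustration, not a tested recipe: it assumes R's shared-library link line honours LDFLAGS, that make is GNU make (for +=), and that the chosen compiler ships a libstdc++ that can be linked statically into a shared object. The temporary directory is arbitrary; R_MAKEVARS_USER is the environment variable R reads to locate a personal Makevars.

```shell
#!/bin/sh
# Sketch: write a personal Makevars that asks g++ to link its runtime statically.
# The location is arbitrary; R_MAKEVARS_USER tells R where to find the file.
makevars_dir="$(mktemp -d)"
makevars_file="$makevars_dir/Makevars"

cat > "$makevars_file" <<'EOF'
# Flags named in the message above; assumes R's link line uses LDFLAGS.
LDFLAGS += -static-libstdc++ -static-libgcc
EOF

export R_MAKEVARS_USER="$makevars_file"
echo "Wrote $R_MAKEVARS_USER:"
cat "$R_MAKEVARS_USER"

# With a newer compiler loaded (e.g. module load gcc/4.9.2), a package could
# then be installed from the same shell with something like:
#   R CMD INSTALL RcppArmadillo_<version>.tar.gz
```

Whether the link step succeeds depends on how the newer GCC was built, so it is worth inspecting the resulting shared object with ldd afterwards.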

Cheers,
Simon


> On Oct 17, 2016, at 7:44 PM, Paul Johnson <[hidden email]> wrote:
>
> On a cluster that is based on RedHat 6.2, we are updating to R-3.3.1.
> [...]

Re: Cluster: Various GCC, how important is consistency?

Gabriel Becker
This absolutely causes its own problems (and they may be bad enough that
you shouldn't do it), but you can also install an older version of
RcppArmadillo. My switchr package makes this more convenient from within R,
but grabbing tarballs from the CRAN web archive also works (in fact
that's what switchr will do in this case).
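
Fetching a tarball from the CRAN archive by hand might look like the sketch below. The helper name cran_archive_url is invented for illustration, and X.Y.Z is a placeholder: which older RcppArmadillo release still builds with gcc 4.4.7 is not stated in this thread and would need to be checked.

```shell
#!/bin/sh
# Sketch: build the CRAN source-archive URL for an older package release.
cran_archive_url() {
    pkg="$1"
    version="$2"
    echo "https://cran.r-project.org/src/contrib/Archive/${pkg}/${pkg}_${version}.tar.gz"
}

# X.Y.Z is a placeholder: pick a release that still builds with your compiler.
url="$(cran_archive_url RcppArmadillo X.Y.Z)"
echo "$url"

# To fetch and install (not run here):
#   curl -LO "$url"
#   R CMD INSTALL "RcppArmadillo_X.Y.Z.tar.gz"
```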

This, of course, will never be more than a stopgap. Eventually, sadly,
you'll likely need a newer operating system. We have the same problems on
our cluster.

Best of luck,
~G

On Oct 17, 2016 6:16 PM, "Simon Urbanek" <[hidden email]>
wrote:

> There are many issues with different gcc versions, but they can at least
> be minimized by using static linking [...]

Re: Cluster: Various GCC, how important is consistency?

Jeroen Ooms
In reply to this post by PaulJohnson32gmail
On Tue, Oct 18, 2016 at 1:44 AM, Paul Johnson <[hidden email]> wrote:
>
> Administrator suggested I try to build with the GCC that is provided
> with the nodes, which is gcc-4.4.7.

Red Hat provides an alternative compiler (gcc 5.3 based) in one of its
opt-in repositories, called "Red Hat Developer Toolset" (RDT). In CentOS
you install it as follows:

  yum install -y centos-release-scl
  yum install -y devtoolset-4-gcc-c++

This compiler is specifically designed to be used alongside the EL6
stock gcc 4.4.7. It includes a simple 'enable' script which will put
RDT gcc and g++ in front of your PATH and LD_LIBRARY_PATH and so on.

So what I do on CentOS is install R from EPEL (built with stock gcc
4.4.7) and whenever I need to install an R package that uses e.g.
CXX11, simply start an R shell using the RDT compilers:

   source /opt/rh/devtoolset-4/enable
   R

From what I have been able to test, this works pretty well (though I
am not a regular EL user). I was able to build R packages that use
C++11 (such as feather), and once installed, these packages can be used
even in a regular R session (without RDT enabled).
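
A rough way to sanity-check that claim on a given machine is to compare the highest GLIBCXX symbol version a package's shared object mentions with what the system libstdc++ provides. The sketch below is a heuristic only: the helper name max_glibcxx is made up, the sample input is invented data rather than output from any real system, and the paths in the comments are assumptions about a typical 64-bit EL6 layout.

```shell
#!/bin/sh
# Sketch: read text on stdin and print the highest GLIBCXX_x.y.z tag found.
# Relies on GNU grep (-o) and GNU sort (-V).
max_glibcxx() {
    grep -o 'GLIBCXX_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

# Example with inline sample data (not real output from any system):
printf 'GLIBCXX_3.4.9\nGLIBCXX_3.4.13\nGLIBCXX_3.4\n' | max_glibcxx

# Possible use on a real system (paths are assumptions):
#   strings /usr/lib64/libstdc++.so.6 | max_glibcxx      # provided by the system runtime
#   strings path/to/library/pkg/libs/pkg.so | max_glibcxx  # mentioned by the package
```

If the package reports a higher version than the system runtime provides, it would be expected to fail to load in a session without the newer compiler's libraries on LD_LIBRARY_PATH.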


Re: Cluster: Various GCC, how important is consistency?

PaulJohnson32gmail
Dear Jeroen

Did you rebuild R-3.3.1 and all of the packages with GCC-5.3 in order
to make this work?

The part that worries me is that the shared libraries won't be
consistent, with various versions of GCC in play.

On Tue, Oct 18, 2016 at 5:55 AM, Jeroen Ooms <[hidden email]> wrote:

> So what I do on CentOS is install R from EPEL (built with stock gcc
> 4.4.7) and whenever I need to install an R package that uses e.g.
> CXX11, simply start an R shell using the RDT compilers:
> [...]



--
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

I only use this account for email list memberships. To write directly,
address me at pauljohn at ku.edu.
