ATLAS threaded 64 bit Opteron build for R: need -fPIC

Amit Aronovitch
Hi,

Sorry for sending such a late reply, and for being a bit OT.

I've been trying to compile 64-bit ATLAS for numpy (http://numeric.scipy.org/),
and so far this thread is the most useful one I could google up - thanks!
I encountered similar problems, and so far could not get a .a linkable to
numpy (comparing with your post, it seems I might have forgotten to add
-fPIC to F77FLAGS or MMFLAGS).

Also, I'm having trouble with the ATLAS lapack. To get a usable lib, one has
to merge it with a full lapack implementation (as described in the ATLAS
errata). However, I'm using RHEL4, and their installed liblapack.a seems to
have been compiled without -fPIC, so the merged library cannot be linked into
numpy's .so. Is there a way to use Red Hat's installed liblapack.so?
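
For readers following along, the merge mentioned above can be sketched roughly as below. This is only a sketch of the general approach (start from a copy of the full LAPACK archive, then overwrite members with the objects from ATLAS's partial liblapack.a); the function name `merge_lapack` and its three arguments are made up for illustration, and the ATLAS errata remain the authoritative description.

```shell
#!/bin/sh
# Sketch only: merge ATLAS's partial liblapack.a into a full LAPACK archive.
# merge_lapack and its argument names are hypothetical, not part of ATLAS.
merge_lapack() {
    atlas_lib=$1   # ATLAS's partial liblapack.a
    full_lib=$2    # complete LAPACK archive (itself built with -fPIC if needed)
    out_lib=$3     # merged archive to create
    work=$(mktemp -d) || return 1
    # Start from the full LAPACK so every routine is present.
    cp "$full_lib" "$out_lib" || { rm -rf "$work"; return 1; }
    # "ar x" extracts into the current directory, so resolve the path first.
    atlas_abs=$(cd "$(dirname "$atlas_lib")" && pwd)/$(basename "$atlas_lib")
    # Replace same-named members with ATLAS's objects.
    ( cd "$work" && ar x "$atlas_abs" ) &&
        ar r "$out_lib" "$work"/*.o
    status=$?
    rm -rf "$work"
    return $status
}
```

A call would look like `merge_lapack /path/to/atlas/liblapack.a /path/to/full/liblapack.a ./liblapack.a`, with all three paths being placeholders. The merged archive is only linkable into a shared object if both inputs were compiled with -fPIC.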

A few questions about your compiler flags:

1) Is there a reason to compile with -O rather than -O3?
   (Did you try it and encounter some problem, or find no major performance
   difference?)
2) I see you use -mfpmath=387 - does this work better than sse2 (which seems
   to be the default)? How about the "sse,387" option - should I try that?
 
Martin Maechler wrote:

>>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>>     on 26 Feb 2004 15:44:16 +0100 writes:
>
>    PD> Douglas Bates <bates at stat.wisc.edu> writes:
>    >> Have you tried configuring R with Goto's BLAS
>    >> http://www.cs.utexas.edu/users/kgoto/
>    >>
>    >> I haven't worked with Opteron or Athlon64 computers but I understand
>    >> that Goto's BLAS are very effective on those machines.  Furthermore
>    >> Goto's BLAS are (only) available as .so libraries so you don't need to
>    >> mess with creating the .so version.
>
>    PD> I tried it, yes. Somewhat to my surprise, it seemed to be not quite as
>    PD> fast as the threaded ATLAS, but I wasn't very systematic about the
>    PD> benchmarking.
>
>    PD> (and the Goto items have license issues, which get in the way for
>    PD> binary distributions.)
>
>Thanks a lot, Peter, Brian, Doug, for your feedback!
>In the meantime, I have three running versions of R(-devel) on
>the 64-bit Opteron
>- "plain"
>- linked against threaded GOTO
>- linked against threaded (static) ATLAS  (using -fPIC for compilation;
>   "large" Rlapack)
>and I find that GOTO is faster than ATLAS
>consistently (between ~ 5-20%) for several tests
>(square matrices; %*% and solve).
>ATLAS is still an order of magnitude faster than "plain" for
>3000x3000 matrices.
>
>Here are somewhat repeatable "ATLAS for R" build instructions:
>
> 1. get ATLAS source; unpack
> 2. make : use defaults and "express" installation
> 3. Before "make install ...", edit the  Make.<ARCHITECTURE> file:
>    add "-fPIC" to three places, namely  F77FLAGS, CCFLAG0, and MMFLAGS:
>    which in case of the "threaded Opteron" architecture, leads to
>    the three new lines
>       F77FLAGS = -fPIC -fomit-frame-pointer -O -m64
>       CCFLAG0 = -fPIC -fomit-frame-pointer -O -mfpmath=387 -m64
>       MMFLAGS = -fPIC -fomit-frame-pointer -O -mfpmath=387 -m64
>    in the file   Make.Linux_HAMMER64SSE2_2
>
> 4. make install arch=Linux_HAMMER64SSE2_2
>
> 5. Symlink the ATLAS libraries into /usr/local/lib:
>
>    cd /usr/local/lib
>    ln -s <ATLAS_build_dir>/lib/Linux_HAMMER64SSE2_2/lib* .
>
> 6. (needed for runtime!):
>    Use environment variable LD_LIBRARY_PATH=/usr/local/lib
>
>
>Note that I haven't built *.so (shared) libraries yet.
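
Step 3 of the quoted instructions (adding -fPIC to F77FLAGS, CCFLAG0 and MMFLAGS) could be automated along the following lines. This is a minimal sketch, assuming the three variables are assigned on single lines as shown in the quote; the helper name `add_fpic` is hypothetical, and the file name in the usage comment is the one from the quoted steps.

```shell
#!/bin/sh
# Sketch only: insert -fPIC into the three flag variables named in step 3.
# add_fpic is a hypothetical helper, not something shipped with ATLAS.
add_fpic() {
    makefile=$1
    tmp=$(mktemp) || return 1
    # For lines like "   F77FLAGS = -O", insert -fPIC right after the "=".
    # Lines that already mention -fPIC are left untouched.
    sed -E '/-fPIC/!s/^([[:space:]]*(F77FLAGS|CCFLAG0|MMFLAGS)[[:space:]]*=)[[:space:]]*/\1 -fPIC /' \
        "$makefile" > "$tmp" && cat "$tmp" > "$makefile"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage, following the quoted steps (run inside the ATLAS build tree):
#   add_fpic Make.Linux_HAMMER64SSE2_2
#   make install arch=Linux_HAMMER64SSE2_2
#   LD_LIBRARY_PATH=/usr/local/lib; export LD_LIBRARY_PATH
```

Running the helper twice is harmless, since lines that already contain -fPIC are skipped.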

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: ATLAS threaded 64 bit Opteron build for R: need -fPIC

Brian Ripley
On Fri, 10 Feb 2006, Amit Aronovitch wrote:

You set the reply address to Martin Maechler!  That's antisocial.

> Hi,
>
> Sorry for sending such a late reply, and for being a bit OT.
>
>  I've been trying to compile 64 bit ATLAS for numpy
> (http://numeric.scipy.org/ ), and so far this thread is the most useful
> one I could google up - thanks!
>  I encountered similar problems, and so far could not get a .a linkable
> to numpy (comparing to your post - it seems I might have forgotten to
> add the -fPIC for the F77FLAGS or MMFLAGS).

Yes, that _is_ in the R-admin manual.  I guess you have not read that - it
describes how to install R.  You can get it in the R tarball from

ftp://ftp.stat.math.ethz.ch/Software/R/R-devel.tar.bz2


> Also, I'm having trouble with the ATLAS lapack. To get a usable lib, one
> has to merge it with a full lapack implementation (as described in the
> ATLAS errata). However, I'm using RHEL4, and their installed liblapack.a
> seems to have been compiled without -fPIC, so the merged library is
> unlinkable to numpy's .so. Is there a way to use Redhat's installed
> liblapack.so?

No, nor should you want to.  If RHEL4 is like FC3/4 watch out, as RH have
managed to get BLAS routines in liblapack and not libblas, and use
incorrect patches to LAPACK 3.0.  (Again, see the latest R-admin manual.)

> A few questions about your compiler flags:
>
> 1) Is there a reason to compile with -O rather than -O3?
> (did you try and encounter some problem, or found no major performance
> difference)

ATLAS chose that.  Since the real work is done by hand-tuned assembler
code it should not matter.

> 2) I see you use -mfpmath=387 - does this work better than sse2 (which
> seems to be the default)? How about the "sse,387" option - should I try
> that?

Depends on your ATLAS version.  Again, ATLAS chose those.

As it happens, I have been trying to build ATLAS on my new dual Opteron
box this morning.  The latest devel version (3.7.11) does not build, as at
some point it says it expects the GNU x86-32 assembler.  If it did build, it
would use SSE3 and so be faster.

Both 3.6.0 and 3.7.11 fail because my machine is too fast, and I had to
increase the number of replications (1000) in make/Make.{mv,r1}tune and in
tune/blas/level1/*.c.  Even then I do not entirely trust the results (and
the two versions report different L1 cache sizes ...).

I got pretty exasperated with this (it needed about ten builds to get one
that succeeded).  Both ACML and the Goto BLAS work well out of the box on
Opterons, but do have licence issues. (Again, see the R-admin manual for
details.)


--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: ATLAS threaded 64 bit Opteron build for R: need -fPIC

Amit Aronovitch
Prof Brian Ripley wrote:

> On Fri, 10 Feb 2006, Amit Aronovitch wrote:
>
> You set the reply address to Martin Maechler!  That's antisocial.
>
Sincere apologies. I certainly didn't intend to!
(I probably misclicked while trying to put him on Cc: )

   Please ignore that header.

>> Hi,
>>
>> Sorry for sending such a late reply, and for being a bit OT.
>>
>>  I've been trying to compile 64 bit ATLAS for numpy
>> (http://numeric.scipy.org/ ), and so far this thread is the most
>> useful one I could google up - thanks!
>>  I encountered similar problems, and so far could not get a .a
>> linkable to numpy (comparing to your post - it seems I might have
>> forgotten to add the -fPIC for the F77FLAGS or MMFLAGS).
>
>
> Yes, that _is_ in the R-admin manual.  I guess you have not read that
> - it describes how to install R.  You can get it in the R tarball from
>
> ftp://ftp.stat.math.ethz.ch/Software/R/R-devel.tar.bz2
>
>
>> Also, I'm having trouble with the ATLAS lapack. To get a usable lib,
>> one has to merge it with a full lapack implementation (as described
>> in the ATLAS errata). However, I'm using RHEL4, and their installed
>> liblapack.a seems to have been compiled without -fPIC, so the merged
>> library is unlinkable to numpy's .so. Is there a way to use Redhat's
>> installed liblapack.so?
>
>
> No, nor should you want to.  If RHEL4 is like FC3/4 watch out, as RH
> have managed to get BLAS routines in liblapack and not libblas, and use
> incorrect patches to LAPACK 3.0.  (Again, see the latest R-admin manual.)

Thanks for the tip - guess that means I'll have to compile my own lapack...
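
Before merging a self-compiled lapack, a rough check for non-PIC objects in a static archive might look like the sketch below. It rests on an assumption: on x86-64, code built without -fPIC commonly carries absolute 32-bit relocations (R_X86_64_32 or R_X86_64_32S) outside the debug sections, which is what the linker complains about when building a shared object. A count of zero is a good sign but not proof. The helper name `count_abs32_relocs` is hypothetical.

```shell
#!/bin/sh
# Sketch only: count absolute 32-bit relocations in "readelf -r -W" output,
# ignoring debug sections (which use such relocations even in PIC builds).
# count_abs32_relocs is a hypothetical helper name.
count_abs32_relocs() {
    awk '
        # Track which relocation section the following entries belong to.
        /^Relocation section/ { in_debug = ($3 ~ /debug/); next }
        # The third column of each entry is the relocation type.
        !in_debug && ($3 == "R_X86_64_32" || $3 == "R_X86_64_32S") { n++ }
        END { print n + 0 }
    '
}

# Usage (the archive path is a placeholder):
#   readelf -r -W /path/to/liblapack.a | count_abs32_relocs
```

A non-zero count suggests that at least some objects in the archive were compiled without -fPIC and will need rebuilding.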

>
>> A few questions about your compiler flags:
>>
>> 1) Is there a reason to compile with -O rather than -O3?
>> (did you try and encounter some problem, or found no major performance
>> difference)
>
>
> ATLAS chose that.  Since the real work is done by hand-tuned assembler
> code it should not matter.
>
>> 2) I see you use -mfpmath=387 - does this work better than sse2 (which
>> seems to be the default)? How about the "sse,387" option - should I try
>> that?
>
>
> Depends on your ATLAS version.  Again, ATLAS chose those.
>
> As it happens, I have been trying to build ATLAS on my new dual
> Opteron box this morning.  The latest devel version (3.7.11) does not
> build, as at some point it says it expects the GNU x86-32 assembler.
> If it did build, it would use SSE3 and so be faster.
>
> Both 3.6.0 and 3.7.11 fail because my machine is too fast, and I had
> to increase the number of replications (1000) in make/Make.{mv,r1}tune
> and in tune/blas/level1/*.c.  Even then I do not entirely trust the
> results (and the two versions report different L1 cache sizes ...).
>
> I got pretty exasperated with this (it needed about ten builds to get
> one that succeeded).  Both ACML and the Goto BLAS work well out of the
> box on Opterons, but do have licence issues. (Again, see the R-admin
> manual for details.)
>
I'll certainly have to read the R-admin manual.
Once I manage to get a working lib I'll try posting some of that info to the
ATLAS lists (it should probably be included in the ATLAS errata or something).

  Thanks a lot,
      Amit A.
