Running R on dual/quad Opteron machines

Simone Giannerini
Dear all,

I am managing a departmental purchase of an Opteron-based
workstation/server for scientific computing on which we will be
running R.
The environment will probably be either Unix/Linux or Solaris and the
amount of RAM will be 8-16 GB, depending on the number of processors.
My main concerns are the following:

1. How much does R benefit from moving from a one-processor machine to a
two- or four-processor machine? Consider that the typical intensive use
of the server will be simulation studies with many repeated loops.
2. How does R cope with parallelization and/or parallelized compiled code?

I would be very grateful if someone could give suggestions and/or
point me to information on the above mentioned issues.

Regards,

Simone Giannerini

--
______________________________________________________

Simone Giannerini
Dipartimento di Scienze Statistiche "Paolo Fortunati"
Universita' di Bologna
Via delle belle arti 41 - 40126  Bologna,  ITALY
Tel: +39 051 2098248  Fax: +39 051 232153
E-mail: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Running R on dual/quad Opteron machines

Sean Davis



On 3/6/06 11:50 AM, "Simone Giannerini" <[hidden email]> wrote:

> Dear all,
>
> I am managing a departmental purchase of an Opteron based
> workstation/server for scientific computing on which we will be
> running R.
> The environment will probably be either Unix/Linux or Solaris and the
> amount of RAM will be 8-16Gb, depending on the number of processors.
> My main concerns are the following:
>
> 1. How much does R  benefit from passing from one processor to
> two/four processor machines? Consider that the typical intensive use
> of the server
>  will be represented by simulation studies with many repeated loops.

You will have to implement some parallelization code yourself in order to
take full advantage of the multiple processors.  See below.

> 2. How does R cope with parallelization and/or parallelized compiled code ?

You might look at the Rmpi and snow packages for parallelization from within
R.  We use Rmpi and snow for analyses like simulation and have found these
applications quite easy to implement in parallel from within R.

Re: Running R on dual/quad Opteron machines

Simone Giannerini
Dear Sean,

Many thanks for the suggestion; I will have a look at the packages.

Regards,

Simone

Re: Running R on dual/quad Opteron machines

Thomas Lumley
On Mon, 6 Mar 2006, Simone Giannerini wrote:
> The environment will probably be either Unix/Linux or Solaris and the
> amount of RAM will be 8-16Gb, depending on the number of processors.
> My main concerns are the following:
>
> 1. How much does R  benefit from passing from one processor to
> two/four processor machines? Consider that the typical intensive use
> of the server
> will be represented by simulation studies with many repeated loops.

The typical way that R is used on multiprocessor systems is running more
than one program, rather than parallel processing. If four people are
using the computer, or if one person splits 10,000 iterations of a
simulation into 4 sets of 2,500, you will be using all four processors.
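
A minimal sketch of the split described above, assuming a placeholder simulation: the statistic, the seeding scheme and the script name sim_chunk.R are hypothetical and not from the thread. One copy of the script is started per processor, each with its own chunk index, and each runs its own 2,500 iterations:

```r
# sim_chunk.R: hypothetical script; launch one copy per processor, e.g.
#   Rscript sim_chunk.R 1 &   (and likewise for chunks 2, 3 and 4)

n_total  <- 10000                 # total simulation iterations
n_chunks <- 4                     # one chunk per processor
n_per    <- n_total / n_chunks    # 2500 iterations per chunk

# One iteration of a placeholder simulation: the mean of 50 normal draws.
one_iteration <- function() mean(rnorm(50))

run_chunk <- function(chunk_id, n_iter = n_per) {
  # Naive per-chunk seeding, for illustration only; proper parallel
  # random number streams are preferable for real studies.
  set.seed(1000 + chunk_id)
  replicate(n_iter, one_iteration())
}

# Chunk index from the command line; defaults to 1 when none is given.
args     <- commandArgs(trailingOnly = TRUE)
chunk_id <- if (length(args) >= 1) as.integer(args[1]) else 1L

result <- run_chunk(chunk_id)
summary(result)

# In practice each process would store its chunk, for example with
# saveRDS(result, sprintf("chunk_%d.rds", chunk_id)), and a final step
# would read and combine the four files.
```

Because the four processes are independent, the operating system is free to schedule each on its own processor; no parallel package is involved.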

> 2. How does R cope with parallelization and/or parallelized compiled code ?
>

It doesn't really.  There are interfaces to MPI and PVM and there is the
possibility of using a parallel BLAS to speed up linear algebra.  These
won't help much unless the server is under fairly low load so that a
single program can use more than 100% of a single processor.  Our
multiprocessor Opteron servers are rarely that underutilized.

  -thomas

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

Re: Running R on dual/quad Opteron machines

Simone Giannerini
On 3/6/06, Thomas Lumley <[hidden email]> wrote:

> The typical way that R is used on multiprocessor systems is running more
> than one program, rather than parallel processing. If four people are
> using the computer or if one person splits 10,000 iterations of a
> simulation into 4 sets of 2,500 you will be using all four processors.
>

Many thanks. If I have understood correctly, in this case I would need
to run four separate instances of R, since a single thread cannot
exploit more than one CPU. Am I correct?

Regards,

Simone

Re: Running R on dual/quad Opteron machines

Thomas Lumley
On Mon, 6 Mar 2006, Simone Giannerini wrote:

> Many thanks, if I have understood correctly, in this case I would need
> running four separate instances of R, since a single thread cannot
> exploit more than one cpu, am I correct?
>

You *can* exploit more than one CPU using, e.g., the "snow" package, but it's
often easier to just run multiple instances of R, and on a shared
computing system there are often multiple people each running one instance
of R.
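
As a rough illustration of the first option, the sketch below drives four worker processes from a single R session. It assumes the `parallel` package bundled with current versions of R, which provides snow-style cluster functions; it is not the snow package itself, and the simulated statistic and worker count are placeholders:

```r
library(parallel)  # bundled with R; provides snow-style cluster functions

n_workers <- 4       # assumed: one worker process per processor
n_per     <- 2500    # 4 x 2500 = 10,000 iterations in total

# Placeholder task: run n iterations and return the simulated statistics.
sim_chunk <- function(n) replicate(n, mean(rnorm(50)))

cl <- makeCluster(n_workers)            # start worker R processes
clusterSetRNGStream(cl, iseed = 123)    # separate random streams per worker

# Send one chunk of 2,500 iterations to each worker and collect the results.
chunks <- parLapply(cl, rep(n_per, n_workers), sim_chunk)
stopCluster(cl)

results <- unlist(chunks)
length(results)
```

The workers here are themselves separate R processes, so this is largely an automated version of running multiple instances by hand.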

  -thomas

Re: Running R on dual/quad Opteron machines

Sean Davis



On 3/6/06 1:37 PM, "Thomas Lumley" <[hidden email]> wrote:

> You *can* exploit more than one CPU using eg the "snow" package, but it's
> often easier to just run multiple instances of R, and for a shared
> computing system there are often multiple people each running one instance
> of R.

And let me qualify my earlier statements on snow/Rmpi by saying that we use
these tools on a relatively large Beowulf cluster (~200 nodes), which is
somewhat different from a single box with 2-4 processors, so it may not
be worth the trouble outside of a cluster environment.  For example, we have
not moved to using Rmpi/snow on our dual-processor G5s because the speed
gain just isn't worth the extra installation trouble, etc.

Sean

Re: Running R on dual/quad Opteron machines

Simone Giannerini
OK, thanks. I am wondering whether running multiple instances of R
would be possible without problems in the presence of compiled code
(shared libraries).
Intuitively, while there can be multiple instances of R, all of them
would be using the same library, but I am just guessing; I might
check this.

Ciao

Simone

Re: Running R on dual/quad Opteron machines

Brian Ripley
On Tue, 7 Mar 2006, Simone Giannerini wrote:

> Ok thanks, I am wondering whether running multiple instances of R
> would be possible without problems in presence of compiled code
> (shared libraries).
> Intuitively, while there can be multiple instances of R, all of them
> would be using the same library, but I am just guessing, I might do a
> check on this.

That's what 'shared library' means.  The common parts (e.g. code and
static data) are shared, but the data areas are not.

Different processes run in different address spaces, and modern OSes are
careful only to give a user process write access to its own address space.
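
A small sketch of the point about separate data areas, assuming the `parallel` package bundled with current versions of R is used to start the extra processes (the variable name `x` is arbitrary). It shows only that each R process keeps its own data; it does not demonstrate the sharing of library code, which is handled by the operating system:

```r
library(parallel)

x <- "master"                  # lives in this R process only

cl <- makeCluster(2)           # two additional R processes

# Each worker creates its own global 'x' holding its own process ID.
worker_x <- clusterEvalQ(cl, { x <- Sys.getpid(); x })
stopCluster(cl)

unlist(worker_x)   # one process ID per worker
x                  # unchanged in this process
```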

Many of us have servers running multiple copies of R at almost all times.
I typically run R tests with four copies running on a dual-CPU Opteron,
that being about the minimum number needed to get 100% CPU usage since I/O
is also being done.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: Running R on dual/quad Opteron machines

Simone Giannerini
Dear Prof. Ripley,

Many thanks for the clarification; I now have enough information to
manage the purchase.

Kind regards,

Simone Giannerini
