R and Multi threading

pejpm
I will preface this message by saying that I am not an R developer and know very little about R...but here is my situation:

One of my users has developed a model for analysing commodity prices. At the moment when he runs this model on his daily data set it takes roughly 5 hours to complete. He is using a quad core PC with 2 GB of RAM. The R process only uses one core, i.e. the overall CPU usage tops out at around 25%. This has been a manageable situation for a while, but he would now like to run this model on 5 years of historical data. He has a colleague who ran the model on a 16 core Red Hat Linux box, but it took even longer to run. He has asked me for assistance in speeding up this process. I have a couple of questions:

1) Is it possible to run the Windows version of R across all four processors?

2) I was under the impression that R for Linux supported multi-threading by default. Am I correct in this assumption? If not, is it possible for Linux R to multi thread, and how do I go about configuring this?

Apologies for the lack of detailed info in this post. I work in trade floor support and engineering and we don't really have much demand for this kind of heavy duty computational work, so I am learning as I investigate this issue.

Regards

pejpm

Re: R and Multi threading

Prof Brian Ripley
On Tue, 7 Oct 2008, pejpm wrote:

>
> I will preface this message by saying that I am not an R developer and know
> very little about R...but here is my situation:
>
> One of my users has developed a model for analysing commodity prices. At the
> moment when he runs this model on his daily data set it takes roughly 5
> hours to complete. He is using a quad core PC with 2 GB of RAM. The R process
> only uses one core, i.e. the overall CPU usage tops out at around 25%. This
> has been a manageable situation for a while, but he would now like to run
> this model on 5 years of historical data. He has a colleague who ran the
> model on a 16 core Red Hat Linux box, but it took even longer to run. He has
> asked me for assistance in speeding up this process. I have a couple of
> questions:
>
> 1) Is it possible to run the Windows version of R across all four
> processors?

No.

> 2) I was under the impression that R for Linux supported multi-threading by
> default. Am I correct in this assumption? If not, is it possible for Linux R
> to multi thread, and how do I go about configuring this?

Your impression/assumption is wrong.

> Apologies for the lack of detailed info in this post. I work in trade floor
> support and engineering and we don't really have much demand for this kind of
> heavy duty computational work so I am learning as I investigate this issue.

R runs as a single task.  It is possible that some of the support
functions (notably the BLAS) can be multithreaded, and this will often
(but not always) help if the task is intensive numerical linear algebra.
But even if a multithreaded BLAS is used (and it is not the default
build), the effect on a typical R task is very small.

If you want to exploit multiple processors/cores you need to split up your
R job amongst multiple processes.  There are ways to help you do that
(packages snow and Rmpi, amongst others), but they need recoding of the
job to make use of them.
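
To make the "multiple processes" idea concrete: the sketch below is neither
snow nor Rmpi, and not R at all, but a minimal C illustration of the same
pattern, assuming a POSIX system (so it would not build on Windows as
written). A parent process forks a few workers, each worker handles its own
slice of independent work items, and the parent collects one partial result
per worker over a pipe. The names work_item, sum_in_processes and
MAX_WORKERS are made up for the illustration, and the sum of squares merely
stands in for an expensive computation.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 64   /* arbitrary cap chosen for this sketch */

/* Hypothetical stand-in for one expensive, independent unit of work. */
static double work_item(long i)
{
    return (double)i * (double)i;
}

/* Sum work_item(i) for i in [0, n_items) using n_workers child processes.
 * Returns 0 and stores the result in *total on success, -1 on any error. */
int sum_in_processes(long n_items, int n_workers, double *total)
{
    int read_fd[MAX_WORKERS];
    pid_t pid[MAX_WORKERS];
    int started = 0;
    int status = 0;

    if (n_items < 0 || n_workers < 1 || n_workers > MAX_WORKERS || !total)
        return -1;

    for (int w = 0; w < n_workers; w++) {
        int fd[2];
        if (pipe(fd) != 0) { status = -1; break; }

        pid_t child = fork();
        if (child < 0) { close(fd[0]); close(fd[1]); status = -1; break; }

        if (child == 0) {
            /* Child: process items [lo, hi) and report one partial sum. */
            long lo = n_items * w / n_workers;
            long hi = n_items * (w + 1) / n_workers;
            double partial = 0.0;
            close(fd[0]);
            for (long i = lo; i < hi; i++)
                partial += work_item(i);
            ssize_t n = write(fd[1], &partial, sizeof partial);
            _exit(n == (ssize_t)sizeof partial ? 0 : 1);
        }

        /* Parent: keep only the read end and move on to the next worker. */
        close(fd[1]);
        read_fd[w] = fd[0];
        pid[w] = child;
        started++;
    }

    /* Collect one partial result per started worker, then reap it. */
    double sum = 0.0;
    for (int w = 0; w < started; w++) {
        double partial = 0.0;
        int child_status = 0;
        if (read(read_fd[w], &partial, sizeof partial)
                != (ssize_t)sizeof partial)
            status = -1;
        close(read_fd[w]);
        if (waitpid(pid[w], &child_status, 0) < 0 ||
            !WIFEXITED(child_status) || WEXITSTATUS(child_status) != 0)
            status = -1;
        sum += partial;
    }

    if (status == 0)
        *total = sum;
    return status;
}
```

A call such as sum_in_processes(1000, 4, &total) splits items 0..999 across
four workers. The pattern only pays off when the items are independent and
each is costly compared with starting a process and passing its result back,
which is one reason the job generally has to be recoded around it.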

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: R and Multi threading

jgarcia-2
Dear prof, and list!

I'm wondering what steps are needed to exploit multiple processors/cores
when most of the processing time is spent in C code dynamically loaded
into R. For example, a Monte Carlo analysis calls the C part a huge number
of times, and it is this C part that takes most of the time.

Will snow be of any use for this, or must multithreading be made explicit
(I don't know how) within the C code, or is there nothing we can do?

Javier G.P

Re: R and Multi threading

Prof Brian Ripley
On Wed, 8 Oct 2008, [hidden email] wrote:

> Dear prof, and list!
>
> I'm wondering what steps are needed to exploit multiple processors/cores
> when most of the processing time is spent in C code dynamically loaded
> into R. For example, a Monte Carlo analysis calls the C part a huge number
> of times, and it is this C part that takes most of the time.

But you may well be able to do those parts in parallel.  It depends on how
the MCMC algorithm is organized.

> Will snow be of any use for this, or must multithreading be made explicit
> (I don't know how) within the C code, or is there nothing we can do?

Please do your own homework on what snow (etc.) does, and on how
multithreaded BLAS implementations work (the ones I am familiar with are
C code and use pthreads; OpenMP is another possibility).

Parallelization is (in general) hard and demands detailed understanding of
the algorithms used (and of alternative algorithms).  For example, the
early 1990s debate on single vs multiple runs for MCMC was all about a
single CPU, and the conclusions will be different if many CPUs are
available at no extra cost.
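
As one illustration of the "multiple runs" option when several CPUs are
available, here is a minimal sketch in C with pthreads. It is plain Monte
Carlo (estimating pi), not MCMC, and it is not taken from any package: each
thread performs an independent run with its own random-number state, and
the results are combined only after all threads have finished. The names
mc_job, next_uniform, run_job, estimate_pi and MAX_THREADS, the xorshift64
generator and the seeding scheme are all assumptions made for the example.
If something like this were called from R, the worker threads would have to
stay away from the R API, which is not thread-safe.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

/* One independent run: private RNG state, sample count, and result. */
struct mc_job {
    uint64_t rng_state;   /* must be nonzero for xorshift */
    long n_samples;
    long hits;
};

/* xorshift64 step; returns a double in [0, 1). */
static double next_uniform(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return (double)(x >> 11) * (1.0 / 9007199254740992.0);
}

/* Thread body: count random points that fall inside the unit quarter circle. */
static void *run_job(void *arg)
{
    struct mc_job *job = arg;
    uint64_t state = job->rng_state;   /* local copy, no shared writes */
    long hits = 0;

    for (long i = 0; i < job->n_samples; i++) {
        double x = next_uniform(&state);
        double y = next_uniform(&state);
        if (x * x + y * y < 1.0)
            hits++;
    }
    job->hits = hits;
    return NULL;
}

/* Estimate pi from n_threads independent runs; returns -1.0 on error. */
double estimate_pi(int n_threads, long samples_per_thread)
{
    pthread_t tid[MAX_THREADS];
    struct mc_job job[MAX_THREADS];
    int started = 0;
    long total_hits = 0;

    if (n_threads < 1 || n_threads > MAX_THREADS || samples_per_thread < 1)
        return -1.0;

    for (int t = 0; t < n_threads; t++) {
        /* Distinct nonzero seed per thread (illustrative scheme only). */
        job[t].rng_state = 0x9E3779B97F4A7C15ULL * (uint64_t)(t + 1);
        job[t].n_samples = samples_per_thread;
        job[t].hits = 0;
        if (pthread_create(&tid[t], NULL, run_job, &job[t]) != 0)
            break;
        started++;
    }

    /* Wait for every started run before combining the results. */
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total_hits += job[t].hits;
    }

    if (started != n_threads)
        return -1.0;
    return 4.0 * (double)total_hits /
           ((double)n_threads * (double)samples_per_thread);
}
```

The threads share nothing while they run, so no locking is needed. An
algorithm whose steps depend on one another, as within a single MCMC chain,
does not decompose this easily.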
