parallel processing in r...

akshay kulkarni
Dear members,

I am using mclapply to parallelize my code. I am running Red Hat Linux on AWS.

When I use mclapply, I see no speed increase. I suspect that the Linux OS is making fewer than the maximum number of cores available to mclapply (by default, mclapply takes all the available cores).

How can I check, on Linux, whether the number of workers is less than the value returned by detectCores()? Is there an R function for this?

I acknowledge that OS questions are not suitable for this mailing list, but even the Internet couldn't help me. Hence this mail...

Many thanks for your time and effort.
Yours sincerely,
AKSHAY M KULKARNI

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: parallel processing in r...

Bert Gunter-2
The effectiveness of parallelizing code, be it with mclapply or otherwise,
depends in large part on the code, which you failed to show.
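
As a rough illustration of why the code itself matters (a sketch only, since the original poster's workload was never shown): the functions `tiny_task` and `heavy_task` below are hypothetical stand-ins for a cheap and an expensive unit of work. With cheap tasks, the cost of forking workers and collecting results can outweigh the work being split up.

```r
library(parallel)

# Hypothetical stand-ins for a cheap and an expensive unit of work.
tiny_task  <- function(i) i^2
heavy_task <- function(i) {
  x <- 0
  for (k in seq_len(2e5)) x <- x + sqrt(k)
  x + i
}

# Use at most 2 workers so the sketch runs on small machines.
# Forking is unavailable on Windows, so fall back to 1 worker there.
n_workers <- if (.Platform$OS.type == "windows") 1L else 2L

# Time the same job serially and with mclapply, and check that both
# approaches return the same results.
time_it <- function(f, n) {
  serial   <- system.time(r1 <- lapply(seq_len(n), f))[["elapsed"]]
  parallel <- system.time(
    r2 <- mclapply(seq_len(n), f, mc.cores = n_workers)
  )[["elapsed"]]
  stopifnot(identical(r1, r2))
  c(serial = serial, parallel = parallel)
}

print(time_it(tiny_task, 1000))
print(time_it(heavy_task, 40))
```

The timings depend on the machine, so no particular numbers are claimed here; the point is only to compare the two rows on your own system.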

I cannot answer your other question.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sat, Jun 30, 2018 at 10:07 AM, akshay kulkarni <[hidden email]>
wrote:

> Dear members,
>
> I am using mclapply to parallelize my code. I am running Red Hat Linux
> on AWS.
>
> When I use mclapply, I see no speed increase. I suspect that the Linux OS
> is making fewer than the maximum number of cores available to mclapply (by
> default, mclapply takes all the available cores).
>
> How can I check, on Linux, whether the number of workers is less than the
> value returned by detectCores()? Is there an R function for this?

Re: parallel processing in r...

Jeff Newmiller
In reply to this post by akshay kulkarni
Use "top" at the bash prompt.

Read about the "mc.cores" parameter to mclapply.

Make a simplified example version of your analysis and post your question in the context of that example [1][2][3]. You will learn about the issues you are dealing with in the process of trimming your problem, and will have code you can share that demonstrates the issue without exposing private information.

Running in parallel does not necessarily improve performance, because other factors such as task-switching overhead and inter-process communication (data sharing) can drag it down. Read about the real benefits and drawbacks of parallelism... there are many discussions out there... you might start with [4].


[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html

[3] https://cran.r-project.org/web/packages/reprex/index.html (read the vignette)

[4] https://nceas.github.io/oss-lessons/parallel-computing-in-r/parallel-computing-in-r.html
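
For readers following the "mc.cores" pointer, here is a minimal sketch of how to inspect the relevant settings from within R. It relies on the documented default of `mclapply` on Unix-alikes, `mc.cores = getOption("mc.cores", 2L)`, which means that `mclapply` does not use every detected core unless it is told to. The variable names are illustrative only.

```r
library(parallel)

# Number of cores R detects on this machine (can be NA on unusual platforms).
n_detected <- detectCores()

# What mclapply() falls back to on Unix-alikes when mc.cores is not
# supplied: the "mc.cores" option if it is set, otherwise 2.
n_default <- getOption("mc.cores", 2L)

cat("detectCores():   ", n_detected, "\n")
cat("default mc.cores:", n_default, "\n")

# The default expression can also be read straight from the function.
print(formals(mclapply)$mc.cores)
```

If the two numbers differ, passing `mc.cores` explicitly (or setting `options(mc.cores = ...)`) is the usual way to ask for more workers.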

Re: parallel processing in r...

Patrick Connolly-4
If you use gkrellm, you'll get a plot of each core's activity so it's
easy to see how many are being used.

yum install gkrellm.


HTH

On 07/01/2018 06:16 AM, Jeff Newmiller wrote:

> Use "top" at the bash prompt.

Re: parallel processing in r...

akshay kulkarni
In reply to this post by Jeff Newmiller
Dear members,

Thanks for the reply. I do have another issue, and I would be much obliged if you could answer it.

I tried "top" at the bash prompt, but it provides a way to measure the CPU performance of existing processes. I want to check the CPU usage during the execution of an R function. So I start R like this:

$ R

and at the R prompt I type the function to be executed. But if I type "top" at the R prompt, it says object "top" not found.

So, should I change to the bash prompt after running the R function? If yes, how do I do it? If not, how do I use "top" inside the R prompt?

Again, I think this is an OS issue... but I couldn't find any answer on the Internet. I am an independent researcher and I don't have personal access to experts... this mailing list is the only vent I have...

Very many thanks for your time and effort.
Yours sincerely,
AKSHAY M KULKARNI

Re: parallel processing in r...

Benoit Vaillant
Hello,

On Sun, Jul 01, 2018 at 11:31:29AM +0000, akshay kulkarni wrote:
> I tried "top" at the bash prompt, but it provides a way to measure
> CPU performance of the existing processes. I want to check the CPU
> usage of the execution of an R function.

Try opening two bash prompts: run R in one, and use top in the other to
monitor what is going on.

> and at the R prompt I type the function to be executed. But if I
> type "top" at the R prompt, it says object "top" not found.

top is a shell command, so it is no surprise that R does not know about it.

> So, should I change to bash prompt after running the R function? If
> yes, how do I do it? If not, how to use "top" inside the R prompt?

Basically, you can't.

> Again, I think this is an OS issue....but I couldn't find any answer
> on the Internet. I am an independent researcher and I don't have
> personal access to experts.......this mail list is the only vent I
> have.......

... (many more dots) Do you think we are experts on your system?

Please do your homework and get back to us once it's done. ;-)

Cheers,

--
Benoît

Re: parallel processing in r...

Bogaso
Hi,

On 'how to use "top" inside the R prompt?':
you can use the system('top') command.

Thanks,
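
A non-interactive variant of the same idea, as a sketch only: the helper name `top_snapshot` is made up for illustration, and the code assumes a Linux `top` (procps) that accepts `-b` (batch mode) and `-n` (number of iterations). It captures a single snapshot as text instead of taking over the terminal.

```r
# Take one non-interactive snapshot of 'top' from inside R.
# Assumes a Linux 'top' that supports -b (batch) and -n (iterations).
top_snapshot <- function(n_lines = 15L) {
  # Return nothing if 'top' is not installed on this system.
  if (!nzchar(Sys.which("top"))) return(character(0))
  out <- suppressWarnings(
    system2("top", c("-b", "-n", "1"), stdout = TRUE, stderr = FALSE)
  )
  head(out, n_lines)
}

writeLines(top_snapshot())
```

Because the R prompt is busy while a function runs, a call like this only shows the state before or after the computation. To watch CPU usage while the function is running, the second terminal suggested earlier in the thread is still needed.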

On Sun, Jul 1, 2018 at 9:53 PM Benoit Vaillant <[hidden email]>
wrote:

> > So, should I change to bash prompt after running the R function? If
> > yes, how do I do it? If not, how to use "top" inside the R prompt?
>
> Basically, you can't.

Re: parallel processing in r...

akshay kulkarni
In reply to this post by Benoit Vaillant
Dear Benoit,

Amazing! I did my homework and found out that only two out of eight processors are being utilized by mclapply.

I am using general-purpose t2 instances on AWS, with a Linux AMI (Red Hat).

How do I make RHEL utilize all the processors in my Linux instance? Should I use the "configure" command in Linux? Is there any specific command for AWS Linux instances? Or would simply setting the mc.cores argument of mclapply to 8 (there are 8 cores in my Linux instance) work?

If the answer is somewhat involved, please point me to some online resources.

Very many thanks for your time and effort.
Yours sincerely,
AKSHAY M KULKARNI
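
One way to test the last idea from within R, offered as a sketch rather than a definitive answer: request one worker per detected core through `mc.cores`, have every task report the process ID of the worker that ran it, and count the distinct IDs. The variable names are illustrative, and the task body is a placeholder for real work.

```r
library(parallel)

# Ask for one worker per detected core; fall back to 2 if detection fails.
# Forking is unavailable on Windows, so use a single process there.
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 2L
if (.Platform$OS.type == "windows") n_cores <- 1L

# Each task reports the process ID of the worker that ran it.
# The short sleep is a placeholder for real work.
pids <- mclapply(seq_len(4 * n_cores), function(i) {
  Sys.sleep(0.05)
  Sys.getpid()
}, mc.cores = n_cores)

n_workers_used <- length(unique(unlist(pids)))
cat("requested:", n_cores, " distinct worker processes:", n_workers_used, "\n")
```

The count shows how many worker processes R started, not how busy each core is. Whether the operating system actually schedules them on separate cores is best checked with top in a second terminal, as suggested earlier in the thread.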