How to extract or sort values from one column

pooja sinha
Hi All,

I have a .csv file with four columns (Chrom, Start_pos, End_pos & Value).
The Value column ranges from 0 to 1.0 and has more than 2.8 million rows. I
need to write code that extracts the values in the ranges 0.2-0.4 and
0.7-1.0. Could anyone help me write it? I am new to R, and sorting by
value manually takes a lot of time.

The only part I know is how to read the .csv file; after that I don’t
know how to proceed.


Thanks,

Puja


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to extract or sort values from one column

Ben Tupper-2
Welcome to R!

You could try findInterval(), which quickly determines which interval
each of your values falls into.

# your break points define the intervals
brks <- c( 0.2, 0.4, 0.7)

# make an example data frame
n <- 100
x <- data.frame(
  x = seq_len(n),
  y = runif(n, min = 0, max = 1))

# compute the interval associations and add them to the
# data frame
x$group <- findInterval(x$y, brks)

# show the groupings
plot(x$x, x$y, pch = 1 + x$group)
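A minimal sketch of the follow-up step, selecting only the two requested ranges. It uses a small hypothetical data frame so the group boundaries are easy to see. With breaks c(0.2, 0.4, 0.7), findInterval() counts how many break points are <= each value. The two requested ranges therefore come out as groups 1 and 3, and %in% can select them.

```r
# Break points as in the example above; intervals are closed on the left:
# group 0: < 0.2, group 1: [0.2, 0.4), group 2: [0.4, 0.7), group 3: >= 0.7
brks <- c(0.2, 0.4, 0.7)

# Toy data frame (hypothetical values) that hits every boundary
x <- data.frame(
  id = 1:8,
  y  = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.9, 1.0)
)
x$group <- findInterval(x$y, brks)

# Keep only the rows that fall in the two requested ranges (groups 1 and 3)
wanted <- x[x$group %in% c(1, 3), ]
print(wanted)
```

One caveat: because the intervals are closed on the left, 0.4 itself lands in group 2 and is dropped here. Setting left.open = TRUE reverses this, so 0.4 is kept but 0.2 and 0.7 are not. If both ends of 0.2-0.4 must be inclusive, plain comparisons as in the other replies may be simpler.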

Cheers,
Ben


On Fri, Jan 31, 2020 at 9:21 AM pooja sinha <[hidden email]> wrote:




--
Ben Tupper
Bigelow Laboratory for Ocean Science
West Boothbay Harbor, Maine
http://www.bigelow.org/
https://eco.bigelow.org


Re: How to extract or sort values from one column

K. Elo-2
In reply to this post by pooja sinha
Hi!

Let's assume your data are stored in a data frame called 'df'. Then this
code should do the job:

df$Value[ (df$Value>=0.2 & df$Values<=0.4) | df$Value>=0.7 ]

Best,
Kimmo



On Fri, 2020-01-31 at 09:21 -0500, pooja sinha wrote:



Re: How to extract or sort values from one column

K. Elo-2
Hi!

Oh, sorry, one "s" too many in my code. Here is the correct version:

df$Value[ (df$Value>=0.2 & df$Value<=0.4) | df$Value>=0.7 ]
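Here is a quick check of the corrected expression on a toy data frame. The values are made up for illustration. It shows that both ends of the 0.2-0.4 range are inclusive. It also shows why the earlier typo did not raise an error: df$Values does not exist, so it is NULL. The combined condition then has length zero, and the result is silently empty.

```r
# Toy data frame standing in for the real data (hypothetical values)
df <- data.frame(Value = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.9, 1.0))

# Corrected expression: 0.2 <= Value <= 0.4, or Value >= 0.7
sel <- df$Value[(df$Value >= 0.2 & df$Value <= 0.4) | df$Value >= 0.7]
print(sel)

# With the typo, df$Values is NULL, the condition has length 0,
# and the subset comes back empty without any error or warning
typo <- df$Value[(df$Value >= 0.2 & df$Values <= 0.4) | df$Value >= 0.7]
print(length(typo))
```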

Best,
Kimmo

On Fri, 2020-01-31 at 17:12 +0200, K. Elo wrote:



Re: How to extract or sort values from one column

pooja sinha
Thanks for providing the code, but I also need the output as a .csv file
with all four columns (Chrom, Start_pos, End_pos & Value) for the rows whose
Value falls in the ranges I specified earlier.

Puja

On Fri, Jan 31, 2020 at 10:23 AM K. Elo <[hidden email]> wrote:


Re: How to extract or sort values from one column

Bert Gunter-2
Time to study some tutorials and do your own work, don't you think? There
are many good tutorials on the web.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Fri, Jan 31, 2020 at 7:50 AM pooja sinha <[hidden email]> wrote:


Re: How to extract or sort values from one column

K. Elo-2
In reply to this post by pooja sinha
Hi!

To extract full rows, use:

df[ ( (df$Value>=0.2 & df$Value<=0.4) | df$Value>=0.7 ), ]
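This sketch puts the pieces together for the request above. It reads the .csv, keeps all four columns for the matching rows, and writes a new .csv. The file paths are hypothetical temporary files. A tiny toy input is written first so that the example runs on its own. Replace in_file and out_file with the real paths.

```r
# Hypothetical paths; replace with the real input and output files
in_file  <- tempfile(fileext = ".csv")
out_file <- tempfile(fileext = ".csv")

# Write a tiny toy input (made-up values) so the example is self-contained
toy <- data.frame(
  Chrom     = c("chr1", "chr1", "chr2", "chr2"),
  Start_pos = c(100L, 200L, 300L, 400L),
  End_pos   = c(150L, 250L, 350L, 450L),
  Value     = c(0.10, 0.35, 0.55, 0.80)
)
write.csv(toy, in_file, row.names = FALSE)

# 1. Read the data (all four columns)
df <- read.csv(in_file)

# 2. Keep full rows whose Value is in [0.2, 0.4] or >= 0.7
keep <- (df$Value >= 0.2 & df$Value <= 0.4) | df$Value >= 0.7
out  <- df[keep, ]

# 3. Write the filtered rows to a new .csv file
write.csv(out, out_file, row.names = FALSE)
```

If Value contains missing values, df[keep, ] returns all-NA rows for them. subset(df, keep) and df[which(keep), ] treat NA as FALSE instead.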


But it is also a good idea to start reading some introductory
tutorials. These are basic things you can find in all tutorials :-)

Best,
Kimmo

On Fri, 2020-01-31 at 10:50 -0500, pooja sinha wrote:


Re: How to extract or sort values from one column

pooja sinha
Thanks, but it gives the error "incorrect number of dimensions".
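The thread does not say what the initial mistake was. One common way to get this error, assumed here for illustration, is to use two-index [rows, ] subsetting on a plain vector such as df$Value instead of on the data frame. A related mistake is forgetting to assign the data to df. In that case df refers to the F-distribution function stats::df, and subsetting fails with a different error ("object of type 'closure' is not subsettable"). The toy data below are hypothetical.

```r
# Toy data frame (hypothetical values)
df <- data.frame(Chrom = c("chr1", "chr1", "chr2"),
                 Value = c(0.1, 0.3, 0.8))

# Works: df is a data frame, so it has rows and columns
rows <- df[df$Value >= 0.7, ]

# Fails: df$Value is a plain vector without dimensions, so [i, ] is invalid
v <- df$Value
err <- tryCatch(v[v >= 0.7, ], error = function(e) conditionMessage(e))
print(err)

# A quick check before subsetting
is.data.frame(df)        # TRUE
is.data.frame(df$Value)  # FALSE
```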


Best,
Puja

On Fri, Jan 31, 2020 at 11:37 AM K. Elo <[hidden email]> wrote:


Re: How to extract or sort values from one column

pooja sinha
It worked; initially I had made a mistake.

Thanks a lot. I'm reading up on the basics of R.

Puja

On Fri, Jan 31, 2020 at 1:06 PM pooja sinha <[hidden email]> wrote:


How to parallelize a process called by a socket connection

James Spottiswoode
In reply to this post by pooja sinha
Hi R Experts,

I’m using R version 3.4.3 running under Linux on an AWS EC2 instance. I have R code listening on a port for a socket connection. It passes the incoming data to a function, and the function's results are then passed back to the calling machine. Here’s the function that listens for a socket connection:

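# define server function
# (server_port and check() are defined elsewhere in the application)
server <- function() {
  while (TRUE) {
    # wait for a client; each call accepts a single connection
    con <- socketConnection(host = "localhost", port = server_port,
                            blocking = TRUE, server = TRUE, open = "r+",
                            timeout = 100000000)
    data <- readLines(con, 1L, skipNul = TRUE, ok = TRUE)
    response <- check(data)
    if (!is.null(response)) writeLines(response, con)
    # close the connection before waiting for the next client
    close(con)
  }
}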

The server function expects to receive a character string, which is then passed to the function check(). check() is a large, complex routine that does text analysis, among many other things, and returns a JSON string to be passed back to the calling machine.

This all works perfectly, except that no further requests can be received or processed during the ~50 ms that check() spends doing its work. So if a new request arrives less than ~50 ms after the previous one, it is not processed. I would therefore like to parallelize this so that the box can run more than one check() process simultaneously. I’m familiar with several of the R packages for parallelization, but I cannot see how to integrate them with the socket-connection side of things.

Currently I have a kludge that takes a round-robin approach. I run 4 copies of the whole R code listening on 4 different ports, say P1, P2, P3, P4, and the calling machine issues calls to them in sequence (P1, P2, P3, P4, P1… etc.). This mitigates, but doesn’t solve, the problem.

Any advice would be greatly appreciated!  Thanks.

James


Re: How to parallelize a process called by a socket connection

Pages, Herve
Seems like you've replied to an existing thread to ask a new question
(your post gets buried deep inside the "How to extract or sort values
from one column" thread in my Thunderbird). Unfortunately this means
that a lot of people who might be able to help you will miss it.

H.


On 2/1/20 11:24, James Spottiswoode wrote:


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319


Re: How to parallelize a process called by a socket connection

Ivan Krylov
In reply to this post by James Spottiswoode
On Sat, 1 Feb 2020 11:24:51 -0800
James Spottiswoode <[hidden email]> wrote:


> This all works perfectly except that while check() spends ~50ms doing
> its stuff no more requests can be received and processed.

This poses an interesting challenge.

Normally, a single-threaded server would call listen(socket, backlog)
[1] with backlog > 1, so that other clients attempting to connect() can
wait in the queue while the server calls accept() in a loop and handles
them one by one. Unfortunately, R socketConnection()s are single-client
only, since R closes the listen()ing helper socket immediately after
accept()ing the first client [2].

A quick and dirty hack would be to modify utils::make.socket (and use
read.socket()/write.socket() instead of readLines()/writeLines()),
omitting the check for "localhost" and .Call(C_sockclose, tmp) at the
end of if(server) branch. (Instead, the socket called "tmp" should be
kept and the code should loop on .Call(C_socklisten, tmp) to process all
clients.) I cannot recommend this in good faith as a long-term
solution, since all involved APIs are private and should not be
depended upon.

Some code from the svSocket package [3] could be repurposed to rely on
the Tcl/Tk event loop to handle multiple clients on a singe server
socket, but implementing that would require knowledge of Tcl. If you
can afford to change the clients to use the HTTP protocol, you can use
the httpuv package [4] to handle the connection management and HTTP
request parsing for you.
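Here is a minimal sketch of the httpuv route. It assumes the calling machine can switch to sending the string as the body of an HTTP POST request. check() here is a hypothetical placeholder for the real routine, and the host and port are arbitrary choices. As far as I understand httpuv, connections are accepted on a background thread and queued, so clients are no longer refused while check() is busy. The R callback itself still runs on the main R thread, however, so check() calls are still processed one at a time rather than in parallel.

```r
# Hypothetical stand-in for the real check(); replace with the actual routine
check <- function(data) {
  sprintf('{"n_chars": %d}', nchar(data))
}

# Rook-style handler: httpuv calls this once per HTTP request
handle_request <- function(req) {
  # read the raw request body and turn it into a single string
  body <- rawToChar(req$rook.input$read())
  response <- check(body)
  list(
    status  = 200L,
    headers = list("Content-Type" = "application/json"),
    body    = if (is.null(response)) "" else response
  )
}

# Start the server and service requests until interrupted (not called here);
# host and port are assumptions, so pick values that suit the deployment
run_server <- function(host = "0.0.0.0", port = 8080L) {
  httpuv::runServer(host, port, app = list(call = handle_request))
}
```

Keeping the handler as a plain function means it can be tested without a network, as below. run_server() blocks until interrupted.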

Besides that, I don't see a way to handle multiple clients with a single
server socket in R. I must be missing something.

--
Best regards,
Ivan

[1] https://beej.us/guide/bgnet/html/#listen

[2] https://github.com/wch/r-source/blob/07c17042d9e198319a425df726cc80545ae69812/src/modules/internet/sockconn.c#L77

[3] https://cran.r-project.org/package=svSocket

[4] https://cran.r-project.org/package=httpuv
