How to extract or sort values from one column

Hi All,

I have a .csv file with four columns (Chrom, Start_pos, End_pos & Value). The Value column ranges from 0 to 1.0, and the file has more than 2.8 million rows. I need to write code that extracts the values from 0.2-0.4 & 0.7-1.0. Could anyone help me write the code? I am new to R, and sorting on the values manually takes a lot of time.

The only part I know is how to read the .csv file; after that I don't know how to proceed.

Thanks,
Puja
Re: How to extract or sort values from one column

In reply to pooja sinha's post of Fri, Jan 31, 2020, 9:21 AM

Welcome to R!

You could try using findInterval() which will quickly determine into which interval your values belong.

    # your break points define the intervals
    brks <- c(0.2, 0.4, 0.7)

    # make an example data frame
    n <- 100
    x <- data.frame(
      x = seq_len(n),
      y = runif(n, min = 0, max = 1))

    # compute the interval associations and add it to the
    # data frame
    x$group <- findInterval(x$y, brks)

    # show the groupings
    plot(x$x, x$y, pch = 1 + x$group)

Cheers,
Ben

--
Ben Tupper
Bigelow Laboratory for Ocean Science
West Boothbay Harbor, Maine
http://www.bigelow.org/
https://eco.bigelow.org
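As a hedged illustration of how the findInterval() grouping could be used to pull out the requested ranges: the small data frame below is made up for the example (only the column names come from the original question). With break points 0.2, 0.4 and 0.7, findInterval() assigns group 1 to values in [0.2, 0.4) and group 3 to values of 0.7 and above, so a value of exactly 0.4 lands in group 2 and is not selected here.

```r
# Made-up stand-in for the real file; only the column names follow the question
df <- data.frame(
  Chrom     = rep(c("chr1", "chr2"), each = 5),
  Start_pos = seq(1, by = 100, length.out = 10),
  End_pos   = seq(100, by = 100, length.out = 10),
  Value     = c(0.05, 0.20, 0.35, 0.40, 0.55, 0.69, 0.70, 0.85, 1.00, 0.10)
)

# Break points define the intervals; each interval is closed on the left
brks <- c(0.2, 0.4, 0.7)
df$group <- findInterval(df$Value, brks)

# group 1 is [0.2, 0.4), group 3 is [0.7, Inf)
wanted <- df[df$group %in% c(1, 3), ]
print(wanted)
```

If the upper bound 0.4 must be included, an explicit comparison with <= is the simpler route.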
Re: How to extract or sort values from one column

In reply to pooja sinha's post of Fri, 2020-01-31, 09:21 -0500

Hi!

Let's assume your data is stored in a data frame called 'df'. So this code should do the job:

    df$Value[ (df$Value>=0.2 & df$Values<=0.4) | df$Value>=0.7 ]

Best,
Kimmo
Re: How to extract or sort values from one column

In reply to K. Elo's post of Fri, 2020-01-31, 17:12 +0200

Hi!

Oh, sorry, one "s" too many in my code. Here is the correct one:

    df$Value[ (df$Value>=0.2 & df$Value<=0.4) | df$Value>=0.7 ]

Best,
Kimmo
Re: How to extract or sort values from one column

In reply to K. Elo's post of Fri, Jan 31, 2020, 10:23 AM

Thanks for providing the code, but I also need the output as a .csv file with all four columns (Chrom, Start_pos, End_pos & Value), for the rows whose values fall in the ranges I specified earlier.

Puja
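One possible way to get all four columns into a .csv file, sketched under assumptions: the data frame below is a made-up stand-in for the result of read.csv() on the real file, and the output file name is hypothetical. The same logical condition is used to index rows of the data frame rather than the Value column alone, and write.csv() then writes the kept rows.

```r
# Made-up stand-in for: df <- read.csv("your_file.csv")
df <- data.frame(
  Chrom     = rep(c("chr1", "chr2"), each = 5),
  Start_pos = seq(1, by = 100, length.out = 10),
  End_pos   = seq(100, by = 100, length.out = 10),
  Value     = c(0.05, 0.20, 0.35, 0.40, 0.55, 0.69, 0.70, 0.85, 1.00, 0.10)
)

# TRUE for rows whose Value is in [0.2, 0.4] or at least 0.7
keep <- (df$Value >= 0.2 & df$Value <= 0.4) | df$Value >= 0.7

# which() drops rows where Value is NA instead of returning NA rows
filtered <- df[which(keep), ]

# Hypothetical output location; row.names = FALSE avoids an extra index column
out_file <- file.path(tempdir(), "filtered_values.csv")
write.csv(filtered, out_file, row.names = FALSE)
```

Indexing with `df[rows, ]` keeps every column, so Chrom, Start_pos and End_pos travel along with each selected Value.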
How to parallelize a process called by a socket connection

Hi R Experts,

I'm using R version 3.4.3 running under Linux on an AWS EC2 instance. I have R code listening on a port for a socket connection; it passes incoming data to a function, the results of which are then passed back to the calling machine. Here's the function that listens for a socket connection:

    # define server function
    server <- function() {
      while (TRUE) {
        con <- socketConnection(host = "localhost", port = server_port, blocking = TRUE,
                                server = TRUE, open = "r+", timeout = 100000000)
        data <- readLines(con, 1L, skipNul = TRUE, ok = TRUE)
        response <- check(data)
        if (!is.null(response)) writeLines(response, con)
      }
    }

The server function expects to receive a character string, which is then passed to the function check(). check() is a large, complex routine which does text analysis and many other things and returns a JSON string to be passed back to the calling machine.

This all works perfectly, except that while check() spends ~50ms doing its stuff no more requests can be received and processed. Therefore, if a new request comes in sooner than ~50ms after the last one, it is not processed. I would like to parallelize this so that the box can be running more than one check() process simultaneously. I'm familiar with several of the R parallelization packages, but I cannot see how to integrate them with the socket connection side of things.

Currently I have a kludge, which is a round-robin approach to the problem. I have 4 copies of the whole R code listening on 4 different ports, say P1, P2, P3, P4, and the calling machine issues calls in sequence to ports P1, P2, P3, P4, P1, etc. This mitigates, but doesn't solve, the problem.

Any advice would be greatly appreciated! Thanks.

James
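A minimal sketch of one possible direction, not a tested solution to the socket problem. It assumes a Unix-like system (the post mentions Linux), because parallel::mcparallel() relies on forking and is not available on Windows. The check() below is a hypothetical stand-in that just sleeps for about 50 ms. The sketch only shows several check() calls running concurrently in forked children; handing an open socket connection to a child, and whether the child can safely write the response back on it, is not shown and would need to be verified separately.

```r
library(parallel)

# Hypothetical stand-in for the real check(): takes ~50 ms, returns a JSON-like string
check <- function(data) {
  Sys.sleep(0.05)
  sprintf('{"input":"%s","nchar":%d}', data, nchar(data))
}

# Made-up requests standing in for lines read from the socket
requests <- c("alpha", "beta", "gamma", "delta")

# Fork one child per request; mcparallel() returns immediately,
# so the parent is free to do other work while the children run
jobs <- lapply(requests, function(r) mcparallel(check(r)))

# Wait for all children and gather their results (same order as jobs)
results <- mccollect(jobs)
responses <- unname(unlist(results))
print(responses)
```

In a server loop, the idea would be for the parent to return to listening as soon as a request has been handed to a child, instead of waiting for check() to finish.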