Conditionally remove rows with logic

Conditionally remove rows with logic

jennifer.sheng2002
Dear all,

I need to remove any rows AFTER the label becomes 1.  For example, for ID
1, the two rows with TIME of 15 & 18 should be removed; for ID 2, any rows
after time 6, i.e., rows of time 9-18, should be removed.  Any
suggestions?  Thank you very much!

The current dataset looks like the following:
ID     TIME     LABEL
1      0        0
1      3        0
1      6        0
1      9        0
1      12       1
1      15       0
1      18       0
2      0        0
2      3        0
2      6        1
2      9        0
2      12       0
2      15       0
2      18       0

Thanks a lot!
Jennifer


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Conditionally remove rows with logic

Jim Lemon-4
Hi Jennifer,
A very pedestrian method, but I think it does what you want.

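remove_rows_after_1<-function(x) {
 # indices of rows to remove; stays empty if no row follows a 1
 rtr<-integer(0)
 got1<-FALSE
 thisID<-x$ID[1]
 for(i in seq_len(nrow(x))) {
  # a new ID starts: forget any 1 seen in the previous ID
  if(x$ID[i] != thisID) {
   thisID<-x$ID[i]
   got1<-FALSE
  }
  # a 1 was already seen in this ID, so this row comes after it
  if(got1) rtr<-c(rtr,i)
  # from the next row on, rows of this ID are to be removed
  if(x$LABEL[i] == 1) got1<-TRUE
 }
 return(rtr)
}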

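The function returns the indices of the rows to be removed. Check that at
least one index was returned before dropping them with x[-rtr, ], since a
negative subscript built from an empty or NA result does not leave the
data frame unchanged.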
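
For comparison, the same rule can be sketched without an explicit loop.
This is only an illustration, not part of the function above: it assumes
the rows are already sorted by TIME within each ID, and the names df,
ones_before, rows_to_remove and kept are made up for the example.

```r
# Example data from the original post
df <- data.frame(
  ID    = rep(1:2, each = 7),
  TIME  = rep(seq(0, 18, by = 3), times = 2),
  LABEL = c(0, 0, 0, 0, 1, 0, 0,
            0, 0, 1, 0, 0, 0, 0)
)

# For each row, count the 1s seen strictly before it within the same ID.
# cumsum(v) counts the 1s up to and including the row, so subtract v.
ones_before <- ave(df$LABEL, df$ID, FUN = function(v) cumsum(v) - v)

# Rows that come after a 1 in their ID
rows_to_remove <- which(ones_before > 0)   # rows 6, 7, 11, 12, 13, 14

# A logical subscript is safe even when nothing has to be removed
kept <- df[ones_before == 0, ]
```

Subsetting with the logical condition avoids the empty-index problem that
comes with negative subscripts.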

Jim


(In reply to Jennifer Sheng's post of Mon, Aug 8, 2016 at 8:21 AM.)

Re: Conditionally remove rows with logic

MacQueen, Don
In reply to this post by jennifer.sheng2002
Assuming that within each ID the data is sorted by increasing TIME, and
that LABEL==1 occurs only once within each ID, I would try something
like this.

Suppose that your data is in a data frame named "df".


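# Assumes the rows of each ID are contiguous in df, so that the pieces
# appended to df.keep line up with the rows of df.
df.keep <- logical()

for (id in unique(df$ID)) {
  df.tmp <- subset(df, df$ID==id)          # rows for this ID only
  tmp.keep <- rep(TRUE, nrow(df.tmp))      # keep everything by default
  t1 <- df.tmp$TIME[df.tmp$LABEL==1]       # time(s) at which LABEL is 1
  # drop rows later than the first 1; an ID without a 1 is kept whole
  if (length(t1) > 0) tmp.keep[df.tmp$TIME > min(t1)] <- FALSE
  df.keep <- c(df.keep, tmp.keep)
}

newdf <- df[df.keep , ]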

I have not tested this.

I'm sure it could be made more efficient, and probably with a bit of
cleverness one could avoid creating temporary subsets of the input. But I
tend to find such subsets handy for testing and debugging.

Unless your input data is huge, it should be fast enough that you won't
notice the inefficiencies.
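
One possible way to avoid the temporary subsets is to look up a cutoff
time per ID and compare every row against it in a single step. The
following is a rough sketch only; the third ID (with no LABEL==1) and the
names ones, cutoff and keep are made up for illustration and are not from
the original post.

```r
# Example data from the original post, plus a hypothetical ID 3 that
# never reaches LABEL == 1
df <- data.frame(
  ID    = rep(1:3, times = c(7, 7, 3)),
  TIME  = c(seq(0, 18, by = 3), seq(0, 18, by = 3), 0, 3, 6),
  LABEL = c(0, 0, 0, 0, 1, 0, 0,
            0, 0, 1, 0, 0, 0, 0,
            0, 0, 0)
)

# One row per ID: the earliest row where LABEL is 1
ones <- df[df$LABEL == 1, ]
ones <- ones[order(ones$ID, ones$TIME), ]
ones <- ones[!duplicated(ones$ID), ]

# Cutoff time for every row of df; NA where the ID has no 1 at all
cutoff <- ones$TIME[match(df$ID, ones$ID)]

# Keep rows up to and including the cutoff, and all rows of IDs without a 1
keep <- is.na(cutoff) | df$TIME <= cutoff
newdf <- df[keep, ]
```

Because the comparison is on TIME rather than on row position, this
version does not depend on the rows being sorted or grouped by ID.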

-Don

--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





Re: Conditionally remove rows with logic

jholtman
In reply to this post by jennifer.sheng2002
try this:

> input <- read.table(text = "ID     TIME     LABEL
+  1        0            0
+  1        3            0
+  1        6            0
+  1        9            0
+  1        12          1
+  1        15          0
+  1        18           0
+  2        0            0
+  2        3            0
+  2        6            1
+  2        9            0
+  2        12          0
+  2        15          0
+  2        18          0", header = TRUE)
>
>  result <- do.call(rbind,
+     lapply(split(input, input$ID), function(.id){
+         indx <- which(.id$LABEL == 1)
+         if (length(indx) == 1) .id <- .id[1:indx, ]  # keep up to the '1'
+         .id
+     })
+ )
>
>
> result
     ID TIME LABEL
1.1   1    0     0
1.2   1    3     0
1.3   1    6     0
1.4   1    9     0
1.5   1   12     1
2.8   2    0     0
2.9   2    3     0
2.10  2    6     1
>
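
Note that the code above leaves a group untouched when LABEL==1 occurs
more than once in it, and that the row names of the result come from the
split groups. A possible variant, shown here only as a sketch, cuts at
the first 1 and resets the row names. The small data frame is made up for
illustration (ID 1 has two 1s, ID 2 has none), and rows are assumed to be
sorted by TIME within each ID.

```r
# Hypothetical data: two 1s in ID 1, no 1 in ID 2
input <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2, 2),
  TIME  = c(0, 3, 6, 9, 0, 3, 6),
  LABEL = c(0, 1, 0, 1, 0, 0, 0)
)

result <- do.call(rbind,
  lapply(split(input, input$ID), function(grp) {
    indx <- which(grp$LABEL == 1)
    # cut at the first 1 if there is one; otherwise keep the whole group
    if (length(indx) > 0) grp <- grp[seq_len(indx[1]), ]
    grp
  })
)

# Replace row names such as "1.1" with plain consecutive numbers
rownames(result) <- NULL
```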


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
