
I am trying to add a constant to the previous value of a variable based on certain conditions. Maybe there is a simple way to do this that I am missing completely. I have given an example below:

    df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))
    > df
      x  y
    1 1 10
    2 2 20
    3 3 30
    4 4 NA
    5 5 NA

I want to add 2 to the previous value of y if x exceeds 3 (NAs will also have to be handled in the process). The resulting output would look like:

      x  y
    1 1 10
    2 2 20
    3 3 30
    4 4 32
    5 5 34

Can someone please explain how to do it?

Thank you.

Ravi

## Re: Conditionally adding a constant

Hello,

I believe this works.

    f1 <- function(x){
            for(i in 2:length(x)) x[i] <- ifelse(x[i-1] > 3, x[i-1] + 2, x[i])
            x
    }

    f2 <- function(x){
            for(i in 2:length(x)) x[i] <- ifelse(is.na(x[i]) & (x[i-1] > 3), x[i-1] + 2, x[i])
            x
    }

    df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))

    apply(df, 2, f1)      # df$x > 3, df$x also changes
    apply(df, 2, f2)      # only df$y has NA's

Maybe there's a better way, avoiding the loop.

Rui Barradas
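Because `apply(df, 2, ...)` hands each column to the function on its own, the `> 3` test in `f1` and `f2` is made on the previous value of the column being modified rather than on `x`. As a point of comparison, here is a minimal sketch of a loop that takes `x` and `y` as separate arguments. The name `add_to_previous` and its `threshold` and `step` parameters are illustrative and do not come from the thread.

```r
# Hypothetical helper, not from the thread: x drives the condition, y is filled.
add_to_previous <- function(x, y, threshold = 3, step = 2) {
  stopifnot(length(x) == length(y))
  for (i in seq_along(y)[-1]) {            # start at 2; row 1 has no predecessor
    if (is.na(y[i]) && !is.na(x[i]) && x[i] > threshold) {
      y[i] <- y[i - 1] + step              # stays NA if the previous value is NA
    }
  }
  y
}

df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(10, 20, 30, NA, NA))
df$y <- add_to_previous(df$x, df$y)
df

# y values below 3 are still filled, because the test is on x and not on y
small <- add_to_previous(x = c(1, 4, 5), y = c(0.5, NA, NA))
small
```

Only NA entries of `y` are touched here; existing values are left alone even where `x` exceeds the threshold.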

## Re: Conditionally adding a constant

Here is another approach. Probably with some thought and fingerwork, rle() could be used to avoid the while loop, but that should only slow things down if there are long runs of NAs --- there can be a lot of NAs as long as they are spaced apart and it should still be quite efficient.

    f <- function(x, y) {
      i <- which(x > 3)
      cond <- TRUE
      while (cond) {
        y[i] <- y[i - 1] + 2L
        cond <- any(is.na(y))
      }
      return(y)
    }

    df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))

    df$y <- f(df$x, df$y)

Cheers,

Josh

On Mon, Jan 2, 2012 at 4:47 AM, Rui Barradas <[hidden email]> wrote:
> Hello,
>
> I believe this works.
> [...]

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
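The `while` loop in `f` ends only once `any(is.na(y))` is `FALSE`, so an NA that can never be filled (for example one in a row where `x` is not above 3) keeps it running. The following is a hedged sketch of a variant that also stops when a pass fills nothing. `f_guarded` and its `step` argument are hypothetical names, and unlike `f` it only writes to entries that are still NA.

```r
# Hypothetical guarded variant, not from the thread.
f_guarded <- function(x, y, step = 2) {
  i <- which(x > 3)
  i <- i[i > 1]                      # position 1 has no previous value
  repeat {
    n_before <- sum(is.na(y))
    fill <- i[is.na(y[i])]           # only touch entries that are still NA
    y[fill] <- y[fill - 1] + step
    n_after <- sum(is.na(y))
    # stop when everything is filled or when this pass made no progress
    if (n_after == 0 || n_after == n_before) break
  }
  y
}

f_guarded(c(1, 2, 3, 4, 5), c(10, 20, 30, NA, NA))
f_guarded(c(1, 2, 3, 4, 5), c(10, 20, NA, 40, NA))   # the NA in row 3 is left as is
```

Each pass fills at least one NA or terminates the loop, so the number of passes is bounded by the number of NAs in `y`.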

## Re: Conditionally adding a constant

Here is a way of doing it without loops:

    > df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))
    >
    > require(zoo)  # need na.locf to fix the NAs
    >
    > # replace NA with preceding values
    > df$y <- na.locf(df$y)
    > df
      x  y
    1 1 10
    2 2 20
    3 3 30
    4 4 30
    5 5 30
    >
    > # assuming that you want to increment the counts when x > 3
    > inc <- cumsum(df$x > 3) * 2
    > inc
    [1] 0 0 0 2 4
    >
    > df$y <- df$y + inc
    > df
      x  y
    1 1 10
    2 2 20
    3 3 30
    4 4 32
    5 5 34

On Mon, Jan 2, 2012 at 1:59 PM, Joshua Wiley <[hidden email]> wrote:
> Here is another approach.
> [...]

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

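For readers who would rather not load zoo, the carry-forward-then-increment idea from the previous message can be sketched in base R. The helper names `locf_base` and `add_increment` are hypothetical, and the sketch assumes `x` itself contains no NAs (an NA in `x` would propagate through `cumsum()`).

```r
# Hypothetical base-R helpers, not from the thread.
locf_base <- function(v) {
  seen <- cumsum(!is.na(v))          # number of non-NA values seen so far
  c(NA, v[!is.na(v)])[seen + 1]      # 0 seen maps to NA, so leading NAs are kept
}

add_increment <- function(x, y, threshold = 3, step = 2) {
  # carry the last observed y forward, then add one step per row with x > threshold
  locf_base(y) + cumsum(x > threshold) * step
}

df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(10, 20, 30, NA, NA))
df$y <- add_increment(df$x, df$y)
df
```

As in the zoo version, the increment is added to every row where `x` exceeds the threshold, including rows whose `y` was not NA to begin with.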
## Re: Conditionally adding a constant

In reply to this post by vioravis

Hello again,

I believe we are all missing something. Isn't it possible to have NAs as the first values of 'y'? And isn't it also possible to have x > 3? Here is my point (I have changed function 'f2' to provide for such cases; 'f1' is rubbish).

    # Rui
    f3 <- function(x, y){
            inx <- which(x > 3)
            ynx <- which(is.na(y))
            for(i in which(inx %in% ynx)) y[ynx[i]] <- y[ynx[i]-1] + 2L
            y
    }

    # Jim's, as a function, 'na.rm' option added or else 'df3' would produce an error
    require(zoo)
    f4 <- function(x, y){
            y <- na.locf(y, na.rm=FALSE)
            inc <- cumsum(x > 3) * 2
            y + inc
    }

    df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))
    df
    df2 <- data.frame(x = c(1,2,3,4,5), y = c(10,20,NA,40,NA))
    df2
    df3 <- data.frame(x = c(1,2,3,4,5), y = rev(c(10,20,30,NA,NA)))
    df3

    # Joshua
    f(df$x, df$y)      # works
    f(df2$x, df2$y)    # infinite loop
    f(df3$x, df3$y)    # infinite loop

    # Rui
    f3(df$x, df$y)     # works
    f3(df2$x, df2$y)   # works as expected?
    f3(df3$x, df3$y)   # works as expected?

    # Jim
    f4(df$x, df$y)     # works
    f4(df2$x, df2$y)   # works as expected?
    f4(df3$x, df3$y)   # works as expected?

If this makes sense, the performance tests are very much in favour of Jim's solution.

    # If this is what is asked for, test the performance
    # with large enough N
    N <- 1.e5
    dftest <- data.frame(x=1:N, y=c(sample(c(rep(NA, 5), 10*1:5), N, replace=TRUE)))

    sum(is.na(dftest))/N    # proportion of NAs in 'dftest'

    t2 <- system.time(invisible(apply(dftest, 2, f2)))[c(1, 3)]
    t3 <- system.time(invisible(f3(dftest$x, dftest$y)))[c(1, 3)]
    t4 <- system.time(invisible(f4(dftest$x, dftest$y)))[c(1, 3)]

    rbind(t2=t2, t3=t3, t4=t4, t2.t3=t2/t3, t2.t4=t2/t4, t3.t4=t3/t4)

Sample output

          user.self   elapsed
    t2      2.93000   2.95000
    t3      0.22000   0.22000
    t4      0.01000   0.01000
    t2.t3  13.31818  13.40909
    t2.t4 293.00000 295.00000
    t3.t4  22.00000  22.00000

A factor of 300 over the initial solution, or 20+ over the other loop-based one. The downside is that it needs an extra package loaded, but 'zoo' is rather commonplace.

Rui Barradas
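One detail of `f3` may be worth double-checking: `which(inx %in% ynx)` returns positions within `inx`, and those positions are then used to index `ynx`. The two give the same result on `df`, `df2` and `df3`, but they need not in general. A minimal sketch that loops over the qualifying row numbers directly, using `intersect()`, could look like this. `f3_rows` and its `step` argument are hypothetical names, not part of the thread.

```r
# Hypothetical variant of f3, not from the thread: loop over row numbers directly.
f3_rows <- function(x, y, step = 2) {
  rows <- intersect(which(x > 3), which(is.na(y)))  # rows with x > 3 and NA y
  rows <- rows[rows > 1]                            # row 1 has no previous value
  for (i in rows) y[i] <- y[i - 1] + step
  y
}

f3_rows(c(1, 2, 3, 4, 5), c(10, 20, 30, NA, NA))
f3_rows(c(4, 5, 1, 6), c(1, NA, NA, NA))   # rows 2 and 4 qualify
```

In the second call row 2 is filled from row 1, while row 4 stays NA because its predecessor (row 3, where `x` is not above 3) is itself NA.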