# vectorization condition counting

3 messages

## vectorization condition counting

Hi all,

I am working on a really big dataset and I would like to vectorize a condition in an if statement inside a loop to improve speed.

The original loop with the condition is currently written as follows:

    if(sum(as.integer(tags$tag_id==tags$tag_id[i]))==1&tags$lgth[i]<300){
        tags$stage[i]<-"J"
    }

Do you have any ideas? I was unable to do it correctly.

Thanking you in advance for your help,

Guillaume

## Re: vectorization condition counting

Your sum(tag_id==tag_id[i])==1, meaning tag_id[i] is the only entry with its value, may be vectorized by the sneaky idiom

    !(duplicated(tag_id, fromLast=FALSE) | duplicated(tag_id, fromLast=TRUE))

The first duplicated() call flags every repeat after the first occurrence and the second flags every repeat before the last occurrence, so an entry flagged by neither occurs exactly once. Hence f0() (with your code in a loop) and f1() are equivalent:

    f0 <- function (tags) {
        for (i in seq_len(nrow(tags))) {
            if (sum(tags$tag_id == tags$tag_id[i]) == 1 & tags$lgth[i] < 300) {
                tags$stage[i] <- "J"
            }
        }
        tags
    }

    f1 <- function (tags) {
        needsChanging <- with(tags, !(duplicated(tag_id, fromLast = FALSE) |
            duplicated(tag_id, fromLast = TRUE)) & lgth < 300)
        tags$stage[needsChanging] <- "J"
        tags
    }

E.g.,

    > someTags <- data.frame(tag_id = c(1, 2, 2, 3, 4, 5, 6, 6), lgth = 50*(1:8), stage=factor(rep(".",8), levels=c(".","J")))
    > all.equal(f0(someTags), f1(someTags))
    [1] TRUE
    > f1(someTags)
      tag_id lgth stage
    1      1   50     J
    2      2  100     .
    3      2  150     .
    4      3  200     J
    5      4  250     J
    6      5  300     .
    7      6  350     .
    8      6  400     .

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Guillaume2883
> Sent: Friday, August 10, 2012 3:47 PM
> To: [hidden email]
> Subject: [R] vectorization condition counting
>
> Hi all,
>
> I am working on a really big dataset and I would like to vectorize a
> condition in an if statement inside a loop to improve speed.
>
> The original loop with the condition is currently written as follows:
>
>     if(sum(as.integer(tags$tag_id==tags$tag_id[i]))==1&tags$lgth[i]<300){
>         tags$stage[i]<-"J"
>     }
>
> Do you have any ideas? I was unable to do it correctly.
> Thanking you in advance for your help,
>
> Guillaume
>
> --
> View this message in context: http://r.789695.n4.nabble.com/vectorization-condition-counting-tp4639992.html
> Sent from the R help mailing list archive at Nabble.com.
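If the condition ever needs to be something other than "occurs exactly once" (say, at most twice), it may be easier to compute the per-row count explicitly. The following is only a sketch of that alternative, not part of the reply above: the function name f2 and the choice of ave() are assumptions made for illustration, and it expects the same columns (tag_id, lgth, stage) as f0() and f1().

```r
# f2 is a hypothetical helper: it counts, for every row, how many rows
# share that row's tag_id, then applies both conditions in one step.
f2 <- function(tags) {
    # ave() applies FUN within each tag_id group and returns a value per row;
    # with FUN = length that value is the size of the row's group.
    n_same <- ave(seq_along(tags$tag_id), tags$tag_id, FUN = length)
    needsChanging <- n_same == 1 & tags$lgth < 300
    tags$stage[needsChanging] <- "J"
    tags
}

# Same example data as in the reply.
someTags <- data.frame(tag_id = c(1, 2, 2, 3, 4, 5, 6, 6),
                       lgth = 50 * (1:8),
                       stage = factor(rep(".", 8), levels = c(".", "J")))
result <- f2(someTags)
```

Changing n_same == 1 to another comparison adapts the rule without touching the rest of the function. Whether this is faster or slower than the duplicated() idiom on a large dataset is something to measure with system.time() on the real data.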