
vectorization condition counting


vectorization condition counting

Guillaume2883
Hi all,

I am working on a very large dataset, and I would like to vectorize a condition that currently sits in an if statement inside a loop, to improve speed.

The body of the original loop, with the condition, is currently written as follows:

if (sum(as.integer(tags$tag_id == tags$tag_id[i])) == 1 & tags$lgth[i] < 300) {
    tags$stage[i] <- "J"
}

Do you have any ideas? I was unable to do it correctly.
Thank you in advance for your help.

Guillaume

Re: vectorization condition counting

William Dunlap
Your sum(tag_id==tag_id[i])==1, meaning tag_id[i] is the only entry with its
value, may be vectorized by the sneaky idiom
   !(duplicated(tag_id,fromLast=FALSE) | duplicated(tag_id,fromLast=TRUE))

Hence f0() (with your code in a loop) and f1() are equivalent:
f0 <- function (tags) {
    for (i in seq_len(nrow(tags))) {
        if (sum(tags$tag_id == tags$tag_id[i]) == 1 & tags$lgth[i] < 300) {
            tags$stage[i] <- "J"
        }
    }
    tags
}
f1 <-function (tags) {
    needsChanging <- with(tags, !(duplicated(tag_id, fromLast = FALSE) |
        duplicated(tag_id, fromLast = TRUE)) & lgth < 300)
    tags$stage[needsChanging] <- "J"
    tags
}

E.g.,
> someTags <- data.frame(tag_id = c(1, 2, 2, 3, 4, 5, 6, 6), lgth = 50*(1:8), stage=factor(rep(".",8), levels=c(".","J")))
> all.equal(f0(someTags), f1(someTags))
[1] TRUE
> f1(someTags)
  tag_id lgth stage
1      1   50     J
2      2  100     .
3      2  150     .
4      3  200     J
5      4  250     J
6      5  300     .
7      6  350     .
8      6  400     .
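
To see the idiom in isolation, here is a minimal sketch on a bare vector. The comparison against a per-value count with ave() is an assumption added for illustration and is not part of the reply above; the names isUnique, counts and isUnique2 are hypothetical.

```r
tag_id <- c(1, 2, 2, 3, 4, 5, 6, 6)

# TRUE where the value occurs exactly once: an entry is unique if it is
# flagged as a duplicate neither scanning from the front nor from the back.
isUnique <- !(duplicated(tag_id, fromLast = FALSE) |
              duplicated(tag_id, fromLast = TRUE))
print(isUnique)
# [1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE

# Alternative (assumption, not from the thread): count how often each value
# occurs with ave(), then keep the entries whose count is 1.
counts <- ave(seq_along(tag_id), tag_id, FUN = length)
isUnique2 <- counts == 1

print(identical(isUnique, isUnique2))
# [1] TRUE
```

Both forms avoid the per-row sum(tag_id == tag_id[i]) comparison, which scans the whole vector once for every row.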

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf
> Of Guillaume2883
> Sent: Friday, August 10, 2012 3:47 PM
> To: [hidden email]
> Subject: [R] vectorization condition counting
>
> --
> View this message in context: http://r.789695.n4.nabble.com/vectorization-condition-counting-tp4639992.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: vectorization condition counting

arun kirshna


Hi,

This may also help:
someTags <- data.frame(tag_id = c(1, 2, 2, 3, 4, 5, 6, 6), lgth = 50*(1:8), stage=factor(rep(".",8), levels=c(".","J")))
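f2 <- function(x) {
  # Evaluate the condition on the argument x, not on the global someTags,
  # so that f2() works for any data frame passed in.
  needsChanging <- with(x, is.na(match(tag_id, tag_id[duplicated(tag_id)])) & lgth < 300)
  x$stage[needsChanging] <- "J"
  x
}
f2(someTags)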
#  tag_id lgth stage
#1      1   50     J
#2      2  100     .
#3      2  150     .
#4      3  200     J
#5      4  250     J
#6      5  300     .
#7      6  350     .
#8      6  400     .
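
A rough way to check that the loop and the two vectorized answers agree on a larger input, and to get a feel for the speed difference, might look like the sketch below. The helper names (loopVersion, dupVersion, matchVersion), the simulated data, the seed and the size n are all assumptions made for illustration; timings depend on the machine, so none are shown.

```r
# Hypothetical helper names; the bodies follow the loop and the two
# vectorized approaches discussed in this thread.
loopVersion <- function(tags) {
  for (i in seq_len(nrow(tags))) {
    if (sum(tags$tag_id == tags$tag_id[i]) == 1 && tags$lgth[i] < 300) {
      tags$stage[i] <- "J"
    }
  }
  tags
}
dupVersion <- function(tags) {
  isUnique <- with(tags, !(duplicated(tag_id) | duplicated(tag_id, fromLast = TRUE)))
  tags$stage[isUnique & tags$lgth < 300] <- "J"
  tags
}
matchVersion <- function(tags) {
  isUnique <- with(tags, is.na(match(tag_id, tag_id[duplicated(tag_id)])))
  tags$stage[isUnique & tags$lgth < 300] <- "J"
  tags
}

# Simulated data (assumption): random ids with repeats, random lengths.
set.seed(1)
n <- 2000
bigTags <- data.frame(
  tag_id = sample(n, n, replace = TRUE),
  lgth   = runif(n, min = 0, max = 600),
  stage  = factor(rep(".", n), levels = c(".", "J"))
)

# Do the vectorized versions reproduce the loop result?
sameAsLoop <- c(
  dup   = isTRUE(all.equal(loopVersion(bigTags), dupVersion(bigTags))),
  match = isTRUE(all.equal(loopVersion(bigTags), matchVersion(bigTags)))
)
print(sameAsLoop)

# Rough timings; the loop compares every row against the whole column.
print(system.time(loopVersion(bigTags)))
print(system.time(dupVersion(bigTags)))
```

The loop does work proportional to n for each of the n rows, so its cost grows quadratically, while duplicated() and match() make a fixed number of passes over the column.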
A.K.


----- Original Message -----
From: William Dunlap <[hidden email]>
To: Guillaume2883 <[hidden email]>; "[hidden email]" <[hidden email]>
Cc:
Sent: Friday, August 10, 2012 8:02 PM
Subject: Re: [R] vectorization condition counting
