Speeding up a loop


Speeding up a loop

wwreith
General problem: I have 20 projects that can be invested in, and I need to decide which combinations meet a certain set of standards. The total number of possible combinations is 2^20. However, I know for a fact that the number of projects must be greater than 5 and less than 13. So far, the code below is the best I can come up with for iteratively creating a set to check against my set of standards.

Code
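x <- matrix(0, nrow=1, ncol=20)   # current 0/1 selection of the 20 projects
for(i in 1:2^20)
{
  # add 1 to x, treating it as a 20-bit binary number with x[1] as the lowest bit
  x[1] <- x[1] + 1
  for(j in 1:20)
  {
    if(x[j] > 1)
    {
      # carry: reset this bit and add 1 to the next one
      x[j] <- 0
      if(j < 20)
      {
        x[j+1] <- x[j+1] + 1
      }
    }
  }
  if(sum(x) > 5 && sum(x) < 13)
  {
    # insert criteria here.
  }
}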

My code forces me to create all 2^20 x's and then use an if statement to decide whether x is within my range of projects. Is there a faster way to increment x? Any ideas on how to kill the for loop so that it won't attempt to process an x where the sum is greater than 12 or less than 6?

Re: Speeding up a loop

Adams, Jean
I've had to do something similar, so I wrote a small function to help.
This runs in about 1/4 the time of your code on my machine.
Others may have a more efficient approach.

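all.combs <- function(num, from=0, to=num) {
        # create a matrix of all possible combinations of num items
        # restrict each row to have between "from" and "to" items
        sizes <- from:to
        res <- vector("list", length(sizes))
        for(i in seq_along(sizes)) {
                j <- sizes[i]
                if(j==0) {
                        # combn(num, 0) has no columns, so add the empty row directly
                        res[[i]] <- matrix(FALSE, nrow=1, ncol=num)
                        next
                }
                comb <- combn(num, j)
                # each column of comb lists the items in one combination;
                # turn it into a logical row marking which items are included
                res[[i]] <- t(apply(comb, 2, function(x) !is.na(match(1:num, x))))
                }
        do.call(rbind, res)
        }

all.combs(20, 6, 12)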

Jean


wwreith <[hidden email]> wrote on 07/20/2012 07:45:30 AM:


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Speeding up a loop

Petr Savicky
In reply to this post by wwreith
On Fri, Jul 20, 2012 at 05:45:30AM -0700, wwreith wrote:

Hi.

The restriction on the sum of the rows between 6 and 12 eliminates the
tails of the distribution, not the main part. So, the final number of
rows is not much smaller than 2^20. More exactly, it is

  sum(choose(20, 6:12))

which is about 0.8477173 * 2^20. On the other hand, all combinations
may be created using expand.grid() faster than using a for loop.
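A quick check of the count above, as a small R sketch (the variable names are just for illustration): the number of 0/1 rows of length 20 with between 6 and 12 ones is a sum of binomial coefficients, one per allowed number of projects.

```r
# Number of ways to choose k of the 20 projects, for k = 6, ..., 12
counts <- choose(20, 6:12)
n_keep <- sum(counts)   # rows that survive the 6..12 restriction
n_all  <- 2^20          # all possible 0/1 rows of length 20
print(n_keep)           # 888896
print(n_keep / n_all)   # about 0.8477
```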

Try the following

  g <- as.matrix(expand.grid(rep(list(0:1), times=20)))
  s <- rowSums(g)
  x <- g[s > 5 & s < 13, ]
  nrow(x)

  [1] 888896

Hope this helps.

Petr Savicky.

Re: Speeding up a loop

Adams, Jean
Petr,

This is great.
MUCH faster than the code I provided.
And much more elegant code.
Thanks for posting!

Jean


Petr Savicky <[hidden email]> wrote on 07/20/2012 09:26:34 AM:

> On Fri, Jul 20, 2012 at 05:45:30AM -0700, wwreith wrote:
> > General problem: I have 20 projects that can be invested in and I need
to
> > decide which combinations meet a certain set of standards. The total
> > possible combinations comes out to 2^20. However I know for a fact
that the
> > number of projects must be greater than 5 and less than 13. So far the
the
> > code below is the best I can come up with for iteratively creating a
set to

> > check against my set of standards.
> >
> > Code
> > x<-matrix(0,nrow=1,ncol=20)
> > for(i in 1:2^20)
> > {
> > x[1]<-x[1]+1
> >   for(j in 1:20)
> >   {
> >     if(x[j]>1)
> >     {
> >       x[j]=0
> >       if(j<20)
> >       {
> >         x[j+1]=x[j+1]+1
> >       }
> >     }
> >   }
> > if(sum(x)>5 && sum(x)<13)
> > {
> > # insert criteria here.
> > }
> > }
> >
> > my code forces me to create all 2^20 x's and then use an if statement
to
> > decide if x is within my range of projects. Is there a faster way to
> > increment x. Any ideas on how to kill the for loop so that it won't
attempt

> > to process an x where the sum is greater than 12 or less than 6?
>
> Hi.
>
> The restriction on the sum of the rows between 6 and 12 eliminates the
> tails of the distribution, not the main part. So, the final number of
> rows is not much smaller than 2^20. More exactly, it is
>
>   sum(choose(20, 6:12))
>
> which is about 0.8477173 * 2^20. On the other hand, all combinations
> may be created using expand.grid() faster than using a for loop.
>
> Try the following
>
>   g <- as.matrix(expand.grid(rep(list(0:1), times=20)))
>   s <- rowSums(g)
>   x <- g[s > 5 & s < 13, ]
>   nrow(x)
>
>   [1] 888896
>
> Hope this helps.
>
> Petr Savicky.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Speeding up a loop

Petr Savicky
In reply to this post by Petr Savicky
On Fri, Jul 20, 2012 at 04:26:34PM +0200, Petr Savicky wrote:

> On Fri, Jul 20, 2012 at 05:45:30AM -0700, wwreith wrote:
> > General problem: I have 20 projects that can be invested in and I need to
> > decide which combinations meet a certain set of standards. The total
> > possible combinations comes out to 2^20. However I know for a fact that the
> > number of projects must be greater than 5 and less than 13. So far the the
> > code below is the best I can come up with for iteratively creating a set to
> > check against my set of standards.
> >
> > Code
> > x<-matrix(0,nrow=1,ncol=20)
> > for(i in 1:2^20)
> > {
> > x[1]<-x[1]+1
> >   for(j in 1:20)
> >   {
> >     if(x[j]>1)
> >     {
> >       x[j]=0
> >       if(j<20)
> >       {
> >         x[j+1]=x[j+1]+1
> >       }
> >     }
> >   }
> > if(sum(x)>5 && sum(x)<13)
> > {
> > # insert criteria here.
> > }
> > }
> >
> > my code forces me to create all 2^20 x's and then use an if statement to
> > decide if x is within my range of projects. Is there a faster way to
> > increment x. Any ideas on how to kill the for loop so that it won't attempt
> > to process an x where the sum is greater than 12 or less than 6?
>
> Hi.
>
> The restriction on the sum of the rows between 6 and 12 eliminates the
> tails of the distribution, not the main part. So, the final number of
> rows is not much smaller than 2^20. More exactly, it is
>
>   sum(choose(20, 6:12))
>
> which is about 0.8477173 * 2^20. On the other hand, all combinations
> may be created using expand.grid() faster than using a for loop.
>
> Try the following
>
>   g <- as.matrix(expand.grid(rep(list(0:1), times=20)))
>   s <- rowSums(g)
>   x <- g[s > 5 & s < 13, ]

Hi.

The above code creates a matrix whose rows are 0/1 vectors containing
between 6 and 12 ones. Using this matrix, it is possible to go through
all these combinations with a for loop as follows.

  for (i in seq_len(nrow(x))) {
      # here, x[i, ] is a row of the matrix
  }

Another option is to use vectorized operations, such as comparisons or
the ifelse() function, which evaluate a condition on whole columns of the
matrix at once. Where this is possible, it is more efficient than a for loop.
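A small sketch of the difference, assuming a made-up 0/1 matrix with 8 columns and a row-sum window of 3 to 5 standing in for the real 20 columns and 6 to 12: the loop tests one row at a time, while the vectorized version computes all row sums once and compares whole vectors.

```r
# Hypothetical stand-in for x: 50 random 0/1 rows with 8 columns
set.seed(1)
x <- matrix(rbinom(50 * 8, 1, 0.5), ncol = 8)

# Loop version: test each row separately
keep_loop <- logical(nrow(x))
for (i in seq_len(nrow(x))) {
  s <- sum(x[i, ])
  keep_loop[i] <- s > 2 && s < 6
}

# Vectorized version: one call for all row sums, then element-wise `&`
s <- rowSums(x)
keep_vec <- s > 2 & s < 6

identical(keep_loop, keep_vec)              # TRUE
label <- ifelse(keep_vec, "keep", "drop")   # ifelse() also works per element
x_kept <- x[keep_vec, , drop = FALSE]
```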

Instead of using expand.grid() to create all 2^20 combinations, it is
possible to create only rows with a specified number of ones. The
rows of length n with exactly k ones can be created as follows.

  n <- 5
  k <- 2
  ind <- combn(n, k)
  m <- ncol(ind)
  x <- matrix(0, nrow=m, ncol=n)
  x[cbind(rep(1:m, each=k), c(ind))] <- 1
  x

   [1,]    1    1    0    0    0
   [2,]    1    0    1    0    0
   [3,]    1    0    0    1    0
   [4,]    1    0    0    0    1
   [5,]    0    1    1    0    0
   [6,]    0    1    0    1    0
   [7,]    0    1    0    0    1
   [8,]    0    0    1    1    0
   [9,]    0    0    1    0    1
  [10,]    0    0    0    1    1

Hope this helps.

Petr Savicky.

Re: Speeding up a loop

Adams, Jean
In reply to this post by wwreith
"Reith, William [USA]" <[hidden email]> wrote on 07/20/2012 09:52:02 AM:

> Would this matrix eat up memory making the rest of my program
> slower? Each x needs to be multiplied by a matrix and the results
> checked against a set of thresholds. Doing them one at a time takes
> at least 24 hours right now.
>
> Optimizing a program is not my thing.


It's not mine either.
Better to ask the group.
I'm ccing R-help on this message.
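One way to limit memory, sketched here under assumptions: build the rows one block at a time (all rows with exactly k ones), apply the thresholds from the original question to that block, and keep only the survivors. The scoring matrix A below is random and purely hypothetical; only the thresholds (>= .5, >= 42, <= 150) come from the thread. Peak memory is then roughly one block (at most choose(20, 10) = 184756 rows) rather than all 888896 rows at once.

```r
# Hypothetical 20 x 3 scoring matrix standing in for the real A
set.seed(42)
A <- cbind(runif(20, 0, 0.2), runif(20, 0, 10), runif(20, 0, 25))

sizes <- 6:12
survivors <- vector("list", length(sizes))
for (idx in seq_along(sizes)) {
  k <- sizes[idx]
  ind <- combn(20, k)                        # positions of the ones
  m <- ncol(ind)
  block <- matrix(0L, nrow = m, ncol = 20)   # all rows with exactly k ones
  block[cbind(rep(seq_len(m), each = k), c(ind))] <- 1L
  xA <- block %*% A                          # one multiplication per block
  ok <- xA[, 1] >= 0.5 & xA[, 2] >= 42 & xA[, 3] <= 150
  survivors[[idx]] <- block[ok, , drop = FALSE]  # keep only passing rows
}
x_ok <- do.call(rbind, survivors)
nrow(x_ok)   # combinations meeting all three thresholds
```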

Jean


> "Jean V Adams" <[hidden email]> wrote on 07/20/2012 10:05 AM:

Re: Speeding up a loop

wwreith
That is faster than what I was doing, and eliminating 15% of my iterations is still very helpful.

Next question.

I need to multiply each row x[i,] of the matrix x by another matrix A. Specifically:

i <- 1
while(i <= nrow(x))
{
  xa <- x[i,] %*% A   # 1 x 3: the scores of combination i
  # In other words, remove row i from x if it does not meet the criteria
  # (>= .5, >= 42, <= 150) when multiplied by A
  if(xa[1] < .5 || xa[2] < 42 || xa[3] > 150)
  {
    x <- x[-i, , drop=FALSE]   # the next row moves into position i
  } else i <- i + 1
}
Is there a better way than using a loop for this, or than x<-x[-i,] for that matter? I assume building a new matrix would be worse.

Ideally I also want to exclude some rows x[i,]. For example, if x[1,] is better than x[2,] in all three categories when multiplied by A (i.e. bigger, bigger, and smaller), then I want to exclude x[2,] as well. Any suggestions on whether it is better to do this all at once or in stages?

Thanks for helping!

Re: Speeding up a loop

Richard M. Heiberger
This works to multiply the ith row of a by the ith value of b.
It might be what you can use

a <- matrix(1:30, 6, 5)
b <- 1:6

a
a*b
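Why this works: R stores a matrix column by column and recycles b along each column, so element [i, j] is multiplied by b[i]. A small sketch comparing it with base R's sweep(), which states the row-wise intent explicitly:

```r
a <- matrix(1:30, 6, 5)
b <- 1:6

scaled <- a * b                 # b recycled down each column: a[i, j] * b[i]
same   <- sweep(a, 1, b, "*")   # MARGIN = 1: one value of b per row
all(scaled == same)             # TRUE
scaled[2, ]                     # row 2 of a, multiplied by b[2] = 2
```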



To simplify your code, I think you can do this with one multiplication:

xA <- x %*% A

Now you can do the tests on xA and not have any matrix multiplications
inside the loop.
The next line is not tested but I hope gives you the idea

remove.set <- (xA[,1] < .5) | (xA[,2] < 42) | (xA[,3] > 150)

new.x <- x[remove.set,]

You now have new.x with no explicit loops at all, because this takes
advantage of vector operations.

Rich

On Fri, Jul 20, 2012 at 2:34 PM, wwreith <[hidden email]> wrote:

Re: Speeding up a loop

Richard M. Heiberger
Whoops, backwards:

new.x <- x[!remove.set,]
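Putting the two messages together, a runnable sketch with made-up x and A (both hypothetical). It uses the element-wise `|`: `||` only looks at a single value, and recent versions of R (4.3.0 and later) signal an error when it is given longer vectors. The 42 threshold follows the original question.

```r
set.seed(7)
x <- matrix(rbinom(1000 * 20, 1, 0.5), ncol = 20)                  # hypothetical portfolios
A <- cbind(runif(20, 0, 0.2), runif(20, 0, 10), runif(20, 0, 25))  # hypothetical scores

xA <- x %*% A   # all 1000 row products in one multiplication

# `|` is element-wise, giving one TRUE/FALSE per row of x
remove.set <- (xA[, 1] < .5) | (xA[, 2] < 42) | (xA[, 3] > 150)
new.x <- x[!remove.set, , drop = FALSE]
nrow(new.x)     # rows that meet all three criteria
```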


On Fri, Jul 20, 2012 at 2:51 PM, Richard M. Heiberger <[hidden email]> wrote:

Re: Speeding up a loop

wwreith
Next for-loop question.

I need a loop that removes a row from a matrix if it is worse in positions 1, 2, 3 and 4 than another row in the matrix. Right now my matrix is 503028 x 26.

Rule defining "worse": position 1 is smaller, position 2 is smaller, position 3 is higher, and position 4 is smaller.

Example:

row1: 1, 10, 3, 3
row2: 3, 7, 5, 2


row2 is not worse than row1, since it is "better" in position 1, even though it is worse in all other positions.

row3: 2,5,7,1
row3 however is worse than row2 and should be removed from the matrix.
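The rule and the three example rows can be checked directly. A sketch, where is_worse() is a hypothetical helper and the comparisons are strict, as worded above:

```r
# TRUE if row a is "worse" than row b: smaller in positions 1, 2 and 4,
# higher in position 3 (strict inequalities)
is_worse <- function(a, b) {
  a[1] < b[1] && a[2] < b[2] && a[3] > b[3] && a[4] < b[4]
}

rows <- rbind(row1 = c(1, 10, 3, 3),
              row2 = c(3, 7, 5, 2),
              row3 = c(2, 5, 7, 1))

is_worse(rows["row2", ], rows["row1", ])  # FALSE: row2 is better in position 1
is_worse(rows["row3", ], rows["row2", ])  # TRUE: row3 loses in every position

# A row is removed if at least one other row beats it in all four positions
dominated <- sapply(seq_len(nrow(rows)), function(i)
  any(rows[i, 1] < rows[, 1] & rows[i, 2] < rows[, 2] &
      rows[i, 3] > rows[, 3] & rows[i, 4] < rows[, 4]))
rows[!dominated, , drop = FALSE]   # keeps row1 and row2
```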

Any ideas? Should I break this into pieces or do it all at once? Is there something faster than a loop? My current loop takes well over 24 hours to run.


# m holds the rows kept so far: no kept row is worse than another kept row
m <- x[1, , drop=FALSE]
for(i in seq_len(nrow(x))[-1])
{
  a <- x[i, 1:4]
  # kept rows that the new row a beats in all four positions
  beaten <- a[1] > m[,1] & a[2] > m[,2] & a[3] < m[,3] & a[4] > m[,4]
  # is a itself beaten in all four positions by some kept row?
  worse <- any(a[1] < m[,1] & a[2] < m[,2] & a[3] > m[,3] & a[4] < m[,4])
  m <- m[!beaten, , drop=FALSE]
  if(!worse)
  {
    m <- rbind(m, x[i,])
  }
}

Re: Speeding up a loop

Rui Barradas
Hello,

Maybe it would have been better to start a new thread, if the question
is different. To show that it's a continuation, the subject line could
be extended with "part 2" or something like that.

Extrapolating from the timings below, this solution should run in about 3.6 hours on the full matrix.


to.keep <- function(x){
     keep <- function(i, env){
         env$ires <- env$ires + 1
         if(env$ires > env$curr.rows){
             env$result <- rbind(env$result, matrix(nrow=increment, ncol=nc))
             env$curr.rows <- env$curr.rows + increment
         }
         env$result[env$ires, ] <- x[i, ]
     }

     a1 <- x[, 1]
     a2 <- x[, 2]
     a3 <- x[, 3]
     a4 <- x[, 4]
     nc <- ncol(x)
     increment <- 1000

     e <- new.env()
     e$curr.rows <- increment
     e$result <- matrix(nrow=e$curr.rows, ncol=nc)
     e$ires <- 0

     for(i in seq_len(nrow(x))){
         yes <- x[i, 1] >= a1 | x[i, 2] >= a2 | x[i, 3] <= a3 | x[i, 4] >= a4
         if(all(yes)) keep(i, e)
     }
     e$result[seq_len(e$ires), 1:nc]
}

# Now the timing.

set.seed(3971)
nc <- 26
Enes <- seq(from=1e3, to=1e4, by=1e3)
tm <- numeric(length(Enes))
i <- 0
for(n in Enes){
     i <- i + 1
     N <- nc*n
     m <- matrix(sample(0:9, N, TRUE), ncol=nc)
     tm[i] <- system.time(kp <- to.keep(m))[3]
}

plot(Enes, tm) # quadratic behavior
fit <- lm(tm ~ Enes + I(Enes^2))
(secs <- predict(fit, newdata=data.frame(Enes=503028)))
secs/60/60 # 3.6 hours


Hope this helps,

Rui Barradas

On 21-07-2012 13:26, wwreith wrote:

Re: Speeding up a loop

wwreith
Any chance I could ask for an idiot's guide to the function to.keep(x)? I understand how to use it, but not what some of the lines are doing. Comments would be extremely helpful.

Re: Speeding up a loop

Rui Barradas

Ok, sorry, I should have included some comments.

The function is divided into three parts: 1. intro, 2. decision, 3. keep rows.
Part 3 is the function keep(), internal to to.keep(). Let's start with 1.

1. Set up some variables first.
1.a) The variables 'a'.
If the input object 'x' is a matrix this doesn't give a great speed-up,
but if 'x' is a data.frame, column extraction is time consuming.
So, do it once only, at the beginning.
1.b) The new environment.
This is because my first version needed to change values declared
outside the internal function.
This can be done with the global assignment operator, <<-, but this
practice should be avoided; it's easy to mess things up.
Note that all the variables changed inside the internal function are in
this new environment, 'e'.
In particular, note that 'result' is initialized with 1000 rows.

2. The loop.
This is where we decide whether to keep the row. I have negated the
condition from an original 'no'.
The 'no' condition:
     a1[i] < a1 & a2[i] < a2 & a3[i] > a3 & a4[i] < a4
Then the test would be:
     if(any(no)) dont_keep else keep.  # pseudo-code
Not in pseudo-code:
     if( all( !no ) ) keep(i, e)
The downside of this is that the original is more readable.
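A small check of the negation described above, on made-up data: by De Morgan's laws, the 'yes' vector used in to.keep() is exactly !no, so all(yes) is the same test as !any(no).

```r
# Made-up data: 200 rows, 4 criteria (values 0..9, so there are many ties)
set.seed(3)
x <- matrix(sample(0:9, 200 * 4, replace = TRUE), ncol = 4)
a1 <- x[, 1]; a2 <- x[, 2]; a3 <- x[, 3]; a4 <- x[, 4]

i <- 17
# no[j] is TRUE when row j beats row i in all four positions
no  <- a1[i] < a1 & a2[i] < a2 & a3[i] > a3 & a4[i] < a4
# the negated form used inside to.keep()
yes <- a1[i] >= a1 | a2[i] >= a2 | a3[i] <= a3 | a4[i] >= a4

identical(yes, !no)             # TRUE, by De Morgan's laws
identical(all(yes), !any(no))   # TRUE: keep row i exactly when no row beats it
```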

3. The internal function, keep().
Considering the small number of rows I used for tests, e$result was
initialized with 1e3 rows.
With 5e5 rows I would increase this number to 1e5.
First, the function updates the [row number] pointer into 'result' and
checks whether we have reached the limit of 'result'.
If so, it makes it bigger by 'increment' [ == 1e3 ] rows.
Then it just assigns row i of matrix/df 'x' to the appropriate row of
e$result.
The reason we need the environment is that on function return,
everything but the returned value is lost.
We could return a list with the saved values of ires, curr.rows and
result, and reassign them after every call.
But this would complicate and slow things down. Assign, update and
reassign. Messy.
Environments can help keep it "simple", in the sense of keeping together
what is meant to be used together.
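The same pattern in isolation, as a minimal sketch: make_buffer() and append_row() are hypothetical names, not part of to.keep(). Because an environment is not copied when it is passed to a function, append_row() can grow and fill the matrix without returning anything.

```r
make_buffer <- function(n_col, increment = 1000) {
  e <- new.env()
  e$result <- matrix(NA_real_, nrow = increment, ncol = n_col)
  e$ires <- 0               # number of rows filled so far
  e$increment <- increment
  e
}

append_row <- function(e, row) {
  e$ires <- e$ires + 1
  if (e$ires > nrow(e$result)) {
    # out of room: add another block of empty rows
    e$result <- rbind(e$result,
                      matrix(NA_real_, nrow = e$increment, ncol = ncol(e$result)))
  }
  e$result[e$ires, ] <- row  # changes e in place; nothing to return
  invisible(NULL)
}

buf <- make_buffer(n_col = 3, increment = 2)
for (i in 1:5) append_row(buf, c(i, i^2, i^3))
buf$result[seq_len(buf$ires), , drop = FALSE]   # the 5 filled rows
```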

And now I hope there is not an overdose of comments :)

Rui Barradas

On 21-07-2012 18:37, wwreith wrote:

Re: Speeding up a loop

wwreith
1.15   60   0.553555415   0.574892872
1.15   60   0.563183983   0.564029359

Shouldn't the function rule out the second one, since it is higher in position 3 and lower in position 4, i.e. it should not all be yes?


Re: Speeding up a loop

Rui Barradas
Hello,

I think this is a boundary issue. In your original post you said "less",
not "less than or equal to".
Try using "<=" and ">=" to see what happens; I bet it solves it.
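A sketch of the boundary issue on the two rows quoted above (first four positions only): they tie in positions 1 and 2, so the strict rule never calls the second row worse, while the non-strict version does.

```r
r1 <- c(1.15, 60, 0.553555415, 0.574892872)
r2 <- c(1.15, 60, 0.563183983, 0.564029359)

# Strict rule: r2 is worse only if smaller in 1, 2, 4 and higher in 3
strict_worse <- r2[1] < r1[1] && r2[2] < r1[2] && r2[3] > r1[3] && r2[4] < r1[4]

# Non-strict rule: ties no longer protect r2
weak_worse <- r2[1] <= r1[1] && r2[2] <= r1[2] && r2[3] >= r1[3] && r2[4] <= r1[4]

strict_worse   # FALSE: positions 1 and 2 are tied
weak_worse     # TRUE
```

One caution with the non-strict version: every row is then "worse" than itself and than any identical row, so a filter built on it also needs to require a strict difference in at least one position.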

Rui Barradas

On 23-07-2012 14:43, wwreith wrote:

Re: [External] Re: Speeding up a loop

Rui Barradas
Hello,

Are you sure? With a matrix composed of only those two rows I had a
problem: the function to.keep() returned NULL. See the changes made to
avoid it.

# beginning of loop
     for(i in seq_len(nrow(x))){
         #yes <- x[i, 1] > a1 | x[i, 2] > a2 | x[i, 3] < a3 | x[i, 4] > a4
         #if(all(yes)) keep(i, e)
         # Original post, do NOT remove if equal
         #no <- x[i, 1] < a1 | x[i, 2] < a2 | x[i, 3] > a3 | x[i, 4] < a4
         # Changed to remove if equal
         no <- x[i, 1] <= a1 | x[i, 2] <= a2 | x[i, 3] >= a3 | x[i, 4] <= a4
         if(all(!no)) keep(i, e)
     }
     if(e$ires == 0 && nrow(x) > 0)
         x[1, ]
     else
         e$result[seq_len(e$ires), 1:nc]
# end of function


On 23-07-2012 18:18, Reith, William [USA] wrote:

> It looks like both ways produce the same result.
>
> -----Original Message-----
> From: Rui Barradas [mailto:[hidden email]]
> Sent: Monday, July 23, 2012 1:05 PM
> To: Reith, William [USA]
> Subject: Re: [External] Re: [R] Speeding up a loop
>
> Hello,
>
> But that's the negation of '<', so try to negate '<=', meaning, remove the equal signs. Sorry if I wasn't very clear.
>
> Rui Barradas
>
> On 23-07-2012 17:44, Reith, William [USA] wrote:
>> This is what I have for the yes for loop
>>
>> for(i in seq_len(nrow(x))){
>>       yes <- x[i, 1] >= a1 | x[i, 2] >= a2 | x[i, 3] <= a3 | x[i, 4]>= a4
>>       if(all(yes)) keep(i, e)
>>     }
>>
>> -----Original Message-----
>> From: Rui Barradas [mailto:[hidden email]]
>> Sent: Monday, July 23, 2012 12:14 PM
>> To: Reith, William [USA]
>> Cc: r-help
>> Subject: [External] Re: [R] Speeding up a loop
>>
>> Hello,
>>
>> I think this is a boundary issue. In your op you've said "less" not "less than or equal to".
>> Try using "<=" and ">=" to see what happens, I bet it solves it.
>>
>> Rui Barradas
>>
>> Em 23-07-2012 14:43, wwreith escreveu:
>>> 1.15           60 0.553555415         0.574892872
>>> 1.15   60 0.563183983         0.564029359
>>>
>>> Shouldn't the function row out the second one, since it it higher in
>>> position 3 and lower in position 4 i.e. it should not all be yes?
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Speeding-up-a-loop-tp4637201p4637438.ht
>>> m l Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.


Re: [External] Re: Speeding up a loop

Rui Barradas
Hello,

Sorry for the mess; the logical operation should also be changed to a
conjunction:

         no <- x[i, 1] <= a1 & x[i, 2] <= a2 & x[i, 3] >= a3 & x[i, 4] <= a4

Rui Barradas
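To illustrate the difference, here is a small sketch (not from the thread) using the same two rows, with each row's comparison with itself dropped via `[-i]`. With `|`, the tie in column 1 alone is enough to flag row 1 for removal; with `&`, a row is flagged only when another row is at least as good on all four columns. The helpers `flag_or()` and `flag_and()` are hypothetical names used only for this illustration:

```r
# The two rows discussed in this thread.
x <- rbind(c(1.15, 60, 0.553555415, 0.574892872),
           c(1.15, 60, 0.563183983, 0.564029359))

# Hypothetical helpers: for row i, flag each other row j that would cause
# row i to be removed, combining the four tests with | or with &.
flag_or <- function(i) {
  no <- x[i, 1] <= x[, 1] | x[i, 2] <= x[, 2] | x[i, 3] >= x[, 3] | x[i, 4] <= x[, 4]
  no[-i]  # drop the comparison of row i with itself
}
flag_and <- function(i) {
  no <- x[i, 1] <= x[, 1] & x[i, 2] <= x[, 2] & x[i, 3] >= x[, 3] & x[i, 4] <= x[, 4]
  no[-i]
}

flag_or(1)   # TRUE:  row 1 would be removed just because column 1 ties
flag_and(1)  # FALSE: row 2 has a larger value in column 3, so row 1 is kept
flag_and(2)  # TRUE:  row 1 passes all four tests against row 2
```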

On 24-07-2012 13:09, Rui Barradas wrote:

> Hello,
>
> Are you sure? With a matrix composed of only those two rows I had a
> problem: the function to.keep() returned NULL. See the changes made to
> avoid it.
>
> # beginning of loop
>     for(i in seq_len(nrow(x))){
>         #yes <- x[i, 1] > a1 | x[i, 2] > a2 | x[i, 3] < a3 | x[i, 4] > a4
>         #if(all(yes)) keep(i, e)
>         # Original post, do NOT remove if equal
>         #no <- x[i, 1] < a1 | x[i, 2] < a2 | x[i, 3] > a3 | x[i, 4] < a4
>         # Changed to remove if equal
>         no <- x[i, 1] <= a1 | x[i, 2] <= a2 | x[i, 3] >= a3 | x[i, 4] <= a4
>         if(all(!no)) keep(i, e)
>     }
>     if(e$ires == 0 && nrow(x) > 0)
>         x[1, ]
>     else
>         e$result[seq_len(e$ires), 1:nc]
> # end of function


Re: [External] Re: Speeding up a loop

Rui Barradas
In reply to this post by Rui Barradas
Hello,

Anyway, I've redone part of the function to accommodate (1) larger
increments and (2) a choice of whether to keep rows on ties. Here's the
complete version; please disregard my two previous mails.

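to.keep <- function(x, increment = 1e4, keep.if.equal = FALSE){
     # Append row i of x to env$result, growing the result matrix
     # by 'increment' rows whenever it is full.
     keep <- function(i, env){
         env$ires <- env$ires + 1
         if(env$ires > env$curr.rows){
             env$result <- rbind(env$result, matrix(nrow=increment, ncol=nc))
             env$curr.rows <- env$curr.rows + increment
         }
         env$result[env$ires, ] <- x[i, ]
     }

     # The four criteria: larger a1, a2, a4 and smaller a3 are preferred.
     x  <- as.matrix(x)
     a1 <- x[, 1]
     a2 <- x[, 2]
     a3 <- x[, 3]
     a4 <- x[, 4]
     nc <- ncol(x)

     # Preallocate the result and count the rows kept so far.
     e <- new.env()
     e$curr.rows <- increment
     e$result <- matrix(nrow=e$curr.rows, ncol=nc)
     e$ires <- 0
     if(keep.if.equal){
         # Keep row i unless some other row beats it strictly on all four criteria.
         for(i in seq_len(nrow(x))){
             yes <- a1[i] >= a1 | a2[i] >= a2 | a3[i] <= a3 | a4[i] >= a4
             if(all(yes[-i])) keep(i, e)
         }
     }else{
         # Drop row i if some other row is at least as good on all four criteria.
         for(i in seq_len(nrow(x))){
             no <- a1[i] <= a1 & a2[i] <= a2 & a3[i] >= a3 & a4[i] <= a4
             if(!any(no[-i])) keep(i, e)
         }
     }
     # drop = FALSE returns a matrix even when only one row is kept.
     e$result[seq_len(e$ires), 1:nc, drop = FALSE]
}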


I hope this finally settles it.

Rui Barradas
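For comparison, a sketch of an alternative (not from the thread, and not benchmarked) that applies the same two rules but builds a logical index with `vapply()` and subsets once, so nothing has to be preallocated or grown inside an environment. The name `to_keep_idx` is hypothetical, and like `to.keep()` it assumes the four criteria sit in columns 1 to 4:

```r
# Hypothetical helper, not from the thread: same rules as to.keep(),
# but it collects a logical index first and subsets x once at the end.
to_keep_idx <- function(x, keep.if.equal = FALSE) {
  x <- as.matrix(x)
  keep <- vapply(seq_len(nrow(x)), function(i) {
    if (keep.if.equal) {
      # keep row i unless another row beats it strictly on all four columns
      yes <- x[i, 1] >= x[, 1] | x[i, 2] >= x[, 2] |
             x[i, 3] <= x[, 3] | x[i, 4] >= x[, 4]
      all(yes[-i])
    } else {
      # drop row i if another row is at least as good on all four columns
      no <- x[i, 1] <= x[, 1] & x[i, 2] <= x[, 2] &
            x[i, 3] >= x[, 3] & x[i, 4] <= x[, 4]
      !any(no[-i])
    }
  }, logical(1))
  x[keep, , drop = FALSE]
}

# The two rows discussed in this thread.
x <- rbind(c(1.15, 60, 0.553555415, 0.574892872),
           c(1.15, 60, 0.563183983, 0.564029359))
to_keep_idx(x)
to_keep_idx(x, keep.if.equal = TRUE)
```

With these two rows the default call keeps only the first row, while `keep.if.equal = TRUE` keeps both: the tie in column 1 means neither row beats the other strictly on all four columns.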
