# Seeking a more efficient way to find partition maxima Classic List Threaded 10 messages Open this post in threaded view
|

## Seeking a more efficient way to find partition maxima

 Hi.   Suppose I have a vector that I partition into disjoint, contiguous subvectors.  For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v, v3 = v[5:6].  I want to find the maximum element of each subvector.  In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7.  If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following:   partiCmax <- function( values, seriesIdx ) {   # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)   parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )   return( cummax( values )[ parti[ , 2 ] ] ) }     The use of cummax makes that pretty efficient, but if the subvector maxima are not non-decreasing, it doesn't work.  The following function works (at least it did on the examples I tried):   partiMax <- function( values, seriesIdx ) {   # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)   parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )   return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) ) }     but I figured someone out there could come up with something cleverer.  Thanks!   --  TMK  --212-460-5430 home917-656-5351 cellt o p k a t z @ m s n . c o m         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Seeking a more efficient way to find partition maxima

 Talbot Katz wrote: > Hi. > > Suppose I have a vector that I partition into disjoint, contiguous > subvectors.  For example, let v = c(1,4,2,6,7,5), partition it into > three subvectors, v1 = v[1:3], v2 = v, v3 = v[5:6].  I want to > find the maximum element of each subvector.  In this example, max(v1) > is 4, max(v2) is 6, max(v3) is 7.  If I knew that the successive > subvector maxima would never decrease, as in the example, I could do > the following: > > partiCmax <- function( values, seriesIdx ) { # assume seriesIdx is > increasing integer sequence beginning with 1, ending at less than or > equal to length(values) parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 > ] - 1 ), length( values ) ) ) return( cummax( values )[ parti[ , 2 ] > ] ) } > > > The use of cummax makes that pretty efficient, but if the subvector > maxima are not non-decreasing, it doesn't work.  The following > function works (at least it did on the examples I tried): > > partiMax <- function( values, seriesIdx ) { # assume seriesIdx is > increasing integer sequence beginning with 1, ending at less than or > equal to length(values) parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 > ] - 1 ), length( values ) ) ) return( sapply( ( 1:length(seriesIdx) > ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] > ) ) } ) ) } > > > but I figured someone out there could come up with something > cleverer.  Thanks! It is not clear how you are creating the partitions, but if you are (hopefully) ending up with a list, such as:  > Vec.List \$v1  1 4 2 \$v2  6 \$v3  7 5 Then you can use:  > sapply(Vec.List, max, na.rm = TRUE) v1 v2 v3   4  6  7 See ?sapply for more information. HTH, Marc Schwartz ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Seeking a more efficient way to find partition maxima

 Hi Marc. Thank you for the swift response!  I should have explained more about the partitions, I hoped it would be clear from the code.  I am supplying an index vector to create the partitions.  In my original example of v = c(1,4,2,6,7,5) with v1 = v[1:3], v2 = v, v3 = v[5:6], I would specify this partition with the index vector c(1,4,5), giving the first indices of each subvector; the implicit assumption is that the last index of partition i is the first index of partition (i + 1) minus 1, except for the last partition, which ends at the end of the original vector being partitioned.  Here are the results of this example from the two functions I defined: > partiCmax(c(1,4,2,6,7,5),c(1,4,5))  4 6 7 > partiMax(c(1,4,2,6,7,5),c(1,4,5))  4 6 7 > However, if I use an example in which the maxima are not non-decreasing: > partiCmax(6:1,c(1,4,5))  6 6 6 > partiMax(6:1,c(1,4,5))  6 3 2 > partiMax gives the sought result, but partiCmax doesn't.  Your function is like a cleaner version of partiMax, using the fact that you have the partition in a list.  This begs the question of what is the most efficient way to create the partition list from the original vector and the index vector that specifies the partition.  But I am also wondering whether we can find something even more efficient than the use of the apply family of functions.  Thanks! --  TMK  -- 212-460-5430 home 917-656-5351 cell t o p k a t z @ m s n . c o m > Date: Mon, 7 Jan 2008 10:44:20 -0600 > From: [hidden email] > To: [hidden email] > CC: [hidden email] > Subject: Re: [R] Seeking a more efficient way to find partition maxima > > Talbot Katz wrote: >> Hi. >> >> Suppose I have a vector that I partition into disjoint, contiguous >> subvectors. For example, let v = c(1,4,2,6,7,5), partition it into >> three subvectors, v1 = v[1:3], v2 = v, v3 = v[5:6]. I want to >> find the maximum element of each subvector. In this example, max(v1) >> is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive >> subvector maxima would never decrease, as in the example, I could do >> the following: >> >> partiCmax <- function( values, seriesIdx ) { # assume seriesIdx is >> increasing integer sequence beginning with 1, ending at less than or >> equal to length(values) parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 >> ] - 1 ), length( values ) ) ) return( cummax( values )[ parti[ , 2 ] >> ] ) } >> >> >> The use of cummax makes that pretty efficient, but if the subvector >> maxima are not non-decreasing, it doesn't work. The following >> function works (at least it did on the examples I tried): >> >> partiMax <- function( values, seriesIdx ) { # assume seriesIdx is >> increasing integer sequence beginning with 1, ending at less than or >> equal to length(values) parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 >> ] - 1 ), length( values ) ) ) return( sapply( ( 1:length(seriesIdx) >> ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] >> ) ) } ) ) } >> >> >> but I figured someone out there could come up with something >> cleverer. Thanks! > > It is not clear how you are creating the partitions, but if you are > (hopefully) ending up with a list, such as: > >> Vec.List > \$v1 >  1 4 2 > > \$v2 >  6 > > \$v3 >  7 5 > > > Then you can use: > >> sapply(Vec.List, max, na.rm = TRUE) > v1 v2 v3 > 4 6 7 > > > See ?sapply for more information. > > HTH, > > Marc Schwartz > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Efficient way to substract all entries in two vectors from each other

 In reply to this post by Marc Schwartz Hi Joh. To echo Bert, more detail would be helpful.  But here is a stab at something.  Given two numeric vectors, a and b, for each entry of a this finds the index in b of the closest match.  It's not efficient at all. > bestmati <- function( a, b ) { return( sapply( a, function( x ) { which.min( ( b - x )^2 ) } ) ) } > bestmati( c( 3, 9 ), c( 1, 4, 2, 6, 7, 5 ) )  2 5 > --  TMK  -- 212-460-5430 home 917-656-5351 cell t o p k a t z @ m s n . c o m ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Seeking a more efficient way to find partition maxima

 In reply to this post by Talbot Katz Talbot, Try this: PartMax <- function(x, breaks) {    breaks <- c(breaks, length(x) + 1)    sapply(seq(length(breaks) - 1),           function(i) max(x[breaks[i]:(breaks[i + 1] - 1)],                           na.rm = TRUE)) }  > PartMax(c(1,4,2,6,7,5),c(1,4,5))  4 6 7  > PartMax(6:1,c(1,4,5))  6 3 2 I would not go to extremes in trying to avoid *apply functions. In many cases, it will provide sufficient performance and enable the code to be quite readable.  Short of coding a new R function in a lower level language like C, you are not likely to get more performance and then you have to balance the time it takes to improve the code, versus simply getting the job done reliably and moving on. If you have very large initial vectors, with a large number of sub-partitions and will be doing this often, then it might make sense to spend the time to profile the code and fine tune it. But I would first get a sense of the actual running time on larger vectors before making that decision. HTH, Marc Talbot Katz wrote: > Hi Marc. > > Thank you for the swift response! I should have explained more about the partitions, I hoped it would be clear from the code. I am supplying an index vector to create the partitions. In my original example of v = c(1,4,2,6,7,5) with v1 = v[1:3], v2 = v, v3 = v[5:6], I would specify this partition with the index vector c(1,4,5), giving the first indices of each subvector; the implicit assumption is that the last index of partition i is the first index of partition (i + 1) minus 1, except for the last partition, which ends at the end of the original vector being partitioned. Here are the results of this example from the two functions I defined: > >> partiCmax(c(1,4,2,6,7,5),c(1,4,5)) >  4 6 7 >> partiMax(c(1,4,2,6,7,5),c(1,4,5)) >  4 6 7 > > > However, if I use an example in which the maxima are not > non-decreasing: > >> partiCmax(6:1,c(1,4,5)) >  6 6 6 >> partiMax(6:1,c(1,4,5)) >  6 3 2 > > partiMax gives the sought result, but partiCmax doesn't. Your > function >is like a cleaner version of partiMax, using the fact that you have the >partition in a list. This begs the question of what is the most >efficient way to create the partition list from the original vector and >the index vector that specifies the partition. But I am also wondering >whether we can find something even more efficient than the use of the >apply family of functions. Thanks! ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Seeking a more efficient way to find partition maxima

 In reply to this post by Talbot Katz Try testing the performance of transforming your series to one in which the values of each partition are larger than all prior partitions and the untransforming back: # test data myseq <- c(1, 4, 2, 6, 7, 5) part <- c(1, 4, 5) M <- max(myseq) # transform myseq2 <- myseq + M * cumsum(replace(0 * myseq, part, 1)) # calcuate on transformed version tmp <- partiCmax(myseq2, part) # untransform tmp - M * seq_along(tmp) # c(4, 6, 7) Also you might check how it compares to the simpler    tapply(myseq, cumsum(replace(0 * myseq, part, 1)), max) On Jan 7, 2008 11:18 AM, Talbot Katz <[hidden email]> wrote: > > Hi. > > Suppose I have a vector that I partition into disjoint, contiguous subvectors.  For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v, v3 = v[5:6].  I want to find the maximum element of each subvector.  In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7.  If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following: > > partiCmax <- function( values, seriesIdx ) { >  # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values) >  parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) ) >  return( cummax( values )[ parti[ , 2 ] ] ) > } > > > The use of cummax makes that pretty efficient, but if the subvector maxima are not non-decreasing, it doesn't work.  The following function works (at least it did on the examples I tried): > > partiMax <- function( values, seriesIdx ) { >  # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values) >  parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) ) >  return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) ) > } > > > but I figured someone out there could come up with something cleverer.  Thanks! > > --  TMK  --212-460-5430 home917-656-5351 cellt o p k a t z @ m s n . c o m >        [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Seeking a more efficient way to find partition maxima

 Thanks Gabor!  This is the sort of thing I was trying to devise, but I ran up against my extensively documented brain power limitations ;-)  Per your remarks, it remains to be seen how it performs compared to a good implementation with one of the apply functions. --  TMK  -- 212-460-5430 home 917-656-5351 cell t o p k a t z @ m s n . c o m > Date: Mon, 7 Jan 2008 13:49:13 -0500 > From: [hidden email] > To: [hidden email] > Subject: Re: [R] Seeking a more efficient way to find partition maxima > CC: [hidden email] > > Try testing the performance of transforming your series to > one in which the values of each partition are larger than all > prior partitions and the untransforming back: > > # test data > myseq <- c(1, 4, 2, 6, 7, 5) > part <- c(1, 4, 5) > > M <- max(myseq) > > # transform > myseq2 <- myseq + M * cumsum(replace(0 * myseq, part, 1)) > > # calcuate on transformed version > tmp <- partiCmax(myseq2, part) > > # untransform > tmp - M * seq_along(tmp) # c(4, 6, 7) > > Also you might check how it compares to the simpler > > tapply(myseq, cumsum(replace(0 * myseq, part, 1)), max) > > On Jan 7, 2008 11:18 AM, Talbot Katz  wrote: >> >> Hi. >> >> Suppose I have a vector that I partition into disjoint, contiguous subvectors. For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v, v3 = v[5:6]. I want to find the maximum element of each subvector. In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following: >> >> partiCmax <- function( values, seriesIdx ) { >> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values) >> parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) ) >> return( cummax( values )[ parti[ , 2 ] ] ) >> } >> >> >> The use of cummax makes that pretty efficient, but if the subvector maxima are not non-decreasing, it doesn't work. The following function works (at least it did on the examples I tried): >> >> partiMax <- function( values, seriesIdx ) { >> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values) >> parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) ) >> return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) ) >> } >> >> >> but I figured someone out there could come up with something cleverer. Thanks! >> >> -- TMK --212-460-5430 home917-656-5351 cellt o p k a t z @ m s n . c o m >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html>> and provide commented, minimal, self-contained, reproducible code. >> ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

## Re: Efficient way to substract all entries in two vectors from each other

 In reply to this post by Talbot Katz Johannes Graumann wrote: > Hi all, > > I'm to inexperienced to come up with the matrix solution elusively > appearing on the horizon for the following problem and would appreciate if > you could give me a nudge ... > I have two vectors a, and b and need to find the closest match for each > value of a in b. > How to do that efficiently? I am not sure if I understood your problem correctly, but maybe this is of help. However its not memory efficient. > a <- c(3,10,50,100) > b <- c(7,8,300) > b[apply(abs(outer(a,b,"-")),1,which.min)]  7 8 8 8 In Biobase there is the matchpt function doing this efficiently. best wishes, ido ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.