Seeking a more efficient way to find partition maxima

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Seeking a more efficient way to find partition maxima

Talbot Katz

Hi.
 
Suppose I have a vector that I partition into disjoint, contiguous subvectors.  For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6].  I want to find the maximum element of each subvector.  In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7.  If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following:
 
partiCmax <- function( values, seriesIdx ) {
  # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
  parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )
  return( cummax( values )[ parti[ , 2 ] ] )
}
 
 
The use of cummax makes that pretty efficient, but if the subvector maxima are not non-decreasing, it doesn't work.  The following function works (at least it did on the examples I tried):
 
partiMax <- function( values, seriesIdx ) {
  # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
  parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )
  return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) )
}
 
 
but I figured someone out there could come up with something cleverer.  Thanks!
 
--  TMK  --212-460-5430 home917-656-5351 cellt o p k a t z @ m s n . c o m
        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Seeking a more efficient way to find partition maxima

Marc Schwartz
Talbot Katz wrote:

> Hi.
>
> Suppose I have a vector that I partition into disjoint, contiguous
> subvectors.  For example, let v = c(1,4,2,6,7,5), partition it into
> three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6].  I want to
> find the maximum element of each subvector.  In this example, max(v1)
> is 4, max(v2) is 6, max(v3) is 7.  If I knew that the successive
> subvector maxima would never decrease, as in the example, I could do
> the following:
>
> partiCmax <- function( values, seriesIdx ) { # assume seriesIdx is
> increasing integer sequence beginning with 1, ending at less than or
> equal to length(values) parti <- cbind( seriesIdx, c( ( seriesIdx[ -1
> ] - 1 ), length( values ) ) ) return( cummax( values )[ parti[ , 2 ]
> ] ) }
>
>
> The use of cummax makes that pretty efficient, but if the subvector
> maxima are not non-decreasing, it doesn't work.  The following
> function works (at least it did on the examples I tried):
>
> partiMax <- function( values, seriesIdx ) { # assume seriesIdx is
> increasing integer sequence beginning with 1, ending at less than or
> equal to length(values) parti <- cbind( seriesIdx, c( ( seriesIdx[ -1
> ] - 1 ), length( values ) ) ) return( sapply( ( 1:length(seriesIdx)
> ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ]
> ) ) } ) ) }
>
>
> but I figured someone out there could come up with something
> cleverer.  Thanks!

It is not clear how you are creating the partitions, but if you are
(hopefully) ending up with a list, such as:

 > Vec.List
$v1
[1] 1 4 2

$v2
[1] 6

$v3
[1] 7 5


Then you can use:

 > sapply(Vec.List, max, na.rm = TRUE)
v1 v2 v3
  4  6  7


See ?sapply for more information.

HTH,

Marc Schwartz

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Seeking a more efficient way to find partition maxima

Talbot Katz

Hi Marc.

Thank you for the swift response!  I should have explained more about the partitions, I hoped it would be clear from the code.  I am supplying an index vector to create the partitions.  In my original example of v = c(1,4,2,6,7,5) with v1 = v[1:3], v2 = v[4], v3 = v[5:6], I would specify this partition with the index vector c(1,4,5), giving the first indices of each subvector; the implicit assumption is that the last index of partition i is the first index of partition (i + 1) minus 1, except for the last partition, which ends at the end of the original vector being partitioned.  Here are the results of this example from the two functions I defined:

> partiCmax(c(1,4,2,6,7,5),c(1,4,5))
[1] 4 6 7
> partiMax(c(1,4,2,6,7,5),c(1,4,5))
[1] 4 6 7
>


However, if I use an example in which the maxima are not non-decreasing:

> partiCmax(6:1,c(1,4,5))
[1] 6 6 6
> partiMax(6:1,c(1,4,5))
[1] 6 3 2
>

partiMax gives the sought result, but partiCmax doesn't.  Your function is like a cleaner version of partiMax, using the fact that you have the partition in a list.  This begs the question of what is the most efficient way to create the partition list from the original vector and the index vector that specifies the partition.  But I am also wondering whether we can find something even more efficient than the use of the apply family of functions.  Thanks!


--  TMK  --
212-460-5430 home
917-656-5351 cell
t o p k a t z @ m s n . c o m



> Date: Mon, 7 Jan 2008 10:44:20 -0600
> From: [hidden email]
> To: [hidden email]
> CC: [hidden email]
> Subject: Re: [R] Seeking a more efficient way to find partition maxima
>
> Talbot Katz wrote:
>> Hi.
>>
>> Suppose I have a vector that I partition into disjoint, contiguous
>> subvectors. For example, let v = c(1,4,2,6,7,5), partition it into
>> three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6]. I want to
>> find the maximum element of each subvector. In this example, max(v1)
>> is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive
>> subvector maxima would never decrease, as in the example, I could do
>> the following:
>>
>> partiCmax <- function( values, seriesIdx ) { # assume seriesIdx is
>> increasing integer sequence beginning with 1, ending at less than or
>> equal to length(values) parti <- cbind( seriesIdx, c( ( seriesIdx[ -1
>> ] - 1 ), length( values ) ) ) return( cummax( values )[ parti[ , 2 ]
>> ] ) }
>>
>>
>> The use of cummax makes that pretty efficient, but if the subvector
>> maxima are not non-decreasing, it doesn't work. The following
>> function works (at least it did on the examples I tried):
>>
>> partiMax <- function( values, seriesIdx ) { # assume seriesIdx is
>> increasing integer sequence beginning with 1, ending at less than or
>> equal to length(values) parti <- cbind( seriesIdx, c( ( seriesIdx[ -1
>> ] - 1 ), length( values ) ) ) return( sapply( ( 1:length(seriesIdx)
>> ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ]
>> ) ) } ) ) }
>>
>>
>> but I figured someone out there could come up with something
>> cleverer. Thanks!
>
> It is not clear how you are creating the partitions, but if you are
> (hopefully) ending up with a list, such as:
>
>> Vec.List
> $v1
> [1] 1 4 2
>
> $v2
> [1] 6
>
> $v3
> [1] 7 5
>
>
> Then you can use:
>
>> sapply(Vec.List, max, na.rm = TRUE)
> v1 v2 v3
> 4 6 7
>
>
> See ?sapply for more information.
>
> HTH,
>
> Marc Schwartz
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Efficient way to substract all entries in two vectors from each other

Talbot Katz
In reply to this post by Marc Schwartz

Hi Joh.

To echo Bert, more detail would be helpful.  But here is a stab at something.  Given two numeric vectors, a and b, for each entry of a this finds the index in b of the closest match.  It's not efficient at all.

> bestmati <- function( a, b ) { return( sapply( a, function( x ) { which.min( ( b - x )^2 ) } ) ) }
> bestmati( c( 3, 9 ), c( 1, 4, 2, 6, 7, 5 ) )
[1] 2 5
>

--  TMK  --
212-460-5430 home
917-656-5351 cell
t o p k a t z @ m s n . c o m

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Seeking a more efficient way to find partition maxima

Marc Schwartz
In reply to this post by Talbot Katz
Talbot,

Try this:

PartMax <- function(x, breaks)
{
   breaks <- c(breaks, length(x) + 1)
   sapply(seq(length(breaks) - 1),
          function(i) max(x[breaks[i]:(breaks[i + 1] - 1)],
                          na.rm = TRUE))
}


 > PartMax(c(1,4,2,6,7,5),c(1,4,5))
[1] 4 6 7

 > PartMax(6:1,c(1,4,5))
[1] 6 3 2


I would not go to extremes in trying to avoid *apply functions. In many
cases, it will provide sufficient performance and enable the code to be
quite readable.  Short of coding a new R function in a lower level
language like C, you are not likely to get more performance and then you
have to balance the time it takes to improve the code, versus simply
getting the job done reliably and moving on.

If you have very large initial vectors, with a large number of
sub-partitions and will be doing this often, then it might make sense to
spend the time to profile the code and fine tune it. But I would first
get a sense of the actual running time on larger vectors before making
that decision.

HTH,

Marc


Talbot Katz wrote:
> Hi Marc.
>
> Thank you for the swift response! I should have explained more about
the partitions, I hoped it would be clear from the code. I am supplying
an index vector to create the partitions. In my original example of v =
c(1,4,2,6,7,5) with v1 = v[1:3], v2 = v[4], v3 = v[5:6], I would specify
this partition with the index vector c(1,4,5), giving the first indices
of each subvector; the implicit assumption is that the last index of
partition i is the first index of partition (i + 1) minus 1, except for
the last partition, which ends at the end of the original vector being
partitioned. Here are the results of this example from the two functions
I defined:

>
>> partiCmax(c(1,4,2,6,7,5),c(1,4,5))
> [1] 4 6 7
>> partiMax(c(1,4,2,6,7,5),c(1,4,5))
> [1] 4 6 7
>
>
> However, if I use an example in which the maxima are not
> non-decreasing:
>
>> partiCmax(6:1,c(1,4,5))
> [1] 6 6 6
>> partiMax(6:1,c(1,4,5))
> [1] 6 3 2
>
> partiMax gives the sought result, but partiCmax doesn't. Your
> function
>is like a cleaner version of partiMax, using the fact that you have the
>partition in a list. This begs the question of what is the most
>efficient way to create the partition list from the original vector and
>the index vector that specifies the partition. But I am also wondering
>whether we can find something even more efficient than the use of the
>apply family of functions. Thanks!

<snip>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Seeking a more efficient way to find partition maxima

Gabor Grothendieck
In reply to this post by Talbot Katz
Try testing the performance of transforming your series to
one in which the values of each partition are larger than all
prior partitions and the untransforming back:

# test data
myseq <- c(1, 4, 2, 6, 7, 5)
part <- c(1, 4, 5)

M <- max(myseq)

# transform
myseq2 <- myseq + M * cumsum(replace(0 * myseq, part, 1))

# calcuate on transformed version
tmp <- partiCmax(myseq2, part)

# untransform
tmp - M * seq_along(tmp) # c(4, 6, 7)

Also you might check how it compares to the simpler

   tapply(myseq, cumsum(replace(0 * myseq, part, 1)), max)

On Jan 7, 2008 11:18 AM, Talbot Katz <[hidden email]> wrote:

>
> Hi.
>
> Suppose I have a vector that I partition into disjoint, contiguous subvectors.  For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6].  I want to find the maximum element of each subvector.  In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7.  If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following:
>
> partiCmax <- function( values, seriesIdx ) {
>  # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
>  parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )
>  return( cummax( values )[ parti[ , 2 ] ] )
> }
>
>
> The use of cummax makes that pretty efficient, but if the subvector maxima are not non-decreasing, it doesn't work.  The following function works (at least it did on the examples I tried):
>
> partiMax <- function( values, seriesIdx ) {
>  # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
>  parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )
>  return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) )
> }
>
>
> but I figured someone out there could come up with something cleverer.  Thanks!
>
> --  TMK  --212-460-5430 home917-656-5351 cellt o p k a t z @ m s n . c o m
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Seeking a more efficient way to find partition maxima

Talbot Katz

Thanks Gabor!  This is the sort of thing I was trying to devise, but I ran up against my extensively documented brain power limitations ;-)  Per your remarks, it remains to be seen how it performs compared to a good implementation with one of the apply functions.

--  TMK  --
212-460-5430 home
917-656-5351 cell
t o p k a t z @ m s n . c o m



> Date: Mon, 7 Jan 2008 13:49:13 -0500
> From: [hidden email]
> To: [hidden email]
> Subject: Re: [R] Seeking a more efficient way to find partition maxima
> CC: [hidden email]
>
> Try testing the performance of transforming your series to
> one in which the values of each partition are larger than all
> prior partitions and the untransforming back:
>
> # test data
> myseq <- c(1, 4, 2, 6, 7, 5)
> part <- c(1, 4, 5)
>
> M <- max(myseq)
>
> # transform
> myseq2 <- myseq + M * cumsum(replace(0 * myseq, part, 1))
>
> # calcuate on transformed version
> tmp <- partiCmax(myseq2, part)
>
> # untransform
> tmp - M * seq_along(tmp) # c(4, 6, 7)
>
> Also you might check how it compares to the simpler
>
> tapply(myseq, cumsum(replace(0 * myseq, part, 1)), max)
>
> On Jan 7, 2008 11:18 AM, Talbot Katz  wrote:
>>
>> Hi.
>>
>> Suppose I have a vector that I partition into disjoint, contiguous subvectors. For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6]. I want to find the maximum element of each subvector. In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following:
>>
>> partiCmax <- function( values, seriesIdx ) {
>> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
>> parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )
>> return( cummax( values )[ parti[ , 2 ] ] )
>> }
>>
>>
>> The use of cummax makes that pretty efficient, but if the subvector maxima are not non-decreasing, it doesn't work. The following function works (at least it did on the examples I tried):
>>
>> partiMax <- function( values, seriesIdx ) {
>> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
>> parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )
>> return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) )
>> }
>>
>>
>> but I figured someone out there could come up with something cleverer. Thanks!
>>
>> -- TMK --212-460-5430 home917-656-5351 cellt o p k a t z @ m s n . c o m
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Efficient way to substract all entries in two vectors from each other

Ido Tamir
In reply to this post by Talbot Katz
Johannes Graumann wrote:

> Hi all,
>
> I'm to inexperienced to come up with the matrix solution elusively
> appearing on the horizon for the following problem and would appreciate if
> you could give me a nudge ...
> I have two vectors a, and b and need to find the closest match for each
> value of a in b.
> How to do that efficiently?

I am not sure if I understood your problem correctly,
but maybe this is of help.
However its not memory efficient.

> a <- c(3,10,50,100)
> b <- c(7,8,300)
> b[apply(abs(outer(a,b,"-")),1,which.min)]
[1] 7 8 8 8

In Biobase there is the matchpt function doing this
efficiently.

best wishes,
ido

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Efficient way to substract all entries in two vectors from each other

Johannes
Ido M. Tamir wrote:

> matchpt
Thanks for this hint. It is exactly what I'm looking for.

Cheers, Joh

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Seeking a more efficient way to find partition maxima

jholtman
In reply to this post by Talbot Katz
Does this do what you want?

> x <- c(1,4,2,6,7,5)
> x.i <- c(1,4,5)
> partiMax <- function(vec, index){
+     # create a vector to identify the subvectors
+     x.b <- diff(c(index, length(vec) + 1))
+     # split up the vector
+     x.s <- split(vec, rep(seq_along(index), times=x.b))
+     unlist(lapply(x.s, max))
+ }
>
> partiMax(x, x.i)
1 2 3
4 6 7
> partiMax(6:1, x.i)
1 2 3
6 3 2
>
>


On Jan 7, 2008 11:18 AM, Talbot Katz <[hidden email]> wrote:

>
> Hi.
>
> Suppose I have a vector that I partition into disjoint, contiguous subvectors.  For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6].  I want to find the maximum element of each subvector.  In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7.  If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following:
>
> partiCmax <- function( values, seriesIdx ) {
>  # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
>  parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )
>  return( cummax( values )[ parti[ , 2 ] ] )
> }
>
>
> The use of cummax makes that pretty efficient, but if the subvector maxima are not non-decreasing, it doesn't work.  The following function works (at least it did on the examples I tried):
>
> partiMax <- function( values, seriesIdx ) {
>  # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
>  parti <- cbind( seriesIdx, c( ( seriesIdx[ -1 ] - 1 ), length( values ) ) )
>  return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) )
> }
>
>
> but I figured someone out there could come up with something cleverer.  Thanks!
>
> --  TMK  --212-460-5430 home917-656-5351 cellt o p k a t z @ m s n . c o m
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.