

Hi.
Suppose I have a vector that I partition into disjoint, contiguous subvectors. For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6]. I want to find the maximum element of each subvector. In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following:
partiCmax < function( values, seriesIdx ) {
# assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
parti < cbind( seriesIdx, c( ( seriesIdx[ 1 ]  1 ), length( values ) ) )
return( cummax( values )[ parti[ , 2 ] ] )
}
The use of cummax makes that pretty efficient, but if the subvector maxima are not nondecreasing, it doesn't work. The following function works (at least it did on the examples I tried):
partiMax < function( values, seriesIdx ) {
# assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
parti < cbind( seriesIdx, c( ( seriesIdx[ 1 ]  1 ), length( values ) ) )
return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) )
}
but I figured someone out there could come up with something cleverer. Thanks!
 TMK 2124605430 home9176565351 cellt o p k a t z @ m s n . c o m
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Talbot Katz wrote:
> Hi.
>
> Suppose I have a vector that I partition into disjoint, contiguous
> subvectors. For example, let v = c(1,4,2,6,7,5), partition it into
> three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6]. I want to
> find the maximum element of each subvector. In this example, max(v1)
> is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive
> subvector maxima would never decrease, as in the example, I could do
> the following:
>
> partiCmax < function( values, seriesIdx ) { # assume seriesIdx is
> increasing integer sequence beginning with 1, ending at less than or
> equal to length(values) parti < cbind( seriesIdx, c( ( seriesIdx[ 1
> ]  1 ), length( values ) ) ) return( cummax( values )[ parti[ , 2 ]
> ] ) }
>
>
> The use of cummax makes that pretty efficient, but if the subvector
> maxima are not nondecreasing, it doesn't work. The following
> function works (at least it did on the examples I tried):
>
> partiMax < function( values, seriesIdx ) { # assume seriesIdx is
> increasing integer sequence beginning with 1, ending at less than or
> equal to length(values) parti < cbind( seriesIdx, c( ( seriesIdx[ 1
> ]  1 ), length( values ) ) ) return( sapply( ( 1:length(seriesIdx)
> ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ]
> ) ) } ) ) }
>
>
> but I figured someone out there could come up with something
> cleverer. Thanks!
It is not clear how you are creating the partitions, but if you are
(hopefully) ending up with a list, such as:
> Vec.List
$v1
[1] 1 4 2
$v2
[1] 6
$v3
[1] 7 5
Then you can use:
> sapply(Vec.List, max, na.rm = TRUE)
v1 v2 v3
4 6 7
See ?sapply for more information.
HTH,
Marc Schwartz
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi Marc.
Thank you for the swift response! I should have explained more about the partitions, I hoped it would be clear from the code. I am supplying an index vector to create the partitions. In my original example of v = c(1,4,2,6,7,5) with v1 = v[1:3], v2 = v[4], v3 = v[5:6], I would specify this partition with the index vector c(1,4,5), giving the first indices of each subvector; the implicit assumption is that the last index of partition i is the first index of partition (i + 1) minus 1, except for the last partition, which ends at the end of the original vector being partitioned. Here are the results of this example from the two functions I defined:
> partiCmax(c(1,4,2,6,7,5),c(1,4,5))
[1] 4 6 7
> partiMax(c(1,4,2,6,7,5),c(1,4,5))
[1] 4 6 7
>
However, if I use an example in which the maxima are not nondecreasing:
> partiCmax(6:1,c(1,4,5))
[1] 6 6 6
> partiMax(6:1,c(1,4,5))
[1] 6 3 2
>
partiMax gives the sought result, but partiCmax doesn't. Your function is like a cleaner version of partiMax, using the fact that you have the partition in a list. This begs the question of what is the most efficient way to create the partition list from the original vector and the index vector that specifies the partition. But I am also wondering whether we can find something even more efficient than the use of the apply family of functions. Thanks!
 TMK 
2124605430 home
9176565351 cell
t o p k a t z @ m s n . c o m
> Date: Mon, 7 Jan 2008 10:44:20 0600
> From: [hidden email]
> To: [hidden email]
> CC: [hidden email]
> Subject: Re: [R] Seeking a more efficient way to find partition maxima
>
> Talbot Katz wrote:
>> Hi.
>>
>> Suppose I have a vector that I partition into disjoint, contiguous
>> subvectors. For example, let v = c(1,4,2,6,7,5), partition it into
>> three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6]. I want to
>> find the maximum element of each subvector. In this example, max(v1)
>> is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive
>> subvector maxima would never decrease, as in the example, I could do
>> the following:
>>
>> partiCmax < function( values, seriesIdx ) { # assume seriesIdx is
>> increasing integer sequence beginning with 1, ending at less than or
>> equal to length(values) parti < cbind( seriesIdx, c( ( seriesIdx[ 1
>> ]  1 ), length( values ) ) ) return( cummax( values )[ parti[ , 2 ]
>> ] ) }
>>
>>
>> The use of cummax makes that pretty efficient, but if the subvector
>> maxima are not nondecreasing, it doesn't work. The following
>> function works (at least it did on the examples I tried):
>>
>> partiMax < function( values, seriesIdx ) { # assume seriesIdx is
>> increasing integer sequence beginning with 1, ending at less than or
>> equal to length(values) parti < cbind( seriesIdx, c( ( seriesIdx[ 1
>> ]  1 ), length( values ) ) ) return( sapply( ( 1:length(seriesIdx)
>> ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ]
>> ) ) } ) ) }
>>
>>
>> but I figured someone out there could come up with something
>> cleverer. Thanks!
>
> It is not clear how you are creating the partitions, but if you are
> (hopefully) ending up with a list, such as:
>
>> Vec.List
> $v1
> [1] 1 4 2
>
> $v2
> [1] 6
>
> $v3
> [1] 7 5
>
>
> Then you can use:
>
>> sapply(Vec.List, max, na.rm = TRUE)
> v1 v2 v3
> 4 6 7
>
>
> See ?sapply for more information.
>
> HTH,
>
> Marc Schwartz
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi Joh.
To echo Bert, more detail would be helpful. But here is a stab at something. Given two numeric vectors, a and b, for each entry of a this finds the index in b of the closest match. It's not efficient at all.
> bestmati < function( a, b ) { return( sapply( a, function( x ) { which.min( ( b  x )^2 ) } ) ) }
> bestmati( c( 3, 9 ), c( 1, 4, 2, 6, 7, 5 ) )
[1] 2 5
>
 TMK 
2124605430 home
9176565351 cell
t o p k a t z @ m s n . c o m
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Talbot,
Try this:
PartMax < function(x, breaks)
{
breaks < c(breaks, length(x) + 1)
sapply(seq(length(breaks)  1),
function(i) max(x[breaks[i]:(breaks[i + 1]  1)],
na.rm = TRUE))
}
> PartMax(c(1,4,2,6,7,5),c(1,4,5))
[1] 4 6 7
> PartMax(6:1,c(1,4,5))
[1] 6 3 2
I would not go to extremes in trying to avoid *apply functions. In many
cases, it will provide sufficient performance and enable the code to be
quite readable. Short of coding a new R function in a lower level
language like C, you are not likely to get more performance and then you
have to balance the time it takes to improve the code, versus simply
getting the job done reliably and moving on.
If you have very large initial vectors, with a large number of
subpartitions and will be doing this often, then it might make sense to
spend the time to profile the code and fine tune it. But I would first
get a sense of the actual running time on larger vectors before making
that decision.
HTH,
Marc
Talbot Katz wrote:
> Hi Marc.
>
> Thank you for the swift response! I should have explained more about
the partitions, I hoped it would be clear from the code. I am supplying
an index vector to create the partitions. In my original example of v =
c(1,4,2,6,7,5) with v1 = v[1:3], v2 = v[4], v3 = v[5:6], I would specify
this partition with the index vector c(1,4,5), giving the first indices
of each subvector; the implicit assumption is that the last index of
partition i is the first index of partition (i + 1) minus 1, except for
the last partition, which ends at the end of the original vector being
partitioned. Here are the results of this example from the two functions
I defined:
>
>> partiCmax(c(1,4,2,6,7,5),c(1,4,5))
> [1] 4 6 7
>> partiMax(c(1,4,2,6,7,5),c(1,4,5))
> [1] 4 6 7
>
>
> However, if I use an example in which the maxima are not
> nondecreasing:
>
>> partiCmax(6:1,c(1,4,5))
> [1] 6 6 6
>> partiMax(6:1,c(1,4,5))
> [1] 6 3 2
>
> partiMax gives the sought result, but partiCmax doesn't. Your
> function
>is like a cleaner version of partiMax, using the fact that you have the
>partition in a list. This begs the question of what is the most
>efficient way to create the partition list from the original vector and
>the index vector that specifies the partition. But I am also wondering
>whether we can find something even more efficient than the use of the
>apply family of functions. Thanks!
<snip>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Try testing the performance of transforming your series to
one in which the values of each partition are larger than all
prior partitions and the untransforming back:
# test data
myseq < c(1, 4, 2, 6, 7, 5)
part < c(1, 4, 5)
M < max(myseq)
# transform
myseq2 < myseq + M * cumsum(replace(0 * myseq, part, 1))
# calcuate on transformed version
tmp < partiCmax(myseq2, part)
# untransform
tmp  M * seq_along(tmp) # c(4, 6, 7)
Also you might check how it compares to the simpler
tapply(myseq, cumsum(replace(0 * myseq, part, 1)), max)
On Jan 7, 2008 11:18 AM, Talbot Katz < [hidden email]> wrote:
>
> Hi.
>
> Suppose I have a vector that I partition into disjoint, contiguous subvectors. For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6]. I want to find the maximum element of each subvector. In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following:
>
> partiCmax < function( values, seriesIdx ) {
> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
> parti < cbind( seriesIdx, c( ( seriesIdx[ 1 ]  1 ), length( values ) ) )
> return( cummax( values )[ parti[ , 2 ] ] )
> }
>
>
> The use of cummax makes that pretty efficient, but if the subvector maxima are not nondecreasing, it doesn't work. The following function works (at least it did on the examples I tried):
>
> partiMax < function( values, seriesIdx ) {
> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
> parti < cbind( seriesIdx, c( ( seriesIdx[ 1 ]  1 ), length( values ) ) )
> return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) )
> }
>
>
> but I figured someone out there could come up with something cleverer. Thanks!
>
>  TMK 2124605430 home9176565351 cellt o p k a t z @ m s n . c o m
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Thanks Gabor! This is the sort of thing I was trying to devise, but I ran up against my extensively documented brain power limitations ;) Per your remarks, it remains to be seen how it performs compared to a good implementation with one of the apply functions.
 TMK 
2124605430 home
9176565351 cell
t o p k a t z @ m s n . c o m
> Date: Mon, 7 Jan 2008 13:49:13 0500
> From: [hidden email]
> To: [hidden email]
> Subject: Re: [R] Seeking a more efficient way to find partition maxima
> CC: [hidden email]
>
> Try testing the performance of transforming your series to
> one in which the values of each partition are larger than all
> prior partitions and the untransforming back:
>
> # test data
> myseq < c(1, 4, 2, 6, 7, 5)
> part < c(1, 4, 5)
>
> M < max(myseq)
>
> # transform
> myseq2 < myseq + M * cumsum(replace(0 * myseq, part, 1))
>
> # calcuate on transformed version
> tmp < partiCmax(myseq2, part)
>
> # untransform
> tmp  M * seq_along(tmp) # c(4, 6, 7)
>
> Also you might check how it compares to the simpler
>
> tapply(myseq, cumsum(replace(0 * myseq, part, 1)), max)
>
> On Jan 7, 2008 11:18 AM, Talbot Katz wrote:
>>
>> Hi.
>>
>> Suppose I have a vector that I partition into disjoint, contiguous subvectors. For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6]. I want to find the maximum element of each subvector. In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following:
>>
>> partiCmax < function( values, seriesIdx ) {
>> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
>> parti < cbind( seriesIdx, c( ( seriesIdx[ 1 ]  1 ), length( values ) ) )
>> return( cummax( values )[ parti[ , 2 ] ] )
>> }
>>
>>
>> The use of cummax makes that pretty efficient, but if the subvector maxima are not nondecreasing, it doesn't work. The following function works (at least it did on the examples I tried):
>>
>> partiMax < function( values, seriesIdx ) {
>> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
>> parti < cbind( seriesIdx, c( ( seriesIdx[ 1 ]  1 ), length( values ) ) )
>> return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) )
>> }
>>
>>
>> but I figured someone out there could come up with something cleverer. Thanks!
>>
>>  TMK 2124605430 home9176565351 cellt o p k a t z @ m s n . c o m
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rhelp>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>> and provide commented, minimal, selfcontained, reproducible code.
>>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Johannes Graumann wrote:
> Hi all,
>
> I'm to inexperienced to come up with the matrix solution elusively
> appearing on the horizon for the following problem and would appreciate if
> you could give me a nudge ...
> I have two vectors a, and b and need to find the closest match for each
> value of a in b.
> How to do that efficiently?
I am not sure if I understood your problem correctly,
but maybe this is of help.
However its not memory efficient.
> a < c(3,10,50,100)
> b < c(7,8,300)
> b[apply(abs(outer(a,b,"")),1,which.min)]
[1] 7 8 8 8
In Biobase there is the matchpt function doing this
efficiently.
best wishes,
ido
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Does this do what you want?
> x < c(1,4,2,6,7,5)
> x.i < c(1,4,5)
> partiMax < function(vec, index){
+ # create a vector to identify the subvectors
+ x.b < diff(c(index, length(vec) + 1))
+ # split up the vector
+ x.s < split(vec, rep(seq_along(index), times=x.b))
+ unlist(lapply(x.s, max))
+ }
>
> partiMax(x, x.i)
1 2 3
4 6 7
> partiMax(6:1, x.i)
1 2 3
6 3 2
>
>
On Jan 7, 2008 11:18 AM, Talbot Katz < [hidden email]> wrote:
>
> Hi.
>
> Suppose I have a vector that I partition into disjoint, contiguous subvectors. For example, let v = c(1,4,2,6,7,5), partition it into three subvectors, v1 = v[1:3], v2 = v[4], v3 = v[5:6]. I want to find the maximum element of each subvector. In this example, max(v1) is 4, max(v2) is 6, max(v3) is 7. If I knew that the successive subvector maxima would never decrease, as in the example, I could do the following:
>
> partiCmax < function( values, seriesIdx ) {
> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
> parti < cbind( seriesIdx, c( ( seriesIdx[ 1 ]  1 ), length( values ) ) )
> return( cummax( values )[ parti[ , 2 ] ] )
> }
>
>
> The use of cummax makes that pretty efficient, but if the subvector maxima are not nondecreasing, it doesn't work. The following function works (at least it did on the examples I tried):
>
> partiMax < function( values, seriesIdx ) {
> # assume seriesIdx is increasing integer sequence beginning with 1, ending at less than or equal to length(values)
> parti < cbind( seriesIdx, c( ( seriesIdx[ 1 ]  1 ), length( values ) ) )
> return( sapply( ( 1:length(seriesIdx) ), function ( i ) {return( max( values[ parti[ i, 1 ]:parti[ i, 2 ] ] ) ) } ) )
> }
>
>
> but I figured someone out there could come up with something cleverer. Thanks!
>
>  TMK 2124605430 home9176565351 cellt o p k a t z @ m s n . c o m
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>

Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

