rank(, ties.method="last")

rank(, ties.method="last")

Marius Hofert-4
Hi,

I ran into a problem where I actually need rank(, ties.method="last"). It would
be great to have this feature in base and it's also simple to get (see below).

Thanks & cheers,
Marius


rank2 <- function (x, na.last = TRUE,
                   ties.method = c("average", "first", "last", # new "last"
                                   "random", "max", "min"))
{
    nas <- is.na(x)
    nm <- names(x)
    ties.method <- match.arg(ties.method)
    if (is.factor(x))
        x <- as.integer(x)
    x <- x[!nas]
    y <- switch(ties.method,
        average = , min = , max = .Internal(rank(x, length(x), ties.method)),
        first = sort.list(sort.list(x)),
        last = sort.list(sort.list(x, decreasing=TRUE), decreasing=TRUE), # change
        random = sort.list(order(x, stats::runif(sum(!nas)))))
    if (!is.na(na.last) && any(nas)) {
        yy <- NA
        NAkeep <- (na.last == "keep")
        if (NAkeep || na.last) {
            yy[!nas] <- y
            if (!NAkeep)
                yy[nas] <- (length(y) + 1L):length(yy)
        }
        else {
            len <- sum(nas)
            yy[!nas] <- y + len
            yy[nas] <- seq_len(len)
        }
        y <- yy
        names(y) <- nm
    }
    else names(y) <- nm[!nas]
    y
}

## MWE
x <- c(10, 11, 11, 12, 12, 13)
rank(x, ties.method="first")
rank2(x, ties.method="last")

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: rank(, ties.method="last")

Martin Maechler

> I ran into a problem where I actually need rank(, ties.method="last"). It would
> be great to have this feature in base and it's also simple to get (see below).

> Thanks & cheers,
> Marius


> rank2 <- function (x, na.last = TRUE, ties.method = c("average",
> "first", "last", # new "last"
>     "random", "max", "min"))
> {
>     nas <- is.na(x)
>     nm <- names(x)
>     ties.method <- match.arg(ties.method)
>     if (is.factor(x))
>         x <- as.integer(x)
>     x <- x[!nas]
>     y <- switch(ties.method, average = , min = , max = .Internal(rank(x,
>         length(x), ties.method)), first = sort.list(sort.list(x)),
>         last = sort.list(sort.list(x, decreasing=TRUE), decreasing=TRUE), # change
>         random = sort.list(order(x, stats::runif(sum(!nas)))))
>     if (!is.na(na.last) && any(nas)) {
>         yy <- NA
>         NAkeep <- (na.last == "keep")
>         if (NAkeep || na.last) {
>             yy[!nas] <- y
>             if (!NAkeep)
>                 yy[nas] <- (length(y) + 1L):length(yy)
>         }
>         else {
>             len <- sum(nas)
>             yy[!nas] <- y + len
>             yy[nas] <- seq_len(len)
>         }
>         y <- yy
>         names(y) <- nm
>     }
>     else names(y) <- nm[!nas]
>     y
> }

> ## MWE
> x <- c(10, 11, 11, 12, 12, 13)
> rank(x, ties.method="first")
> rank2(x, ties.method="last")

Indeed, this makes sense to me, and is easy enough to document
and maintain, and preferable to asking useRs to use  rev(.) and
similar "easy" (but somewhat costly for large data!)
transformations to get the same....

Or have (Marius Hofert and I) overlooked something obvious ?

Martin Maechler,
ETH Zurich

Re: rank(, ties.method="last")

Henric Winell
On 2015-10-09 at 12:14, Martin Maechler wrote:

>
>> I ran into a problem where I actually need rank(, ties.method="last"). It would
>> be great to have this feature in base and it's also simple to get (see below).
>
>> Thanks & cheers,
>> Marius
>
>
>> rank2 <- function (x, na.last = TRUE, ties.method = c("average",
>> "first", "last", # new "last"
>>      "random", "max", "min"))
>> {
>>      nas <- is.na(x)
>>      nm <- names(x)
>>      ties.method <- match.arg(ties.method)
>>      if (is.factor(x))
>>          x <- as.integer(x)
>>      x <- x[!nas]
>>      y <- switch(ties.method, average = , min = , max = .Internal(rank(x,
>>          length(x), ties.method)), first = sort.list(sort.list(x)),
>>          last = sort.list(sort.list(x, decreasing=TRUE), decreasing=TRUE), # change
>>          random = sort.list(order(x, stats::runif(sum(!nas)))))
>>      if (!is.na(na.last) && any(nas)) {
>>          yy <- NA
>>          NAkeep <- (na.last == "keep")
>>          if (NAkeep || na.last) {
>>              yy[!nas] <- y
>>              if (!NAkeep)
>>                  yy[nas] <- (length(y) + 1L):length(yy)
>>          }
>>          else {
>>              len <- sum(nas)
>>              yy[!nas] <- y + len
>>              yy[nas] <- seq_len(len)
>>          }
>>          y <- yy
>>          names(y) <- nm
>>      }
>>      else names(y) <- nm[!nas]
>>      y
>> }
>
>> ## MWE
>> x <- c(10, 11, 11, 12, 12, 13)
>> rank(x, ties.method="first")
>> rank2(x, ties.method="last")
>
> Indeed, this makes sense to me, and is easy enough to document
> and maintain, and preferable to asking useRs to use  rev(.) and
> similar "easy" (but somewhat costly for large data!)
> transformations to get the same....
>
> Or have (Marius Hofert and I) overlooked something obvious ?

I think so: the code above doesn't seem to do the right thing.  Consider
the following example:

 > x <- c(1, 1, 2, 3)
 > rank2(x, ties.method = "last")
[1] 1 2 4 3

That doesn't look right to me -- I had expected

 > rev(sort.list(x, decreasing = TRUE))
[1] 2 1 3 4
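
To see where the 1 2 4 3 comes from, the "last" branch of rank2 can be traced step by step. This is only a sketch; the names `o` and `r` are introduced here for illustration and do not appear in the original code.

```r
x <- c(1, 1, 2, 3)

# Inner call: ordering permutation for decreasing x
# (tied values keep their original relative order)
o <- sort.list(x, decreasing = TRUE)
print(o)   # 4 3 1 2

# Outer call, exactly as in the "last" branch of rank2
r <- sort.list(o, decreasing = TRUE)
print(r)   # 1 2 4 3
```

The value 2 ends up with rank 4 and the value 3 with rank 3, so the result is not even a valid ranking of x.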


Henric Winell



>
> Martin Maechler,
> ETH Zurich

Re: rank(, ties.method="last")

Marius Hofert-4
In reply to this post by Martin Maechler
On Tue, Oct 20, 2015 at 10:26 AM, Henric Winell
<[hidden email]> wrote:

> On 2015-10-09 at 12:14, Martin Maechler wrote:
> I think so: the code above doesn't seem to do the right thing.  Consider
> the following example:
>
>  > x <- c(1, 1, 2, 3)
>  > rank2(x, ties.method = "last")
> [1] 1 2 4 3
>
> That doesn't look right to me -- I had expected
>
>  > rev(sort.list(x, decreasing = TRUE))
> [1] 2 1 3 4
>

Indeed, well spotted, that seems to be correct.

>
> Henric Winell
>

Re: rank(, ties.method="last")

Suharto Anggono Suharto Anggono via R-devel
In reply to this post by Marius Hofert-4
Marius Hofert-4------------------------------

> On 2015-10-09 at 12:14, Martin Maechler wrote:
> I think so: the code above doesn't seem to do the right thing.  Consider
> the following example:
>
>  > x <- c(1, 1, 2, 3)
>  > rank2(x, ties.method = "last")
> [1] 1 2 4 3
>
> That doesn't look right to me -- I had expected
>
>  > rev(sort.list(x, decreasing = TRUE))
> [1] 2 1 3 4
>

Indeed, well spotted, that seems to be correct.

>
> Henric Winell
>
------------------------------

In the particular example (of length 4), what is really wanted is the following.
ind <- integer(4)
ind[sort.list(x, decreasing=TRUE)] <- 4:1
ind

The following gives the desired result:
sort.list(rev(sort.list(x, decreasing=TRUE)))
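
As a sketch, the two forms above can be compared on a vector whose ties are not adjacent. The generalisation from 4 to `length(x)` and the names `n` and `r` are assumptions made for this illustration.

```r
x <- c(2, 1, 2, 1, 3)   # ties that are not next to each other
n <- length(x)

# Assignment form, with 4 replaced by length(x)
ind <- integer(n)
ind[sort.list(x, decreasing = TRUE)] <- n:1

# One-line form
r <- sort.list(rev(sort.list(x, decreasing = TRUE)))

print(ind)   # 4 2 3 1 5
print(r)     # 4 2 3 1 5
```

Both forms invert the same permutation, which is why they agree.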

Re: rank(, ties.method="last")

Henric Winell
On 2015-10-21 at 07:24, Suharto Anggono Suharto Anggono via R-devel wrote:

> Marius Hofert-4------------------------------
>> On 2015-10-09 at 12:14, Martin Maechler wrote:
>> I think so: the code above doesn't seem to do the right thing.  Consider
>> the following example:
>>
>>   > x <- c(1, 1, 2, 3)
>>   > rank2(x, ties.method = "last")
>> [1] 1 2 4 3
>>
>> That doesn't look right to me -- I had expected
>>
>>   > rev(sort.list(x, decreasing = TRUE))
>> [1] 2 1 3 4
>>
>
> Indeed, well spotted, that seems to be correct.
>
>>
>> Henric Winell
>>
> ------------------------------
>
> In the particular example (of length 4), what is really wanted is the following.
> ind <- integer(4)
> ind[sort.list(x, decreasing=TRUE)] <- 4:1
> ind

You don't provide the output here, but 'ind' is, of course,

 > ind
[1] 2 1 3 4

> The following gives the desired result:
> sort.list(rev(sort.list(x, decreasing=TRUE)))

And, again, no output, but

 > sort.list(rev(sort.list(x, decreasing=TRUE)))
[1] 2 1 3 4

Why is it necessary to use 'sort.list' on the result from
'rev(sort.list(...'?
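
One way to look at this question, as a sketch: `rev(sort.list(...))` is an ordering permutation (which element comes first, second, ...), while a rank vector is the inverse of that permutation. For x <- c(1, 1, 2, 3) the two happen to coincide; the example vector and the names `ord` and `rnk` below are chosen here for illustration.

```r
x <- c(2, 1, 2)

# Ordering permutation: positions of x in increasing order,
# with tied values listed in reverse original order
ord <- rev(sort.list(x, decreasing = TRUE))
print(ord)      # 2 3 1
print(x[ord])   # 1 2 2

# Inverting the permutation turns the ordering into ranks
rnk <- sort.list(ord)
print(rnk)      # 3 1 2
```

Here `ord` and `rnk` differ, so the outer `sort.list` does change the result.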


Henric Winell




Re: rank(, ties.method="last")

Martin Maechler
>>>>> Henric Winell <[hidden email]>
>>>>>     on Wed, 21 Oct 2015 13:43:02 +0200 writes:

    > On 2015-10-21 at 07:24, Suharto Anggono Suharto Anggono via R-devel wrote:
    >> Marius Hofert-4------------------------------
    >>> On 2015-10-09 at 12:14, Martin Maechler wrote:
    >>> I think so: the code above doesn't seem to do the right thing.  Consider
    >>> the following example:
    >>>
    >>> > x <- c(1, 1, 2, 3)
    >>> > rank2(x, ties.method = "last")
    >>> [1] 1 2 4 3
    >>>
    >>> That doesn't look right to me -- I had expected
    >>>
    >>> > rev(sort.list(x, decreasing = TRUE))
    >>> [1] 2 1 3 4
    >>>
    >>
    >> Indeed, well spotted, that seems to be correct.
    >>
    >>>
    >>> Henric Winell
    >>>
    >> ------------------------------
    >>
    >> In the particular example (of length 4), what is really wanted is the following.
    >> ind <- integer(4)
    >> ind[sort.list(x, decreasing=TRUE)] <- 4:1
    >> ind

    > You don't provide the output here, but 'ind' is, of course,

    >> ind
    > [1] 2 1 3 4

    >> The following gives the desired result:
    >> sort.list(rev(sort.list(x, decreasing=TRUE)))

    > And, again, no output, but

    >> sort.list(rev(sort.list(x, decreasing=TRUE)))
    > [1] 2 1 3 4

    > Why is it necessary to use 'sort.list' on the result from
    > 'rev(sort.list(...'?

You can try all kinds of code on this *too* simple example and do
experiments.  But let's approach this a bit more scientifically
and hence systematically:

Look at  rank  {the R function definition} to see that
for the case of no NA's,

 rank(x, ties.method = "first")   ===    sort.list(sort.list(x))

If you assume that to be correct and want to define "last" to be
correct as well (in the sense of being  "first"-consistent),
it is clear that

  rank(x, ties.method = "last")   ===  rev(sort.list(sort.list(rev(x))))

must also be correct.  I don't think that *any* of the proposals
so far had a correct version [but the too simplistic examples
did not show the problems].
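
A small sketch of the two identities above, on a vector without NAs whose ties are not adjacent. The example vector and the names `first` and `last` are assumptions made for this illustration.

```r
x <- c(2, 1, 2, 1, 3)

# "first": tied values get increasing ranks in order of appearance
first <- sort.list(sort.list(x))
print(first)   # 3 1 4 2 5
stopifnot(all(first == rank(x, ties.method = "first")))

# "last", defined by mirroring: reverse x, rank with "first", reverse back
last <- rev(sort.list(sort.list(rev(x))))
print(last)    # 4 2 3 1 5
```

Within each group of tied values, `last` assigns the same set of ranks as `first` but in decreasing order of appearance.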

In the R-devel (development) version of R as of today, i.e., svn
revision >= 69549, the implementation of  ties.method = "last"
uses
        ## == rev(sort.list(sort.list(rev(x)))) :
        if(length(x) == 0) integer(0)
        else { i <- length(x):1L
               sort.list(sort.list(x[i]))[i] },

which is equivalent to using rev() but a bit more efficient.
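
As a sketch, the expression above can be wrapped in a stand-alone function and checked against the rev() formulation. The function name `rank_last` is hypothetical, and the sketch assumes x contains no NAs (rank() removes them before this step).

```r
# Hypothetical helper, not part of base R; assumes x has no NAs
rank_last <- function(x) {
    # Guard needed because length(x):1L would be c(0L, 1L) for empty x
    if (length(x) == 0L) return(integer(0))
    i <- length(x):1L              # reversing index
    sort.list(sort.list(x[i]))[i]  # "first" ranks of rev(x), reversed back
}

set.seed(1)
x <- sample(5, 20, replace = TRUE)  # many ties
stopifnot(identical(rank_last(x), rev(sort.list(sort.list(rev(x))))))
stopifnot(length(rank_last(numeric(0))) == 0L)
```

Indexing with `i` twice avoids two separate calls to rev(), which is presumably the efficiency gain mentioned above.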

Martin Maechler, ETH Zurich
