Inconsistency in median()

Gustavo Zapata Wainberg
Hi!

I'm writing this post because median() behaves inconsistently for vectors of
even and odd length. For odd-length vectors, attributes (such as labels added
with Hmisc) are kept after running median(), but for even-length vectors the
attributes are lost.

I know that this is due to median() using mean() to obtain the result when
the vector has even length, and mean() always strips attributes from vectors.
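
The second point can be checked in base R without Hmisc. The snippet below is only a minimal sketch: the vector `x` and its "label" attribute are made up for illustration, and the attribute is attached with structure() rather than with Hmisc::label().

```r
# Illustrative data: a plain numeric vector carrying a custom attribute.
x <- structure(c(4, 1, 3, 2), label = "A label for x")

attributes(x)          # the "label" attribute is present
attributes(mean(x))    # NULL: mean() returns a bare number
```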

Don't you think that attributes should be kept in both cases? And, going
further, shouldn't mean() keep attributes as well? I have looked in R's
Bugzilla and I didn't find an entry related to this issue.

Please, let me know if you consider that this issue should be posted in R's
bugzilla.

Here is an example with code.

rndvar <- rnorm(n = 100)

Hmisc::label(rndvar) <- "A label for RNDVAR"

str(median(rndvar[-c(1,2)]))

Returns: "num 0.0368"

str(median(rndvar[-1]))

Returns:
 'labelled' num 0.0322
 - attr(*, "label")= chr "A label for RNDVAR"

Thanks in advance!

Gustavo Zapata-Wainberg

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Inconsistency in median()

Martin Maechler
>>>>> Gustavo Zapata Wainberg
>>>>>     on Mon, 3 May 2021 20:48:49 +0200 writes:

    > Hi!

    > I'm writing this post because there is an inconsistency
    > when median() is calculated for even or odd vectors. For
    > odd vectors, attributes (such as labels added with Hmisc)
    > are kept after running median(), but this is not the case
    > if the vector is even, in this last case attributes are
    > lost.

    > I know that this is due to median() using mean() to obtain
    > the result when the vector is even, and mean() always
    > takes attributes off vectors.

Yes, and this has been the design of median() forever:

If n := length(x) is odd, the median is "the middle" observation,
and should equal x[j] for j = (n+1)/2; hence it is, e.g., well
defined for an ordered factor.

When n is even, however, median() must be the mean of "the two
middle" observations, which is, e.g., not even *defined* for an
ordered factor.

We *could* talk of the so-called lo-median or hi-median
(terms probably coined by John W. Tukey) because (IIRC) these
are equal to each other and to the median for odd n, but
are equal to x[j] and x[j+1], j = n/2, for even n *and* are
still "of the same kind" as x[] itself.
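
The lo-median and hi-median described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: `lo_median()` and `hi_median()` are hypothetical names, not functions in base R or stats, and silently dropping NAs is a choice made here for brevity.

```r
# Hypothetical helpers (not part of base R). Both return an element of x
# selected with `[`, so the result is "of the same kind" as x.
lo_median <- function(x) {
    x <- x[!is.na(x)]               # assumption: NAs are simply dropped
    n <- length(x)
    if (n == 0L) return(x[NA_integer_])
    j <- (n + 1L) %/% 2L            # (n+1)/2 for odd n, n/2 for even n
    x[order(x)][j]
}

hi_median <- function(x) {
    x <- x[!is.na(x)]
    n <- length(x)
    if (n == 0L) return(x[NA_integer_])
    j <- n %/% 2L + 1L              # (n+1)/2 for odd n, n/2 + 1 for even n
    x[order(x)][j]
}

x <- c(d = 4, a = 1, c = 3, b = 2)
lo_median(x)    # the element named "b" (value 2)
hi_median(x)    # the element named "c" (value 3)
median(x)       # 2.5, without a name
```

Because both helpers only sort and subset, they also work on an ordered factor, provided the class has a correct `[` method.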

Interestingly, for mad() {= the median absolute deviation from the median}
we *do* allow specifying logical 'low' and 'high',
but that is for the "outer" median in MAD's definition, not the
inner one.

## From <Rsrc>/src/library/stats/R/mad.R :

mad <- function(x, center = median(x), constant = 1.4826,
                na.rm = FALSE, low = FALSE, high = FALSE)
{
    if(na.rm)
        x <- x[!is.na(x)]
    n <- length(x)
    constant *
        if((low || high) && n%%2 == 0) {
            if(low && high) stop("'low' and 'high' cannot be both TRUE")
            n2 <- n %/% 2 + as.integer(high)
            sort(abs(x - center), partial = n2)[n2]
        }
        else median(abs(x - center))
}




    > Don't you think that attributes should be kept in both
    > cases?

Well, not all attributes can be kept.
Note that for *named* vectors x, x[j] can (and does) keep the name,
but there's definitely no sensible name to give to (x[j] + x[j+1])/2.

I'm willing to collaborate with someone to consider
extending median.default(), making hi-median and lo-median
available to the user.
Both of these will always return x[j] for some j and hence keep
all (sensible!) attributes (well, if the `[` method for the
corresponding class has been defined correctly; I've encountered
quite a few cases where people created vector-like classes but
did not provide a "correct" subsetting method; typically you
should make sure both the `[[` and `[` methods work!).

Best regards,
Martin

Martin Maechler
ETH Zurich  and  R Core team

Re: Inconsistency in median()

Gustavo Zapata Wainberg
Hi, thanks Dr. Mächler for your prompt response!

I agree with your explanations about this issue. But I was thinking of
something like adding an argument to median() and mean() that could keep
the attributes of the variables if set to TRUE.

Thanks again.

Best regards

On Tue, 4 May 2021 at 17:57, Martin Maechler (<[hidden email]>)
wrote:
Re: Inconsistency in median()

David Winsemius
It would be almost trivial to make a wrapper that first captures the attributes, runs median(), and then returns the re-attributed value.
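
A minimal sketch of such a wrapper might look like the following. The name `median_keep_attr()` is hypothetical, and leaving out "names", "dim" and "dimnames" is an assumption made here, since those attributes describe the full-length input and cannot sensibly be attached to a single value.

```r
# Hypothetical wrapper (not part of base R): run median() and copy the
# input's attributes onto the result.
median_keep_attr <- function(x, ...) {
    saved <- attributes(x)
    # Assumption: skip attributes that only make sense for the whole vector.
    saved <- saved[setdiff(names(saved), c("names", "dim", "dimnames"))]
    m <- median(x, ...)
    attributes(m) <- saved
    m
}

x <- structure(c(a = 4, b = 1, c = 3, d = 2), label = "A label for x")
str(median(x))             # plain number, label lost
str(median_keep_attr(x))   # same value, with the "label" attribute restored
```

Whether copying every remaining attribute is appropriate depends on the class of x, so this is a starting point rather than a general solution.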

David.

> On May 5, 2021, at 8:29 AM, Gustavo Zapata Wainberg <[hidden email]> wrote:
>
> Hi, thanks Dr. Mächler for your prompt response!
>
> I agree with your explanations about this issue. But I was thinking of
> something like adding an argument to median() and mean() that could keep
> the attributes of the variables if set to TRUE.
>
> Thanks again.
>
> Best regards