Inconsistency in median()

Gustavo Zapata Wainberg
CONTENTS DELETED
The author has deleted this message.

Re: Inconsistency in median()

Martin Maechler
CONTENTS DELETED
The author has deleted this message.

Re: Inconsistency in median()

Gustavo Zapata Wainberg
CONTENTS DELETED
The author has deleted this message.

Re: Inconsistency in median()

David Winsemius
It would be almost trivial to make a wrapper that first captures the attributes, runs median(), and then returns the value with the attributes restored.

David.
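
A minimal sketch of such a wrapper, under stated assumptions: the name `median_keep_attrs` and its `drop` argument are hypothetical and not part of base R or Hmisc. Length-dependent attributes (names, dim, dimnames) are skipped by default because they cannot sensibly apply to a single value.

```r
# Hypothetical helper, not part of base R or Hmisc.
median_keep_attrs <- function(x, ..., drop = c("names", "dim", "dimnames")) {
  # Capture the attributes before median() has a chance to lose them.
  saved <- attributes(x)
  saved <- saved[setdiff(names(saved), drop)]
  res <- stats::median(x, ...)
  # Re-attach each saved attribute to the length-one result.
  for (nm in names(saved)) attr(res, nm) <- saved[[nm]]
  res
}

x <- c(3, 1, 4, 1, 5, 9)            # even length, so median() averages 3 and 4
attr(x, "label") <- "A label"       # plain attribute standing in for an Hmisc label
m <- median_keep_attrs(x)
attr(m, "label")                    # "A label"
```

Whether restoring a class attribute is appropriate depends on the class, so this should be read as an illustration rather than a general solution.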

> On May 5, 2021, at 8:29 AM, Gustavo Zapata Wainberg <[hidden email]> wrote:
>
> Hi, thanks Dr. Mächler for your prompt response!
>
> I agree with your explanations about this issue. But I was thinking of
> something like adding an argument to median() and mean() that could keep
> the attributes of the variables if set to TRUE.
>
> Thanks again.
>
> Best regards
>
> On Tue, May 4, 2021 at 17:57, Martin Maechler (<[hidden email]>)
> wrote:
>
>>>>>>> Gustavo Zapata Wainberg
>>>>>>>    on Mon, 3 May 2021 20:48:49 +0200 writes:
>>
>>> Hi!
>>
>>> I'm writing this post because there is an inconsistency
>>> when median() is calculated for vectors of even or odd length. For
>>> odd-length vectors, attributes (such as labels added with Hmisc)
>>> are kept after running median(), but this is not the case
>>> if the length is even; in that case the attributes are
>>> lost.
>>
>>> I know that this is due to median() using mean() to obtain
>>> the result when the vector is even, and mean() always
>>> takes attributes off vectors.
>>
>> Yes, and this has been the design of median() forever:
>>
>> If n := length(x) is odd, the median is "the middle" observation,
>>                   and should be equal to x[j] for j = (n+1)/2,
>>                   and hence is, e.g., well defined for an ordered factor.
>>
>> When n is even, however,
>>     median() must be the mean of "the two middle" observations,
>>     which is, e.g., not even *defined* for an ordered factor.
>>
>> We *could* talk of the so-called lo-median or hi-median
>> (terms probably coined by John W. Tukey) because (IIRC) these
>> are equal to each other and to the median for odd n, but
>> are equal to x[j] and x[j+1], with j = n/2, for even n *and* are
>> still "of the same kind" as x[] itself.
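
The lo-median and hi-median described above can be sketched as follows. The function `lohi_median` and its `high` argument are hypothetical names chosen for illustration; base R's median() has no such option. The sketch returns x[j] via `[`, so the result stays "of the same kind" as x only to the extent that the class's `[` method preserves it.

```r
# Hypothetical helper, not part of base R: returns the lower (high = FALSE)
# or upper (high = TRUE) middle order statistic of x.
lohi_median <- function(x, high = FALSE, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  if (n == 0L || anyNA(x)) return(x[NA_integer_])
  # Odd n: both choices give (n + 1)/2. Even n: n/2 (low) or n/2 + 1 (high).
  j <- if (high) n %/% 2L + 1L else (n + 1L) %/% 2L
  x[order(x)][j]
}

sizes <- factor(c("S", "L", "M", "L"), levels = c("S", "M", "L"), ordered = TRUE)
lohi_median(sizes)                 # M, still an ordered factor
lohi_median(sizes, high = TRUE)    # L

v <- c(a = 10, b = 30, c = 20, d = 40)
lohi_median(v)                     # 20, keeping the name "c"
```

Using order() and `[` rather than a partial sort is a simplicity choice for this sketch; it trades some speed for working on any class with a usable ordering and subsetting method.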
>>
>> Interestingly, for mad() {= the median absolute deviation from the
>> median}
>> we *do* allow specifying logical 'low' and 'high',
>> but that is for the "outer" median in MAD's definition, not the
>> inner one.
>>
>> ## From <Rsrc>/src/library/stats/R/mad.R :
>>
>> mad <- function(x, center = median(x), constant = 1.4826,
>>                na.rm = FALSE, low = FALSE, high = FALSE)
>> {
>>    if(na.rm)
>>        x <- x[!is.na(x)]
>>    n <- length(x)
>>    constant *
>>        if((low || high) && n%%2 == 0) {
>>            if(low && high) stop("'low' and 'high' cannot be both TRUE")
>>            n2 <- n %/% 2 + as.integer(high)
>>            sort(abs(x - center), partial = n2)[n2]
>>        }
>>        else median(abs(x - center))
>> }
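
As a small illustration of the `low` and `high` arguments in the mad() source quoted above, the following uses a made-up even-length vector. Setting `constant = 1` is only a choice for this example so that the raw order statistics are visible.

```r
x <- c(1, 2, 3, 4, 100, 200)   # illustrative data; median(x) is 3.5
# Absolute deviations from 3.5, sorted: 0.5, 0.5, 1.5, 2.5, 96.5, 196.5

mad_default <- mad(x, constant = 1)               # mean of the two middle deviations
mad_low     <- mad(x, constant = 1, low = TRUE)   # lower middle deviation
mad_high    <- mad(x, constant = 1, high = TRUE)  # upper middle deviation

c(default = mad_default, low = mad_low, high = mad_high)
# default 2.0, low 1.5, high 2.5
```

Note that `center` is still the ordinary median(x) in all three calls; only the outer median over the absolute deviations changes.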
>>
>>> Don't you think that attributes should be kept in both
>>> cases?
>>
>> Well, not all attributes can be kept.
>> Note that for *named* vectors x, x[j] can (and does) keep the name,
>> but there's definitely no sensible name to give to (x[j] + x[j+1])/2.
>>
>> I'm willing to collaborate with someone on extending
>> median.default() to make hi-median and lo-median
>> available to the user.
>> Both of these will always return x[j] for some j and hence keep
>> all (sensible!) attributes (well, if the `[` method for the
>> corresponding class has been defined correctly; I've encountered
>> quite a few cases where people created vector-like classes but
>> did not provide a "correct" subsetting method; typically you
>> should make sure both a `[[` and a `[` method work!).
>>
>> Best regards,
>> Martin
>>
>> Martin Maechler
>> ETH Zurich  and  R Core team
>>
>>> And, going further, shouldn't mean() keep
>>> attributes as well? I have looked in R's Bugzilla and I
>>> didn't find an entry related to this issue.
>>
>>> Please, let me know if you consider that this issue should
>>> be posted in R's bugzilla.
>>
>>> Here is an example with code.
>>
>>> rndvar <- rnorm(n = 100)
>>
>>> Hmisc::label(rndvar) <- "A label for RNDVAR"
>>
>>> str(median(rndvar[-c(1,2)]))
>>
>>> Returns: "num 0.0368"
>>
>>> str(median(rndvar[-1]))
>>
>>> Returns: 'labelled' num 0.0322 - attr(*, "label")= chr "A
>>> label for RNDVAR"
>>
>>> Thanks in advance!
>>
>>> Gustavo Zapata-Wainberg
>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>