I'm writing this post because median() behaves inconsistently for vectors of even and odd length. For odd-length vectors, attributes (such as labels added with Hmisc) are kept after running median(); for even-length vectors the attributes are lost.

I know that this is because median() uses mean() to obtain the result when the length is even, and mean() always strips attributes from vectors.

Don't you think that attributes should be kept in both cases? And, going further, shouldn't mean() keep attributes as well?

I have looked in R's Bugzilla and didn't find an entry related to this issue. Please let me know if you think this issue should be posted in R's Bugzilla.

Here is an example with code.

    rndvar <- rnorm(n = 100)
    Hmisc::label(rndvar) <- "A label for RNDVAR"

    str(median(rndvar[-c(1,2)]))
    ## Returns:
    ##  num 0.0368

    str(median(rndvar[-1]))
    ## Returns:
    ##  'labelled' num 0.0322
    ##  - attr(*, "label")= chr "A label for RNDVAR"

Thanks in advance!

Gustavo Zapata-Wainberg

>>>>> Gustavo Zapata Wainberg
>>>>>     on Mon, 3 May 2021 20:48:49 +0200 writes:

    > Hi!
    > I'm writing this post because there is an inconsistency
    > when median() is calculated for even or odd vectors. For
    > odd vectors, attributes (such as labels added with Hmisc)
    > are kept after running median(), but this is not the case
    > if the vector is even, in this last case attributes are
    > lost.

    > I know that this is due to median() using mean() to obtain
    > the result when the vector is even, and mean() always
    > takes attributes off vectors.

Yes, and this has been the design of median() forever:

If n := length(x) is odd, the median is "the middle" observation
and should be equal to x[j] for j = (n+1)/2;
hence it is well defined, e.g., for an ordered factor.

When n is even, however, median() must be the mean of "the two
middle" observations, which is, e.g., not even *defined* for an
ordered factor.

We *could* talk of the so-called lo-median or hi-median
(terms probably coined by John W. Tukey) because (IIRC) these
are equal to each other and to the median for odd n, but
are equal to x[j] and x[j+1], j = n/2, for even n *and* are
still "of the same kind" as x[] itself.

Interestingly, for mad() { = the median absolute deviation from the median }
we *do* allow specifying logical 'low' and 'high',
but that applies to the "outer" median in MAD's definition, not the
inner one.

## From <Rsrc>/src/library/stats/R/mad.R :

mad <- function(x, center = median(x), constant = 1.4826,
                na.rm = FALSE, low = FALSE, high = FALSE)
{
    if(na.rm)
        x <- x[!is.na(x)]
    n <- length(x)
    constant *
        if((low || high) && n%%2 == 0) {
            if(low && high) stop("'low' and 'high' cannot be both TRUE")
            n2 <- n %/% 2 + as.integer(high)
            sort(abs(x - center), partial = n2)[n2]
        }
        else median(abs(x - center))
}

    > Don't you think that attributes should be kept in both
    > cases?

well, not all attributes can be kept.
Note that for *named* vectors x, x[j] can (and does) keep the name,
but there's definitely no sensible name to give to (x[j] + x[j+1])/2.

I'm willing to collaborate with someone on extending
median.default() so that the hi-median and lo-median become
available to the user.
Both of these will always return x[j] for some j and hence keep
all (sensible!) attributes, provided the `[` method for the
corresponding class has been defined correctly. I've encountered
quite a few cases where people created vector-like classes but
did not provide a "correct" subsetting method (typically you
should make sure that both the `[[` and the `[` methods work!).

Best regards,
Martin

Martin Maechler
ETH Zurich and R Core team

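The lo-median and hi-median described above can be sketched as follows. This is only an illustration under stated assumptions, not the proposed extension of median.default(): the function name lohi_median and its 'high' argument are made up here (the latter mirrors the 'high' argument of mad()), and sorting is done with x[order(x)] so that everything goes through the `[` method of x.

```r
## Hypothetical helper, not part of base R or the stats package.
## It always returns one element of x, namely x[j] after sorting, so the
## result keeps whatever the `[` method of x keeps.
lohi_median <- function(x, high = FALSE, na.rm = FALSE) {
    if (na.rm) x <- x[!is.na(x)]
    else if (anyNA(x)) return(x[NA_integer_][1L])
    n <- length(x)
    if (n == 0L) return(x[NA_integer_][1L])
    ## odd n:  j = (n + 1)/2 for both the lo- and the hi-median
    ## even n: j = n/2 (lo-median) or n/2 + 1 (hi-median)
    j <- if (n %% 2L == 1L) (n + 1L) %/% 2L else n %/% 2L + as.integer(high)
    x[order(x)][j]
}

x <- c(5, 1, 4, 2)
lohi_median(x)               # 2
lohi_median(x, high = TRUE)  # 4

## An ordered factor of even length still has a lo- and a hi-median:
f <- factor(c("lo", "mid", "hi", "hi"),
            levels = c("lo", "mid", "hi"), ordered = TRUE)
lohi_median(f)               # the level "mid", still an ordered factor
```

For a plain vector without a class, `[` itself drops attributes other than names, so a label only survives here when the class of x supplies a `[` method that carries it along.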
Hi, thanks Dr. Mächler for your prompt response!
I agree with your explanations about this issue. But I was thinking of something like adding an argument to median() and mean() that would keep the attributes of the variable if set to TRUE.

Thanks again.

Best regards

On Tue, 4 May 2021 at 17:57, Martin Maechler (<[hidden email]>) wrote:

> [...]
> well, not all attributes can be kept.
> Note that for *named* vectors x, x[j] can (and does) keep the name,
> but there's definitely no sensible name to give to (x[j] + x[j+1])/2
> [...]

It would be almost trivial to make a wrapper that first captures the attributes, runs median(), and then returns the re-attributed value.
David.

> On May 5, 2021, at 8:29 AM, Gustavo Zapata Wainberg <[hidden email]> wrote:
>
> [...]
> I agree with your explanations about this issue. But I was thinking of
> something like adding an argument to median() and mean() that could keep
> the attributes of the variables if set to TRUE.
> [...]
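
The wrapper idea mentioned in this message could look roughly like the sketch below. The name median_keep_attrs and its 'drop' argument are hypothetical, and the choice to leave out names and dimensions is an assumption made here, following the point earlier in the thread that the mean of two elements has no sensible name.

```r
## Hypothetical wrapper, not part of base R: capture the attributes of x,
## compute the ordinary median, then put the captured attributes back.
median_keep_attrs <- function(x, ..., drop = c("names", "dim", "dimnames")) {
    a <- attributes(x)              # NULL if x has no attributes
    m <- stats::median(x, ...)
    if (is.null(a)) return(m)
    ## attributes that describe individual elements or the shape of x make
    ## no sense on a single summary value, so they are not copied
    a <- a[setdiff(names(a), drop)]
    for (nm in names(a)) attr(m, nm) <- a[[nm]]
    m
}

x <- structure(c(1, 2, 3, 4), label = "A label")
m <- median_keep_attrs(x)
as.numeric(m)     # 2.5
attr(m, "label")  # "A label"
```

Copying everything else blindly is only reasonable for descriptive attributes such as a label. A class attribute is copied too, which may or may not be meaningful for a value that is the mean of two elements, so a real implementation would probably need to be more selective.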
