Silent failure with NA results in fligner.test()


Re: Silent failure with NA results in fligner.test()

Kurt Hornik-5
>>>>> Karolis K writes:

Any preferences?

Best
-k

> Hello,
> In certain cases fligner.test() returns NaN statistic and NA p-value.
> The issue happens when, after centering with the median, all absolute values become constant, which then leads to identical ranks.

> Below are a few examples:

> # 2 groups, 2 values each
> # issue is caused by residual values after centering (-0.5, 0.5, -0.5, 0.5)
> # then, after taking the absolute value, all the ranks become identical.
>> fligner.test(c(2,3,4,5), gl(2,2))

>         Fligner-Killeen test of homogeneity of variances

> data:  c(2, 3, 4, 5) and gl(2, 2)
> Fligner-Killeen:med chi-squared = NaN, df = 1, p-value = NA


> # similar situation with more observations and 3 groups
>> fligner.test(c(2,3,2,3,4,4,5,5,8,9,9,8), gl(3,4))

>         Fligner-Killeen test of homogeneity of variances

> data:  c(2, 3, 2, 3, 4, 4, 5, 5, 8, 9, 9, 8) and gl(3, 4)
> Fligner-Killeen:med chi-squared = NaN, df = 2, p-value = NA
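
The mechanism described above can be traced step by step outside fligner.test(). The following is a minimal sketch for the first example, mirroring the expressions visible in the patch context; the variable names res, r, a, denom, num and stat are illustrative and are not taken from the stats sources.

```r
x <- c(2, 3, 4, 5)
g <- gl(2, 2)
n <- length(x)

# Centre each group on its own median (same expression as in the patch context).
res <- x - tapply(x, g, median)[g]

# Every absolute residual is 0.5, so all observations are tied at one rank.
r <- rank(abs(res))

# Identical ranks map to identical normal scores, so their variance is zero.
a <- qnorm((1 + r / (n + 1)) / 2)
denom <- var(a)

# Numerator of the statistic; dividing it by the zero variance gives a
# statistic that is not a number, so no usable p-value can follow from it.
num <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length")) - n * mean(a)^2
stat <- num / denom
```

Inspecting res, r and denom in turn shows where the information about group spread is lost: it happens at the ranking step, before the statistic itself is computed.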


> Two simple patches are proposed below. One returns an error, and another returns a p-value of 1.
> Not sure which one is more appropriate, so submitting both.

> Warm regards,
> Karolis Koncevičius

> ---

> Index: fligner.test.R
> ===================================================================
> --- fligner.test.R (revision 79650)
> +++ fligner.test.R (working copy)
> @@ -59,8 +59,13 @@
>          stop("data are essentially constant")
 

>      a <- qnorm((1 + rank(abs(x)) / (n + 1)) / 2)
> -    STATISTIC <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length"))
> -    STATISTIC <- (STATISTIC - n * mean(a)^2) / var(a)
> +    if (var(a) > 0) {
> +        STATISTIC <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length"))
> +        STATISTIC <- (STATISTIC - n * mean(a)^2) / var(a)
> +    }
> +    else {
> +        STATISTIC <- 0
> +    }
>      PARAMETER <- k - 1
>      PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
>      names(STATISTIC) <- "Fligner-Killeen:med chi-squared"

> ---

> Index: fligner.test.R
> ===================================================================
> --- fligner.test.R (revision 79650)
> +++ fligner.test.R (working copy)
> @@ -57,6 +57,8 @@
>      x <- x - tapply(x,g,median)[g]
>      if (all(x == 0))
>          stop("data are essentially constant")
> +    if (var(abs(x)) == 0)
> +        stop("absolute residuals from the median are essentially constant")
 
>      a <- qnorm((1 + rank(abs(x)) / (n + 1)) / 2)
>      STATISTIC <- sum(tapply(a, g, "sum")^2 / tapply(a, g, "length"))

> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Silent failure with NA results in fligner.test()

Martin Maechler
In reply to this post by Kurt Hornik-5
Not sure...
If all of the variances are zero, they are homogeneous in that sense,
and I would give a p-value of 1.
If only *some* of the variances are zero, it's less easy.

I still would try to *not* give an error in such cases, and would even
prefer an NA statistic and p-value, because yes, these are "not
available" for such data.
But it is not strictly an error to try such a test on data of the
correct format. Consequently, personally I would even try to not
give the current error, but rather return NA values here:
>  if (all(x == 0))
>          stop("data are essentially constant")

On Mon, Dec 21, 2020 at 12:22 PM Kurt Hornik <[hidden email]> wrote:

>
> >>>>> Karolis K writes:
>
> Any preferences?
>
> Best
> -k

--
Martin <[hidden email]>   http://stat.ethz.ch/~maechler
Seminar für Statistik, ETH Zürich     HG G 16       Rämistrasse 101
CH-8092 Zurich, SWITZERLAND           ☎ +41 44 632 3408        <><

Re: Silent failure with NA results in fligner.test()

Kurt Hornik-5
In reply to this post by karoliskoncevicius
>>>>> Karolis K writes:

> To me it seems like returning chi-sq = 0 and p-value = 1 would make sense.
> It would also be consistent with other scenarios of equal variance in all
> groups. One example:

> fligner.test(1:8, gl(2,4))
> #        Fligner-Killeen test of homogeneity of variances
> #
> # data:  1:8 and gl(2, 4)
> # Fligner-Killeen:med chi-squared = 0, df = 1, p-value = 1

> But I am aware that other tests implemented in stats:: sometimes throw
> errors in similar situations.

> Maybe someone more familiar with the behaviour and philosophy behind
> stats:: preferences can add more weight here?

Thanks for spotting this.  After some internal discussions, we've come
to the conclusion that there is no "obvious" way to handle situations
where the Fligner-Killeen:med chi-squared test statistic is undefined
(i.e., when the denominator is zero).  [Owing to the discreteness of the
ranks, trying to take limits will not work.]  For now, these
situations consistently give NaN/NA instead of errors (and the numeric
computations were improved so that it should no longer be possible to
get a zero denominator and a non-zero numerator).
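
Since the NaN/NA result is kept, code that calls fligner.test() may want to detect the undefined case explicitly. The following is a sketch only: fligner_undefined() is a hypothetical helper, not part of stats, and it assumes that the tied-ranks condition described earlier in the thread is the case of interest.

```r
# Hypothetical helper (not part of stats): TRUE when all absolute residuals
# from the group medians are tied, i.e. when the normal scores are constant.
fligner_undefined <- function(x, g) {
  g <- factor(g)
  res <- x - tapply(x, g, median)[g]
  length(unique(rank(abs(res)))) == 1L
}

fligner_undefined(c(2, 3, 4, 5), gl(2, 2))   # TRUE: first example in the thread
fligner_undefined(1:8, gl(2, 4))             # FALSE: the statistic is defined

# Alternatively, test the returned p-value after the fact.
ft <- fligner.test(c(2, 3, 4, 5), gl(2, 2))
is.na(ft$p.value)
```

Checking up front keeps the decision about what an undefined statistic means (skip, report 1, or stop) in the calling code. Note that the helper also returns TRUE for data that are constant within every group.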

Best
-k

> Warm regards,
> Karolis K.
