

Hello,
Writing to share some things I've found about wilcox.test() that seem a
a bit inconsistent.
1. Inf values are not removed if paired=TRUE
# returns different results (Inf is removed):
wilcox.test(c(1,2,3,4), c(0,9,8,7))
wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
# returns the same result (Inf is left as value with highest rank):
wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
2. tolerance issues with paired=TRUE.
wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
# ...
# Warning: cannot compute exact pvalue with ties
wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
# ...
# no warning
3. Always 'x observations are missing' when paired=TRUE
wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
# ...
# Error: not enough (finite) 'x' observations
4. No indication if normal approximation was used:
# different numbers, but same "method" name
wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
From all of these I am pretty sure the 1st one is likely unintended,
so attaching a small patch to adjust it. Can also try patching others if
consensus is reached that the behavioiur has to be modified.
Kind regards,
Karolis Koncevičius.

Index: wilcox.test.R
===================================================================
 wilcox.test.R (revision 77540)
+++ wilcox.test.R (working copy)
@@ 42,7 +42,7 @@
if(paired) {
if(length(x) != length(y))
stop("'x' and 'y' must have the same length")
 OK < complete.cases(x, y)
+ OK < is.finite(x) & is.finite(y)
x < x[OK]  y[OK]
y < NULL
}
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Your second issue seems like a more or less unavoidable floatingpoint
computation issue. The paired test operates by computing differences
between corresponding values of x and y.
It's not impossible to try to detect "almostties" (by testing for
differences less than, say, sqrt(.Machine$double.eps)), but it's
delicate and somewhat subjective/problemdependent.
Example:
options(digits=20)
> unique(c(4,3,2)c(3,2,1))
[1] 1
> unique(c(0.4,0.3,0.2)c(0.3,0.2,0.1))
[1] 0.100000000000000033307 0.099999999999999977796 0.100000000000000005551
On 20191207 1:55 p.m., Karolis Koncevičius wrote:
> Hello,
>
> Writing to share some things I've found about wilcox.test() that seem a
> a bit inconsistent.
>
> 1. Inf values are not removed if paired=TRUE
>
> # returns different results (Inf is removed):
> wilcox.test(c(1,2,3,4), c(0,9,8,7))
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
>
> # returns the same result (Inf is left as value with highest rank):
> wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
>
> 2. tolerance issues with paired=TRUE.
>
> wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
> # ...
> # Warning: cannot compute exact pvalue with ties
>
> wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
> # ...
> # no warning
>
> 3. Always 'x observations are missing' when paired=TRUE
>
> wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
> # ...
> # Error: not enough (finite) 'x' observations
>
> 4. No indication if normal approximation was used:
>
> # different numbers, but same "method" name
> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
>
>
> From all of these I am pretty sure the 1st one is likely unintended,
> so attaching a small patch to adjust it. Can also try patching others if
> consensus is reached that the behavioiur has to be modified.
>
> Kind regards,
> Karolis Koncevičius.
>
> 
>
> Index: wilcox.test.R
> ===================================================================
>  wilcox.test.R (revision 77540)
> +++ wilcox.test.R (working copy)
> @@ 42,7 +42,7 @@
> if(paired) {
> if(length(x) != length(y))
> stop("'x' and 'y' must have the same length")
>  OK < complete.cases(x, y)
> + OK < is.finite(x) & is.finite(y)
> x < x[OK]  y[OK]
> y < NULL
> }
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Thank you for responding, and so quickly at that.
Yes, I do understand that this is a floating point issue.
However, since wilcox.test() works on ranks this might be a bit
dangerous in my opinion. Maybe more so than for magnitude based tests.
Any small precision error will be ranked and it becomes a matter of
errors being systematically >0 or <0 in one group.
Here is one example that I do not like:
x < seq(0.9, 0.2, 0.1)
y < seq(0.8, 0.1, 0.1)
wilcox.test(x, y, paired=TRUE, mu=0.1)
Wilcoxon signed rank test with continuity correction
data: x and y
V = 0, pvalue = 0.01471
alternative hypothesis: true location shift is not equal to 0.1
# ... Warning, due to some precision deviations being duplicated ...
xy
[1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
sign(xy  0.1)
[1] 1 1 1 1 1 1 1 0
t.test() uses .Machine$double.eps with stderr and avoids this issue:
t.test(x, y, paired=TRUE, mu=0.1)
Error in t.test.default(x, y, paired = TRUE, mu = 0.1) :
data are essentially constant
On 20191207 14:41, Ben Bolker wrote:
>
> Your second issue seems like a more or less unavoidable floatingpoint
>computation issue. The paired test operates by computing differences
>between corresponding values of x and y.
>
> It's not impossible to try to detect "almostties" (by testing for
>differences less than, say, sqrt(.Machine$double.eps)), but it's
>delicate and somewhat subjective/problemdependent.
>
> Example:
>
>options(digits=20)
>> unique(c(4,3,2)c(3,2,1))
>[1] 1
>> unique(c(0.4,0.3,0.2)c(0.3,0.2,0.1))
>[1] 0.100000000000000033307 0.099999999999999977796 0.100000000000000005551
>
>On 20191207 1:55 p.m., Karolis Koncevičius wrote:
>> Hello,
>>
>> Writing to share some things I've found about wilcox.test() that seem a
>> a bit inconsistent.
>>
>> 1. Inf values are not removed if paired=TRUE
>>
>> # returns different results (Inf is removed):
>> wilcox.test(c(1,2,3,4), c(0,9,8,7))
>> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
>>
>> # returns the same result (Inf is left as value with highest rank):
>> wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
>> wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
>>
>> 2. tolerance issues with paired=TRUE.
>>
>> wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
>> # ...
>> # Warning: cannot compute exact pvalue with ties
>>
>> wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
>> # ...
>> # no warning
>>
>> 3. Always 'x observations are missing' when paired=TRUE
>>
>> wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
>> # ...
>> # Error: not enough (finite) 'x' observations
>>
>> 4. No indication if normal approximation was used:
>>
>> # different numbers, but same "method" name
>> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
>> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
>>
>>
>> From all of these I am pretty sure the 1st one is likely unintended,
>> so attaching a small patch to adjust it. Can also try patching others if
>> consensus is reached that the behavioiur has to be modified.
>>
>> Kind regards,
>> Karolis Koncevičius.
>>
>> 
>>
>> Index: wilcox.test.R
>> ===================================================================
>>  wilcox.test.R (revision 77540)
>> +++ wilcox.test.R (working copy)
>> @@ 42,7 +42,7 @@
>> if(paired) {
>> if(length(x) != length(y))
>> stop("'x' and 'y' must have the same length")
>>  OK < complete.cases(x, y)
>> + OK < is.finite(x) & is.finite(y)
>> x < x[OK]  y[OK]
>> y < NULL
>> }
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rdevel>
>______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


>>>>> Karolis Koncevičius
>>>>> on Sat, 7 Dec 2019 20:55:36 +0200 writes:
> Hello,
> Writing to share some things I've found about wilcox.test() that seem a
> a bit inconsistent.
> 1. Inf values are not removed if paired=TRUE
> # returns different results (Inf is removed):
> wilcox.test(c(1,2,3,4), c(0,9,8,7))
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
> # returns the same result (Inf is left as value with highest rank):
> wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
Now which of the two cases do you consider correct ?
IHMO, the 2nd one is correct: it is exactly one property of the
*robustness* of the wilcoxon test and very desirable that any
(positive) outlier is treated the same as just "the largest
value" and the test statistic (and hence the pvalue) should not
change.
So I think the first case is wrong, notably if modified, (such
that the last y is the overall maximal value (slightly larger sample):
> wilcox.test(1:7, 1/8+ c(9:4, 12))
Wilcoxon rank sum test
data: 1:7 and 1/8 + c(9:4, 12)
W = 6, pvalue = 0.01748
alternative hypothesis: true location shift is not equal to 0
> wilcox.test(1:7, 1/8+ c(9:4, 10000))
Wilcoxon rank sum test
data: 1:7 and 1/8 + c(9:4, 10000)
W = 6, pvalue = 0.01748
alternative hypothesis: true location shift is not equal to 0
> wilcox.test(1:7, 1/8+ c(9:4, Inf))
Wilcoxon rank sum test
data: 1:7 and 1/8 + c(9:4, Inf)
W = 6, pvalue = 0.03497
alternative hypothesis: true location shift is not equal to 0
The Inf case should definitely give the same as the 10'000 case.
That's exactly one property of a robust statistic.
Thank you, Karolis, this is pretty embarrassing to only be detected now after 25+ years of R in use ...
The correct fix starts with replacing the is.finite() by !is.na() and keep the 'Inf' in the rank computations...
(but then probably also deal with the case of more than one Inf, notably the Inf  Inf "exception" which is not triggered by your example...)

Ben addressed the "rounding" / numerical issues unavoidable for
the other problems.
> 2. tolerance issues with paired=TRUE.
> wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
> # ...
> # Warning: cannot compute exact pvalue with ties
> wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
> # ...
> # no warning
> 3. Always 'x observations are missing' when paired=TRUE
> wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
> # ...
> # Error: not enough (finite) 'x' observations
> 4. No indication if normal approximation was used:
> # different numbers, but same "method" name
> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
> From all of these I am pretty sure the 1st one is likely unintended,
> so attaching a small patch to adjust it. Can also try patching others if
> consensus is reached that the behavioiur has to be modified.
> Kind regards,
> Karolis Koncevičius.
> 
> Index: wilcox.test.R
> ===================================================================
>  wilcox.test.R (revision 77540)
> +++ wilcox.test.R (working copy)
> @@ 42,7 +42,7 @@
> if(paired) {
> if(length(x) != length(y))
> stop("'x' and 'y' must have the same length")
>  OK < complete.cases(x, y)
> + OK < is.finite(x) & is.finite(y)
> x < x[OK]  y[OK]
> y < NULL
> }
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


Thank you for a fast response. Nice to see this mailing list being so
alive.
Regarding Inf issue: I agree with your assessment that Inf should not be
removed. The code gave me an impression that Inf values were
intentionally removed (since is.finite() was used everywhere, except for
paired case). I will try to adjust my patch according to your feedback.
One more thing: it seems like you assumed that issues 2:4 are all
related to machine precision, which is not the case  only 2nd issue is.
Just wanted to draw this to your attention, in case you might have some
feedback and guidelines about issues 3 and 4 as well.
On 20191207 21:59, Martin Maechler wrote:
>>>>>> Karolis Koncevičius
>>>>>> on Sat, 7 Dec 2019 20:55:36 +0200 writes:
>
> > Hello,
> > Writing to share some things I've found about wilcox.test() that seem a
> > a bit inconsistent.
>
> > 1. Inf values are not removed if paired=TRUE
>
> > # returns different results (Inf is removed):
> > wilcox.test(c(1,2,3,4), c(0,9,8,7))
> > wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
>
> > # returns the same result (Inf is left as value with highest rank):
> > wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
> > wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
>
>Now which of the two cases do you consider correct ?
>
>IHMO, the 2nd one is correct: it is exactly one property of the
>*robustness* of the wilcoxon test and very desirable that any
>(positive) outlier is treated the same as just "the largest
>value" and the test statistic (and hence the pvalue) should not
>change.
>
>So I think the first case is wrong, notably if modified, (such
>that the last y is the overall maximal value (slightly larger sample):
>
>> wilcox.test(1:7, 1/8+ c(9:4, 12))
>
> Wilcoxon rank sum test
>
>data: 1:7 and 1/8 + c(9:4, 12)
>W = 6, pvalue = 0.01748
>alternative hypothesis: true location shift is not equal to 0
>
>> wilcox.test(1:7, 1/8+ c(9:4, 10000))
>
> Wilcoxon rank sum test
>
>data: 1:7 and 1/8 + c(9:4, 10000)
>W = 6, pvalue = 0.01748
>alternative hypothesis: true location shift is not equal to 0
>
>> wilcox.test(1:7, 1/8+ c(9:4, Inf))
>
> Wilcoxon rank sum test
>
>data: 1:7 and 1/8 + c(9:4, Inf)
>W = 6, pvalue = 0.03497
>alternative hypothesis: true location shift is not equal to 0
>
>The Inf case should definitely give the same as the 10'000 case.
>That's exactly one property of a robust statistic.
>
>Thank you, Karolis, this is pretty embarrassing to only be detected now after 25+ years of R in use ...
>
>The correct fix starts with replacing the is.finite() by !is.na() and keep the 'Inf' in the rank computations...
>(but then probably also deal with the case of more than one Inf, notably the Inf  Inf "exception" which is not triggered by your example...)
>
>
>
>
>Ben addressed the "rounding" / numerical issues unavoidable for
>the other problems.
>
> > 2. tolerance issues with paired=TRUE.
>
> > wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
> > # ...
> > # Warning: cannot compute exact pvalue with ties
>
> > wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
> > # ...
> > # no warning
>
> > 3. Always 'x observations are missing' when paired=TRUE
>
> > wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
> > # ...
> > # Error: not enough (finite) 'x' observations
>
> > 4. No indication if normal approximation was used:
>
> > # different numbers, but same "method" name
> > wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
> > wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
>
>
> > From all of these I am pretty sure the 1st one is likely unintended,
> > so attaching a small patch to adjust it. Can also try patching others if
> > consensus is reached that the behavioiur has to be modified.
>
> > Kind regards,
> > Karolis Koncevičius.
>
> > 
>
> > Index: wilcox.test.R
> > ===================================================================
> >  wilcox.test.R (revision 77540)
> > +++ wilcox.test.R (working copy)
> > @@ 42,7 +42,7 @@
> > if(paired) {
> > if(length(x) != length(y))
> > stop("'x' and 'y' must have the same length")
> >  OK < complete.cases(x, y)
> > + OK < is.finite(x) & is.finite(y)
> > x < x[OK]  y[OK]
> > y < NULL
> > }
>
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


I'd like to ask the developers to include some exact computation for
ties into wilcox.test(). Just try
wilcox.test(c(1,1,5),c(10,11))
wilcox.test(c(1,2,5),c(10,11))
The pvalues differ significantly.
But if I try
library(exactRankTests)
wilcox.exact(c(1,1,5),c(10,11))
wilcox.exact(c(1,2,5),c(10,11))
then the pvalues coincide as expected. (R is used by many
nonmathematicians/nonengineers who pay no attention at warnings.)
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


So I tried adding Infinity support for all cases.
And it is (as could be expected) more complicated than I thought.
It is easy to add Inf support for the test. The problems start with conf.int=TRUE.
Currently confidence intervals are computed via `uniroot()` and, in the
case of infinities, we are computationally looking for roots over
infinite interval which results in an error. I suspect this is the
reason Inf values were removed in the first place.
Just a note, I found a few more errors/inconsistencies when requesting
confidence intervals with paired=TRUE (due to Infinities being left in).
Current error in InfInf scenario:
wilcox.test(c(1,2,Inf), c(4,8,Inf), paired=TRUE, conf.int=TRUE)
Error in if (ZEROES) x < x[x != 0] :
missing value where TRUE/FALSE needed
NaN confidence intervals:
wilcox.test(c(1:9,Inf), c(21:28,Inf,30), paired=TRUE, conf.int=TRUE)
Wilcoxon signed rank test with continuity correction
data: c(1:9, Inf) and c(21:28, Inf, 30)
V = 9.5, pvalue = 0.0586
alternative hypothesis: true location shift is not equal to 0
0 percent confidence interval:
NaN NaN
sample estimates:
midrange
NaN
The easiest "fix" for consistency would be to simply remove Infinity
support from the paired=TRUE case.
But going with the more desirable approach of adding Infinity support
for nonpaired cases  it is currently not clear to me what confidence
intervals and pseudomedian should be. Specially when Infinities are on
both sides.
Regards,
Karolis Koncevičius.
On 20191207 23:18, Karolis Koncevičius wrote:
>Thank you for a fast response. Nice to see this mailing list being so
>alive.
>
>Regarding Inf issue: I agree with your assessment that Inf should not
>be removed. The code gave me an impression that Inf values were
>intentionally removed (since is.finite() was used everywhere, except
>for paired case). I will try to adjust my patch according to your
>feedback.
>
>One more thing: it seems like you assumed that issues 2:4 are all
>related to machine precision, which is not the case  only 2nd issue
>is.
>Just wanted to draw this to your attention, in case you might have
>some feedback and guidelines about issues 3 and 4 as well.
>
>
>
>On 20191207 21:59, Martin Maechler wrote:
>>>>>>>Karolis Koncevičius
>>>>>>> on Sat, 7 Dec 2019 20:55:36 +0200 writes:
>>
>> > Hello,
>> > Writing to share some things I've found about wilcox.test() that seem a
>> > a bit inconsistent.
>>
>> > 1. Inf values are not removed if paired=TRUE
>>
>> > # returns different results (Inf is removed):
>> > wilcox.test(c(1,2,3,4), c(0,9,8,7))
>> > wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
>>
>> > # returns the same result (Inf is left as value with highest rank):
>> > wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
>> > wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
>>
>>Now which of the two cases do you consider correct ?
>>
>>IHMO, the 2nd one is correct: it is exactly one property of the
>>*robustness* of the wilcoxon test and very desirable that any
>>(positive) outlier is treated the same as just "the largest
>>value" and the test statistic (and hence the pvalue) should not
>>change.
>>
>>So I think the first case is wrong, notably if modified, (such
>>that the last y is the overall maximal value (slightly larger sample):
>>
>>>wilcox.test(1:7, 1/8+ c(9:4, 12))
>>
>> Wilcoxon rank sum test
>>
>>data: 1:7 and 1/8 + c(9:4, 12)
>>W = 6, pvalue = 0.01748
>>alternative hypothesis: true location shift is not equal to 0
>>
>>>wilcox.test(1:7, 1/8+ c(9:4, 10000))
>>
>> Wilcoxon rank sum test
>>
>>data: 1:7 and 1/8 + c(9:4, 10000)
>>W = 6, pvalue = 0.01748
>>alternative hypothesis: true location shift is not equal to 0
>>
>>>wilcox.test(1:7, 1/8+ c(9:4, Inf))
>>
>> Wilcoxon rank sum test
>>
>>data: 1:7 and 1/8 + c(9:4, Inf)
>>W = 6, pvalue = 0.03497
>>alternative hypothesis: true location shift is not equal to 0
>>
>>The Inf case should definitely give the same as the 10'000 case.
>>That's exactly one property of a robust statistic.
>>
>>Thank you, Karolis, this is pretty embarrassing to only be detected now after 25+ years of R in use ...
>>
>>The correct fix starts with replacing the is.finite() by !is.na() and keep the 'Inf' in the rank computations...
>>(but then probably also deal with the case of more than one Inf, notably the Inf  Inf "exception" which is not triggered by your example...)
>>
>>
>>
>>
>>Ben addressed the "rounding" / numerical issues unavoidable for
>>the other problems.
>>
>> > 2. tolerance issues with paired=TRUE.
>>
>> > wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
>> > # ...
>> > # Warning: cannot compute exact pvalue with ties
>>
>> > wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
>> > # ...
>> > # no warning
>>
>> > 3. Always 'x observations are missing' when paired=TRUE
>>
>> > wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
>> > # ...
>> > # Error: not enough (finite) 'x' observations
>>
>> > 4. No indication if normal approximation was used:
>>
>> > # different numbers, but same "method" name
>> > wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
>> > wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
>>
>>
>> > From all of these I am pretty sure the 1st one is likely unintended,
>> > so attaching a small patch to adjust it. Can also try patching others if
>> > consensus is reached that the behavioiur has to be modified.
>>
>> > Kind regards,
>> > Karolis Koncevičius.
>>
>> > 
>>
>> > Index: wilcox.test.R
>> > ===================================================================
>> >  wilcox.test.R (revision 77540)
>> > +++ wilcox.test.R (working copy)
>> > @@ 42,7 +42,7 @@
>> > if(paired) {
>> > if(length(x) != length(y))
>> > stop("'x' and 'y' must have the same length")
>> >  OK < complete.cases(x, y)
>> > + OK < is.finite(x) & is.finite(y)
>> > x < x[OK]  y[OK]
>> > y < NULL
>> > }
>>
>> > ______________________________________________
>> > [hidden email] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


>>>>> Karolis Koncevičius
>>>>> on Mon, 9 Dec 2019 23:43:36 +0200 writes:
> So I tried adding Infinity support for all cases.
> And it is (as could be expected) more complicated than I thought.
"Of course !" Thank you, Karolis, in any case!
> It is easy to add Inf support for the test. The problems start with conf.int=TRUE.
> Currently confidence intervals are computed via `uniroot()` and, in the
> case of infinities, we are computationally looking for roots over
> infinite interval which results in an error. I suspect this is the
> reason Inf values were removed in the first place.
Maybe. It's still wrong to be done "up front".
I'm sure 98% (or so ;) of all calls to wilcox.test() do *not*
use conf.int = TRUE
> Just a note, I found a few more errors/inconsistencies when requesting
> confidence intervals with paired=TRUE (due to Infinities being left in).
> Current error in InfInf scenario:
> wilcox.test(c(1,2,Inf), c(4,8,Inf), paired=TRUE, conf.int=TRUE)
> Error in if (ZEROES) x < x[x != 0] :
> missing value where TRUE/FALSE needed
Good catch .. notably as it also happens when conf.int=FALSE as
by default.
My version of wilcox.test() now does give the same as when the
to 'Inf' are left away.
> NaN confidence intervals:
> wilcox.test(c(1:9,Inf), c(21:28,Inf,30), paired=TRUE, conf.int=TRUE)
> Wilcoxon signed rank test with continuity correction
> data: c(1:9, Inf) and c(21:28, Inf, 30)
> V = 9.5, pvalue = 0.0586
> alternative hypothesis: true location shift is not equal to 0
> 0 percent confidence interval:
> NaN NaN
> sample estimates:
> midrange
> NaN
I don't see a big problem here. The NaN's in some sense show the
best that can be computed with this data. Statistic and
Pvalue, but no conf.int.
> The easiest "fix" for consistency would be to simply remove Infinity
> support from the paired=TRUE case.
I strongly disagree. We are not pruning good functionality just
for some definition of consistency.
> But going with the more desirable approach of adding Infinity support
> for nonpaired cases  it is currently not clear to me what confidence
> intervals and pseudomedian should be. Specially when Infinities are on
> both sides.
I deem that not to be a big deal. They are not defined given
the default formulas and that is reflected by NA / NaN in those
parts of the result.
> Regards,
> Karolis Koncevičius.
But I have also spent a few hours now on wilcox.test.default() behavior
notably also looking at the "rounding" / "machine precision"
situation, and also on your remark that the 'method: ...' does
not indicate well enough what was computed.
In my (not yet committed) but hereby proposed enhancement of wilcox.test(),
I have a new argument, 'digits.rank = Inf' (the default 'Inf'
corresponding to the current behavior)
with help page documentation:
digits.rank: a number; if finite, ‘rank(signif(r, digits.rank))’ will
be used to compute ranks for the test statistic instead of
(the default) ‘rank(r)’.
and then in 'Details :'
For stability reasons, it may be advisable to use rounded data or
to set ‘digits.rank = 7’, say, such that determination of ties
does not depend on very small numeric differences (see the
example).
and then in 'Examples: '
## accuracy in ties determination via 'digits.rank':
wilcox.test( 4:2, 3:1, paired=TRUE) # Warning: cannot compute exact pvalue with ties
wilcox.test((4:2)/10, (3:1)/10, paired=TRUE) # no ties => *no* warning
wilcox.test((4:2)/10, (3:1)/10, paired=TRUE, digits.rank = 9) # same ties as (4:2, 3:1)

Lastly, I propose to replace "test" by "exact test" in the
'method' component (and print out) in case exact computations
were used. This information should be part of the returned
"htest" object, and not only visible from the arguments and warnings that are
printed during the computations.
This last change is in some sense the "most backincompatible"
change of these, because many wilcox.test() printouts would
slightly change, e.g.,
> w0 < wilcox.test( 1:5, 4*(0:4), paired=TRUE)
Wilcoxon signed rank exact test
data: 1:5 and 4 * (0:4)
V = 1, pvalue = 0.125
alternative hypothesis: true location shift is not equal to 0
where before (in R <= 3.6.x) it is just
Wilcoxon signed rank test
data: .........
...............
...............
but I think R 4.0.0 is a good occasion for such a change.
Constructive feedback on all this is very welcome!
Martin
> On 20191207 23:18, Karolis Koncevičius wrote:
>> Thank you for a fast response. Nice to see this mailing list being so
>> alive.
>>
>> Regarding Inf issue: I agree with your assessment that Inf should not
>> be removed. The code gave me an impression that Inf values were
>> intentionally removed (since is.finite() was used everywhere, except
>> for paired case). I will try to adjust my patch according to your
>> feedback.
>>
>> One more thing: it seems like you assumed that issues 2:4 are all
>> related to machine precision, which is not the case  only 2nd issue
>> is.
>> Just wanted to draw this to your attention, in case you might have
>> some feedback and guidelines about issues 3 and 4 as well.
>>
>>
>>
>> On 20191207 21:59, Martin Maechler wrote:
>>>>>>>> Karolis Koncevičius
>>>>>>>> on Sat, 7 Dec 2019 20:55:36 +0200 writes:
>>>
>>> > Hello,
>>> > Writing to share some things I've found about wilcox.test() that seem a
>>> > a bit inconsistent.
>>>
>>> > 1. Inf values are not removed if paired=TRUE
>>>
>>> > # returns different results (Inf is removed):
>>> > wilcox.test(c(1,2,3,4), c(0,9,8,7))
>>> > wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
>>>
>>> > # returns the same result (Inf is left as value with highest rank):
>>> > wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
>>> > wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
>>>
>>> Now which of the two cases do you consider correct ?
>>>
>>> IHMO, the 2nd one is correct: it is exactly one property of the
>>> *robustness* of the wilcoxon test and very desirable that any
>>> (positive) outlier is treated the same as just "the largest
>>> value" and the test statistic (and hence the pvalue) should not
>>> change.
>>>
>>> So I think the first case is wrong, notably if modified, (such
>>> that the last y is the overall maximal value (slightly larger sample):
>>>
>>>> wilcox.test(1:7, 1/8+ c(9:4, 12))
>>>
>>> Wilcoxon rank sum test
>>>
>>> data: 1:7 and 1/8 + c(9:4, 12)
>>> W = 6, pvalue = 0.01748
>>> alternative hypothesis: true location shift is not equal to 0
>>>
>>>> wilcox.test(1:7, 1/8+ c(9:4, 10000))
>>>
>>> Wilcoxon rank sum test
>>>
>>> data: 1:7 and 1/8 + c(9:4, 10000)
>>> W = 6, pvalue = 0.01748
>>> alternative hypothesis: true location shift is not equal to 0
>>>
>>>> wilcox.test(1:7, 1/8+ c(9:4, Inf))
>>>
>>> Wilcoxon rank sum test
>>>
>>> data: 1:7 and 1/8 + c(9:4, Inf)
>>> W = 6, pvalue = 0.03497
>>> alternative hypothesis: true location shift is not equal to 0
>>>
>>> The Inf case should definitely give the same as the 10'000 case.
>>> That's exactly one property of a robust statistic.
>>>
>>> Thank you, Karolis, this is pretty embarrassing to only be detected now after 25+ years of R in use ...
>>>
>>> The correct fix starts with replacing the is.finite() by !is.na() and keep the 'Inf' in the rank computations...
>>> (but then probably also deal with the case of more than one Inf, notably the Inf  Inf "exception" which is not triggered by your example...)
>>>
>>>
>>> 
>>>
>>> Ben addressed the "rounding" / numerical issues unavoidable for
>>> the other problems.
>>>
>>> > 2. tolerance issues with paired=TRUE.
>>>
>>> > wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
>>> > # ...
>>> > # Warning: cannot compute exact pvalue with ties
>>>
>>> > wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
>>> > # ...
>>> > # no warning
>>>
>>> > 3. Always 'x observations are missing' when paired=TRUE
>>>
>>> > wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
>>> > # ...
>>> > # Error: not enough (finite) 'x' observations
>>>
>>> > 4. No indication if normal approximation was used:
>>>
>>> > # different numbers, but same "method" name
>>> > wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
>>> > wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
>>>
>>>
>>> > From all of these I am pretty sure the 1st one is likely unintended,
>>> > so attaching a small patch to adjust it. Can also try patching others if
>>> > consensus is reached that the behavioiur has to be modified.
>>>
>>> > Kind regards,
>>> > Karolis Koncevičius.
>>>
>>> > 
>>>
>>> > Index: wilcox.test.R
>>> > ===================================================================
>>> >  wilcox.test.R (revision 77540)
>>> > +++ wilcox.test.R (working copy)
>>> > @@ 42,7 +42,7 @@
>>> > if(paired) {
>>> > if(length(x) != length(y))
>>> > stop("'x' and 'y' must have the same length")
>>> >  OK < complete.cases(x, y)
>>> > + OK < is.finite(x) & is.finite(y)
>>> > x < x[OK]  y[OK]
>>> > y < NULL
>>> > }
>>>
>>> > ______________________________________________
>>> > [hidden email] mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/rdevel______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


>>>>> Martin Maechler
>>>>> on Thu, 12 Dec 2019 17:20:47 +0100 writes:
>>>>> Karolis Koncevičius
>>>>> on Mon, 9 Dec 2019 23:43:36 +0200 writes:
>> So I tried adding Infinity support for all cases. And it
>> is (as could be expected) more complicated than I
>> thought.
> "Of course !" Thank you, Karolis, in any case!
>> It is easy to add Inf support for the test. The problems
>> start with conf.int=TRUE.
>> Currently confidence intervals are computed via
>> `uniroot()` and, in the case of infinities, we are
>> computationally looking for roots over infinite interval
>> which results in an error. I suspect this is the reason
>> Inf values were removed in the first place.
> Maybe. It's still wrong to be done "up front". I'm sure
> 98% (or so ;) of all calls to wilcox.test() do *not* use
> conf.int = TRUE
>> Just a note, I found a few more errors/inconsistencies
>> when requesting confidence intervals with paired=TRUE
>> (due to Infinities being left in).
>> Current error in InfInf scenario:
>> wilcox.test(c(1,2,Inf), c(4,8,Inf), paired=TRUE,
>> conf.int=TRUE) Error in if (ZEROES) x < x[x != 0] :
>> missing value where TRUE/FALSE needed
> Good catch .. notably as it also happens when
> conf.int=FALSE as by default. My version of wilcox.test()
> now does give the same as when the to 'Inf' are left away.
>> NaN confidence intervals:
>> wilcox.test(c(1:9,Inf), c(21:28,Inf,30), paired=TRUE,
>> conf.int=TRUE)
>> Wilcoxon signed rank test with continuity correction
>> data: c(1:9, Inf) and c(21:28, Inf, 30) V = 9.5, pvalue
>> = 0.0586 alternative hypothesis: true location shift is
>> not equal to 0 0 percent confidence interval: NaN NaN
>> sample estimates: midrange NaN
> I don't see a big problem here. The NaN's in some sense
> show the best that can be computed with this data.
> Statistic and Pvalue, but no conf.int.
>> The easiest "fix" for consistency would be to simply
>> remove Infinity support from the paired=TRUE case.
> I strongly disagree. We are not pruning good
> functionality just for some definition of consistency.
>> But going with the more desirable approach of adding
>> Infinity support for nonpaired cases  it is currently
>> not clear to me what confidence intervals and
>> pseudomedian should be. Specially when Infinities are on
>> both sides.
> I deem that not to be a big deal. They are not defined
> given the default formulas and that is reflected by NA /
> NaN in those parts of the result.
>> Regards, Karolis Koncevičius.
> But I have also spent a few hours now on
> wilcox.test.default() behavior notably also looking at the
> "rounding" / "machine precision" situation, and also on
> your remark that the 'method: ...' does not indicate well
> enough what was computed.
> In my (not yet committed) but hereby proposed enhancement
> of wilcox.test(), I have a new argument, 'digits.rank =
> Inf' (the default 'Inf' corresponding to the current
> behavior) with help page documentation:
> digits.rank: a number; if finite, ‘rank(signif(r,
> digits.rank))’ will be used to compute ranks for the test
> statistic instead of (the default) ‘rank(r)’.
> and then in 'Details :'
> For stability reasons, it may be advisable to use
> rounded data or to set ‘digits.rank = 7’, say, such that
> determination of ties does not depend on very small
> numeric differences (see the example).
> and then in 'Examples: '
> ## accuracy in ties determination via 'digits.rank':
> wilcox.test( 4:2, 3:1, paired=TRUE) # Warning: cannot
> compute exact pvalue with ties wilcox.test((4:2)/10,
> (3:1)/10, paired=TRUE) # no ties => *no* warning
> wilcox.test((4:2)/10, (3:1)/10, paired=TRUE, digits.rank =
> 9) # same ties as (4:2, 3:1)
> 
> Lastly, I propose to replace "test" by "exact test" in the
> 'method' component (and print out) in case exact
> computations were used. This information should be part
> of the returned "htest" object, and not only visible from
> the arguments and warnings that are printed during the
> computations. This last change is in some sense the "most
> backincompatible" change of these, because many
> wilcox.test() printouts would slightly change, e.g.,
>> w0 < wilcox.test( 1:5, 4*(0:4), paired=TRUE)
> Wilcoxon signed rank exact test
> data: 1:5 and 4 * (0:4) V = 1, pvalue = 0.125
> alternative hypothesis: true location shift is not equal
> to 0
> where before (in R <= 3.6.x) it is just
> Wilcoxon signed rank test
> data: ......... ............... ...............
> but I think R 4.0.0 is a good occasion for such a change.
> Constructive feedback on all this is very welcome! Martin
... none ... I "assume" this means everybody likes the idea ;)
Anyway, now comitted to Rdevel (for R 4.0.0), svn rev 77569
(in 'NEW FEATURES').
Martin
>> On 20191207 23:18, Karolis Koncevičius wrote:
>>> Thank you for a fast response. Nice to see this mailing
>>> list being so alive.
>>>
>>> Regarding Inf issue: I agree with your assessment that
>>> Inf should not be removed. The code gave me an
>>> impression that Inf values were intentionally removed
>>> (since is.finite() was used everywhere, except for
>>> paired case). I will try to adjust my patch according to
>>> your feedback.
>>>
>>> One more thing: it seems like you assumed that issues
>>> 2:4 are all related to machine precision, which is not
>>> the case  only 2nd issue is. Just wanted to draw this
>>> to your attention, in case you might have some feedback
>>> and guidelines about issues 3 and 4 as well.
>>>
>>>
>>>
>>> On 20191207 21:59, Martin Maechler wrote:
>>>>>>>>> Karolis Koncevičius on Sat, 7 Dec 2019 20:55:36
>>>>>>>>> +0200 writes:
>>>>
>>>> > Hello, > Writing to share some things I've found
>>>> about wilcox.test() that seem a > a bit inconsistent.
>>>>
>>>> > 1. Inf values are not removed if paired=TRUE
>>>>
>>>> > # returns different results (Inf is removed): >
>>>> wilcox.test(c(1,2,3,4), c(0,9,8,7)) >
>>>> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
>>>>
>>>> > # returns the same result (Inf is left as value with
>>>> highest rank): > wilcox.test(c(1,2,3,4), c(0,9,8,7),
>>>> paired=TRUE) > wilcox.test(c(1,2,3,4), c(0,9,8,Inf),
>>>> paired=TRUE)
>>>>
>>>> Now which of the two cases do you consider correct ?
>>>>
>>>> IHMO, the 2nd one is correct: it is exactly one
>>>> property of the *robustness* of the wilcoxon test and
>>>> very desirable that any (positive) outlier is treated
>>>> the same as just "the largest value" and the test
>>>> statistic (and hence the pvalue) should not change.
>>>>
>>>> So I think the first case is wrong, notably if
>>>> modified, (such that the last y is the overall maximal
>>>> value (slightly larger sample):
>>>>
>>>>> wilcox.test(1:7, 1/8+ c(9:4, 12))
>>>>
>>>> Wilcoxon rank sum test
>>>>
>>>> data: 1:7 and 1/8 + c(9:4, 12) W = 6, pvalue = 0.01748
>>>> alternative hypothesis: true location shift is not
>>>> equal to 0
>>>>
>>>>> wilcox.test(1:7, 1/8+ c(9:4, 10000))
>>>>
>>>> Wilcoxon rank sum test
>>>>
>>>> data: 1:7 and 1/8 + c(9:4, 10000) W = 6, pvalue =
>>>> 0.01748 alternative hypothesis: true location shift is
>>>> not equal to 0
>>>>
>>>>> wilcox.test(1:7, 1/8+ c(9:4, Inf))
>>>>
>>>> Wilcoxon rank sum test
>>>>
>>>> data: 1:7 and 1/8 + c(9:4, Inf) W = 6, pvalue =
>>>> 0.03497 alternative hypothesis: true location shift is
>>>> not equal to 0
>>>>
>>>> The Inf case should definitely give the same as the
>>>> 10'000 case. That's exactly one property of a robust
>>>> statistic.
>>>>
>>>> Thank you, Karolis, this is pretty embarrassing to only
>>>> be detected now after 25+ years of R in use ...
>>>>
>>>> The correct fix starts with replacing the is.finite()
>>>> by !is.na() and keep the 'Inf' in the rank
>>>> computations... (but then probably also deal with the
>>>> case of more than one Inf, notably the Inf  Inf
>>>> "exception" which is not triggered by your example...)
>>>>
>>>>
>>>> 
>>>>
>>>> Ben addressed the "rounding" / numerical issues
>>>> unavoidable for the other problems.
>>>>
>>>> > 2. tolerance issues with paired=TRUE.
>>>>
>>>> > wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE) > #
>>>> ... > # Warning: cannot compute exact pvalue with
>>>> ties
>>>>
>>>> > wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1),
>>>> paired=TRUE) > # ... > # no warning
>>>>
>>>> > 3. Always 'x observations are missing' when
>>>> paired=TRUE
>>>>
>>>> > wilcox.test(c(1,2), c(NA_integer_,NA_integer_),
>>>> paired=TRUE) > # ... > # Error: not enough (finite)
>>>> 'x' observations
>>>>
>>>> > 4. No indication if normal approximation was used:
>>>>
>>>> > # different numbers, but same "method" name >
>>>> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE) >
>>>> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
>>>>
>>>>
>>>> > From all of these I am pretty sure the 1st one is
>>>> likely unintended, > so attaching a small patch to
>>>> adjust it. Can also try patching others if > consensus
>>>> is reached that the behavioiur has to be modified.
>>>>
>>>> > Kind regards, > Karolis Koncevičius.
>>>>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel


I see I am too late to comment :)
But commenting after the fact, just wish to say that I like the changes.
Specially the mentioning of "exact" in the test name.
Floating point prevision is very nicely implemented too.
My only worry is that it will not serve new/lay users that may be in the
biggest need for protections like these.
Do you think it would make sense to do it a bit differently? i.e.
setting digits.rank=7 by default, and including a message in the warning
i.e. "ties present, if you are working with small digits consider
adjusting digits.rank".
But, on the other hand, I understand that this would be a breaking
change. A non breaking change might be to leave digits.rank as NA or
NULL by default, which would act as infinity but also would do a test
within wilcox.test() that checks for ties with digits.rank=7. Then a
warning will say "possibly missed ties due to machine precision, if you
are sure these are not ties  set digits.rank to Inf to get rid of this
warning". This would be a nonbreaking change, except for a warning.
Would be interesting to hear your thoughts about this.
I will pull your changes and try to play with the code a bit later
today. Thanks a lot for, Martin!
Also I have an unrelated question  I mainly find these discrepancies in
"stats" because I am working on my little package related to hypothesis
tests. And I have found a few more of them in other tests. One that I
reported long time ago, regarding flinger.test(), which is also related
to machine precision.
In terms of the etiquette of this list  should I mention them in this
same thread or is it better to create a new one?
Kind regards,
Karolis K.
On 20191214 22:50, Martin Maechler wrote:
>>>>>> Martin Maechler
>>>>>> on Thu, 12 Dec 2019 17:20:47 +0100 writes:
>
>>>>>> Karolis Koncevičius
>>>>>> on Mon, 9 Dec 2019 23:43:36 +0200 writes:
>
> >> So I tried adding Infinity support for all cases. And it
> >> is (as could be expected) more complicated than I
> >> thought.
>
> > "Of course !" Thank you, Karolis, in any case!
>
> >> It is easy to add Inf support for the test. The problems
> >> start with conf.int=TRUE.
>
> >> Currently confidence intervals are computed via
> >> `uniroot()` and, in the case of infinities, we are
> >> computationally looking for roots over infinite interval
> >> which results in an error. I suspect this is the reason
> >> Inf values were removed in the first place.
>
> > Maybe. It's still wrong to be done "up front". I'm sure
> > 98% (or so ;) of all calls to wilcox.test() do *not* use
> > conf.int = TRUE
>
>
> >> Just a note, I found a few more errors/inconsistencies
> >> when requesting confidence intervals with paired=TRUE
> >> (due to Infinities being left in).
>
> >> Current error in InfInf scenario:
>
> >> wilcox.test(c(1,2,Inf), c(4,8,Inf), paired=TRUE,
> >> conf.int=TRUE) Error in if (ZEROES) x < x[x != 0] :
> >> missing value where TRUE/FALSE needed
>
> > Good catch .. notably as it also happens when
> > conf.int=FALSE as by default. My version of wilcox.test()
> > now does give the same as when the to 'Inf' are left away.
>
> >> NaN confidence intervals:
>
> >> wilcox.test(c(1:9,Inf), c(21:28,Inf,30), paired=TRUE,
> >> conf.int=TRUE)
>
> >> Wilcoxon signed rank test with continuity correction
>
> >> data: c(1:9, Inf) and c(21:28, Inf, 30) V = 9.5, pvalue
> >> = 0.0586 alternative hypothesis: true location shift is
> >> not equal to 0 0 percent confidence interval: NaN NaN
> >> sample estimates: midrange NaN
>
> > I don't see a big problem here. The NaN's in some sense
> > show the best that can be computed with this data.
> > Statistic and Pvalue, but no conf.int.
>
>
> >> The easiest "fix" for consistency would be to simply
> >> remove Infinity support from the paired=TRUE case.
>
> > I strongly disagree. We are not pruning good
> > functionality just for some definition of consistency.
>
> >> But going with the more desirable approach of adding
> >> Infinity support for nonpaired cases  it is currently
> >> not clear to me what confidence intervals and
> >> pseudomedian should be. Specially when Infinities are on
> >> both sides.
>
> > I deem that not to be a big deal. They are not defined
> > given the default formulas and that is reflected by NA /
> > NaN in those parts of the result.
>
> >> Regards, Karolis Koncevičius.
>
> > But I have also spent a few hours now on
> > wilcox.test.default() behavior notably also looking at the
> > "rounding" / "machine precision" situation, and also on
> > your remark that the 'method: ...' does not indicate well
> > enough what was computed.
>
> > In my (not yet committed) but hereby proposed enhancement
> > of wilcox.test(), I have a new argument, 'digits.rank =
> > Inf' (the default 'Inf' corresponding to the current
> > behavior) with help page documentation:
>
> > digits.rank: a number; if finite, ‘rank(signif(r,
> > digits.rank))’ will be used to compute ranks for the test
> > statistic instead of (the default) ‘rank(r)’.
>
> > and then in 'Details :'
>
> > For stability reasons, it may be advisable to use
> > rounded data or to set ‘digits.rank = 7’, say, such that
> > determination of ties does not depend on very small
> > numeric differences (see the example).
>
> > and then in 'Examples: '
>
> > ## accuracy in ties determination via 'digits.rank':
> > wilcox.test( 4:2, 3:1, paired=TRUE) # Warning: cannot
> > compute exact pvalue with ties wilcox.test((4:2)/10,
> > (3:1)/10, paired=TRUE) # no ties => *no* warning
> > wilcox.test((4:2)/10, (3:1)/10, paired=TRUE, digits.rank =
> > 9) # same ties as (4:2, 3:1)
>
> > 
>
> > Lastly, I propose to replace "test" by "exact test" in the
> > 'method' component (and print out) in case exact
> > computations were used. This information should be part
> > of the returned "htest" object, and not only visible from
> > the arguments and warnings that are printed during the
> > computations. This last change is in some sense the "most
> > backincompatible" change of these, because many
> > wilcox.test() printouts would slightly change, e.g.,
>
> >> w0 < wilcox.test( 1:5, 4*(0:4), paired=TRUE)
>
> > Wilcoxon signed rank exact test
>
> > data: 1:5 and 4 * (0:4) V = 1, pvalue = 0.125
> > alternative hypothesis: true location shift is not equal
> > to 0
>
> > where before (in R <= 3.6.x) it is just
>
> > Wilcoxon signed rank test
>
> > data: ......... ............... ...............
>
> > but I think R 4.0.0 is a good occasion for such a change.
>
> > Constructive feedback on all this is very welcome! Martin
>
>... none ... I "assume" this means everybody likes the idea ;)
>
>Anyway, now comitted to Rdevel (for R 4.0.0), svn rev 77569
>(in 'NEW FEATURES').
>
>Martin
>
>
>
> >> On 20191207 23:18, Karolis Koncevičius wrote:
> >>> Thank you for a fast response. Nice to see this mailing
> >>> list being so alive.
> >>>
> >>> Regarding Inf issue: I agree with your assessment that
> >>> Inf should not be removed. The code gave me an
> >>> impression that Inf values were intentionally removed
> >>> (since is.finite() was used everywhere, except for
> >>> paired case). I will try to adjust my patch according to
> >>> your feedback.
> >>>
> >>> One more thing: it seems like you assumed that issues
> >>> 2:4 are all related to machine precision, which is not
> >>> the case  only 2nd issue is. Just wanted to draw this
> >>> to your attention, in case you might have some feedback
> >>> and guidelines about issues 3 and 4 as well.
> >>>
> >>>
> >>>
> >>> On 20191207 21:59, Martin Maechler wrote:
> >>>>>>>>> Karolis Koncevičius on Sat, 7 Dec 2019 20:55:36
> >>>>>>>>> +0200 writes:
> >>>>
> >>>> > Hello, > Writing to share some things I've found
> >>>> about wilcox.test() that seem a > a bit inconsistent.
> >>>>
> >>>> > 1. Inf values are not removed if paired=TRUE
> >>>>
> >>>> > # returns different results (Inf is removed): >
> >>>> wilcox.test(c(1,2,3,4), c(0,9,8,7)) >
> >>>> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
> >>>>
> >>>> > # returns the same result (Inf is left as value with
> >>>> highest rank): > wilcox.test(c(1,2,3,4), c(0,9,8,7),
> >>>> paired=TRUE) > wilcox.test(c(1,2,3,4), c(0,9,8,Inf),
> >>>> paired=TRUE)
> >>>>
> >>>> Now which of the two cases do you consider correct ?
> >>>>
> >>>> IHMO, the 2nd one is correct: it is exactly one
> >>>> property of the *robustness* of the wilcoxon test and
> >>>> very desirable that any (positive) outlier is treated
> >>>> the same as just "the largest value" and the test
> >>>> statistic (and hence the pvalue) should not change.
> >>>>
> >>>> So I think the first case is wrong, notably if
> >>>> modified, (such that the last y is the overall maximal
> >>>> value (slightly larger sample):
> >>>>
> >>>>> wilcox.test(1:7, 1/8+ c(9:4, 12))
> >>>>
> >>>> Wilcoxon rank sum test
> >>>>
> >>>> data: 1:7 and 1/8 + c(9:4, 12) W = 6, pvalue = 0.01748
> >>>> alternative hypothesis: true location shift is not
> >>>> equal to 0
> >>>>
> >>>>> wilcox.test(1:7, 1/8+ c(9:4, 10000))
> >>>>
> >>>> Wilcoxon rank sum test
> >>>>
> >>>> data: 1:7 and 1/8 + c(9:4, 10000) W = 6, pvalue =
> >>>> 0.01748 alternative hypothesis: true location shift is
> >>>> not equal to 0
> >>>>
> >>>>> wilcox.test(1:7, 1/8+ c(9:4, Inf))
> >>>>
> >>>> Wilcoxon rank sum test
> >>>>
> >>>> data: 1:7 and 1/8 + c(9:4, Inf) W = 6, pvalue =
> >>>> 0.03497 alternative hypothesis: true location shift is
> >>>> not equal to 0
> >>>>
> >>>> The Inf case should definitely give the same as the
> >>>> 10'000 case. That's exactly one property of a robust
> >>>> statistic.
> >>>>
> >>>> Thank you, Karolis, this is pretty embarrassing to only
> >>>> be detected now after 25+ years of R in use ...
> >>>>
> >>>> The correct fix starts with replacing the is.finite()
> >>>> by !is.na() and keep the 'Inf' in the rank
> >>>> computations... (but then probably also deal with the
> >>>> case of more than one Inf, notably the Inf  Inf
> >>>> "exception" which is not triggered by your example...)
> >>>>
> >>>>
> >>>> 
> >>>>
> >>>> Ben addressed the "rounding" / numerical issues
> >>>> unavoidable for the other problems.
> >>>>
> >>>> > 2. tolerance issues with paired=TRUE.
> >>>>
> >>>> > wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE) > #
> >>>> ... > # Warning: cannot compute exact pvalue with
> >>>> ties
> >>>>
> >>>> > wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1),
> >>>> paired=TRUE) > # ... > # no warning
> >>>>
> >>>> > 3. Always 'x observations are missing' when
> >>>> paired=TRUE
> >>>>
> >>>> > wilcox.test(c(1,2), c(NA_integer_,NA_integer_),
> >>>> paired=TRUE) > # ... > # Error: not enough (finite)
> >>>> 'x' observations
> >>>>
> >>>> > 4. No indication if normal approximation was used:
> >>>>
> >>>> > # different numbers, but same "method" name >
> >>>> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE) >
> >>>> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
> >>>>
> >>>>
> >>>> > From all of these I am pretty sure the 1st one is
> >>>> likely unintended, > so attaching a small patch to
> >>>> adjust it. Can also try patching others if > consensus
> >>>> is reached that the behavioiur has to be modified.
> >>>>
> >>>> > Kind regards, > Karolis Koncevičius.
> >>>>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rdevel

