Weird differing results when using the Wilcoxon-test


Weird differing results when using the Wilcoxon-test

Cedric Laczny
Hi,

I became a little confused when working with the Wilcoxon test in R.
As far as I understand, there are mainly two versions:
1) wilcox.test{stats}, which is the default and uses an approximation,
especially when ties are involved
2) wilcox_test{coin}, which calculates the distribution _exactly_, even
with ties.

I have the following scenario:

#---BeginCode---
# big example
size = 60
big1 = rnorm(size, 0, 1)
big2 = rnorm(size, 0.5, 1)

g1f = rep(1, size)
g2f = rep(2, size)
big = c(big1, big2)
data_frame = data.frame(big, gr=as.factor(c(g1f, g2f)))

wilcox_approx = wilcox.test(big1, big2)
wilcox_exact = wilcox_test(big ~ gr, data=data_frame, distribution="exact")
#---EndCode---

I found here http://www-stat.stanford.edu/~susan/courses/s141/hononpara.pdf 
that wilcox.test (at least for the signed-rank test) relies on exact
p-values up to N = n1 + n2 = 50.
I can reproduce this when using e.g. size = 15: the p-values are then the
same, as I would expect after reading the information in the link.

#---BeginCode---
print(paste("Wilcox approx p-value:", wilcox_approx$p.value), quote=F)
print(paste("Wilcox exact p-value:", pvalue(wilcox_exact)), quote=F)
#---EndCode---

Accordingly, if I set e.g. size = 60, the p-values of wilcox.test and
wilcox_test differ, as expected.
What particularly puzzles me are the differing results when I try to
calculate the p-value manually for bigger sample sizes.

So, if we get the W-score from wilcox.test:

#---BeginCode---
W_big = wilcox.test(big1, big2)$statistic
#---EndCode---

and "convert" it to a Z-score, like this:

#---BeginCode---
mu_big = (size^2)/2  # mean of W (the Mann-Whitney statistic) under H0
sd_big = sqrt(size*size*(size + size + 1)/12)  # sd of W, no ties
N = size + size
sd_big_corr = sqrt( (size * size) / (N * (N - 1)) * (N^3 - N) / 12 )

Z_big = (W_big - mu_big)/sd_big
#---EndCode---

The Z-score (Z_big) is equal to the statistic of wilcox_test.
So far so good. Now comes the main problem.
If I follow the documentation correctly, the p-value for a given
W-statistic is calculated using the normal approximation with the Z-score.
However, when I do that, I get a different result than I would expect:
I would expect the p-value of wilcox.test to be equal to
2*pnorm(Z_big), but in fact it is _not_. Please see:

#---BeginCode---
p_value_manual = 2 * pnorm(Z_big)  # assumes Z_big < 0; in general use 2*pnorm(-abs(Z_big))

print("--- Resulting pvalues --- ", quote=F)
print(paste("Wilcox approx p-value:", wilcox_approx$p.value), quote=F)
print(paste("Wilcox exact p-value:", pvalue(wilcox_exact)), quote=F)
print(paste("P-value manual:", p_value_manual), quote=F)
#---EndCode---

So how does wilcox.test calculate the p-value when the sample sizes are
big? This might explain why its value differs from the one calculated
manually.

Best regards,

Cedric


Re: Weird differing results when using the Wilcoxon-test

Thomas Lumley


After fixing the parentheses in your code so it does run, it seems that the difference is that wilcox.test defaults to using a continuity correction and your manual calculation does not.   Use wilcox.test(big1, big2, correct=FALSE).
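
For reference, a minimal sketch of the manual calculation with the default
continuity correction included (a sketch only; W_big, mu_big and sd_big as
in the original post, and assuming no ties):

#---BeginCode---
# Redo the normal approximation with the continuity correction that
# wilcox.test applies by default in the two-sided case.
z_plain = (W_big - mu_big)/sd_big
z_corr  = (W_big - mu_big - sign(W_big - mu_big)*0.5)/sd_big

2*pnorm(-abs(z_plain))  # should match wilcox.test(big1, big2, correct=FALSE)$p.value
2*pnorm(-abs(z_corr))   # should match wilcox.test(big1, big2)$p.value
#---EndCode---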

    -thomas

On Tue, 17 Aug 2010, Cedric Laczny wrote:

> [original message quoted in full; trimmed]

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle


Re: Weird differing results when using the Wilcoxon-test

Cedric Laczny
Thanks for the hint.
I tested this in the generic example and saw the same behavior also in a
specific example, which can be found below. This example disables the
continuity correction but exhibits the same "unexpected" behavior:

GDS_example = function()
{
        # requires library(coin) for wilcox_test(), pvalue() and statistic()
        print("---> GDS EXAMPLE <---", quote=F)
        # For GDS1331!!!
        exp_all_200601_at = c(281.600, 209.300, 202.600, 263.700, 242.500,
                281.300, 329.100, 268.200, 311.700, 230.600, 280.600, 233.900,
                270.200, 274.000, 169.000, 154.500, 160.100, 196.900, 169.500,
                102.800, 148.600, 111.700, 117.000, 173.000, 79.100, 175.700,
                149.500, 151.600, 78.700, 113.900, 196.200)
        c1 = c(1:14)
        c2 = c(20:31)
        group1 = exp_all_200601_at[c1]
        group2 = exp_all_200601_at[c2]

        wilcox_approx = wilcox.test(group1, group2, correct=F)
        # Deviation estimate -> N.B. the TRUE deviation value can NOT be
        # known, as we would have to sample _ALL_ possible "tissue" etc.
        #wilcox.test(group1, group2, conf.int=T)$estimate

        g1f = rep(1, length(c1))
        g2f = rep(2, length(c2))
        gf = as.factor(c(g1f, g2f))

        exp_groups_200601_at = exp_all_200601_at[c(c1, c2)]
        exp_groups_data_frame = data.frame(exp_groups_200601_at, gf)
        wilcox_exact = wilcox_test(exp_groups_200601_at ~ gf,
                data = exp_groups_data_frame, conf.int=T,
                distribution="exact")

        print(wilcox_approx, quote=F)
        print(wilcox_exact)
        print(paste("P-value manually:",
                2*(1 - pnorm(statistic(wilcox_exact)))), quote=F)
}
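
To try it, assuming the coin package is installed:

#---BeginCode---
library(coin)  # provides wilcox_test(), pvalue() and statistic()
GDS_example()
#---EndCode---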

Best,

Cedric


On Tuesday, 17. August 2010 17:50:48 Thomas Lumley wrote:

> After fixing the parentheses in your code so it does run, it seems that the
> difference is that wilcox.test defaults to using a continuity correction
> and your manual calculation does not.   Use wilcox.test(big1, big2,
> correct=FALSE).
>
>     -thomas
>
> On Tue, 17 Aug 2010, Cedric Laczny wrote:
> > [original message quoted in full; trimmed]


Re: Weird differing results when using the Wilcoxon-test

Cedric Laczny
I was able to trace the unexpected behavior down to the following line in
the source of wilcox.test:

SIGMA <- sqrt((n.x * n.y/12) * ((n.x + n.y + 1) -
                sum(NTIES^3 - NTIES)/((n.x + n.y) * (n.x + n.y -
                  1))))

My calculation of the Z-score for the normal approximation was based on
the standard deviation of the ranks _without_ ties. The above formula
accounts for ties and thus yields a slightly different z-score. However,
the data seem to include at most 1 tie (being based on rnorm), so the
result should be the same as if there were no tie (1^3 - 1 gives the same
result as 0^3 - 0, obviously ;) ), and thus I would expect the same result
as with the formula for the standard deviation without ties.
Interestingly, this z-score also differs from the one reported by
wilcox_test (exact calculation of the p-value).
I have not been able to find this formula in any textbook nearby or on any
website, so I am wondering where it actually comes from.
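
For reference, this is how that line evaluates on the "big" example from
the first post (a sketch; r, NTIES, n.x and n.y named as in the
wilcox.test source):

#---BeginCode---
r = rank(c(big1, big2))  # pooled ranks
NTIES = table(r)         # multiplicity of each distinct rank value
n.x = length(big1)
n.y = length(big2)
SIGMA = sqrt((n.x * n.y/12) * ((n.x + n.y + 1) -
                sum(NTIES^3 - NTIES)/((n.x + n.y) * (n.x + n.y - 1))))
SIGMA  # reduces to sd_big when there are no ties (sum(NTIES^3 - NTIES) == 0)
#---EndCode---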

Best,

Cedric

On Tuesday, 17. August 2010 13:48:10 Cedric Laczny wrote:

> [original message quoted in full; trimmed]


Re: [solved] Weird differing results when using the Wilcoxon-test

Cedric Laczny
Using the help page for wilcox.test and looking into the book mentioned
there (Hollander and Wolfe), I was able to find the formula in the book.
For people who might run into the same problem: the formula can also be
found here, at least if you have access via your university or
institution: www.springerlink.com/index/a757704707951858.pdf

For those of you who might have stumbled upon
http://reiter1.com/Glossar/Mann-Whitney%20Test.htm :
if I haven't made any mistakes in the calculations, that formula is
equivalent to the one in R (Hollander and Wolfe). In particular, if there
are no ties, the formula reduces to the variance (sigma squared) of the
ranks _without_ ties.

Also please note, as mentioned in an earlier response in this thread, that
the results will differ if the continuity correction is enabled.

When doing your own calculations or comparing results to the ones you
obtain in R, pay attention to which distribution was used to calculate the
p-value: in R, if n1 or n2 (the respective group sizes) is >= 50, a normal
approximation via pnorm(z) is used, whereas for smaller group sizes
pwilcox(w) is used, which is _exact_.
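
A minimal sketch of that decision logic, simplified from the wilcox.test
source (two-sample case, two-sided, no ties, no continuity correction; the
function name manual_p is made up here):

#---BeginCode---
manual_p = function(W, n1, n2) {
    if (n1 < 50 && n2 < 50) {
        # small samples: exact distribution of the Mann-Whitney statistic
        min(2 * min(pwilcox(W, n1, n2),
                    pwilcox(W - 1, n1, n2, lower.tail=FALSE)), 1)
    } else {
        # large samples: normal approximation
        z = (W - n1*n2/2) / sqrt(n1*n2*(n1 + n2 + 1)/12)
        2 * pnorm(-abs(z))
    }
}
#---EndCode---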

Thank you all for your help!

Best,

Cedric

On Wednesday, 18. August 2010 11:55:52 Cedric Laczny wrote:

> [previous messages quoted in full; trimmed]



Re: Weird differing results when using the Wilcoxon-test

Peter Dalgaard-2

On Aug 18, 2010, at 11:55 AM, Cedric Laczny wrote:

> I was able to trace the unexpected behavior down to the following line
> in the source of wilcox.test:
> SIGMA <- sqrt((n.x * n.y/12) * ((n.x + n.y + 1) -
>                sum(NTIES^3 - NTIES)/((n.x + n.y) * (n.x + n.y -
>                  1))))
> My calculation of the Z-score for the normal approximation was based on
> the standard deviation of the ranks _without_ ties. The above formula
> accounts for ties and thus yields a slightly different z-score. However,
> the data seem to include at most 1 tie (being based on rnorm), so the
> result should be the same as if there were no tie (1^3 - 1 gives the
> same result as 0^3 - 0, obviously ;) ), and thus I would expect the same
> result as with the formula for the standard deviation without ties.

Note the definition of NTIES <- table(r), which counts the number of
observations tied for a particular rank, so it is all ones if and only if
there are NO ties in the data.

(If you are in paper-and-pencil mode, these formulas are fairly easily worked out once you realize that you only need the mean and variance of the rank of a single observation -- the covariances are C(R1,R2) = -1/(N-1) V(R1) because of symmetry and the fact that the sum of all N ranks is fixed.)
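
A quick numerical illustration of that remark (a sketch; N chosen
arbitrarily):

#---BeginCode---
N = 10
r = 1:N
mean(r)                       # E(R1) = (N + 1)/2
v1 = sum((r - mean(r))^2)/N   # V(R1) = (N^2 - 1)/12
-v1/(N - 1)                   # C(R1, R2), since the N ranks sum to a constant
# For group sizes n.x + n.y = N, the rank sum then has variance
# n.x*v1 + n.x*(n.x - 1)*(-v1/(N - 1)) = n.x*n.y*(n.x + n.y + 1)/12
#---EndCode---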

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]
