
Wilcoxon-Mann-Whitney U value: outcomes from different stat packages


maxbre
Given this example

#start code

a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
       760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)

b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
       3220,490,20790,290,740,5350,940,3910,0,640,850,260)

wilcox.test(a, b, paired=FALSE)

# sum of ranks for first sample
sum.rank.a <- sum(rank(c(a,b))[1:29]) # sum of ranks assigned to group a
W1 <- sum.rank.a - (length(a)*(length(a)+1)) / 2
W1

# difference between the null expectation m*n/2 and W1
U1 <- length(a)*length(b)/2 - W1
U1

# sum of ranks for second sample
sum.rank.b <- sum(rank(c(a,b))[30:58]) # sum of ranks assigned to group b
W2 <- sum.rank.b - (length(b)*(length(b)+1)) / 2
W2

# difference between the null expectation m*n/2 and W2
U2 <- length(a)*length(b)/2 - W2
U2

#end code

And given the fact that:

- the note in R's wilcox.test help page clearly states: “The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not. R subtracts [….], giving a value which is larger by m(m+1)/2 for a first sample of size m”

- as a result of the same test performed with different stat packages (i.e. STATISTICA and PAST), I got a U value of 200.5, as in W2 (see my script), with the same p-value

What can I conclude about the STATISTICA and PAST packages? Are they giving W2 (see my script) instead of U?

A crucial point is that the variant of the algorithm a package uses is very rarely indicated in the output or documented in the help facility and the manuals.
See also this link (which I found after a long meander around the web) comparing Wilcoxon-Mann-Whitney U test outcomes from different stat packages:
http://www.jstor.org/discover/10.2307/2685616?uid=3738296&uid=2129&uid=2&uid=70&uid=4&sid=47699045750617 
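Since the output rarely says which variant is computed, one pragmatic check is to compute every common variant in R and see which one a package's number matches. A minimal sketch in base R; `wmw_variants` is a hypothetical helper written for this illustration, not part of any package:

```r
# Hypothetical helper: the usual variants of the rank-sum / U statistic
wmw_variants <- function(x, y) {
  m <- length(x)
  n <- length(y)
  r <- rank(c(x, y))                # pooled mid-ranks (ties get averaged ranks)
  R1 <- sum(r[seq_len(m)])          # unsubtracted rank sum of x
  R2 <- sum(r[m + seq_len(n)])      # unsubtracted rank sum of y
  U1 <- R1 - m * (m + 1) / 2        # U based on x (what R reports as W)
  U2 <- R2 - n * (n + 1) / 2        # U based on y
  c(R1 = R1, R2 = R2, U1 = U1, U2 = U2, U.min = min(U1, U2))
}

# Data from the post
a <- c(0, 70, 50, 100, 70, 650, 1300, 6900, 1780, 4930, 1120, 700, 190, 940,
       760, 100, 300, 36270, 5610, 249680, 1760, 4040, 164890, 17230, 75140,
       1870, 22380, 5890, 2430)
b <- c(0, 0, 10, 30, 50, 440, 1000, 140, 70, 90, 60, 60, 20, 90, 180, 30, 90,
       3220, 490, 20790, 290, 740, 5350, 940, 3910, 0, 640, 850, 260)

v <- wmw_variants(a, b)
v
```

Per the post, STATISTICA and PAST reported 200.5 on this data, which is the value the script calls W2, i.e. the `U2` entry here.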

Have any of you faced the same kind of issue? Or am I completely wrong?

maxbre

Re: Wilcoxon-Mann-Whitney U value: outcomes from different stat packages

Peter Dalgaard-2

On May 29, 2012, at 17:55 , maxbre wrote:

> Given this example
>
> #start code
>
> a<-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
>      760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
>
> b<-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
>     3220,490,20790,290,740,5350,940,3910,0,640,850,260)
>
> wilcox.test(a, b, paired=FALSE)
>
> #sum of ranks for first sample
> sum.rank.a <- sum(rank(c(a,b))[1:29]) #sum of ranks assigned to the group a
> W1<- sum.rank.a - (length(a)*(length(a)+1)) / 2
> W1
>
> U1 <- length(a)*length(b)/2-W1
> U1
>
> #sum of ranks for second sample
> sum.rank.b <-sum(rank(c(a,b))[30:58]) #sum of ranks assigned to the group b
> W2 <- sum.rank.b - (length(b)*(length(b)+1)) / 2
> W2
>
> U2 <- length(a)*length(b)/2-W2
> U2
>
> #end code
>
> And given the fact that:
>
> - the note in R's wilcox.test help page clearly states: “The literature is
> not unanimous about the definitions of the Wilcoxon rank sum and
> Mann-Whitney tests. The two most common definitions correspond to the sum
> of the ranks of the first sample with the minimum value subtracted or not.
> R subtracts [….], giving a value which is larger by m(m+1)/2 for a first
> sample of size m”

NB: You are quoting like the Devil reads the Bible: The bit in [...] is "and S-PLUS does not". So R's value is _smaller_ by m(m+1)/2.
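To make the correction concrete: a minimal base-R sketch on the data from the post, comparing the unsubtracted rank sum of the first sample (the convention Peter attributes to S-PLUS) with R's W. The variable names are illustrative only.

```r
# Data from the post
a <- c(0, 70, 50, 100, 70, 650, 1300, 6900, 1780, 4930, 1120, 700, 190, 940,
       760, 100, 300, 36270, 5610, 249680, 1760, 4040, 164890, 17230, 75140,
       1870, 22380, 5890, 2430)
b <- c(0, 0, 10, 30, 50, 440, 1000, 140, 70, 90, 60, 60, 20, 90, 180, 30, 90,
       3220, 490, 20790, 290, 740, 5350, 940, 3910, 0, 640, 850, 260)

m <- length(a)
r <- rank(c(a, b))                   # pooled mid-ranks (ties averaged)
rank.sum.a <- sum(r[seq_len(m)])     # rank sum of the first sample, nothing subtracted
W.R <- rank.sum.a - m * (m + 1) / 2  # R's W: minimum possible rank sum subtracted

# R's reported statistic; ties rule out the exact test, so ask for the normal approximation
W.test <- unname(wilcox.test(a, b, exact = FALSE)$statistic)

# R's value is smaller than the plain rank sum by m(m+1)/2 (435 for m = 29)
c(rank.sum = rank.sum.a, W = W.R, difference = rank.sum.a - W.R)
```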

>
> - as a result of the same test performed with different stat packages
> (i.e. STATISTICA and PAST), I got a U value of 200.5, as in W2 (see my
> script), with the same p-value
>
> What can I conclude about the STATISTICA and PAST packages? Are they
> giving W2 (see my script) instead of U?

Most likely. Or, equivalently, they are basing U on the 2nd group instead of the first. This varies between software, as do the conventions for which way you subtract in a two-sample t test. Some textbooks say that you use the _smallest_ group, and tabulate critical regions only for those cases, to save paper.
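A minimal base-R sketch of the group-order point, on the same data: swapping the arguments to `wilcox.test` turns R's W for the first group into the value for the second group (the script's W2), while the two-sided p-value is unchanged. The last lines show the analogous sign flip for `t.test`; taking the smaller of the two statistics is shown only as one common textbook convention.

```r
# Data from the post
a <- c(0, 70, 50, 100, 70, 650, 1300, 6900, 1780, 4930, 1120, 700, 190, 940,
       760, 100, 300, 36270, 5610, 249680, 1760, 4040, 164890, 17230, 75140,
       1870, 22380, 5890, 2430)
b <- c(0, 0, 10, 30, 50, 440, 1000, 140, 70, 90, 60, 60, 20, 90, 180, 30, 90,
       3220, 490, 20790, 290, 740, 5350, 940, 3910, 0, 640, 850, 260)

ta <- wilcox.test(a, b, exact = FALSE)  # statistic based on the first group, a
tb <- wilcox.test(b, a, exact = FALSE)  # same test with the groups swapped

W1 <- unname(ta$statistic)
W2 <- unname(tb$statistic)

# The two statistics are complementary: W1 + W2 = m * n
W1 + W2 == length(a) * length(b)
# ... and the two-sided p-values agree
all.equal(ta$p.value, tb$p.value)
# One common textbook convention reports the smaller of the two
min(W1, W2)

# Two-sample t test: swapping the groups only flips the sign of t
t.test(a, b)$statistic
t.test(b, a)$statistic
```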


>
> A crucial point is that the variant of the algorithm a package uses is
> very rarely indicated in the output or documented in the help facility
> and the manuals.
> See also this link (which I found after a long meander around the web)
> comparing Wilcoxon-Mann-Whitney U test outcomes from different stat
> packages:
> http://www.jstor.org/discover/10.2307/2685616?uid=3738296&uid=2129&uid=2&uid=70&uid=4&sid=47699045750617 
>
> Have any of you faced the same kind of issue? Or am I completely wrong?
>
> maxbre
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Wilcoxon-Mann-Whitney-U-value-outcomes-from-different-stat-packages-tp4631703.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]


Re: Wilcoxon-Mann-Whitney U value: outcomes from different stat packages

ted.harding-3
In reply to this post by Peter Dalgaard-2
On 30-May-2012 07:33:12 peter dalgaard wrote:

>
> On May 29, 2012, at 17:55 , maxbre wrote:
>> Given this example
>> [snip]
>> And given the fact that:
>>
>> - in the note of R Wilcox.test is clearly stated: "The literature
>> is not unanimous about the definitions of the Wilcoxon rank sum
>> and Mann-Whitney tests. The two most common definitions correspond
>> to the sum of the ranks of the first sample with the minimum value
>> subtracted or not. R subtracts [...], giving a value which is larger
>> by m(m+1)/2 for a first sample of size m"
>>
>
> NB: You are quoting like the Devil reads the Bible: The bit in [...]
> is "and S-PLUS does not". So R's value is _smaller_ by m(m+1)/2.
> [snip]

Since Peter would seem to be unique in the fortune of being able
to observe the Devil reading the Bible, I propose that this be
added to the Fortunes of us all.

Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <[hidden email]>
Date: 01-Jun-2012  Time: 00:05:40
This message was sent by XFMail
