Mann-Whitney U

Mann-Whitney U

Natalie O'Toole
Hi,

Could someone please tell me how to perform a Mann-Whitney U test on a
dataset with 2 groups where one group has more data values than the other?

I have split up my 2 groups into 2 columns in the .txt file I'm using with
R. Here is the code I have so far...

group1 <- c(LeafArea2)
group2 <- c(LeafArea1)
wilcox.test(group1, group2)

This code works for datasets with the same number of data values in each
column, but not when one column has a different number of data values
than the other.

Is the solution to put a null value in the data column with
the fewer data values?

I'm testing for significant differences between the 2 groups, and the
result I'm getting in R with the uneven values is different from what I'm
getting in SPSS.

Help please!

Nat


------------------------------------------------------------------------------------------------------------------------

This communication is intended for the use of the recipient to which it is
addressed, and may
contain confidential, personal, and or privileged information. Please
contact the sender
immediately if you are not the intended recipient of this communication,
and do not copy,
distribute, or take action relying on it. Any communication received in
error, or subsequent
reply, should be deleted or destroyed.
------------------------------------------------------------------------------------------------------------------------


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Mann-Whitney U

Marc Schwartz
On Tue, 2007-08-14 at 14:45 -0600, Natalie O'Toole wrote:

> [...]

You will need to provide any error messages that you are getting. There
is a two sample example in ?wilcox.test that shows that the function can
handle two vectors with differing sizes.

Having the output of str(group1) and str(group2) may also prove useful.
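
As a minimal sketch of the point above, assuming two small made-up samples
(the vectors a and b below are hypothetical, not the poster's data),
wilcox.test() accepts vectors of different lengths without any padding, and
str() shows what each vector actually holds:

```r
# Hypothetical data: two independent samples of unequal size
a <- c(1.2, 3.4, 2.2, 5.1, 4.8)   # 5 values
b <- c(0.9, 2.8, 3.9)             # 3 values

# Inspect the structure of each vector before testing
str(a)
str(b)

# Two-sample (unpaired) Wilcoxon rank sum / Mann-Whitney test
res <- wilcox.test(a, b)
w <- unname(res$statistic)
print(w)   # 10: the number of (a, b) pairs in which a is the larger value
```

No NA padding is needed to make the two vectors the same length.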

You may also wish to pay attention to the "Note" in ?wilcox.test which,
if you are getting differing results between SPSS and R, may provide
some insight into why, presuming that you can gain the same information
about SPSS.

HTH,

Marc Schwartz


Re: Mann-Whitney U

Prof Brian Ripley
In reply to this post by Natalie O'Toole
On Tue, 14 Aug 2007, Natalie O'Toole wrote:

> Hi,
>
> Could someone please tell me how to perform a Mann-Whitney U test on a
> dataset with 2 groups where one group has more data values than another?
>
> I have split up my 2 groups into 2 columns in the .txt file I'm using with
> R. Here is the code I have so far...
>
> group1 <- c(LeafArea2)
> group2 <- c(LeafArea1)
> wilcox.test(group1, group2)
>
> This code works for datasets with the same number of data values in each
> column, but not when there is a different number of data values in one
> column than another column of data.

There is an example of that scenario on the help page for wilcox.test, so
it does 'work'.  What exactly went wrong for you?

> Is the solution to put a null value in the data column with
> the fewer data values?
>
> I'm testing for significant differences between the 2 groups, and the
> result I'm getting in R with the uneven values is different from what I'm
> getting in SPSS.

We need a worked example.  As the help page says, definitions do differ.
If you can provide a reproducible example in R and the output from SPSS we
may be able to tell you how to relate that to what you see in R.

[...]

> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

As it says, we really need such code (and the output you get) to be able
to help you.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: Mann-Whitney U

Peter Dalgaard
Prof Brian Ripley wrote:

> [...]
>
> As it says, we really need such code (and the output you get) to be able
> to help you.
>
>  
Also, "two variables of different length in two columns" is not a good
idea. If you read in things in parallel columns, it would usually imply
paired data. If one column is shorter, you may be reading different data
than you think. Check e.g. the "sleep" data for a better format.
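
To illustrate the layout suggested above, here is a small sketch with
made-up numbers (the data frame leaf and its column names are assumptions
for illustration only). One column holds the measurements and a second
column records the group, as in the built-in sleep data, and the formula
interface of wilcox.test() then splits the values by group:

```r
# Built-in example of the "long" layout: one value column, one group column
str(sleep)

# Hypothetical leaf-area data in the same layout; groups need not be equal in size
leaf <- data.frame(
  area  = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.9, 2.8, 3.9),
  group = factor(rep(c("g1", "g2"), times = c(5, 3)))
)

# Formula interface: response ~ grouping factor
res_formula <- wilcox.test(area ~ group, data = leaf)

# The same test with the two groups passed as separate vectors
res_vectors <- wilcox.test(leaf$area[leaf$group == "g1"],
                           leaf$area[leaf$group == "g2"])

same_w <- unname(res_formula$statistic) == unname(res_vectors$statistic)
print(same_w)
```

With this layout a short group never leaves empty cells behind, so there is
no risk of reading padding as data.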


Re: Mann-Whitney U

JohnDee
In reply to this post by Natalie O'Toole
Natalie,

It's best to provide at least a sample of your data.  Your field names suggest
that your data might be collected in units of mm^2 or some similar
measurement of area.  Why do you want to use Mann-Whitney, which will rank
your data and then use those ranks rather than your actual data?  Unless your
sample is quite small, why not use a two sample t-test?  Also, are your
samples paired?  If they aren't, did you use the "paired = FALSE" option?

JWDougherty


Re: Mann-Whitney U

Natalie O'Toole
In reply to this post by Natalie O'Toole
Hi,

I do want to use the Mann-Whitney test which ranks my data and then uses
those ranks rather than the actual data.

Here is the R code I am using:

group1 <- c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,
            1.9,1.96,2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,
            3.07,3.3)
group2 <- c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,
            1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,
            2.77,2.9,2.94,NA)
result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95,
                      na.action)

paired = FALSE so that the Wilcoxon rank sum test, which is equivalent to
the Mann-Whitney test, is used (my samples are NOT paired).
conf.level = 0.95 to specify the confidence level.
na.action is used because I have an NA value (I suspect I am not using
na.action in the correct manner).

When I use this code I get the following error message:

Error in arg == choices : comparison (1) is possible only for atomic and
list types

When I use this code (with the same group1 and group2 as above):

result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95)

I get the following result:

  Wilcoxon rank sum test with continuity correction

data:  group1 and group2
W = 405.5, p-value = 0.6494
alternative hypothesis: true location shift is not equal to 0

Warning message:
cannot compute exact p-value with ties in: wilcox.test.default(group1,
group2, paired = FALSE, conf.level = 0.95)

The W value here is 405.5 with a p-value of 0.6494


In SPSS, I am ranking my data and then performing a Mann-Whitney U by
selecting Analyze - Non-parametric tests - 2 independent samples and then
checking off the Mann-Whitney U test.

For the Mann-Whitney test in SPSS I am getting the following results:

Mann-Whitney U = 350.5
2-tailed p value = 0.643

I think maybe the discrepancy has to do with the specification of the NA
values in R, but I'm not sure.


If anyone has any suggestions, please let me know!

I hope I have provided enough information to convey my problem.

Thank-you,

Nat

Re: Mann-Whitney U

Lucke, Joseph F
R and SPSS are using different but equivalent statistics.  R is using
the rank sum of group1 adjusted for the mean rank. SPSS is using the
rank sum of group2 adjusted for the mean rank.

Example.
> G1 <- group1
> G2 <- group2[!is.na(group2)]  # drop the NA
> n1 <- length(G1)  # n1 = 28
> n2 <- length(G2)  # n2 = 27
> # Rank the pooled data
> W <- rank(c(G1, G2))
> R1 <- W[seq_len(n1)]       # ranks belonging to group 1
> R2 <- W[n1 + seq_len(n2)]  # ranks belonging to group 2
> # Get the sum of the ranks for each group
> W1 <- sum(R1)
> W2 <- sum(R2)
> # Adjust for mean rank for group 1
> W1 - n1*(n1+1)/2
[1] 405.5
> # Adjust for mean rank for group 2
> W2 - n2*(n2+1)/2
[1] 350.5

W1-n1*(n1+1)/2 gives R's result; W2-n2*(n2+1)/2 gives SPSS's result.

Ties throw a wrench in the works.  R uses a continuity correction by
default, SPSS does not.
Taking out the continuity correction,
> wilcox.test(G1,G2,correct=FALSE)

        Wilcoxon rank sum test

data:  G1 and G2
W = 405.5, p-value = 0.6433
alternative hypothesis: true location shift is not equal to 0

Warning message:
cannot compute exact p-value with ties in: wilcox.test.default(G1, G2,
correct = FALSE)

This p-value is the same as SPSS's.
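
The relationship between the two statistics can be checked directly. The
sketch below re-enters the data from this thread and relies only on the
standard identity that the two rank-sum statistics add up to n1*n2; the
variable names are my own. Swapping the argument order changes W but, for
the two-sided test without the continuity correction, not the p-value:

```r
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73, 1.81,
            1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41, 2.41, 2.46, 2.5,
            2.6, 2.8, 2.8, 3.07, 3.3)
group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75, 1.8,
            1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5, 2.55, 2.57,
            2.64, 2.73, 2.77, 2.9, 2.94, NA)

n1 <- sum(!is.na(group1))  # 28
n2 <- sum(!is.na(group2))  # 27

# suppressWarnings() hides the "cannot compute exact p-value with ties" warning
w12 <- suppressWarnings(wilcox.test(group1, group2, correct = FALSE))
w21 <- suppressWarnings(wilcox.test(group2, group1, correct = FALSE))

u1 <- unname(w12$statistic)  # the value R reported in this thread
u2 <- unname(w21$statistic)  # the value SPSS reported in this thread

print(c(u1 = u1, u2 = u2, n1_times_n2 = n1 * n2))
print(all.equal(w12$p.value, w21$p.value))
```

Here 405.5 + 350.5 = 756 = 28 * 27, so either statistic determines the other.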


Consult a serious non-parametrics text.  I used
Lehmann, E. L., Nonparametrics: Statistical methods based on ranks.
1975. Holden-Day. San Francisco, CA.



Re: Mann-Whitney U

Peter Dalgaard
Lucke, Joseph F wrote:
> R and SPSS are using different but equivalent statistics.  R is using
> the rank sum of group1 adjusted for the mean rank. SPSS is using the
> rank sum of group2 adjusted for the mean rank.
>
>  
Close: It is the _minimum_ possible rank sum that is getting subtracted.
If everyone in group1 is less than everyone in group2, R's W statistic  
will be zero. Other way around in SPSS.
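
A small made-up example (the vectors lo and hi are hypothetical) shows the
point about the minimum rank sum: when every value in the first sample lies
below every value in the second, the first sample occupies ranks 1..n1, its
rank sum equals n1*(n1+1)/2, and W is zero. Reversing the arguments gives
the largest possible value, n1*n2:

```r
lo <- c(1, 2, 3)          # hypothetical: every value lies below those in 'hi'
hi <- c(10, 11, 12, 13)
n_lo <- length(lo)
n_hi <- length(hi)

# Rank the pooled data and sum the ranks of the first sample
r <- rank(c(lo, hi))
rank_sum_lo  <- sum(r[seq_len(n_lo)])
min_rank_sum <- n_lo * (n_lo + 1) / 2   # smallest rank sum 'lo' can have

w_lo_first <- unname(wilcox.test(lo, hi)$statistic)
w_hi_first <- unname(wilcox.test(hi, lo)$statistic)

print(c(rank_sum_lo = rank_sum_lo, min_rank_sum = min_rank_sum))
print(c(w_lo_first = w_lo_first, w_hi_first = w_hi_first))
```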


Re: Mann-Whitney U

Marc Schwartz
In reply to this post by Natalie O'Toole
On Wed, 2007-08-15 at 12:06 -0600, Natalie O'Toole wrote:

> Hi,
>
> I do want to use the Mann-Whitney test which ranks my data and then
> uses
> those ranks rather than the actual data.
>
> Here is the R code I am using:
>
> [...]
> result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95,
>                       na.action)

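You passed 'na.action' as a bare positional argument without giving it a
value, hence the error message you are getting: R matches it to the next
unfilled argument, 'alternative', which must be a character string.

Where it applies (the formula interface), na.action defaults to 'na.omit',
unless you have modified R's options. See ?na.action for more information.

With two plain vectors, as here, any NA values are removed from the two
vectors prior to calculating the statistic.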

The additional arguments are really superfluous here. You can simply
use:

  wilcox.test(group1, group2)
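
A short sketch of both points, using the data from this thread (object
names such as w_with_na are my own). When two plain vectors are passed, the
NA is dropped before ranking, so removing it by hand gives the same
statistic. As far as I can tell from ?wilcox.test, the na.action argument
is documented for the formula interface only; passed as a bare positional
argument it ends up matched to a different formal argument and the call
fails. The exact wording of the error may differ between R versions:

```r
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73, 1.81,
            1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41, 2.41, 2.46, 2.5,
            2.6, 2.8, 2.8, 3.07, 3.3)
group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75, 1.8,
            1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5, 2.55, 2.57,
            2.64, 2.73, 2.77, 2.9, 2.94, NA)

# The NA in group2 is removed internally; no extra argument is required
w_with_na    <- suppressWarnings(wilcox.test(group1, group2))
w_without_na <- suppressWarnings(wilcox.test(group1, group2[!is.na(group2)]))
print(c(unname(w_with_na$statistic), unname(w_without_na$statistic)))

# Passing the function 'na.action' positionally makes the call fail
bad_call <- try(
  wilcox.test(group1, group2, paired = FALSE, conf.level = 0.95, na.action),
  silent = TRUE
)
print(inherits(bad_call, "try-error"))
```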

> [...]
>
> For the Mann-Whitney test in SPSS I am getting the following results:
>
> Mann-Whitney U = 350.5
> 2-tailed p value = 0.643
>
> [...]

It would appear that SPSS is reversing the two groups in its
calculation and NOT using a correction by default.

If you review the internal code for wilcox.test(), by using:

  stats:::wilcox.test.default

you can see that the relevant code in this case is:

  r <- rank(c(x - mu, y))
  n.x <- as.double(length(x))
  n.y <- as.double(length(y))

  STATISTIC <- sum(r[seq_along(x)]) - n.x * (n.x + 1)/2


Thus, if we use 'x' and 'y' for your two groups, respectively, we get:

x <- c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,
       1.9,1.96, 2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,
       3.07,3.3)

y <- c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,
       1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,
       2.77,2.9,2.94,NA)

mu <- 0


# Now remove the NA values
x <- na.omit(x)
y <- na.omit(y)


r <- rank(c(x - mu, y))
n.x <- as.double(length(x))
n.y <- as.double(length(y))

> r
 [1]  5.0  8.0  9.0 10.5 13.0 14.0 16.0 16.0 16.0 19.0 22.0 24.0 26.5
[14] 28.0 30.5 30.5 35.0 36.5 36.5 38.5 38.5 40.0 42.5 46.0 50.5 50.5
[27] 54.0 55.0  1.0  2.0  3.0  4.0  6.0  7.0 10.5 12.0 18.0 20.0 21.0
[40] 23.0 25.0 26.5 29.0 32.0 33.0 34.0 41.0 42.5 44.0 45.0 47.0 48.0
[53] 49.0 52.0 53.0

> n.x
[1] 28

> n.y
[1] 27


STATISTIC <- sum(r[seq_along(x)]) - n.x * (n.x + 1)/2

> STATISTIC
[1] 405.5

This is the value you get with R as you have used it.
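
As a cross-check on the rank-based formula, the same number can be obtained
from the textbook pairwise definition of the Mann-Whitney U statistic: count
the (x, y) pairs with x > y and add half a point for each tied pair. This is
a sketch using the data above; u_pairs and u_ranks are my own names and not
part of wilcox.test():

```r
x <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73, 1.81, 1.84,
       1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41, 2.41, 2.46, 2.5, 2.6, 2.8,
       2.8, 3.07, 3.3)
y <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75, 1.8, 1.82,
       1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5, 2.55, 2.57, 2.64, 2.73,
       2.77, 2.9, 2.94)   # NA already removed

# Pairwise definition: compare every x with every y
n_greater <- sum(outer(x, y, ">"))    # pairs where x exceeds y
n_tied    <- sum(outer(x, y, "=="))   # pairs where x equals y
u_pairs   <- n_greater + 0.5 * n_tied

# Rank-based formula, as in the code quoted from wilcox.test.default
r <- rank(c(x, y))
u_ranks <- sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2

print(c(u_pairs = u_pairs, u_ranks = u_ranks))
```

Swapping x and y in the outer() calls gives the complementary count, which
is the value reported by SPSS in this thread.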


Now, to replicate the statistic in SPSS, use the following code, with x
and y interchanged:

r <- rank(c(y - mu, x))
n.x <- as.double(length(x))
n.y <- as.double(length(y))

STATISTIC <- sum(r[seq_along(y)]) - n.y * (n.y + 1)/2

So we get:

> r
 [1]  1.0  2.0  3.0  4.0  6.0  7.0 10.5 12.0 18.0 20.0 21.0 23.0 25.0
[14] 26.5 29.0 32.0 33.0 34.0 41.0 42.5 44.0 45.0 47.0 48.0 49.0 52.0
[27] 53.0  5.0  8.0  9.0 10.5 13.0 14.0 16.0 16.0 16.0 19.0 22.0 24.0
[40] 26.5 28.0 30.5 30.5 35.0 36.5 36.5 38.5 38.5 40.0 42.5 46.0 50.5
[53] 50.5 54.0 55.0

> n.x
[1] 28

> n.y
[1] 27

> STATISTIC
[1] 350.5


So we now match SPSS' calculation of the statistic.


Now, to complete the process and replicate the SPSS results fully, you
could do the following, by reversing the order of your arguments and
setting 'correct = FALSE'. I am using 'x' and 'y' here, but use 'group1'
and 'group2' on your system:

> wilcox.test(y, x, correct = FALSE)

        Wilcoxon rank sum test

data:  y and x
W = 350.5, p-value = 0.6433
alternative hypothesis: true location shift is not equal to 0

Warning message:
cannot compute exact p-value with ties in: wilcox.test.default(y, x,
correct = FALSE)


BTW, I located a ftp site with SPSS' algorithm documentation online at:

  ftp://ftp.spss.com/pub/spss/statistics/spss/algorithms/

For the MW test, the relevant document is npart.pdf.

HTH,

Marc Schwartz
