Hi,
Could someone please tell me how to perform a Mann-Whitney U test on a dataset with 2 groups where one group has more data values than the other?

I have split my 2 groups into 2 columns in the .txt file I'm using with R. Here is the code I have so far:

    group1 <- c(LeafArea2)
    group2 <- c(LeafArea1)
    wilcox.test(group1, group2)

This code works for datasets with the same number of data values in each column, but not when one column has a different number of data values than the other.

Is the solution to put a null value in the data column with fewer data values?

I'm testing for significant differences between the 2 groups, and the result I'm getting in R with the uneven values is different from what I'm getting in SPSS.

Help please!

Nat

------------------------------------------------------------------------
This communication is intended for the use of the recipient to which it is addressed, and may contain confidential, personal, and or privileged information. Please contact the sender immediately if you are not the intended recipient of this communication, and do not copy, distribute, or take action relying on it. Any communication received in error, or subsequent reply, should be deleted or destroyed.
[[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
On Tue, 2007-08-14 at 14:45 -0600, Natalie O'Toole wrote:
> [...]
> group1 <- c(LeafArea2)
> group2 <- c(LeafArea1)
> wilcox.test(group1, group2)
>
> This code works for datasets with the same number of data values in each
> column, but not when there is a different number of data values in one
> column than another column of data.
> [...]
> I'm testing for significant differences between the 2 groups, and the
> result I'm getting in R with the uneven values is different from what I'm
> getting in SPSS.

You will need to provide any error messages that you are getting. There is a two-sample example in ?wilcox.test that shows that the function can handle two vectors with differing sizes. Having the output of str(group1) and str(group2) may also prove useful.

You may also wish to pay attention to the "Note" in ?wilcox.test which, if you are getting differing results between SPSS and R, may provide some insight into why, presuming that you can gain the same information about SPSS.

HTH,

Marc Schwartz
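To make the point about unequal group sizes concrete, here is a minimal sketch with made-up numbers. The vectors `a` and `b` are purely illustrative assumptions, not the poster's leaf areas. It shows that wilcox.test() accepts two vectors of different lengths, and that str() is a quick way to check what each vector actually holds before testing.

```r
# Illustrative values only; not data from this thread.
a <- c(1.1, 2.3, 3.7, 4.2, 5.9, 6.4)   # 6 observations
b <- c(0.5, 2.9, 3.1)                  # 3 observations

# Inspect the structure of each vector before testing.
str(a)
str(b)

# Unpaired two-sample test; the lengths do not need to match,
# so no padding with NA is required.
res <- wilcox.test(a, b)
print(res)
```

If str() reports something other than a plain numeric vector of the expected length, the problem is more likely in how the file was read than in the test itself.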
In reply to this post by Natalie O'Toole
On Tue, 14 Aug 2007, Natalie O'Toole wrote:
> [...]
> This code works for datasets with the same number of data values in each
> column, but not when there is a different number of data values in one
> column than another column of data.

There is an example of that scenario on the help page for wilcox.test, so it does 'work'. What exactly went wrong for you?

> Is the solution that I have to have a null value in the data column with
> the fewer data values?
>
> I'm testing for significant differences between the 2 groups, and the
> result I'm getting in R with the uneven values is different from what I'm
> getting in SPSS.

We need a worked example. As the help page says, definitions do differ. If you can provide a reproducible example in R and the output from SPSS we may be able to tell you how to relate that to what you see in R.

[...]

> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

As it says, we really need such code (and the output you get) to be able to help you.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Prof Brian Ripley wrote:
> [...]
> As it says, we really need such code (and the output you get) to be able
> to help you.

idea. If you read in things in parallel columns, it would usually imply paired data. If one column is shorter, you may be reading different data than you think. Check e.g. the "sleep" data for a better format.
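The pointer to the built-in "sleep" data can be sketched as follows. That data set stores all measurements in one column and the group label in a second column, so unequal group sizes need no padding. The names `site1`, `site2`, `leaf`, `area` and `site` are hypothetical, and the values are illustrative rather than taken from this thread.

```r
# 'sleep' keeps the measurements in one column ('extra') and the
# group label in another ('group').
str(sleep)

# Hypothetical groups of unequal size (illustrative values).
site1 <- c(1.1, 2.3, 3.7, 4.2, 5.9, 6.4)
site2 <- c(0.5, 2.9, 3.1)

# Stack them into long format: one row per observation.
leaf <- data.frame(
  area = c(site1, site2),
  site = factor(rep(c("site1", "site2"),
                    times = c(length(site1), length(site2))))
)

# The formula interface splits 'area' by 'site' and runs the unpaired test.
res_long <- wilcox.test(area ~ site, data = leaf)
print(res_long)
```

A file laid out this way (one value column, one group column) can be read with read.table() without any blank cells, which avoids the risk of a short column being filled with unintended values.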
In reply to this post by Natalie O'Toole
Natalie,
It's best to provide at least a sample of your data. Your field names suggest that your data might be collected in units of mm^2 or some similar measurement of area. Why do you want to use Mann-Whitney, which will rank your data and then use those ranks rather than your actual data? Unless your sample is quite small, why not use a two-sample t-test? Also, are your samples paired? If they aren't, did you use the "paired = FALSE" option?

JWDougherty
In reply to this post by Natalie O'Toole
Hi,
I do want to use the Mann-Whitney test, which ranks my data and then uses those ranks rather than the actual data.

Here is the R code I am using:

    > group1 <- c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,
                  1.9,1.96,2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,
                  3.07,3.3)
    > group2 <- c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,
                  1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,
                  2.77,2.9,2.94,NA)
    > result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95, na.action)

paired = FALSE so that the Wilcoxon rank sum test, which is equivalent to the Mann-Whitney test, is used (my samples are NOT paired).
conf.level = 0.95 to specify the confidence level.
na.action is used because I have an NA value (I suspect I am not using na.action in the correct manner).

When I use this code I get the following error message:

    Error in arg == choices : comparison (1) is possible only for atomic and list types

When I use the same group1 and group2 with this call:

    > result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95)

I get the following result:

            Wilcoxon rank sum test with continuity correction

    data:  group1 and group2
    W = 405.5, p-value = 0.6494
    alternative hypothesis: true location shift is not equal to 0

    Warning message:
    cannot compute exact p-value with ties in: wilcox.test.default(group1, group2, paired = FALSE, conf.level = 0.95)

The W value here is 405.5 with a p-value of 0.6494.

In SPSS, I am ranking my data and then performing a Mann-Whitney U by selecting analyze - non-parametric tests - 2 independent samples and then checking off the Mann-Whitney U test.

For the Mann-Whitney test in SPSS I am getting the following results:

    Mann-Whitney U = 350.5
    2-tailed p value = 0.643

I think maybe the discrepancy has to do with the specification of the NA values in R, but I'm not sure.

If anyone has any suggestions, please let me know! I hope I have provided enough information to convey my problem.

Thank-you,

Nat
R and SPSS are using different but equivalent statistics. R is using the rank sum of group1 adjusted for the mean rank. SPSS is using the rank sum of group2 adjusted for the mean rank.

Example.

    > G1=group1
    > G2=group2[-length(group2)]   #get rid of the NA
    > n1=length(G1)                #n1=28
    > n2=length(G2)                #n2=27
    # convert to ranks
    > W=rank(c(G1,G2))
    > R1=W[1:n1]                   #put the ranks back into the groups
    > R2=W[n1+1:n2]
    #Get the sum of the ranks for each group
    > W1=sum(R1)
    > W2=sum(R2)
    #Adjust for mean rank for group 1
    > W1-n1*(n1+1)/2
    [1] 405.5
    #Adjust for mean rank for group 2
    > W2-n2*(n2+1)/2
    [1] 350.5

W1-n1*(n1+1)/2 gives R's result; W2-n2*(n2+1)/2 gives SPSS's result.

Ties throw a wrench in the works. R uses a continuity correction by default, SPSS does not. Taking out the continuity correction,

    > wilcox.test(G1,G2,correct=FALSE)

            Wilcoxon rank sum test

    data:  G1 and G2
    W = 405.5, p-value = 0.6433
    alternative hypothesis: true location shift is not equal to 0

    Warning message:
    cannot compute exact p-value with ties in: wilcox.test.default(G1, G2, correct = FALSE)

This p-value is the same as SPSS's.

Consult a serious non-parametrics text. I used
Lehmann, E. L., Nonparametrics: Statistical methods based on ranks. 1975. Holden-Day. San Francisco, CA.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Natalie O'Toole
Sent: Wednesday, August 15, 2007 1:07 PM
To: [hidden email]
Subject: Re: [R] Mann-Whitney U

[...]
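The statement that the two statistics are "different but equivalent" can be checked directly. The sketch below uses the data posted earlier in the thread (with the trailing NA left out) and shows that the statistic from either argument order adds up to n1 * n2, and that the two-sided p-value is the same whichever group is listed first. The object names `w12` and `w21` are my own.

```r
# Data as posted earlier in the thread; the trailing NA in group2 is omitted.
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73,
            1.81, 1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41,
            2.41, 2.46, 2.5, 2.6, 2.8, 2.8, 3.07, 3.3)
group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75,
            1.8, 1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5,
            2.55, 2.57, 2.64, 2.73, 2.77, 2.9, 2.94)

n1 <- length(group1)
n2 <- length(group2)

# suppressWarnings() hides the "cannot compute exact p-value with ties" warning.
w12 <- suppressWarnings(wilcox.test(group1, group2, correct = FALSE))
w21 <- suppressWarnings(wilcox.test(group2, group1, correct = FALSE))

# The two statistics add up to n1 * n2.
w12$statistic + w21$statistic
n1 * n2

# The two-sided p-value does not depend on the argument order.
all.equal(w12$p.value, w21$p.value)
```

So knowing one statistic and the two sample sizes is enough to recover the other, which is why neither program is "wrong".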
Lucke, Joseph F wrote:
> R and SPSS are using different but equivalent statistics. R is using
> the rank sum of group1 adjusted for the mean rank. SPSS is using the
> rank sum of group2 adjusted for the mean rank.

Close: It is the _minimum_ possible rank sum that is getting subtracted. If everyone in group1 is less than everyone in group2, R's W statistic will be zero. Other way around in SPSS.

> [...]
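The correction about the minimum rank sum can be illustrated with a small made-up case in which every value of the first group lies below every value of the second. The vectors `low` and `high` are illustrative assumptions, not data from the thread.

```r
# Illustrative values: every 'low' value is smaller than every 'high' value.
low  <- c(1, 2, 3)
high <- c(10, 11, 12, 13)

n_low <- length(low)

# Ranks in the pooled sample; 'low' occupies ranks 1..n_low.
r <- rank(c(low, high))
sum(r[seq_along(low)])      # observed rank sum of 'low'
n_low * (n_low + 1) / 2     # smallest rank sum 'low' could possibly have

# Subtracting that minimum gives W = 0 when 'low' is listed first ...
w_low_first <- wilcox.test(low, high)$statistic
# ... and the maximum, n_low * length(high), when the order is reversed.
w_high_first <- wilcox.test(high, low)$statistic

w_low_first
w_high_first
```

Because the subtracted term is the smallest attainable rank sum, W always lies between 0 and the product of the two sample sizes.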
In reply to this post by Natalie O'Toole
On Wed, 2007-08-15 at 12:06 -0600, Natalie O'Toole wrote:
> [...]
> > result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95,
> na.action)

You did not specify a value for the na.action argument, hence the error message you are getting. It defaults to 'na.omit', unless you have modified R's options. See ?na.action for more information. In this case, it will remove any NA values from the two vectors prior to calculating the statistic.

The additional arguments are really superfluous here. You can simply use:

    wilcox.test(group1, group2)

> [...]
> For the Mann-Whitney test in SPSS I am getting the following results:
>
> Mann-Whitney U = 350.5
> 2-tailed p value = 0.643
>
> I think maybe the discrepancy has to do with the specification of the
> NA values in R, but I'm not sure.
> [...]

It would appear that SPSS is reversing the two groups in its calculation and NOT using a correction by default.

If you review the internal code for wilcox.test(), by using:

    stats:::wilcox.test.default

you can see that the relevant code in this case is:

    r <- rank(c(x - mu, y))
    n.x <- as.double(length(x))
    n.y <- as.double(length(y))
    STATISTIC <- sum(r[seq_along(x)]) - n.x * (n.x + 1)/2

Thus, if we use 'x' and 'y' for your two groups, respectively, we get:

    x <- c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,
           1.9,1.96, 2,2,2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,
           3.07,3.3)

    y <- c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,
           1.86,1.9,1.97,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,
           2.77,2.9,2.94,NA)

    mu <- 0

    # Now remove the NA values
    x <- na.omit(x)
    y <- na.omit(y)

    r <- rank(c(x - mu, y))
    n.x <- as.double(length(x))
    n.y <- as.double(length(y))

    > r
     [1]  5.0  8.0  9.0 10.5 13.0 14.0 16.0 16.0 16.0 19.0 22.0 24.0 26.5
    [14] 28.0 30.5 30.5 35.0 36.5 36.5 38.5 38.5 40.0 42.5 46.0 50.5 50.5
    [27] 54.0 55.0  1.0  2.0  3.0  4.0  6.0  7.0 10.5 12.0 18.0 20.0 21.0
    [40] 23.0 25.0 26.5 29.0 32.0 33.0 34.0 41.0 42.5 44.0 45.0 47.0 48.0
    [53] 49.0 52.0 53.0

    > n.x
    [1] 28

    > n.y
    [1] 27

    STATISTIC <- sum(r[seq_along(x)]) - n.x * (n.x + 1)/2

    > STATISTIC
    [1] 405.5

This is the value you get with R as you have used it.

Now, to replicate the statistic in SPSS, use the following code, with x and y interchanged:

    r <- rank(c(y - mu, x))
    n.x <- as.double(length(x))
    n.y <- as.double(length(y))
    STATISTIC <- sum(r[seq_along(y)]) - n.y * (n.y + 1)/2

So we get:

    > r
     [1]  1.0  2.0  3.0  4.0  6.0  7.0 10.5 12.0 18.0 20.0 21.0 23.0 25.0
    [14] 26.5 29.0 32.0 33.0 34.0 41.0 42.5 44.0 45.0 47.0 48.0 49.0 52.0
    [27] 53.0  5.0  8.0  9.0 10.5 13.0 14.0 16.0 16.0 16.0 19.0 22.0 24.0
    [40] 26.5 28.0 30.5 30.5 35.0 36.5 36.5 38.5 38.5 40.0 42.5 46.0 50.5
    [53] 50.5 54.0 55.0

    > n.x
    [1] 28

    > n.y
    [1] 27

    > STATISTIC
    [1] 350.5

So we now match SPSS' calculation of the statistic.

Now, to complete the process and replicate the SPSS results fully, you could do the following, by reversing the order of your arguments and setting 'correct = FALSE'. I am using 'x' and 'y' here, but use 'group1' and 'group2' on your system:

    > wilcox.test(y, x, correct = FALSE)

            Wilcoxon rank sum test

    data:  y and x
    W = 350.5, p-value = 0.6433
    alternative hypothesis: true location shift is not equal to 0

    Warning message:
    cannot compute exact p-value with ties in: wilcox.test.default(y, x, correct = FALSE)

BTW, I located an FTP site with SPSS' algorithm documentation online at:

    ftp://ftp.spss.com/pub/spss/statistics/spss/algorithms/

For the MW test, the relevant document is npart.pdf.

HTH,

Marc Schwartz
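On the NA question specifically, a small check may help. The sketch below, using the data posted in this thread, suggests that the two-vector call drops the NA on its own, and that `na.action` is something you pass by name (as a function such as na.omit) when using the formula interface. My reading, which I have not checked against the 2007 source, is that a bare positional `na.action` in the two-vector call is matched to a different argument, which would be consistent with the "arg == choices" error. The data frame `d` and its column names are hypothetical.

```r
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73,
            1.81, 1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41,
            2.41, 2.46, 2.5, 2.6, 2.8, 2.8, 3.07, 3.3)
group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75,
            1.8, 1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5,
            2.55, 2.57, 2.64, 2.73, 2.77, 2.9, 2.94, NA)

# Two-vector call: compare the result with and without the NA present.
# suppressWarnings() hides the ties warning shown earlier in the thread.
with_na    <- suppressWarnings(wilcox.test(group1, group2))
without_na <- suppressWarnings(wilcox.test(group1, group2[!is.na(group2)]))
with_na$statistic
without_na$statistic

# Formula interface: na.action is given by name and takes a function.
d <- data.frame(
  area  = c(group1, group2),
  group = factor(rep(c("group1", "group2"),
                     times = c(length(group1), length(group2))))
)
via_formula <- suppressWarnings(
  wilcox.test(area ~ group, data = d, na.action = na.omit)
)
via_formula$statistic
```

All three calls give the same statistic, so the NA is not the source of the difference from SPSS.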