A question about the Wilcoxon signed rank test

A question about the Wilcoxon signed rank test

hix li
Hi guys,
 
I have two data sets of prices: endprice0, endprice1
 
I ran the Wilcoxon signed rank test:

wilcox.test(endprice0, endprice1, paired = TRUE, alternative = "two.sided", conf.int = TRUE, conf.level = 0.9)

The result is V = 1819, p-value = 0.8812.

Then I calculated the z-value of the test by hand: z = -2.661263. The corresponding p-value is 0.003892, which differs from the p-value reported by wilcox.test. I used the following steps to compute the z-value:
 
diff <- endprice0 - endprice1
diffNew <- diff[diff != 0]
diffNew.rank <- rank(abs(diffNew))
diffNew.rank.sign <- diffNew.rank * sign(diffNew)
ranks.pos <- sum(diffNew.rank.sign[diffNew.rank.sign > 0])   # 1819
ranks.neg <- -sum(diffNew.rank.sign[diffNew.rank.sign < 0])  # 1751

v <- ranks.neg
n <- 100
z <- (v - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)  # -2.661263


Which p-value should I use for the Wilcoxon test, then?

Hix

The data sets used in my test are:

endprice0 = c(136.3800, 134.8500, 350.7500, 18.8400, 0.0000, 0.0600, 159.1900, 242.5600, 0.0400, 289.9000, 0.0000, 42.6100, 275.9500, 76.6200, 36.6400, 0.0000, 81.5900, 179.3600, 86.2200, 210.8000, 118.7200, 45.5800, 98.1900, 137.0300, 47.7900, 123.7700, 23.2400, 0.0400, 130.2300, 0.0400, 0.0000, 130.3800, 150.7600, 0.5900, 277.3000, 166.0100, 0.0400, 71.9400, 80.1300, 162.8800, 85.0500, 125.4400, 138.0600, 0.0600, 140.6300, 100.9700, 0.0000, 0.0400, 213.7300, 86.9200, 294.8200, 0.0400, 0.0000, 239.2100, 0.0000, 13.7700, 95.5300, 0.0400, 146.7200, 0.0000, 0.00, 121.57, 68.23, 5.31, 0.04, 96.31, 206.02, 313.39, 92.34, 31.64, 118.71, 499.6, 0, 129.04, 106.88, 183.92, 50.42, 0, 0.04, 0.04, 1.57, 355.56, 81.19, 327.17, 151.18, 0, 0, 125.03, 0, 0.04, 132.01, 0, 0, 11.49, 23, 13.46, 326.64, 198.19, 114.22, 79.53)
 
endprice1 = c(138.9300, 131.9700, 300.4700, 0.0000, 0.0000, 0.2200, 159.6300, 277.9100, 0.0000, 328.9700, 0.0000, 40.5100, 270.1000, 52.8000, 39.3800, 0.0400, 79.7100, 110.5600, 41.1600, 224.6600, 123.8800, 53.2700, 96.1500, 67.2800, 40.7300, 99.4900, 20.4900, 0.0400, 126.1000, 0.0000, 1.3700, 140.6500, 165.7200, 0.0000, 314.4200, 207.7400, 0.0400, 76.9300, 75.8000, 184.9100, 83.3700, 139.5300, 157.0500, 0.0000, 147.5900, 105.2800, 0.0000, 0.0000, 207.3000, 74.1100, 288.3900, 0.0400, 0.0000, 213.7200, 0.0400, 14.8300, 53.7000, 0.0400, 150.0800, 0.0000, 0, 123.73, 68.01, 9.52, 0, 111.86, 249.69, 354.18, 98, 31.3, 117.54, 455.32, 1.06, 127.92, 114.51, 173.85, 53.22, 0, 0, 0, 0.31, 376.69, 69.43, 278.8, 147.11, 0.04, 0, 120.05, 0, 0.04, 132.97, 0, 0, 9.98, 28.85, 13.77, 295.17, 191.54, 126.44, 84.83)


 




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: A question about the Wilcoxon signed rank test

David Winsemius

On Apr 5, 2010, at 8:06 AM, hix li wrote:

> Hi guys,
>
> I have two data sets of prices: endprice0, endprice1
>
> I use the Wilcox test:
>
> wilcox.test(endprice0, endprice1, paired = TRUE, alternative =  
> "two.sided",  conf.int = T, conf.level = 0.9)
>
> The result is with V = 1819, p-value = 0.8812.
>
> Then I calculated the z-value of the test: z-value = -2.661263. The  
> corresponding p-value is: p-value = 0.003892, which is different  
> from the p-value computed in the Wilcox test, I am using the  
> following steps to compute the z-value:

If you are trying to invent a new test, then you should provide a theoretical justification. If you are doing this as a homework exercise, then consult your instructor. If you are looking for alternative ways of looking at the data, then either do a paired t.test or try:

plot(density(endprice0))
lines(density(endprice1), col="red")

--
David.
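As a rough illustration of the paired t-test suggested above, here is a minimal sketch. The vectors `before` and `after` are made-up toy values, not the prices from this thread; the point is only that `t.test(..., paired = TRUE)` works on the pairwise differences, so it gives the same p-value as a one-sample test on `before - after`.

```r
# Toy paired data (hypothetical values, not the thread's prices)
before <- c(10.2, 11.5, 9.8, 14.1, 7.7, 12.3)
after  <- c(9.9, 12.0, 9.1, 13.2, 7.7, 11.0)

# Paired t-test on the two samples
res_paired <- t.test(before, after, paired = TRUE)

# One-sample t-test on the pairwise differences (same test, written out)
res_diff <- t.test(before - after, mu = 0)

print(res_paired$p.value)
print(res_diff$p.value)
```

With the thread's data one would pass `endprice0` and `endprice1` instead of the toy vectors.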


David Winsemius, MD
West Hartford, CT


Re: A question about the Wilcoxon signed rank test

Thomas Lumley
In reply to this post by hix li


The problem is that your data contains ties, which mess up the nice theory and result in different people using different approximations.

I don't know where your z-statistic formula comes from, but you can find the one R uses by looking at the source code in stats:::wilcox.test.default.

To see that R's z-statistic approximation is better than yours, try breaking the ties randomly and using exact=TRUE:
     wilcox.test(endprice0 + rnorm(length(endprice0), sd = 1e-10), endprice1, paired = TRUE, exact = TRUE)
You will find that the p-values agree fairly well with R's 0.88.

         -thomas
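For readers who want to see roughly what the normal approximation with ties looks like, the sketch below is my reading of the paired branch of `stats:::wilcox.test.default`; treat it as an approximation of that code, not a copy. The helper name `signed_rank_normal` and the toy vectors `x_toy` and `y_toy` are assumptions introduced for illustration. Zero differences are dropped, `n` is the number of remaining differences, and the variance is reduced by a tie term.

```r
# Hypothetical helper: normal approximation for the signed rank test,
# with tie correction and (optionally) continuity correction.
signed_rank_normal <- function(x, y, correct = TRUE) {
  d <- x - y
  d <- d[d != 0]            # drop zero differences
  n <- length(d)            # n counts only the non-zero differences
  r <- rank(abs(d))         # midranks for tied |d|
  v <- sum(r[d > 0])        # V: sum of the ranks of positive differences
  ties <- table(r)          # sizes of the tie groups
  sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)
  z <- v - n * (n + 1) / 4
  if (correct) z <- z - sign(z) * 0.5   # continuity correction
  z <- z / sigma
  list(V = v, n = n, z = z,
       p.value = 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE)))
}

# Toy data with zero differences and tied absolute differences
x_toy <- c(12, 15,  9, 20, 7, 7, 18, 3, 11, 14)
y_toy <- c(10, 15, 11, 15, 7, 4, 12, 3, 12, 17)

res <- signed_rank_normal(x_toy, y_toy)
ref <- wilcox.test(x_toy, y_toy, paired = TRUE, exact = FALSE, correct = TRUE)

print(c(V = res$V, n = res$n))   # V = 20 from n = 7 non-zero differences
print(c(sketch = res$p.value, wilcox.test = unname(ref$p.value)))
```

Passing `exact = FALSE` asks `wilcox.test` for the normal approximation directly, so the two p-values can be compared like for like.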



Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle


Re: A question about the Wilcoxon signed rank test

Peter Ehlers
In reply to this post by hix li
Since this may be homework, I'll confine myself to a hint (which
may or may not be the problem; I haven't checked):

The formula you use for z is strongly dependent on the value of 'n'.

  -Peter Ehlers
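To follow the hint through using only the numbers reported in the original post: the ranks 1..n always sum to n(n+1)/2, and the two rank sums in the post add up to 1819 + 1751 = 3570 = 84 * 85 / 2. That suggests 84 non-zero differences were ranked, not 100. The sketch below redoes the poster's arithmetic with both values of n. The helper name `z_for` is an assumption introduced here, and the formula is the poster's own, without tie or continuity correction.

```r
# Rank sums reported in the original post
ranks_pos <- 1819
ranks_neg <- 1751

# Ranks 1..n sum to n(n+1)/2, so solve n^2 + n - 2 * total = 0 for n
total     <- ranks_pos + ranks_neg
n_implied <- (-1 + sqrt(1 + 8 * total)) / 2

# The poster's z formula (no tie or continuity correction)
z_for <- function(v, n) {
  (v - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
}

z_100 <- z_for(ranks_neg, 100)        # n as used in the post
z_84  <- z_for(ranks_neg, n_implied)  # n implied by the rank sums

print(n_implied)
print(c(z_100 = z_100, one_sided_p = pnorm(z_100)))
print(c(z_84 = z_84, two_sided_p = 2 * pnorm(-abs(z_84))))
```

With n = 100 this reproduces the post's z of about -2.66 and its one-sided p-value of about 0.0039. With the implied n the two-sided p-value comes out near 0.88, close to what wilcox.test reported; the remaining gap is plausibly the tie and continuity corrections.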


--
Peter Ehlers
University of Calgary
