# Wilcoxon signed rank test and its requirements


## Wilcoxon signed rank test and its requirements

Hi all,

I have a distribution and take a sample from it. Then I compare that sample with the mean of the population, as here with the "Wilcoxon signed rank test with continuity correction":

    > wilcox.test(Sample,mu=mean(All), alt="two.sided")

            Wilcoxon signed rank test with continuity correction

    data:  AlphaNoteOnsetDists
    V = 63855, p-value = 0.0002093
    alternative hypothesis: true location is not equal to 0.4115136

    > wilcox.test(Sample,mu=mean(All), alt = "greater")

            Wilcoxon signed rank test with continuity correction

    data:  AlphaNoteOnsetDists
    V = 63855, p-value = 0.0001047
    alternative hypothesis: true location is greater than 0.4115136

What assumptions are needed for the population? What can we say according to these results? The p-value for "less" is 0.999.

Thanks in advance,
Atte

Atte Tenkanen
University of Turku, Finland
Department of Musicology
+35823335278
http://users.utu.fi/attenka/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: Wilcoxon signed rank test and its requirements

On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen <[hidden email]> wrote:

> What assumptions are needed for the population?

Wikipedia says: "The Wilcoxon signed-rank test is a _non-parametric_ statistical hypothesis test for... " It also talks about the assumptions.

> What can we say according these results?
> p-value for the "less" is 0.999.

That the p-values for "less" and "greater" seem to sum to one, and that the p-value for "greater" is half of that for the two-sided test. You shouldn't ask what we can say. You should ask yourself: "What was the question, and is this test giving me an answer to that question?"

Cheers
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
[hidden email]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
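Joris's observation about the p-values can be checked on simulated data. The sketch below uses a hypothetical right-skewed sample `x` and a hypothesised location `mu0` (neither comes from the thread) and forces the normal approximation with continuity correction via `exact = FALSE`; that is the method R reported above, and it is used by default when there are ties or when n >= 50. Under this approximation the two-sided p-value should be twice the smaller one-sided p-value, while the two one-sided p-values sum to approximately one (slightly more in exact arithmetic, because the continuity correction shifts each tail by 0.5).

```r
# Hypothetical right-skewed sample (not the thread's data)
set.seed(1)
x   <- rexp(200)
mu0 <- 0.5

# exact = FALSE: normal approximation with continuity correction
p_two <- wilcox.test(x, mu = mu0, alternative = "two.sided", exact = FALSE)$p.value
p_gt  <- wilcox.test(x, mu = mu0, alternative = "greater",   exact = FALSE)$p.value
p_lt  <- wilcox.test(x, mu = mu0, alternative = "less",      exact = FALSE)$p.value

# Two-sided p-value versus twice the smaller one-sided p-value
print(c(two_sided = p_two, twice_min = 2 * min(p_gt, p_lt)))

# The one-sided p-values add up to (approximately) one
print(p_gt + p_lt)
```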

## Re: Wilcoxon signed rank test and its requirements

Thanks. What I meant to ask is: how do you test whether the data are symmetric enough? If they are not, is it OK to use some data transformation? I ask because of this statement:

> "The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."

## Re: Wilcoxon signed rank test and its requirements

PS. Maybe I can somehow transform the data and check the result, for example using the skewness function of the timeDate package?
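The check Atte describes can be sketched in base R. `sample_skewness` below is a hypothetical helper computing the plain moment-based skewness; package functions such as the one in timeDate may use a slightly different definition. The data are simulated log-normal values (an assumption, not the tonal-distance data), so a log transform should bring the skewness close to zero. Keep in mind that a transformation also changes the scale on which the location is tested, so the hypothesis changes with it, and a skewness value is a descriptive check rather than a formal test.

```r
# Moment-based sample skewness (hypothetical helper)
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

set.seed(123)
x <- rlnorm(1000)    # hypothetical strongly right-skewed data

# Skewness before and after a log transformation
print(c(raw = sample_skewness(x), logged = sample_skewness(log(x))))
```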

## Re: Wilcoxon signed rank test and its requirements

One way of looking at it is doing a sign test after subtraction of the mean. For symmetrical data sets the mean equals the median, so you expect to have about as many values above the mean as below it, i.e. about as many positive as negative values of X - mean(X). There is a sign test somewhere in one of the packages, but it's easily done using binom.test as well:

    > set.seed(12345)
    > x1 <- rnorm(100)
    > x2 <- rpois(100,2)
    > binom.test((sum(x1-mean(x1)>0)),length(x1))

            Exact binomial test

    data:  (sum(x1 - mean(x1) > 0)) and length(x1)
    number of successes = 56, number of trials = 100, p-value = 0.2713
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
     0.4571875 0.6591640
    sample estimates:
    probability of success
                      0.56

    > binom.test((sum(x2-mean(x2)>0)),length(x2))

            Exact binomial test

    data:  (sum(x2 - mean(x2) > 0)) and length(x2)
    number of successes = 37, number of trials = 100, p-value = 0.01203
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
     0.2755666 0.4723516
    sample estimates:
    probability of success
                      0.37

Cheers
Joris
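Joris's `binom.test()` idea can be wrapped in a small function. `mean_sign_test` is a hypothetical helper; it drops values exactly equal to the mean, as a usual sign test drops zeros. The simulated data are assumptions: for an exponential distribution only about exp(-1), roughly 37%, of the values exceed the mean, so the test should flag it. Note that a non-significant result does not prove symmetry, since some skewed distributions also have mean equal to median.

```r
# Sign test of "as many values above the mean as below it" (hypothetical helper)
mean_sign_test <- function(x) {
  d <- x - mean(x)
  d <- d[d != 0]                       # drop values equal to the mean
  binom.test(sum(d > 0), length(d), p = 0.5)
}

set.seed(12345)
sym  <- rnorm(1000)   # symmetric: about half the values above the mean
skew <- rexp(1000)    # right-skewed: about 37% above the mean

print(mean_sign_test(sym)$p.value)
print(mean_sign_test(skew)$p.value)
```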

## Re: Wilcoxon signed rank test and its requirements

On 06/24/2010 12:40 PM, David Winsemius wrote:
> You are being misled. Simply finding a statement on a statistics software website, even one as reputable as Graphpad (???), does not mean that it is necessarily true. My understanding (confirmed by reviewing "Nonparametric statistical methods for complete and censored data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require that the underlying distributions be symmetric. The above quotation is highly inaccurate.

To add to what David and others have said, look at the kernel that the U-statistic associated with the WSR test uses: the indicator (0/1) of $x_i + x_j > 0$. So WSR tests $H_0: p = 0.5$, where $p$ is the probability that the average of a randomly chosen pair of values is positive. (If there are ties, this probably needs to be worded as $P[x_i + x_j > 0] = P[x_i + x_j < 0]$, $i \neq j$.)

Frank

-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University
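Frank's kernel view can be made concrete. For data without ties or zeros, the V statistic reported by `wilcox.test()` equals the number of Walsh averages $(x_i + x_j)/2$, $i \le j$, that exceed the hypothesised location. This count includes the $n$ diagonal terms $i = j$; Frank's U-statistic kernel over $i \neq j$ differs only in those. A sketch on a hypothetical normal sample:

```r
set.seed(42)
x   <- rnorm(30, mean = 0.3)   # hypothetical sample, no ties
mu0 <- 0                       # hypothesised location

d <- x - mu0
# Walsh averages (d_i + d_j)/2 for all pairs i <= j (diagonal included)
pair_sums <- outer(d, d, "+")
walsh     <- pair_sums[upper.tri(pair_sums, diag = TRUE)] / 2
V_walsh   <- sum(walsh > 0)

# V as reported by wilcox.test() (exact test here: n < 50, no ties)
V_r <- unname(wilcox.test(x, mu = mu0)$statistic)

print(c(walsh_count = V_walsh, wilcox_V = V_r))
```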

## Re: Wilcoxon signed rank test and its requirements

In reply to this post by David Winsemius

I do agree that one should not rely solely on sources like Wikipedia and GraphPad, although they contain a lot of valuable information.

That said, it is not too difficult to illustrate why, in the case of the one-sample signed rank test, the differences should not be too far from symmetrical. It just needs some reflection on how the statistic is calculated. If you have an asymmetrical (say, right-skewed) distribution and test against the median or mean, you get a lot of small differences with a negative sign and a lot of large differences with a positive sign. Hence the sum of ranks for one side will be higher than for the other, eventually leading to a significant result. An extreme example:

    > set.seed(100)
    > y <- rnorm(100,1,2)^2
    > wilcox.test(y,mu=median(y))

            Wilcoxon signed rank test with continuity correction

    data:  y
    V = 3240.5, p-value = 0.01396
    alternative hypothesis: true location is not equal to 1.829867

    > wilcox.test(y,mu=mean(y))

            Wilcoxon signed rank test with continuity correction

    data:  y
    V = 1763, p-value = 0.008837
    alternative hypothesis: true location is not equal to 5.137409

Which brings us to the question of what location is actually tested in the Wilcoxon test. For the measure of location to be the mean (or median), one has to assume that the distribution of the differences is rather symmetrical, which implies your data have to be distributed somewhat symmetrically. The test is robust against violations of this implicit assumption, but in more extreme cases skewness does matter.

Cheers
Joris
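One way to see which location the one-sample signed rank test targets is the Hodges-Lehmann estimate: the median of all Walsh averages $(y_i + y_j)/2$, $i \le j$, often called the pseudo-median. The sketch reuses Joris's simulated `y`. For this right-skewed sample it should fall between the median and the mean, consistent with the two results above (V above its null expectation $n(n+1)/4 = 2525$ when testing against the median, below it when testing against the mean).

```r
set.seed(100)
y <- rnorm(100, 1, 2)^2          # Joris's right-skewed example

# Walsh averages (y_i + y_j)/2 for i <= j; their median is the
# Hodges-Lehmann estimate of the pseudo-median
pair_sums <- outer(y, y, "+")
w  <- pair_sums[upper.tri(pair_sums, diag = TRUE)] / 2
hl <- median(w)

print(c(median = median(y), pseudo_median = hl, mean = mean(y)))

# wilcox.test() reports an estimate of the same quantity when a CI is
# requested; for large n it is found numerically, so it may differ slightly
print(wilcox.test(y, conf.int = TRUE)$estimate)
```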

## Re: Wilcoxon signed rank test and its requirements

On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:
> I do agree that one should not trust solely on sources like wikipedia and graphpad, although they contain a lot of valuable information.
>
> This said, it is not too difficult to illustrate why, in the case of the one-sample signed rank test,

That is a key point. I was assuming that you were using the paired-sample version of the WSRT, and I may have been misleading the OP. For the one-sample situation, the assumption of symmetry is needed, but for the paired-sample version of the test, the location shift becomes the tested hypothesis, and no assumptions about the form of the distributions are made except that they be the same. Any consideration of median or mean (which will be the same in the case of symmetric distributions) gets lost in the paired test case.

-- 
David.
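For reference when weighing the one-sample versus paired distinction: in R, `wilcox.test(x, y, paired = TRUE)` computes the signed rank test on the differences `x - y`, so the two calls below should return the same V and p-value. The paired measurements are hypothetical.

```r
set.seed(7)
before <- rnorm(25, mean = 10)              # hypothetical paired measurements
after  <- before + rnorm(25, mean = 0.4)

paired <- wilcox.test(after, before, paired = TRUE)
onesam <- wilcox.test(after - before, mu = 0)   # one-sample test on differences

print(c(paired_V = unname(paired$statistic), one_sample_V = unname(onesam$statistic)))
print(c(paired_p = paired$p.value, one_sample_p = onesam$p.value))
```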

## Re: Wilcoxon signed rank test and its requirements

In reply to this post by Frank Harrell

Is there anything for me?

There is a lot of data, n=2418, but there are also a lot of ties. My sample has n≈250-300. I would like to test whether the mean of the sample differs significantly from the population mean. The histogram of the population looks like the attached histogram; what test should I use? No choices?

This distribution comes from a musical piece and the values are 'tonal distances'.

http://users.utu.fi/attenka/Hist.png

Atte

## Re: Wilcoxon signed rank test and its requirements

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
> Is there anything for me?
>
> There is a lot of data, n=2418, but there are also a lot of ties.
> My sample n≈250-300

I do not understand why there should be so many ties. You have not described the measurement process or units (... although you offer a glimpse without much background later).

> i would like to test, whether the mean of the sample differ significantly from the population mean.

Why? What is the purpose of this investigation? Why should the mean of a sample be that important?

> The histogram of the population looks like in attached histogram, what test should I use? No choices?
>
> This distribution comes from a musical piece and the values are 'tonal distances'.
>
> http://users.utu.fi/attenka/Hist.png

That picture does not offer much insight into the features of that measurement. It appears to have much more structure than I would expect for a sample from a smooth unimodal underlying population.

-- 
David.

## Re: Wilcoxon signed rank test and its requirements

The values come from this kind of process: the musical composition is segmented into so-called 'pitch-class segments', and these segments are compared with one reference set using a distance function. Only some distance values are possible. These distance values can be averaged over musical bars, which produces a smoother distribution and makes the resulting 'comparison curve' (which illustrates the distances to the reference set through a musical piece) more readable (see e.g. http://users.utu.fi/attenka/with6.jpg), but I would prefer to use the original values.

Then I want to pick only some regions from the piece and test whether the values of those regions are higher than the mean of all values.

Atte

## Re: Wilcoxon signed rank test and its requirements

 In reply to this post by David Winsemius

BTW, if there is no reasonably powerful test that suits my purpose (because of the ties and the shape of the data), could I proceed as follows? It would also be worthwhile to compare different samples taken from the data. Since the mean and SD of the data are available, could I approximate p-values using a z- or t-test, just to compare several different samples?

Atte

> On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
>
> > Is there anything for me?
> >
> > There is a lot of data, n=2418, but there are also a lot of ties.
> > My sample n≈250-300
>
> I do not understand why there should be so many ties. You have not
> described the measurement process or units. (... although you offer a
> glimpse without much background later.)
>
> > I would like to test whether the mean of the sample differs
> > significantly from the population mean.
>
> Why? What is the purpose of this investigation? Why should the mean of
> a sample be that important?
>
> > The histogram of the population looks like the attached histogram.
> > What test should I use? No choices?
> >
> > This distribution comes from a musical piece and the values are
> > 'tonal distances'.
> >
> > http://users.utu.fi/attenka/Hist.png
>
> That picture does not offer much insight into the features of that
> measurement. It appears to have much more structure than I would
> expect for a sample from a smooth unimodal underlying population.
>
> --
> David.
>
> > Atte
> >
> >> On 06/24/2010 12:40 PM, David Winsemius wrote:
> >>>
> >>> On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
> >>>
> >>>> Thanks. What I meant to ask is:
> >>>>
> >>>> how do you test whether the data are symmetric enough?
> >>>> If they are not, is it OK to use some data transformation?
> >>>>
> >>>> given this statement:
> >>>>
> >>>> "The Wilcoxon signed rank test does not assume that the data are
> >>>> sampled from a Gaussian distribution. However it does assume that the
> >>>> data are distributed symmetrically around the median. If the
> >>>> distribution is asymmetrical, the P value will not tell you much about
> >>>> whether the median is different than the hypothetical value."
> >>>
> >>> You are being misled. Simply finding a statement on a statistics
> >>> software website, even one as reputable as Graphpad (???), does not mean
> >>> that it is necessarily true. My understanding (confirmed by reviewing
> >>> "Nonparametric statistical methods for complete and censored data" by
> >>> M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank test
> >>> does not require that the underlying distributions be symmetric. The
> >>> above quotation is highly inaccurate.
> >>
> >> To add to what David and others have said, look at the kernel that the
> >> U-statistic associated with the WSR test uses: the indicator (0/1) of
> >> x_i + x_j > 0. So WSR tests H0: p = 0.5, where p is the probability that
> >> the average of a randomly chosen pair of values is positive. [If there are
> >> ties, this probably needs to be worded as P[x_i + x_j > 0] = P[x_i + x_j < 0],
> >> i ≠ j.]
> >>
> >> Frank
> >>
> >> --
> >> Frank E Harrell Jr   Professor and Chairman        School of Medicine
> >>                      Department of Biostatistics   Vanderbilt University
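Frank's remark about the kernel can be checked numerically. The R sketch below uses simulated data (assumed continuous, so there are no ties and no zero differences; the numbers are not Atte's) and compares the statistic V reported by `wilcox.test()` with a direct count of the Walsh averages (x_i + x_j)/2, taken over all pairs i ≤ j including each value paired with itself, that lie above the hypothesized location `mu`.

```r
set.seed(1)
# Hypothetical skewed, continuous data (no ties, no zeros); NOT Atte's data
x  <- rexp(30) - 0.6
mu <- 0                                   # hypothesized location
d  <- x - mu                              # differences from mu

# Signed-rank statistic V as reported by wilcox.test()
V <- unname(wilcox.test(x, mu = mu)$statistic)

# Walsh averages (d_i + d_j) / 2 for every pair i <= j (diagonal = i == j)
walsh <- outer(d, d, "+") / 2
n_pos <- sum(walsh[upper.tri(walsh, diag = TRUE)] > 0)

# Without ties these two counts are identical
c(V = V, positive_walsh_averages = n_pos)
```

Without ties the agreement is an exact identity, which is why the test is naturally read as a statement about pairwise averages (R labels the corresponding estimate from `wilcox.test(..., conf.int = TRUE)` the "(pseudo)median") rather than about the mean. With ties, as in Atte's data, midranks are used and the rank sum and the count can differ.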

## Re: Wilcoxon signed rank test and its requirements

 In reply to this post by Atte Tenkanen

As a remark on your histogram: use fewer breaks! This histogram tells you nothing. An interesting function is ?density, e.g.:

    x <- rnorm(250)
    hist(x, freq = FALSE)
    lines(density(x), col = "red")

See also this PDF, a very nice and short introduction to graphics in R: http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen <[hidden email]>:
> Is there anything for me?
>
> There is a lot of data, n=2418, but there are also a lot of ties.
> My sample n≈250-300

You should think about the central limit theorem. Actually, you can just use a t-test to compare means, as with those sample sizes the sample mean is approximately normally distributed.

> I would like to test whether the mean of the sample differs significantly from the population mean.

According to probability theory, if your samples are drawn at random from that population, the difference will be significant at the 5% level in about 5% of the cases if you repeat your sampling indefinitely. But as David asked: why on earth do you want to test that?

cheers
Joris
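Atte's z-or-t question and Joris's 5% remark can be explored by simulation. The R sketch below is illustrative only: `All` is a hypothetical stand-in population (skewed, and rounded to 0.1 so that it has many ties), not the real tonal-distance data, and n = 275 is simply a value in Atte's 250-300 range. It draws repeated random subsets and records how often a one-sample t-test, and a z-test that uses the known population SD, reject H0: mean = mean(All) at the 5% level.

```r
set.seed(42)
# Hypothetical population: skewed and rounded to 0.1, so it has many ties
# (a stand-in for the n = 2418 tonal distances; NOT the real data)
All    <- round(rgamma(2418, shape = 2, rate = 4), 1)
mu0    <- mean(All)                      # population mean, known exactly
sigma0 <- sqrt(mean((All - mu0)^2))      # population SD, known exactly
n      <- 275                            # a sample size in Atte's range

pv <- replicate(2000, {
  s <- sample(All, n)                    # random subset, without replacement
  z <- (mean(s) - mu0) / (sigma0 / sqrt(n))
  c(t = t.test(s, mu = mu0)$p.value,     # t-test: SD estimated from the sample
    z = 2 * pnorm(-abs(z)))              # z-test: uses the known population SD
})

rate <- rowMeans(pv < 0.05)              # empirical type I error of each test
rate
```

Because each subset is drawn without replacement from a finite population, the sample mean varies somewhat less than either test assumes (its variance is smaller by the factor (N - n)/(N - 1)), so one would expect both rejection rates to come out a little below 0.05 rather than exactly at it.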

## Re: Wilcoxon signed rank test and its requirements

 The central limit theorem doesn't help. It just addresses type I error, not power.

Frank

On 06/25/2010 04:29 AM, Joris Meys wrote:
> You should think about the central limit theorem. Actually, you can
> just use a t-test to compare means, as with those sample sizes the
> sample mean is approximately normally distributed.

--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University
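Frank's point, that the central limit theorem speaks to type I error but not to power, can be illustrated with a small simulation. The sketch below assumes a hypothetical heavy-tailed setting (a t distribution with 2 degrees of freedom, n = 50, and a location shift of 0.5); none of these values come from the thread, and `sim_reject()` is a helper defined here purely for illustration.

```r
# Hypothetical setting (NOT Atte's data): heavy-tailed t(2) values, n = 50.
# Returns the proportion of simulated samples in which each test rejects
# H0: location = 0 at level alpha.
sim_reject <- function(shift, n = 50, nsim = 1000, alpha = 0.05) {
  pv <- replicate(nsim, {
    x <- rt(n, df = 2) + shift
    c(t        = t.test(x, mu = 0)$p.value,
      wilcoxon = wilcox.test(x, mu = 0)$p.value)
  })
  rowMeans(pv < alpha)
}

set.seed(7)
size  <- sim_reject(shift = 0)    # type I error: H0 is true
power <- sim_reject(shift = 0.5)  # power against a location shift of 0.5
rbind(size = size, power = power)
```

In a setting like this one would expect both tests to hold their type I error near or below 0.05, while the Wilcoxon test detects the shift noticeably more often than the t-test. With lighter tails the gap narrows or even reverses slightly, so the comparison says nothing about Atte's data in particular.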