Wilcoxon signed rank test and its requirements

Atte Tenkanen
Hi all,

I have a distribution and take a sample from it. Then I compare that sample with the mean of the population, as here in "Wilcoxon signed rank test with continuity correction":

> wilcox.test(Sample,mu=mean(All), alt="two.sided")

        Wilcoxon signed rank test with continuity correction

data:  AlphaNoteOnsetDists
V = 63855, p-value = 0.0002093
alternative hypothesis: true location is not equal to 0.4115136

> wilcox.test(Sample,mu=mean(All), alt = "greater")

        Wilcoxon signed rank test with continuity correction

data:  AlphaNoteOnsetDists
V = 63855, p-value = 0.0001047
alternative hypothesis: true location is greater than 0.4115136

What assumptions are needed for the population?
What can we say according to these results?
p-value for the "less" is 0.999.

Thanks in advance,

Atte

Atte Tenkanen
University of Turku, Finland
Department of Musicology
+35823335278
http://users.utu.fi/attenka/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Wilcoxon signed rank test and its requirements

Joris FA Meys
On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen <[hidden email]> wrote:

> Hi all,
>
> I have a distribution, and take a sample of it. Then I compare that sample with the mean of the population like here in "Wilcoxon signed rank test with continuity correction":
>
>> wilcox.test(Sample,mu=mean(All), alt="two.sided")
>
>        Wilcoxon signed rank test with continuity correction
>
> data:  AlphaNoteOnsetDists
> V = 63855, p-value = 0.0002093
> alternative hypothesis: true location is not equal to 0.4115136
>
>> wilcox.test(Sample,mu=mean(All), alt = "greater")
>
>        Wilcoxon signed rank test with continuity correction
>
> data:  AlphaNoteOnsetDists
> V = 63855, p-value = 0.0001047
> alternative hypothesis: true location is greater than 0.4115136
>
> What assumptions are needed for the population?

wikipedia says:
"The Wilcoxon signed-rank test is a _non-parametric_ statistical
hypothesis test for... "
it also talks about the assumptions.

> What can we say according these results?
> p-value for the "less" is 0.999.

That the p-value for less and greater seem to sum up to one, and that
the p-value of greater is half of that for two-sided. You shouldn't
ask what we can say. You should ask yourself "What was the question
and is this test giving me an answer on that question?"
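
The relationship between the three p-values can be checked directly. A minimal sketch, assuming simulated data: the vector x and the null location mu0 below are made up for illustration and are not the original poster's data.

```r
# Simulated sample; small n and no ties, so wilcox.test() uses the exact distribution
set.seed(1)
x   <- rnorm(40, mean = 1)
mu0 <- 0.5   # hypothetical null location

p_two     <- wilcox.test(x, mu = mu0, alternative = "two.sided")$p.value
p_greater <- wilcox.test(x, mu = mu0, alternative = "greater")$p.value
p_less    <- wilcox.test(x, mu = mu0, alternative = "less")$p.value

# The two-sided p-value is twice the smaller one-sided p-value (capped at 1)
all.equal(p_two, min(1, 2 * min(p_greater, p_less)))

# In the exact case the one-sided p-values sum to 1 plus P(V = observed V),
# so the sum is slightly larger than 1
p_greater + p_less
```

None of this says anything about whether the test answers the question of interest; it only confirms how the three alternatives relate to each other.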

Cheers
Joris

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
[hidden email]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

Re: Wilcoxon signed rank test and its requirements

Atte Tenkanen
Thanks. What I should have asked is this:

How do you test that the data are symmetric enough?
If they are not, is it OK to use some data transformation?

I ask because it is said:

"The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."

Re: Wilcoxon signed rank test and its requirements

Atte Tenkanen
PS.

Maybe I can somehow try to transform the data and check it, for example, using the skewness function of the timeDate package?
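
For reference, the sample skewness can also be computed in base R without any package. A rough sketch, assuming the usual moment-based definition; the helper name sample_skewness is made up here, and its result may differ slightly from a package function that uses a different normalisation.

```r
# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 1.5
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(1)
sample_skewness(rnorm(1000))      # near 0 for a symmetric distribution
sample_skewness(rexp(1000))       # clearly positive for a right-skewed one
sample_skewness(log(rexp(1000)))  # a transformation changes the skew (here it becomes negative)
```

A skewness near zero is only a rough diagnostic and does not prove symmetry.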

Re: Wilcoxon signed rank test and its requirements

Joris FA Meys
One way of looking at it is doing a sign test after subtraction of
the mean. For symmetrical distributions the mean and the median
coincide, so you expect to have about as many values above as below
the mean. There is a sign test somewhere in one of the packages, but
it's easily done using binom.test as well:

> set.seed(12345)
> x1 <- rnorm(100)
> x2 <- rpois(100,2)

>  binom.test((sum(x1-mean(x1)>0)),length(x1))

        Exact binomial test

data:  (sum(x1 - mean(x1) > 0)) and length(x1)
number of successes = 56, number of trials = 100, p-value = 0.2713
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4571875 0.6591640
sample estimates:
probability of success
                  0.56

>  binom.test((sum(x2-mean(x2)>0)),length(x2))

        Exact binomial test

data:  (sum(x2 - mean(x2) > 0)) and length(x2)
number of successes = 37, number of trials = 100, p-value = 0.01203
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2755666 0.4723516
sample estimates:
probability of success
                  0.37

Cheers
Joris

Re: Wilcoxon signed rank test and its requirements

slre
In reply to this post by Atte Tenkanen
There is a potentially useful remark from Peter Dalgaard at
http://www.mail-archive.com/r-help@.../msg86359.html :

Summarising:
"[The Wilcoxon paired rank sign test assumes symmetry]  
...of differences, and under the null hypothesis. This is usually
rather uncontroversial. "


My rider to this: It's uncontroversial because differences between
random samples from the same asymmetric distribution would form a
symmetric distribution of differences, and the null for the wilcoxon is
essentially that the distributions are the same. Symmetry of differences
at the null follows.
BUT the corollary is that location might not be the only thing that can
cause a wilcoxon test to show a significant difference.

set.seed(1023)

x<-rlnorm(50)
z<-rlnorm(50, sdlog=3)
z<-z-mean(z)+mean(x)

mean(x)
mean(z)
#Same mean..

wilcox.test(x,z)
#Strongly significant test result.

#Not a perfect example, as the test relates to true means, not data set means.
#But very different skew and scale will make for a very significant
#test result as well as very different means



Re: Wilcoxon signed rank test and its requirements

David Winsemius
In reply to this post by Atte Tenkanen

On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

> Thanks. What I have had to ask is that
>
> how do you test that the data is symmetric enough?
> If it is not, is it ok to use some data transformation?
>
> when it is said:
>
> "The Wilcoxon signed rank test does not assume that the data are  
> sampled from a Gaussian distribution. However it does assume that  
> the data are distributed symmetrically around the median. If the  
> distribution is asymmetrical, the P value will not tell you much  
> about whether the median is different than the hypothetical value."

You are being misled. Simply finding a statement on a statistics
software website, even one as reputable as Graphpad (???), does not
mean that it is necessarily true. My understanding (confirmed by
reviewing "Nonparametric statistical methods for complete and censored
data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon
signed-rank test does not require that the underlying distributions be
symmetric. The above quotation is highly inaccurate.

--
David.

Re: Wilcoxon signed rank test and its requirements

Frank Harrell
On 06/24/2010 12:40 PM, David Winsemius wrote:

>
> On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
>
>> Thanks. What I have had to ask is that
>>
>> how do you test that the data is symmetric enough?
>> If it is not, is it ok to use some data transformation?
>>
>> when it is said:
>>
>> "The Wilcoxon signed rank test does not assume that the data are
>> sampled from a Gaussian distribution. However it does assume that the
>> data are distributed symmetrically around the median. If the
>> distribution is asymmetrical, the P value will not tell you much about
>> whether the median is different than the hypothetical value."
>
> You are being misled. Simply finding a statement on a statistics
> software website, even one as reputable as Graphpad (???), does not mean
> that it is necessarily true. My understanding (confirmed reviewing
> "Nonparametric statistical methods for complete and censored data" by M.
> M. Desu, Damaraju Raghavarao, is that the Wilcoxon signed-rank test does
> not require that the underlying distributions be symmetric. The above
> quotation is highly inaccurate.
>

To add to what David and others have said, look at the kernel that the
U-statistic associated with the WSR test uses: the indicator (0/1) of
xi + xj > 0. So WSR tests H0: p = 0.5, where p is the probability that
the average of a randomly chosen pair of values is positive. [If there
are ties this probably needs to be worded as
P[xi + xj > 0] = P[xi + xj < 0], i neq j.]
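
This description can be checked numerically. A small sketch, assuming simulated continuous data without ties or zeros (all variable names are illustrative): the statistic V reported by wilcox.test equals the number of positive pairwise sums xi + xj, where in this counting form the pairs with i = j are included as well.

```r
set.seed(42)
x <- rnorm(30, mean = 0.3)   # simulated data, not from the thread

# All pairwise sums x[i] + x[j] with i <= j (the diagonal i == j is included)
s <- outer(x, x, "+")
pair_sums <- s[upper.tri(s, diag = TRUE)]

v_pairs <- sum(pair_sums > 0)                 # number of positive pairwise sums
v_test  <- unname(wilcox.test(x)$statistic)   # V from the signed rank test

v_pairs == v_test

# Sample analogue of p: proportion of pairs whose average is positive
mean(pair_sums > 0)
```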

Frank

--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University

Re: Wilcoxon signed rank test and its requirements

Atte Tenkanen
In reply to this post by David Winsemius
> On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
>
> > Thanks. What I have had to ask is that
> >
> > how do you test that the data is symmetric enough?
> > If it is not, is it ok to use some data transformation?
> >
> > when it is said:
> >
> > "The Wilcoxon signed rank test does not assume that the data are  
> > sampled from a Gaussian distribution. However it does assume that  
> > the data are distributed symmetrically around the median. If the  
> > distribution is asymmetrical, the P value will not tell you much  
> > about whether the median is different than the hypothetical value."
>
> You are being misled. Simply finding a statement on a statistics  
> software website, even one as reputable as Graphpad (???), does not  
> mean that it is necessarily true. My understanding (confirmed  
> reviewing "Nonparametric statistical methods for complete and censored
> data" by M. M. Desu, Damaraju Raghavarao, is that the Wilcoxon
> signed-rank test does not require that the underlying distributions be
> symmetric. The above quotation is highly inaccurate.
>
> --
> David.

Thanks. Unfortunately, I can't follow the reference at all, but do I read this correctly that I need not worry about the underlying distribution?

Is there any other authoritative reference where it is stated plainly that the "test does not require that the underlying distributions be symmetric or normal"?

Atte


Re: Wilcoxon signed rank test and its requirements

Joris FA Meys
In reply to this post by David Winsemius
I do agree that one should not rely solely on sources like Wikipedia
and Graphpad, although they contain a lot of valuable information.

That said, it is not too difficult to illustrate why, in the case of
the one-sample signed rank test, the differences should not be too far
from symmetrical. It just needs some reflection on how the
statistic is calculated. If you have an asymmetrical distribution, you
have a lot of small differences with a negative sign and a lot of
large differences with a positive sign if you test against the median
or mean. Hence the sum of ranks for one side will be higher than for
the other, eventually leading to a significant result.

An extreme example :

> set.seed(100)
> y <- rnorm(100,1,2)^2
> wilcox.test(y,mu=median(y))

        Wilcoxon signed rank test with continuity correction

data:  y
V = 3240.5, p-value = 0.01396
alternative hypothesis: true location is not equal to 1.829867

> wilcox.test(y,mu=mean(y))

        Wilcoxon signed rank test with continuity correction

data:  y
V = 1763, p-value = 0.008837
alternative hypothesis: true location is not equal to 5.137409

Which brings us to the question of which location is actually tested in
the Wilcoxon test. For the measure of location to be the mean (or
median), one has to assume that the distribution of the differences is
roughly symmetrical, which implies your data have to be distributed
somewhat symmetrically. The test is robust against violations of this
-implicit- assumption, but in more extreme cases skewness does matter.
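
The location that the one-sample test refers to is often called the pseudomedian, which wilcox.test can estimate with conf.int = TRUE. A sketch, assuming simulated, untied data with n < 50 so that R uses its exact method; the data mimic the skewed example above but are not the same sample.

```r
set.seed(100)
y <- rnorm(40, 1, 2)^2        # right-skewed, simulated

# Pairwise averages (y[i] + y[j]) / 2 for i <= j
a <- outer(y, y, "+") / 2
pair_avgs <- a[upper.tri(a, diag = TRUE)]

est <- unname(wilcox.test(y, conf.int = TRUE)$estimate)

# The reported estimate is the median of the pairwise averages
all.equal(est, median(pair_avgs))

# For skewed data the three location measures generally differ
c(median = median(y), pseudomedian = est, mean = mean(y))
```

Only for a symmetric distribution do the median, pseudomedian and mean coincide, which is where the symmetry discussion comes from.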

Cheers
Joris

Re: Wilcoxon signed rank test and its requirements

David Winsemius

On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:

> I do agree that one should not trust solely on sources like wikipedia
> and graphpad, although they contain a lot of valuable information.
>
> This said, it is not too difficult to illustrate why, in the case of
> the one-sample signed rank test,

That is a key point. I was assuming that you were using the paired
sample version of the WSRT and I may have been misleading the OP. For
the one-sample situation, the assumption of symmetry is needed, but for
the paired sampling version of the test, the location shift becomes
the tested hypothesis, and no assumptions about the form of the
distributions are made except that they be the same. Any consideration of
median or mean (which will be the same in the case of symmetric
distributions) gets lost in the paired test case.

--
David.
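
The paired version is simply the one-sample test applied to the within-pair differences, which is why only the distribution of the differences matters there. A minimal sketch with simulated paired data (all names and values are illustrative assumptions):

```r
set.seed(7)
before <- rlnorm(25)                          # skewed measurements
after  <- before * exp(rnorm(25, 0.2, 0.5))   # paired follow-up values

res_paired <- wilcox.test(after, before, paired = TRUE)
res_diff   <- wilcox.test(after - before)

# Same statistic and p-value from both calls
c(res_paired$statistic, res_diff$statistic)
c(res_paired$p.value, res_diff$p.value)
```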


> the differences should be not to far
> away from symmetrical. It just needs some reflection on how the
> statistic is calculated. If you have an asymmetrical distribution, you
> have a lot of small differences with a negative sign and a lot of
> large differences with a positive sign if you test against the median
> or mean. Hence the sum of ranks for one side will be higher than for
> the other, leading eventually to a significant result.
>
> An extreme example :
>
>> set.seed(100)
>> y <- rnorm(100,1,2)^2
>> wilcox.test(y,mu=median(y))
>
>        Wilcoxon signed rank test with continuity correction
>
> data:  y
> V = 3240.5, p-value = 0.01396
> alternative hypothesis: true location is not equal to 1.829867
>
>> wilcox.test(y,mu=mean(y))
>
>        Wilcoxon signed rank test with continuity correction
>
> data:  y
> V = 1763, p-value = 0.008837
> alternative hypothesis: true location is not equal to 5.137409
>
> Which brings us to the question of what location is actually tested in
> the Wilcoxon test. For the measure of location to be the mean (or
> median), one has to assume that the distribution of the differences is
> rather symmetrical, which implies your data have to be distributed
> somewhat symmetrically. The test is robust against violations of this
> -implicit- assumption, but in more extreme cases skewness does matter.
>
> Cheers
> Joris
>
> On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius <[hidden email]> wrote:
>>
>>
>> You are being misled. Simply finding a statement on a statistics software
>> website, even one as reputable as Graphpad (???), does not mean that it is
>> necessarily true. My understanding (confirmed reviewing "Nonparametric
>> statistical methods for complete and censored data" by M. M. Desu,
>> Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not
>> require that the underlying distributions be symmetric. The above
>> quotation is highly inaccurate.
>>
>> --
>> David.
>>
>>>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> [hidden email]
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: Wilcoxon signed rank test and its requirements

Joris FA Meys
On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius
<[hidden email]> wrote:

>
> On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:
>
>> I do agree that one should not trust solely on sources like wikipedia
>> and graphpad, although they contain a lot of valuable information.
>>
>> This said, it is not too difficult to illustrate why, in the case of
>> the one-sample signed rank test,
>
> That is a key point. I was assuming that you were using the paired sample
> version of the WSRT and I may have been misleading the OP. For the
> one-sample situation, the assumption of symmetry is needed but for the
> paired sampling version of the test, the location shift becomes the tested
> hypothesis, and no assumptions about the form of the hypothesis are made
> except that they be the same.

I believe you mean the form of the distributions. The assumption that
the distributions of both samples are the same (or similar, it is a
robust test) implies that the differences x_i - y_i are more or less
symmetrically distributed. The key point here is that we're not talking
about the distribution of the populations/samples (as done in the OP)
but about the distribution of the differences. I may not have been
clear enough on that one.
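
The symmetry of the differences can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are hypothetical, the two measurements are drawn independently from the same exponential distribution, and `skew` is a helper defined here for illustration (it is not a base R function).

```r
# Hypothetical data: two measurements drawn from the same skewed distribution.
set.seed(1)
n <- 10000
x <- rexp(n)   # strongly right-skewed
y <- rexp(n)   # same distribution, independent of x
d <- x - y     # differences x_i - y_i

# Illustrative helper: third standardized moment as a crude skewness measure.
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_x <- skew(x)   # clearly positive: each sample is skewed
skew_d <- skew(d)   # near zero: the differences are roughly symmetric
```

Each sample on its own is far from symmetric, while the differences are not, which is the distinction made above between the distribution of the samples and the distribution of the differences.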

Cheers
Joris




--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
[hidden email]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: Wilcoxon signed rank test and its requirements

David Winsemius

On Jun 24, 2010, at 6:42 PM, Joris Meys wrote:

> On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius
> <[hidden email]> wrote:
>>
>> On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:
>>
>>> I do agree that one should not trust solely on sources like  
>>> wikipedia
>>> and graphpad, although they contain a lot of valuable information.
>>>
>>> This said, it is not too difficult to illustrate why, in the case of
>>> the one-sample signed rank test,
>>
>> That is a key point. I was assuming that you were using the paired  
>> sample
>> version of the WSRT and I may have been misleading the OP. For the
>> one-sample situation, the assumption of symmetry is needed but for  
>> the
>> paired sampling version of the test, the location shift becomes the  
>> tested
>> hypothesis, and no assumptions about the form of the hypothesis are  
>> made
>> except that they be the same.
>
> I believe you mean the form of the distributions. The assumption that
> the distributions of both samples are the same (or similar, it is a
> robust test) implies that the differences x_i - y_i are more or less
> symmetrically distributed. Key point here that we're not talking about
> the distribution of the populations/samples (as done in the OP) but
> about the distribution of the difference. I may not have been clear
> enough on that one.

What I meant about different hypotheses was that in the single-sample
case the H0 was mean (or median) = mu_pop, and in the paired two-sample
case the H0 was mean(distr_A_i - distr_B_i) = 0. And yes, I did miss the
OP's point. My apologies.
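
The link between the two cases can be checked directly in R: the paired test is the one-sample test applied to the differences, with mu = 0. A minimal sketch with hypothetical data (the vectors `a` and `b` are made up for illustration):

```r
set.seed(2)
a <- rnorm(30, mean = 0.5)   # hypothetical first measurement
b <- rnorm(30)               # hypothetical second measurement on the same units

paired  <- wilcox.test(a, b, paired = TRUE)   # paired two-sample version
one_smp <- wilcox.test(a - b, mu = 0)         # one-sample version on the differences

same_V <- unname(paired$statistic) == unname(one_smp$statistic)
same_p <- isTRUE(all.equal(paired$p.value, one_smp$p.value))
```

Both calls give the same V and the same p-value, so whatever the one-sample test requires of its input is required of the differences in the paired case.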

--
David.



Re: Wilcoxon signed rank test and its requirements

Atte Tenkanen
In reply to this post by Frank Harrell
Is there anything for me?

There is a lot of data, n=2418, but there are also a lot of ties.
My sample size is n≈250-300.

I would like to test whether the mean of the sample differs significantly from the population mean.

The population is distributed as shown in the attached histogram. What test should I use? Are there no choices?

This distribution comes from a musical piece and the values are 'tonal distances'.

http://users.utu.fi/attenka/Hist.png

Atte

> On 06/24/2010 12:40 PM, David Winsemius wrote:
> >
> > On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
> >
> >> Thanks. What I have had to ask is that
> >>
> >> how do you test that the data is symmetric enough?
> >> If it is not, is it ok to use some data transformation?
> >>
> >> when it is said:
> >>
> >> "The Wilcoxon signed rank test does not assume that the data are
> >> sampled from a Gaussian distribution. However it does assume that the
> >> data are distributed symmetrically around the median. If the
> >> distribution is asymmetrical, the P value will not tell you much about
> >> whether the median is different than the hypothetical value."
> >
> > You are being misled. Simply finding a statement on a statistics
> > software website, even one as reputable as Graphpad (???), does not mean
> > that it is necessarily true. My understanding (confirmed reviewing
> > "Nonparametric statistical methods for complete and censored data" by
> > M. M. Desu, Damaraju Raghavarao) is that the Wilcoxon signed-rank test
> > does not require that the underlying distributions be symmetric. The
> > above quotation is highly inaccurate.
> >
>
> To add to what David and others have said, look at the kernel that the
> U-statistic associated with the WSR test uses: the indicator (0/1) of
> xi + xj > 0.  So WSR tests H0: p = 0.5 where p = the probability that the
> average of a randomly chosen pair of values is positive.  [If there are
> ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj < 0],
> i neq j.]
>
> Frank
>
> --
> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>                       Department of Biostatistics   Vanderbilt University
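
Frank's description of the kernel can be checked numerically. A minimal sketch, assuming hypothetical continuous data (so no ties and no zeros): the statistic V reported by wilcox.test equals the number of pairs i <= j with x_i + x_j > 0. Note that this count includes the pairs with i = j, whereas the U-statistic wording quoted above uses i neq j.

```r
set.seed(3)
x <- rnorm(25, mean = 0.3)   # hypothetical continuous data: no ties, no zeros

# V as reported by wilcox.test: sum of the ranks of |x| over the positive x.
V <- unname(wilcox.test(x, mu = 0)$statistic)

# Count the pairs (i <= j) whose sum is positive.
s <- outer(x, x, "+")
pairs_pos <- sum(s[upper.tri(s, diag = TRUE)] > 0)

n_pairs <- length(x) * (length(x) + 1) / 2   # total number of pairs i <= j
```

Under H0 roughly half of the `n_pairs` pair sums are expected to be positive, which is why the test is sensitive to where the pairwise averages sit rather than to the median or mean as such.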


Re: Wilcoxon signed rank test and its requirements

David Winsemius



On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

> Is there anything for me?
>
> There is a lot of data, n=2418, but there are also a lot of ties.
> My sample n≈250-300
>

I do not understand why there should be so many ties. You have not
described the measurement process or units (... although you offer a
glimpse, without much background, later).

> I would like to test whether the mean of the sample differs
> significantly from the population mean.

Why? What is the purpose of this investigation? Why should the mean of  
a sample be that important?

>
> The population is distributed as shown in the attached histogram.
> What test should I use? Are there no choices?
>
> This distribution comes from a musical piece and the values are  
> 'tonal distances'.
>
> http://users.utu.fi/attenka/Hist.png

That picture does not offer much insight into the features of that
measurement. It appears to have much more structure than I would
expect for a sample from a smooth unimodal underlying population.

--
David.



Re: Wilcoxon signed rank test and its requirements

Atte Tenkanen
The values come from this kind of process:
The musical composition is segmented into so-called 'pitch-class segments', and these segments are compared with one reference set using a distance function. Only some distance values are possible. These distance values can be averaged over music bars, which produces a smoother distribution and a more readable 'comparison curve' that illustrates the distances from the reference set through the musical piece (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I would prefer to use the original values.

Then I want to pick only some regions from the piece and test whether the values in those regions are higher than the mean of all values.
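
To make the averaging step concrete, here is a rough sketch with made-up data. The bar numbering, the number of segments per bar and the five allowed distance values are all assumptions for illustration, not the real values from the piece.

```r
set.seed(4)
# Hypothetical data: 40 bars with 6 segments each; distances restricted to a few values.
bar      <- rep(1:40, each = 6)
distance <- sample(c(0, 0.25, 0.5, 0.75, 1), length(bar), replace = TRUE)

# One averaged value per bar.
bar_means <- tapply(distance, bar, mean)

n_raw_values <- length(unique(distance))    # at most 5 distinct raw values
n_bar_values <- length(unique(bar_means))   # distinct values after averaging
```

The raw values can only take a handful of levels, which is where the ties come from; the per-bar averages can take many more distinct values.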

Atte



Re: Wilcoxon signed rank test and its requirements

Atte Tenkanen
In reply to this post by David Winsemius

BTW, if there is no test, however weak, that would be suitable for my purpose (because of the ties and the shape of the data), could I proceed this way:

It is also worth comparing different samples taken from the data. Since the mean and sd of the data are available, could I approximate p-values using a z- or t-test, just to compare several different samples?
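
What I have in mind could be sketched like this. The data here are made up (five possible values with arbitrary probabilities, and an arbitrary region of 250 values); only the names `All` and `Sample` follow my first post. The sketch treats the sample as if it were independent draws, which is itself an assumption.

```r
set.seed(5)
# Hypothetical population of 2418 values restricted to a few levels (many ties).
All    <- sample(c(0.1, 0.3, 0.4, 0.6, 0.9), 2418, replace = TRUE,
                 prob = c(0.35, 0.30, 0.20, 0.10, 0.05))
Sample <- All[1000:1249]   # a hypothetical region of 250 consecutive values

mu    <- mean(All)
sigma <- sqrt(mean((All - mu)^2))   # population sd (divisor N, not N - 1)
n     <- length(Sample)

# z statistic for the sample mean against the known population mean and sd.
z           <- (mean(Sample) - mu) / (sigma / sqrt(n))
p_two_sided <- 2 * pnorm(-abs(z))
p_greater   <- pnorm(z, lower.tail = FALSE)
```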

Atte



Re: Wilcoxon signed rank test and its requirements

Joris FA Meys
In reply to this post by Atte Tenkanen
As a remark on your histogram: use fewer breaks! This histogram tells
you nothing. An interesting function is ?density, e.g.:

x <- rnorm(250)
hist(x, freq=FALSE)
lines(density(x), col="red")

See also this presentation, a very nice and short introduction to graphics in R:
http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen <[hidden email]>:
> Is there anything for me?
>
> There is a lot of data, n=2418, but there are also a lot of ties.
> My sample n≈250-300

You should think about the central limit theorem. Actually, you can
just use a t-test to compare means, as with those sample sizes the
sample mean is almost certainly approximately normally distributed.
>
> I would like to test whether the mean of the sample differs significantly from the population mean.
>
According to probability theory, if the null hypothesis is true and you
test at the 5% level, this will happen in 5% of the cases if you repeat
your sampling infinitely. But as David asked: why on earth do you want
to test that?
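
A quick simulation illustrates the 5% figure. This is a sketch under assumptions: the population below is made up (a few possible values, hence many ties, and skewed), samples are drawn with replacement so that the null hypothesis holds exactly, and 2000 repetitions stand in for "infinitely".

```r
set.seed(7)
# Hypothetical skewed, heavily tied population.
All <- sample(c(0.1, 0.3, 0.4, 0.6, 0.9), 2418, replace = TRUE,
              prob = c(0.35, 0.30, 0.20, 0.10, 0.05))

# Repeatedly draw a sample of 250 and test its mean against the population mean.
reject <- replicate(2000, {
  s <- sample(All, 250, replace = TRUE)
  t.test(s, mu = mean(All))$p.value < 0.05
})

rate <- mean(reject)   # proportion of "significant" results under the null
```

The rejection rate should land near 0.05 despite the ties and the skewness, which is the central limit theorem at work for samples of this size.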

cheers
Joris

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
[hidden email]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: Wilcoxon signed rank test and its requirements

Frank Harrell
The central limit theorem doesn't help.  It just addresses type I error,
not power.

Frank



--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University

Frank Harrell
Department of Biostatistics, Vanderbilt University

Re: Wilcoxon signed rank test and its requirements

Joris FA Meys
2010/6/25 Frank E Harrell Jr <[hidden email]>:
> The central limit theorem doesn't help.  It just addresses type I error,
> not power.
>
> Frank

I don't think I stated otherwise. I am aware of the fact that the
Wilcoxon test has an asymptotic relative efficiency greater than 1
compared to the t-test in the case of skewed distributions. Apologies if
I caused more confusion.

The "problem" with the Wilcoxon test is twofold, as far as I understand
these data correctly:
- there are quite a few ties
- the Wilcoxon test assumes under the null that the distributions are
the same, not only the location. The influence of unequal variances
and/or shapes of the distributions is enhanced in the case of unequal
sample sizes.

The central limit theorem ensures that:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified
t-test, both in the case of equal and unequal sample sizes

For these reasons, I would personally use the t-test for comparing two
samples from the described population. Your mileage may vary.
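
As a sketch of what that could look like in R: the "modified t-test" is taken here to mean Welch's test, which is what t.test does by default for two samples. The two samples below are hypothetical, with unequal sizes and many ties.

```r
set.seed(8)
vals <- c(0.1, 0.3, 0.4, 0.6, 0.9)   # hypothetical set of possible distance values
a <- sample(vals, 250, replace = TRUE, prob = c(0.35, 0.30, 0.20, 0.10, 0.05))
b <- sample(vals, 300, replace = TRUE, prob = c(0.20, 0.25, 0.25, 0.20, 0.10))

welch  <- t.test(a, b)                     # var.equal = FALSE is the default: Welch's test
pooled <- t.test(a, b, var.equal = TRUE)   # pooled-variance test, for comparison

welch$parameter    # Welch degrees of freedom (generally not an integer)
pooled$parameter   # n_a + n_b - 2
```

With variances and sample sizes this close the two versions will barely differ; the Welch version matters more when the variances and sample sizes are clearly unequal.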

Cheers
Joris




--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
[hidden email]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
