

Hi all,
I have a distribution and take a sample from it. Then I compare that sample with the mean of the population using the "Wilcoxon signed rank test with continuity correction":

> wilcox.test(Sample, mu = mean(All), alt = "two.sided")

Wilcoxon signed rank test with continuity correction

data: AlphaNoteOnsetDists
V = 63855, p-value = 0.0002093
alternative hypothesis: true location is not equal to 0.4115136

> wilcox.test(Sample, mu = mean(All), alt = "greater")

Wilcoxon signed rank test with continuity correction

data: AlphaNoteOnsetDists
V = 63855, p-value = 0.0001047
alternative hypothesis: true location is greater than 0.4115136

What assumptions are needed for the population?
What can we say according to these results?
The p-value for "less" is 0.999.
Thanks in advance,
Atte
Atte Tenkanen
University of Turku, Finland
Department of Musicology
+35823335278
http://users.utu.fi/attenka/
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen < [hidden email]> wrote:
> What assumptions are needed for the population?

Wikipedia says:
"The Wilcoxon signed-rank test is a _nonparametric_ statistical
hypothesis test for... "
It also talks about the assumptions.

> What can we say according to these results?
> The p-value for "less" is 0.999.

That the p-values for "less" and "greater" seem to sum to one, and that
the p-value for "greater" is half of that for "two.sided". You shouldn't
ask what we can say. You should ask yourself: "What was the question,
and is this test giving me an answer to that question?"
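The relationship between the three p-values can be checked on simulated data. This is a minimal sketch, not Atte's data: `x` is a hypothetical sample of 200 normal draws shifted slightly above `mu = 0`. Because n is at least 50 and there are no ties, `wilcox.test()` uses the normal approximation with continuity correction, which is why the one-sided p-values add up to slightly more than one rather than exactly one.

```r
# Hypothetical data, not from the thread: 200 draws centred slightly above 0.
set.seed(1)
x <- rnorm(200, mean = 0.2)

# Same data and mu, three different alternatives. With n >= 50 and no ties,
# R uses the normal approximation with continuity correction.
p_two  <- wilcox.test(x, mu = 0, alternative = "two.sided")$p.value
p_more <- wilcox.test(x, mu = 0, alternative = "greater")$p.value
p_less <- wilcox.test(x, mu = 0, alternative = "less")$p.value

# The two-sided p-value is twice the smaller one-sided p-value.
print(c(two.sided = p_two, twice_smaller = 2 * min(p_more, p_less)))

# The one-sided p-values sum to about 1; the continuity correction
# pushes the sum slightly above 1.
print(p_more + p_less)
```

For small samples without ties R computes exact p-values instead, and the two one-sided values then add up to one plus the probability of the observed value of V.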
Cheers
Joris

Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
[hidden email]

Disclaimer : http://helpdesk.ugent.be/emaildisclaimer.php


Thanks. What I should have asked is this:
how do you test whether the data are symmetric enough?
If they are not, is it OK to use some data transformation?
I ask because of this statement:
"The Wilcoxon signed rank test does not assume that the data are sampled from a Gaussian distribution. However it does assume that the data are distributed symmetrically around the median. If the distribution is asymmetrical, the P value will not tell you much about whether the median is different than the hypothetical value."


PS.
Maybe I can somehow transform the data and check it, for example, using the skewness function of the timeDate package?
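A rough symmetry check along the lines of the PS can be done in base R. The sketch assumes the moment coefficient of skewness, g1 = m3 / m2^(3/2); `sample_skewness` is a hypothetical helper whose definition may differ in detail from `timeDate::skewness()`. The simulated log-normal data are an assumption, used only to show how a log transformation can remove right skew.

```r
# Hypothetical helper (not timeDate::skewness): moment coefficient of
# skewness g1 = m3 / m2^(3/2), using central moments with divisor n.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3 / 2)
}

set.seed(2010)
skewed <- rlnorm(10000)  # simulated right-skewed data (assumption, not Atte's data)

g_raw <- sample_skewness(skewed)
g_log <- sample_skewness(log(skewed))  # log of log-normal data is normal, hence symmetric

print(c(raw = g_raw, logged = g_log))
```

Skewness is only a descriptive summary; there is no universal cut-off for "symmetric enough". Also note that after a transformation the signed rank test compares log(x) with log(mu), so any conclusion is about the log scale: the signed rank test is not invariant to such transformations, unlike the sign test.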


One way of looking at it is to do a sign test after subtracting
the mean. For a symmetrical distribution the mean equals the median, so
X - mean(X) has median 0 and you expect about as many values above zero
as below. There is a sign test somewhere in one of the packages, but it's
easily done using binom.test as well:
> set.seed(12345)
> x1 <- rnorm(100)
> x2 <- rpois(100, 2)
> binom.test((sum(x1 - mean(x1) > 0)), length(x1))

Exact binomial test

data: (sum(x1 - mean(x1) > 0)) and length(x1)
number of successes = 56, number of trials = 100, p-value = 0.2713
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.4571875 0.6591640
sample estimates:
probability of success
0.56

> binom.test((sum(x2 - mean(x2) > 0)), length(x2))

Exact binomial test

data: (sum(x2 - mean(x2) > 0)) and length(x2)
number of successes = 37, number of trials = 100, p-value = 0.01203
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.2755666 0.4723516
sample estimates:
probability of success
0.37
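The `binom.test()` approach above can be wrapped in a small helper. This is a sketch under assumptions, not code from the thread: the name `sign_test_about` is hypothetical, and it drops values exactly equal to the centre, which is the usual sign-test convention and can matter for discrete data such as Poisson counts.

```r
# Hypothetical helper: sign test of whether values fall equally often
# above and below `center` (by default the sample mean).
sign_test_about <- function(x, center = mean(x)) {
  d <- x - center
  d <- d[d != 0]  # drop values equal to the centre, the usual sign-test convention
  binom.test(sum(d > 0), length(d), p = 0.5)
}

set.seed(12345)
x1 <- rnorm(100)     # symmetric: mean and median coincide
x2 <- rpois(100, 2)  # right-skewed counts: fewer values above the mean
sign_test_about(x1)
sign_test_about(x2)
```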
Cheers
Joris
On Thu, Jun 24, 2010 at 4:16 AM, Atte Tenkanen < [hidden email]> wrote:
> PS.
>
> Maybe I can somehow transform the data and check it, for example, using the skewness function of the timeDate package?


There is a potentially useful remark from Peter Dalgaard at
http://www.mail-archive.com/r-help@.../msg86359.html :
Summarising:
"[The Wilcoxon paired rank sign test assumes symmetry]
...of differences, and under the null hypothesis. This is usually
rather uncontroversial. "
My rider to this: it's uncontroversial because differences between
random samples from the same asymmetric distribution would form a
symmetric distribution of differences, and the null for the Wilcoxon test is
essentially that the distributions are the same. Symmetry of differences
under the null follows.
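The rider's argument can be illustrated by simulation. In this sketch the exponential distribution stands in for "an asymmetric distribution" (an assumption), and `sample_skewness` is a hypothetical helper. If x and y are independent draws from the same distribution, x - y and y - x have the same distribution, so the differences are symmetric about 0 however skewed x itself is.

```r
# Hypothetical helper: moment coefficient of skewness, m3 / m2^(3/2).
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

set.seed(1023)
n <- 10000
x <- rexp(n)  # two independent samples from the same skewed
y <- rexp(n)  # distribution (exponential, an assumption)
d <- x - y    # differences between the samples

# x is strongly right-skewed; the differences are close to symmetric,
# because x - y and y - x have the same distribution when x and y are i.i.d.
print(c(skew_x = sample_skewness(x), skew_diff = sample_skewness(d)))

# Symmetry about 0 also means roughly half the differences are positive.
print(mean(d > 0))
```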
BUT the corollary is that location might not be the only thing that can
cause a Wilcoxon test to show a significant difference.
set.seed(1023)
x <- rlnorm(50)
z <- rlnorm(50, sdlog = 3)
z <- z - mean(z) + mean(x)
mean(x)
mean(z)
# Same mean...
wilcox.test(x, z)
# Strongly significant test result.
# (With two unpaired samples, wilcox.test() runs the rank-sum form of the test.)
# Not a perfect example, as the test relates to true means, not data set means.
# But very different skew and scale will make for a very significant
# test result, as will very different means.


On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
> Thanks. What I have had to ask is that
>
> how do you test that the data is symmetric enough?
> If it is not, is it ok to use some data transformation?
>
> when it is said:
>
> "The Wilcoxon signed rank test does not assume that the data are
> sampled from a Gaussian distribution. However it does assume that
> the data are distributed symmetrically around the median. If the
> distribution is asymmetrical, the P value will not tell you much
> about whether the median is different than the hypothetical value."
You are being misled. Simply finding a statement on a statistics
software website, even one as reputable as GraphPad (???), does not
mean that it is necessarily true. My understanding (confirmed by
reviewing "Nonparametric statistical methods for complete and censored
data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon signed-rank
test does not require that the underlying distributions be
symmetric. The above quotation is highly inaccurate.

David.


On 06/24/2010 12:40 PM, David Winsemius wrote:
> You are being misled. Simply finding a statement on a statistics
> software website, even one as reputable as Graphpad (???), does not mean
> that it is necessarily true. My understanding (confirmed reviewing
> "Nonparametric statistical methods for complete and censored data" by M.
> M. Desu, Damaraju Raghavarao, is that the Wilcoxon signed-rank test does
> not require that the underlying distributions be symmetric. The above
> quotation is highly inaccurate.
>
To add to what David and others have said, look at the kernel that the
U-statistic associated with the WSR test uses: the indicator (0/1) of
x_i + x_j > 0. So the WSR test tests H0: p = 0.5, where p = the probability
that the average of a randomly chosen pair of values is positive. [If there
are ties this probably needs to be worded as P[x_i + x_j > 0] = P[x_i + x_j < 0],
i ≠ j.]
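Frank's description can be checked numerically. When there are no zeros and no tied absolute values, the statistic V reported by `wilcox.test()` equals the number of pairs (i, j) with i ≤ j, each value paired with itself included, for which x_i + x_j > 0. The simulated `x` is an assumption for illustration.

```r
set.seed(42)
x <- rnorm(30, mean = 0.3)  # simulated data (assumption); continuous, so no ties or zeros

# Signed-rank statistic from its definition: sum of the ranks of |x| over positive x
v_ranks <- sum(rank(abs(x))[x > 0])

# Same quantity counted as pairs i <= j with x_i + x_j > 0
pair_sums <- outer(x, x, "+")
v_pairs <- sum(pair_sums[upper.tri(pair_sums, diag = TRUE)] > 0)

# Statistic reported by R (named "V")
v_r <- unname(wilcox.test(x)$statistic)

print(c(ranks = v_ranks, pairs = v_pairs, wilcox = v_r))

# Estimated probability that the sum (equivalently the average) of two
# distinct values is positive; under H0 this should be near 0.5
p_hat <- mean(pair_sums[upper.tri(pair_sums)] > 0)
print(p_hat)
```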
Frank

Frank E Harrell Jr Professor and Chairman School of Medicine
Department of Biostatistics Vanderbilt University


> You are being misled. Simply finding a statement on a statistics
> software website, even one as reputable as Graphpad (???), does not
> mean that it is necessarily true. My understanding (confirmed
> reviewing "Nonparametric statistical methods for complete and censored
> data" by M. M. Desu, Damaraju Raghavarao, is that the Wilcoxon signed
> rank test does not require that the underlying distributions be
> symmetric. The above quotation is highly inaccurate.
>
> David.
Thanks. Unfortunately, I can't follow the reference at all, but should I read this as meaning that I can be carefree as far as the underlying distribution is concerned?
Is there any other authoritative reference where it is stated explicitly that the "test does not require that the underlying distributions be symmetric or normal"?
Atte


I do agree that one should not rely solely on sources like Wikipedia
and GraphPad, although they contain a lot of valuable information.
That said, it is not too difficult to illustrate why, in the case of
the one-sample signed rank test, the differences should not be too far
from symmetrical. It just needs some reflection on how the
statistic is calculated. If you have an asymmetrical (say, right-skewed)
distribution and you test against the median or mean, you get many
small differences with a negative sign and large differences with a
positive sign. Hence the sum of ranks for one side will be higher than
for the other, eventually leading to a significant result.
An extreme example:
> set.seed(100)
> y <- rnorm(100, 1, 2)^2
> wilcox.test(y, mu = median(y))

Wilcoxon signed rank test with continuity correction

data: y
V = 3240.5, p-value = 0.01396
alternative hypothesis: true location is not equal to 1.829867

> wilcox.test(y, mu = mean(y))

Wilcoxon signed rank test with continuity correction

data: y
V = 1763, p-value = 0.008837
alternative hypothesis: true location is not equal to 5.137409
Which brings us to the question of what location is actually tested in
the Wilcoxon test. For the measure of location to be the mean (or
median), one has to assume that the distribution of the differences is
rather symmetrical, which implies your data have to be distributed
somewhat symmetrically. The test is robust against violations of this
implicit assumption, but in more extreme cases skewness does matter.
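How much skewness matters can be explored with a small simulation. In this sketch, which is not from the thread, exponential data (true median log(2)) and normal data (true median 0) are assumptions, and the helper `reject_rate` is hypothetical. It records how often `wilcox.test()` rejects at the 5% level when `mu` is set to the true median. For the normal data the rate should stay close to 5%; for the exponential data it should be much higher, consistent with the signed rank test targeting the pseudomedian (the median of (X1 + X2)/2) rather than the median.

```r
set.seed(2010)
n_rep <- 1000  # number of simulated data sets per scenario
n <- 100       # sample size of each data set

# Hypothetical helper: share of simulated data sets rejected at the 5% level
reject_rate <- function(draw, true_median) {
  p <- replicate(n_rep, wilcox.test(draw(n), mu = true_median)$p.value)
  mean(p < 0.05)
}

# Symmetric case: standard normal, true median 0
rate_normal <- reject_rate(function(k) rnorm(k), 0)
# Skewed case: exponential with rate 1, true median log(2)
rate_exp <- reject_rate(function(k) rexp(k), log(2))

print(c(normal = rate_normal, exponential = rate_exp))
```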
Cheers
Joris


On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:
> I do agree that one should not trust solely on sources like wikipedia
> and graphpad, although they contain a lot of valuable information.
>
> This said, it is not too difficult to illustrate why, in the case of
> the one-sample signed rank test,
That is a key point. I was assuming that you were using the paired-sample
version of the WSRT, and I may have been misleading the OP. For
the one-sample situation, the assumption of symmetry is needed, but for
the paired-sample version of the test, the location shift becomes
the tested hypothesis, and no assumptions about the form of the
hypothesis are made except that they be the same. Any consideration of
median or mean (which will be the same in the case of symmetric
distributions) gets lost in the paired test case.
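The paired-versus-one-sample distinction can be seen directly in R: with `paired = TRUE`, `wilcox.test()` works on the differences, so it gives the same statistic and p-value as the one-sample test on `after - before`. The data here are simulated (an assumption): `before` is skewed, but the differences are 0.3 plus normal noise and therefore symmetric, which is the distribution the symmetry question is really about.

```r
set.seed(7)
before <- rexp(40)                            # simulated skewed measurements (assumption)
after  <- before + 0.3 + rnorm(40, sd = 0.5)  # same units measured again, shifted by 0.3

paired     <- wilcox.test(after, before, paired = TRUE)
one_sample <- wilcox.test(after - before)

# The paired test is the one-sample test applied to the differences.
print(c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic)))
print(c(paired = paired$p.value, one_sample = one_sample$p.value))
```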

David.


On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius
< [hidden email]> wrote:
> That is a key point. I was assuming that you were using the paired sample
> version of the WSRT and I may have been misleading the OP. For the
> one-sample situation, the assumption of symmetry is needed but for the
> paired sampling version of the test, the location shift becomes the tested
> hypothesis, and no assumptions about the form of the hypothesis are made
> except that they be the same.
I believe you mean the form of the distributions. The assumption that
the distributions of both samples are the same (or similar; it is a
robust test) implies that the differences x_i - y_i are more or less
symmetrically distributed. The key point here is that we're not talking
about the distribution of the populations/samples (as done in the OP) but
about the distribution of the differences. I may not have been clear
enough on that one.
Cheers
Joris
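Joris's distinction between the one-sample and the paired case can be checked with a small simulation. This is a sketch, not part of the original message: the exponential distribution, the 30 observations per data set and the 2000 replications are arbitrary assumptions. It contrasts a one-sample signed rank test against the true median of a skewed distribution with a paired test in which both samples share the same skewed distribution, so that the differences x_i - y_i are symmetric about 0.

```r
# Illustrative simulation; Exp(1) is an arbitrary right-skewed choice
set.seed(2010)
n_rep <- 2000   # number of simulated data sets
n     <- 30     # observations (or pairs) per data set

# One-sample test against the TRUE median of Exp(1), log(2).
# The data are asymmetric, so "location = median" is not what the test checks.
p_one <- replicate(n_rep, wilcox.test(rexp(n), mu = log(2))$p.value)

# Paired test: x and y share the SAME skewed distribution, so the
# differences x - y are symmetric about 0 and the paired H0 is true.
p_paired <- replicate(n_rep, {
  x <- rexp(n)
  y <- rexp(n)
  wilcox.test(x, y, paired = TRUE)$p.value
})

# Proportion of rejections at the 5% level
rates <- c(one_sample = mean(p_one < 0.05), paired = mean(p_paired < 0.05))
rates
```

If the reasoning holds, the paired rate should stay near the nominal 5%, while the one-sample rate should exceed it, because for an asymmetric distribution the signed rank test targets the pseudo-median rather than the median.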
> Any consideration of median or mean (which
> will be the same in the case of symmetric distributions) gets lost in the
> paired test case.

Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
[hidden email]

Disclaimer : http://helpdesk.ugent.be/emaildisclaimer.php


On Jun 24, 2010, at 6:42 PM, Joris Meys wrote:
> I believe you mean the form of the distributions. The assumption that
> the distributions of both samples are the same (or similar, it is a
> robust test) implies that the differences x_i - y_i are more or less
> symmetrically distributed. Key point here that we're not talking about
> the distribution of the populations/samples (as done in the OP) but
> about the distribution of the difference. I may not have been clear
> enough on that one.
What I meant about different hypotheses was that in the single-sample
case the H0 was mean (or median) = mu_pop, and in the paired two-sample
case the H0 was mean(distr_A_i - distr_B_i) = 0. And yes, I did miss the
OP's point. My apologies.

David.
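To make the two null hypotheses concrete, the sketch below uses hypothetical data A and B, invented for illustration. It shows that R's paired test is the one-sample test applied to the differences A - B with mu = 0, whereas the OP's single-sample situation compares one sample with a fixed value mu_pop.

```r
# Hypothetical paired measurements, invented for illustration
set.seed(7)
A <- rnorm(40, mean = 10)
B <- A + rnorm(40, mean = 0.3)

# Paired two-sample test: H0 is that the differences A_i - B_i are centred at 0
p_paired <- wilcox.test(A, B, paired = TRUE)$p.value

# The same test written as a one-sample test on the differences
p_diff <- wilcox.test(A - B, mu = 0)$p.value

# The OP's single-sample situation: one sample against a fixed value mu_pop
mu_pop   <- 10
p_single <- wilcox.test(A, mu = mu_pop)$p.value

c(paired = p_paired, on_differences = p_diff, single_sample = p_single)
```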


Is there anything for me?
There is a lot of data, n=2418, but there are also a lot of ties.
My sample size is n≈250-300.
I would like to test whether the mean of the sample differs significantly from the population mean.
The histogram of the population looks like the attached one. What test should I use? Are there no choices?
This distribution comes from a musical piece, and the values are 'tonal distances'.
http://users.utu.fi/attenka/Hist.png
Atte
> To add to what David and others have said, look at the kernel that the
> U-statistic associated with the WSR test uses: the indicator (0/1) of
> x_i + x_j > 0. So WSR tests H0: p = 0.5 where p = the probability that the
> average of a randomly chosen pair of values is positive. [If there are
> ties this probably needs to be worded as P[x_i + x_j > 0] = P[x_i + x_j < 0],
> i ≠ j.]
>
> Frank
>
> 
> Frank E Harrell Jr Professor and Chairman School of Medicine
> Department of Biostatistics Vanderbilt University
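Frank's kernel description can be checked numerically. The sketch below uses hypothetical continuous data (so no ties and no zero differences) and counts the pairs whose average is positive. Counting pairs with i <= j (diagonal included) reproduces the V statistic reported by wilcox.test(); the i ≠ j version in Frank's note differs from it only by the number of positive values. p_hat is the sample estimate of the probability p in his H0.

```r
# Hypothetical continuous data, so there are no ties and no zero differences
set.seed(1)
x  <- rnorm(30, mean = 0.3)
mu <- 0                                   # hypothesised location
d  <- x - mu

# Signed rank statistic: sum of the ranks of |d| belonging to positive d
V_ranks <- sum(rank(abs(d))[d > 0])

# Kernel view: indicator that the pair average (d_i + d_j)/2 is positive
pos_pair <- outer(d, d, "+") > 0
V_kernel <- sum(pos_pair[upper.tri(pos_pair, diag = TRUE)])   # pairs i <= j

# Statistic reported by R's wilcox.test
V_test <- unname(wilcox.test(x, mu = mu)$statistic)

# Estimate of p = P(pair average > 0) over distinct pairs i < j
p_hat <- mean(pos_pair[upper.tri(pos_pair)])

c(V_ranks = V_ranks, V_kernel = V_kernel, V_test = V_test, p_hat = p_hat)
```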


On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:
> Is there anything for me?
>
> There is a lot of data, n=2418, but there are also a lot of ties.
> My sample n≈250-300
>
I do not understand why there should be so many ties. You have not
described the measurement process or units (... although you offer a
glimpse without much background later).
> i would like to test, whether the mean of the sample differ
> significantly from the population mean.
Why? What is the purpose of this investigation? Why should the mean of
a sample be that important?
>
> The histogram of the population looks like in attached histogram,
> what test should I use? No choices?
>
> This distribution comes from a musical piece and the values are
> 'tonal distances'.
>
> http://users.utu.fi/attenka/Hist.png
That picture does not offer much insight into the features of that
measurement. It appears to have much more structure than I would
expect for a sample from a smooth unimodal underlying population.

David.


The values come from this kind of process:
The musical composition is segmented into so-called 'pitch-class segments', and these segments are compared with one reference set using a distance function. Only some distance values are possible. These distance values can be averaged over musical bars, which produces a smoother distribution, and the resulting 'comparison curve', which illustrates the distances to the reference set throughout the piece, is more readable (see e.g. http://users.utu.fi/attenka/with6.jpg), but I would prefer to use the original values.
Then I want to pick only some regions of the piece and test whether the values in those regions are higher than the mean of all values.
Atte
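A minimal sketch of the workflow described above, with everything hypothetical: the vectors dist (one distance per segment) and bar (bar number per segment), the grid of possible distance values, and the chosen region. It averages over bars with tapply() to get the smoother curve, then runs one-sided tests of the region's original values against the mean of all values. Both tests treat the region's values as an independent random sample, which contiguous segments of a piece may not be.

```r
# Hypothetical data: one distance value per pitch-class segment, each segment
# tagged with its bar number. Names, sizes and values are illustrative only.
set.seed(3)
n_seg <- 2418
bar   <- sort(sample(1:400, n_seg, replace = TRUE))            # bar of each segment
dist  <- sample(seq(0, 1, by = 0.125), n_seg, replace = TRUE)  # few possible values -> ties

# Averaging over bars gives the smoother 'comparison curve'
bar_means <- tapply(dist, bar, mean)

# A hypothetical region of the piece: bars 100 to 150
region <- dist[bar >= 100 & bar <= 150]

# One-sided tests: are the region's original values higher than the overall mean?
p_t <- t.test(region, mu = mean(dist), alternative = "greater")$p.value
p_w <- wilcox.test(region, mu = mean(dist), alternative = "greater")$p.value
c(n_region = length(region), t_test = p_t, wilcoxon = p_w)
```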


BTW, if there is no reasonably powerful test that would be suitable for my purpose (because of the ties and the shape of the data), could I proceed this way:
It is also worth comparing different samples taken from the data. Since the mean and sd of the data are available, could I approximate p-values using a z- or t-test, just to compare several different samples?
Atte
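The idea above can be sketched as a z-test that uses the known population mean and standard deviation. Everything here is an assumption for illustration: the simulated population All standing in for the 2418 values, the sample of 275, and the hypothetical helper z_test_vs_population(). Because the sample is a sizeable fraction (about 11%) of a finite population, the sketch applies the standard finite population correction sqrt((N - n)/(N - 1)) to the standard error; this presumes simple random sampling without replacement, which contiguous regions of a piece may not satisfy.

```r
# Hypothetical stand-in for the 2418 'tonal distances': a few discrete levels,
# hence many ties. The real data are not reproduced here.
set.seed(42)
All <- sample(c(0, 0.25, 0.5, 0.75, 1), 2418, replace = TRUE,
              prob = c(0.10, 0.30, 0.30, 0.20, 0.10))
Sample <- sample(All, 275)   # simple random sample without replacement

# Hypothetical helper: z-test of a sample mean against a known finite population
z_test_vs_population <- function(x, pop,
                                 alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  N  <- length(pop)
  n  <- length(x)
  mu <- mean(pop)
  sigma <- sqrt(mean((pop - mu)^2))   # population sd (divisor N, not N - 1)
  fpc   <- sqrt((N - n) / (N - 1))    # finite population correction
  se    <- sigma / sqrt(n) * fpc
  z     <- (mean(x) - mu) / se
  p <- switch(alternative,
              two.sided = 2 * pnorm(-abs(z)),
              greater   = pnorm(z, lower.tail = FALSE),
              less      = pnorm(z))
  list(z = z, se = se, p.value = p)
}

res <- z_test_vs_population(Sample, All, alternative = "greater")
res
```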


As a remark on your histogram: use fewer breaks! This histogram tells
you nothing. An interesting function is ?density, e.g.:
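x <- rnorm(250)                  # example data
hist(x, freq = FALSE)            # density scale, so the curve can be overlaid
lines(density(x), col = "red")   # kernel density estimate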
See also this presentation, a very nice and short introduction to graphics in R:
http://csg.sph.umich.edu/docs/R/graphics1.pdf

2010/6/25 Atte Tenkanen < [hidden email]>:
> Is there anything for me?
>
> There is a lot of data, n=2418, but there are also a lot of ties.
> My sample n≈250-300
You should think about the central limit theorem. Actually, you can
just use a t-test to compare means, as with those sample sizes the
sample mean is almost certainly approximately normally distributed.
>
> i would like to test, whether the mean of the sample differ significantly from the population mean.
>
According to probability theory, this will happen in 5% of the cases
(testing at the 5% level) if you repeat your sampling infinitely often.
But as David asked: why on earth do you want to test that?
cheers
Joris
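The central-limit argument can be checked by simulation. The sketch assumes a hypothetical discrete, right-skewed population with many ties (values and probabilities are invented) and draws repeated samples of n = 275. It reports the type I error of the one-sample t-test and compares the spread of the sample means with the theoretical standard error. It only addresses type I error, not power.

```r
# Hypothetical discrete, right-skewed population with many ties (invented values)
vals <- c(0, 1, 2, 5, 10)
prob <- c(0.40, 0.30, 0.15, 0.10, 0.05)
true_mean <- sum(vals * prob)
true_sd   <- sqrt(sum(prob * (vals - true_mean)^2))

set.seed(99)
n     <- 275    # roughly the sample size mentioned in the thread
n_rep <- 2000   # number of simulated samples
sample_means <- numeric(n_rep)
p_t          <- numeric(n_rep)
for (r in seq_len(n_rep)) {
  s <- sample(vals, n, replace = TRUE, prob = prob)
  sample_means[r] <- mean(s)
  p_t[r] <- t.test(s, mu = true_mean)$p.value   # H0 is true here
}

c(type_I_error = mean(p_t < 0.05),   # should be near 0.05
  sd_of_means  = sd(sample_means),
  theory_se    = true_sd / sqrt(n))
```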



The central limit theorem doesn't help. It just addresses type I error,
not power.
Frank
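The power point can be illustrated with a simulation. In this sketch, which is illustrative only, two samples of 30 come from the same right-skewed distribution (exponential) with a hypothetical location shift of 0.5. The proportion of rejections at the 5% level estimates the power of the Welch t-test and of the Wilcoxon rank sum test under these assumptions.

```r
# Illustrative power simulation; distribution, sizes and shift are assumptions
set.seed(5)
n_rep <- 2000
n1 <- 30; n2 <- 30
shift <- 0.5   # hypothetical location shift between the groups

rejections <- replicate(n_rep, {
  x <- rexp(n1)           # right-skewed
  y <- rexp(n2) + shift   # same shape, shifted
  c(welch_t  = t.test(x, y)$p.value < 0.05,
    wilcoxon = wilcox.test(x, y)$p.value < 0.05)
})

# Estimated power of each test (proportion of rejections)
power <- rowMeans(rejections)
power
```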


2010/6/25 Frank E Harrell Jr < [hidden email]>:
> The central limit theorem doesn't help. It just addresses type I error,
> not power.
>
> Frank
I don't think I stated otherwise. I am aware of the fact that the
Wilcoxon test has an asymptotic relative efficiency greater than 1 compared
to the t-test in the case of skewed distributions. Apologies if I caused
more confusion.
The "problem" with the Wilcoxon test is twofold, as far as I understood
this data correctly:
- there are quite a few ties
- the Wilcoxon test assumes under the null that the distributions are the
same, not only the location. The influence of unequal variances and/or
shapes of the distribution is enhanced in the case of unequal sample
sizes.
Thanks to the central limit theorem:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified
(Welch) t-test, both in the case of equal and unequal sample sizes
For these reasons, I would personally use the t-test for comparing two
samples from the described population. Your mileage may vary.
Cheers
Joris
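The recommendation above can be sketched in R with hypothetical tied data of unequal sample sizes (275 and 60; values and probabilities are invented). t.test() uses the Welch correction for unequal variances by default (var.equal = FALSE); the pooled Student version and the Wilcoxon rank sum test are shown for comparison. Because of the ties and the sample sizes, wilcox.test() uses its normal approximation with tie and continuity corrections here.

```r
# Hypothetical tied data with unequal sample sizes (values and probabilities invented)
set.seed(11)
vals <- c(0, 0.25, 0.5, 0.75, 1)   # few possible values -> many ties
x <- sample(vals, 275, replace = TRUE, prob = c(0.10, 0.20, 0.30, 0.25, 0.15))
y <- sample(vals, 60,  replace = TRUE, prob = c(0.30, 0.30, 0.20, 0.10, 0.10))

welch    <- t.test(x, y)                    # var.equal = FALSE (Welch) is the default
student  <- t.test(x, y, var.equal = TRUE)  # classical pooled-variance t-test
rank_sum <- wilcox.test(x, y)               # normal approximation, tie and continuity corrections

c(welch = welch$p.value, student = student$p.value, wilcoxon = rank_sum$p.value)
welch$method      # names the Welch variant
welch$parameter   # Welch-Satterthwaite degrees of freedom (generally non-integer)
```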

