

Dear fellow R users,
Keywords: KruskalWallis, PostHoc, pairwise comparisons, NemenyiDamicoWolfeDunn test, coin package, oneway_test
I am using the "oneway_test" function in the R package "coin" and I am obtaining results which I cannot believe are accurate. I do not wish to waste anyone's time and so if the following problem is rather trivial, I apologize, however I could not seem to resolve my problem with an online search and I am fresh out of ideas.
I have carried out a KruskalWallis test to compare breeding strategy variance of my study organisms (rank data, therefore nonparametric, in oder of increasing degree of "terrestrialization", in this case: adaptations to breeding on land as opposed to in aquatic habitats) between habitat groups (I, II and III). Subsequently I would like to do a "PostHoc test" or in other words a set of corrected pairwise comparisons to test the relationship between individual groups. For this I would like to use the NemenyiDamicoWolfeDunn test in the "coin" package (aka oneway_test). However, when I apply it to my data, I receive highly significant differences between all of my groups, which when looking at my data, cannot be true. I have posted one of my command blocks below containing my data set as well as the script adapted from the coin package manual.
library(coin)
library(multcomp)
###this is my data:
mydata < data.frame(breeding = c(4,4,4,4,1,1,1,1,8,8,8,8,9,7,7,4,4,4,6,1,1,1,1,1,1,4,1,4,4,1,1,4,4,1,1,1,1,6,6,6,6,6,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,11,7,1,1,1,1,1,4,4,4,4,1,1,1,1,8,8,8,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,7,7,7,7,7,7,7,7,7,7,5,5,5,5,5,4,4,6,6,6,6,6,6,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,3,3,3,1,4,1,7,7,7,7,7,7,12,12,12,12,12,12,1,4,7,4,1,1,1,1,1,1,1,1,4),
habitat = factor(c(rep("I", 68), rep("II", 89),
rep("III", 12))))
###box plot to visualize data
boxplot(breeding~habitat,data=mydata,main="Boxplot of breeding strategies",ylab="breeding strategy",col="gold",lty=1)
### KruskalWallis test
kruskal_test(breeding ~ habitat, data = mydata, distribution = approximate(B = 9999))
### NemenyiDamicoWolfeDunn test (joint ranking)
NDWD < oneway_test(breeding ~ habitat, data = mydata,
ytrafo = function(data) trafo(data, numeric_trafo = rank),
xtrafo = function(data) trafo(data, factor_trafo = function(x)
model.matrix(~x  1) %*% t(contrMat(table(x), "Tukey"))),
teststat = "max", distribution = approximate(B = 900000))
### global pvalue
print(pvalue(NDWD))
### sitesbysite p values at alpha = 0.01 (page 244)
print(pvalue(NDWD, method = "singlestep"))
I should be detecting some nonsignificance between groups I and III at least, but the test comes back with extremely low pvalues. Where am I going wrong?
Thank you very much for your help.
With kind regards
Christoph
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


On Jan 09, 2012 at 11:48am Christoph Liedtke wrote:
> I should be detecting some nonsignificance between groups I and III at least, but the test comes back with
> extremely low pvalues. Where am I going wrong?
Nowhere, I think. This does seem to be an error in coin. You should send your example to the maintainers of the package. Apart from the visual you have provided, other reasons for thinking that this is an error are the following.
First, if you redo the analysis excluding habitat II, then the contrast is not significant, as expected. Secondly, if you repeat the full analysis using package nparcomp then you get the results you are expecting, based on the graphical representation of the data. See the examples below.
## drop habitat == II
NDWD < oneway_test(breeding ~ habitat, data = droplevels(subset(mydata, habitat != "II")),
ytrafo = function(data) trafo(data, numeric_trafo = rank),
xtrafo = function(data) trafo(data, factor_trafo = function(x)
model.matrix(~x  1) %*% t(contrMat(table(x), "Tukey"))),
teststat = "max", distribution = approximate(B = 900000))
print(NDWD)
print(pvalue(NDWD, method = "singlestep"))
## use nparcomp
library(nparcomp)
npar < nparcomp(breeding ~ habitat, data = mydata, type = "Tukey")
npar
Regards, Mark.
Mark Difford (Ph.D.)
Research Associate
Botany Department
Nelson Mandela Metropolitan University
Port Elizabeth, South Africa


The usual way to pose questions about "unexpected behavior" is to look
up the package maintainer's name and email address in the DESCRIPTION
file of the package which is accessed with the help function:
help(package=coin)
You see:
Maintainer: Torsten Hothorn < [hidden email]>

David
On Jan 9, 2012, at 4:48 AM, Christoph Liedtke wrote:
> Dear fellow R users,
>
> Keywords: KruskalWallis, PostHoc, pairwise comparisons, Nemenyi
> DamicoWolfeDunn test, coin package, oneway_test
>
> I am using the "oneway_test" function in the R package "coin" and I
> am obtaining results which I cannot believe are accurate. I do not
> wish to waste anyone's time and so if the following problem is
> rather trivial, I apologize, however I could not seem to resolve my
> problem with an online search and I am fresh out of ideas.
>
> I have carried out a KruskalWallis test to compare breeding
> strategy variance of my study organisms (rank data, therefore non
> parametric, in oder of increasing degree of "terrestrialization", in
> this case: adaptations to breeding on land as opposed to in aquatic
> habitats) between habitat groups (I, II and III). Subsequently I
> would like to do a "PostHoc test" or in other words a set of
> corrected pairwise comparisons to test the relationship between
> individual groups. For this I would like to use the NemenyiDamico
> WolfeDunn test in the "coin" package (aka oneway_test). However,
> when I apply it to my data, I receive highly significant differences
> between all of my groups, which when looking at my data, cannot be
> true. I have posted one of my command blocks below containing my
> data set as well as the script adapted from the coin package manual.
>
> library(coin)
> library(multcomp)
>
> ###this is my data:
> mydata < data.frame(breeding =
> c(4,4,4,4,1,1,1,1,8,8,8,8,9,7,7,4,4,4,6,1,1,1,1,1,1,4,1,4,4,1,1,4,4,1,1,1,1,6,6,6,6,6,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,11,7,1,1,1,1,1,4,4,4,4,1,1,1,1,8,8,8,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,7,7,7,7,7,7,7,7,7,7,5,5,5,5,5,4,4,6,6,6,6,6,6,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,3,3,3,1,4,1,7,7,7,7,7,7,12,12,12,12,12,12,1,4,7,4,1,1,1,1,1,1,1,1,4),
> habitat = factor(c(rep("I", 68), rep("II", 89),
> rep("III", 12))))
>
> ###box plot to visualize data
> boxplot(breeding~habitat,data=mydata,main="Boxplot of breeding
> strategies",ylab="breeding strategy",col="gold",lty=1)
>
>
> ### KruskalWallis test
> kruskal_test(breeding ~ habitat, data = mydata, distribution =
> approximate(B = 9999))
>
> ### NemenyiDamicoWolfeDunn test (joint ranking)
> NDWD < oneway_test(breeding ~ habitat, data = mydata,
> ytrafo = function(data) trafo(data, numeric_trafo = rank),
> xtrafo = function(data) trafo(data, factor_trafo = function(x)
> model.matrix(~x  1) %*% t(contrMat(table(x), "Tukey"))),
> teststat = "max", distribution = approximate(B = 900000))
>
> ### global pvalue
> print(pvalue(NDWD))
>
> ### sitesbysite p values at alpha = 0.01 (page 244)
> print(pvalue(NDWD, method = "singlestep"))
>
>
>
>
>
> I should be detecting some nonsignificance between groups I and III
> at least, but the test comes back with extremely low pvalues.
> Where am I going wrong?
>
> Thank you very much for your help.
>
> With kind regards
>
> Christoph
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
David Winsemius, MD
West Hartford, CT
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

