Hello,
I'm having some issues grouping cases for some Mann-Whitney U tests I'm attempting to run. I'm willing to use wilcox.test if it'll work; I've also tried wilcox_test() from the "coin" package. Here's the deal: for each column (A through H), I would like to run the two-sample independent test, comparing Group 5 (CD8.14 through CD8.17) to Group 6 (CD8.18 through CD8.21). So, for A, test Group 5 agains Group 6, for B, test Group 5 against Group 6, and so on. I'm sure this is basic, but thank you in advance for any help you can provide. All the best, David [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Can you describe how your data is organized. It is clear there are eight
columns, but it is not clear how the groups are represented, a Group column or do the groups have to be assembled from information in another column (a column with CD8.14, etc)? Create a small version of the data and use dput() to create a readable version that can be pasted into a plain text email (don't use html) usually results in a faster response. ---------------------------------------------- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 > -----Original Message----- > From: [hidden email] [mailto:r-help-bounces@r- > project.org] On Behalf Of David Chertudi > Sent: Friday, July 06, 2012 9:15 AM > To: [hidden email] > Subject: [R] Mann-Whitney by group > > Hello, > > I'm having some issues grouping cases for some Mann-Whitney U tests I'm > attempting to run. I'm willing to use wilcox.test if it'll work; I've > also tried wilcox_test() from the "coin" package. Here's the deal: for > each column (A through H), I would like to run the two-sample > independent test, comparing Group 5 (CD8.14 through CD8.17) to Group 6 > (CD8.18 through CD8.21). So, for A, test Group 5 agains Group 6, for B, > test Group 5 against Group 6, and so on. I'm sure this is basic, but > thank you in advance for any help you can provide. > > All the best, > David > > > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Hi David,
Thank you for the insight: I could have sworn I added a picture of the data, but providing the actual data is worlds easier to deal with, I'm sure. I've never used dput(), so I entered it using the dataframe in question as the object, and I've pasted the results below. Essentially, I would like to run the two-sample independent test, comparing Group 5 (CD8.14 through CD8.17) to Group 6 (CD8.18 through CD8.21). So, for A, test Group 5 agains Group 6, for B, test Group 5 against Group 6, and so on. I'm not going to muddy the waters by telling you what I've tried; suffice it to say that I'm looking for insights into how to structure R commands to compare groups of data of this format. Many thanks in advance, David structure(list(Gene = structure(c(1L, 12L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L, 14L, 15L, 16L, 17L, 18L), .Label = c("CD8.1", "CD8.10", "CD8.11", "CD8.12", "CD8.13", "CD8.14", "CD8.15", "CD8.16", "CD8.17", "CD8.18", "CD8.19", "CD8.2", "CD8.20", "CD8.21", "CD8.22", "CD8.23", "CD8.24", "CD8.25", "CD8.3", "CD8.4", "CD8.5", "CD8.6", "CD8.7", "CD8.8", "CD8.9"), class = "factor"), Group = structure(c(8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L), .Label = c("Fabbf Ova CD40", "Fabbf Ova MHC2", "Fabbf Ova WT", "Fabbf WT", "Naïve CD40", "Naïve MHC II", "Naïve WT", "1", "2", "3", "4", "5", "6", "7"), class = "factor"), A = c(19.4701946749544, 0.679440926463348, 0.69035683372563, 0.347105466158261, 0.435480792190284, 0.338699910286907, 0.651378057031152, 0.707065053752258, 0.685244609506316, 0.816673858871597, 0.597009097584509, 0.592331304482431, 0.709359033358704, 0.628406759227531, 0.78158729467231, 1.0422377526669, 0.61560003251142, 0.463755016733183, 0.419700860701392, 0.380946898502731, 0.41156961153081, 0.471790683365624, 0.552773224145722, 0.466787799928649, 0.767624372542755), B = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), C = c(0.914649979331863, 1.443086801592, 0.928280641141244, 0.467498974059775, 0.668742025741347, 0.568417298005388, 0.778907650835673, 0.778765181169635, 1.01103488277517, 0.674133483128923, 0.830400022230133, 0.652687178870627, 0.746893950266518, 0.765498308522646, 1.01273201749333, 1.00309616129672, 0.63015975419947, 0.615905247119739, 0.615687625199691, 0.503136087800137, 0.566164026974035, 0.631519467967541, 0.822126705285366, 0.543873075815645, 1.00596108625425), D = c(13.3272657341526, 3.08914950309865, 1.71836820240434, 0.723301573710509, 1.21014411624732, 1.92899377364865, 1.80280408189187, 2.25057819266424, 2.23876060313374, 1.30849425313072, 1.58782967140617, 1.19199809794126, 1.64151140806787, 0.241017500596534, 0.364896032519483, 0.322953808735804, 0.2052110581509, 0.927601295331376, 0.808910781520832, 0.538033121081646, 0.655348783504307, 0.564449549672088, 0.521729926793001, 0.414305517285192, 0.507084483980948), E = c(56.2830291897158, 9.76091939190267, 4.80922410182105, 17.0056576949022, 20.851046177766, 17.9057247086369, 5.93332779160845, 4.73058157592946, 5.59155211460608, 9.67484467290805, 5.92374864612388, 7.12393623733123, 5.33576126730867, 10.3943422629275, 10.8732527705049, 12.4861085370674, 12.0918705721064, 13.3210661695018, 10.9410344557684, 15.1298307761675, 13.0708078246191, 9.4445293976312, 6.94340249514349, 5.07888688780375, 8.33846787814466 ), F = c(15.0459568981729, 21.6362955612539, 9.66673955488981, 27.2276698483913, 18.1090094072926, 20.0952712980862, 24.9249499974856, 23.5540183530194, 29.6638363657906, 28.9779309040733, 42.0402820641407, 33.8068160394092, 51.7299064374737, 37.8306751403421, 43.1955470199259, 45.5125262939585, 40.3109474523637, 23.6341894633273, 23.9721353180788, 20.4920649252818, 24.8898447627354, 34.2686409607416, 31.3815198841165, 31.4947528368753, 43.2686436885025), G = c(15.9703031418086, 5.30495997585743, 3.07594974529074, 8.28703732907722, 10.9437825143868, 6.91196232523896, 2.97808148581742, 2.6386825521864, 2.2415006913088, 5.00747306438661, 2.65238188782831, 3.18277515130905, 3.14638620532385, 2.5149505923191, 2.48862112414046, 2.97170069886913, 1.91643165326171, 7.50682774199005, 5.39102206185423, 5.2498453524987, 7.17519969844757, 2.66448841457179, 2.78444235996995, 1.62286520735228, 2.48760726398266 ), H = c(-1, -1, -1, 0.0036561124481055, 0.010723007432761, 0.0196616746380801, 0.0371046164124276, -1, 3.27378673314144e-05, -1, -1, -1, -1, 0.00816215783906802, 21.5424904701651, 9.57900616157724, 0.00735942043489242, 1.50346040901698, 0.0909450037365435, 1.49237001404701e-05, 0.000775741472561218, -1, 17.9023582944659, 0.0176891314806093, 33.4326253626981)), .Names = c("Gene", "Group", "A", "B", "C", "D", "E", "F", "G", "H"), row.names = c(NA, 25L), class = "data.frame") |
This works. First we assign the results of dput() to a variable
So we can use it. Then we eliminate the groups we don't need. Third we remake the factors to eliminate the groups and genes that do not appear in the data subset. Finally, compute the tests. Dta <- structure(list(Gene = structure(c(1L, 12L, 19L, 20L, 21L, ....................lines omitted.......................... "Group", "A", "B", "C", "D", "E", "F", "G", "H"), row.names = c(NA, 25L), class = "data.frame") # Pull out just groups 5 and 6 Dtb <- Dta[Dta$Group %in% c(5, 6), ] # Check the resulting data frame - 8 observations, # four in each group, all measurements in B are 1 Dtb # Eliminate factor levels that do not exist in the reduced # data set Dtb$Gene <- factor(Dtb$Gene) Dtb$Group <- factor(Dtb$Group) # Mann-Whitney is the same as Wilcoxon Rank Sum test (see manual page) ?wilcox.test # Compute test for A wilcox.test(A~Group, Dtb) # Compute all the tests apply(Dtb[,3:10], 2, function(x) wilcox.test(x~Dtb$Group)) # Error relates to column B which is constant ---------------------------------------------- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 > -----Original Message----- > From: [hidden email] [mailto:r-help-bounces@r- > project.org] On Behalf Of Oxenstierna > Sent: Friday, July 06, 2012 3:34 PM > To: [hidden email] > Subject: Re: [R] Mann-Whitney by group > > Hi David, > > Thank you for the insight: I could have sworn I added a picture of the > data, but providing the actual data is worlds easier to deal with, I'm > sure. > I've never used dput(), so I entered it using the dataframe in question > as > the object, and I've pasted the results below. > > Essentially, I would like to run the two-sample independent test, > comparing > Group 5 (CD8.14 through CD8.17) to Group 6 (CD8.18 through CD8.21). So, > for > A, test Group 5 agains Group 6, for B, test Group 5 against Group 6, > and so > on. I'm not going to muddy the waters by telling you what I've tried; > suffice it to say that I'm looking for insights into how to structure R > commands to compare groups of data of this format. > > Many thanks in advance, > > David > > > structure(list(Gene = structure(c(1L, 12L, 19L, 20L, 21L, 22L, > 23L, 24L, 25L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L, > 14L, 15L, 16L, 17L, 18L), .Label = c("CD8.1", "CD8.10", "CD8.11", > "CD8.12", "CD8.13", "CD8.14", "CD8.15", "CD8.16", "CD8.17", "CD8.18", > "CD8.19", "CD8.2", "CD8.20", "CD8.21", "CD8.22", "CD8.23", "CD8.24", > "CD8.25", "CD8.3", "CD8.4", "CD8.5", "CD8.6", "CD8.7", "CD8.8", > "CD8.9"), class = "factor"), Group = structure(c(8L, 8L, 8L, > 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, > 12L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L), .Label = c("Fabbf Ova > CD40", > "Fabbf Ova MHC2", "Fabbf Ova WT", "Fabbf WT", "Naïve CD40", "Naïve MHC > II", > "Naïve WT", "1", "2", "3", "4", "5", "6", "7"), class = "factor"), > A = c(19.4701946749544, 0.679440926463348, 0.69035683372563, > 0.347105466158261, 0.435480792190284, 0.338699910286907, > 0.651378057031152, 0.707065053752258, 0.685244609506316, > 0.816673858871597, 0.597009097584509, 0.592331304482431, > 0.709359033358704, 0.628406759227531, 0.78158729467231, > 1.0422377526669, > 0.61560003251142, 0.463755016733183, 0.419700860701392, > 0.380946898502731, > 0.41156961153081, 0.471790683365624, 0.552773224145722, > 0.466787799928649, > 0.767624372542755), B = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), C = c(0.914649979331863, > 1.443086801592, 0.928280641141244, 0.467498974059775, > 0.668742025741347, > 0.568417298005388, 0.778907650835673, 0.778765181169635, > 1.01103488277517, 0.674133483128923, 0.830400022230133, > 0.652687178870627, > 0.746893950266518, 0.765498308522646, 1.01273201749333, > 1.00309616129672, > 0.63015975419947, 0.615905247119739, 0.615687625199691, > 0.503136087800137, > 0.566164026974035, 0.631519467967541, 0.822126705285366, > 0.543873075815645, 1.00596108625425), D = c(13.3272657341526, > 3.08914950309865, 1.71836820240434, 0.723301573710509, > 1.21014411624732, > 1.92899377364865, 1.80280408189187, 2.25057819266424, > 2.23876060313374, > 1.30849425313072, 1.58782967140617, 1.19199809794126, > 1.64151140806787, > 0.241017500596534, 0.364896032519483, 0.322953808735804, > 0.2052110581509, 0.927601295331376, 0.808910781520832, > 0.538033121081646, > 0.655348783504307, 0.564449549672088, 0.521729926793001, > 0.414305517285192, 0.507084483980948), E = c(56.2830291897158, > 9.76091939190267, 4.80922410182105, 17.0056576949022, > 20.851046177766, > 17.9057247086369, 5.93332779160845, 4.73058157592946, > 5.59155211460608, > 9.67484467290805, 5.92374864612388, 7.12393623733123, > 5.33576126730867, > 10.3943422629275, 10.8732527705049, 12.4861085370674, > 12.0918705721064, > 13.3210661695018, 10.9410344557684, 15.1298307761675, > 13.0708078246191, > 9.4445293976312, 6.94340249514349, 5.07888688780375, > 8.33846787814466 > ), F = c(15.0459568981729, 21.6362955612539, 9.66673955488981, > 27.2276698483913, 18.1090094072926, 20.0952712980862, > 24.9249499974856, > 23.5540183530194, 29.6638363657906, 28.9779309040733, > 42.0402820641407, > 33.8068160394092, 51.7299064374737, 37.8306751403421, > 43.1955470199259, > 45.5125262939585, 40.3109474523637, 23.6341894633273, > 23.9721353180788, > 20.4920649252818, 24.8898447627354, 34.2686409607416, > 31.3815198841165, > 31.4947528368753, 43.2686436885025), G = c(15.9703031418086, > 5.30495997585743, 3.07594974529074, 8.28703732907722, > 10.9437825143868, > 6.91196232523896, 2.97808148581742, 2.6386825521864, > 2.2415006913088, > 5.00747306438661, 2.65238188782831, 3.18277515130905, > 3.14638620532385, > 2.5149505923191, 2.48862112414046, 2.97170069886913, > 1.91643165326171, > 7.50682774199005, 5.39102206185423, 5.2498453524987, > 7.17519969844757, > 2.66448841457179, 2.78444235996995, 1.62286520735228, > 2.48760726398266 > ), H = c(-1, -1, -1, 0.0036561124481055, 0.010723007432761, > 0.0196616746380801, 0.0371046164124276, -1, 3.27378673314144e-05, > -1, -1, -1, -1, 0.00816215783906802, 21.5424904701651, > 9.57900616157724, > 0.00735942043489242, 1.50346040901698, 0.0909450037365435, > 1.49237001404701e-05, 0.000775741472561218, -1, 17.9023582944659, > 0.0176891314806093, 33.4326253626981)), .Names = c("Gene", > "Group", "A", "B", "C", "D", "E", "F", "G", "H"), row.names = c(NA, > 25L), class = "data.frame") > > -- > View this message in context: http://r.789695.n4.nabble.com/Mann- > Whitney-by-group-tp4635618p4635667.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
This works very well--thanks so much.
By way of extension: how would one extract elements from the result object? For example: thing<=apply(Dtb[,3:10], 2, function(x) wilcox.test(x~Dtb$Group)) summary(thing)$p.value Does not provide a list of p-values as it would in a regression object. Ideally, I would like to be able to extract the W score and p-value by A,B,C,... Any ideas greatly appreciated! |
Untested, I think you need to lapply() over thing with some sort of extractor:
lapply(thing, function(x) x[['p.value']]) Michael On Jul 10, 2012, at 3:45 PM, Oxenstierna <[hidden email]> wrote: > This works very well--thanks so much. > > By way of extension: how would one extract elements from the result object? > > For example: > > thing<=apply(Dtb[,3:10], 2, function(x) wilcox.test(x~Dtb$Group)) > > summary(thing)$p.value > > Does not provide a list of p-values as it would in a regression object. > Ideally, I would like to be able to extract the W score and p-value by > A,B,C,... > > Any ideas greatly appreciated! > > > -- > View this message in context: http://r.789695.n4.nabble.com/Mann-Whitney-by-group-tp4635618p4636055.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
lapply(thing, function(x) x[['p.value']]) --works very well, thank you.
Not to be a chore, but I'm interested in comparing the results of wilcox.test--and the methodology we've employed so far--with the results and methodology of wilcox_test (library("coin")). So, I'd like to compare groups 5 and 6 across A through H using wilcox_test, and then I'd like to extract the p-values. Going through the same methodology as above, but replacing wilcox.test with wilcox_test has failed, and so has the p.value extraction method: lapply(thing, function(x) x[['p.value']]) . I believe the latter failure has to do with the fact that the coin package has a built-in class and built-in extraction method (pvalue() to extract and class "IndependenceTest"), but I don't know how to work around it. For example, for a single comparison: wilcox_test(A~Group, Dtb) works fine, and pvalue(wilcox.test(A~Group, Dtb)) extracts the p-value. So, any ideas about how to compare groups 5 and 6 across A through H using wilcox_test? Many thanks! |
On Mon, Jul 16, 2012 at 3:39 PM, Oxenstierna <[hidden email]> wrote:
> lapply(thing, function(x) x[['p.value']]) --works very well, thank you. > > Not to be a chore, but I'm interested in comparing the results of > wilcox.test--and the methodology we've employed so far--with the results and > methodology of wilcox_test (library("coin")). So, I'd like to compare > groups 5 and 6 across A through H using wilcox_test, and then I'd like to > extract the p-values. Going through the same methodology as above, but > replacing wilcox.test with wilcox_test has failed, and so has the p.value > extraction method: lapply(thing, function(x) x[['p.value']]) . > > I believe the latter failure has to do with the fact that the coin package > has a built-in class and built-in extraction method (pvalue() to extract and > class "IndependenceTest"), but I don't know how to work around it. For > example, for a single comparison: wilcox_test(A~Group, Dtb) works fine, and > pvalue(wilcox.test(A~Group, Dtb)) extracts the p-value. > > So, any ideas about how to compare groups 5 and 6 across A through H using > wilcox_test? I think there are a few things at play here. 1) coin uses so-called S4 objects, so `[[` style subsetting isn't going to work. The "right way" is, as you have found to use the pvalue() function. 2) It looks like you need to use the formula intervace for wilcox_test. Unfortunately, this makes things a little more complicated as you'll need to construct the formula programmatically. A one liner looks something like this. lapply(LETTERS[1:8], function(x) pvalue(wilcox_test(as.formula(paste(x, "~ Group ")), Dtb))) Where lapply loops over the letters A,B, etc. and makes the string `A ~ Group`, converts it to a formula, passes that to wilcox_test, then gets the pvalue and returns it. In two lines you could do: thing <- lapply(LETTERS[1:8], function(x) wilcox_test(as.formula(paste(x, "~ Group")), Dtb)) thing2 <- lapply(thing, pvalue) Where thing has all the test result objects, and thing2 collects the pvalues. Hope this helps, Michael ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
On 2012-07-17 05:13, R. Michael Weylandt wrote:
> On Mon, Jul 16, 2012 at 3:39 PM, Oxenstierna <[hidden email]> wrote: >> lapply(thing, function(x) x[['p.value']]) --works very well, thank you. >> >> Not to be a chore, but I'm interested in comparing the results of >> wilcox.test--and the methodology we've employed so far--with the results and >> methodology of wilcox_test (library("coin")). So, I'd like to compare There should not be any differences between the p-values obtained using 'wilcox.test' and 'wilcox_test' in the asymptotic case. However, the latter function allows you to use the exact null distribution even in the presence of ties, or use an Monte Carlo approximation of the exact null distribution. Using the approximately exact null distribution is particularly helpful when the asymptotics doesn't work well, say, large but unbalanced data, and/or the exact computations are too time consuming. >> groups 5 and 6 across A through H using wilcox_test, and then I'd like to >> extract the p-values. Going through the same methodology as above, but >> replacing wilcox.test with wilcox_test has failed, and so has the p.value >> extraction method: lapply(thing, function(x) x[['p.value']]) . >> >> I believe the latter failure has to do with the fact that the coin package >> has a built-in class and built-in extraction method (pvalue() to extract and >> class "IndependenceTest"), but I don't know how to work around it. For >> example, for a single comparison: wilcox_test(A~Group, Dtb) works fine, and >> pvalue(wilcox.test(A~Group, Dtb)) extracts the p-value. >> >> So, any ideas about how to compare groups 5 and 6 across A through H using >> wilcox_test? Well, since you're doing multiple tests here (A, C, ..., H vs Group) you should consider adjusting for multiplicity and 'coin' allows you to do that easily and efficiently. A multivariate Wilcoxon rank-sum test can be constructed using > set.seed(711109) # for reproducibility > it <- independence_test(A + C + D + E + F + G + H ~ Group, data = Dtb, ytrafo = function(data) trafo(data, numeric_trafo = rank), distribution = approximate(B = 100000)) approximating the exact null distribution using 100,000 Monte Carlo replicates. Step-down adjusted p-values taking account of the dependence structure between the test statistics and possible discreteness in the null distribution are available through > (psd <- pvalue(it, "step-down")) A C D E F G H 5 0.08598 0.08598 0.08598 0.20018 0.08598 0.08598 0.34182 and using the development version of 'coin', available at R-Forge, we can get the unadjusted p-values from > (pu <- pvalue(it, "unadjusted")) A C D E F G H 5 0.02894 0.02894 0.02894 0.11512 0.02894 0.02894 0.34182 If we look at the ratio of step-down adjusted and unadjusted p-values, > psd / pu A C D E F G H 5 2.970974 2.970974 2.970974 1.738881 2.970974 2.970974 1 we can see that this type of adjustment is pretty powerful compared to step-down methods like Bonferroni-Holm that doesn't take account of the correlation nor the discreteness > p.adjust(pu) / pu A C D E F G H 5 7 7 7 2 7 7 1 HTH, Henric > > I think there are a few things at play here. > > 1) coin uses so-called S4 objects, so `[[` style subsetting isn't > going to work. The "right way" is, as you have found to use the > pvalue() function. > > 2) It looks like you need to use the formula intervace for > wilcox_test. Unfortunately, this makes things a little more > complicated as you'll need to construct the formula programmatically. > > A one liner looks something like this. > > lapply(LETTERS[1:8], function(x) > pvalue(wilcox_test(as.formula(paste(x, "~ Group ")), Dtb))) > > Where lapply loops over the letters A,B, etc. and makes the string `A > ~ Group`, converts it to a formula, passes that to wilcox_test, then > gets the pvalue and returns it. > > In two lines you could do: > > thing <- lapply(LETTERS[1:8], function(x) > wilcox_test(as.formula(paste(x, "~ Group")), Dtb)) > thing2 <- lapply(thing, pvalue) > > Where thing has all the test result objects, and thing2 collects the pvalues. > > Hope this helps, > Michael > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
In reply to this post by Michael Weylandt
The time has come to shake the cobwebs off of this analysis. I have
more data now and need to run the same tests, the same way as above. My question is this--some of the pairs include NAs, and so are gumming up the works. I'm not sure how to exclude them using the lhs ~ rhs syntax. Any ideas here? Many thanks, as usual. Data and syntax below. David sara.data=structure(list(Groups = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), Pairs = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), Actb = c(2.2734065552504, 1.69621901296377, 1.07836251830772, 1.46314001007756, 1.76537848566894, 0.689064098855446, 0.462820758081676, NA, NA, 2.22119254577143 ), Bcl2 = c(0.12954440593121, 0.0902306425601895, 0.219044589401239, 0.103793432483774, 0.119463699676088, 0.112179645963861, 0.136910739776212, 0.433953247043377, 0.401539702575691, 0.352218179109105), Bcl6 = c(1.78964109252879, 1.56011379020288, 0.750029838175481, 1.80189108290585, 1.09372632818505, 0.275815381178548, 0.785680035605173, NA, NA, 0.311865838414934 ), Ccl5 = c(0.140676314771846, 0.103227179167928, 0.210718001043218, 0.101548390950462, 0.140625579216236, 0.218846310909471, 0.132902076760262, 0.35763207205821, 0.320733407260836, 0.0983004520984843), Ccr7 = c(0.116274608274044, 0.0623582657156311, 0.111654418769019, 0.110221412062233, 0.0646423645035265, 0.0924168984762384, 0.0322085814124609, NA, NA, 0.0315246913534493 ), Cd27 = c(0.599332581326994, 0.536313800392409, 0.776647646561188, 0.511624999868611, 0.481254858629634, 0.365428233004039, 0.30446734845483, 0.880574935388197, 1.19362122336861, 0.121581553928565), Cd28 = c(0.8476006082089, 0.976603410250505, 0.976783190446247, 0.8288118647421, 0.854672311976977, 0.576719839424659, 0.4221908111396, 1.22864113852622, 5.19562728663742, 0.401610355554234), Cd40 = c(0.209298226865743, 0.0680133680665235, 0.0233440779283003, 0.191986570448918, 0.128784506152115, NA, NA, NA, NA, NA)), .Names = c("Groups", "Pairs", "Actb", "Bcl2", "Bcl6", "Ccl5", "Ccr7", "Cd27", "Cd28", "Cd40"), class = "data.frame", row.names = c(NA, -10L)) results=apply(saradata[,4:length(saradata)], 2, function(x) wilcox.test(x~saradata$Groups,paired=TRUE,alternative="two.sided")) # Extract p-values from saved results lapply(results, function(x) x[['p.value']]) -- I drink your milkshake. On Tue, Jul 10, 2012 at 3:13 PM, R. Michael Weylandt <[hidden email]> <[hidden email]> wrote: > Untested, I think you need to lapply() over thing with some sort of extractor: > > lapply(thing, function(x) x[['p.value']]) > > Michael > > On Jul 10, 2012, at 3:45 PM, Oxenstierna <[hidden email]> wrote: > >> This works very well--thanks so much. >> >> By way of extension: how would one extract elements from the result object? >> >> For example: >> >> thing<=apply(Dtb[,3:10], 2, function(x) wilcox.test(x~Dtb$Group)) >> >> summary(thing)$p.value >> >> Does not provide a list of p-values as it would in a regression object. >> Ideally, I would like to be able to extract the W score and p-value by >> A,B,C,... >> >> Any ideas greatly appreciated! >> >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/Mann-Whitney-by-group-tp4635618p4636055.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Hi,
You may try: unlist(lapply(sara.data[,4:length(sara.data)],function(x) {x1<-tapply(is.na(x),list(sara.data$Groups),FUN=sum); if(x1[1]!=x1[2]) NULL else wilcox.test(x~sara.data$Groups,paired=TRUE,alternative="two.sided")$p.value})) # Bcl2 Ccl5 Cd27 Cd28 #0.1250 0.1875 0.8125 0.8125 A.K. ----- Original Message ----- From: David Chertudi <[hidden email]> To: R. Michael Weylandt <[hidden email]> Cc: "[hidden email]" <[hidden email]> Sent: Sunday, September 8, 2013 11:13 PM Subject: Re: [R] Mann-Whitney by group The time has come to shake the cobwebs off of this analysis. I have more data now and need to run the same tests, the same way as above. My question is this--some of the pairs include NAs, and so are gumming up the works. I'm not sure how to exclude them using the lhs ~ rhs syntax. Any ideas here? Many thanks, as usual. Data and syntax below. David sara.data=structure(list(Groups = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), Pairs = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), Actb = c(2.2734065552504, 1.69621901296377, 1.07836251830772, 1.46314001007756, 1.76537848566894, 0.689064098855446, 0.462820758081676, NA, NA, 2.22119254577143 ), Bcl2 = c(0.12954440593121, 0.0902306425601895, 0.219044589401239, 0.103793432483774, 0.119463699676088, 0.112179645963861, 0.136910739776212, 0.433953247043377, 0.401539702575691, 0.352218179109105), Bcl6 = c(1.78964109252879, 1.56011379020288, 0.750029838175481, 1.80189108290585, 1.09372632818505, 0.275815381178548, 0.785680035605173, NA, NA, 0.311865838414934 ), Ccl5 = c(0.140676314771846, 0.103227179167928, 0.210718001043218, 0.101548390950462, 0.140625579216236, 0.218846310909471, 0.132902076760262, 0.35763207205821, 0.320733407260836, 0.0983004520984843), Ccr7 = c(0.116274608274044, 0.0623582657156311, 0.111654418769019, 0.110221412062233, 0.0646423645035265, 0.0924168984762384, 0.0322085814124609, NA, NA, 0.0315246913534493 ), Cd27 = c(0.599332581326994, 0.536313800392409, 0.776647646561188, 0.511624999868611, 0.481254858629634, 0.365428233004039, 0.30446734845483, 0.880574935388197, 1.19362122336861, 0.121581553928565), Cd28 = c(0.8476006082089, 0.976603410250505, 0.976783190446247, 0.8288118647421, 0.854672311976977, 0.576719839424659, 0.4221908111396, 1.22864113852622, 5.19562728663742, 0.401610355554234), Cd40 = c(0.209298226865743, 0.0680133680665235, 0.0233440779283003, 0.191986570448918, 0.128784506152115, NA, NA, NA, NA, NA)), .Names = c("Groups", "Pairs", "Actb", "Bcl2", "Bcl6", "Ccl5", "Ccr7", "Cd27", "Cd28", "Cd40"), class = "data.frame", row.names = c(NA, -10L)) results=apply(saradata[,4:length(saradata)], 2, function(x) wilcox.test(x~saradata$Groups,paired=TRUE,alternative="two.sided")) # Extract p-values from saved results lapply(results, function(x) x[['p.value']]) -- I drink your milkshake. On Tue, Jul 10, 2012 at 3:13 PM, R. Michael Weylandt <[hidden email]> <[hidden email]> wrote: > Untested, I think you need to lapply() over thing with some sort of extractor: > > lapply(thing, function(x) x[['p.value']]) > > Michael > > On Jul 10, 2012, at 3:45 PM, Oxenstierna <[hidden email]> wrote: > >> This works very well--thanks so much. >> >> By way of extension: how would one extract elements from the result object? >> >> For example: >> >> thing<=apply(Dtb[,3:10], 2, function(x) wilcox.test(x~Dtb$Group)) >> >> summary(thing)$p.value >> >> Does not provide a list of p-values as it would in a regression object. >> Ideally, I would like to be able to extract the W score and p-value by >> A,B,C,... >> >> Any ideas greatly appreciated! >> >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/Mann-Whitney-by-group-tp4635618p4636055.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Hello Arun,
Thanks so much--while I haven't tried it yet, this seems as though it will be an excellent way to skip the categories (Actb, etc) that have missing values (NAs). The second part of my question, which I didn't ask before: instead of skipping, is there a way to continue with the wilcoxons even if there are NA's? In this dataset, Groups is the two-level factor that sets up the pairwise comparison, and Pairs is a 5-level factor that pairs together each instance within the groups. For Actb, for example, the 3rd and 4th instance of Group 2 are missing. How would I automate the procedure of excluding the 3rd and 4th instance in Group 1, and then running wilcox.test on the remaining three instances (1,2, and 5)? The exclusions will vary by category (saradata[,4:10]) in the sample I provided. Many thanks. David -- I drink your milkshake. On Mon, Sep 9, 2013 at 5:42 AM, arun <[hidden email]> wrote: > Hi, > > You may try: > unlist(lapply(sara.data[,4:length(sara.data)],function(x) {x1<-tapply(is.na(x),list(sara.data$Groups),FUN=sum); if(x1[1]!=x1[2]) NULL else wilcox.test(x~sara.data$Groups,paired=TRUE,alternative="two.sided")$p.value})) > # Bcl2 Ccl5 Cd27 Cd28 > #0.1250 0.1875 0.8125 0.8125 > > A.K. > > > > ----- Original Message ----- > From: David Chertudi <[hidden email]> > To: R. Michael Weylandt <[hidden email]> > Cc: "[hidden email]" <[hidden email]> > Sent: Sunday, September 8, 2013 11:13 PM > Subject: Re: [R] Mann-Whitney by group > > The time has come to shake the cobwebs off of this analysis. I have > more data now and need to run the same tests, the same way as above. > My question is this--some of the pairs include NAs, and so are gumming > up the works. I'm not sure how to exclude them using the lhs ~ rhs > syntax. Any ideas here? > > Many thanks, as usual. Data and syntax below. > > David > > > sara.data=structure(list(Groups = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, > 2L), Pairs = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), Actb = > c(2.2734065552504, > 1.69621901296377, 1.07836251830772, 1.46314001007756, 1.76537848566894, > 0.689064098855446, 0.462820758081676, NA, NA, 2.22119254577143 > ), Bcl2 = c(0.12954440593121, 0.0902306425601895, 0.219044589401239, > 0.103793432483774, 0.119463699676088, 0.112179645963861, 0.136910739776212, > 0.433953247043377, 0.401539702575691, 0.352218179109105), Bcl6 = > c(1.78964109252879, > 1.56011379020288, 0.750029838175481, 1.80189108290585, 1.09372632818505, > 0.275815381178548, 0.785680035605173, NA, NA, 0.311865838414934 > ), Ccl5 = c(0.140676314771846, 0.103227179167928, 0.210718001043218, > 0.101548390950462, 0.140625579216236, 0.218846310909471, 0.132902076760262, > 0.35763207205821, 0.320733407260836, 0.0983004520984843), Ccr7 = > c(0.116274608274044, > 0.0623582657156311, 0.111654418769019, 0.110221412062233, 0.0646423645035265, > 0.0924168984762384, 0.0322085814124609, NA, NA, 0.0315246913534493 > ), Cd27 = c(0.599332581326994, 0.536313800392409, 0.776647646561188, > 0.511624999868611, 0.481254858629634, 0.365428233004039, 0.30446734845483, > 0.880574935388197, 1.19362122336861, 0.121581553928565), Cd28 = > c(0.8476006082089, > 0.976603410250505, 0.976783190446247, 0.8288118647421, 0.854672311976977, > 0.576719839424659, 0.4221908111396, 1.22864113852622, 5.19562728663742, > 0.401610355554234), Cd40 = c(0.209298226865743, 0.0680133680665235, > 0.0233440779283003, 0.191986570448918, 0.128784506152115, NA, > NA, NA, NA, NA)), .Names = c("Groups", "Pairs", "Actb", "Bcl2", > "Bcl6", "Ccl5", "Ccr7", "Cd27", "Cd28", "Cd40"), class = "data.frame", > row.names = c(NA, > -10L)) > > results=apply(saradata[,4:length(saradata)], 2, > function(x) > > wilcox.test(x~saradata$Groups,paired=TRUE,alternative="two.sided")) > > # Extract p-values from saved results > lapply(results, function(x) x[['p.value']]) > > > -- > I drink your milkshake. > > > On Tue, Jul 10, 2012 at 3:13 PM, R. Michael Weylandt > <[hidden email]> <[hidden email]> wrote: >> Untested, I think you need to lapply() over thing with some sort of extractor: >> >> lapply(thing, function(x) x[['p.value']]) >> >> Michael >> >> On Jul 10, 2012, at 3:45 PM, Oxenstierna <[hidden email]> wrote: >> >>> This works very well--thanks so much. >>> >>> By way of extension: how would one extract elements from the result object? >>> >>> For example: >>> >>> thing<=apply(Dtb[,3:10], 2, function(x) wilcox.test(x~Dtb$Group)) >>> >>> summary(thing)$p.value >>> >>> Does not provide a list of p-values as it would in a regression object. >>> Ideally, I would like to be able to extract the W score and p-value by >>> A,B,C,... >>> >>> Any ideas greatly appreciated! >>> >>> >>> -- >>> View this message in context: http://r.789695.n4.nabble.com/Mann-Whitney-by-group-tp4635618p4636055.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> ______________________________________________ >>> [hidden email] mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Hi David: Try: unlist(lapply(sara.data[,4:length(sara.data)],function(x){indx1<- 1:(length(x)/2); indx2<- ((length(x)/2)+1):length(x); x1<-data.frame(FH=x[indx1],LH=x[indx2],Group1=sara.data$Group[indx1],Group2=sara.data$Group[indx2]); x2<- x1[!(is.na(x1$FH) | is.na(x1$LH)),]; if(nrow(x2)==0) NULL else with(x2,wilcox.test(FH,LH,paired=TRUE, alternative="two.sided")$p.value) })) # Bcl2 Bcl6 Ccl5 Ccr7 Cd27 Cd28 #0.1250 0.2500 0.1875 0.2500 0.8125 0.8125 #Cd40 Group 2 values were all NAs, so it was not tested. A.K. ----- Original Message ----- From: David Chertudi <[hidden email]> To: arun <[hidden email]> Cc: R help <[hidden email]> Sent: Monday, September 9, 2013 2:29 PM Subject: Re: [R] Mann-Whitney by group Hello Arun, Thanks so much--while I haven't tried it yet, this seems as though it will be an excellent way to skip the categories (Actb, etc) that have missing values (NAs). The second part of my question, which I didn't ask before: instead of skipping, is there a way to continue with the wilcoxons even if there are NA's? In this dataset, Groups is the two-level factor that sets up the pairwise comparison, and Pairs is a 5-level factor that pairs together each instance within the groups. For Actb, for example, the 3rd and 4th instance of Group 2 are missing. How would I automate the procedure of excluding the 3rd and 4th instance in Group 1, and then running wilcox.test on the remaining three instances (1,2, and 5)? The exclusions will vary by category (saradata[,4:10]) in the sample I provided. Many thanks. David -- I drink your milkshake. On Mon, Sep 9, 2013 at 5:42 AM, arun <[hidden email]> wrote: > Hi, > > You may try: > unlist(lapply(sara.data[,4:length(sara.data)],function(x) {x1<-tapply(is.na(x),list(sara.data$Groups),FUN=sum); if(x1[1]!=x1[2]) NULL else wilcox.test(x~sara.data$Groups,paired=TRUE,alternative="two.sided")$p.value})) > # Bcl2 Ccl5 Cd27 Cd28 > #0.1250 0.1875 0.8125 0.8125 > > A.K. > > > > ----- Original Message ----- > From: David Chertudi <[hidden email]> > To: R. Michael Weylandt <[hidden email]> > Cc: "[hidden email]" <[hidden email]> > Sent: Sunday, September 8, 2013 11:13 PM > Subject: Re: [R] Mann-Whitney by group > > The time has come to shake the cobwebs off of this analysis. I have > more data now and need to run the same tests, the same way as above. > My question is this--some of the pairs include NAs, and so are gumming > up the works. I'm not sure how to exclude them using the lhs ~ rhs > syntax. Any ideas here? > > Many thanks, as usual. Data and syntax below. > > David > > > sara.data=structure(list(Groups = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, > 2L), Pairs = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), Actb = > c(2.2734065552504, > 1.69621901296377, 1.07836251830772, 1.46314001007756, 1.76537848566894, > 0.689064098855446, 0.462820758081676, NA, NA, 2.22119254577143 > ), Bcl2 = c(0.12954440593121, 0.0902306425601895, 0.219044589401239, > 0.103793432483774, 0.119463699676088, 0.112179645963861, 0.136910739776212, > 0.433953247043377, 0.401539702575691, 0.352218179109105), Bcl6 = > c(1.78964109252879, > 1.56011379020288, 0.750029838175481, 1.80189108290585, 1.09372632818505, > 0.275815381178548, 0.785680035605173, NA, NA, 0.311865838414934 > ), Ccl5 = c(0.140676314771846, 0.103227179167928, 0.210718001043218, > 0.101548390950462, 0.140625579216236, 0.218846310909471, 0.132902076760262, > 0.35763207205821, 0.320733407260836, 0.0983004520984843), Ccr7 = > c(0.116274608274044, > 0.0623582657156311, 0.111654418769019, 0.110221412062233, 0.0646423645035265, > 0.0924168984762384, 0.0322085814124609, NA, NA, 0.0315246913534493 > ), Cd27 = c(0.599332581326994, 0.536313800392409, 0.776647646561188, > 0.511624999868611, 0.481254858629634, 0.365428233004039, 0.30446734845483, > 0.880574935388197, 1.19362122336861, 0.121581553928565), Cd28 = > c(0.8476006082089, > 0.976603410250505, 0.976783190446247, 0.8288118647421, 0.854672311976977, > 0.576719839424659, 0.4221908111396, 1.22864113852622, 5.19562728663742, > 0.401610355554234), Cd40 = c(0.209298226865743, 0.0680133680665235, > 0.0233440779283003, 0.191986570448918, 0.128784506152115, NA, > NA, NA, NA, NA)), .Names = c("Groups", "Pairs", "Actb", "Bcl2", > "Bcl6", "Ccl5", "Ccr7", "Cd27", "Cd28", "Cd40"), class = "data.frame", > row.names = c(NA, > -10L)) > > results=apply(saradata[,4:length(saradata)], 2, > function(x) > > wilcox.test(x~saradata$Groups,paired=TRUE,alternative="two.sided")) > > # Extract p-values from saved results > lapply(results, function(x) x[['p.value']]) > > > -- > I drink your milkshake. > > > On Tue, Jul 10, 2012 at 3:13 PM, R. Michael Weylandt > <[hidden email]> <[hidden email]> wrote: >> Untested, I think you need to lapply() over thing with some sort of extractor: >> >> lapply(thing, function(x) x[['p.value']]) >> >> Michael >> >> On Jul 10, 2012, at 3:45 PM, Oxenstierna <[hidden email]> wrote: >> >>> This works very well--thanks so much. >>> >>> By way of extension: how would one extract elements from the result object? >>> >>> For example: >>> >>> thing<=apply(Dtb[,3:10], 2, function(x) wilcox.test(x~Dtb$Group)) >>> >>> summary(thing)$p.value >>> >>> Does not provide a list of p-values as it would in a regression object. >>> Ideally, I would like to be able to extract the W score and p-value by >>> A,B,C,... >>> >>> Any ideas greatly appreciated! >>> >>> >>> -- >>> View this message in context: http://r.789695.n4.nabble.com/Mann-Whitney-by-group-tp4635618p4636055.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> ______________________________________________ >>> [hidden email] mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
Free forum by Nabble | Edit this page |