|
Hi,
I need to create a data frame containing the results of a number of ANOVAs, but I'm having some trouble setting it up (some being enough for me to spend 3 days trying with no progress and be left staring into the abyss which some people call a weekend, and what I will call 2 quiet days in the office...)

The response variable is V.
I need to do an ANOVA for each G.
The fixed effect will be S ("m" or "f"), with L ("1" or "2") and S*L as random effects.
The ANOVA of G AB01 would be something like: y=V, fixed=S, Random= L & L*S...
The new data frame would then compile all the variance components for each G, including total and residual variance.

Here is the example data frame using 2 G's, with 2 S values, 2 L, and 2 replicates for each.

df<-as.data.frame(c("AB01","AB01","AB01","AB01","AB01","AB01","AB01","AB01","AB02","AB02","AB02","AB02","AB02","AB02","AB02","AB02"))
names(df)<-"G"
df$L<-as.numeric(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2))
df$S<-(c("m","m","f","f","m","m","f","f","m","m","f","f","m","m","f","f"))
df$R<-as.numeric(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2))
df$V<-as.numeric(c(1,2,12,21,5,6,12,34,1,6,52,41,5,43,13,24))

It is worth noting that the actual data this will be used on has >10000 G's, 2 S's, 40 L's, and 2 R's, so writing an ANOVA by hand for each G is not preferred...

Here is a Twitter link to a crudely drawn illustration of the aim (using 3 Ls) in case I have confused you with words (through my own poor understanding):
https://twitter.com/#!/robgriffin247/status/198446041316593666/photo/1/large

Thanks in advance for your time,
Rob
(please save my weekend...) |
|
Rob:
On Fri, May 4, 2012 at 9:18 AM, robgriffin247 <[hidden email]> wrote:
> Hi,
> I need to create a data frame containing the results of a number of ANOVA's
> but I'm having some trouble setting it up (some being enough for me to spend
> 3 days trying with no progress and be left staring in to the abyss which
> some people call a weekend, and what I will call 2 quiet days in the
> office...)

I would suggest staying out of the office and consulting a local statistician Monday morning. As a poor second choice, post on a statistics Help list (e.g. stats.stackexchange.com). I haven't gone through your post in detail, but it appears to have little to do with R and a **lot** to do with your lack of statistical understanding. It appears that you need to formulate a scientifically appropriate mixed effect model (the problem is never "how to set up an anova"), and interaction with a local consultant is the best way to do that.

I suppose you could also post this on the r-sig-mixed-models list, as they often go beyond the R issues to the statistical modeling. But remote consulting is a risky business, as despite the best of intentions on both sides, incomplete communication or miscommunication can lead to errors of the third kind (right answer -- wrong question).

Best,
Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
|
The following constructs the data.frame that I think the original poster asked for. I don't understand the graph, so I didn't attempt it.

I agree with Bert that this might not make sense. Specifically, the distinction between AB01 and AB02 is not modeled, and that is probably the critical factor.

I made several style changes in the dataset. The name "df" is a function name, and its use as a data.frame name will confuse the reader. I constructed the data.frame directly, not by constructing vectors and putting them together. I declared the factor variables to be factors. For factors with more than two levels, this ensures that they get the right number of degrees of freedom in the anova table.

rg <- data.frame(G=c("AB01","AB01","AB01","AB01","AB01","AB01","AB01","AB01",
                     "AB02","AB02","AB02","AB02","AB02","AB02","AB02","AB02"),
                 L=factor(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)),
                 S=factor(c("m","m","f","f","m","m","f","f",
                            "m","m","f","f","m","m","f","f")),
                 R=factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)),
                 V=c(1,2,12,21,5,6,12,34,1,6,52,41,5,43,13,24))

## no Error term, to be sure we understand
summary(aov(V ~ S * L, data=rg[1:8,]))

## one aov per level of G
rg.aov <- lapply(split(rg, rg$G),
                 function(x) aov(V ~ S*L + Error(L), data=x))

## same Sums of Squares as above, but now with Error term
summary(rg.aov[[1]])

## one row of Sums of Squares per aov object in the list
anovaSumsOfSquares <- function(list.of.aov.objects) {
  t(sapply(list.of.aov.objects,
           function(y) {
             ## y[-1] drops the "(Intercept)" stratum
             tmpy <- sapply(y[-1],
                            function(x) {
                              tmp <- summary(x)[[1]]
                              ## strip the trailing blanks from the term names
                              nt <- sub(" +$", "", rownames(tmp))
                              result <- tmp[,"Sum Sq"]
                              names(result) <- nt
                              result})
             c(tmpy[[1]], tmpy[[2]])
           }))
}

anovaSumsOfSquares(rg.aov) |
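Since the original question asked for a data frame with a total, here is a minimal sketch of one way to get there, not a definitive method. It uses only base R, rebuilds the example data from this thread, and tabulates the same sums of squares as the reply above (not variance components, which would need a proper mixed-model fit). The helper name `ss_by_group` and the `Total` column are assumptions made for illustration; they do not come from the thread.

```r
## Example data from the thread, with the grouping variables declared as factors.
rg <- data.frame(
  G = rep(c("AB01", "AB02"), each = 8),
  L = factor(rep(rep(1:2, each = 4), times = 2)),
  S = factor(rep(rep(c("m", "f"), each = 2), times = 4)),
  R = factor(rep(1:2, times = 8)),
  V = c(1, 2, 12, 21, 5, 6, 12, 34, 1, 6, 52, 41, 5, 43, 13, 24)
)

## Hypothetical helper: one row of sums of squares per level of G.
ss_by_group <- function(data) {
  rows <- lapply(split(data, data$G), function(x) {
    ## Sequential ANOVA table for the two-way layout within this G
    tab <- anova(lm(V ~ S * L, data = x))
    ss <- setNames(tab[["Sum Sq"]], rownames(tab))
    ## The sequential sums of squares add up to the total sum of squares
    c(ss, Total = sum(ss))
  })
  out <- as.data.frame(do.call(rbind, rows))
  ## Move the group labels from the row names into a proper column
  data.frame(G = rownames(out), out, row.names = NULL, check.names = FALSE)
}

ss_by_group(rg)
```

Because the work is done per group through `split()` and `lapply()`, the same call should carry over to many more levels of G without writing each ANOVA by hand.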
|
In reply to this post by Bert Gunter
Gee Bert, thanks for the really helpful tip. But if you read my post properly you'll note that I do know how ANOVAs work.

> The anova of *G* /AB01 /would be some thing like: y=V, fixed=S, Random= L &
> L*S...

I didn't want to show a full model formula in case it led people down the wrong path to solving the real problem (there are several ways to create mixed effects models, and some of them may not work with a given solution). The real problem is how to get R to run an ANOVA for each value of G in the example data frame and then give me the output data frame I want. Ergo, it is indeed an R problem.

Perhaps you should read up on the R mailing list posting guidelines:

"Questions about statistics: The R mailing lists are primarily intended for questions and discussion about the R software. However, questions about statistical methodology are sometimes posted. If the question is well-asked and of interest to someone on the list, it may elicit an informative up-to-date answer...."

So not rude and sarcastic ones, then.

I will admit statistics is an element of the question I have posed, but it is entirely in an R-based context. My understanding of statistics is perfectly acceptable thanks to numerous courses taken through my undergraduate, masters, and PhD studies. If you're not willing to help someone solve their problems then don't bother posting - do you have nothing better to do with your time? I would also suggest that my post has a lot more to do with R than your post just moments ago, which is solely about statistics and is of no relevance to the R help forum:
http://r.789695.n4.nabble.com/Off-Topic-Crime-Statistics-Don-t-Pay-td4609170.html

I know you regularly post on this forum and are often helpful, but unhelpful posts like this one are unnecessary.

Rant over.

As for everyone else: Firstly, sorry about the above, it's been a long week. Secondly, I would still really like some helpful answers from people who are interested in helping me, and more constructive replies will be greatly appreciated. |
|
In reply to this post by Richard M. Heiberger
Thanks Richard, that works great on the test data.
I'll try it out on the full dataset now and let you know how it goes. Thanks a lot! |
