# ANOVA problem

5 messages

## ANOVA problem

Hi,

I need to create a data frame containing the results of a number of ANOVAs, but I'm having some trouble setting it up (some being enough for me to spend 3 days trying with no progress and be left staring into the abyss which some people call a weekend, and what I will call 2 quiet days in the office...)

The response variable is V. I need to do an ANOVA for each G. The fixed effect will be S ("m" or "f"), with L ("1" or "2") and the S*L interaction as random effects. The ANOVA for G AB01 would be something like: y = V, fixed = S, random = L and L*S.

The new data frame would then compile all the variance components for each G, including total and residual variance.

Here is the example data frame using 2 G's, with 2 S values, 2 L's, and 2 replicates for each:

    df<-as.data.frame(c("AB01","AB01","AB01","AB01","AB01","AB01","AB01","AB01","AB02","AB02","AB02","AB02","AB02","AB02","AB02","AB02"))
    names(df)<-"G"
    df$L<-as.numeric(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2))
    df$S<-(c("m","m","f","f","m","m","f","f","m","m","f","f","m","m","f","f"))
    df$R<-as.numeric(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2))
    df$V<-as.numeric(c(1,2,12,21,5,6,12,34,1,6,52,41,5,43,13,24))

It is worth noting that the actual data this will be used on has more than 10000 G's, 2 S's, 40 L's, and 2 R's, so hand-writing an ANOVA for each G is not preferred...

Here is a twitter link to a crudely drawn illustration of the aim (using 3 L's), in case I have confused you with words (through my own poor understanding): https://twitter.com/#!/robgriffin247/status/198446041316593666/photo/1/large

Thanks in advance for your time,
Rob
(please save my weekend...)
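As a minimal sketch of the layout described above, and assuming every G really does have the full L x S x R layout, the example data can be generated with `expand.grid()` and checked for balance before any model is fitted. The names `design`, `cell_counts`, and `by_G` are illustrative only and do not come from the post.

```r
# Assumption: a fully balanced design, with R varying fastest, then S, L, G,
# which matches the row order of the example vectors in the post.
design <- expand.grid(
  R = 1:2,
  S = c("m", "f"),
  L = 1:2,
  G = c("AB01", "AB02"),
  stringsAsFactors = FALSE
)
design$V <- c(1, 2, 12, 21, 5, 6, 12, 34, 1, 6, 52, 41, 5, 43, 13, 24)

# Number of replicates in each G x L x S cell; a balanced design has the
# same count in every cell.
cell_counts <- with(design, table(G, L, S))
all(cell_counts == 2)

# One data frame per G, so that a single model-fitting function can be
# applied to each piece with lapply().
by_G <- split(design, design$G)
names(by_G)
```

Splitting by G first keeps the per-G fit in one function, which matters when there are thousands of G's rather than two.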

## Re: ANOVA problem

Rob:

On Fri, May 4, 2012 at 9:18 AM, robgriffin247 <[hidden email]> wrote:

> Hi,
> I need to create a data frame containing the results of a number of ANOVA's
> but I'm having some trouble setting it up (some being enough for me to spend
> 3 days trying with no progress and be left staring in to the abyss which
> some people call a weekend, and what I will call 2 quiet days in the
> office...)

I would suggest staying out of the office and consulting a local statistician Monday morning. As a poor second choice, post on a statistics help list (e.g. stats.stackexchange.com).

I haven't gone through your post in detail, but it appears to have little to do with R and a **lot** to do with your lack of statistical understanding. It appears that you need to formulate a scientifically appropriate mixed effects model (the problem is never "how to set up an anova"), and interaction with a local consultant is the best way to do that.

I suppose you could also post this on the r-sig-mixed-models list, as they often go beyond the R issues to the statistical modeling. But remote consulting is a risky business: despite the best of intentions on both sides, incomplete communication or miscommunication can lead to errors of the third kind (right answer -- wrong question).

Best,
Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: ANOVA problem

The following constructs the data.frame that I think the original poster asked for. I don't understand the graph, so I didn't attempt it.

I agree with Bert that this might not make sense. Specifically, the distinction between AB01 and AB02 is not modeled, and that is probably the critical factor.

I made several style changes in the dataset. The name "df" is a function name, and its use as a data.frame name will confuse the reader. I constructed the data.frame directly, not by constructing vectors and putting them together. I declared the factor variables to be factors. For factors with more than two levels, this ensures that they get the right number of degrees of freedom in the anova table.

    rg <- data.frame(G=c("AB01","AB01","AB01","AB01","AB01","AB01","AB01","AB01",
                       "AB02","AB02","AB02","AB02","AB02","AB02","AB02","AB02"),
                     L=factor(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)),
                     S=factor(c("m","m","f","f","m","m","f","f",
                       "m","m","f","f","m","m","f","f")),
                     R=factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)),
                     V=c(1,2,12,21,5,6,12,34,1,6,52,41,5,43,13,24))

    summary(aov(V ~ S * L, data=rg[1:8,])) ## no Error term, to be sure we understand

    ## one aov fit per level of G, with L as an Error stratum
    rg.aov <- lapply(split(rg, rg$G),
                     function(x) aov(V ~ S*L + Error(L), data=x))
    summary(rg.aov[[1]])  ## same Sums of Squares as above, but now with Error term

    ## collect the Sums of Squares from each fit into one row per G
    anovaSumsOfSquares <- function(list.of.aov.objects) {
      t(sapply(list.of.aov.objects, function(y) {
        tmpy <-
          sapply(y[-1], function(x) {   ## y[-1] drops the (Intercept) stratum
            tmp <- summary(x)[[1]]
            nt <- sub(" +$", "", rownames(tmp))   ## strip padding from term names
            result <- tmp[,"Sum Sq"]
            names(result) <- nt
            result})
        c(tmpy[[1]], tmpy[[2]])   ## L stratum first, then Within stratum
      }))
    }

    anovaSumsOfSquares(rg.aov)
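A minimal sketch of the last step, turning per-G sums of squares into the kind of data frame the original question asked for, might look like the following. It relies on the observation in the reply above that the plain `aov(V ~ S * L)` fit gives the same sums of squares as the fit with an `Error(L)` term, so it uses the simpler fit. The helper `ss_row()` and the `Total` column are illustrative assumptions, not part of the posted code, and the table holds sums of squares, not variance components.

```r
# Example data from the thread, with factors declared as in the reply.
rg <- data.frame(
  G = rep(c("AB01", "AB02"), each = 8),
  L = factor(rep(rep(1:2, each = 4), times = 2)),
  S = factor(rep(rep(c("m", "f"), each = 2), times = 4)),
  R = factor(rep(1:2, times = 8)),
  V = c(1, 2, 12, 21, 5, 6, 12, 34, 1, 6, 52, 41, 5, 43, 13, 24)
)

# Hypothetical helper: fit one plain aov() and return its Sums of Squares
# as a named vector, plus their total.
ss_row <- function(x) {
  tab <- summary(aov(V ~ S * L, data = x))[[1]]
  ss <- tab[, "Sum Sq"]
  names(ss) <- trimws(rownames(tab))  # row names are padded with spaces
  c(ss, Total = sum(ss))
}

# One row per G; check.names = FALSE keeps the "S:L" column name intact.
ss <- t(sapply(split(rg, rg$G), ss_row))
ss_df <- data.frame(G = rownames(ss), ss, row.names = NULL,
                    check.names = FALSE)
ss_df
```

Because the same function is applied to every piece of `split(rg, rg$G)`, the approach does not change when there are thousands of G's. Whether these quantities answer the scientific question is a separate matter, as raised earlier in the thread.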