difficulties computing a simple anova Classic List Threaded 5 messages Open this post in threaded view
|

difficulties computing a simple anova

 My grasp of R and statistics are both seriously lacking, so if this question is completely naive, I apologize in advance. I've hunted for a couple hours on the internet and none of the methods I've found have produced the result I'm looking for. I'm currently a student in a Statistics class and we are learning the ANOVA. We had to do one by hand and then reproduce our work in SAS. I really like the idea of understanding R however and would like to reproduce the solution in R if possible. Where I'm at now is this little program: http://odin.himinbi.org/classes/psy304b/spider_analysis.rThe program calculates an anova manually (correctly, I'm pretty sure, it agrees with the same numbers in excel). The answer that it comes up with doesn't agree with any of the numbers I can get either the aov or anova functions to produce. Can anyone help me with simply the method to compute a one-way anova? Well, specifically how to replicate the sort of anova people learn in an intro to statistics class. All of the degrees of freedom are off from what I expect them to be (they're all 1). (The original problem, should it help in understanding my question, is at: http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it will likely look pretty funky if your browser doesn't support mathml (firefox does).) Will The program is as follows: library(foreign) # spiderdata <- read.csv("spider_data.csv") spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6), Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17), Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9)) summary(spiderdata) # Compute a one-way ANOVA by hand J = length(spiderdata) sqdata <- data.frame((spiderdata - mean(spiderdata)) ^ 2) for(j in 2:J) { sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2) } sqdata N = 0 for(j in 1:J) { N = N + length(sqdata[[j]]) } SSW = sum(sqdata) MSW = SSW / (N - J) SSB = 0 for(j in 1:(length(spiderdata))) { SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[] - (sum(spiderdata) / N)) ^ 2) } MSB = SSB / (J - 1) F = MSB / MSW f_prob = pf(F, J - 1, N - J) reject_point = qf(.95, J - 1, N - J) cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", F, ", P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "") anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata)) aov(Zoloft ~ Valium + Naltrexone, data = spiderdata)         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

Re: difficulties computing a simple anova

 Your data setup is wrong. You have one factor (Drug) with 3 levels (Zoloft, Naltrexone, Valium). So your data should be: spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"), each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12, 18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9)) You should be able to take it from there. (Since this is a homework problem, I'm being intentionally vague.) cheers, Simon. On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote: > My grasp of R and statistics are both seriously lacking, so if this question > is completely naive, I apologize in advance. I've hunted for a couple hours > on the internet and none of the methods I've found have produced the result > I'm looking for. > > I'm currently a student in a Statistics class and we are learning the ANOVA. > We had to do one by hand and then reproduce our work in SAS. I really like > the idea of understanding R however and would like to reproduce the solution > in R if possible. > > Where I'm at now is this little program: > http://odin.himinbi.org/classes/psy304b/spider_analysis.r> > The program calculates an anova manually (correctly, I'm pretty sure, it > agrees with the same numbers in excel). The answer that it comes up with > doesn't agree with any of the numbers I can get either the aov or anova > functions to produce. > > Can anyone help me with simply the method to compute a one-way anova? Well, > specifically how to replicate the sort of anova people learn in an intro to > statistics class. All of the degrees of freedom are off from what I expect > them to be (they're all 1). > > (The original problem, should it help in understanding my question, is at: > http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it will > likely look pretty funky if your browser doesn't support mathml (firefox > does).) > > Will > > The program is as follows: > > library(foreign) > # spiderdata <- read.csv("spider_data.csv") > > spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6), > Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17), > Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9)) > > summary(spiderdata) > > # Compute a one-way ANOVA by hand > > J = length(spiderdata) > > sqdata <- data.frame((spiderdata - mean(spiderdata)) ^ 2) > for(j in 2:J) { > sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2) > } > sqdata > > N = 0 > for(j in 1:J) { > N = N + length(sqdata[[j]]) > } > > SSW = sum(sqdata) > MSW = SSW / (N - J) > SSB = 0 > for(j in 1:(length(spiderdata))) { > SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[] - > (sum(spiderdata) / N)) ^ 2) > } > MSB = SSB / (J - 1) > > F = MSB / MSW > f_prob = pf(F, J - 1, N - J) > reject_point = qf(.95, J - 1, N - J) > > cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", F, ", > P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "") > > anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata)) > aov(Zoloft ~ Valium + Naltrexone, data = spiderdata) > > [[alternative HTML version deleted]] > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. -- Simon Blomberg, BSc (Hons), PhD, MAppStat. Lecturer and Consultant Statistician Faculty of Biological and Chemical Sciences The University of Queensland St. Lucia Queensland 4072 Australia Room 320 Goddard Building (8) T: +61 7 3365 2506 http://www.uq.edu.au/~uqsblombemail: S.Blomberg1_at_uq.edu.au Policies: 1.  I will NOT analyse your data for you. 2.  Your deadline is your problem. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. - John Tukey. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

Re: difficulties computing a simple anova

 Excellent. I was able to do the analysis with no problem. I just had to do some twiddling with the inputs to get things from the CSV to the format you described. When the data is in that format, I would like to be able to say, "create a column which is each square of the difference of each response with the mean of all the responses for that drug." Is there a simple way to do that? What I am doing currently is: squares = c() for(drug in unique(drugs)) {    responses = subset(spiderdata, Drug == drug, select = Response)    squares = append(squares, (responses - mean(responses)) ^ 2) } spiderdata <- cbind(spiderdata, SquaresWithin = squares) This works because the elements are sorted by drug and the drugs vector has the drug names in the same order. If my data were mixed up however, this method would misalign results. Is there any way to say something like: spiderdata\$SquaresWithin = (spiderdata\$Response - mean(subset(spiderdata, Drug == *spiderdata\$Drug of current row*, select = Response))) ^ 2 Just using Drug == spiderdata\$Drug doesn't seem to work and I think it is because Drug is a factor? I suppose I could always sort on the element I was generating a new column for and cause them to align, but I was wondering if the syntax would allow me to refer to elements of the row a computation. I was trying it with tapply which it sounds like should work, but this is not the method: sq_within = function(row) { (row\$Response - mean(subset(spiderdata, row = row\$Drug))) ^ 2 } tapply(spiderdata, spiderdata\$Drug, fun = sq_within) It doesn't like the first and second arguments to be different lengths. The first argument is the data to be iterated over though and I think my understanding of how the function works is somehow fundamentally off. I've had a hard time finding clear examples where tapply isn't used with a function that generates a different number of outputs than there are inputs. Thank you again for the help. I'm liking R, just still getting the hang of some of the conceptual stuff. Will P.S. You don't have to be vague for the sake of my homework. The assignment was to do this in SAS which I've already completed. I don't plan on owning SAS though and I would like to be able to do statistics in the future, so learning R is just for my benefit. On Jan 30, 2008 8:11 PM, Simon Blomberg <[hidden email]> wrote: > Your data setup is wrong. You have one factor (Drug) with 3 levels > (Zoloft, Naltrexone, Valium). So your data should be: > > spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"), > each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12, > 18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9)) > > You should be able to take it from there. (Since this is a homework > problem, I'm being intentionally vague.) > > cheers, > > Simon. > > On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote: > > My grasp of R and statistics are both seriously lacking, so if this > question > > is completely naive, I apologize in advance. I've hunted for a couple > hours > > on the internet and none of the methods I've found have produced the > result > > I'm looking for. > > > > I'm currently a student in a Statistics class and we are learning the > ANOVA. > > We had to do one by hand and then reproduce our work in SAS. I really > like > > the idea of understanding R however and would like to reproduce the > solution > > in R if possible. > > > > Where I'm at now is this little program: > > http://odin.himinbi.org/classes/psy304b/spider_analysis.r> > > > The program calculates an anova manually (correctly, I'm pretty sure, it > > agrees with the same numbers in excel). The answer that it comes up with > > doesn't agree with any of the numbers I can get either the aov or anova > > functions to produce. > > > > Can anyone help me with simply the method to compute a one-way anova? > Well, > > specifically how to replicate the sort of anova people learn in an intro > to > > statistics class. All of the degrees of freedom are off from what I > expect > > them to be (they're all 1). > > > > (The original problem, should it help in understanding my question, is > at: > > http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it > will > > likely look pretty funky if your browser doesn't support mathml (firefox > > does).) > > > > Will > > > > The program is as follows: > > > > library(foreign) > > # spiderdata <- read.csv("spider_data.csv") > > > > spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6), > > Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17), > > Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9)) > > > > summary(spiderdata) > > > > # Compute a one-way ANOVA by hand > > > > J = length(spiderdata) > > > > sqdata <- data.frame((spiderdata - mean(spiderdata)) ^ 2) > > for(j in 2:J) { > > sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2) > > } > > sqdata > > > > N = 0 > > for(j in 1:J) { > > N = N + length(sqdata[[j]]) > > } > > > > SSW = sum(sqdata) > > MSW = SSW / (N - J) > > SSB = 0 > > for(j in 1:(length(spiderdata))) { > > SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[] - > > (sum(spiderdata) / N)) ^ 2) > > } > > MSB = SSB / (J - 1) > > > > F = MSB / MSW > > f_prob = pf(F, J - 1, N - J) > > reject_point = qf(.95, J - 1, N - J) > > > > cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", F, > ", > > P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "") > > > > anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata)) > > aov(Zoloft ~ Valium + Naltrexone, data = spiderdata) > > > >       [[alternative HTML version deleted]] > > > > ______________________________________________ > > [hidden email] mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help> > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > -- > Simon Blomberg, BSc (Hons), PhD, MAppStat. > Lecturer and Consultant Statistician > Faculty of Biological and Chemical Sciences > The University of Queensland > St. Lucia Queensland 4072 > Australia > Room 320 Goddard Building (8) > T: +61 7 3365 2506 > http://www.uq.edu.au/~uqsblomb  > email: S.Blomberg1_at_uq.edu.au > > Policies: > 1.  I will NOT analyse your data for you. > 2.  Your deadline is your problem. > > The combination of some data and an aching desire for > an answer does not ensure that a reasonable answer can > be extracted from a given body of data. - John Tukey. > >         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.