difficulties computing a simple anova

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

difficulties computing a simple anova

Will Holcomb
My grasp of R and statistics are both seriously lacking, so if this question
is completely naive, I apologize in advance. I've hunted for a couple hours
on the internet and none of the methods I've found have produced the result
I'm looking for.

I'm currently a student in a Statistics class and we are learning the ANOVA.
We had to do one by hand and then reproduce our work in SAS. I really like
the idea of understanding R however and would like to reproduce the solution
in R if possible.

Where I'm at now is this little program:
http://odin.himinbi.org/classes/psy304b/spider_analysis.r

The program calculates an anova manually (correctly, I'm pretty sure, it
agrees with the same numbers in excel). The answer that it comes up with
doesn't agree with any of the numbers I can get either the aov or anova
functions to produce.

Can anyone help me with simply the method to compute a one-way anova? Well,
specifically how to replicate the sort of anova people learn in an intro to
statistics class. All of the degrees of freedom are off from what I expect
them to be (they're all 1).

(The original problem, should it help in understanding my question, is at:
http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it will
likely look pretty funky if your browser doesn't support mathml (firefox
does).)

Will

The program is as follows:

library(foreign)
# spiderdata <- read.csv("spider_data.csv")

spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6),
Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17),
Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9))

summary(spiderdata)

# Compute a one-way ANOVA by hand

J = length(spiderdata)

sqdata <- data.frame((spiderdata[1] - mean(spiderdata[1])) ^ 2)
for(j in 2:J) {
sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2)
}
sqdata

N = 0
for(j in 1:J) {
N = N + length(sqdata[[j]])
}

SSW = sum(sqdata)
MSW = SSW / (N - J)
SSB = 0
for(j in 1:(length(spiderdata))) {
SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[[1]] -
(sum(spiderdata) / N)) ^ 2)
}
MSB = SSB / (J - 1)

F = MSB / MSW
f_prob = pf(F, J - 1, N - J)
reject_point = qf(.95, J - 1, N - J)

cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", F, ",
P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "")

anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata))
aov(Zoloft ~ Valium + Naltrexone, data = spiderdata)

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: difficulties computing a simple anova

Simon Blomberg-4
Your data setup is wrong. You have one factor (Drug) with 3 levels
(Zoloft, Naltrexone, Valium). So your data should be:

spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"),
each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12,
18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9))

You should be able to take it from there. (Since this is a homework
problem, I'm being intentionally vague.)

cheers,

Simon.

On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote:

> My grasp of R and statistics are both seriously lacking, so if this question
> is completely naive, I apologize in advance. I've hunted for a couple hours
> on the internet and none of the methods I've found have produced the result
> I'm looking for.
>
> I'm currently a student in a Statistics class and we are learning the ANOVA.
> We had to do one by hand and then reproduce our work in SAS. I really like
> the idea of understanding R however and would like to reproduce the solution
> in R if possible.
>
> Where I'm at now is this little program:
> http://odin.himinbi.org/classes/psy304b/spider_analysis.r
>
> The program calculates an anova manually (correctly, I'm pretty sure, it
> agrees with the same numbers in excel). The answer that it comes up with
> doesn't agree with any of the numbers I can get either the aov or anova
> functions to produce.
>
> Can anyone help me with simply the method to compute a one-way anova? Well,
> specifically how to replicate the sort of anova people learn in an intro to
> statistics class. All of the degrees of freedom are off from what I expect
> them to be (they're all 1).
>
> (The original problem, should it help in understanding my question, is at:
> http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it will
> likely look pretty funky if your browser doesn't support mathml (firefox
> does).)
>
> Will
>
> The program is as follows:
>
> library(foreign)
> # spiderdata <- read.csv("spider_data.csv")
>
> spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6),
> Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17),
> Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
>
> summary(spiderdata)
>
> # Compute a one-way ANOVA by hand
>
> J = length(spiderdata)
>
> sqdata <- data.frame((spiderdata[1] - mean(spiderdata[1])) ^ 2)
> for(j in 2:J) {
> sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2)
> }
> sqdata
>
> N = 0
> for(j in 1:J) {
> N = N + length(sqdata[[j]])
> }
>
> SSW = sum(sqdata)
> MSW = SSW / (N - J)
> SSB = 0
> for(j in 1:(length(spiderdata))) {
> SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[[1]] -
> (sum(spiderdata) / N)) ^ 2)
> }
> MSB = SSB / (J - 1)
>
> F = MSB / MSW
> f_prob = pf(F, J - 1, N - J)
> reject_point = qf(.95, J - 1, N - J)
>
> cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", F, ",
> P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "")
>
> anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata))
> aov(Zoloft ~ Valium + Naltrexone, data = spiderdata)
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072
Australia
Room 320 Goddard Building (8)
T: +61 7 3365 2506
http://www.uq.edu.au/~uqsblomb
email: S.Blomberg1_at_uq.edu.au

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

The combination of some data and an aching desire for
an answer does not ensure that a reasonable answer can
be extracted from a given body of data. - John Tukey.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: difficulties computing a simple anova

Will Holcomb
Excellent. I was able to do the analysis with no problem. I just had to do
some twiddling with the inputs to get things from the CSV to the format you
described.

When the data is in that format, I would like to be able to say, "create a
column which is each square of the difference of each response with the mean
of all the responses for that drug." Is there a simple way to do that? What
I am doing currently is:

squares = c()
for(drug in unique(drugs)) {
   responses = subset(spiderdata, Drug == drug, select = Response)
   squares = append(squares, (responses - mean(responses)) ^ 2)
}
spiderdata <- cbind(spiderdata, SquaresWithin = squares)

This works because the elements are sorted by drug and the drugs vector has
the drug names in the same order. If my data were mixed up however, this
method would misalign results. Is there any way to say something like:

spiderdata$SquaresWithin = (spiderdata$Response - mean(subset(spiderdata,
Drug == *spiderdata$Drug of current row*, select = Response))) ^ 2

Just using Drug == spiderdata$Drug doesn't seem to work and I think it is
because Drug is a factor?

I suppose I could always sort on the element I was generating a new column
for and cause them to align, but I was wondering if the syntax would allow
me to refer to elements of the row a computation. I was trying it with
tapply which it sounds like should work, but this is not the method:

sq_within = function(row) { (row$Response - mean(subset(spiderdata, row =
row$Drug))) ^ 2 }
tapply(spiderdata, spiderdata$Drug, fun = sq_within)

It doesn't like the first and second arguments to be different lengths. The
first argument is the data to be iterated over though and I think my
understanding of how the function works is somehow fundamentally off. I've
had a hard time finding clear examples where tapply isn't used with a
function that generates a different number of outputs than there are inputs.

Thank you again for the help. I'm liking R, just still getting the hang of
some of the conceptual stuff.

Will

P.S. You don't have to be vague for the sake of my homework. The assignment
was to do this in SAS which I've already completed. I don't plan on owning
SAS though and I would like to be able to do statistics in the future, so
learning R is just for my benefit.

On Jan 30, 2008 8:11 PM, Simon Blomberg <[hidden email]> wrote:

> Your data setup is wrong. You have one factor (Drug) with 3 levels
> (Zoloft, Naltrexone, Valium). So your data should be:
>
> spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"),
> each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12,
> 18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
>
> You should be able to take it from there. (Since this is a homework
> problem, I'm being intentionally vague.)
>
> cheers,
>
> Simon.
>
> On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote:
> > My grasp of R and statistics are both seriously lacking, so if this
> question
> > is completely naive, I apologize in advance. I've hunted for a couple
> hours
> > on the internet and none of the methods I've found have produced the
> result
> > I'm looking for.
> >
> > I'm currently a student in a Statistics class and we are learning the
> ANOVA.
> > We had to do one by hand and then reproduce our work in SAS. I really
> like
> > the idea of understanding R however and would like to reproduce the
> solution
> > in R if possible.
> >
> > Where I'm at now is this little program:
> > http://odin.himinbi.org/classes/psy304b/spider_analysis.r
> >
> > The program calculates an anova manually (correctly, I'm pretty sure, it
> > agrees with the same numbers in excel). The answer that it comes up with
> > doesn't agree with any of the numbers I can get either the aov or anova
> > functions to produce.
> >
> > Can anyone help me with simply the method to compute a one-way anova?
> Well,
> > specifically how to replicate the sort of anova people learn in an intro
> to
> > statistics class. All of the degrees of freedom are off from what I
> expect
> > them to be (they're all 1).
> >
> > (The original problem, should it help in understanding my question, is
> at:
> > http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it
> will
> > likely look pretty funky if your browser doesn't support mathml (firefox
> > does).)
> >
> > Will
> >
> > The program is as follows:
> >
> > library(foreign)
> > # spiderdata <- read.csv("spider_data.csv")
> >
> > spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6),
> > Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17),
> > Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
> >
> > summary(spiderdata)
> >
> > # Compute a one-way ANOVA by hand
> >
> > J = length(spiderdata)
> >
> > sqdata <- data.frame((spiderdata[1] - mean(spiderdata[1])) ^ 2)
> > for(j in 2:J) {
> > sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2)
> > }
> > sqdata
> >
> > N = 0
> > for(j in 1:J) {
> > N = N + length(sqdata[[j]])
> > }
> >
> > SSW = sum(sqdata)
> > MSW = SSW / (N - J)
> > SSB = 0
> > for(j in 1:(length(spiderdata))) {
> > SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[[1]] -
> > (sum(spiderdata) / N)) ^ 2)
> > }
> > MSB = SSB / (J - 1)
> >
> > F = MSB / MSW
> > f_prob = pf(F, J - 1, N - J)
> > reject_point = qf(.95, J - 1, N - J)
> >
> > cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", F,
> ",
> > P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "")
> >
> > anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata))
> > aov(Zoloft ~ Valium + Naltrexone, data = spiderdata)
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> --
> Simon Blomberg, BSc (Hons), PhD, MAppStat.
> Lecturer and Consultant Statistician
> Faculty of Biological and Chemical Sciences
> The University of Queensland
> St. Lucia Queensland 4072
> Australia
> Room 320 Goddard Building (8)
> T: +61 7 3365 2506
> http://www.uq.edu.au/~uqsblomb <http://www.uq.edu.au/%7Euqsblomb>
> email: S.Blomberg1_at_uq.edu.au
>
> Policies:
> 1.  I will NOT analyse your data for you.
> 2.  Your deadline is your problem.
>
> The combination of some data and an aching desire for
> an answer does not ensure that a reasonable answer can
> be extracted from a given body of data. - John Tukey.
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: difficulties computing a simple anova

Greg Snow-2
Here are a couple of ways:
 
tmp.df <- data.frame( group=factor(sample(LETTERS[1:3], 25, replace=TRUE)),
  val = rnorm(25, 50, 5) )
 
tmp.df <- transform(tmp.df, ssq = ave(val, group,
   FUN=function(x) x-mean(x))^2 )
# another way
tmp.means <- tapply(tmp.df$val, tmp.df$group, FUN=mean)
tmp.df$ssq2 <- as.vector( ( tmp.df$val - tmp.means[tmp.df$group] ) ^2 )
# check that they both worked
with( tmp.df, all.equal(ssq,ssq2) )
 
Hope this helps,

________________________________

From: [hidden email] on behalf of Will Holcomb
Sent: Thu 1/31/2008 8:10 AM
To: Simon Blomberg
Cc: r-help
Subject: Re: [R] difficulties computing a simple anova



Excellent. I was able to do the analysis with no problem. I just had to do
some twiddling with the inputs to get things from the CSV to the format you
described.

When the data is in that format, I would like to be able to say, "create a
column which is each square of the difference of each response with the mean
of all the responses for that drug." Is there a simple way to do that? What
I am doing currently is:

squares = c()
for(drug in unique(drugs)) {
   responses = subset(spiderdata, Drug == drug, select = Response)
   squares = append(squares, (responses - mean(responses)) ^ 2)
}
spiderdata <- cbind(spiderdata, SquaresWithin = squares)

This works because the elements are sorted by drug and the drugs vector has
the drug names in the same order. If my data were mixed up however, this
method would misalign results. Is there any way to say something like:

spiderdata$SquaresWithin = (spiderdata$Response - mean(subset(spiderdata,
Drug == *spiderdata$Drug of current row*, select = Response))) ^ 2

Just using Drug == spiderdata$Drug doesn't seem to work and I think it is
because Drug is a factor?

I suppose I could always sort on the element I was generating a new column
for and cause them to align, but I was wondering if the syntax would allow
me to refer to elements of the row a computation. I was trying it with
tapply which it sounds like should work, but this is not the method:

sq_within = function(row) { (row$Response - mean(subset(spiderdata, row =
row$Drug))) ^ 2 }
tapply(spiderdata, spiderdata$Drug, fun = sq_within)

It doesn't like the first and second arguments to be different lengths. The
first argument is the data to be iterated over though and I think my
understanding of how the function works is somehow fundamentally off. I've
had a hard time finding clear examples where tapply isn't used with a
function that generates a different number of outputs than there are inputs.

Thank you again for the help. I'm liking R, just still getting the hang of
some of the conceptual stuff.

Will

P.S. You don't have to be vague for the sake of my homework. The assignment
was to do this in SAS which I've already completed. I don't plan on owning
SAS though and I would like to be able to do statistics in the future, so
learning R is just for my benefit.

On Jan 30, 2008 8:11 PM, Simon Blomberg <[hidden email]> wrote:

> Your data setup is wrong. You have one factor (Drug) with 3 levels
> (Zoloft, Naltrexone, Valium). So your data should be:
>
> spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"),
> each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12,
> 18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
>
> You should be able to take it from there. (Since this is a homework
> problem, I'm being intentionally vague.)
>
> cheers,
>
> Simon.
>
> On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote:
> > My grasp of R and statistics are both seriously lacking, so if this
> question
> > is completely naive, I apologize in advance. I've hunted for a couple
> hours
> > on the internet and none of the methods I've found have produced the
> result
> > I'm looking for.
> >
> > I'm currently a student in a Statistics class and we are learning the
> ANOVA.
> > We had to do one by hand and then reproduce our work in SAS. I really
> like
> > the idea of understanding R however and would like to reproduce the
> solution
> > in R if possible.
> >
> > Where I'm at now is this little program:
> > http://odin.himinbi.org/classes/psy304b/spider_analysis.r
> >
> > The program calculates an anova manually (correctly, I'm pretty sure, it
> > agrees with the same numbers in excel). The answer that it comes up with
> > doesn't agree with any of the numbers I can get either the aov or anova
> > functions to produce.
> >
> > Can anyone help me with simply the method to compute a one-way anova?
> Well,
> > specifically how to replicate the sort of anova people learn in an intro
> to
> > statistics class. All of the degrees of freedom are off from what I
> expect
> > them to be (they're all 1).
> >
> > (The original problem, should it help in understanding my question, is
> at:
> > http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it
> will
> > likely look pretty funky if your browser doesn't support mathml (firefox
> > does).)
> >
> > Will
> >
> > The program is as follows:
> >
> > library(foreign)
> > # spiderdata <- read.csv("spider_data.csv")
> >
> > spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6),
> > Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17),
> > Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
> >
> > summary(spiderdata)
> >
> > # Compute a one-way ANOVA by hand
> >
> > J = length(spiderdata)
> >
> > sqdata <- data.frame((spiderdata[1] - mean(spiderdata[1])) ^ 2)
> > for(j in 2:J) {
> > sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2)
> > }
> > sqdata
> >
> > N = 0
> > for(j in 1:J) {
> > N = N + length(sqdata[[j]])
> > }
> >
> > SSW = sum(sqdata)
> > MSW = SSW / (N - J)
> > SSB = 0
> > for(j in 1:(length(spiderdata))) {
> > SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[[1]] -
> > (sum(spiderdata) / N)) ^ 2)
> > }
> > MSB = SSB / (J - 1)
> >
> > F = MSB / MSW
> > f_prob = pf(F, J - 1, N - J)
> > reject_point = qf(.95, J - 1, N - J)
> >
> > cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:", F,
> ",
> > P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "")
> >
> > anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata))
> > aov(Zoloft ~ Valium + Naltrexone, data = spiderdata)
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> --
> Simon Blomberg, BSc (Hons), PhD, MAppStat.
> Lecturer and Consultant Statistician
> Faculty of Biological and Chemical Sciences
> The University of Queensland
> St. Lucia Queensland 4072
> Australia
> Room 320 Goddard Building (8)
> T: +61 7 3365 2506
> http://www.uq.edu.au/~uqsblomb <http://www.uq.edu.au/%7Euqsblomb>
> email: S.Blomberg1_at_uq.edu.au
>
> Policies:
> 1.  I will NOT analyse your data for you.
> 2.  Your deadline is your problem.
>
> The combination of some data and an aching desire for
> an answer does not ensure that a reasonable answer can
> be extracted from a given body of data. - John Tukey.
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: difficulties computing a simple anova

Will Holcomb
That's excellent. I somehow missed that variable names were case sensitive.
I was using all lowercase names and it was throwing things off. The error I
would get using fun instead of FUN was along the lines of:

function (x)
x - mean(x)
<environment: 0x84be554>
Error in unique.default(x) : unique() applies only to vectors
Execution halted

Which was a bit misleading since it looked like the function was being used.
This solution was exactly the sort of thing I was hoping was possible
though. Thank you. I talked to my prof about the fact I like R and will
never own a copy of SAS, and I get to do my work in R this semester, so
y'all may well hear from me again. =)

Will

On Jan 31, 2008 12:52 PM, Greg Snow <[hidden email]> wrote:

>  Here are a couple of ways:
>
> tmp.df <- data.frame( group=factor(sample(LETTERS[1:3], 25,
> replace=TRUE)),
>   val = rnorm(25, 50, 5) )
>
> tmp.df <- transform(tmp.df, ssq = ave(val, group,
>    FUN=function(x) x-mean(x))^2 )
> # another way
> tmp.means <- tapply(tmp.df$val, tmp.df$group, FUN=mean)
> tmp.df$ssq2 <- as.vector( ( tmp.df$val - tmp.means[tmp.df$group] ) ^2 )
> # check that they both worked
> with( tmp.df, all.equal(ssq,ssq2) )
>
> Hope this helps,
>
> ------------------------------
> *From:* [hidden email] on behalf of Will Holcomb
> *Sent:* Thu 1/31/2008 8:10 AM
> *To:* Simon Blomberg
> *Cc:* r-help
> *Subject:* Re: [R] difficulties computing a simple anova
>
>  Excellent. I was able to do the analysis with no problem. I just had to
> do
> some twiddling with the inputs to get things from the CSV to the format
> you
> described.
>
> When the data is in that format, I would like to be able to say, "create a
> column which is each square of the difference of each response with the
> mean
> of all the responses for that drug." Is there a simple way to do that?
> What
> I am doing currently is:
>
> squares = c()
> for(drug in unique(drugs)) {
>    responses = subset(spiderdata, Drug == drug, select = Response)
>    squares = append(squares, (responses - mean(responses)) ^ 2)
> }
> spiderdata <- cbind(spiderdata, SquaresWithin = squares)
>
> This works because the elements are sorted by drug and the drugs vector
> has
> the drug names in the same order. If my data were mixed up however, this
> method would misalign results. Is there any way to say something like:
>
> spiderdata$SquaresWithin = (spiderdata$Response - mean(subset(spiderdata,
> Drug == *spiderdata$Drug of current row*, select = Response))) ^ 2
>
> Just using Drug == spiderdata$Drug doesn't seem to work and I think it is
> because Drug is a factor?
>
> I suppose I could always sort on the element I was generating a new column
> for and cause them to align, but I was wondering if the syntax would allow
> me to refer to elements of the row a computation. I was trying it with
> tapply which it sounds like should work, but this is not the method:
>
> sq_within = function(row) { (row$Response - mean(subset(spiderdata, row =
> row$Drug))) ^ 2 }
> tapply(spiderdata, spiderdata$Drug, fun = sq_within)
>
> It doesn't like the first and second arguments to be different lengths.
> The
> first argument is the data to be iterated over though and I think my
> understanding of how the function works is somehow fundamentally off. I've
> had a hard time finding clear examples where tapply isn't used with a
> function that generates a different number of outputs than there are
> inputs.
>
> Thank you again for the help. I'm liking R, just still getting the hang of
> some of the conceptual stuff.
>
> Will
>
> P.S. You don't have to be vague for the sake of my homework. The
> assignment
> was to do this in SAS which I've already completed. I don't plan on owning
> SAS though and I would like to be able to do statistics in the future, so
> learning R is just for my benefit.
>
> On Jan 30, 2008 8:11 PM, Simon Blomberg <[hidden email]> wrote:
>
> > Your data setup is wrong. You have one factor (Drug) with 3 levels
> > (Zoloft, Naltrexone, Valium). So your data should be:
> >
> > spiderdata <- data.frame(Drug=rep(c("Zoloft", "Naltrexone", "Valium"),
> > each=10), Response=c(9, 11, 5, 12, 15, 14, 13, 12, 7, 6, 15, 16, 12, 12,
> > 18, 19, 23, 20, 13, 17, 9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
> >
> > You should be able to take it from there. (Since this is a homework
> > problem, I'm being intentionally vague.)
> >
> > cheers,
> >
> > Simon.
> >
> > On Wed, 2008-01-30 at 18:47 -0600, Will Holcomb wrote:
> > > My grasp of R and statistics are both seriously lacking, so if this
> > question
> > > is completely naive, I apologize in advance. I've hunted for a couple
> > hours
> > > on the internet and none of the methods I've found have produced the
> > result
> > > I'm looking for.
> > >
> > > I'm currently a student in a Statistics class and we are learning the
> > ANOVA.
> > > We had to do one by hand and then reproduce our work in SAS. I really
> > like
> > > the idea of understanding R however and would like to reproduce the
> > solution
> > > in R if possible.
> > >
> > > Where I'm at now is this little program:
> > > http://odin.himinbi.org/classes/psy304b/spider_analysis.r
> > >
> > > The program calculates an anova manually (correctly, I'm pretty sure,
> it
> > > agrees with the same numbers in excel). The answer that it comes up
> with
> > > doesn't agree with any of the numbers I can get either the aov or
> anova
> > > functions to produce.
> > >
> > > Can anyone help me with simply the method to compute a one-way anova?
> > Well,
> > > specifically how to replicate the sort of anova people learn in an
> intro
> > to
> > > statistics class. All of the degrees of freedom are off from what I
> > expect
> > > them to be (they're all 1).
> > >
> > > (The original problem, should it help in understanding my question, is
> > at:
> > > http://odin.himinbi.org/classes/psy304b/homework_1.xhtml#2 though it
> > will
> > > likely look pretty funky if your browser doesn't support mathml
> (firefox
> > > does).)
> > >
> > > Will
> > >
> > > The program is as follows:
> > >
> > > library(foreign)
> > > # spiderdata <- read.csv("spider_data.csv")
> > >
> > > spiderdata = data.frame(Zoloft = c(9, 11, 5, 12, 15, 14, 13, 12, 7,
> 6),
> > > Naltrexone = c(15, 16, 12, 12, 18, 19, 23, 20, 13, 17),
> > > Valium = c(9, 11, 12, 5, 13, 15, 11, 8, 6, 9))
> > >
> > > summary(spiderdata)
> > >
> > > # Compute a one-way ANOVA by hand
> > >
> > > J = length(spiderdata)
> > >
> > > sqdata <- data.frame((spiderdata[1] - mean(spiderdata[1])) ^ 2)
> > > for(j in 2:J) {
> > > sqdata <- cbind(sqdata, (spiderdata[j] - mean(spiderdata[j])) ^ 2)
> > > }
> > > sqdata
> > >
> > > N = 0
> > > for(j in 1:J) {
> > > N = N + length(sqdata[[j]])
> > > }
> > >
> > > SSW = sum(sqdata)
> > > MSW = SSW / (N - J)
> > > SSB = 0
> > > for(j in 1:(length(spiderdata))) {
> > > SSB = SSB + length(spiderdata[[j]]) * ((mean(spiderdata[j])[[1]] -
> > > (sum(spiderdata) / N)) ^ 2)
> > > }
> > > MSB = SSB / (J - 1)
> > >
> > > F = MSB / MSW
> > > f_prob = pf(F, J - 1, N - J)
> > > reject_point = qf(.95, J - 1, N - J)
> > >
> > > cat("SSW:", SSW, ", MSW:", MSW, ", SSB:", SSB, ", MSB:", MSB, ", F:",
> F,
> > ",
> > > P(F):", f_prob, ", P(", reject_point, ") = .95\n", sep = "")
> > >
> > > anova(lm(Zoloft ~ Valium + Naltrexone, data = spiderdata))
> > > aov(Zoloft ~ Valium + Naltrexone, data = spiderdata)
> > >
> > >       [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > [hidden email] mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > --
> > Simon Blomberg, BSc (Hons), PhD, MAppStat.
> > Lecturer and Consultant Statistician
> > Faculty of Biological and Chemical Sciences
> > The University of Queensland
> > St. Lucia Queensland 4072
> > Australia
> > Room 320 Goddard Building (8)
> > T: +61 7 3365 2506
> > http://www.uq.edu.au/~uqsblomb <http://www.uq.edu.au/%7Euqsblomb> <
> http://www.uq.edu.au/%7Euqsblomb>
> > email: S.Blomberg1_at_uq.edu.au
> >
> > Policies:
> > 1.  I will NOT analyse your data for you.
> > 2.  Your deadline is your problem.
> >
> > The combination of some data and an aching desire for
> > an answer does not ensure that a reasonable answer can
> > be extracted from a given body of data. - John Tukey.
> >
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.