# Pearson chi-square test

5 messages

## Pearson chi-square test

Dear all,

I have some trouble understanding the chisq.test function. Take the following example:

    set.seed(1)
    A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
    B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
    C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
    x <- table(A,B)
    y <- table(A,C)

When I calculate the test statistic by hand I get a value of approximately 75.9:

http://en.wikipedia.org/wiki/Pearson's_chi-square_test#Calculating_the_test-statistic

    sum((x-y)^2/y)

But when I do chisq.test(x,y) I get a value of 12.2 while chisq.test(y,x) gives a value of 10.3.

I understand that I must be doing something wrong here, but I'm not sure what.

Thanks,

Michael

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: Pearson chi-square test

Not sure what you want to test here with two matrices, but reading the manual helps here as well:

    y    a vector; ignored if x is a matrix.

x and y are matrices in your example, so it comes as no surprise that you get different results. On top of that, your manual calculation is not correct if you want to test whether two samples come from the same distribution (so don't be surprised if R still gives a different value...).

HTH, Michael

> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-project.org] On Behalf Of Michael Haenlein
> Sent: Tuesday, September 27, 2011 12:45
> To: [hidden email]
> Subject: [R] Pearson chi-square test
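The point about the ignored second argument can be checked directly. The following is a minimal sketch that reuses the tables from the original post; the variable names `stat_xy`, `stat_x`, `stat_yx` and `stat_y` are introduced here for illustration only, and `suppressWarnings()` is added just to silence the small-expected-count warning that `chisq.test` may emit for sparse tables.

```r
# Rebuild the two tables from the original post.
set.seed(1)
A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels = FALSE)
B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels = FALSE)
C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels = FALSE)
x <- table(A, B)
y <- table(A, C)

# When the first argument is a matrix (a two-way table counts as one),
# the second argument is ignored, so each call is simply a test of
# independence on its first argument.
stat_xy <- suppressWarnings(chisq.test(x, y))$statistic
stat_x  <- suppressWarnings(chisq.test(x))$statistic
stat_yx <- suppressWarnings(chisq.test(y, x))$statistic
stat_y  <- suppressWarnings(chisq.test(y))$statistic

# Compare the two-argument calls with their one-argument counterparts.
print(all.equal(stat_xy, stat_x))
print(all.equal(stat_yx, stat_y))
```

So the two values reported in the question are presumably the independence statistics for A versus B and for A versus C, not a comparison of x with y.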

## Re: Pearson chi-square test

Just for completeness: the manual calculation you'd want is most likely

    sum((x-y)^2 / (x+y))

(that's one you can find on the Wikipedia link you provided). To get the same from chisq.test, try something like

    chisq.test(data.frame(x,y)[,c(3,6)])

(there are surely smarter ways, but at least it works here). Note that something like

    chisq.test(as.vector(x), as.vector(y))

will give a different test (i.e. one based on a contingency table of x crossed with y).

M.

> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-project.org] On Behalf Of Meyners, Michael
> Sent: Tuesday, September 27, 2011 13:28
> To: Michael Haenlein; [hidden email]
> Subject: Re: [R] Pearson chi-square test
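The equivalence described above can be sketched as follows, again with the tables from the original post. The names `manual`, `counts`, `res`, `via_matrix`, `via_df` and `crossed` are illustrative and not from the thread, and the `cbind()` route is only one possible alternative to the `data.frame` indexing suggested above. The agreement with the manual formula relies on both tables having the same total (100 observations each): each expected count is then (x+y)/2, and the usual Pearson sum collapses to sum((x-y)^2 / (x+y)).

```r
# Rebuild the two tables from the original post.
set.seed(1)
A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels = FALSE)
B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels = FALSE)
C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels = FALSE)
x <- table(A, B)
y <- table(A, C)

# Manual statistic for comparing two sets of counts with equal totals.
manual <- sum((x - y)^2 / (x + y))

# Same comparison through chisq.test: put the 16 cell counts of x and y
# side by side as the two columns of a 16 x 2 matrix.
counts <- cbind(x = as.vector(x), y = as.vector(y))
res <- suppressWarnings(chisq.test(counts))
via_matrix <- res$statistic

# The data.frame route from the message above: columns 3 and 6 hold the
# frequencies of x and y respectively.
via_df <- suppressWarnings(chisq.test(data.frame(x, y)[, c(3, 6)]))$statistic

print(manual)
print(via_matrix)
print(via_df)

# Passing two vectors instead cross-tabulates the count values themselves,
# which is a different test with different degrees of freedom.
crossed <- suppressWarnings(chisq.test(as.vector(x), as.vector(y)))
print(res$parameter)
print(crossed$parameter)
```

The small-expected-count warning is suppressed here only to keep the output readable; with cells this sparse, the chi-square approximation itself may be poor.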