Hello List,

I'm trying to do a paired t-test, and I'm wondering if it's consistent
with equations.  I have a dataset that has a response and two
treatments (here's an example):

   ID trt order          resp
17  1   0     1  0.0037513592
18  2   0     1  0.0118723051
19  4   0     1  0.0002610251
20  5   0     1 -0.0077951450
21  6   0     1  0.0022339952
22  7   0     2  0.0235195453

The subjects were randomized and assigned to receive either the
treatment or the placebo first, then the other.  I know I'll
eventually have to move on to a GLM or something that incorporates the
order, but for now I wanted to start with a simple t.test.  My problem
is that, if I get the responses into two vectors x and y (sorted by
ID) and do a t.test, and then compare that to a formula t.test, they
aren't the same.

> t.test(x,y,paired=TRUE)

        Paired t-test

data:  x and y
t = -0.3492, df = 15, p-value = 0.7318
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.010446921  0.007505966
sample estimates:
mean of the differences
           -0.001470477

> t.test(resp~trt,data=dat1[[3]],paired=TRUE)

        Paired t-test

data:  resp by trt
t = -0.3182, df = 15, p-value = 0.7547
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.007096678  0.005253173
sample estimates:
mean of the differences
          -0.0009217521

What I'm assuming is that the equation isn't retaining the inherent
order of the dataset, so the pairing isn't matching up (even though
the dataset is ordered by ID).  Is there a way to make the t.test
retain the correct ordering?

Thanks,
Sam

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

On Aug 15, 2010, at 9:05 AM, R Help wrote:

> [...]
> What I'm assuming is that the equation isn't retaining the inherent
> order of the dataset, so the pairing isn't matching up (even though
> the dataset is ordered by ID).  Is there a way to make the t.test
> retain the correct ordering?
>
> Thanks,
> Sam

See this thread from just 2 days ago:

  https://stat.ethz.ch/pipermail/r-help/2010-August/249068.html

perhaps focusing on Thomas' reply, which is the next post in the thread.

Bottom line, don't use the formula method for a paired t test.

HTH,

Marc Schwartz

Marc Schwartz wrote:

> On Aug 15, 2010, at 9:05 AM, R Help wrote:
>
>> [...]
>
> See this thread from just 2 days ago:
>
>   https://stat.ethz.ch/pipermail/r-help/2010-August/249068.html
>
> perhaps focusing on Thomas' reply, which is the next post in the thread.
>
> Bottom line, don't use the formula method for a paired t test.

Yes. I'm not sure the same problem is afoot here, though. In particular,
I'm puzzled by the fact that there are 15 DF in both cases, but a
different average difference. This kind of suggests to me that maybe the
x and y are not computed correctly. (If only the ordering was scrambled,
the average difference should be the same, but the variance typically
inflated.)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: [hidden email]  Priv: [hidden email]

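Peter's argument can be checked numerically. The mean of the paired differences equals mean(x) - mean(y) whatever the pairing, so it can be compared against the raw group means. What follows is a minimal sketch with simulated data; the objects resp, trt, x and y are illustrative stand-ins, not the OP's objects.

```r
# Simulated stand-in for the OP's data (all values are made up).
set.seed(10)
resp <- rnorm(32, mean = 0, sd = 0.01)
trt  <- rep(c(0, 1), each = 16)

x <- resp[trt == 0]
y <- resp[trt == 1]

# Difference of the raw group means, taken straight from resp and trt.
grp_means  <- tapply(resp, trt, mean)
group_diff <- unname(grp_means["0"] - grp_means["1"])

# Mean of the paired differences, with the original and a scrambled pairing.
est_sorted    <- unname(t.test(x, y, paired = TRUE)$estimate)
est_scrambled <- unname(t.test(sample(x), y, paired = TRUE)$estimate)

# All three agree up to floating-point error; only the standard error
# (and hence t and the confidence interval) depends on the pairing.
print(c(group_diff, est_sorted, est_scrambled))
```

If the "mean of the differences" printed by a t.test() call does not match the difference of group means in the data, the call was not given the same numbers, which is the situation Peter describes.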
On Aug 15, 2010, at 3:31 PM, Peter Dalgaard wrote:

> Marc Schwartz wrote:
>> On Aug 15, 2010, at 9:05 AM, R Help wrote:
>>
>>> [...]
>>>> t.test(resp~trt,data=dat1[[3]],paired=TRUE)

Since neither resp nor trt would be in dat1[[3]], wouldn't the fact that
no error was reported imply either that dat1 had been attached (and we
were not informed of that prior attach()-ment) or that resp and trt are
also object names besides being column names inside dat1?

> [...]
> Yes. I'm not sure the same problem is afoot here, though. In
> particular, I'm puzzled by the fact that there are 15DF in both cases,
> but different average difference. This kind of suggests to me that
> maybe the x and y are not computed correctly. (If only the ordering
> was scrambled, the average difference should be the same, but the
> variance typically inflated.)
>
> --
> Peter Dalgaard

David Winsemius, MD
West Hartford, CT

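The lookup behaviour behind David's question can be reproduced. When a variable named in a formula is not found in the 'data' argument, R falls back to the environment of the formula (usually the workspace) without any warning. A minimal sketch follows; decoy, resp and trt are made-up objects, and an unpaired test is used only to keep the example short.

```r
# Made-up workspace objects with the same names as the OP's columns.
set.seed(42)
resp <- rnorm(20)
trt  <- rep(c("A", "B"), each = 10)

# A data frame that contains neither 'resp' nor 'trt'.
decoy <- data.frame(id = seq_len(20))

# No error is raised: both variables are silently taken from the workspace.
fit_env <- t.test(resp ~ trt, data = decoy)

# The same test computed directly from the workspace vectors.
fit_direct <- t.test(resp[trt == "A"], resp[trt == "B"])

print(c(formula = unname(fit_env$statistic),
        direct  = unname(fit_direct$statistic)))
```

A quick guard before trusting a formula call is to check that all(c("resp", "trt") %in% names(the_data)) is TRUE, where the_data stands for whatever object is passed as 'data'.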
On Aug 15, 2010, at 2:48 PM, David Winsemius wrote:

> On Aug 15, 2010, at 3:31 PM, Peter Dalgaard wrote:
>
>> Marc Schwartz wrote:
>>> On Aug 15, 2010, at 9:05 AM, R Help wrote:
>>>
>>>> [...]
>>>>> t.test(resp~trt,data=dat1[[3]],paired=TRUE)
>
> Since neither resp nor trt would be in dat1[[3]], wouldn't the fact that
> no error was reported imply either that dat1 had been attached (and we
> were not informed of that prior attach()-ment) or that resp and trt are
> also object names besides being column names inside dat1?
>
>> [...]
>> Yes. I'm not sure the same problem is afoot here, though. In particular,
>> I'm puzzled by the fact that there are 15DF in both cases, but different
>> average difference. This kind of suggests to me that maybe the x and y
>> are not computed correctly. (If only the ordering was scrambled, the
>> average difference should be the same, but the variance typically
>> inflated.)

I suspect that David is correct here. Good catch.

set.seed(1)
x <- rnorm(16, 1, 1)
y <- rnorm(16, 1.5, 1)
grp <- rep(c("A", "B"), each = 16)

> t.test(x, y, paired = TRUE)

        Paired t-test

data:  x and y
t = -1.595, df = 15, p-value = 0.1316
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.2841549  0.1848776
sample estimates:
mean of the differences
             -0.5496387

> t.test(x-y)

        One Sample t-test

data:  x - y
t = -1.595, df = 15, p-value = 0.1316
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -1.2841549  0.1848776
sample estimates:
 mean of x
-0.5496387

> t.test(c(x, y) ~ grp, paired = TRUE)

        Paired t-test

data:  c(x, y) by grp
t = -1.595, df = 15, p-value = 0.1316
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.2841549  0.1848776
sample estimates:
mean of the differences
             -0.5496387

# Scramble the pairings, as Peter notes
set.seed(2)

> t.test(c(sample(x), y) ~ grp, paired = TRUE)

        Paired t-test

data:  c(sample(x), y) by grp
t = -1.8166, df = 15, p-value = 0.0893
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.19453037  0.09525302
sample estimates:
mean of the differences
             -0.5496387

The prior thread behavior was due to the handling of missing data
compromising the pairings.

So to the OP, check your working environment and your invocation of the
formula method 'data' argument.  However, avoid using the formula method
for paired t tests.

Regards,

Marc

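One way to follow Marc's advice is to build the two vectors by matching on the subject ID explicitly, so the pairing does not depend on row order. This is a sketch under assumptions: the data frame dat is simulated, its column names (ID, trt, resp) mirror the OP's example, and the coding of trt as 0/1 is a guess about the OP's data.

```r
# Simulated long-format data: one row per subject per treatment.
set.seed(2010)
n <- 16
dat <- data.frame(
  ID   = rep(seq_len(n), times = 2),
  trt  = rep(c(0, 1), each = n),
  resp = rnorm(2 * n, mean = 0, sd = 0.01)
)
dat <- dat[sample(nrow(dat)), ]  # shuffle rows to show order does not matter

# One data frame per treatment, then join on ID so each row is one subject.
trt0 <- dat[dat$trt == 0, c("ID", "resp")]
trt1 <- dat[dat$trt == 1, c("ID", "resp")]
wide <- merge(trt0, trt1, by = "ID", suffixes = c(".trt0", ".trt1"))

# Sanity checks: one row per subject, no subject lost in the join.
stopifnot(!anyDuplicated(wide$ID), nrow(wide) == n)

# Paired test on explicitly matched vectors, and the equivalent
# one-sample test on the within-subject differences.
res      <- t.test(wide$resp.trt0, wide$resp.trt1, paired = TRUE)
res_diff <- t.test(wide$resp.trt0 - wide$resp.trt1)
print(res)
```

Because merge() keeps only IDs present in both treatments by default, subjects with a missing observation drop out as a whole pair instead of shifting the remaining pairings.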