Quantcast

comparing proportions

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

comparing proportions

array chip
Hi, I have a dataset that has 2 groups of samples. For each sample, then
response measured is the number of success (no.success) obatined with the number
of trials (no.trials). So a porportion of success (prpop.success) can be
computed as no.success/no.trials. Now the objective is to test if there is a
statistical significant difference in the proportion of success between the 2
groups of samples (say n1=20, n2=30).

I can think of 2 ways to do the test:

1. regular t test based on the variable prop.success
2. Mann-Whitney test based on the variable prop.success
2. do a binomial regression as:
    fit<-glm(cbind(no.success,no.trials-no.success) ~ group, data=data,
         family=binomial)
    anova(fit, test='Chisq')

My questions is:
1. Is t test appropriate for comparing 2 groups of proportions?
2. how about Mann-Whitney non-parametric test?
3. Among the 3, which technique is more appropriate?
4. any other technique you can suggest?

Thank you,

John


     
        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: comparing proportions

array chip
Hi Bert,

Thanks for your reply. If I understand correctly, prop.test() is not suitable to
my situation. The input to prop.test() is 2 numbers for each group (# of success
and # of trials, for example, groups 1 has 5 success out of 10 trials; group 2
has 3 success out of 7 trials; etc. prop.test() tests whether the probability of
success is the same across groups.

In my case, each group has several subjects and each subject has 2 numbers (#
success and # trials). So

for group 1:
subject 1: 5 success, 10 trials
subject 2: 3 success, 8 trials
:
:

for group 2:
subject a: 7 success, 9 trials
subject b: 6 success, 7 trials
:
:

I want to test whether the probability of success in group 1 is the same as in
group 2. It's like comparing 2 groups of samples using t test, what I am
uncertain about is that whether regular t test (or non-pamametric test) is still
appropriate here when the response variable is actually proportions.

I guess prop.test() can not be used with my dataset, or I may be wrong?

Thanks

John
 
 
 




________________________________
From: Bert Gunter <[hidden email]>

Sent: Wed, February 9, 2011 3:58:05 PM
Subject: Re: [R] comparing proportions

1. Is this a homework problem?

2. ?prop.test

3. If you haven't done so already, get and consult a basic statistical
methods book to help you with questions such as this.

-- Bert


> Hi, I have a dataset that has 2 groups of samples. For each sample, then
> response measured is the number of success (no.success) obatined with the
>number
> of trials (no.trials). So a porportion of success (prpop.success) can be
> computed as no.success/no.trials. Now the objective is to test if there is a
> statistical significant difference in the proportion of success between the 2
> groups of samples (say n1=20, n2=30).
>
> I can think of 2 ways to do the test:
>
> 1. regular t test based on the variable prop.success
> 2. Mann-Whitney test based on the variable prop.success
> 2. do a binomial regression as:
>     fit<-glm(cbind(no.success,no.trials-no.success) ~ group, data=data,
>          family=binomial)
>     anova(fit, test='Chisq')
>
> My questions is:
> 1. Is t test appropriate for comparing 2 groups of proportions?
> 2. how about Mann-Whitney non-parametric test?
> 3. Among the 3, which technique is more appropriate?
> 4. any other technique you can suggest?
>
> Thank you,
>
> John
>
>
>
>        [[alternative HTML version deleted]]
>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


--
Bert Gunter
Genentech Nonclinical Biostatistics



     
        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: comparing proportions

Robert A LaBudde
prop.test() is applicable to a binomial experiment in each of two classes.

Your experiment is binomial only at the subject level. You then have
multiple subjects in each of your groups.

You have a random factor "Subjects" that must be accounted for.

The best way to analyze is a generalized linear mixed model with a
binomial distribution family and a logit or probit link. You will
probably have to investigate overdispersion. If you have a small
number of subjects, and don't care about the among-subject effect,
you can model them as fixed effects and use glm() instead.

Your original question, I believe, related to doing an ANOVA assuming
normality. In order for this to work with this kind of proportion
problem, you generally won't get good results unless the number of
replicates per subject is 12 or more, and the proportions involved
are within 0.15 to 0.85. Otherwise you will have biased confidence
intervals and significance tests.



At 07:51 PM 2/9/2011, array chip wrote:

>Content-type: text/plain
>Content-disposition: inline
>Content-length: 2969
>
>Hi Bert,
>
>Thanks for your reply. If I understand correctly, prop.test() is not
>suitable to
>my situation. The input to prop.test() is 2 numbers for each group
>(# of success
>and # of trials, for example, groups 1 has 5 success out of 10
>trials; group 2
>has 3 success out of 7 trials; etc. prop.test() tests whether the
>probability of
>success is the same across groups.
>
>In my case, each group has several subjects and each subject has 2 numbers (#
>success and # trials). So
>
>for group 1:
>subject 1: 5 success, 10 trials
>subject 2: 3 success, 8 trials
>:
>:
>
>for group 2:
>subject a: 7 success, 9 trials
>subject b: 6 success, 7 trials
>:
>:
>
>I want to test whether the probability of success in group 1 is the
>same as in
>group 2. It's like comparing 2 groups of samples using t test, what I am
>uncertain about is that whether regular t test (or non-pamametric
>test) is still
>appropriate here when the response variable is actually proportions.
>
>I guess prop.test() can not be used with my dataset, or I may be wrong?
>
>Thanks
>
>John
>
>
>
>
>
>
>
>________________________________
>From: Bert Gunter <[hidden email]>
>
>Sent: Wed, February 9, 2011 3:58:05 PM
>Subject: Re: [R] comparing proportions
>
>1. Is this a homework problem?
>
>2. ?prop.test
>
>3. If you haven't done so already, get and consult a basic statistical
>methods book to help you with questions such as this.
>
>-- Bert
>
>
> > Hi, I have a dataset that has 2 groups of samples. For each sample, then
> > response measured is the number of success (no.success) obatined with the
> >number
> > of trials (no.trials). So a porportion of success (prpop.success) can be
> > computed as no.success/no.trials. Now the objective is to test if
> there is a
> > statistical significant difference in the proportion of success
> between the 2
> > groups of samples (say n1=20, n2=30).
> >
> > I can think of 2 ways to do the test:
> >
> > 1. regular t test based on the variable prop.success
> > 2. Mann-Whitney test based on the variable prop.success
> > 2. do a binomial regression as:
> >     fit<-glm(cbind(no.success,no.trials-no.success) ~ group, data=data,
> >          family=binomial)
> >     anova(fit, test='Chisq')
> >
> > My questions is:
> > 1. Is t test appropriate for comparing 2 groups of proportions?
> > 2. how about Mann-Whitney non-parametric test?
> > 3. Among the 3, which technique is more appropriate?
> > 4. any other technique you can suggest?
> >
> > Thank you,
> >
> > John
> >
> >
> >
> >        [[alternative HTML version deleted]]
> >
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
>
>--
>Bert Gunter
>Genentech Nonclinical Biostatistics
>
>
>
>
>         [[alternative HTML version deleted]]
>
>
>______________________________________________
>[hidden email] mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: comparing proportions

Robert A LaBudde
1. If you use a random effects model, you should make Subject the
random factor. I.e., a random intercepts model with 1|Subject. Group
is a fixed effect: You have only 2 groups. Even if you had more than
2 groups, treating Group as random would return a standard deviation,
not a P-value as you wanted. Finally, I doubt you believe the groups
used are meaningless, and only the population of groups is of
interest. Instead you consider them special, so Group is a fixed effect.

2. The number of observations for each Subject is the number of
trials, which you previously indicated were 7 to 10 in the cases listed.

3. If you have no interest in the Subject effect, you can use a fixed
Subject factor instead with glm() instead of glmer() or other mixed
model function. This is a good idea so long as the number of subjects
is, say, less than 10. Otherwise a mixed model would be a better idea.

I suggest you fit all three models to learn about what you're doing:
1) glmer() or equivalent, with cbind(successes, failures) ~ 1|Subject
+ Group; 2) glm() with cbind(successes, failures) ~ Subject + Group;
and 3) lm(p ~ Subject + Group), where p is the proportion success for
a particular subject and group.

Then compare the results. They will probably all 3 give the same
conclusion to the hypothesis question about Group. I would guess the
glmer() P-value will be larger, then the glm() and finally the lm(),
but the last two may reverse. The lm() model may actually perform
fairly well, as the Edgeworth series converges rapidly to normal for
binomial distributions with p within 0.15 to 0.85 and 10+ replicates,
as I stated before.

I'd be interested in seeing the results of these 3 fits myself just
for curiosity.

At 01:21 PM 2/10/2011, array chip wrote:

>Robert and Bert, thank you both very much for the response, really
>appreciated. I agree that using regular ANOVA (or regular t test)
>may not be wise during the normality issue. So I am debating between
>generalized linear model using glm(.., family=binomial) or
>generalized linear mixed effect model using glmer(...,
>family=binomial). I will forward to Robert an offline list email I
>sent to Bert about whether using (1|subject) versus (1|group) in
>mixed model specification. If using (1|group), both models will give
>me the same testing for fixed effects, which is what I am mainly
>interested in. So do I really need a mixed model here?
>
>Thanks again
>
>John
>
>
>From: Bert Gunter <[hidden email]>
>To: Robert A LaBudde <[hidden email]>
>Cc: array chip <[hidden email]>
>Sent: Thu, February 10, 2011 10:04:06 AM
>Subject: Re: [R] comparing proportions
>
>Robert:
>
>Yes, exactly. In an offlist email exchange, he clarified this for me,
>and I suggested exactly what you did, also with the cautions that his
>initial ad hoc suggestions were unwise. His subsequent post to R-help
>and the sig-mixed-models lists were the result, although he appears to
>have specified the model incorrectly in his glmer function (as
>(1|Group) instead of (1|subject).
>
>Cheers,
>Bert
>
>On Thu, Feb 10, 2011 at 9:55 AM, Robert A LaBudde
><<mailto:[hidden email]>[hidden email]> wrote:
> > prop.test() is applicable to a binomial experiment in each of two classes.
> >
> > Your experiment is binomial only at the subject level. You then have
> > multiple subjects in each of your groups.
> >
> > You have a random factor "Subjects" that must be accounted for.
> >
> > The best way to analyze is a generalized linear mixed model with a binomial
> > distribution family and a logit or probit link. You will probably have to
> > investigate overdispersion. If you have a small number of subjects, and
> > don't care about the among-subject effect, you can model them as fixed
> > effects and use glm() instead.
> >
> > Your original question, I believe, related to doing an ANOVA assuming
> > normality. In order for this to work with this kind of proportion problem,
> > you generally won't get good results unless the number of replicates per
> > subject is 12 or more, and the proportions involved are within
> 0.15 to 0.85.
> > Otherwise you will have biased confidence intervals and significance tests.
> >
> >
> >
> > At 07:51 PM 2/9/2011, array chip wrote:
> >>
> >> Content-type: text/plain
> >> Content-disposition: inline
> >> Content-length: 2969
> >>
> >> Hi Bert,
> >>
> >> Thanks for your reply. If I understand correctly, prop.test() is not
> >> suitable to
> >> my situation. The input to prop.test() is 2 numbers for each group (# of
> >> success
> >> and # of trials, for example, groups 1 has 5 success out of 10 trials;
> >> group 2
> >> has 3 success out of 7 trials; etc. prop.test() tests whether the
> >> probability of
> >> success is the same across groups.
> >>
> >> In my case, each group has several subjects and each subject has 2 numbers
> >> (#
> >> success and # trials). So
> >>
> >> for group 1:
> >> subject 1: 5 success, 10 trials
> >> subject 2: 3 success, 8 trials
> >> :
> >> :
> >>
> >> for group 2:
> >> subject a: 7 success, 9 trials
> >> subject b: 6 success, 7 trials
> >> :
> >> :
> >>
> >> I want to test whether the probability of success in group 1 is the same
> >> as in
> >> group 2. It's like comparing 2 groups of samples using t test, what I am
> >> uncertain about is that whether regular t test (or non-pamametric test) is
> >> still
> >> appropriate here when the response variable is actually proportions.
> >>
> >> I guess prop.test() can not be used with my dataset, or I may be wrong?
> >>
> >> Thanks
> >>
> >> John
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: Bert Gunter <<mailto:[hidden email]>[hidden email]>
> >>
> >> Sent: Wed, February 9, 2011 3:58:05 PM
> >> Subject: Re: [R] comparing proportions
> >>
> >> 1. Is this a homework problem?
> >>
> >> 2. ?prop.test
> >>
> >> 3. If you haven't done so already, get and consult a basic statistical
> >> methods book to help you with questions such as this.
> >>
> >> -- Bert
> >>
> >>
> >> > Hi, I have a dataset that has 2 groups of samples. For each sample, then
> >> > response measured is the number of success (no.success) obatined with
> >> > the
> >> >number
> >> > of trials (no.trials). So a porportion of success (prpop.success) can be
> >> > computed as no.success/no.trials. Now the objective is to test if there
> >> > is a
> >> > statistical significant difference in the proportion of success between
> >> > the 2
> >> > groups of samples (say n1=20, n2=30).
> >> >
> >> > I can think of 2 ways to do the test:
> >> >
> >> > 1. regular t test based on the variable prop.success
> >> > 2. Mann-Whitney test based on the variable prop.success
> >> > 2. do a binomial regression as:
> >> >     fit<-glm(cbind(no.success,no.trials-no.success) ~ group, data=data,
> >> >          family=binomial)
> >> >     anova(fit, test='Chisq')
> >> >
> >> > My questions is:
> >> > 1. Is t test appropriate for comparing 2 groups of proportions?
> >> > 2. how about Mann-Whitney non-parametric test?
> >> > 3. Among the 3, which technique is more appropriate?
> >> > 4. any other technique you can suggest?
> >> >
> >> > Thank you,
> >> >
> >> > John
> >> >
> >> >
> >> >
> >> >        [[alternative HTML version deleted]]
> >> >
> >> >
> >> > ______________________________________________
> >> > <mailto:[hidden email]>[hidden email] mailing list
> >> >
> <https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> >
> <http://www.R-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Bert Gunter
> >> Genentech Nonclinical Biostatistics
> >>
> >>
> >>
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >>
> >> ______________________________________________
> >> <mailto:[hidden email]>[hidden email] mailing list
> >>
> <https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >>
> <http://www.R-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > ================================================================
> > Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail:
> <mailto:[hidden email]>[hidden email]
> > Least Cost Formulations, Ltd.            URL:
> <http://lcfltd.com/>http://lcfltd.com/
> > 824 Timberlake Drive                     Tel: 757-467-0954
> > Virginia Beach, VA 23464-3239            Fax: 757-467-2947
> >
> > "Vere scire est per causas scire"
> > ================================================================
> >
> >
>
>
>
>--
>Bert Gunter
>Genentech Nonclinical Biostatistics
>467-7374
><http://devo.gene.com/groups/devo/depts/ncb/home.shtml>http://devo.gene.com/groups/devo/depts/ncb/home.shtml

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: comparing proportions

array chip
Robert, thank you!

I tried all 3 models you suggested. Since each subject only has one line of data
in my dataset, would including Subject as a factor in glm() or lm() lead to 0 df
for resaiduals?

Attached is my dataset and here is my version of the 3 models:

test<-read.table("test.txt",sep='\t',header=T)
library(lme4)

## model 1: generalized mixed model
fit1<-glmer(cbind(success,failure)~group+(1|subject),data=test,family=binomial)

## model 2: generalized model
fit2<-glm(cbind(success,failure)~group,data=test,family=binomial)

## model 3: linear model
fit3<-lm(prop.success~group,weights=success+failure,data=test)

> fit1
Generalized linear mixed model fit by the Laplace approximation
Formula: cbind(success, failure) ~ group + (1 | subject)
   Data: test
   AIC   BIC logLik deviance
 54.75 57.89 -24.38    48.75
Random effects:
 Groups  Name        Variance Std.Dev.
 subject (Intercept) 1.8256   1.3511 
Number of obs: 21, groups: subject, 21
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.6785     0.4950  -1.371    0.170
groupgroup2  -0.7030     0.6974  -1.008    0.313

> summary(fit2)
Call:
glm(formula = cbind(success, failure) ~ group, family = binomial,
    data = test)
Deviance Residuals:
    Min       1Q   Median       3Q      Max 
-2.8204  -2.0789  -0.5407   1.0403   4.0539 
Coefficients:
            Estimate Std. Error z value Pr(>|z|) 
(Intercept)  -0.4400     0.2080  -2.115   0.0344 *
groupgroup2  -0.6587     0.3108  -2.119   0.0341 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 78.723  on 20  degrees of freedom
Residual deviance: 74.152  on 19  degrees of freedom

> summary(fit3)
Call:
lm(formula = prop.success ~ group, data = test, weights = success +
    failure)
Residuals:
    Min      1Q  Median      3Q     Max
-1.1080 -0.7071 -0.2261  0.4754  1.9157
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.39175    0.08809   4.447 0.000276 ***
groupgroup2 -0.14175    0.12364  -1.146 0.265838   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8676 on 19 degrees of freedom
Multiple R-squared: 0.0647,     Adjusted R-squared: 0.01548
F-statistic: 1.314 on 1 and 19 DF,  p-value: 0.2658
 
As you can see, all 3 models gave quite different results, the model 2
(generalized linear model) gave significant p value while model 1 (generalized
mixed model) and model 3 (regular linear model) did not. Also as you can from
the data, prop.success has some value outside the range 0.15 to 0.85, so maybe
regular linear model may not be appropriate?
 
Thank you,
 
John
 
 


________________________________
From: Robert A LaBudde <[hidden email]>
To: array chip <[hidden email]>
Cc: Bert Gunter <[hidden email]>; [hidden email]
Sent: Thu, February 10, 2011 12:54:44 PM
Subject: Re: [R] comparing proportions

1. If you use a random effects model, you should make Subject the random factor.
I.e., a random intercepts model with 1|Subject. Group is a fixed effect: You
have only 2 groups. Even if you had more than 2 groups, treating Group as random
would return a standard deviation, not a P-value as you wanted. Finally, I doubt
you believe the groups used are meaningless, and only the population of groups
is of interest. Instead you consider them special, so Group is a fixed effect.

2. The number of observations for each Subject is the number of trials, which
you previously indicated were 7 to 10 in the cases listed.

3. If you have no interest in the Subject effect, you can use a fixed Subject
factor instead with glm() instead of glmer() or other mixed model function. This
is a good idea so long as the number of subjects is, say, less than 10.
Otherwise a mixed model would be a better idea.

I suggest you fit all three models to learn about what you're doing: 1) glmer()
or equivalent, with cbind(successes, failures) ~ 1|Subject + Group; 2) glm()
with cbind(successes, failures) ~ Subject + Group; and 3) lm(p ~ Subject +
Group), where p is the proportion success for a particular subject and group.

Then compare the results. They will probably all 3 give the same conclusion to
the hypothesis question about Group. I would guess the glmer() P-value will be
larger, then the glm() and finally the lm(), but the last two may reverse. The
lm() model may actually perform fairly well, as the Edgeworth series converges
rapidly to normal for binomial distributions with p within 0.15 to 0.85 and 10+
replicates, as I stated before.

I'd be interested in seeing the results of these 3 fits myself just for
curiosity.

At 01:21 PM 2/10/2011, array chip wrote:

> Robert and Bert, thank you both very much for the response, really appreciated.
>I agree that using regular ANOVA (or regular t test) may not be wise during the
>normality issue. So I am debating between generalized linear model using glm(..,
>family=binomial) or generalized linear mixed effect model using glmer(...,
>family=binomial). I will forward to Robert an offline list email I sent to Bert
>about whether using (1|subject) versus (1|group) in mixed model specification.
>If using (1|group), both models will give me the same testing for fixed effects,
>which is what I am mainly interested in. So do I really need a mixed model here?
>
> Thanks again
>
> John
>
>
> From: Bert Gunter <[hidden email]>
> To: Robert A LaBudde <[hidden email]>
> Cc: array chip <[hidden email]>
> Sent: Thu, February 10, 2011 10:04:06 AM
> Subject: Re: [R] comparing proportions
>
> Robert:
>
> Yes, exactly. In an offlist email exchange, he clarified this for me,
> and I suggested exactly what you did, also with the cautions that his
> initial ad hoc suggestions were unwise. His subsequent post to R-help
> and the sig-mixed-models lists were the result, although he appears to
> have specified the model incorrectly in his glmer function (as
> (1|Group) instead of (1|subject).
>
> Cheers,
> Bert
>
> On Thu, Feb 10, 2011 at 9:55 AM, Robert A LaBudde
><<mailto:[hidden email]>[hidden email]> wrote:
> > prop.test() is applicable to a binomial experiment in each of two classes.
> >
> > Your experiment is binomial only at the subject level. You then have
> > multiple subjects in each of your groups.
> >
> > You have a random factor "Subjects" that must be accounted for.
> >
> > The best way to analyze is a generalized linear mixed model with a binomial
> > distribution family and a logit or probit link. You will probably have to
> > investigate overdispersion. If you have a small number of subjects, and
> > don't care about the among-subject effect, you can model them as fixed
> > effects and use glm() instead.
> >
> > Your original question, I believe, related to doing an ANOVA assuming
> > normality. In order for this to work with this kind of proportion problem,
> > you generally won't get good results unless the number of replicates per
> > subject is 12 or more, and the proportions involved are within 0.15 to 0.85.
> > Otherwise you will have biased confidence intervals and significance tests.
> >
> >
> >
> > At 07:51 PM 2/9/2011, array chip wrote:
> >>
> >> Content-type: text/plain
> >> Content-disposition: inline
> >> Content-length: 2969
> >>
> >> Hi Bert,
> >>
> >> Thanks for your reply. If I understand correctly, prop.test() is not
> >> suitable to
> >> my situation. The input to prop.test() is 2 numbers for each group (# of
> >> success
> >> and # of trials, for example, groups 1 has 5 success out of 10 trials;
> >> group 2
> >> has 3 success out of 7 trials; etc. prop.test() tests whether the
> >> probability of
> >> success is the same across groups.
> >>
> >> In my case, each group has several subjects and each subject has 2 numbers
> >> (#
> >> success and # trials). So
> >>
> >> for group 1:
> >> subject 1: 5 success, 10 trials
> >> subject 2: 3 success, 8 trials
> >> :
> >> :
> >>
> >> for group 2:
> >> subject a: 7 success, 9 trials
> >> subject b: 6 success, 7 trials
> >> :
> >> :
> >>
> >> I want to test whether the probability of success in group 1 is the same
> >> as in
> >> group 2. It's like comparing 2 groups of samples using t test, what I am
> >> uncertain about is that whether regular t test (or non-pamametric test) is
> >> still
> >> appropriate here when the response variable is actually proportions.
> >>
> >> I guess prop.test() can not be used with my dataset, or I may be wrong?
> >>
> >> Thanks
> >>
> >> John
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: Bert Gunter <<mailto:[hidden email]>[hidden email]>
> >>
> >> Sent: Wed, February 9, 2011 3:58:05 PM
> >> Subject: Re: [R] comparing proportions
> >>
> >> 1. Is this a homework problem?
> >>
> >> 2. ?prop.test
> >>
> >> 3. If you haven't done so already, get and consult a basic statistical
> >> methods book to help you with questions such as this.
> >>
> >> -- Bert
> >>
> >>
> >> > Hi, I have a dataset that has 2 groups of samples. For each sample, then
> >> > response measured is the number of success (no.success) obatined with
> >> > the
> >> >number
> >> > of trials (no.trials). So a porportion of success (prpop.success) can be
> >> > computed as no.success/no.trials. Now the objective is to test if there
> >> > is a
> >> > statistical significant difference in the proportion of success between
> >> > the 2
> >> > groups of samples (say n1=20, n2=30).
> >> >
> >> > I can think of 2 ways to do the test:
> >> >
> >> > 1. regular t test based on the variable prop.success
> >> > 2. Mann-Whitney test based on the variable prop.success
> >> > 2. do a binomial regression as:
> >> >    fit<-glm(cbind(no.success,no.trials-no.success) ~ group, data=data,
> >> >          family=binomial)
> >> >    anova(fit, test='Chisq')
> >> >
> >> > My questions is:
> >> > 1. Is t test appropriate for comparing 2 groups of proportions?
> >> > 2. how about Mann-Whitney non-parametric test?
> >> > 3. Among the 3, which technique is more appropriate?
> >> > 4. any other technique you can suggest?
> >> >
> >> > Thank you,
> >> >
> >> > John
> >> >
> >> >
> >> >
> >> >        [[alternative HTML version deleted]]
> >> >
> >> >
> >> > ______________________________________________
> >> > <mailto:[hidden email]>[hidden email] mailing list
> >> >
><https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help
>
> >> > PLEASE do read the posting guide
> >> >
><http://www.R-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html
>
> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Bert Gunter
> >> Genentech Nonclinical Biostatistics
> >>
> >>
> >>
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >>
> >> ______________________________________________
> >> <mailto:[hidden email]>[hidden email] mailing list
> >>
><https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help
>
> >> PLEASE do read the posting guide
> >>
><http://www.R-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html
>
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > ================================================================
> > Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail:
><mailto:[hidden email]>[hidden email]
> > Least Cost Formulations, Ltd.            URL:
><http://lcfltd.com/>http://lcfltd.com/
> > 824 Timberlake Drive                    Tel: 757-467-0954
> > Virginia Beach, VA 23464-3239            Fax: 757-467-2947
> >
> > "Vere scire est per causas scire"
> > ================================================================
> >
> >
>
>
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
><http://devo.gene.com/groups/devo/depts/ncb/home.shtml>http://devo.gene.com/groups/devo/depts/ncb/home.shtml
>l
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                    Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
================================================================



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

test.txt (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: comparing proportions

Mike Marchywka

I had a longer draft before but I'll just ask if you ever looked at your data?

cr=rainbow(3);
plot(df$success+df$failure,df$prop.success,col=cr[as.numeric(df$group)],cex=as.numeric(df$group))


you'd have to suspect that p is a function of the number of trials per subject
and that differences between groups would be an issue with how their trial counts
were distributed. Speculating further, the one group seems to look like a candidate for linear
fit and the other perhaps non-monotonic curve suggestive of "learning" with more trials and
maybe fatigue in both groups.
This wouldn't show up in your lm model if I read it right and indeed the weights may not
help at all- but you could try something with numbers of trials as an independent variable
and look for something of the form p~an+bn^2 perhaps in a nonlinear model.

As long as you are just doing post hoc analysis with quite limited data, and even if you had a specific plan
figured out a head of time, it is always a good idea to plot your data and try many
different tests and relationships among your variables. Presumably all this
stuff is judged by how well it helps you understand the system of interest. 



----------------------------------------
Date: Thu, 10 Feb 2011 14:17:29 -0800
From: [hidden email]
To: [hidden email]
CC: [hidden email]; [hidden email]
Subject: Re: [R] comparing proportions


Robert, thank you!

I tried all 3 models you suggested. Since each subject only has one line of data
in my dataset, would including Subject as a factor in glm() or lm() lead to 0 df
for resaiduals?

Attached is my dataset and here is my version of the 3 models:

test<-read.table("test.txt",sep='\t',header=T)
library(lme4)

## model 1: generalized mixed model
fit1<-glmer(cbind(success,failure)~group+(1|subject),data=test,family=binomial)

## model 2: generalized model
fit2<-glm(cbind(success,failure)~group,data=test,family=binomial)

## model 3: linear model
fit3<-lm(prop.success~group,weights=success+failure,data=test)

> fit1
Generalized linear mixed model fit by the Laplace approximation
Formula: cbind(success, failure) ~ group + (1 | subject)
   Data: test
   AIC   BIC logLik deviance
 54.75 57.89 -24.38    48.75
Random effects:
 Groups  Name        Variance Std.Dev.
 subject (Intercept) 1.8256   1.3511
Number of obs: 21, groups: subject, 21
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.6785     0.4950  -1.371    0.170
groupgroup2  -0.7030     0.6974  -1.008    0.313

> summary(fit2)
Call:
glm(formula = cbind(success, failure) ~ group, family = binomial,
    data = test)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8204  -2.0789  -0.5407   1.0403   4.0539
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.4400     0.2080  -2.115   0.0344 *
groupgroup2  -0.6587     0.3108  -2.119   0.0341 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 78.723  on 20  degrees of freedom
Residual deviance: 74.152  on 19  degrees of freedom

> summary(fit3)
Call:
lm(formula = prop.success ~ group, data = test, weights = success +
    failure)
Residuals:
    Min      1Q  Median      3Q     Max
-1.1080 -0.7071 -0.2261  0.4754  1.9157
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.39175    0.08809   4.447 0.000276 ***
groupgroup2 -0.14175    0.12364  -1.146 0.265838
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8676 on 19 degrees of freedom
Multiple R-squared: 0.0647,     Adjusted R-squared: 0.01548
F-statistic: 1.314 on 1 and 19 DF,  p-value: 0.2658

As you can see, all 3 models gave quite different results, the model 2
(generalized linear model) gave significant p value while model 1 (generalized
mixed model) and model 3 (regular linear model) did not. Also as you can from
the data, prop.success has some value outside the range 0.15 to 0.85, so maybe
regular linear model may not be appropriate?

Thank you,

John




________________________________
From: Robert A LaBudde
To: array chip
Cc: Bert Gunter ; [hidden email]
Sent: Thu, February 10, 2011 12:54:44 PM
Subject: Re: [R] comparing proportions

1. If you use a random effects model, you should make Subject the random factor.
I.e., a random intercepts model with 1|Subject. Group is a fixed effect: You
have only 2 groups. Even if you had more than 2 groups, treating Group as random
would return a standard deviation, not a P-value as you wanted. Finally, I doubt
you believe the groups used are meaningless, and only the population of groups
is of interest. Instead you consider them special, so Group is a fixed effect.

2. The number of observations for each Subject is the number of trials, which
you previously indicated were 7 to 10 in the cases listed.

3. If you have no interest in the Subject effect, you can use a fixed Subject
factor instead with glm() instead of glmer() or other mixed model function. This
is a good idea so long as the number of subjects is, say, less than 10.
Otherwise a mixed model would be a better idea.

I suggest you fit all three models to learn about what you're doing: 1) glmer()
or equivalent, with cbind(successes, failures) ~ 1|Subject + Group; 2) glm()
with cbind(successes, failures) ~ Subject + Group; and 3) lm(p ~ Subject +
Group), where p is the proportion success for a particular subject and group.

Then compare the results. They will probably all 3 give the same conclusion to
the hypothesis question about Group. I would guess the glmer() P-value will be
larger, then the glm() and finally the lm(), but the last two may reverse. The
lm() model may actually perform fairly well, as the Edgeworth series converges
rapidly to normal for binomial distributions with p within 0.15 to 0.85 and 10+
replicates, as I stated before.

I'd be interested in seeing the results of these 3 fits myself just for
curiosity.

At 01:21 PM 2/10/2011, array chip wrote:

> Robert and Bert, thank you both very much for the response, really appreciated.
>I agree that using regular ANOVA (or regular t test) may not be wise during the
>normality issue. So I am debating between generalized linear model using glm(..,
>family=binomial) or generalized linear mixed effect model using glmer(...,
>family=binomial). I will forward to Robert an offline list email I sent to Bert
>about whether using (1|subject) versus (1|group) in mixed model specification.
>If using (1|group), both models will give me the same testing for fixed effects,
>which is what I am mainly interested in. So do I really need a mixed model here?
>
> Thanks again
>
> John
>
>
> From: Bert Gunter
> To: Robert A LaBudde
> Cc: array chip
> Sent: Thu, February 10, 2011 10:04:06 AM
> Subject: Re: [R] comparing proportions
>
> Robert:
>
> Yes, exactly. In an offlist email exchange, he clarified this for me,
> and I suggested exactly what you did, also with the cautions that his
> initial ad hoc suggestions were unwise. His subsequent post to R-help
> and the sig-mixed-models lists were the result, although he appears to
> have specified the model incorrectly in his glmer function (as
> (1|Group) instead of (1|subject).
>
> Cheers,
> Bert
>
> On Thu, Feb 10, 2011 at 9:55 AM, Robert A LaBudde
><[hidden email]> wrote:
> > prop.test() is applicable to a binomial experiment in each of two classes.
> >
> > Your experiment is binomial only at the subject level. You then have
> > multiple subjects in each of your groups.
> >
> > You have a random factor "Subjects" that must be accounted for.
> >
> > The best way to analyze is a generalized linear mixed model with a binomial
> > distribution family and a logit or probit link. You will probably have to
> > investigate overdispersion. If you have a small number of subjects, and
> > don't care about the among-subject effect, you can model them as fixed
> > effects and use glm() instead.
> >
> > Your original question, I believe, related to doing an ANOVA assuming
> > normality. In order for this to work with this kind of proportion problem,
> > you generally won't get good results unless the number of replicates per
> > subject is 12 or more, and the proportions involved are within 0.15 to 0.85.
> > Otherwise you will have biased confidence intervals and significance tests.
> >
> >
> >
> > At 07:51 PM 2/9/2011, array chip wrote:
> >>
> >> Content-type: text/plain
> >> Content-disposition: inline
> >> Content-length: 2969
> >>
> >> Hi Bert,
> >>
> >> Thanks for your reply. If I understand correctly, prop.test() is not
> >> suitable to
> >> my situation. The input to prop.test() is 2 numbers for each group (# of
> >> success
> >> and # of trials, for example, groups 1 has 5 success out of 10 trials;
> >> group 2
> >> has 3 success out of 7 trials; etc. prop.test() tests whether the
> >> probability of
> >> success is the same across groups.
> >>
> >> In my case, each group has several subjects and each subject has 2 numbers
> >> (#
> >> success and # trials). So
> >>
> >> for group 1:
> >> subject 1: 5 success, 10 trials
> >> subject 2: 3 success, 8 trials
> >> :
> >> :
> >>
> >> for group 2:
> >> subject a: 7 success, 9 trials
> >> subject b: 6 success, 7 trials
> >> :
> >> :
> >>
> >> I want to test whether the probability of success in group 1 is the same
> >> as in
> >> group 2. It's like comparing 2 groups of samples using t test, what I am
> >> uncertain about is that whether regular t test (or non-pamametric test) is
> >> still
> >> appropriate here when the response variable is actually proportions.
> >>
> >> I guess prop.test() can not be used with my dataset, or I may be wrong?
> >>
> >> Thanks
> >>
> >> John
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: Bert Gunter <[hidden email]>
> >>
> >> Sent: Wed, February 9, 2011 3:58:05 PM
> >> Subject: Re: [R] comparing proportions
> >>
> >> 1. Is this a homework problem?
> >>
> >> 2. ?prop.test
> >>
> >> 3. If you haven't done so already, get and consult a basic statistical
> >> methods book to help you with questions such as this.
> >>
> >> -- Bert
> >>
> >>
> >> > Hi, I have a dataset that has 2 groups of samples. For each sample, then
> >> > response measured is the number of success (no.success) obatined with
> >> > the
> >> >number
> >> > of trials (no.trials). So a porportion of success (prpop.success) can be
> >> > computed as no.success/no.trials. Now the objective is to test if there
> >> > is a
> >> > statistical significant difference in the proportion of success between
> >> > the 2
> >> > groups of samples (say n1=20, n2=30).
> >> >
> >> > I can think of 2 ways to do the test:
> >> >
> >> > 1. regular t test based on the variable prop.success
> >> > 2. Mann-Whitney test based on the variable prop.success
> >> > 2. do a binomial regression as:
> >> >    fit<-glm(cbind(no.success,no.trials-no.success) ~ group, data=data,
> >> >          family=binomial)
> >> >    anova(fit, test='Chisq')
> >> >
> >> > My questions is:
> >> > 1. Is t test appropriate for comparing 2 groups of proportions?
> >> > 2. how about Mann-Whitney non-parametric test?
> >> > 3. Among the 3, which technique is more appropriate?
> >> > 4. any other technique you can suggest?
> >> >
> >> > Thank you,
> >> >
> >> > John
> >> >
> >> >
> >> >
> >> >        [[alternative HTML version deleted]]
> >> >
> >> >
> >> > ______________________________________________
> >> > [hidden email] mailing list
> >> >
>https://stat.ethz.ch/mailman/listinfo/r-help
>
> >> > PLEASE do read the posting guide
> >> >
>http://www.R-project.org/posting-guide.html
>
> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Bert Gunter
> >> Genentech Nonclinical Biostatistics
> >>
> >>
> >>
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >>
> >> ______________________________________________
> >> [hidden email] mailing list
> >>
>https://stat.ethz.ch/mailman/listinfo/r-help
>
> >> PLEASE do read the posting guide
> >>
>http://www.R-project.org/posting-guide.html
>
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > ================================================================
> > Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail:
>[hidden email]
> > Least Cost Formulations, Ltd.            URL:
>http://lcfltd.com/
> > 824 Timberlake Drive                    Tel: 757-467-0954
> > Virginia Beach, VA 23464-3239            Fax: 757-467-2947
> >
> > "Vere scire est per causas scire"
> > ================================================================
> >
> >
>
>
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
>http://devo.gene.com/groups/devo/depts/ncb/home.shtml
>l

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                    Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
================================================================




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
     
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: comparing proportions

Robert A LaBudde
In reply to this post by array chip
You need to change models 2 and 3 to use ~ group
+ subject. You left subject out as a fixed factor.

At 05:17 PM 2/10/2011, array chip wrote:

>Robert, thank you!
>
>I tried all 3 models you suggested. Since each
>subject only has one line of data in my dataset,
>would including Subject as a factor in glm() or
>lm() lead to 0 df for resaiduals?
>
>Attached is my dataset and here is my version of the 3 models:
>
>test<-read.table("test.txt",sep='\t',header=T)
>library(lme4)
>
>## model 1: generalized mixed model
>fit1<-glmer(cbind(success,failure)~group+(1|subject),data=test,family=binomial)
>## model 2: generalized model
>fit2<-glm(cbind(success,failure)~group,data=test,family=binomial)
>## model 3: linear model
>fit3<-lm(prop.success~group,weights=success+failure,data=test)
> > fit1
>Generalized linear mixed model fit by the Laplace approximation
>Formula: cbind(success, failure) ~ group + (1 | subject)
>    Data: test
>    AIC   BIC logLik deviance
>  54.75 57.89 -24.38    48.75
>Random effects:
>  Groups  Name        Variance Std.Dev.
>  subject (Intercept) 1.8256   1.3511
>Number of obs: 21, groups: subject, 21
>Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
>(Intercept)  -0.6785     0.4950  -1.371    0.170
>groupgroup2  -0.7030     0.6974  -1.008    0.313
>
>
> > summary(fit2)
>
>Call:
>glm(formula = cbind(success, failure) ~ group, family = binomial,
>     data = test)
>
>Deviance Residuals:
>     Min       1Q   Median       3Q      Max
>-2.8204  -2.0789  -0.5407   1.0403   4.0539
>
>Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
>(Intercept)  -0.4400     0.2080  -2.115   0.0344 *
>groupgroup2  -0.6587     0.3108  -2.119   0.0341 *
>---
>Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01
>‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>(Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 78.723  on 20  degrees of freedom
>Residual deviance: 74.152  on 19  degrees of freedom
>
> > summary(fit3)
>
>Call:
>lm(formula = prop.success ~ group, data = test, weights = success +
>     failure)
>
>Residuals:
>     Min      1Q  Median      3Q     Max
>-1.1080 -0.7071 -0.2261  0.4754  1.9157
>
>Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
>(Intercept)  0.39175    0.08809   4.447 0.000276 ***
>groupgroup2 -0.14175    0.12364  -1.146 0.265838
>---
>Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01
>‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>Residual standard error: 0.8676 on 19 degrees of freedom
>Multiple R-squared: 0.0647,     Adjusted R-squared: 0.01548
>F-statistic: 1.314 on 1 and 19 DF,  p-value: 0.2658
>
>
>
>As you can see, all 3 models gave quite
>different results, the model 2 (generalized
>linear model) gave significant p value while
>model 1 (generalized mixed model) and model 3
>(regular linear model) did not. Also as you can
>from the data, prop.success has some value
>outside the range 0.15 to 0.85, so maybe regular
>linear model may not be appropriate?
>
>
>
>Thank you,
>
>
>
>John
>
>
>
>
>
>
>From: Robert A LaBudde <[hidden email]>
>To: array chip <[hidden email]>
>Cc: Bert Gunter <[hidden email]>; [hidden email]
>Sent: Thu, February 10, 2011 12:54:44 PM
>Subject: Re: [R] comparing proportions
>
>1. If you use a random effects model, you should
>make Subject the random factor. I.e., a random
>intercepts model with 1|Subject. Group is a
>fixed effect: You have only 2 groups. Even if
>you had more than 2 groups, treating Group as
>random would return a standard deviation, not a
>P-value as you wanted. Finally, I doubt you
>believe the groups used are meaningless, and
>only the population of groups is of interest.
>Instead you consider them special, so Group is a fixed effect.
>
>2. The number of observations for each Subject
>is the number of trials, which you previously
>indicated were 7 to 10 in the cases listed.
>
>3. If you have no interest in the Subject
>effect, you can use a fixed Subject factor
>instead with glm() instead of glmer() or other
>mixed model function. This is a good idea so
>long as the number of subjects is, say, less
>than 10. Otherwise a mixed model would be a better idea.
>
>I suggest you fit all three models to learn
>about what you're doing: 1) glmer() or
>equivalent, with cbind(successes, failures) ~
>1|Subject + Group; 2) glm() with
>cbind(successes, failures) ~ Subject + Group;
>and 3) lm(p ~ Subject + Group), where p is the
>proportion success for a particular subject and group.
>
>Then compare the results. They will probably all
>3 give the same conclusion to the hypothesis
>question about Group. I would guess the glmer()
>P-value will be larger, then the glm() and
>finally the lm(), but the last two may reverse.
>The lm() model may actually perform fairly well,
>as the Edgeworth series converges rapidly to
>normal for binomial distributions with p within
>0.15 to 0.85 and 10+ replicates, as I stated before.
>
>I'd be interested in seeing the results of these
>3 fits myself just for curiosity.
>
>At 01:21 PM 2/10/2011, array chip wrote:
> > Robert and Bert, thank you both very much for
> the response, really appreciated. I agree that
> using regular ANOVA (or regular t test) may not
> be wise during the normality issue. So I am
> debating between generalized linear model using
> glm(.., family=binomial) or generalized linear
> mixed effect model using glmer(...,
> family=binomial). I will forward to Robert an
> offline list email I sent to Bert about whether
> using (1|subject) versus (1|group) in mixed
> model specification. If using (1|group), both
> models will give me the same testing for fixed
> effects, which is what I am mainly interested
> in. So do I really need a mixed model here?
> >
> > Thanks again
> >
> > John
> >
> >
> > From: Bert Gunter <<mailto:[hidden email]>[hidden email]>
> > To: Robert A LaBudde <<mailto:[hidden email]>[hidden email]>
> > Cc: array chip <<mailto:[hidden email]>[hidden email]>
> > Sent: Thu, February 10, 2011 10:04:06 AM
> > Subject: Re: [R] comparing proportions
> >
> > Robert:
> >
> > Yes, exactly. In an offlist email exchange, he clarified this for me,
> > and I suggested exactly what you did, also with the cautions that his
> > initial ad hoc suggestions were unwise. His subsequent post to R-help
> > and the sig-mixed-models lists were the result, although he appears to
> > have specified the model incorrectly in his glmer function (as
> > (1|Group) instead of (1|subject).
> >
> > Cheers,
> > Bert
> >
> > On Thu, Feb 10, 2011 at 9:55 AM, Robert A
> LaBudde <<mailto:[hidden email]><mailto:[hidden email]>[hidden email]> wrote:
> > > prop.test() is applicable to a binomial
> experiment in each of two classes.
> > >
> > > Your experiment is binomial only at the subject level. You then have
> > > multiple subjects in each of your groups.
> > >
> > > You have a random factor "Subjects" that must be accounted for.
> > >
> > > The best way to analyze is a generalized
> linear mixed model with a binomial
> > > distribution family and a logit or probit link. You will probably have to
> > > investigate overdispersion. If you have a small number of subjects, and
> > > don't care about the among-subject effect, you can model them as fixed
> > > effects and use glm() instead.
> > >
> > > Your original question, I believe, related to doing an ANOVA assuming
> > > normality. In order for this to work with
> this kind of proportion problem,
> > > you generally won't get good results unless the number of replicates per
> > > subject is 12 or more, and the proportions
> involved are within 0.15 to 0.85.
> > > Otherwise you will have biased confidence
> intervals and significance tests.
> > >
> > >
> > >
> > > At 07:51 PM 2/9/2011, array chip wrote:
> > >>
> > >> Content-type: text/plain
> > >> Content-disposition: inline
> > >> Content-length: 2969
> > >>
> > >> Hi Bert,
> > >>
> > >> Thanks for your reply. If I understand correctly, prop.test() is not
> > >> suitable to
> > >> my situation. The input to prop.test() is 2 numbers for each group (# of
> > >> success
> > >> and # of trials, for example, groups 1 has 5 success out of 10 trials;
> > >> group 2
> > >> has 3 success out of 7 trials; etc. prop.test() tests whether the
> > >> probability of
> > >> success is the same across groups.
> > >>
> > >> In my case, each group has several
> subjects and each subject has 2 numbers
> > >> (#
> > >> success and # trials). So
> > >>
> > >> for group 1:
> > >> subject 1: 5 success, 10 trials
> > >> subject 2: 3 success, 8 trials
> > >> :
> > >> :
> > >>
> > >> for group 2:
> > >> subject a: 7 success, 9 trials
> > >> subject b: 6 success, 7 trials
> > >> :
> > >> :
> > >>
> > >> I want to test whether the probability of success in group 1 is the same
> > >> as in
> > >> group 2. It's like comparing 2 groups of samples using t test, what I am
> > >> uncertain about is that whether regular t
> test (or non-pamametric test) is
> > >> still
> > >> appropriate here when the response variable is actually proportions.
> > >>
> > >> I guess prop.test() can not be used with my dataset, or I may be wrong?
> > >>
> > >> Thanks
> > >>
> > >> John
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> ________________________________
> > >> From: Bert Gunter
> <<mailto:[hidden email]><mailto:[hidden email]>[hidden email]>
> > >>
> > >> Sent: Wed, February 9, 2011 3:58:05 PM
> > >> Subject: Re: [R] comparing proportions
> > >>
> > >> 1. Is this a homework problem?
> > >>
> > >> 2. ?prop.test
> > >>
> > >> 3. If you haven't done so already, get and consult a basic statistical
> > >> methods book to help you with questions such as this.
> > >>
> > >> -- Bert
> > >>
> > >>
> > >> > Hi, I have a dataset that has 2 groups
> of samples. For each sample, then
> > >> > response measured is the number of success (no.success) obatined with
> > >> > the
> > >> >number
> > >> > of trials (no.trials). So a porportion
> of success (prpop.success) can be
> > >> > computed as no.success/no.trials. Now
> the objective is to test if there
> > >> > is a
> > >> > statistical significant difference in
> the proportion of success between
> > >> > the 2
> > >> > groups of samples (say n1=20, n2=30).
> > >> >
> > >> > I can think of 2 ways to do the test:
> > >> >
> > >> > 1. regular t test based on the variable prop.success
> > >> > 2. Mann-Whitney test based on the variable prop.success
> > >> > 2. do a binomial regression as:
> > >> >    fit<-glm(cbind(no.success,no.trials-no.success) ~ group, data=data,
> > >> >          family=binomial)
> > >> >    anova(fit, test='Chisq')
> > >> >
> > >> > My questions is:
> > >> > 1. Is t test appropriate for comparing 2 groups of proportions?
> > >> > 2. how about Mann-Whitney non-parametric test?
> > >> > 3. Among the 3, which technique is more appropriate?
> > >> > 4. any other technique you can suggest?
> > >> >
> > >> > Thank you,
> > >> >
> > >> > John
> > >> >
> > >> >
> > >> >
> > >> >        [[alternative HTML version deleted]]
> > >> >
> > >> >
> > >> > ______________________________________________
> > >> >
> <mailto:[hidden email]><mailto:[hidden email]>[hidden email]
> mailing list
> > >> >
> <<https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help
> > >> > PLEASE do read the posting guide
> > >> >
> <http://www.R-project.org/posting-guide.html><http://www.r-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html
> > >> > and provide commented, minimal, self-contained, reproducible code.
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Bert Gunter
> > >> Genentech Nonclinical Biostatistics
> > >>
> > >>
> > >>
> > >>
> > >>        [[alternative HTML version deleted]]
> > >>
> > >>
> > >> ______________________________________________
> > >>
> <mailto:[hidden email]><mailto:[hidden email]>[hidden email]
> mailing list
> > >>
> <<https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >>
> <<http://www.r-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >
> > > ================================================================
> > > Robert A. LaBudde, PhD, PAS, Dpl.
> ACAFS  e-mail: <mailto:[hidden email]><mailto:[hidden email]>[hidden email]
> > > Least Cost Formulations,
> Ltd.            URL: <http://lcfltd.com/><http://lcfltd.com/>http://lcfltd.com/
> > > 824 Timberlake Drive                    Tel: 757-467-0954
> > > Virginia Beach, VA 23464-3239            Fax: 757-467-2947
> > >
> > > "Vere scire est per causas scire"
> > > ================================================================
> > >
> > >
> >
> >
> >
> > --
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> > 467-7374
> >
> <http://devo.gene.com/groups/devo/depts/ncb/home.shtml><http://devo.gene.com/groups/devo/depts/ncb/home.shtml>http://devo.gene.com/groups/devo/depts/ncb/home.shtml
>
>================================================================
>Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail:
><mailto:[hidden email]>[hidden email]
>Least Cost Formulations, Ltd.            URL:
><http://lcfltd.com/>http://lcfltd.com/
>824 Timberlake Drive                    Tel: 757-467-0954
>Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
>"Vere scire est per causas scire"
>================================================================
>
>

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
================================================================

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Loading...