How to test a difference in ratios of count data in R

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

How to test a difference in ratios of count data in R

szhan
Hello R-experts,
I am interested to determine if the ratio of counts from two groups differ across two distinct treatments. For example, we have three replicates of treatment A, and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in one group (X/X+Y) significantly differs between treatments.

I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) refers to the total counts X + Y; the second number (X) refers to the counts X. Is there a way to do this in R? Since we have 20,000 datasets to be tested, it may be easier to retrive the significant test as the given dataset below and its p>F value and mean ratios of treatments in R than SAS.


data test;
input replicate treatment$ n X;
datalines;
1 A 32 4
1 B 33 18
2 A 20 6
2 B 21 18
3 A 7 0
3 B 8 4
;

proc glimmix data=test;
class replicate treatment;
model X/n = treatment / solution;
random intercept / subject=replicate;
run;

ods select lsmeans;
proc glimmix data=test;
class replicate treatment;
model X/n = treatment / solution;
random intercept / subject=replicate;
lsmeans treatment / cl ilink;
run;

I appreciate your help in advance!
Joshua


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to test a difference in ratios of count data in R

glsnow
There are multiple ways of doing this, but here are a couple.

To just test the fixed effect of treatment you can use the glm function:

test <- read.table(text="
replicate treatment n X
1 A 32 4
1 B 33 18
2 A 20 6
2 B 21 18
3 A 7 0
3 B 8 4
", header=TRUE)

fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
summary(fit1)

Note that the default baseline value may differ between R and SAS,
which would result in a reversed sign on the slope coefficient (and
different intercept).

To include replicate as a random effect you need an additional
package, here I use lme4 and the glmer function:

library(lme4)
fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
family=binomial)
summary(fit2)



On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <[hidden email]> wrote:

> Hello R-experts,
> I am interested to determine if the ratio of counts from two groups differ across two distinct treatments. For example, we have three replicates of treatment A, and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in one group (X/X+Y) significantly differs between treatments.
>
> I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) refers to the total counts X + Y; the second number (X) refers to the counts X. Is there a way to do this in R? Since we have 20,000 datasets to be tested, it may be easier to retrive the significant test as the given dataset below and its p>F value and mean ratios of treatments in R than SAS.
>
>
> data test;
> input replicate treatment$ n X;
> datalines;
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
> ;
>
> proc glimmix data=test;
> class replicate treatment;
> model X/n = treatment / solution;
> random intercept / subject=replicate;
> run;
>
> ods select lsmeans;
> proc glimmix data=test;
> class replicate treatment;
> model X/n = treatment / solution;
> random intercept / subject=replicate;
> lsmeans treatment / cl ilink;
> run;
>
> I appreciate your help in advance!
> Joshua
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Gregory (Greg) L. Snow Ph.D.
[hidden email]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to test a difference in ratios of count data in R

David Winsemius

> On Sep 28, 2016, at 9:49 AM, Greg Snow <[hidden email]> wrote:
>
> There are multiple ways of doing this, but here are a couple.
>
> To just test the fixed effect of treatment you can use the glm function:
>
> test <- read.table(text="
> replicate treatment n X
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
> ", header=TRUE)
>
> fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
> summary(fit1)
>
> Note that the default baseline value may differ between R and SAS,
> which would result in a reversed sign on the slope coefficient (and
> different intercept).
>
> To include replicate as a random effect you need an additional
> package, here I use lme4 and the glmer function:
>
> library(lme4)
> fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
> family=binomial)
> summary(fit2)
>
>
>
> On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <[hidden email]> wrote:
>> Hello R-experts,
>> I am interested to determine if the ratio of counts from two groups differ across two distinct treatments. For example, we have three replicates of treatment A, and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in one group (X/X+Y) significantly differs between treatments.
>>
>> I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) refers to the total counts X + Y; the second number (X) refers to the counts X. Is there a way to do this in R? Since we have 20,000 datasets to be tested, it may be easier to retrive the significant test as the given dataset below and its p>F value and mean ratios of treatments in R than SAS.
>>
>>
>> data test;
>> input replicate treatment$ n X;
>> datalines;
>> 1 A 32 4
>> 1 B 33 18
>> 2 A 20 6
>> 2 B 21 18
>> 3 A 7 0
>> 3 B 8 4
>> ;
>>

Greg has already shown you how that is done in R and how to do logistic regression:

#  I usually think of Poisson regression when I hear a desire is to model ratios of counts that have a denominator. The log(sample_size) is supplied as an offset to correct for the variation in size of subsamples.


fit1 <- glm( X ~ treatment+offset(log(n)), data=test, family=poisson)
summary(fit1)

#  And the lme4 analogue with replication:

library(lme4)
fit2 <- glmer( X ~ treatment + offset(log(n))+ (1|replicate), data=test,
family=poisson)
summary(fit2)
#----output----
Generalized linear mixed model fit by maximum likelihood (Laplace  Approximation)
 [glmerMod]
 Family: poisson  ( log )
Formula: X ~ treatment + offset(log(n)) + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid
    31.9     31.3    -13.0     25.9        3

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.0504 -0.4146 -0.3487  0.3956  1.0791

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.03159  0.1777  
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7875     0.3372  -5.301 1.15e-07 ***
treatmentB    1.3365     0.3529   3.787 0.000152 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.838

Compare with the binomial model:
#============


 fitBin <- glmer( cbind(X,n-X) ~ treatment +  (1|replicate), data=test,
 family=binomial)
 coef(fitBin)
#----
$replicate
  (Intercept) treatmentB
1  -2.0487694   2.364695
2  -0.9908556   2.364695
3  -2.1844435   2.364695

attr(,"class")
[1] "coef.mer"
#-----
 summary(fitBin)
#---------
Generalized linear mixed model fit by maximum likelihood (Laplace  Approximation)
 [glmerMod]
 Family: binomial  ( logit )
Formula: cbind(X, n - X) ~ treatment + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid
    30.1     29.4    -12.0     24.1        3

Scaled residuals:
     Min       1Q   Median       3Q      Max
-0.88757 -0.35065 -0.03137  0.26897  0.67505

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.4123   0.6421  
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7442     0.5438  -3.208  0.00134 **
treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.568

The binomial model has a logit link. Your glimmix procedure appears to have a gaussian/normal distributional assumption and an identity link by default. If we run this using those assumptions in lme4::glmer we get these results (with a warning that in this case we can overlook since the results with lmer turned out to be identical)
#--------
 fitNorm <- glmer( I(X/n) ~ treatment +  (1|replicate), data=test,
                  family=gaussian)

#-------
Warning message:
In glmer(I(X/n) ~ treatment + (1 | replicate), data = test, family = gaussian) :
  calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
> coef(fitNorm); summary(fitNorm)
$replicate
  (Intercept) treatmentB
1 0.091096925  0.4925325
2 0.324579602  0.4925325
3 0.009323473  0.4925325

attr(,"class")
[1] "coef.mer"
Linear mixed model fit by REML ['lmerMod']
Formula: I(X/n) ~ treatment + (1 | replicate)
   Data: test

REML criterion at convergence: -4.2

Scaled residuals:
    Min      1Q  Median      3Q     Max
-0.7864 -0.4278 -0.1152  0.5143  0.8246

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.027895 0.16702
 Residual              0.002356 0.04854
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.14167    0.10042   1.411
treatmentB   0.49253    0.03963  12.427

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.197

That's (probably) the model to compare to your SAS results if my reading of the SAS Proc GLIMMIX manual page is correct.

--
David.

>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> run;
>>
>> ods select lsmeans;
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> lsmeans treatment / cl ilink;
>> run;
>>
>> I appreciate your help in advance!
>> Joshua
>>
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> [hidden email]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to test a difference in ratios of count data in R

glsnow
In reply to this post by glsnow
It is usually best to keep these discussions on the list.  Someone
else may have a better answer than mine, or be able to respond
quicker, and if I answer on R-help then it is community
service/involvement.  If I respond directly then it is consulting and
then we need a contract and I have to charge you (not that I would see
the money myself, but it would help my budget be a little less red).

For your question, your fit4 and my fit2 are just 2 different ways of
fitting the exact same model to the exact same data, so there is no
surprise that the results match.  Which one to use is personal
preference.

The line that starts with "treatmentB" is the coefficient
(log-odds-ratio) for B compared to A, so that is the main line to look
at for interpretation.

The correlation of the fixed effects is mainly there for diagnostics,
if it is too close to -1 or 1 then that indicates that assumptions may
not hold, or computations may be in doubt.  Your value is not of
concern.


On Wed, Sep 28, 2016 at 2:14 PM, Shuhua Zhan <[hidden email]> wrote:

> Hi Greg,
> Thank you very much for your help!
> I'd like to use glmer. From the output of summary(fit2) as below, Could I
> draw a conclusion that the treatment B
> significantly increases the counts of x group (p=6.11e-07)? I'm wondering if
> I could know that the treatment B
> significantly increases the ratio of x group (X/n) and how I could obtain
> the mean ratios of treatment A and B.
> To this end, should I fit the model using the ratio of X group (X/n)?  I
> tried it as
> fit4 <- glmer( X/n ~ treatment + (1|replicate), data=test, family=binomial,
> weights=n)
> but summary(fit4) is the same as summary(fit2).
> I also don't know how to interpret "Correlation of Fixed Effects: treatmentB
> -0.568 in the output.
>> summary(fit2)
> Generalized linear mixed model fit by maximum likelihood (Laplace
>   Approximation) [glmerMod]
>  Family: binomial  ( logit )
> Formula: cbind(X, n - X) ~ treatment + (1 | replicate)
>    Data: test
>
>      AIC      BIC   logLik deviance df.resid
>     30.1     29.4    -12.0     24.1        3
>
> Scaled residuals:
>      Min       1Q   Median       3Q      Max
> -0.88757 -0.35065 -0.03137  0.26897  0.67505
>
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  replicate (Intercept) 0.4123   0.6421
> Number of obs: 6, groups:  replicate, 3
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -1.7442     0.5438  -3.208  0.00134 **
> treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Correlation of Fixed Effects:
>            (Intr)
> treatmentB -0.568
>
> Thanks again,
> Joshua
>
> ________________________________
> From: Greg Snow <[hidden email]>
> Sent: Wednesday, September 28, 2016 12:49:49 PM
> To: Shuhua Zhan
> Cc: [hidden email]
> Subject: Re: [R] How to test a difference in ratios of count data in R
>
> There are multiple ways of doing this, but here are a couple.
>
> To just test the fixed effect of treatment you can use the glm function:
>
> test <- read.table(text="
> replicate treatment n X
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
> ", header=TRUE)
>
> fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
> summary(fit1)
>
> Note that the default baseline value may differ between R and SAS,
> which would result in a reversed sign on the slope coefficient (and
> different intercept).
>
> To include replicate as a random effect you need an additional
> package, here I use lme4 and the glmer function:
>
> library(lme4)
> fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
> family=binomial)
> summary(fit2)
>
>
>
> On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <[hidden email]> wrote:
>> Hello R-experts,
>> I am interested to determine if the ratio of counts from two groups differ
>> across two distinct treatments. For example, we have three replicates of
>> treatment A, and three replicates of treatment B. For each treatment, we
>> have counts X from one group and counts Y from another group. My
>> understanding is that that GLIMMIX procedure in SAS can calculate whether
>> the ratio of counts in one group (X/X+Y) significantly differs between
>> treatments.
>>
>> I think this is the way you do it in SAS. The replicate and treatment
>> variables are self-explanatory. The first number (n) refers to the total
>> counts X + Y; the second number (X) refers to the counts X. Is there a way
>> to do this in R? Since we have 20,000 datasets to be tested, it may be
>> easier to retrive the significant test as the given dataset below and its
>> p>F value and mean ratios of treatments in R than SAS.
>>
>>
>> data test;
>> input replicate treatment$ n X;
>> datalines;
>> 1 A 32 4
>> 1 B 33 18
>> 2 A 20 6
>> 2 B 21 18
>> 3 A 7 0
>> 3 B 8 4
>> ;
>>
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> run;
>>
>> ods select lsmeans;
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> lsmeans treatment / cl ilink;
>> run;
>>
>> I appreciate your help in advance!
>> Joshua
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> [hidden email]



--
Gregory (Greg) L. Snow Ph.D.
[hidden email]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to test a difference in ratios of count data in R

szhan
In reply to this post by David Winsemius

Thank you, David and Greg for your help!

I drew conclusion that the treatment B significantly increases the ratio of x group (X/n) from  based on p values from the treatmentB line of the outputs at logistic reg. and Poisson reg.(p=6.11e-07, Logistic; p=0.000152, Poisson). I'm wondering whether the significance of the (Intercept) line in both outputs affect my conclusion.

Logistic reg.  Fixed effects:

            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.7442     0.5438  -3.208  0.00134 **
treatmentB    2.3647     0.4741   4.988 6.11e-07 ***


Poisson reg. Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.7875     0.3372  -5.301 1.15e-07 ***
treatmentB    1.3365     0.3529   3.787 0.000152 ***


I think I may use the command below to obtain the mean ratios of x group in treatment A and B for Logistic reg. but I have not figured out yet for Poisson reg.


> tapply(predict(fit2, type="response"), test$treatment, mean)
        A         B
0.1620254 0.6404239

I'll appreciate if you know the command for Poisson reg.
Joshua



________________________________
From: David Winsemius <[hidden email]>
Sent: Wednesday, September 28, 2016 4:54:46 PM
To: Shuhua Zhan
Cc: [hidden email]; Greg Snow
Subject: Re: [R] How to test a difference in ratios of count data in R


> On Sep 28, 2016, at 9:49 AM, Greg Snow <[hidden email]> wrote:
>
> There are multiple ways of doing this, but here are a couple.
>
> To just test the fixed effect of treatment you can use the glm function:
>
> test <- read.table(text="
> replicate treatment n X
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
> ", header=TRUE)
>
> fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
> summary(fit1)
>
> Note that the default baseline value may differ between R and SAS,
> which would result in a reversed sign on the slope coefficient (and
> different intercept).
>
> To include replicate as a random effect you need an additional
> package, here I use lme4 and the glmer function:
>
> library(lme4)
> fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
> family=binomial)
> summary(fit2)
>
>
>
> On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <[hidden email]> wrote:
>> Hello R-experts,
>> I am interested to determine if the ratio of counts from two groups differ across two distinct treatments. For example, we have three replicates of treatment A, and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in one group (X/X+Y) significantly differs between treatments.
>>
>> I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) refers to the total counts X + Y; the second number (X) refers to the counts X. Is there a way to do this in R? Since we have 20,000 datasets to be tested, it may be easier to retrive the significant test as the given dataset below and its p>F value and mean ratios of treatments in R than SAS.
>>
>>
>> data test;
>> input replicate treatment$ n X;
>> datalines;
>> 1 A 32 4
>> 1 B 33 18
>> 2 A 20 6
>> 2 B 21 18
>> 3 A 7 0
>> 3 B 8 4
>> ;
>>
Greg has already shown you how that is done in R and how to do logistic regression:

#  I usually think of Poisson regression when I hear a desire is to model ratios of counts that have a denominator. The log(sample_size) is supplied as an offset to correct for the variation in size of subsamples.


fit1 <- glm( X ~ treatment+offset(log(n)), data=test, family=poisson)
summary(fit1)

#  And the lme4 analogue with replication:

library(lme4)
fit2 <- glmer( X ~ treatment + offset(log(n))+ (1|replicate), data=test,
family=poisson)
summary(fit2)
#----output----
Generalized linear mixed model fit by maximum likelihood (Laplace  Approximation)
 [glmerMod]
 Family: poisson  ( log )
Formula: X ~ treatment + offset(log(n)) + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid
    31.9     31.3    -13.0     25.9        3

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.0504 -0.4146 -0.3487  0.3956  1.0791

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.03159  0.1777
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.7875     0.3372  -5.301 1.15e-07 ***
treatmentB    1.3365     0.3529   3.787 0.000152 ***
---
Signif. codes:  0 �***� 0.001 �**� 0.01 �*� 0.05 �.� 0.1 � � 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.838

Compare with the binomial model:
#============


 fitBin <- glmer( cbind(X,n-X) ~ treatment +  (1|replicate), data=test,
 family=binomial)
 coef(fitBin)
#----
$replicate
  (Intercept) treatmentB
1  -2.0487694   2.364695
2  -0.9908556   2.364695
3  -2.1844435   2.364695

attr(,"class")
[1] "coef.mer"
#-----
 summary(fitBin)
#---------
Generalized linear mixed model fit by maximum likelihood (Laplace  Approximation)
 [glmerMod]
 Family: binomial  ( logit )
Formula: cbind(X, n - X) ~ treatment + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid
    30.1     29.4    -12.0     24.1        3

Scaled residuals:
     Min       1Q   Median       3Q      Max
-0.88757 -0.35065 -0.03137  0.26897  0.67505

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.4123   0.6421
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.7442     0.5438  -3.208  0.00134 **
treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
---
Signif. codes:  0 �***� 0.001 �**� 0.01 �*� 0.05 �.� 0.1 � � 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.568

The binomial model has a logit link. Your glimmix procedure appears to have a gaussian/normal distributional assumption and an identity link by default. If we run this using those assumptions in lme4::glmer we get these results (with a warning that in this case we can overlook since the results with lmer turned out to be identical)
#--------
 fitNorm <- glmer( I(X/n) ~ treatment +  (1|replicate), data=test,
                  family=gaussian)

#-------
Warning message:
In glmer(I(X/n) ~ treatment + (1 | replicate), data = test, family = gaussian) :
  calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
> coef(fitNorm); summary(fitNorm)
$replicate
  (Intercept) treatmentB
1 0.091096925  0.4925325
2 0.324579602  0.4925325
3 0.009323473  0.4925325

attr(,"class")
[1] "coef.mer"
Linear mixed model fit by REML ['lmerMod']
Formula: I(X/n) ~ treatment + (1 | replicate)
   Data: test

REML criterion at convergence: -4.2

Scaled residuals:
    Min      1Q  Median      3Q     Max
-0.7864 -0.4278 -0.1152  0.5143  0.8246

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.027895 0.16702
 Residual              0.002356 0.04854
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.14167    0.10042   1.411
treatmentB   0.49253    0.03963  12.427

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.197

That's (probably) the model to compare to your SAS results if my reading of the SAS Proc GLIMMIX manual page is correct.

--
David.

>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> run;
>>
>> ods select lsmeans;
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> lsmeans treatment / cl ilink;
>> run;
>>
>> I appreciate your help in advance!
>> Joshua
>>
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> [hidden email]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA


        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to test a difference in ratios of count data in R

David Winsemius

> On Sep 30, 2016, at 9:40 AM, Shuhua Zhan <[hidden email]> wrote:
>
>
> Thank you, David and Greg for your help!
> I drew conclusion that the treatment B significantly increases the ratio of x group (X/n) from  based on p values from the treatmentB line of the outputs at logistic reg. and Poisson reg.(p=6.11e-07, Logistic; p=0.000152, Poisson). I'm wondering whether the significance of the (Intercept) line in both outputs affect my conclusion.
> Logistic reg.  Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept)  -1.7442     0.5438  -3.208  0.00134 **

As in most regression situations, the p-value is calculated on the basis of a null hypothesis that the baseline assemblage of covariates will be associated with an "outcome" of one on the transformed scale of analysis. For logistic regression this would imply  an odds ratio of 1.0 and that the baseline covariates would then be associated with a 50:50 split of events and non-events. This is often not a hypothesis of particular interest, so generally the advice is to ignore that p-value.


> treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
>
> Poisson reg. Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept)  -1.7875     0.3372  -5.301 1.15e-07 ***

For Poisson regression the Intercept is the log( rate) at a covariate value of again zero for continuous variables and baseline for factors.  The null hypothesis tested is that the log(rate) is zero. This is often not a meaningful hypothesis, (especially if age is a covariate as it usually is events in a medical settings).


> treatmentB    1.3365     0.3529   3.787 0.000152 ***
>
> I think I may use the command below to obtain the mean ratios of x group in treatment A and B for Logistic reg. but I have not figured out yet for Poisson reg.
>
> > tapply(predict(fit2, type="response"), test$treatment, mean)
>         A         B
> 0.1620254 0.6404239
>
> I'll appreciate if you know the command for Poisson reg.

Should be the same call but the interpretation is different as mentioned above. The ?predict.glm halp page says " the alternative "response" is on the scale of the response variable. " and that is no limited to logistic regression despite the fact that the next sentence applies that principle to the logistic case.

This is knowledge one would probably get in a third or fourth semester stats course, at least based on my experience 30 year ago. This is really not what the Rhelp mailing list is set up to provide, (nor am I necessarily the most qualified to provide online statistics education) so I'm suggesting you pose any further question to a forum that is designed to address such matters. Go to http://stats.stackexchange.com/  or find an outlet that is designed for self-learning of statistical concepts.

This is material that I learned by both formal education and study of Breslow and Day's two volume text "Statistical Methods in Cancer Research" which I believe is a classic and still useful resource. It used GLIM as the coding platform, but I found transferring from GLIM to R rather easy. I can also recommend Venables and Ripley's "Modern Applied Statistics with S-Plus" which in it's 4th edition had R-specific annotations. It covers a wide range of applications, but does assume significant prior training.

I encourage other respondents who have different recommendations for more recent texts suitable for self-study of using R for generalized linear models with replication to chime in here, since I think such recommendations _are_ on-topic. I hesitate to recommend "Mixed Effects Models with S and S-Plus" because so much has been changed in more recent versions of package lme4 that I find it very error prone to use. I wish the effort to write a replacement had continued but my searches have suggested to me that it has not happened. I'd be happy to be corrected.


Best;

David.



> Joshua
>
>
> From: David Winsemius <[hidden email]>
> Sent: Wednesday, September 28, 2016 4:54:46 PM
> To: Shuhua Zhan
> Cc: [hidden email]; Greg Snow
> Subject: Re: [R] How to test a difference in ratios of count data in R
>  
>
> > On Sep 28, 2016, at 9:49 AM, Greg Snow <[hidden email]> wrote:
> >
> > There are multiple ways of doing this, but here are a couple.
> >
> > To just test the fixed effect of treatment you can use the glm function:
> >
> > test <- read.table(text="
> > replicate treatment n X
> > 1 A 32 4
> > 1 B 33 18
> > 2 A 20 6
> > 2 B 21 18
> > 3 A 7 0
> > 3 B 8 4
> > ", header=TRUE)
> >
> > fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
> > summary(fit1)
> >
> > Note that the default baseline value may differ between R and SAS,
> > which would result in a reversed sign on the slope coefficient (and
> > different intercept).
> >
> > To include replicate as a random effect you need an additional
> > package, here I use lme4 and the glmer function:
> >
> > library(lme4)
> > fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
> > family=binomial)
> > summary(fit2)
> >
> >
> >
> > On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <[hidden email]> wrote:
> >> Hello R-experts,
> >> I am interested to determine if the ratio of counts from two groups differ across two distinct treatments. For example, we have three replicates of treatment A, and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in one group (X/X+Y) significantly differs between treatments.
> >>
> >> I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) refers to the total counts X + Y; the second number (X) refers to the counts X. Is there a way to do this in R? Since we have 20,000 datasets to be tested, it may be easier to retrive the significant test as the given dataset below and its p>F value and mean ratios of treatments in R than SAS.
> >>
> >>
> >> data test;
> >> input replicate treatment$ n X;
> >> datalines;
> >> 1 A 32 4
> >> 1 B 33 18
> >> 2 A 20 6
> >> 2 B 21 18
> >> 3 A 7 0
> >> 3 B 8 4
> >> ;
> >>
>
> Greg has already shown you how that is done in R and how to do logistic regression:
>
> #  I usually think of Poisson regression when I hear a desire is to model ratios of counts that have a denominator. The log(sample_size) is supplied as an offset to correct for the variation in size of subsamples.
>
>
> fit1 <- glm( X ~ treatment+offset(log(n)), data=test, family=poisson)
> summary(fit1)
>
> #  And the lme4 analogue with replication:
>
> library(lme4)
> fit2 <- glmer( X ~ treatment + offset(log(n))+ (1|replicate), data=test,
> family=poisson)
> summary(fit2)
> #----output----
> Generalized linear mixed model fit by maximum likelihood (Laplace  Approximation)
>  [glmerMod]
>  Family: poisson  ( log )
> Formula: X ~ treatment + offset(log(n)) + (1 | replicate)
>    Data: test
>
>      AIC      BIC   logLik deviance df.resid
>     31.9     31.3    -13.0     25.9        3
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -1.0504 -0.4146 -0.3487  0.3956  1.0791
>
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  replicate (Intercept) 0.03159  0.1777  
> Number of obs: 6, groups:  replicate, 3
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept)  -1.7875     0.3372  -5.301 1.15e-07 ***
> treatmentB    1.3365     0.3529   3.787 0.000152 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Correlation of Fixed Effects:
>            (Intr)
> treatmentB -0.838
>
> Compare with the binomial model:
> #============
>
>
>  fitBin <- glmer( cbind(X,n-X) ~ treatment +  (1|replicate), data=test,
>  family=binomial)
>  coef(fitBin)
> #----
> $replicate
>   (Intercept) treatmentB
> 1  -2.0487694   2.364695
> 2  -0.9908556   2.364695
> 3  -2.1844435   2.364695
>
> attr(,"class")
> [1] "coef.mer"
> #-----
>  summary(fitBin)
> #---------
> Generalized linear mixed model fit by maximum likelihood (Laplace  Approximation)
>  [glmerMod]
>  Family: binomial  ( logit )
> Formula: cbind(X, n - X) ~ treatment + (1 | replicate)
>    Data: test
>
>      AIC      BIC   logLik deviance df.resid
>     30.1     29.4    -12.0     24.1        3
>
> Scaled residuals:
>      Min       1Q   Median       3Q      Max
> -0.88757 -0.35065 -0.03137  0.26897  0.67505
>
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  replicate (Intercept) 0.4123   0.6421  
> Number of obs: 6, groups:  replicate, 3
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept)  -1.7442     0.5438  -3.208  0.00134 **
> treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Correlation of Fixed Effects:
>            (Intr)
> treatmentB -0.568
>
> The binomial model has a logit link. Your glimmix procedure appears to have a gaussian/normal distributional assumption and an identity link by default. If we run this using those assumptions in lme4::glmer we get these results (with a warning that in this case we can overlook since the results with lmer turned out to be identical)
> #--------
>  fitNorm <- glmer( I(X/n) ~ treatment +  (1|replicate), data=test,
>                   family=gaussian)
>
> #-------
> Warning message:
> In glmer(I(X/n) ~ treatment + (1 | replicate), data = test, family = gaussian) :
>   calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
> > coef(fitNorm); summary(fitNorm)
> $replicate
>   (Intercept) treatmentB
> 1 0.091096925  0.4925325
> 2 0.324579602  0.4925325
> 3 0.009323473  0.4925325
>
> attr(,"class")
> [1] "coef.mer"
> Linear mixed model fit by REML ['lmerMod']
> Formula: I(X/n) ~ treatment + (1 | replicate)
>    Data: test
>
> REML criterion at convergence: -4.2
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -0.7864 -0.4278 -0.1152  0.5143  0.8246
>
> Random effects:
>  Groups    Name        Variance Std.Dev.
>  replicate (Intercept) 0.027895 0.16702
>  Residual              0.002356 0.04854
> Number of obs: 6, groups:  replicate, 3
>
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept)  0.14167    0.10042   1.411
> treatmentB   0.49253    0.03963  12.427
>
> Correlation of Fixed Effects:
>            (Intr)
> treatmentB -0.197
>
> That's (probably) the model to compare to your SAS results if my reading of the SAS Proc GLIMMIX manual page is correct.
>
> --
> David.
>
> >> proc glimmix data=test;
> >> class replicate treatment;
> >> model X/n = treatment / solution;
> >> random intercept / subject=replicate;
> >> run;
> >>
> >> ods select lsmeans;
> >> proc glimmix data=test;
> >> class replicate treatment;
> >> model X/n = treatment / solution;
> >> random intercept / subject=replicate;
> >> lsmeans treatment / cl ilink;
> >> run;
> >>
> >> I appreciate your help in advance!
> >> Joshua
> >>
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > [hidden email]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.