Zero-inflated Negat. Binom. model

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Zero-inflated Negat. Binom. model

Luciano La Sala-2
Dear R crew:
 
I am sorry this question has been posted before, but I can't seem to solve
this problem yet.
I have a simple dataset consisting of two variables: cestode intensity and
chick size (defined as CAPI).
Intensity is a count and clearly overdispersed, with way too many zeroes.
I'm interested in looking at the association between these two variables,
i.e. how well does chick size predict tape intensity?
 
Since I have a small sample size, I fit a zero inflated negat. Binomial (not
Poisson) model using the "pscl" package.
 
I built tried two models and got the outputs below.
 
> model <- zeroinfl(Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
 
Call:
zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
 
Count model coefficients (negbin with log link):
(Intercept)         CAPI  
   -2.99182      0.06817  
Theta = 0.4528
 
Zero-inflation model coefficients (binomial with logit link):
(Intercept)         CAPI  
    12.1364      -0.1572  
 
> summary(model)
 
Call:
zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
 
Pearson residuals:
     Min       1Q   Median       3Q      Max
-0.62751 -0.38842 -0.21303 -0.06899  7.29566
 
Count model coefficients (negbin with log link):
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -2.99182    3.39555  -0.881   0.3783  
CAPI         0.06817    0.04098   1.664   0.0962 .
Log(theta)  -0.79222    0.45031  -1.759   0.0785 .
 
Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) 12.13636    3.71918   3.263  0.00110 **
CAPI        -0.15720    0.04989  -3.151  0.00163 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
Theta = 0.4528
Number of iterations in BFGS optimization: 1
Log-likelihood: -140.2 on 5 Df
 
QUESTIONS
 
1. Is my model adequately specified?
 
2. CAPI is included in blocks 1 of output containing negative binomial
regression coefficients for CAPI, and is also included also in block 2
corresponding to the inflation model. Does this make sense?  
 
If I specify my model slightly differently, I get what I believe is more
reasonable results:
 
> model12 <- zeroinfl(Int_Cesto ~ 1|CAPI, dist = "negbin", EM = TRUE)
> model12
 
Call:
zeroinfl(formula = Int_Cesto ~ 1 | CAPI, dist = "negbin", EM = TRUE)
 
Count model coefficients (negbin with log link):
(Intercept)  
      2.692  
Theta = 0.4346
 
Zero-inflation model coefficients (binomial with logit link):
(Intercept)         CAPI  
    13.2476      -0.1708  
 
> summary(model12)
 
Call:
zeroinfl(formula = Int_Cesto ~ 1 | CAPI, dist = "negbin", EM = TRUE)
 
Pearson residuals:
     Min       1Q   Median       3Q      Max
-0.61616 -0.36902 -0.19466 -0.06666  4.85481
 
Count model coefficients (negbin with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   2.6924     0.3031   8.883   <2e-16 ***
Log(theta)   -0.8334     0.4082  -2.042   0.0412 *  
 
Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 13.24757    3.64531   3.634 0.000279 ***
CAPI        -0.17078    0.04921  -3.471 0.000519 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
Theta = 0.4346
Number of iterations in BFGS optimization: 1
Log-likelihood: -141.9 on 4 Df
 
QUESTION:
 
1.    Is this model specification and output more reasonable?
2.    CAPI appears only in the second block that corresponds to the
inflation model.
 
 
Thanks in advance!
Luciano      
 

 

 

 


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Zero-inflated Negat. Binom. model

Achim Zeileis-4
On Thu, 11 Feb 2010, Luciano La Sala wrote:

> Dear R crew:
>
> I am sorry this question has been posted before, but I can't seem to solve
> this problem yet.

Just for the others reader who might not recall that you asked virtually
the same question last week. This was my reply:
   https://stat.ethz.ch/pipermail/r-help/2010-February/227040.html

> I have a simple dataset consisting of two variables: cestode intensity and
> chick size (defined as CAPI).
> Intensity is a count and clearly overdispersed, with way too many zeroes.
> I'm interested in looking at the association between these two variables,
> i.e. how well does chick size predict tape intensity?
>
> Since I have a small sample size, I fit a zero inflated negat. Binomial (not
> Poisson) model using the "pscl" package.
>
> I built tried two models and got the outputs below.
>
>> model <- zeroinfl(Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
>
> Call:
> zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
>
> Count model coefficients (negbin with log link):
> (Intercept)         CAPI
>   -2.99182      0.06817
> Theta = 0.4528
>
> Zero-inflation model coefficients (binomial with logit link):
> (Intercept)         CAPI
>    12.1364      -0.1572
>
>> summary(model)
>
> Call:
> zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
>
> Pearson residuals:
>     Min       1Q   Median       3Q      Max
> -0.62751 -0.38842 -0.21303 -0.06899  7.29566
>
> Count model coefficients (negbin with log link):
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept) -2.99182    3.39555  -0.881   0.3783
> CAPI         0.06817    0.04098   1.664   0.0962 .
> Log(theta)  -0.79222    0.45031  -1.759   0.0785 .
>
> Zero-inflation model coefficients (binomial with logit link):
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept) 12.13636    3.71918   3.263  0.00110 **
> CAPI        -0.15720    0.04989  -3.151  0.00163 **
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Theta = 0.4528
> Number of iterations in BFGS optimization: 1
> Log-likelihood: -140.2 on 5 Df
>
> QUESTIONS
>
> 1. Is my model adequately specified?

See my reply linked above. I guess it's hard to say more than that.

> 2. CAPI is included in blocks 1 of output containing negative binomial
> regression coefficients for CAPI, and is also included also in block 2
> corresponding to the inflation model. Does this make sense?

Dito.

> If I specify my model slightly differently, I get what I believe is more
> reasonable results:
>
>> model12 <- zeroinfl(Int_Cesto ~ 1|CAPI, dist = "negbin", EM = TRUE)
>> model12
>
> Call:
> zeroinfl(formula = Int_Cesto ~ 1 | CAPI, dist = "negbin", EM = TRUE)
>
> Count model coefficients (negbin with log link):
> (Intercept)
>      2.692
> Theta = 0.4346
>
> Zero-inflation model coefficients (binomial with logit link):
> (Intercept)         CAPI
>    13.2476      -0.1708
>
>> summary(model12)
>
> Call:
> zeroinfl(formula = Int_Cesto ~ 1 | CAPI, dist = "negbin", EM = TRUE)
>
> Pearson residuals:
>     Min       1Q   Median       3Q      Max
> -0.61616 -0.36902 -0.19466 -0.06666  4.85481
>
> Count model coefficients (negbin with log link):
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept)   2.6924     0.3031   8.883   <2e-16 ***
> Log(theta)   -0.8334     0.4082  -2.042   0.0412 *
>
> Zero-inflation model coefficients (binomial with logit link):
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept) 13.24757    3.64531   3.634 0.000279 ***
> CAPI        -0.17078    0.04921  -3.471 0.000519 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Theta = 0.4346
> Number of iterations in BFGS optimization: 1
> Log-likelihood: -141.9 on 4 Df
>
> QUESTION:
>
> 1.    Is this model specification and output more reasonable?
> 2.    CAPI appears only in the second block that corresponds to the
> inflation model.

You can apply standard model selection techniques. Hands-on examples for
that are in the vignette that I pointed you to last week:
   vignette("countreg", package = "pscl")

Different model selection strategies may however yield different models.
AIC will prefer the model with CAPI in the count equation. In contrast,
a Wald test at 5% level would drop it. You could also look at the BIC and
at the LR test. My guess is though that the answer will be: CAPI has some
weak but practically not very relevant influence on the mean in the count
component.

But I strongly recommend that you try to get more familiar with the
zero-inflated model and the general model selection strategies. Or you
could try to get help from a local statistician to obtain an appropriate
model and interpret its results.

hth,
Z

>
> Thanks in advance!
> Luciano
>
>
>
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Question on funcion and npar

Luciano La Sala-2
Dear everyone,

The following function, taken from Quick-R, gets measures of central
tendency and spread for a numeric vector x.

I can't figure out what the argument npar means in each instance.
Any tips will be most appreciated.

mysummary <- function(x, npar=TRUE, print=TRUE) {
   if (!npar) {
     center <- mean(x); spread <- sd(x)
   } else {
     center <- median(x); spread <- mad(x)
   }
   if (print & !npar) {
     cat("Mean=", center, "\n", "SD=", spread, "\n")
   } else if (print & npar) {
     cat("Median=", center, "\n", "MAD=", spread, "\n")
   }
   result <- list(center=center,spread=spread)
   return(result)
}


--
Luciano F. La Sala
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
Cátedra de Epidemiología
Departamento de Biología, Bioquímica y Farmacia
Universidad Nacional del Sur
San Juan 670
Bahía Blanca (8000)
Argentina

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Question on funcion and npar

Marc Schwartz-3

> On Apr 21, 2015, at 1:05 PM, Luciano La Sala <[hidden email]> wrote:
>
> Dear everyone,
>
> The following function, taken from Quick-R, gets measures of central tendency and spread for a numeric vector x.
>
> I can't figure out what the argument npar means in each instance.
> Any tips will be most appreciated.
>
> mysummary <- function(x, npar=TRUE, print=TRUE) {
>  if (!npar) {
>    center <- mean(x); spread <- sd(x)
>  } else {
>    center <- median(x); spread <- mad(x)
>  }
>  if (print & !npar) {
>    cat("Mean=", center, "\n", "SD=", spread, "\n")
>  } else if (print & npar) {
>    cat("Median=", center, "\n", "MAD=", spread, "\n")
>  }
>  result <- list(center=center,spread=spread)
>  return(result)
> }
>


Presumably ‘nonparametric’, since median()/mad() is used in lieu of mean()/sd(), if npar = TRUE.

Regards,

Marc Schwartz

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Question on funcion and npar

Bert Gunter
In reply to this post by Luciano La Sala-2
Have you gone thruway any T tutorials yet? This is a very basic question
that I do not believe would arise if you had done so.

The answer is that it is a logical that controls whether means and sd's or
medians and mads are calculated and returned. But I don't think this answer
will be comprehensible if you asked the question in the first place. Some
minimum effort must be made to learn the language details to understand
even simple function code.

Cheers,
Bert

On Tuesday, April 21, 2015, Luciano La Sala <[hidden email]>
wrote:

> Dear everyone,
>
> The following function, taken from Quick-R, gets measures of central
> tendency and spread for a numeric vector x.
>
> I can't figure out what the argument npar means in each instance.
> Any tips will be most appreciated.
>
> mysummary <- function(x, npar=TRUE, print=TRUE) {
>   if (!npar) {
>     center <- mean(x); spread <- sd(x)
>   } else {
>     center <- median(x); spread <- mad(x)
>   }
>   if (print & !npar) {
>     cat("Mean=", center, "\n", "SD=", spread, "\n")
>   } else if (print & npar) {
>     cat("Median=", center, "\n", "MAD=", spread, "\n")
>   }
>   result <- list(center=center,spread=spread)
>   return(result)
> }
>
>
> --
> Luciano F. La Sala
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> Cátedra de Epidemiología
> Departamento de Biología, Bioquímica y Farmacia
> Universidad Nacional del Sur
> San Juan 670
> Bahía Blanca (8000)
> Argentina
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


--

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
Clifford Stoll

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.