Optimization problem

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimization problem

Alan Harrison-4
Hello Folks,

Very new to R so bear with me, running 5.2 on XP.  Trying to do a zero-inflated negative binomial regression on placental scar data as dependent.  Lactation, location, number of tick larvae present and mass of mouse are independents.  Dataframe and attributes below:


 Location         Lac Scars Lar Mass Lacfac
1   Tullychurry   0     0  15 13.87      0
2      Somerset   0     0   0 15.60      0
3     Tollymore   0     0   3 16.43      0
4     Tollymore   0     0   0 16.55      0
5       Caledon   0     0   0 17.47      0
6  Hillsborough   1     5   0 18.18      1
7       Caledon   0     0   1 19.06      0
8   Portglenone   0     4   0 19.10      0
9   Portglenone   0     5   0 19.13      0
10    Tollymore   0     5   3 19.50      0
11 Hillsborough   1     5   0 19.58      1
12  Portglenone   0     4   0 19.76      0
13      Caledon   0     8   0 19.97      0
14 Hillsborough   1     4   0 20.02      1
15  Tullychurry   0     3   3 20.13      0
16 Hillsborough   1     5   0 20.18      1
17   LoughNavar   1     5   0 20.20      1
18    Tollymore   0     0   1 20.24      0
19 Hillsborough   1     5   0 20.48      1
20      Caledon   0     4   1 20.56      0
21      Caledon   0     3   2 20.58      0
22    Tollymore   0     4   3 20.58      0
23    Tollymore   0     0   2 20.88      0
24 Hillsborough   1     0   0 21.01      1
25  Portglenone   0     5   0 21.08      0
26  Tullychurry   0     2   5 21.28      0
27 Ballysallagh   1     4   0 21.59      1
28      Caledon   0     0   1 21.68      0
29 Hillsborough   1     5   0 22.09      1
30  Tullychurry   0     5   5 22.28      0
31  Tullychurry   1     6  75 22.43      1
32 Ballysallagh   1     5   0 22.57      1
33 Ballysallagh   1     4   0 22.67      1
34   LoughNavar   1     5   3 22.71      1
35 Hillsborough   1     4   0 23.01      1
36      Caledon   0     0   3 23.08      0
37   LoughNavar   1     5   0 23.53      1
38 Ballysallagh   1     4   0 23.55      1
39  Portglenone   1     6   0 23.61      1
40   Mt.Stewart   0     3   0 23.70      0
41     Somerset   0     5   0 23.83      0
42 Ballysallagh   1     5   0 23.93      1
43 Ballysallagh   1     5   0 24.01      1
44      Caledon   0     0   3 24.14      0
45   LoughNavar   0     6   0 24.30      0
46   LoughNavar   1     5   0 24.34      1
47 Hillsborough   1     4   0 24.45      1
48      Caledon   0     3   2 24.55      0
49  Tullychurry   0     5  44 24.83      0
50 Hillsborough   1     5   0 24.86      1
51 Ballysallagh   1     5   0 25.02      1
52  Tullychurry   0     0   9 25.27      0
53   Mt.Stewart   0     5   0 25.31      0
54   LoughNavar   1     4   8 25.43      1
55     Somerset   1     0   0 25.58      1
56 Hillsborough   1     5   0 25.82      1
57  Portglenone   1     2   0 26.02      1
58 Ballysallagh   1     5   0 26.19      1
59   Mt.Stewart   1     0   0 26.66      1
60  Randalstown   1     0   1 26.70      1
61     Somerset   0     4   0 27.01      0
62   Mt.Stewart   0     4   0 27.05      0
63     Somerset   0     3   0 27.10      0
64     Somerset   0     6   0 27.34      0
65     Somerset   0     0   0 27.87      0
66   LoughNavar   1     5   1 28.01      1
67  Tullychurry   1     6  42 28.55      1
68 Hillsborough   1     5   0 28.84      1
69  Portglenone   1     4   0 29.00      1
70     Somerset   1     4   0 31.87      1
71 Ballysallagh   1     5   0 33.06      1
72   LoughNavar   1     4   0 33.24      1
73     Somerset   1     4   0 33.36      1

alan : 'data.frame':    73 obs. of  6 variables:
 $ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3 2 6 6 9 ...
 $ Lac     : int  0 0 0 0 0 1 0 0 0 0 ...
 $ Scars   : int  0 0 0 0 0 5 0 4 5 5 ...
 $ Lar     : int  15 0 3 0 0 0 1 0 0 3 ...
 $ Mass    : num  13.9 15.6 16.4 16.6 17.5 ...
 $ Lacfac  : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...

The syntax I used to create the model is:

zinb.zc <- zicounts(resp=Scars~.,x =~Location + Lar + Mass + Lar:Mass + Location:Mass,z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=alan)

The error given is:

Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE,  :
        non-finite value supplied by optim
In addition: Warning message:
fitted probabilities numerically 0 or 1 occurred in: glm.fit(zz, 1 - pmin(y, 1), family = binomial())

I understand this is a problem with the model I specified, could anyone help out??

Many thanks

Alan Harrison

Quercus
Queen's University Belfast
MBC, 97 Lisburn Road
Belfast

BT9 7BL

T: 02890 972219
M: 07798615682


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Gabor Grothendieck
Lac and Lacfac are the same.

On 8/21/07, Alan Harrison <[hidden email]> wrote:

> Hello Folks,
>
> Very new to R so bear with me, running 5.2 on XP.  Trying to do a zero-inflated negative binomial regression on placental scar data as dependent.  Lactation, location, number of tick larvae present and mass of mouse are independents.  Dataframe and attributes below:
>
>
>  Location         Lac Scars Lar Mass Lacfac
> 1   Tullychurry   0     0  15 13.87      0
> 2      Somerset   0     0   0 15.60      0
> 3     Tollymore   0     0   3 16.43      0
> 4     Tollymore   0     0   0 16.55      0
> 5       Caledon   0     0   0 17.47      0
> 6  Hillsborough   1     5   0 18.18      1
> 7       Caledon   0     0   1 19.06      0
> 8   Portglenone   0     4   0 19.10      0
> 9   Portglenone   0     5   0 19.13      0
> 10    Tollymore   0     5   3 19.50      0
> 11 Hillsborough   1     5   0 19.58      1
> 12  Portglenone   0     4   0 19.76      0
> 13      Caledon   0     8   0 19.97      0
> 14 Hillsborough   1     4   0 20.02      1
> 15  Tullychurry   0     3   3 20.13      0
> 16 Hillsborough   1     5   0 20.18      1
> 17   LoughNavar   1     5   0 20.20      1
> 18    Tollymore   0     0   1 20.24      0
> 19 Hillsborough   1     5   0 20.48      1
> 20      Caledon   0     4   1 20.56      0
> 21      Caledon   0     3   2 20.58      0
> 22    Tollymore   0     4   3 20.58      0
> 23    Tollymore   0     0   2 20.88      0
> 24 Hillsborough   1     0   0 21.01      1
> 25  Portglenone   0     5   0 21.08      0
> 26  Tullychurry   0     2   5 21.28      0
> 27 Ballysallagh   1     4   0 21.59      1
> 28      Caledon   0     0   1 21.68      0
> 29 Hillsborough   1     5   0 22.09      1
> 30  Tullychurry   0     5   5 22.28      0
> 31  Tullychurry   1     6  75 22.43      1
> 32 Ballysallagh   1     5   0 22.57      1
> 33 Ballysallagh   1     4   0 22.67      1
> 34   LoughNavar   1     5   3 22.71      1
> 35 Hillsborough   1     4   0 23.01      1
> 36      Caledon   0     0   3 23.08      0
> 37   LoughNavar   1     5   0 23.53      1
> 38 Ballysallagh   1     4   0 23.55      1
> 39  Portglenone   1     6   0 23.61      1
> 40   Mt.Stewart   0     3   0 23.70      0
> 41     Somerset   0     5   0 23.83      0
> 42 Ballysallagh   1     5   0 23.93      1
> 43 Ballysallagh   1     5   0 24.01      1
> 44      Caledon   0     0   3 24.14      0
> 45   LoughNavar   0     6   0 24.30      0
> 46   LoughNavar   1     5   0 24.34      1
> 47 Hillsborough   1     4   0 24.45      1
> 48      Caledon   0     3   2 24.55      0
> 49  Tullychurry   0     5  44 24.83      0
> 50 Hillsborough   1     5   0 24.86      1
> 51 Ballysallagh   1     5   0 25.02      1
> 52  Tullychurry   0     0   9 25.27      0
> 53   Mt.Stewart   0     5   0 25.31      0
> 54   LoughNavar   1     4   8 25.43      1
> 55     Somerset   1     0   0 25.58      1
> 56 Hillsborough   1     5   0 25.82      1
> 57  Portglenone   1     2   0 26.02      1
> 58 Ballysallagh   1     5   0 26.19      1
> 59   Mt.Stewart   1     0   0 26.66      1
> 60  Randalstown   1     0   1 26.70      1
> 61     Somerset   0     4   0 27.01      0
> 62   Mt.Stewart   0     4   0 27.05      0
> 63     Somerset   0     3   0 27.10      0
> 64     Somerset   0     6   0 27.34      0
> 65     Somerset   0     0   0 27.87      0
> 66   LoughNavar   1     5   1 28.01      1
> 67  Tullychurry   1     6  42 28.55      1
> 68 Hillsborough   1     5   0 28.84      1
> 69  Portglenone   1     4   0 29.00      1
> 70     Somerset   1     4   0 31.87      1
> 71 Ballysallagh   1     5   0 33.06      1
> 72   LoughNavar   1     4   0 33.24      1
> 73     Somerset   1     4   0 33.36      1
>
> alan : 'data.frame':    73 obs. of  6 variables:
>  $ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3 2 6 6 9 ...
>  $ Lac     : int  0 0 0 0 0 1 0 0 0 0 ...
>  $ Scars   : int  0 0 0 0 0 5 0 4 5 5 ...
>  $ Lar     : int  15 0 3 0 0 0 1 0 0 3 ...
>  $ Mass    : num  13.9 15.6 16.4 16.6 17.5 ...
>  $ Lacfac  : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...
>
> The syntax I used to create the model is:
>
> zinb.zc <- zicounts(resp=Scars~.,x =~Location + Lar + Mass + Lar:Mass + Location:Mass,z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=alan)
>
> The error given is:
>
> Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE,  :
>        non-finite value supplied by optim
> In addition: Warning message:
> fitted probabilities numerically 0 or 1 occurred in: glm.fit(zz, 1 - pmin(y, 1), family = binomial())
>
> I understand this is a problem with the model I specified, could anyone help out??
>
> Many thanks
>
> Alan Harrison
>
> Quercus
> Queen's University Belfast
> MBC, 97 Lisburn Road
> Belfast
>
> BT9 7BL
>
> T: 02890 972219
> M: 07798615682
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Ben Bolker-2
In reply to this post by Alan Harrison-4

  (Hope this gets threaded properly.  Sorry if it doesn't.)

   Gabor: Lac and Lacfac being the same is irrelevant, wouldn't
produce NAs (but would produce something like a singular Hessian
and maybe other problems) -- but they're not even specified in this
model.

  The bottom line is that you have a location with a single
observation, so the GLM that zicounts runs to get the initial
parameter values has an unestimable location:mass interaction
for one location, so it gives an NA, so optim complains.

  In gruesome detail:

## set up  data
scardat = read.table("scars.dat",header=TRUE)
library(zicounts)
## try to run model
zinb.zc <- zicounts(resp=Scars~.,
                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data=scardat)
## tried to debug this by dumping zicounts.R to a file, modifying
## it to put a "trace" argument in that would print out the parameters
## and log-likelihood for every call to the log-likelihood function.
dump("zicounts",file="zicounts.R")
source("zicounts.R")
zinb.zc <- zicounts(resp=Scars~.,
                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data=scardat,trace=TRUE)
## this actually didn't do any good because the negative log-likelihood
## function never gets called -- as it turns out optim() barfs when it
## gets its initial values, before it ever gets to evaluating the
log-likelihood

## check the glm -- this is the equivalent of what zicounts does to
## get the initial values of the x parameters
p1 <- glm(Scars~Location + Lar + Mass + Lar:Mass + Location:Mass,
          data=scardat,family="poisson")
which(is.na(coef(p1)))

## find out what the deal is
table(scardat$Location)

scar2 = subset(scardat,Location!="Randalstown")
## first step to removing the bad point from the data set -- but ...
table(scar2$Location)
## it leaves the Location factor with the same levels, so
##  now we have ZERO counts for one location:
## redefine the factor to drop unused levels
scar2$Location <- factor(scar2$Location)
## OK, looks fine now
table(scar2$Location)

zinb.zc <- zicounts(resp=Scars~.,
                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data=scar2)
## now we get another error ("system is computationally singular" when
## trying to compute Hessian -- overparameterized?)   Not in any
## trivial way that I can see.  It would be nice to get into the guts
## of zicounts and stop it from trying to invert the Hessian, which is
## I think where this happens.

  In the meanwhile, I have some other  ideas about this analysis (sorry,
but you started it ...)

  Looking at the data in a few different ways:

library(lattice)
xyplot(Scars~Mass,groups=Location,data=scar2,jitter=TRUE,
       auto.key=list(columns=3))
xyplot(Scars~Mass|Location,data=scar2,jitter=TRUE)

xyplot(Scars~Lar,groups=Location,data=scar2,
       auto.key=list(columns=3))
xyplot(Scars~Mass|Lar,data=scar2)
xyplot(Scars~Lar|Location,data=scar2)

   Some thoughts: (1) I'm not at all sure that
zero-inflation is necessary (see Warton 2005, Environmentrics).
This is a fairly small, noisy data set without huge numbers
of zeros -- a plain old negative binomial might be fine.
 
   I don't actually see a lot of signal here, period (although there may
be some) ...
there's not a huge range in Lar (whatever it is -- the rest of the
covariates I
think I can interpret).  It would be tempting to try to fit location as
a random
effect, because fitting all those extra degrees of freedom is going to
kill you.
On the other hand, GLMMs are a bit hairy.

   cheers
       Ben

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Optimization problem

Gabor Grothendieck
Try this.

1. following Ben remove the Randalstown point and reset the levels of the
Location factor.

2. then replace solve with ginv so it uses the generalized inverse to calculate
the hessian:

alan2 <- subset(alan, subset = Location != "Randalstown")
alan2$Location <- factor(as.character(alan2$Location))

library(MASS)
solve <- ginv

zinb.zc <- zicounts(resp=Scars~.,x =~Location + Lar + Mass + Lar:Mass
+ Location:Mass,z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data = alan2)

rm(solve)

On 8/21/07, Ben Bolker <[hidden email]> wrote:

>
>  (Hope this gets threaded properly.  Sorry if it doesn't.)
>
>   Gabor: Lac and Lacfac being the same is irrelevant, wouldn't
> produce NAs (but would produce something like a singular Hessian
> and maybe other problems) -- but they're not even specified in this
> model.
>
>  The bottom line is that you have a location with a single
> observation, so the GLM that zicounts runs to get the initial
> parameter values has an unestimable location:mass interaction
> for one location, so it gives an NA, so optim complains.
>
>  In gruesome detail:
>
> ## set up  data
> scardat = read.table("scars.dat",header=TRUE)
> library(zicounts)
> ## try to run model
> zinb.zc <- zicounts(resp=Scars~.,
>                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    data=scardat)
> ## tried to debug this by dumping zicounts.R to a file, modifying
> ## it to put a "trace" argument in that would print out the parameters
> ## and log-likelihood for every call to the log-likelihood function.
> dump("zicounts",file="zicounts.R")
> source("zicounts.R")
> zinb.zc <- zicounts(resp=Scars~.,
>                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    data=scardat,trace=TRUE)
> ## this actually didn't do any good because the negative log-likelihood
> ## function never gets called -- as it turns out optim() barfs when it
> ## gets its initial values, before it ever gets to evaluating the
> log-likelihood
>
> ## check the glm -- this is the equivalent of what zicounts does to
> ## get the initial values of the x parameters
> p1 <- glm(Scars~Location + Lar + Mass + Lar:Mass + Location:Mass,
>          data=scardat,family="poisson")
> which(is.na(coef(p1)))
>
> ## find out what the deal is
> table(scardat$Location)
>
> scar2 = subset(scardat,Location!="Randalstown")
> ## first step to removing the bad point from the data set -- but ...
> table(scar2$Location)
> ## it leaves the Location factor with the same levels, so
> ##  now we have ZERO counts for one location:
> ## redefine the factor to drop unused levels
> scar2$Location <- factor(scar2$Location)
> ## OK, looks fine now
> table(scar2$Location)
>
> zinb.zc <- zicounts(resp=Scars~.,
>                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
>                    data=scar2)
> ## now we get another error ("system is computationally singular" when
> ## trying to compute Hessian -- overparameterized?)   Not in any
> ## trivial way that I can see.  It would be nice to get into the guts
> ## of zicounts and stop it from trying to invert the Hessian, which is
> ## I think where this happens.
>
>  In the meanwhile, I have some other  ideas about this analysis (sorry,
> but you started it ...)
>
>  Looking at the data in a few different ways:
>
> library(lattice)
> xyplot(Scars~Mass,groups=Location,data=scar2,jitter=TRUE,
>       auto.key=list(columns=3))
> xyplot(Scars~Mass|Location,data=scar2,jitter=TRUE)
>
> xyplot(Scars~Lar,groups=Location,data=scar2,
>       auto.key=list(columns=3))
> xyplot(Scars~Mass|Lar,data=scar2)
> xyplot(Scars~Lar|Location,data=scar2)
>
>   Some thoughts: (1) I'm not at all sure that
> zero-inflation is necessary (see Warton 2005, Environmentrics).
> This is a fairly small, noisy data set without huge numbers
> of zeros -- a plain old negative binomial might be fine.
>
>   I don't actually see a lot of signal here, period (although there may
> be some) ...
> there's not a huge range in Lar (whatever it is -- the rest of the
> covariates I
> think I can interpret).  It would be tempting to try to fit location as
> a random
> effect, because fitting all those extra degrees of freedom is going to
> kill you.
> On the other hand, GLMMs are a bit hairy.
>
>   cheers
>       Ben
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.