# glm.fit: fitted probabilities numerically 0 or 1 occurred? Classic List Threaded 3 messages Reply | Threaded
Open this post in threaded view
|

## glm.fit: fitted probabilities numerically 0 or 1 occurred?

 This post was updated on . CONTENTS DELETED The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

## Re: glm.fit: fitted probabilities numerically 0 or 1 occurred?

 > I get the errors; > glm.fit: fitted probabilities numerically 0 or 1 occurred and > glm.fit: algorithm did not converge > ..... > Is there any way to to > fix this problem? There are two separate issues. One is the appearance of fitted values at 0 or 1. The other is the lack of convergence. The first is usually not fatal; it means that the probabilities are so close to 0 or 1 that a double precision value can't distinguish them from 0 or 1. Often that's a transient condition during iteration and the final fitted values are inside (0,1), but final fitted values can also be out there if you have continuous predictor values a long way out;  by itself, that usually won't stop a glm. The second is a bit more problematic. Sometimes it's just that you need to increase the maximum number of iterations (see the control= argument and ?glm.control). That is always worth a try - use some absurdly high number like 1000 instead of the default 25 and go find some coffee while it thinks about it. If that solves your problem then you're OK, or at least as OK as you can be with a data set that hard to fit. But if you're bootstrapping with some anomalous values it is also possible that some of your bootstrapped sets have too high a proportion of anomalies, and under those conditions it's possible that there could be no sensible optimum within reach. One way of dealing with that in a boostrap or other simulation context is to check the 'converged' value and if it's FALSE, return an NA for your statistic. But of course that is a form of censoring; if you have a high proportion of such instances you'd be on very thin ice drawing conclusions. S Ellison ******************************************************************* This email and any attachments are confidential. Any use...{{dropped:8}} ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

## Re: glm.fit: fitted probabilities numerically 0 or 1 occurred?

 On 21-Mar-2012 S Ellison wrote: >> I get the errors; >> glm.fit: fitted probabilities numerically 0 or 1 occurred and >> glm.fit: algorithm did not converge >> ..... >> Is there any way to to >> fix this problem? > > There are two separate issues. > One is the appearance of fitted values at 0 or 1. > The other is the lack of convergence. > > The first is usually not fatal; it means that the probabilities > are so close to 0 or 1 that a double precision value can't > distinguish them from 0 or 1. > Often that's a transient condition during iteration and the > final fitted values are inside (0,1), but final fitted values > can also be out there if you have continuous predictor values > a long way out;  by itself, that usually won't stop a glm. > > The second is a bit more problematic. Sometimes it's just that > you need to increase the maximum number of iterations (see the > control= argument and ?glm.control). That is always worth a try > - use some absurdly high number like 1000 instead of the default > 25 and go find some coffee while it thinks about it. If that > solves your problem then you're OK, or at least as OK as you can > be with a data set that hard to fit. > But if you're bootstrapping with some anomalous values it is > also possible that some of your bootstrapped sets have too high > a proportion of anomalies, and under those conditions it's > possible that there could be no sensible optimum within reach. > > One way of dealing with that in a boostrap or other simulation > context is to check the 'converged' value and if it's FALSE, > return an NA for your statistic. But of course that is a form > of censoring; if you have a high proportion of such instances > you'd be on very thin ice drawing conclusions. > > S Ellison There is a further general issue. The occurence of   glm.fit: fitted probabilities numerically 0 or 1 occurred and   glm.fit: algorithm did not converge is usually a symptom of what is called "linear separation". The simplest instance of this is when there is a single predictor variable X whose values are, say, ordered as X < X < ... < X[n-1] < X[n]. If it should happen that the values Y ... Y[n] of the 0/1 response variable Y are all equal to 0 for X,...,X[k] and all equal to 1 for X[k+1],...,X[n], and you are fitting a glm(Y~X,family=binomial) (which is what you use for logistic regression), then the formula being fitted is   log(P(Y=1|X)/(Y=0|X)) = (X - m)/s for some values of m and s. In the circumstances described, for any value of m between   X[k] < m < X[k+1] the fit can force P[Y=1|X] --> 0 for X = X,...,X[k] and can force P[Y=1|X] --> 1 for X = X[k+1],...,X[n] by pushing s --> 0 (so then the first set of (X - m)/s --> -Inf and the second set of (X - m)/s --> +Inf), thus forcing the probabilities predicted by the model towards agreeing exactly with the frequencies observed in the data. In other words, any value of m satisfying X[k] < m < X[k+1] *separates* the Y=0 responses from the Y=1 responses: all the Y=0 responses are for X-values less than m, and all the Y=1 responses are for values of X greater than m. Since such a value of m is not unique (any value between X[k] and X[k+1] will do), failure of convergence will be observed. This can certainly arise from excessively "outlying" data, i.e. the difference X[k+1] - X[k] is much greater than the true value of the logistic scale parameter s, so that it is almost certain that all responses for X <= X[k] will be 0, and all responses for X >= X[k+1] will be 1. But it can also readily arise when there are only a few data, i.e. n is small; and also, even with many data, if the scale paramater is n ot large compared with the spacing between X values. If you run the following code a few times, you will see that "separation" occurs with about 30% probability.   n <- 6   X <- (1:n) ; m <- (n+1)/2 ; s <- 2   count <- numeric(100)   for(i in (1:100)) {     Y <- 1*(runif(n) <= exp((X-m)/s)/(1+exp((X-m)/s)))     count[i] <- (max(X[Y==0]) < min(X[Y==1]))   }   sum(count) Similarly, with n <- 10, it occurs about 20% of the time; for n <- 15 about 15%, and it never gets much below 15% no matter how large n is. The more general situation of "linear separation" is where you have many predictor variables, when there can be a high probability that the random outcomes Y=0 label one set of cases, the Y=1 label another set of cases, and the X-vectors for Y=0 can be separated from the X-vectors for Y=1 by a linear form, i.e. there is a vector of coefficients beta such that   X[for any Y==0]%*%beta < X[for any Y==1]%*%beta This also can arise quite readily in practice. There is no definitive procedure for coping with this situation when it arises. What the fitting procedure is doing is just what it is supposed to do, namely predicting P(Y=1)=0 for every X such that the observed Y=0, and predicting P(Y=1)=1 for every X such that the observed Y=1 whenever it can (i.e. whenever there is separation). So indeed this is a result of maximum-likelihood estimation in that it is seeking values of the parameters There are work-rounds which are variants of penalising small values of the scale paramater s, so that what is being maximised is not the plain likelihood L but L moderated by the penalty so that for the smaller values of s the penalised likelihood will be less than the plain likelihood (and, for really small s, much less). However, choosing a penalising strategy is not well-defined from a theoretical point of view, and will involve an element of arbitrary choice, i.e. a penalisation function which appropriately reflects your objectives. Hoping this helps, Ted. PS Try searching for "linear separation" and "linear separability" in the searchable R-help archives at   http://tolstoy.newcastle.edu.au/R/(the old R Site Help and Archive Search at http://finzi.psych.upenn.edu/is no longer of much use). ------------------------------------------------- E-Mail: (Ted Harding) <[hidden email]> Date: 21-Mar-2012  Time: 15:35:00 This message was sent by XFMail ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.