

Dear R-core,
I have found an edge case where the glm function falsely concludes that the model has converged. The issue is the following: my data contains a number of covariates, and one of these covariates has very small variance. For most of the rows of this covariate the value is 0, except for one row, where it is 1.
The glm function correctly determines the beta and standard error estimates for all other covariates.
I've placed the data here: http://www.harmjanwestra.nl/rtestdata.txt
The model I'm using is very simple:
data <- read.table("rtestdata.txt")
model <- glm(data[,1] ~ data[,2] + data[,3] + data[,4] + data[,5] + data[,6] + data[,7] + data[,8] + data[,9] + data[,10] + data[,11] + data[,12] + data[,13] + data[,14], family=binomial("logit"))
summary(model)
You will see that for covariate data[,13], the beta/coefficient estimate is around 9. Eight iterations were performed, and model$converged returns TRUE.
I've used some alternative logistic regression code written in C (https://github.com/czep/mlelr/blob/master/src/mlelr.c), which produces identical estimates for the other covariates and comparable deviance values. However, with this C code, the estimate for data[,13] is around 100 (since I allow a maximum of 100 MLE iterations). There, the conclusion is that the model does not converge.
The difference between the two pieces of code is that R's glm() function determines convergence of the whole model by measuring the difference between the deviance of the current iteration and the deviance of the prior iteration, and calls the model converged when that difference drops below a certain epsilon. In the C code, the model is only considered converged when none of the parameters has changed markedly compared to the previous iteration.
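To make the difference concrete, here is a minimal sketch (in Python, since the mechanics are language-agnostic; this is not R's actual C internals, and the toy data set is hypothetical, not rtestdata.txt). It fits a one-parameter logistic model by Newton-Raphson on quasi-separated data, tracking a deviance-change rule of the kind glm() uses alongside a parameter-change rule like the C code's:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def deviance(b, xs, ys):
    # -2 * log-likelihood of the one-parameter model p = sigmoid(b * x)
    ll = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b * x)
        ll += math.log(p if y == 1 else 1.0 - p)
    return -2.0 * ll

def fit(xs, ys, max_iter=100, dev_eps=1e-8, par_eps=1e-6):
    """Newton-Raphson fit, checking both stopping rules side by side."""
    b = 0.0
    dev_old = deviance(b, xs, ys)
    for it in range(1, max_iter + 1):
        grad = sum(x * (y - sigmoid(b * x)) for x, y in zip(xs, ys))
        info = sum(x * x * sigmoid(b * x) * (1.0 - sigmoid(b * x)) for x in xs)
        step = grad / info
        b += step
        dev = deviance(b, xs, ys)
        # deviance rule: relative change in deviance below epsilon
        dev_conv = abs(dev - dev_old) / (abs(dev) + 0.1) < dev_eps
        # parameter rule: the coefficient has stopped moving
        par_conv = abs(step) < par_eps
        if dev_conv or par_conv:
            return b, it, step, dev_conv, par_conv
        dev_old = dev
    return b, max_iter, step, False, False

# hypothetical toy data with quasi-complete separation:
# x == 1 always coincides with y == 1
xs = [0, 0, 0, 0, 1]
ys = [1, 0, 1, 0, 1]

b, iters, last_step, dev_conv, par_conv = fit(xs, ys)
print(b, iters, last_step, dev_conv, par_conv)
```

On this toy set the deviance rule fires within a few dozen iterations while the coefficient is still growing by roughly one unit per step, so the parameter rule would never be satisfied; that is the disagreement described above, in miniature.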
I think both approaches are valid, although the R variant (while faster) is vulnerable to wrongly concluding convergence in edge cases such as the one presented above, resulting in wrong coefficient estimates. For people wanting to use logistic regression in a training/prediction setting, using these estimates might hurt their predictive performance.
The problem here is that the glm function does not return any warnings when the estimate for one of the covariates does not converge. Someone who is not paying attention may therefore conclude there is nothing wrong with their data. In my opinion, the default behavior in this case should be to conclude that the model did not converge, or at least to show a warning message.
Please let me know whether you believe this is an issue, and whether I can provide additional information.
With kind regards,
HarmJan Westra
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Allow me to chime in. That's an interesting case you present, but as far as
I'm concerned the algorithm did converge. The estimate of 9.25 has an
estimated standard error of 72.4, meaning that frequentists would claim the
true value lies anywhere between approximately -132 and 151 (a 95% CI), and hence the
estimate from the glm algorithm is perfectly compatible with the one from
the C code. And as the glm algorithm uses a different convergence rule,
the algorithm rightfully reported that it converged. Just because another
algorithm based on another rule doesn't converge doesn't mean the one glm uses
didn't.
On top of that: in both cases the huge standard error on that estimate
clearly tells you that the estimate should not be trusted and that the fit is
unstable. That's to be expected, given the extreme imbalance in your data,
especially for the 13th column. If my students incorporated that
variable in a generalized linear model and tried to formulate a conclusion
based on that coefficient, they would fail the exam. So if somebody does this
analysis and tries to draw any conclusion whatsoever from that estimate,
maybe they should leave the analysis to somebody who does know what they're
doing.
Cheers
Joris

Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and BioInformatics
tel : +32 (0)9 264 61 79
[hidden email]

Disclaimer : http://helpdesk.ugent.be/emaildisclaimer.php


Dear Joris,
I agree that such a covariate should not be used in the analysis, and fully agree with your assessment. However, your response assumes that everybody who uses R knows what they're doing, which is a dangerous assumption to make. I bet there are a lot of people who blindly trust the output from R, even when there's clearly something wrong with the estimates.
In terms of your conclusion that the C estimate corresponds to a value within the confidence interval estimated by R: if I allow the C code to run for 1000 iterations, its estimate will be around 1000. It simply never converges.
I think there's nothing wrong with letting the user know there might be something wrong with one of the estimates, especially if your code can easily figure it out for you, by adding an additional rule. Not everyone is always paying attention (even if they know what they're doing).
With kind regards,
HarmJan
________________________________
From: Joris Meys < [hidden email]>
Sent: Thursday, July 20, 2017 11:38 AM
To: HarmJan Westra
Cc: [hidden email]
Subject: Re: [Rd] Wrongly converging glm()


In defence of HarmJan's original post, I would say that there is a difference between true convergence and satisfying a convergence criterion.
In my view the algorithm has not converged. This is a case of quasi-complete separation: there are both successes and failures when x13=0, but only failures when x13=1. As a result, the likelihood has no maximum; it increases, albeit slightly, as the associated coefficient tends to infinity while maximizing over the other parameters. The estimate given is not the MLE, and the standard error is not meaningful because the conditions for convergence of MLEs to their asymptotic normal distribution have been violated.
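That the likelihood keeps increasing toward a finite supremum can be checked numerically. A small sketch in Python on a hypothetical quasi-separated toy set (one-parameter model, not the thread's actual data): the deviance keeps falling as the coefficient grows, but only toward a finite infimum, so no finite coefficient is the MLE.

```python
import math

def deviance(b, xs, ys):
    # -2 * log-likelihood for the one-parameter model p = 1/(1 + exp(-b*x))
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-b * x))
        ll += math.log(p if y == 1 else 1.0 - p)
    return -2.0 * ll

# hypothetical toy data: x == 1 only ever occurs together with y == 1
xs = [0, 0, 0, 0, 1]
ys = [1, 0, 1, 0, 1]

# strictly decreasing in b, approaching the infimum 8 * log(2)
# (the contribution of the four x == 0 observations alone)
devs = [deviance(b, xs, ys) for b in (0, 5, 10, 20)]
print(devs)
```

Since the infimum is never attained, any deviance-change threshold is eventually met at some finite coefficient, while the true maximizer does not exist.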
I agree with Joris that someone familiar with logistic regression should be able to identify this situation, though the solution is not as simple as throwing out the covariate. Suppose that there had been many failures when x13=1, not just one. The same problem would arise, but x13 would clearly be an important covariate. Removing it from the analysis is not the right thing to do. A better solution is to penalize the likelihood or (and I'm showing my true colours here) to conduct a Bayesian analysis.
Regarding the statement that the algorithm has converged, perhaps R should say, more truthfully, that the convergence criterion has been satisfied, but that might lead to more confusion. In this case, any convergence criterion will be satisfied eventually. If you increase the maximum number of iterations in the C implementation then the other convergence criterion will be satisfied and the code will say that the algorithm has converged.
In the end, it's up to the analyst to be aware of the pitfalls and how to address them.
Cheers,
Simon
> -----Original Message-----
> From: R-devel [mailto: [hidden email]] On Behalf Of Joris Meys
> Sent: July 20, 2017 11:39 AM
> To: HarmJan Westra < [hidden email]>
> Cc: [hidden email]
> Subject: Re: [Rd] Wrongly converging glm()


Dear Simon,
Thanks for your response. I have a suggestion that could be non-intrusive but still provide some additional info to the user.
The glm function already checks for collinearity of the input, and you can easily check which covariate was aliased as a result, using summary(model)$aliased.
My suggestion would therefore be to add an additional variable to the model summary, e.g. summary(model)$estimateconverged, which states whether the MLE has converged for that particular covariate.
This would give users a way to perform sanity checks on the models they fit: sometimes you're running hundreds or millions of models, making it infeasible to check every single one of them. I agree that investigating the estimates plus standard errors would be a solution, but then again, the estimate that R produces for such a covariate might as well be random.
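For batch runs, a check of this sort can also be scripted over the stored estimates and standard errors today. A minimal sketch in Python (the function name and the cutoff are hypothetical illustrations, nothing glm() actually computes):

```python
def flag_suspect_coefficients(std_errors, se_cutoff=5.0):
    """Return a flag per coefficient: True when its standard error on the
    logit scale is implausibly large.

    A standard error of 5 on the logit scale already spans odds ratios
    across more than eight orders of magnitude within a 95% CI, so anything
    larger is a red flag. The cutoff is an arbitrary heuristic, not an R
    default.
    """
    return [se > se_cutoff for se in std_errors]

# standard errors mimicking the fit in this thread; the last one plays
# the role of data[,13]'s 72.4
ses = [0.21, 0.48, 72.4]
print(flag_suspect_coefficients(ses))
```

In a pipeline over millions of models, collecting these flags alongside model$converged would catch exactly the silent case described above, without inspecting each summary by hand.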
With kind regards,
HarmJan
________________________________
From: Simon Bonner < [hidden email]>
Sent: Thursday, July 20, 2017 12:32 PM
To: Joris Meys; HarmJan Westra
Cc: [hidden email]
Subject: RE: [Rd] Wrongly converging glm()


On Thu, Jul 20, 2017 at 6:21 PM, HarmJan Westra < [hidden email]> wrote:
> Dear Joris,
>
>
> I agree that such a covariate should not be used in the analysis, and
> fully agree with your assessment. However, your response assumes that
> everybody who uses R knows what they're doing, which is a dangerous
> assumption to make. I bet there are a lot of people who blindly trust the
> output from R, even when there's clearly something wrong with the estimates.
>
You missed my point then. I don't assume that everybody who uses R knows
what they're doing. Actually, I know for a fact that quite a few people using R
have absolutely no clue what they are doing. My point is that
everybody using R should first make the effort to learn what they're
doing. And if they don't, one shouldn't blame R. There are a million
different cases where both algorithms would converge and the resulting
estimates would be totally meaningless regardless. R cannot be held responsible
for that.
>
>
> In terms of your conclusion that the C++ estimate corresponds to a value
> within the R estimated confidence interval: if I allow the C++ code to run
> for 1000 iterations, it's estimate would be around 1000. It simply never
> converges.
>
I didn't test that far, and you're right in the sense that 100 is indeed
not the final estimate. After looking at the C code, it appears that its
author combines a Newton-Raphson approach with a different
convergence rule, and then it's quite understandable that it doesn't converge.
You can wildly vary that estimate; the effect it has on the Jacobian, log-likelihood
or deviance will be insignificant. So the model won't improve,
it will just move all over the parameter space.
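This flatness is easy to verify numerically on a hypothetical quasi-separated toy set (Python sketch, one-parameter model, not the thread's actual data): moving the runaway coefficient by several whole units changes the deviance by a negligible amount, so no deviance-based rule can tell the candidates apart.

```python
import math

def deviance(b, xs, ys):
    # -2 * log-likelihood for the one-parameter model p = 1/(1 + exp(-b*x))
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-b * x))
        ll += math.log(p if y == 1 else 1.0 - p)
    return -2.0 * ll

# hypothetical toy data: x == 1 only ever occurs together with y == 1
xs = [0, 0, 0, 0, 1]
ys = [1, 0, 1, 0, 1]

# the likelihood surface is essentially flat between b = 18 and b = 25
gap = abs(deviance(18.0, xs, ys) - deviance(25.0, xs, ys))
print(gap)
```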
>
>
> I think there's nothing wrong with letting the user know there might be
> something wrong with one of the estimates, especially if your code can
> easily figure it out for you, by adding an additional rule. Not everyone is
> always paying attention (even if they know what they're doing).
>
If R did that, it wouldn't start the fitting procedure but would just return
an error: "Your analysis died due to a lack of usable data." Because
that's the problem here.
>
>
> With kind regards,
>
>
> HarmJan
>
>
> ________________________________
> From: Joris Meys < [hidden email]>
> Sent: Thursday, July 20, 2017 11:38 AM
> To: HarmJan Westra
> Cc: [hidden email]
> Subject: Re: [Rd] Wrongly converging glm()
>
> Allow me to chime in. That's an interesting case you present, but as far
> as I'm concerned the algorithm did converge. The estimate of 9.25 has an
> estimated standard error of 72.4, meaning that frequentists would claim the
> true value would lie anywhere between appx. 151 and 132 (CI) and hence the
> estimate from the glm algorithm is perfectly compatible with the one from
> the C++ code. And as the glm algorithm uses a different convergence rule,
> the algorithm rightfully reported it converged. It's not because another
> algorithm based on another rule doesn't converge, that the one glm uses
> didn't.
>
> On top of that: in both cases the huge standard error on that estimate
> clearly tells you that the estimate should not be trusted, and the fit is
> unstable. That's to be expected, given the insane imbalance in your data,
> especially for the 13th column. If my students incorporated that
> variable in a generalized linear model and tried to formulate a conclusion
> based on that coefficient, they would fail the exam. So if somebody does this
> analysis and tries to draw any conclusion whatsoever from that estimate,
> maybe they should leave the analysis to somebody who does know what they're
> doing.
>
> Cheers
> Joris
>
> On Thu, Jul 20, 2017 at 5:02 PM, HarmJan Westra <[hidden email]> wrote:
> Dear Rcore,
>
>
> I have found an edge case where the glm function falsely concludes that
> the model has converged. The issue is the following: my data contains a
> number of covariates, and one of these covariates has a very small variance.
> For most of the rows of this covariate, the value is 0, except for one of
> the rows, where it is 1.
>
>
> The glm function correctly determines the beta and standard error
> estimates for all other covariates.
>
>
> I've placed the data here: http://www.harmjanwestra.nl/rtestdata.txt
>
> The model I'm using is very simple:
>
>
> data <- read.table("rtestdata.txt")
>
> model <- glm(data[,1] ~ data[,2] + data[,3] + data[,4] + data[,5] +
> data[,6] + data[,7] + data[,8] + data[,9] + data[,10] + data[,11] +
> data[,12] + data[,13] + data[,14], family=binomial("logit"))
>
> summary(model)
>
>
> You will see that for covariate data[,13], the beta/coefficient estimate
> is around 9. The number of iterations performed is 8, and
> model$converged returns TRUE.
>
>
> I've used some alternate logistic regression code in C (
> https://github.com/czep/mlelr/blob/master/src/mlelr.c), which produces
> identical estimates for the other covariates and comparable deviance
> values. However, using this C code, I'm seeing that the estimate for
> data[,13] is around 100 (since I'm allowing a maximum of 100 MLE
> iterations). There, the conclusion is that the model does not converge.
>
>
> The difference between the two pieces of code is that in R, the glm()
> function determines convergence of the whole model by measuring the
> difference between the deviance of the current iteration and the deviance
> of the prior iteration, and calls the model converged when that difference
> drops below a certain epsilon value. In the C code, the model has converged
> only when none of the parameters has changed markedly compared to the
> previous iteration.
>
>
> I think both approaches are valid, although the R variant (while faster)
> makes it vulnerable to wrongly concluding convergence in edge cases such as
> the one presented above, resulting in wrong coefficient estimates. For
> people wanting to use logistic regression in a training/prediction kind of
> setting, using these estimates might influence their predictive performance.
>
>
> The problem here is that the glm function does not return any warnings
> when one of the coefficient estimates does not converge. For someone who
> is not paying attention, this may lead them to conclude there is nothing
> wrong with their data. In my opinion, the default behavior in this case
> should therefore be to conclude that the model did not converge, or at
> least to show a warning message.
>
>
> Please let me know whether you believe this is an issue, and whether I can
> provide additional information.
>
>
> With kind regards,
>
>
> HarmJan Westra
>
>
>
>
>
>
>
>
>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> 
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and BioInformatics
>
> tel : +32 (0)9 264 61 79
> [hidden email]
> 
> Disclaimer : http://helpdesk.ugent.be/emaildisclaimer.php
>

Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and BioInformatics
tel : +32 (0)9 264 61 79
[hidden email]

Disclaimer : http://helpdesk.ugent.be/emaildisclaimer.php


My apologies if I seemed to 'blame R'. That was in no way my intention. I get the feeling that you're missing my point as well.
I observed something that I thought was confusing when comparing two more or less identical methods (while validating the C code), and wanted to make a suggestion as to how to help future R users. Note that I already acknowledged that my data was bad. Note that I also mentioned that the way R determines convergence is a valid approach.
What strikes me as odd is that R will warn you when your data is faulty for a function such as cor(), but not for glm(). I don't see why you wouldn't want to check both convergence criteria if you know multiple such criteria exist. It would make the software more user-friendly in the end.
It may be true that there are millions of edge cases causing issues with glm(), as you say, but here I am presenting an edge case that can be easily detected, by checking whether the difference in beta estimates between the current and previous iteration is bigger than a certain epsilon value.
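To make that concrete, the sketch below contrasts the two rules on a made-up miniature of the data in this thread (a covariate that is 0 everywhere except one row, whose outcome is 1). It is written in Python rather than R, and the deviance rule is modelled on glm.fit's relative-deviance test; everything here is illustrative, not R's actual implementation.

```python
import math

# Miniature of the edge case: x is 0 in every row except one, and that
# single x = 1 row has y = 1, so the MLE of its coefficient is +infinity
# (quasi-complete separation).
y = [1, 0] * 9 + [1, 1]            # 20 observations
x = [0.0] * 19 + [1.0]

def fit(max_iter=50, dev_eps=1e-8, par_eps=1e-6):
    """Newton-Raphson for logit(p) = b0 + b1*x, reporting which of the
    two convergence rules discussed in the thread fires first."""
    b0 = b1 = 0.0
    dev_old = float("inf")
    for it in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = dev = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += yi - p               # score vector X'(y - p)
            g1 += (yi - p) * xi
            h00 += w                   # information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
            dev += -2.0 * (yi * eta - math.log1p(math.exp(eta)))
        det = h00 * h11 - h01 * h01    # solve (X'WX) delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        dev_rule = abs(dev - dev_old) / (abs(dev) + 0.1) < dev_eps
        par_rule = max(abs(d0), abs(d1)) < par_eps
        if dev_rule or par_rule:
            return it, b1, dev_rule, par_rule
        dev_old = dev
    return max_iter, b1, False, False

it, b1, dev_rule, par_rule = fit()
print(it, round(b1, 1), dev_rule, par_rule)
# The deviance rule declares convergence after a couple of dozen
# iterations, while b1 is still marching toward infinity by roughly 1
# per step and the parameter-change rule is nowhere near satisfied.
```

This is the whole disagreement in one picture: the separated coefficient's contribution to the deviance decays exponentially as the coefficient grows, so a deviance-based rule stops while a parameter-based rule keeps going forever.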
I agree 'that everybody using R should first do the effort of learning what they're doing', but it is a bit of a non-argument, because we all know the world just doesn't work that way; plus, this is one of the arguments that held back, for example, the Linux community for quite a while (i.e. let's not make the software more user-friendly because the user should be more knowledgeable).
HarmJan
From: Joris Meys <[hidden email]>
Sent: Thursday, July 20, 2017 13:16
To: HarmJan Westra <[hidden email]>
Cc: [hidden email]
Subject: Re: [Rd] Wrongly converging glm()
On Thu, Jul 20, 2017 at 6:21 PM, HarmJan Westra <[hidden email]> wrote:
Dear Joris,
I agree that such a covariate should not be used in the analysis, and fully agree with your assessment. However, your response assumes that everybody who uses R knows what they're doing, which is a dangerous assumption to make. I bet there are a lot of people who blindly trust the output from R, even when there's clearly something wrong with the estimates.
You missed my point then. I don't assume that everybody who uses R knows what they're doing. Actually, I know for a fact that quite a few people using R have absolutely no clue what they are doing. My point is that everybody using R should first do the effort of learning what they're doing. And if they don't, one shouldn't blame R. There are a million different cases where both algorithms would converge and the resulting estimates would be totally meaningless regardless. R cannot be held responsible for that.
In terms of your conclusion that the C estimate corresponds to a value within the R-estimated confidence interval: if I allow the C code to run for 1000 iterations, its estimate would be around 1000. It simply never converges.
I didn't test that far, and you're right in the sense that 100 is indeed not the final estimate. After looking at the C code, it appears that the author combines a Newton-Raphson approach with a different convergence rule. And then it's quite understandable that it doesn't converge. You can wildly vary that estimate; the effect it has on the Jacobian, log-likelihood or deviance will be negligible. So the model won't improve; it will just wander all over the parameter space.
I think there's nothing wrong with letting the user know there might be something wrong with one of the estimates, especially if your code can easily figure it out for you, by adding an additional rule. Not everyone is always paying attention (even if they know what they're doing).
If R did that, it wouldn't start the fitting procedure but would just return an error: "Your analysis died due to a lack of usable data." Because that's the problem here.
With kind regards,
HarmJan


Hi HarmJan. I've been following this thread to some degree and just want
to add that this issue is not specific to the GLM. It's a problem with
optimization of functions in general. I was using Rvmmin with constraints,
which is an extremely solid optimization package written by John Nash
(it uses a modified BFGS algorithm), and it took me two years to realize
that, although my optimization generally converged, there was an
identifiability issue with my model that basically meant that the results
meant nothing. I only eventually found this out because, in the
econometrics literature, the type of economic model I was estimating
(rational expectations) is known to have an identifiability issue. I guess
if I were an economics expert, I might have been able to know this but, in
general, I think what you are asking optimization code to do is EXTREMELY
DIFFICULT.
John Nash can say more because he's THE optimization masteR, but it's much
more difficult to write optimization algorithms with convergence rules
that are able to identify when mathematical convergence (norm near zero,
say) is not necessarily model convergence. That I can tell you from
experience!
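Mark's distinction between mathematical and model convergence can be shown on a toy objective (a hypothetical example, nothing to do with Rvmmin or rational-expectations models): a "model" whose two parameters enter only through their product. Plain gradient descent satisfies a gradient-norm stopping rule from every starting point, yet the individual parameter estimates depend entirely on where you started.

```python
# Non-identifiable toy "model": the fit depends on a and b only through
# their product, so every point on the curve a*b = 2 minimizes the loss.
def loss(a, b):
    return (a * b - 2.0) ** 2

def grad(a, b):
    r = 2.0 * (a * b - 2.0)
    return r * b, r * a

def descend(a, b, lr=0.05, tol=1e-10, max_iter=10000):
    """Gradient descent with a gradient-norm stopping rule --
    'mathematical' convergence in Mark's sense."""
    for _ in range(max_iter):
        ga, gb = grad(a, b)
        if ga * ga + gb * gb < tol:    # norm near zero: declare converged
            break
        a, b = a - lr * ga, b - lr * gb
    return a, b

fit1 = descend(1.0, 1.0)
fit2 = descend(4.0, 0.25)
# Both runs stop with a near-zero gradient and a near-perfect fit
# (a*b close to 2), but the "estimates" of a and b differ wildly.
print(fit1, fit2)
```

No convergence rule evaluated at a single point can distinguish these two answers; the pathology lives in the model, which is Mark's point.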
On Thu, Jul 20, 2017 at 2:32 PM, HarmJan Westra < [hidden email]
> wrote:


Dear Mark,
I agree that convergence is a problem that applies to optimization in general, where the function you're trying to optimize may have more than one local minimum. In your case, you probably would have had to try different starting points for the MLE procedure. This should not be the case for logistic regression, however (unless, as in my data, you have something that defies your model assumptions; check Simon Bonner's response).
Still, I would think it a bit odd if the deviance didn't change but one of the model parameters did after the next MLE iteration. It would tell me that this parameter doesn't add to the model fit, which in my opinion would be useful debugging information, even if I were hitting a local minimum (it could even help inform me that there is another, more optimal solution?). Probably I should try to figure out whether this observation also holds for other models/link functions (I honestly don't know).
However, thanks to your response, I can see that my suggestion is probably not applicable to all glm link functions, and I see how implementation of my proposed 'warning system' could be confusing to the user. Thanks a lot!
With kind regards,
HarmJan
From: Mark Leeds <[hidden email]>
Sent: Thursday, July 20, 2017 14:54
To: HarmJan Westra <[hidden email]>
Cc: Joris Meys <[hidden email]>; [hidden email]
Subject: Re: [Rd] Wrongly converging glm()
Hi HarmJan. I've been following this thread to some degree and just want to add that
this issue is not specific to the GLM. It's a problem with optimization of functions in general. I was using use Rvmmin with constraints which is an extremely solid optimization package written by John Nash ( uses a modified BFGS algorithm) and it took me two years to realize that, although my optimization generally converged, there was an idenitifiability issue with my model that basically meant that the results meant nothing. I only eventually found this out because, in the econometrics literature, the type of economic model I was estimating ( rational expectations ) is known to have an identifiability issue. I guess if I was an economics expert, I may have been able to know this but, in general, I think what you are asking
optimization code to do is EXTREMELY DIFFICULT.
John Nash can say more because he's THE optimization masteR but it's much more difficult to write optimization algorithms with convergence rules that are able to identify when mathematical convergence ( norm near zero say ) is not necessarily model convergence. That I can tell you from experience !!!!!!!
On Thu, Jul 20, 2017 at 2:32 PM, HarmJan Westra < [hidden email]<mailto: [hidden email]>> wrote:
My apologies if I seemed to �blame R�. This was in no way my intention. I get the feeling that you�re missing my point as well.
I observed something that I thought was confusing, when comparing two more or less identical methods (when validating the C code), and wanted to make a suggestion as to how to help future R users. Note that I already acknowledged that my data was bad. Note that I also mention that the way R determines convergence is a valid approach.
What strikes me as odd is that R would warn you when your data is faulty for a function such as cor(), but not for glm(). I don�t see why you wouldn�t want to check both convergence criteria if you know multiple of such criteria exist. It would make the software more user friendly in the end.
It may be true that there are millions of edge cases causing issues with glm(), as you say, but here I am presenting an edge case that can be easily detected, by checking whether the difference in beta estimates between the current and previous iteration is bigger than a certain epsilon value.
I agree �that everybody using R should first do the effort of learning what they're doing�, but it is a bit of a nonargument, because we all know that, the world just doesn�t work that way, plus this is one of the arguments that has held for example the Linux community back for quite a while (i.e. let�s not make the software more user friendly because the user should be more knowledgeable).
HarmJan
From: Joris Meys<mailto: [hidden email]<mailto: [hidden email]>>
Sent: Thursday, July 20, 2017 13:16
To: HarmJan Westra<mailto: [hidden email]<mailto: [hidden email]>>
Cc: [hidden email]<mailto: [hidden email]><mailto: [hidden email]<mailto: [hidden email]>>
Subject: Re: [Rd] Wrongly converging glm()
On Thu, Jul 20, 2017 at 6:21 PM, HarmJan Westra < [hidden email]<mailto: [hidden email]><mailto: [hidden email]<mailto: [hidden email]>>> wrote:
Dear Joris,
I agree that such a covariate should not be used in the analysis, and fully agree with your assessment. However, your response assumes that everybody who uses R knows what they're doing, which is a dangerous assumption to make. I bet there are a lot of people who blindly trust the output from R, even when there's clearly something wrong with the estimates.
You missed my point then. I don't assume that everybody who uses R knows what they're doing. Actually, I know for a fact quite a few people using R have absolutely no clue about what they are doing. My point is that everybody using R should first do the effort of learning what they're doing. And if they don't, one shouldn't blame R. There's a million different cases where both algorithms would converge and the resulting estimates are totally meaningless regardless. R cannot be held responsible for that.
In terms of your conclusion that the C++ estimate corresponds to a value within the R estimated confidence interval: if I allow the C++ code to run for 1000 iterations, it's estimate would be around 1000. It simply never converges.
I didn't test that far, and you're right in the sense that 100 is indeed not the final estimate. After looking at the C code, it appears as if the author of that code combines a NewtonRaphson approach with a different convergence rule. And then it's quite understandible it doesn't converge. You can wildly vary that estimate, the effect it has on the jacobian, log likelihood or deviance will be insignificant. So the model won't improve, it would just move all over the parameter space.
I think there's nothing wrong with letting the user know there might be something wrong with one of the estimates, especially if your code can easily figure it out for you, by adding an additional rule. Not everyone is always paying attention (even if they know what they're doing).
If R would do that, it wouldn't start the fitting procedure but just return an error "Your analysis died due to a lack of useable data." . Because that's the problem here.
With kind regards,
HarmJan
________________________________
From: Joris Meys < [hidden email]<mailto: [hidden email]><mailto: [hidden email]<mailto: [hidden email]>>>
Sent: Thursday, July 20, 2017 11:38 AM
To: HarmJan Westra
Cc: [hidden email]<mailto: [hidden email]><mailto: [hidden email]<mailto: [hidden email]>>
Subject: Re: [Rd] Wrongly converging glm()
Allow me to chime in. That's an interesting case you present, but as far as I'm concerned the algorithm did converge. The estimate of 9.25 has an estimated standard error of 72.4, meaning that frequentists would claim the true value would lie anywhere between appx. 151 and 132 (CI) and hence the estimate from the glm algorithm is perfectly compatible with the one from the C++ code. And as the glm algorithm uses a different convergence rule, the algorithm rightfully reported it converged. It's not because another algorithm based on another rule doesn't converge, that the one glm uses didn't.
On top of that: in both cases the huge standard error on that estimate clearly tells you that the estimate should not be trusted, and the fit is unstable. That's to be expected, given the extreme imbalance in your data, especially for the 13th column. If my students incorporated that variable in a generalized linear model and tried to formulate a conclusion based on that coefficient, they would fail the exam. So if somebody does this analysis and tries to draw any conclusion whatsoever from that estimate, maybe they should leave the analysis to somebody who does know what they're doing.
Cheers
Joris
On Thu, Jul 20, 2017 at 5:02 PM, HarmJan Westra <[hidden email]> wrote:
Dear Rcore,
I have found an edgecase where the glm function falsely concludes that the model has converged. The issue is the following: my data contains a number of covariates, one of these covariates has a very small variance. For most of the rows of this covariate, the value is 0, except for one of the rows, where it is 1.
The glm function correctly determines the beta and standard error estimates for all other covariates.
I've placed the data here: http://www.harmjanwestra.nl/rtestdata.txt
The model I'm using is very simple:
data <- read.table("rtestdata.txt")
model <- glm(data[,1] ~ data[,2] + data[,3] + data[,4] + data[,5] + data[,6] + data[,7] + data[,8] + data[,9] + data[,10] + data[,11] + data[,12] + data[,13] + data[,14], family=binomial("logit"))
summary(model)
You will see that for covariate data[,13], the beta/coefficient estimate is around 9. The number of iterations performed is 8, and model$converged returns TRUE.
I've used some alternate logistic regression code in C ( https://github.com/czep/mlelr/blob/master/src/mlelr.c), which produces identical estimates for the other covariates and comparable deviance values. However, using this C code, I'm seeing that the estimate for data[,13] is around 100 (since I'm allowing a maximum of 100 MLE iterations). There, the conclusion is that the model does not converge.
The difference between the two pieces of code is that in R, the glm() function determines convergence of the whole model by measuring the difference between the deviance of the current iteration and the deviance of the prior iteration, and declares the model converged when that difference drops below a certain epsilon. In the C code, the model is considered converged only when none of the parameters has changed markedly compared to the previous iteration.
I think both approaches are valid, although the R variant (while faster) makes it vulnerable to wrongly concluding convergence in edge cases such as the one presented above, resulting in wrong coefficient estimates. For people wanting to use logistic regression in a training/prediction kind of setting, using these estimates might influence their predictive performance.
The problem here is that the glm function does not return any warnings when one of the covariates in the model does not converge. For someone who is not paying attention, this may lead them to conclude there is nothing wrong with their data. In my opinion, the default behavior in this case should therefore be to conclude that the model did not converge, or at least to show a warning message.
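The behaviour described above can be reproduced on simulated data (a sketch; this is not the posted rtestdata.txt): cap the number of IRLS iterations and compare what the two convergence rules each look at.

```r
# Simulated stand-in for the posted data: x2 is 1 in exactly one row,
# and that row has y = 1, so the MLE for its coefficient is +Inf
# (quasi-complete separation).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- c(1, rep(0, n - 1))
y  <- rbinom(n, 1, plogis(x1)); y[1] <- 1

# force extra iterations with a tiny epsilon and vary the iteration cap
fit_k <- function(k) suppressWarnings(
  glm(y ~ x1 + x2, family = binomial,
      control = glm.control(epsilon = 1e-14, maxit = k)))
f10 <- fit_k(10); f25 <- fit_k(25)

abs(deviance(f25) - deviance(f10))          # tiny: the deviance rule is satisfied
abs(coef(f25)[["x2"]] - coef(f10)[["x2"]])  # large: the coefficient is still growing
```

With the default epsilon the deviance rule fires after a handful of iterations and model$converged is TRUE, even though the x2 coefficient would keep marching off for as long as you let it.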
Please let me know whether you believe this is an issue, and whether I can provide additional information.
With kind regards,
HarmJan Westra
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and BioInformatics
tel : +32 (0)9 264 61 79
[hidden email]

Disclaimer : http://helpdesk.ugent.be/emaildisclaimer.php


On Thu, Jul 20, 2017 at 8:32 PM, HarmJan Westra <[hidden email]> wrote:
> My apologies if I seemed to ‘blame R’. This was in no way my intention. I
> get the feeling that you’re missing my point as well.
>
I get that now. But you're on R-devel and you started with the claim that R "falsely reports...". That looks like a bug report, and that's why I initially answered that R correctly reports it converged. Maybe to the wrong value, but it converged.
>
>
> What strikes me as odd is that R would warn you when your data is faulty
> for a function such as cor(), but not for glm(). I don’t see why you
> wouldn’t want to check both convergence criteria if you know multiple
> such criteria exist. It would make the software more user friendly in
> the end.
>
The unfitness of the data bears no relation to the convergence criterion and vice versa. These data checks should be done before the convergence algorithm is even started, and as Mark Leeds also indicated, that's one hell of a job to do. That said, the glm function has an argument "method" by which you can provide an alternative version of glm.fit(). Adapting that one to use another convergence criterion is rather trivial, so technically R even allows you to do that out of the box. No patches needed.
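Short of writing a full glm.fit() replacement, a cheap after-the-fact check in the same spirit can be sketched as follows (illustrative helper, not an existing API): restart glm from its own "converged" coefficients for a single extra IRLS step and see whether they move.

```r
# One extra IRLS step from the reported solution: if the fit really
# converged, the coefficients barely move; under separation the
# offending coefficient keeps marching.
one_more_step <- function(fit) {
  refit <- suppressWarnings(
    update(fit, start = coef(fit), control = glm.control(maxit = 1)))
  max(abs(coef(refit) - coef(fit)))
}

set.seed(2)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(x))
good <- glm(y ~ x, family = binomial)
one_more_step(good)          # essentially zero

z  <- c(1, rep(0, 99))
y2 <- y; y2[1] <- 1          # z quasi-separates y2
bad <- suppressWarnings(glm(y2 ~ x + z, family = binomial))
one_more_step(bad)           # of order 1: the fit had not really settled
```

This is essentially the parameter-change criterion from the C code, applied once as a post-hoc diagnostic rather than inside the fitting loop.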
>
>
> I agree ‘that everybody using R should first do the effort of learning
> what they're doing’, but it is a bit of a non-argument, because we all
> know that the world just doesn’t work that way, plus this is one of the
> arguments that has held back, for example, the Linux community for quite
> a while (i.e. let’s not make the software more user friendly because the
> user should be more knowledgeable).
>
That's a poor analogy, imho. You can expect Linux to be user friendly, but not "I will detect every logical fallacy in the article you're writing in this text editor" friendly. And honestly, that's a bit what you're asking R to do here. I understand why, but there will always be cases that are missed.
And I wouldn't dare to speak in the name of the R core team, but I can
imagine they have a little more urgent issues than helping my students to
pass their statistics course ;)
Cheers
Joris



Dear Joris,
I’ll be more careful in my wording next time; thanks for the pointer, and thanks for the discussion. This whole process has been quite educational! 😉 I think we’ve reached a consensus here: the current behaviour was chosen to allow for flexibility in R’s glm() function.
With kind regards,
HarmJan
From: Joris Meys <[hidden email]>
Sent: Thursday, July 20, 2017 16:06
To: HarmJan Westra <[hidden email]>
Cc: [hidden email]
Subject: Re: [Rd] Wrongly converging glm()


I'm chiming in late since I read the news in digest form, and I won't copy the entire
conversation to date.
The issue raised comes up quite often in Cox models, so often that the Therneau and Grambsch book has a section on the issue (3.5, p 58). After a few initial iterations the offending coefficient will increase by a constant at each iteration while the log-likelihood approaches an asymptote (essentially once the other coefficients "settle down").
The coxph routine tries to detect this case and print a warning, and this turns out to be
very hard to do accurately. I worked hard at tuning the threshold(s) for the message
several years ago and finally gave up; I am guessing that the warning misses > 5% of the
cases when the issue is true, and that 5% of the warnings that do print are incorrect.
(And these estimates may be too optimistic.) Highly correlated predictors tend to trip
it up, e.g., the truncated power spline basis used by the rcs function in Hmisc.
All in all, I am not completely sure whether the message does more harm than good. I'd be
quite reluctant to go down the same path again with the glm function.
Terry Therneau


Please allow me to add my 3 cents. Stopping an iterative optimization algorithm at an "appropriate" juncture is very tricky. All one can say is that the algorithm terminated because it triggered a particular stopping criterion. Good software will tell you why it stopped, i.e., which stopping criterion was triggered. It is extremely difficult to make a fail-safe guarantee that the triggered stopping criterion is the correct one and that the answer obtained is trustworthy. It is up to the user to determine whether the answer makes sense. In the case of maximizing a likelihood function, it is perfectly reasonable to stop when the algorithm has not made any progress in increasing the log-likelihood. In this case, the software should print out something like "algorithm terminated due to lack of improvement in log-likelihood." Therefore, I don't see a need to issue any warning, but simply to report the stopping criterion that was applied to terminate the algorithm.
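Ravi's suggestion, sketched as a toy one-parameter Newton maximiser that records which stopping rule fired (all names here are illustrative, not any existing R API):

```r
# Toy Newton-Raphson maximiser that reports *which* stopping rule fired.
maximise <- function(loglik, grad, hess, par,
                     maxit = 100, tol_f = 1e-8, tol_g = 1e-6) {
  reason <- "maxit reached"
  for (i in seq_len(maxit)) {
    new <- par - grad(par) / hess(par)
    if (abs(loglik(new) - loglik(par)) < tol_f) {
      par <- new
      reason <- "no improvement in log-likelihood"
      break
    }
    par <- new
    if (abs(grad(par)) < tol_g) { reason <- "gradient near zero"; break }
  }
  list(par = par, reason = reason, iter = i)
}

# Bernoulli log-likelihood on the logit scale: 7 successes in 10 trials
ll <- function(b) 7 * b - 10 * log1p(exp(b))
gr <- function(b) 7 - 10 * plogis(b)
he <- function(b) -10 * plogis(b) * (1 - plogis(b))

out <- maximise(ll, gr, he, par = 0)
out$reason        # names the rule that terminated the run
plogis(out$par)   # MLE of the success probability, 0.7
```

The point is the returned `reason`: the caller learns whether the run stopped for a good reason or simply hit the iteration cap, instead of a bare TRUE/FALSE converged flag.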
Best,
Ravi
-----Original Message-----
From: R-devel [mailto:[hidden email]] On Behalf Of Therneau, Terry M., Ph.D.
Sent: Friday, July 21, 2017 8:04 AM
To: [hidden email]; Mark Leeds <[hidden email]>; [hidden email]; [hidden email]
Subject: Re: [Rd] Wrongly converging glm()


Hi Ravi: Well said. In John's Rvmmin package, he has codes explaining the cause of the termination. The codes returned were fine. The problem was that the model I was using could have multiple solutions (regardless of the data sent in), so even though the stopping criteria were reached, it turned out that one of the parameters (there were two) could have been practically anything and the same likelihood value would be returned. So, what I learned the hard way was that termination due to reasonable stopping criteria DOES NOT NECESSARILY EQUAL OPTIMAL. But I lived in the dark about this for a long time and only happened to notice it when playing around with the likelihood by fixing the offending parameter to various values and optimizing over the non-offending parameter. Thanks for the eloquent explanation.
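Mark's diagnostic can be sketched in R for the logistic case (simulated data; the offset trick fixes the suspect coefficient while re-maximising over the remaining parameters):

```r
# Fix the suspect coefficient at several values via an offset and refit:
# a flat profile deviance means the data cannot pin that parameter down.
set.seed(3)
n <- 100
x <- rnorm(n)
z <- c(1, rep(0, n - 1))
y <- rbinom(n, 1, plogis(x)); y[1] <- 1   # z quasi-separates y

prof_dev <- sapply(c(10, 20, 30, 40), function(b)
  deviance(glm(y ~ x, offset = b * z, family = binomial)))
diff(prof_dev)   # differences shrink to nothing: the likelihood is flat in b
```

For a well-identified coefficient the same profile would curve sharply away from its minimum; here it goes flat, which is exactly the "parameter could have been anything" symptom.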
Mark
On Fri, Jul 21, 2017 at 9:22 AM, Ravi Varadhan <[hidden email]> wrote:


“So, what I learned the hard way was termination due to reasonable stopping criteria DOES NOT NECESSARILY EQUAL OPTIMAL.”
Yes, I agree, Mark.
Let me add another observation. In the “optimx” package, John Nash and I implemented a check for optimality conditions: first- and second-order KKT conditions. This involves checking whether the gradient is sufficiently small and the Hessian is positive definite (for a local minimum) at the final parameter values. However, it can be quite time consuming to compute these quantities, and in some problems checking the KKT conditions can take more effort than finding the solution! Furthermore, it is difficult to come up with good thresholds for determining a “small” gradient and a “positive definite” Hessian, since these can depend upon the scale of the objective function and the parameters.
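A numerical sketch of such a KKT check (finite differences; the thresholds and scaling below are illustrative, not optimx's actual ones):

```r
# First-order check: gradient near zero at the candidate minimum.
# Second-order check: finite-difference Hessian positive definite.
kkt_check <- function(fn, par, tol_g = 1e-4, eps = 1e-5) {
  p <- length(par)
  g <- numeric(p)
  H <- matrix(0, p, p)
  for (i in seq_len(p)) {
    ei <- replace(numeric(p), i, eps)
    g[i] <- (fn(par + ei) - fn(par - ei)) / (2 * eps)
    for (j in seq_len(p)) {
      ej <- replace(numeric(p), j, eps)
      H[i, j] <- (fn(par + ei + ej) - fn(par + ei - ej) -
                  fn(par - ei + ej) + fn(par - ei - ej)) / (4 * eps^2)
    }
  }
  scale <- 1 + abs(fn(par))   # crude scale-awareness, as discussed above
  ev <- eigen(0.5 * (H + t(H)), symmetric = TRUE, only.values = TRUE)$values
  list(kkt1 = max(abs(g)) < tol_g * scale, kkt2 = all(ev > 0))
}

fn <- function(b) (b[1] - 1)^2 + 2 * (b[2] + 3)^2   # simple quadratic bowl
kkt_check(fn, c(1, -3))        # both checks pass at the true minimum
kkt_check(fn, c(0, 0))$kkt1    # fails away from it
```

Note the 2p + 4p^2 function evaluations: for an expensive objective this check can indeed cost more than the optimisation itself, which is Ravi's point.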
Ravi
From: Mark Leeds [mailto:[hidden email]]
Sent: Friday, July 21, 2017 3:09 PM
To: Ravi Varadhan <[hidden email]>
Cc: Therneau, Terry M., Ph.D. <[hidden email]>; [hidden email]; [hidden email]; [hidden email]


Hi Ravi: Yes, IIRC that's EXACTLY what was going on in my case. Based on the codes from Rvmmin, the objective function wasn't changing much and I think the norm of the gradient was close to zero, so it was difficult for me to detect the issue. I found it when I happened to notice that the objective function seemed independent of one of the parameters.
On Fri, Jul 21, 2017 at 4:36 PM, Ravi Varadhan <[hidden email]> wrote:

