Hosmer-Lemeshow 'goodness of fit'

6 messages

Hosmer-Lemeshow 'goodness of fit'

I'm trying to do a Hosmer-Lemeshow 'goodness of fit' test on my logistic regression model. I found some code here: http://sas-and-r.blogspot.com/2010/09/example-87-hosmer-and-lemeshow-goodness.html

The R code above is a little complicated for me, and I'm having trouble with my answer:

Hosmer-Lemeshow: p = 0.6163585
le Cessie and Houwelingen test (Design library): p = 0.2843620

The above link indicated they should be approximately equal, which in my case they are not. Any suggestions, or is there a package function people would recommend in R for use with a logistic regression model?

Thanks in advance,
-Rob Schutt

--------------------------------
Robert Schutt, MD, MCS
Resident - Department of Internal Medicine
University of Virginia, Charlottesville, Virginia
------------------

########################################################
# Compute the Hosmer-Lemeshow 'goodness-of-fit' test
cd.full_model = glm(formula = Collaterals ~ CHF + Age + CABG
        + relevel(as.factor(num.obst.vessels), "one")
        + Current.smoker + DM + HTN + ace.inhibitor
        + MI, family = binomial(link = "logit"))

hosmerlem = function(y, yhat, g = 10) {
  cutyhat = cut(yhat,
     breaks = quantile(yhat, probs = seq(0, 1, 1/g)),
     include.lowest = TRUE)
  obs = xtabs(cbind(1 - y, y) ~ cutyhat)
  expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
  chisq = sum((obs - expect)^2 / expect)
  P = 1 - pchisq(chisq, g - 2)
  return(list(chisq = chisq, p.value = P))
}
hosmerlem(y = Collaterals, yhat = fitted(cd.full_model))

########################################################
# Compute the le Cessie and Houwelingen test
# (library(Design) must be loaded before lrm() is available)
library(Design)
f <- lrm(Collaterals ~ CHF + Age + CABG
        + relevel(as.factor(num.obst.vessels), "one")
        + Current.smoker + DM + HTN + ace.inhibitor
        + MI, x = TRUE, y = TRUE)
resid(f, 'gof')

Output:

> hosmerlem(y=Collaterals, yhat=fitted(cd.full_model))
$chisq
[1] 6.275889
$p.value
[1] 0.6163585
> resid(f, 'gof')
Sum of squared errors     Expected value|H0                    SD
          118.5308396           118.3771115             0.1435944
                    Z                     P
            1.0705717             0.2843620
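For readers who want to try the hosmerlem() function above, here is a self-contained sketch on simulated data, since the original variables (Collaterals, CHF, etc.) are not available. The simulated model and variable names are illustrative, not the poster's data.

```r
# Simulated illustration of the hosmerlem() Hosmer-Lemeshow function.
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))   # data truly follow a logistic model
fit <- glm(y ~ x, family = binomial(link = "logit"))

hosmerlem <- function(y, yhat, g = 10) {
  # Bin fitted probabilities into g groups by quantiles of yhat
  cutyhat <- cut(yhat,
                 breaks = quantile(yhat, probs = seq(0, 1, 1/g)),
                 include.lowest = TRUE)
  obs    <- xtabs(cbind(1 - y,    y)    ~ cutyhat)  # observed counts per bin
  expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)  # expected counts per bin
  chisq  <- sum((obs - expect)^2 / expect)
  p      <- 1 - pchisq(chisq, g - 2)                # chi-square with g - 2 df
  list(chisq = chisq, p.value = p)
}

res <- hosmerlem(y = y, yhat = fitted(fit))
res
```

Because the simulated data really do come from a logistic model, the p-value here should usually be unremarkable; the point is only to show the mechanics of the binning.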

Re: Hosmer-Lemeshow 'goodness of fit'

Please read the documentation carefully, and replace the Design package with the newer rms package. The older Hosmer-Lemeshow test requires binning and has lower power. It also does not penalize for overfitting. The newer goodness-of-fit test in rms/Design should not agree with Hosmer-Lemeshow.

Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University
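For reference, a minimal sketch of the rms workflow Frank describes, on simulated data (the original Collaterals model is not reproducible here). It assumes the rms package is installed and skips quietly if it is not; as in the original post, lrm() needs x=TRUE, y=TRUE so resid() can compute the goodness-of-fit test.

```r
# Sketch: the rms replacement for the old Design workflow (assumption:
# rms is installed; the block is a no-op otherwise).
ok <- requireNamespace("rms", quietly = TRUE)
if (ok) {
  set.seed(1)
  n <- 300
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(x))
  f <- rms::lrm(y ~ x, x = TRUE, y = TRUE)
  gof <- resid(f, "gof")   # le Cessie - van Houwelingen unweighted SSE test
  print(gof)
}
```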

Re: Hosmer-Lemeshow 'goodness of fit'

Again, thanks so much for your help. Is there a "for dummies" version for interpreting the le Cessie and Houwelingen test? I read the 1991 Biometrics paper but honestly got lost in the math.

Is it interpreted the same way the Hosmer-Lemeshow test is, i.e., a non-significant result means the model fits well? What would my p-value of p = 0.284362 tell you about my model?

I would like to describe the goodness of fit of my model in a paper, but I'm worried the average medical reader would not know how to interpret the result of this test, whereas there are lots of references on interpreting Hosmer-Lemeshow. (My cross-validated c-statistic was 0.69 and R^2 was 0.15.)

Thanks again for all your help! (Also with my k-fold cross-validation question!)

-Rob

--------------------------------
Robert Schutt, MD, MCS
Resident - Department of Internal Medicine
University of Virginia, Charlottesville, Virginia
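As an aside on the c-statistic mentioned above: it can be computed in base R from a fitted logistic model via its equivalence to the Mann-Whitney/Wilcoxon statistic, with no extra packages. A hedged sketch on simulated data (the original model is not available):

```r
# Sketch: c-statistic (area under the ROC curve) for a logistic model,
# computed from ranks of the fitted probabilities. Simulated data only.
set.seed(3)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.8 * x))
fit <- glm(y ~ x, family = binomial)
p   <- fitted(fit)

r  <- rank(p)                 # ranks of fitted probabilities (midranks for ties)
n1 <- sum(y == 1)             # number of events
n0 <- sum(y == 0)             # number of non-events
# Mann-Whitney U divided by the number of event/non-event pairs:
cstat <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
cstat
```

A c-statistic of 0.5 means the model discriminates no better than chance; 1.0 means perfect discrimination, so the poster's cross-validated 0.69 indicates modest discrimination.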

Re: Hosmer-Lemeshow 'goodness of fit'

The test in the rms package's residuals.lrm function is the le Cessie - van Houwelingen - Copas - Hosmer unweighted sum of squares test for global goodness of fit. Like all statistical tests, a large P-value carries no information other than that there was not sufficient evidence to reject the null hypothesis. Here the null hypothesis is that the true probabilities are those specified by the model. Such an omnibus test, though having good general power (better than Hosmer-Lemeshow), lacks power to detect specific alternatives. That's why I spend more time allowing for nonlinearity of predictors.

Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University
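One common way to allow for (and test) nonlinearity of a continuous predictor, using only base R and the splines package, is to compare a linear fit against a spline fit with a likelihood-ratio test. A sketch on simulated data with an illustrative "age" variable (not the poster's data; rms users would typically use rcs() instead of ns()):

```r
# Sketch: testing for nonlinearity of a continuous predictor in a
# logistic model, via a natural spline and a likelihood-ratio test.
library(splines)
set.seed(7)
n   <- 400
age <- runif(n, 30, 80)
# True effect of age is U-shaped (quadratic), so the linear model is wrong:
y <- rbinom(n, 1, plogis(-1 + 0.003 * (age - 55)^2))

fit_lin <- glm(y ~ age,             family = binomial)  # linear in age
fit_ns  <- glm(y ~ ns(age, df = 3), family = binomial)  # 3-df natural spline

# Likelihood-ratio test: does the spline fit significantly better?
lrt <- anova(fit_lin, fit_ns, test = "LRT")
pv  <- lrt$`Pr(>Chi)`[2]
pv
```

A small p-value here is evidence of nonlinearity in the age effect; unlike the omnibus goodness-of-fit tests above, this targets a specific, named alternative.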