

I'm trying to do a HosmerLemeshow 'goodness of fit' test on my logistic regression model.
I found some code here:
http://sasandr.blogspot.com/2010/09/example87hosmerandlemeshowgoodness.htmlThe R code is above is a little complicated for me but I'm having trouble with my answer:
HosmerLemeshow: p=0.6163585
le Cessie and Houwelingen test (Design library): p=0.2843620
The above link indicated they should be approximately equal which in my case they are not, any suggestions or is there a package function people would recommend in R for use with a logistic regression model?
Thanks in advance,
Rob Schutt

Robert Schutt, MD, MCS
Resident  Department of Internal Medicine
University of Virginia, Charlottesville, Virginia

########################################################
# Compute the HosmerLemeshow 'goodnessoffit' test
cd.full_model = glm(formula = Collaterals ~ CHF + Age + CABG
+ relevel (as.factor (num.obst.vessels),"one")
+ Current.smoker + DM + HTN + ace.inhibitor
+ MI, family = binomial(link = "logit"))
hosmerlem = function(y, yhat, g=10) {
cutyhat = cut(yhat,
breaks = quantile(yhat, probs=seq(0,
1, 1/g)), include.lowest=TRUE)
obs = xtabs(cbind(1  y, y) ~ cutyhat)
expect = xtabs(cbind(1  yhat, yhat) ~ cutyhat)
chisq = sum((obs  expect)^2/expect)
P = 1  pchisq(chisq, g  2)
return(list(chisq=chisq,p.value=P))
}
hosmerlem(y=Collaterals, yhat=fitted(cd.full_model))
########################################################33
# Compute the le Cessie and Houwelingen test
f < lrm(Collaterals ~ CHF + Age + CABG
+ relevel (as.factor (num.obst.vessels),"one")
+ Current.smoker + DM + HTN + ace.inhibitor
+ MI, x=TRUE, y=TRUE)
library(Design)
resid(f, 'gof')
Output:
> hosmerlem(y=Collaterals, yhat=fitted(cd.full_model))
$chisq
[1] 6.275889
$p.value
[1] 0.6163585> resid(f, 'gof')
Sum of squared errors Expected valueH0 SD
118.5308396 118.3771115 0.1435944
Z P
1.0705717 0.2843620


Please read the documentation carefully, and replace the Design package with the newer rms package.
The older HosmerLemeshow test requires binning and has lower power. It also does not penalize for overfitting. The newer goodness of fit test in rms/Design should not agree with HosmerLemeshow.
Frank
viostorm wrote
I'm trying to do a HosmerLemeshow 'goodness of fit' test on my logistic regression model.
I found some code here:
http://sasandr.blogspot.com/2010/09/example87hosmerandlemeshowgoodness.htmlThe R code is above is a little complicated for me but I'm having trouble with my answer:
HosmerLemeshow: p=0.6163585
le Cessie and Houwelingen test (Design library): p=0.2843620
The above link indicated they should be approximately equal which in my case they are not, any suggestions or is there a package function people would recommend in R for use with a logistic regression model?
Thanks in advance,
Rob Schutt

Robert Schutt, MD, MCS
Resident  Department of Internal Medicine
University of Virginia, Charlottesville, Virginia

########################################################
# Compute the HosmerLemeshow 'goodnessoffit' test
cd.full_model = glm(formula = Collaterals ~ CHF + Age + CABG
+ relevel (as.factor (num.obst.vessels),"one")
+ Current.smoker + DM + HTN + ace.inhibitor
+ MI, family = binomial(link = "logit"))
hosmerlem = function(y, yhat, g=10) {
cutyhat = cut(yhat,
breaks = quantile(yhat, probs=seq(0,
1, 1/g)), include.lowest=TRUE)
obs = xtabs(cbind(1  y, y) ~ cutyhat)
expect = xtabs(cbind(1  yhat, yhat) ~ cutyhat)
chisq = sum((obs  expect)^2/expect)
P = 1  pchisq(chisq, g  2)
return(list(chisq=chisq,p.value=P))
}
hosmerlem(y=Collaterals, yhat=fitted(cd.full_model))
########################################################33
# Compute the le Cessie and Houwelingen test
f < lrm(Collaterals ~ CHF + Age + CABG
+ relevel (as.factor (num.obst.vessels),"one")
+ Current.smoker + DM + HTN + ace.inhibitor
+ MI, x=TRUE, y=TRUE)
library(Design)
resid(f, 'gof')
Output:
> hosmerlem(y=Collaterals, yhat=fitted(cd.full_model))
$chisq
[1] 6.275889
$p.value
[1] 0.6163585> resid(f, 'gof')
Sum of squared errors Expected valueH0 SD
118.5308396 118.3771115 0.1435944
Z P
1.0705717 0.2843620
Frank Harrell
Department of Biostatistics, Vanderbilt University


Again, thanks so much for your help.
Is there a "for dummies" version for interpreting the le Cessie and Houwelingen test. I read the 1991 biometrics paper but honestly got lost in the math.
Is it interpreted the same way the HosmerLemeshow test is? ie, a nonsignificant result means model fits well.
I guess what does, what would my pvalue of p=0.284362 tell you about my model?
I would like to describe the goodness of fit of my model in a paper but I'm worried the average medical reader would not know how to interpret the result of this test whereas there are lots of references on interpreting HosmerLemeshow.
(my cross validated cstatistic was 0.69 and r^2 was 0.15)
Thanks again for your all help! (also with my kfold crossvalidation question!)
Rob

Robert Schutt, MD, MCS
Resident  Department of Internal Medicine
University of Virginia, Charlottesville, Virginia


The test in the rms package's residuals.lrm function is the le Cessie  van Houwelingen  Copas  Hosmer unweighted sum of squares test for global goodness of fit. Like all statistical tests, a large Pvalue has no information other than there was not sufficient evidence to reject the null hypothesis. Here the null hypothesis is that the true probabilities are those specified by the model. Such an omnibus test, though having good general power (better than HosmerLemeshow) lacks power to detect specific alternatives. That's why I spend more time allowing for nonlinearity of predictors.
Frank
viostorm wrote
Again, thanks so much for your help.
Is there a "for dummies" version for interpreting the le Cessie and Houwelingen test. I read the 1991 biometrics paper but honestly got lost in the math.
Is it interpreted the same way the HosmerLemeshow test is? ie, a nonsignificant result means model fits well.
I guess what does, what would my pvalue of p=0.284362 tell you about my model?
I would like to describe the goodness of fit of my model in a paper but I'm worried the average medical reader would not know how to interpret the result of this test whereas there are lots of references on interpreting HosmerLemeshow.
(my cross validated cstatistic was 0.69 and r^2 was 0.15)
Thanks again for your all help! (also with my kfold crossvalidation question!)
Rob

Robert Schutt, MD, MCS
Resident  Department of Internal Medicine
University of Virginia, Charlottesville, Virginia
Frank Harrell
Department of Biostatistics, Vanderbilt University


I am a new user in R, so I am sorry if this is a basic question but I am kind of lost here and I really would appreciatte your help.
I have behavioral information of sea lions and I've done a binomial Generalized Linear model using Rcmdr, to understand what variables are affecting the place where Male's fights are being made (water or land). One of my possible models includes Agression Type, number of females around the fight and temperature in the moment.
*******************
> GLM.2 < glm(Place ~ AgType + Females + Temp, family=binomial(cloglog),
+ data=June)
> summary(GLM.2)
Call:
glm(formula = Place ~ AgType + Females + Temp, family = binomial(cloglog),
data = June)
Deviance Residuals:
Min 1Q Median 3Q Max
2.4448 0.7248 0.4220 0.6741 1.8798
Coefficients:
Estimate Std. Error z value Pr(>z)
(Intercept) 3.43071 0.82164 4.175 2.97e05 ***
AgType 1.23463 0.37166 3.322 0.000894 ***
Females 0.05812 0.01320 4.404 1.06e05 ***
Temp 0.08614 0.02453 3.512 0.000444 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 175.79 on 141 degrees of freedom
Residual deviance: 125.33 on 138 degrees of freedom
AIC: 133.33
Number of Fisher Scoring iterations: 8
****************************************
When I ran hosmerlem test it seems that I have a good fit to the data.
************************
> hosmerlem(Place, fitted(GLM.2))
X^2 Df P(>Chi)
7.5123428 8.0000000 0.4824925
*****************************
However if I run only AgType, or Agtype + Temp, when running Hosmerlem test; it gives me an error that says: ERROR: 'breaks' are not unique.
I've been having the same error when I try to run other models or when I use different link functions like logit.
So... right now my questions are:
1. What is this error means? as far as I have seen in the Hosmerlem test code that I am using, it seems to depend on the yhat probs, but to be honest I don't really understand what is this "yhat". The code I am using is:
hosmerlem <
function (y, yhat, g = 10)
{
cutyhat < cut(yhat, breaks = quantile(yhat, probs = seq(0,
1, 1/g)), include.lowest = T)
obs < xtabs(cbind(1  y, y) ~ cutyhat)
expect < xtabs(cbind(1  yhat, yhat) ~ cutyhat)
chisq < sum((obs  expect)^2/expect)
P < 1  pchisq(chisq, g  2)
c("X^2" = chisq, Df = g  2, "P(>Chi)" = P)
}
2. I've seen in older posts that Cessie  van Houwelingen  Copas  Hosmer unweighted sum of squares test for global goodness of fit test; seems to be a better alternative. However I am probably doing something wrong because I can't get it work.
I installed and loaded rms package and I ran the following command:
> resid(GLM.2, 'gof')
As far as I understand 'gof' is another package with this description: Implementation of modelchecking technique based on cumulative residuals. So, for the command I also installed and loaded 'gof' package
However I get this: ERROR: 'arg' should be one of "deviance", "pearson", "working", "response", "partial"
If I use any of the arguments that it suggests, it gives me a list of residuals, but I doesn't show the Z or the P.
So, if someone could help me understand what I am doing wrong I really would appreciate it.


At 19:34 29/03/2012, JimeBoHe wrote:
>I am a new user in R, so I am sorry if this is a basic question but I am kind
>of lost here and I really would appreciatte your help.
Dear Jimena
Comments inline
>I have behavioral information of sea lions and I've done a binomial
>Generalized Linear model using Rcmdr, to understand what variables are
>affecting the place where Male's fights are being made (water or land). One
>of my possible models includes Agression Type, number of females around the
>fight and temperature in the moment.
>
>*******************
> > GLM.2 < glm(Place ~ AgType + Females + Temp, family=binomial(cloglog),
>+ data=June)
>
> > summary(GLM.2)
>
>Call:
>glm(formula = Place ~ AgType + Females + Temp, family = binomial(cloglog),
> data = June)
>
>Deviance Residuals:
> Min 1Q Median 3Q Max
>2.4448 0.7248 0.4220 0.6741 1.8798
>
>Coefficients:
> Estimate Std. Error z value Pr(>z)
>(Intercept) 3.43071 0.82164 4.175 2.97e05 ***
>AgType 1.23463 0.37166 3.322 0.000894 ***
>Females 0.05812 0.01320 4.404 1.06e05 ***
>Temp 0.08614 0.02453 3.512 0.000444 ***
>
>Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>(Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 175.79 on 141 degrees of freedom
>Residual deviance: 125.33 on 138 degrees of freedom
>AIC: 133.33
>
>Number of Fisher Scoring iterations: 8
>****************************************
>
>
>When I ran hosmerlem test it seems that I have a good fit to the data.
>
>************************
> > hosmerlem(Place, fitted(GLM.2))
> X^2 Df P(>Chi)
>7.5123428 8.0000000 0.4824925
>*****************************
>
>However if I run only AgType, or Agtype + Temp, when running Hosmerlem test;
>it gives me an error that says: * ERROR: 'breaks' are not unique*.
>
>I've been having the same error when I try to run other models or when I use
>different link functions like logit.
>
>
>So... right now my questions are:
>
>1. What is this error means? as far as I have seen in the Hosmerlem test
>code that I am using, it seems to depend on the yhat probs, but to be honest
>I don't really understand what is this "yhat". The code I am using is:
yhat is the predicted value of y (the outcome variable). You do not
provide us with a reproducible example so this is only a guess but I
suspect if you look at the values of yhat for the failing models you
will find you have many replicated values and so at least two values
of the breakpoints are identical. But, as I say, this is at best a guess.
>hosmerlem <
>function (y, yhat, g = 10)
>{
> cutyhat < cut(yhat, breaks = quantile(yhat, probs = seq(0,
> 1, 1/g)), include.lowest = T)
> obs < xtabs(cbind(1  y, y) ~ cutyhat)
> expect < xtabs(cbind(1  yhat, yhat) ~ cutyhat)
> chisq < sum((obs  expect)^2/expect)
> P < 1  pchisq(chisq, g  2)
> c("X^2" = chisq, Df = g  2, "P(>Chi)" = P)
>}
>
>
>2. I've seen in older posts that Cessie  van Houwelingen  Copas  Hosmer
>unweighted sum of squares test for global goodness of fit test; seems to be
>a better alternative. However I am probably doing something wrong because I
>can't get it work.
>
>I installed and loaded rms package and I ran the following command:
>
> > resid(GLM.2, 'gof')
>
>As far as I understand 'gof' is another package with this description:
>Implementation of modelchecking technique based on cumulative residuals.
>So, for the command I also installed and loaded 'gof' package
You seem to be so far away from success here that I think remote
advice is not going to help.
>However I get this: *ERROR: 'arg' should be one of "deviance", "pearson",
>"working", "response", "partial"*
>
>If I use any of the arguments that it suggests, it gives me a list of
>residuals, but I doesn't show the Z or the P.
That is because it is doing what it is documented to. I am not
familiar with the gof function nor do I see it in the index for rms
(as you implied) so it is a bit hard to know what you are trying to do here.
Michael Dewey
[hidden email]
http://www.aghmed.fsnet.co.uk/home.html______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

