Hosmer Lemeshow test

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Hosmer Lemeshow test

Endy BlackEndy
Hi to everybody. I use the following routine (i found it in the internet)
to compute the Hosmer-Lemeshow test in the framework of logistic regression.

hosmerlemeshow = function(obj, g=10) {
# first, check to see if we fed in the right kind of object
   stopifnot(family(obj)$family=="binomial" && family(obj)$link=="logit")
   y = obj$model[[1]]
   # the double bracket (above) gets the index of items within an object
   if (is.factor(y))
      y = as.numeric(y)==2
   yhat = obj$fitted.values
   cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE)

   obs = xtabs(cbind(1 - y, y) ~ cutyhat)
   expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
   if (any(expect < 5))
   #   warning("Some expected counts are less than 5. Use smaller number of
groups")
   chisq = sum((obs - expect)^2/expect)
   P = 1 - pchisq(chisq, g - 2)
   cat("H-L2 statistic",round(chisq,4)," and its p value",P,"\n")
   # by returning an object of class "htest", the function will perform
like the
   # built-in hypothesis tests
   return(structure(list(
      method = c(paste("Hosmer and Lemeshow goodness-of-fit test with", g,
"bins", sep=" ")),
      data.name = deparse(substitute(obj)),
      statistic = c(X2=chisq),
      parameter = c(df=g-2),
      p.value = P
   ), class='htest'))
   return(list(chisq=chisq,p.value=P))
}

However when i run it with the data set below i get the value NaN. Using
the same data set with SPSS i get a specific value.

FlightNo Temp ThermalDisast
1 66 0
2 70 1
3 69 0
4 68 0
5 67 0
6 72 0
7 73 0
8 70 0
9 57 1
10 63 1
11 70 1
12 78 0
13 67 0
14 53 1
15 67 0
16 75 0
17 70 0
18 81 0
19 76 0
20 79 0
21 75 1
22 76 0
23 58 1

Any suggestions will greatly appreciated.
With regards
Endy

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Hosmer Lemeshow test

David Carlson
The warning is commented out or it would have been printed:
   #   warning("Some expected counts are less than 5. Use smaller number of
groups")

For 23 data points, the default of 10 bins is too many since one of the bins
is 0. Eight bins gives the warning, but prints results. You didn't indicate
how many bins SPSS used.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On
Behalf Of Endy BlackEndy
Sent: Tuesday, April 23, 2013 10:31 AM
To: [hidden email]
Subject: [R] Hosmer Lemeshow test

Hi to everybody. I use the following routine (i found it in the internet) to
compute the Hosmer-Lemeshow test in the framework of logistic regression.

hosmerlemeshow = function(obj, g=10) {
# first, check to see if we fed in the right kind of object
   stopifnot(family(obj)$family=="binomial" && family(obj)$link=="logit")
   y = obj$model[[1]]
   # the double bracket (above) gets the index of items within an object
   if (is.factor(y))
      y = as.numeric(y)==2
   yhat = obj$fitted.values
   cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE)

   obs = xtabs(cbind(1 - y, y) ~ cutyhat)
   expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
   if (any(expect < 5))
   #   warning("Some expected counts are less than 5. Use smaller number of
groups")
   chisq = sum((obs - expect)^2/expect)
   P = 1 - pchisq(chisq, g - 2)
   cat("H-L2 statistic",round(chisq,4)," and its p value",P,"\n")
   # by returning an object of class "htest", the function will perform like
the
   # built-in hypothesis tests
   return(structure(list(
      method = c(paste("Hosmer and Lemeshow goodness-of-fit test with", g,
"bins", sep=" ")),
      data.name = deparse(substitute(obj)),
      statistic = c(X2=chisq),
      parameter = c(df=g-2),
      p.value = P
   ), class='htest'))
   return(list(chisq=chisq,p.value=P))
}

However when i run it with the data set below i get the value NaN. Using the
same data set with SPSS i get a specific value.

FlightNo Temp ThermalDisast
1 66 0
2 70 1
3 69 0
4 68 0
5 67 0
6 72 0
7 73 0
8 70 0
9 57 1
10 63 1
11 70 1
12 78 0
13 67 0
14 53 1
15 67 0
16 75 0
17 70 0
18 81 0
19 76 0
20 79 0
21 75 1
22 76 0
23 58 1

Any suggestions will greatly appreciated.
With regards
Endy

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Hosmer Lemeshow test

Frank Harrell
The H-L test is now considered to be obsolete by many.  One replacement is the Hosmer - le Cessie test as implemented in the rms package residuals.lrm function.  This is a one degree of freedom test.

One problem with H-L is its use of arbitrary binning and suboptimal power.  The new test requires no binning at all.  Probably better still are directed tests such as tests of linearity and additivity.
Frank
David Carlson wrote
The warning is commented out or it would have been printed:
   #   warning("Some expected counts are less than 5. Use smaller number of
groups")

For 23 data points, the default of 10 bins is too many since one of the bins
is 0. Eight bins gives the warning, but prints results. You didn't indicate
how many bins SPSS used.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On
Behalf Of Endy BlackEndy
Sent: Tuesday, April 23, 2013 10:31 AM
To: [hidden email]
Subject: [R] Hosmer Lemeshow test

Hi to everybody. I use the following routine (i found it in the internet) to
compute the Hosmer-Lemeshow test in the framework of logistic regression.

hosmerlemeshow = function(obj, g=10) {
# first, check to see if we fed in the right kind of object
   stopifnot(family(obj)$family=="binomial" && family(obj)$link=="logit")
   y = obj$model[[1]]
   # the double bracket (above) gets the index of items within an object
   if (is.factor(y))
      y = as.numeric(y)==2
   yhat = obj$fitted.values
   cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE)

   obs = xtabs(cbind(1 - y, y) ~ cutyhat)
   expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
   if (any(expect < 5))
   #   warning("Some expected counts are less than 5. Use smaller number of
groups")
   chisq = sum((obs - expect)^2/expect)
   P = 1 - pchisq(chisq, g - 2)
   cat("H-L2 statistic",round(chisq,4)," and its p value",P,"\n")
   # by returning an object of class "htest", the function will perform like
the
   # built-in hypothesis tests
   return(structure(list(
      method = c(paste("Hosmer and Lemeshow goodness-of-fit test with", g,
"bins", sep=" ")),
      data.name = deparse(substitute(obj)),
      statistic = c(X2=chisq),
      parameter = c(df=g-2),
      p.value = P
   ), class='htest'))
   return(list(chisq=chisq,p.value=P))
}

However when i run it with the data set below i get the value NaN. Using the
same data set with SPSS i get a specific value.

FlightNo Temp ThermalDisast
1 66 0
2 70 1
3 69 0
4 68 0
5 67 0
6 72 0
7 73 0
8 70 0
9 57 1
10 63 1
11 70 1
12 78 0
13 67 0
14 53 1
15 67 0
16 75 0
17 70 0
18 81 0
19 76 0
20 79 0
21 75 1
22 76 0
23 58 1

Any suggestions will greatly appreciated.
With regards
Endy

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank Harrell
Department of Biostatistics, Vanderbilt University