Michael Dewey wrote:

> At 17:12 09/04/06, Ramón Casero Cañas wrote:

>

> I am not sure what the problem you really want to solve is but it seems

> that

> a) abnormality is rare

> b) the logistic regression predicts it to be rare.

> If you want a prediction system why not try different cut-offs (other

> than 0.5 on the probability scale) and perhaps plot sensitivity and

> specificity to help to choose a cut-off?

Thanks for your suggestions, Michael. It took me some time to figure out

how to do this in R (as trivial as it may be for others). Some comments

about what I've done follow, in case anyone is interested.

The problem is a) abnormality is rare (Prevalence=14%) and b) there is

not much difference in the independent variable between abnormal and

normal. So the logistic regression model predicts P(abnormal) <= 0.4 for

every case. This confused me, as I expected to use a cut-off point of

P=0.5 to decide between normal/abnormal. But you are right that another

cut-off point can be chosen.
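
To illustrate why a cut-off of 0.5 is not useful when all fitted probabilities are below it, here is a minimal sketch in Python (for illustration only, not the R code used for the analysis). The probabilities and labels are made up, and the function name sens_spec is hypothetical.

```python
def sens_spec(probs, labels, cutoff):
    """Sensitivity and specificity when 'abnormal' is predicted for p >= cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up fitted probabilities (all below 0.5) and true labels (1 = abnormal).
probs = [0.05, 0.10, 0.12, 0.18, 0.22, 0.30, 0.35, 0.40]
labels = [0, 0, 1, 0, 0, 1, 0, 1]

# Lowering the cut-off trades specificity for sensitivity.
for cutoff in (0.5, 0.3, 0.15):
    sens, spec = sens_spec(probs, labels, cutoff)
    print(f"cut-off {cutoff:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these toy data the 0.5 cut-off flags nobody as abnormal (sensitivity 0, specificity 1), while lower cut-offs pick up abnormal cases at the cost of more false positives.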

For a cut-off of e.g. P(abnormal)=0.15, Sensitivity=65% and

Specificity=52%. They are pretty bad, although for clinical purposes I

would say that Positive/Negative Predictive Values are more interesting.

But then PPV=19% and NPV=90%, which isn't great. As an overall test of

how good the model is for classification I have computed the area under

the ROC, from your suggestion of using Sensitivity and Specificity.
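
The predictive values quoted above follow from sensitivity, specificity and prevalence through Bayes' rule. A minimal sketch in Python (illustrative only; the function name predictive_values is hypothetical, and the inputs are the rounded figures from this message):

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence                # true positives, as a population share
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    tn = specificity * (1 - prevalence)          # true negatives
    fn = (1 - sensitivity) * prevalence          # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Rounded values from the text: Sensitivity=65%, Specificity=52%, Prevalence=14%.
ppv, npv = predictive_values(0.65, 0.52, 0.14)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

The rounded inputs give a PPV of about 18% and an NPV of about 90%, in line with the reported figures once rounding of the inputs is allowed for.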

I couldn't find how to do this directly with R, so I implemented it

myself (it's not difficult but I'm new here). I tried with package ROCR,

but apparently it doesn't cover binary outcomes.
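
One common way to compute the area under the ROC by hand is the pairwise (Mann-Whitney) formulation. The sketch below is in Python and is only an assumption about how such a computation could look, not the implementation referred to above; the data are made up and the function name auc is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve as the fraction of (abnormal, normal) pairs
    in which the abnormal case has the higher score; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up fitted probabilities and true labels (1 = abnormal).
scores = [0.05, 0.10, 0.12, 0.18, 0.22, 0.30, 0.35, 0.40]
labels = [0, 0, 1, 0, 0, 1, 0, 1]
print(f"AUC = {auc(scores, labels):.3f}")
```

This quadratic loop is fine for small data sets. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.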

The area under the ROC is 0.64, so I would say that even though the

model seems to fit the data, it just doesn't allow acceptable

discrimination, no matter what the cut-off point.

I have also studied the effect of low prevalence. For this, I used

option ran.gen in the boot function (package boot) to define a function

that resamples the data so that it balances abnormal and normal cases.

A logistic regression model is fitted to each replicate of this parametric

bootstrap, and the fits are used to estimate the bias of the model

coefficients, beta0 and beta1. This shows very small bias for beta1, but

a rather large bias for beta0.
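
The resampling step described above can be sketched as follows. This is Python and shows only the idea of drawing a balanced replicate; it does not reproduce the boot/ran.gen interface, and the function name balanced_resample and the data are hypothetical.

```python
import random

def balanced_resample(labels, rng):
    """Return indices drawn with replacement, half from each outcome class."""
    abnormal = [i for i, y in enumerate(labels) if y == 1]
    normal = [i for i, y in enumerate(labels) if y == 0]
    half = len(labels) // 2
    return rng.choices(abnormal, k=half) + rng.choices(normal, k=half)

# Made-up outcome vector with 14% abnormal cases.
labels = [1] * 14 + [0] * 86
rng = random.Random(0)  # fixed seed so the replicate is reproducible
idx = balanced_resample(labels, rng)
share = sum(labels[i] for i in idx) / len(idx)
print(f"abnormal share in the replicate: {share:.2f}")
```

Each replicate then has equal numbers of abnormal and normal cases by construction, and a model would be refitted to the rows selected by idx.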

So I would say that prevalence has an effect on beta0, but not beta1.

This is good, because a common measure like the odds ratio depends only

on beta1.
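
The same pattern can be checked exactly in a simplified setting. The sketch below (Python, illustrative only) assumes a single binary predictor, for which the maximum-likelihood logistic coefficients can be read off a 2x2 table; the counts and the oversampling factor k are made up, and the independent variable in the real data need not be binary.

```python
import math

def logistic_coefs(table):
    """Exact maximum-likelihood logistic coefficients for one binary predictor.

    table[x] = (abnormal count, normal count) for x in {0, 1}.
    """
    (a0, n0), (a1, n1) = table[0], table[1]
    beta0 = math.log(a0 / n0)                 # log odds of abnormal at x = 0
    beta1 = math.log((a1 / n1) / (a0 / n0))   # log odds ratio
    return beta0, beta1

# Made-up counts with 14% abnormal overall.
original = {0: (40, 460), 1: (100, 400)}
# Same data with every abnormal case counted k times, mimicking oversampling.
k = 6
balanced = {x: (k * a, n) for x, (a, n) in original.items()}

b0, b1 = logistic_coefs(original)
c0, c1 = logistic_coefs(balanced)
print(f"beta1: {b1:.4f} vs {c1:.4f}")
print(f"beta0 shift: {c0 - b0:.4f}, log(k): {math.log(k):.4f}")
```

Oversampling the abnormal cases by a factor k leaves beta1 (the log odds ratio) unchanged and shifts beta0 by log(k), which matches the pattern seen in the bootstrap.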

Cheers,

--

Ramón Casero Cañas

http://www.robots.ox.ac.uk/~rcasero/wiki

http://www.robots.ox.ac.uk/~rcasero/blog