ROCR package question for evaluating two regression models


ROCR package question for evaluating two regression models

Andra Isan
Hello All, 
I have used logistic regression (glm) in R and I am evaluating two models, both fitted with glm but with different predictors:

model1 <- glm(Y ~ x4 + x5 + x6 + x7, data = dat, family = binomial(link = logit))
model2 <- glm(Y ~ x1 + x2 + x3, data = dat, family = binomial(link = logit))

I would like to compare these two models based on the predictions I get from each:

pred1 <- predict(model1, test.data, type = "response")
pred2 <- predict(model2, test.data, type = "response")

I have used the ROCR package to compare them:

pr1 <- prediction(pred1, test.y)
pf1 <- performance(pr1, measure = "prec", x.measure = "rec")
plot(pf1)  # which cutoff is this plot based on?

pr2 <- prediction(pred2, test.y)
pf2 <- performance(pr2, measure = "prec", x.measure = "rec")
pf2_roc <- performance(pr2, measure = "err")
plot(pf2)

First, I would like to use cutoff = 0.5 and plot the ROC and precision-recall curves based on that cutoff value. In other words, how do I set a cutoff value in the performance function? For example, with pf2_roc <- performance(pr2, measure = "err"), plot(pf2_roc) plots every single cutoff point. I only want one cutoff point; is there any way to do that?

Second, I would like to see the performance of the two models on the same plot, so the comparison is easier. In other words, how can I plot pf1 and pf2 together? plot(pf1, pf2) gives me the following error:

Error in as.double(x) : cannot coerce type 'S4' to vector of type 'double'
Could you please help me with that?
Thanks a lot,
Andra





______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: ROCR package question for evaluating two regression models

Frank Harrell
It is not possible to have one cutoff point unless you have a very strange utility function.  Nor is there a need for a cutoff when using a probability model.

It is not advisable to compare models based on ROC area as this loses power.  A likelihood-based approach is recommended.
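To illustrate the likelihood-based approach Frank suggests, here is a minimal sketch on simulated data (the data and the assumption that x1..x3 carry the signal are made up for illustration). Since the two models use disjoint predictor sets they are not nested, so a likelihood-ratio test does not apply directly; AIC, a likelihood-based criterion that handles non-nested models, is sketched instead:

```r
# Simulated data: x1..x3 drive the outcome, x4..x7 are noise (assumption for illustration)
set.seed(1)
n <- 200
dat <- data.frame(matrix(rnorm(n * 7), n, 7))
names(dat) <- paste0("x", 1:7)
dat$Y <- rbinom(n, 1, plogis(dat$x1 + dat$x2 - dat$x3))

model1 <- glm(Y ~ x4 + x5 + x6 + x7, data = dat, family = binomial(link = logit))
model2 <- glm(Y ~ x1 + x2 + x3,      data = dat, family = binomial(link = logit))

# AIC is computed from the maximized log-likelihood; the model with the
# lower AIC is preferred, and no cutoff on the predictions is involved
AIC(model1, model2)
```

With this simulated signal, model2 (the model containing the true predictors) comes out with the lower AIC.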
Frank
Frank Harrell
Department of Biostatistics, Vanderbilt University

Re: ROCR package question for evaluating two regression models

RockO
In reply to this post by Andra Isan
Hi Andra,
I have been doing some ROC analysis for a new diagnostic test. I used the pROC package to assess thresholds and compare different diagnostic tests to a "gold standard". In your case, let's say the gold standard is the vector of observed values y0.

Here is an example:
y0 <- sample(0:1, 50, replace = TRUE)      # Make observed binomial values
test1 <- sample(0:100, 50, replace = TRUE) / 100
y1 <- ifelse(y0 == 0, test1, 1 - test1)    # Make first model's predicted values
test2 <- sample(0:100, 50, replace = TRUE) / 100
y2 <- ifelse(y0 == 0, test2, 1 - test2)    # Make second model's predicted values

library(pROC)
i1 <- roc(response = y0, predictor = y1, percent = TRUE, plot = TRUE,
          of = "threshold", ci = TRUE, lwd = 1, lty = 2, asp = 1)
i2 <- roc(response = y0, predictor = y2, percent = TRUE, plot = TRUE,
          of = "threshold", ci = TRUE, lwd = 1, lty = 3, add = TRUE)

coords(i1, x = "best", best.method = "youden")  # Best threshold of y1 with the Youden index
coords(i2, x = "best", best.method = "youden")  # Best threshold of y2 with the Youden index

roc.test(i1, i2)  # Compare the two ROC curves (tests the difference in AUC)

See ?pROC for more details.
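For the two original ROCR questions, a minimal sketch with simulated predictions (the data here are invented for illustration): performance objects can be overlaid with plot(..., add = TRUE) rather than plot(pf1, pf2), and the values at a single cutoff such as 0.5 can be read out of the performance object's slots, since the curve is computed at every observed cutoff:

```r
library(ROCR)

# Simulated labels and predicted probabilities for two models (assumption for illustration)
set.seed(1)
test.y <- rbinom(100, 1, 0.5)
pred1 <- plogis(rnorm(100, mean = ifelse(test.y == 1, 1, -1)))
pred2 <- plogis(rnorm(100, mean = ifelse(test.y == 1, 0.5, -0.5)))

pr1 <- prediction(pred1, test.y)
pr2 <- prediction(pred2, test.y)
pf1 <- performance(pr1, measure = "prec", x.measure = "rec")
pf2 <- performance(pr2, measure = "prec", x.measure = "rec")

# Overlay the two curves on the same axes instead of plot(pf1, pf2)
plot(pf1, col = "blue")
plot(pf2, col = "red", add = TRUE)

# One cutoff point: pick the computed cutoff closest to 0.5 and read the
# recall (x), precision (y), and cutoff (alpha) values from the S4 slots
cut <- pf1@alpha.values[[1]]
i <- which.min(abs(cut - 0.5))
c(cutoff = cut[i], recall = pf1@x.values[[1]][i], precision = pf1@y.values[[1]][i])
```

The same slot-indexing idea works for any ROCR measure, including the "err" curve from pf2_roc.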

Hope this helps,

Rock