# ROC optimal threshold

7 messages

## ROC optimal threshold

Hello,

I am using the ROC package to evaluate predictive models. I have successfully plotted the ROC curve; however, is there any way to obtain the value of the operating point, i.e. the optimal threshold value (the point of the curve nearest to the top-left corner of the axes)?

Thank you very much,

Jose Daniel Anadon
Area de Ecologia
Universidad Miguel Hernandez
España

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Jose -

I've struggled a bit with the same question, said another way: "How do you find the value in a ROC curve that minimizes false positives while maximizing true positives?"

Here's something I've come up with. I'd be curious to hear from the list whether anyone thinks this code might get stuck in local minima, or if it does find the global minimum each time. (I think it's OK.)

From your ROC object you need to grab the sensitivity (= true positive rate), the specificity (= 1 - false positive rate), and the cutoff levels. Then find the value that minimizes abs(sens - spec), or sqrt((1 - sens)^2 + (1 - spec)^2), as follows:

    absMin <- extract[which.min(abs(extract$sens - extract$spec)), ]
    sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]

In this example, 'extract' is a data frame containing three columns: extract$sens = sensitivity values, extract$spec = specificity values, extract$votes = cutoff values. Each command subsets the data frame to a single row containing the desired cutoff and the sens and spec values associated with it.

Most of the time these two answers (abs or sqrt) are the same; sometimes they differ quite a bit.

I do not see this application of ROC curves very often. A question for those much more knowledgeable than I: is there a problem with using ROC curves in this manner?

Tim Howard
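As a minimal sketch of the two criteria in Tim Howard's message, the base R example below builds a small made-up `extract` data frame and applies both `which.min` selections. The scores, the labels, and the helper `roc_table` are hypothetical and are not part of the ROC package; the rule "predict positive when score >= cutoff" is also an assumption. Because `which.min` scans every row, it returns the global minimum over the cutoffs present in the table, so local minima should not be a concern for this kind of exhaustive search.

```r
# Hypothetical data: classifier scores and true labels (1 = positive, 0 = negative).
scores <- c(0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95)
labels <- c(0,    0,    0,    1,    0,    1,    0,    1,    1,    1)

# Illustrative helper: one row per candidate cutoff,
# predicting "positive" when score >= cutoff.
roc_table <- function(scores, labels) {
  cutoffs <- sort(unique(scores))
  sens <- sapply(cutoffs, function(k) mean(scores[labels == 1] >= k))
  spec <- sapply(cutoffs, function(k) mean(scores[labels == 0] <  k))
  data.frame(votes = cutoffs, sens = sens, spec = spec)
}

extract <- roc_table(scores, labels)

# Criterion 1: cutoff where sensitivity and specificity are closest to each other.
absMin <- extract[which.min(abs(extract$sens - extract$spec)), ]

# Criterion 2: cutoff closest (Euclidean distance) to the top-left corner,
# i.e. to the point sens = 1, spec = 1.
sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]

print(absMin)
print(sqrtMin)
```

On this toy data both criteria select the cutoff 0.60 (sens = 0.8, spec = 0.8). Note that `which.min` returns only the first row when several cutoffs tie.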
## Re: ROC optimal threshold

Hi Tim and José,

On Mar 31, 2006, at 8:01 AM, Tim Howard wrote:

> I've struggled a bit with the same question, said another way: "how do you find the value in a ROC curve that minimizes false positives while maximizing true positives"?

    @BOOK{MacmillanCreelman2005,
      title = {Detection theory: {A} user's guide},
      publisher = {Lawrence Erlbaum Associates},
      year = {2005},
      address = {Mahwah, NJ, USA},
      edition = {2nd},
      author = {Macmillan, Neil A and Creelman, C Douglas},
    }

on p. 43 shows that the ideal value of the cutoff depends on the reward function R that specifies the payoff for each outcome:

$LR(x) = \beta = \frac{R(\text{true negative}) - R(\text{false positive})}{R(\text{true positive}) - R(\text{false negative})} \cdot \frac{p(\text{noise})}{p(\text{signal})}$

I believe that your attempt to minimize false positives while maximizing true positives amounts to maximizing the proportion of correct answers. For that you set the reward ratio to 1, which gives $\beta = p(\text{noise})/p(\text{signal})$, i.e. $\beta = 1$ (equivalently $\log \beta = 0$) when signal and noise are equally likely. Otherwise it might be best to explicitly state your costs and benefits by specifying the reward function R.

_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
WWW: http://www.people.virginia.edu/~mk9y/
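To make the formula for β concrete, here is a small base R sketch that evaluates it for a given reward table and pair of prior probabilities. The function name `optimal_beta`, its argument names, and the numeric rewards are illustrative assumptions and are not taken from Macmillan and Creelman.

```r
# Likelihood-ratio criterion from the reward function R and the priors:
#   beta = (R(TN) - R(FP)) / (R(TP) - R(FN)) * p(noise) / p(signal)
optimal_beta <- function(r_tn, r_fp, r_tp, r_fn, p_noise, p_signal) {
  stopifnot(r_tp != r_fn, p_signal > 0)
  (r_tn - r_fp) / (r_tp - r_fn) * (p_noise / p_signal)
}

# Symmetric rewards and equal priors.
beta_sym <- optimal_beta(r_tn = 1, r_fp = -1, r_tp = 1, r_fn = -1,
                         p_noise = 0.5, p_signal = 0.5)

# A miss costs five times as much as a false alarm, and signals are rare (10%).
beta_rare <- optimal_beta(r_tn = 1, r_fp = -1, r_tp = 1, r_fn = -5,
                          p_noise = 0.9, p_signal = 0.1)

print(c(symmetric = beta_sym, rare_signal = beta_rare))
```

With symmetric rewards and equal priors the criterion is 1. In the second case the heavier penalty on misses lowers β by a factor of 3, while the 9:1 prior odds in favour of noise raise it by a factor of 9, so the two effects pull in opposite directions and the criterion comes out at 3.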

In reply to this post by Tim Howard

If you define a cost function for a given threshold k as

    cost(k) = FP(k) + lambda * FN(k)

then choose the k that minimises the cost. FP(k) and FN(k) are the false positives and false negatives at threshold k. Set lambda to a value greater than 1 if you want to penalise FN more than FP. There are many situations where this is desirable, for example when class sizes are highly unbalanced: if you want to predict rare events, you will typically be penalised much more heavily for missing an event than for misclassifying a non-event.

I believe the ROC was designed to compare two methods over a range of thresholds, not to choose the threshold itself.

Regards,
Adai
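The cost function above can be sketched in base R as follows. This is only an illustration under stated assumptions: FP and FN are taken as counts, an event is predicted when score >= k, candidate thresholds are the observed scores, and the data and the function names `threshold_cost` and `best_threshold` are made up for the example.

```r
# Hypothetical scores and labels with rare events (1 = event, 0 = non-event).
scores <- c(0.10, 0.20, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70, 0.80, 0.90)
labels <- c(0,    0,    0,    0,    1,    0,    0,    0,    0,    1)

# cost(k) = FP(k) + lambda * FN(k), predicting an event when score >= k.
threshold_cost <- function(k, scores, labels, lambda = 1) {
  fp <- sum(scores >= k & labels == 0)  # non-events flagged as events
  fn <- sum(scores <  k & labels == 1)  # events that were missed
  fp + lambda * fn
}

# Evaluate the cost at every observed score and return the cheapest threshold.
best_threshold <- function(scores, labels, lambda = 1) {
  cutoffs <- sort(unique(scores))
  costs <- sapply(cutoffs, threshold_cost,
                  scores = scores, labels = labels, lambda = lambda)
  cutoffs[which.min(costs)]
}

k_equal  <- best_threshold(scores, labels, lambda = 1)  # FP and FN weighted equally
k_fn_bad <- best_threshold(scores, labels, lambda = 5)  # a miss costs 5 false positives

print(c(lambda_1 = k_equal, lambda_5 = k_fn_bad))
```

On this toy data, equal weighting picks the high threshold 0.90 and accepts one missed event, whereas lambda = 5 moves the threshold down to 0.45 so that both events are caught at the price of four false positives.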
## Re: ROC optimal threshold

In reply to this post by Michael Kubovy

Michael Kubovy wrote:

> Otherwise it might be best to explicitly state your costs and benefits by specifying the reward function R.

Choosing cutoffs is fraught with difficulties, arbitrariness, inefficiency, and the necessity to use a complex adjustment for multiple comparisons in later analysis steps, unless the dataset used to generate the cutoff was so large that it could be considered infinite.

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University