ROC curve in R

ROC curve in R

Rithesh M. Mohan
Hi,

 

I need to build an ROC curve in R. Can you please provide the data steps / code,
or guide me through it?

 

Thanks and Regards

Rithesh M Mohan



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: ROC curve in R

Gaurav Yadav

http://search.r-project.org/cgi-bin/namazu.cgi?query=ROC&max=20&result=normal&sort=score&idxname=Rhelp02a&idxname=functions&idxname=docs

There is a lot of help available. For example, help.search("ROC curve") gave:

Help files with alias or concept or title matching 'ROC curve' using fuzzy
matching:



granulo(ade4)              Granulometric Curves
plot.roc(analogue)         Plot ROC curves and associated diagnostics
roc(analogue)              ROC curve analysis
colAUC(caTools)            Column-wise Area Under ROC Curve (AUC)
DProc(DPpackage)           Semiparametric Bayesian ROC curve analysis
cv.enet(elasticnet)        Computes K-fold cross-validated error curve for elastic net
ROC(Epi)                   Function to compute and draw ROC-curves.
lroc(epicalc)              ROC curve
cv.lars(lars)              Computes K-fold cross-validated error curve for lars
roc.demo(TeachingDemos)    Demonstrate ROC curves by interactively building one

HTH. See the help pages and their examples; those should suffice.

Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'.



Regards,

Gaurav Yadav
+++++++++++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: [hidden email]
Bhagavad Gita:  Man is made by his Belief, as He believes, so He is

Re: ROC curve in R

Tobias Sing-2
In reply to this post by Rithesh M. Mohan
You might also want to try the ROCR package (http://rocr.bioinf.mpi-sb.mpg.de/).
Tutorial slides: http://rocr.bioinf.mpi-sb.mpg.de/ROCR_Talk_Tobias_Sing.ppt
Overview paper:
http://bioinformatics.oxfordjournals.org/cgi/content/full/21/20/3940

Good luck,
  Tobias




--
Tobias Sing
Computational Biology and Applied Algorithmics
Max Planck Institute for Informatics
Saarbrucken, Germany
Phone: +49 681 9325 315
Fax: +49 681 9325 399
http://www.tobiassing.net


Re: ROC curve in R

Frank Harrell
In reply to this post by Gaurav Yadav
Note that even though the ROC curve as a whole is an interesting
'statistic' (its area is a linear translation of the
Wilcoxon-Mann-Whitney-Somers-Goodman-Kruskal rank correlation
statistics), each individual point on it is an improper scoring rule,
i.e., a rule that is optimized by fitting an inappropriate model.  Using
curves to select cutoffs is a low-precision and arbitrary operation, and
the cutoffs do not replicate from study to study.  Probably the worst
problem with drawing an ROC curve is that it tempts analysts to try to
find cutoffs where none really exist, and it makes analysts ignore the
whole field of decision theory.
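
To make the "linear translation" concrete, here is a minimal base-R sketch on simulated data. The variable names `score` and `y` and the simulation settings are assumptions made up for illustration; nothing here comes from the original poster's data.

```r
# Simulated data: 'score' and 'y' are hypothetical names used only for this sketch.
set.seed(1)
y     <- rbinom(200, 1, 0.4)
score <- rnorm(200, mean = y)      # higher scores for y == 1 on average

n1 <- sum(y == 1)
n0 <- sum(y == 0)

# Wilcoxon-Mann-Whitney U statistic computed from the ranks of the scores
r <- rank(score)
U <- sum(r[y == 1]) - n1 * (n1 + 1) / 2

auc <- U / (n1 * n0)               # ROC area is U rescaled to [0, 1]
dxy <- 2 * (auc - 0.5)             # Somers' Dxy rank correlation

# Cross-check against the W statistic reported by wilcox.test()
W <- unname(wilcox.test(score[y == 1], score[y == 0], exact = FALSE)$statistic)
c(auc = auc, dxy = dxy, auc_from_W = W / (n1 * n0))
```

The ROC area here is obtained without drawing a curve or choosing any cutoff, which is the sense in which it is a rank statistic.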

Frank Harrell


-
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: ROC curve in R

Dylan Beaudette-2

Frank,

This thread has caught my attention for a couple of reasons, possibly related to
my novice-level experience.

1. In a logistic regression study, where I am predicting the probability of
the response being 1 (for example), there exists a continuum of probability
values and a finite number of {1,0} realities when I look either within the
original data set or at a new 'verification' data set. I understand that
drawing a line through the probabilities returned from the logistic
regression is a loss of information, but there are times when a 'hard'
decision requiring prediction of {1,0} is required. I have found that the
ROCR package (not necessarily the ROC curve) can be useful in identifying the
probability cutoff where accuracy is maximized. Is this an unreasonable way
of using logistic regression as a predictor?

2. The ROC curve can be a helpful way of communicating false positives / false
negatives to other users who are less familiar with the output and
interpretation of logistic regression.

3. I have been using the area under the ROC curve, Kendall's tau, and Cohen's
kappa to evaluate the accuracy of a logistic regression based prediction, the
last two statistics based on some probability cutoff identified
beforehand.


How does the topic of decision theory relate to some of the circumstances
described above? Is there a better way to do some of these things?

Cheers,

Dylan




Re: ROC curve in R

Frank Harrell
Dylan Beaudette wrote:

> Frank,
>
> This thread has caught my attention for a couple of reasons, possibly related to
> my novice-level experience.
>
> 1. In a logistic regression study, where I am predicting the probability of
> the response being 1 (for example), there exists a continuum of probability
> values and a finite number of {1,0} realities when I look either within the
> original data set or at a new 'verification' data set. I understand that
> drawing a line through the probabilities returned from the logistic
> regression is a loss of information, but there are times when a 'hard'
> decision requiring prediction of {1,0} is required. I have found that the
> ROCR package (not necessarily the ROC curve) can be useful in identifying the
> probability cutoff where accuracy is maximized. Is this an unreasonable way
> of using logistic regression as a predictor?

Logistic regression (with suitable attention to not assuming linearity
and to avoiding overfitting) is a great way to estimate P[Y=1].  Given
good predicted P[Y=1] and utilities (losses, costs) for incorrect
positive and negative decisions, an optimal decision is one that
optimizes expected utility.  The ROC curve does not play a direct role
in this regard.  If per-subject utilities are not available, the analyst
may make various assumptions about utilities (including the unreasonable
but often used assumption that utilities do not vary over subjects) to
find a cutoff on P[Y=1].  A very nice feature of P[Y=1] is that error
probabilities are self-contained.  For example if P[Y=1] = .02 for a
single subject and you predict Y=0, the probability of an error is .02
by definition.  One doesn't need to compute an overall error probability
over the whole distribution of subjects' risks.  If the cost of a false
negative is C, the expected cost is .02*C in this example.
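
The expected-cost reasoning above can be sketched in a few lines of base R. The costs `cost_fn` and `cost_fp` and the probabilities in `p` are hypothetical values chosen for illustration, and the sketch makes the simplifying assumption (noted above as often unreasonable) that costs do not vary over subjects.

```r
# Hypothetical costs; in practice these come from the decision maker or the subject.
cost_fn <- 10   # cost of predicting Y = 0 when Y = 1 (false negative)
cost_fp <- 1    # cost of predicting Y = 1 when Y = 0 (false positive)

p <- c(0.02, 0.08, 0.10, 0.50)   # predicted P[Y = 1] for four subjects

# Expected cost of each action, per subject
exp_cost_predict0 <- p * cost_fn         # an error occurs with probability p
exp_cost_predict1 <- (1 - p) * cost_fp   # an error occurs with probability 1 - p

# Choose the action with the smaller expected cost
decision <- ifelse(exp_cost_predict1 < exp_cost_predict0, 1, 0)

# With costs constant across subjects, this is equivalent to a cutoff on p
cutoff <- cost_fp / (cost_fp + cost_fn)

data.frame(p, exp_cost_predict0, exp_cost_predict1, decision)
```

For the first subject (P[Y=1] = .02, predict Y=0) the expected cost is .02 * cost_fn, matching the example in the text. The cutoff falls out of the costs; no ROC curve is involved.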

>
> 2. The ROC curve can be a helpful way of communicating false positives / false
> negatives to other users who are less familiar with the output and
> interpretation of logistic regression.

What is more useful than that is a rigorous calibration curve estimate
to demonstrate the faithfulness of predicted P[Y=1] and a histogram
showing the distribution of predicted P[Y=1].  Models that put a lot of
predictions near 0 or 1 are the most discriminating.  Calibration curves
and risk distributions are easier to explain than ROC curves.  Too often
a statistician will solve for a cutoff on P[Y=1], imposing her own
utility function without querying any subjects.
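
A rough base-R sketch of these two displays follows, assuming simulated data and a correctly specified model (the names `x`, `y`, `phat` and all coefficients are made up). It shows only the apparent calibration on the fitting data; a rigorous estimate would correct for overfitting, for example by bootstrapping, which this sketch does not attempt.

```r
set.seed(2)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 1.5 * x))   # simulated binary outcome

fit  <- glm(y ~ x, family = binomial)
phat <- fitted(fit)                       # predicted P[Y = 1]

# Smooth nonparametric calibration curve: observed outcome vs. predicted risk.
# iter = 0 turns off the robustness iterations, which are unsuitable for binary y.
cal <- lowess(phat, y, iter = 0)

opar <- par(mfrow = c(1, 2))
plot(cal$x, cal$y, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Predicted P[Y=1]", ylab = "Observed proportion (smoothed)")
abline(0, 1, lty = 2)                     # line of perfect calibration
hist(phat, breaks = 20, main = "", xlab = "Predicted P[Y=1]")
par(opar)
```

The left panel shows how closely predicted risks track observed frequencies; the right panel shows how widely the predictions are spread between 0 and 1.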

>
>
> 3. I have been using the area under the ROC curve, Kendall's tau, and Cohen's
> kappa to evaluate the accuracy of a logistic regression based prediction, the
> last two statistics based on some probability cutoff identified
> beforehand.

ROC area (equiv. to Wilcoxon-Mann-Whitney and Somers' Dxy rank
correlation between pred. P[Y=1] and Y) is a measure of pure
discrimination, not a measure of accuracy per se.  Rank correlation
(concordance) measures do not require the use of cutoffs.
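
One way to see the distinction between discrimination and accuracy is the sketch below, on simulated data with made-up names. It distorts well-calibrated probabilities with a monotone transformation: the ROC area is unchanged because the ranking is unchanged, while the Brier score (used here only as one example of an accuracy measure that is sensitive to calibration) gets worse.

```r
# Rank-based ROC area (no cutoffs needed); assumes y is coded 0/1
auc <- function(score, y) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(rank(score)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Brier score: mean squared difference between predicted risk and outcome
brier <- function(p, y) mean((p - y)^2)

set.seed(3)
x <- rnorm(500)
p <- plogis(x)                 # well-calibrated probabilities by construction
y <- rbinom(500, 1, p)

p_bad <- p^3                   # monotone distortion: same ranking, wrong probabilities

c(auc_good = auc(p, y), auc_bad = auc(p_bad, y))
c(brier_good = brier(p, y), brier_bad = brier(p_bad, y))
```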

>
>
> How does the topic of decision theory relate to some of the circumstances
> described above? Is there a better way to do some of these things?

See above re: expected losses/utilities.

Good questions.

Frank


Re: ROC curve in R

Gaurav Yadav
In reply to this post by Rithesh M. Mohan

Hi Rithesh,
*** Please note, Rithesh: always send a copy to the R-help mailing list :) ***

Please visit this link to get help with ROC curves in R:
http://rocr.bioinf.mpi-sb.mpg.de/ROCR_Talk_Tobias_Sing.ppt#384,8,Examples (2/8): Precision/recall curves

Further :) what do you mean by PSA and cohort? :) After some googling I got
this:

co·hort(khôrt)
n.
1. A group or band of people.
2. A companion or associate.
3. A generational group as defined in demographics, statistics, or market
research: "The cohort of people aged 30 to 39 . . . were more
conservative" American Demographics.
4.
a. One of the 10 divisions of a Roman legion, consisting of 300 to 600
men.
b. A group of soldiers.

And for PSA I got: prostate-specific antigen, a substance produced by the
prostate that may be found in an increased amount in the blood of men who
have prostate cancer, benign prostatic hyperplasia, or infection or
inflammation of the prostate.

Now please clarify what you want to model :) Please don't take this the wrong
way; I am not from the biology field.


Regards,

Gaurav Yadav
+++++++++++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: [hidden email]
Bhagavad Gita:  Man is made by his Belief, as He believes, so He is



"Rithesh M. Mohan" <[hidden email]>
wrote on 07/30/2007 01:30 PM (Subject: Re: [R] ROC curve in R):

Hi Gaurav,
 
I need your help. I'm relatively new to R, and even to statistics, so can you
please give me step-by-step details for getting an ROC curve in R?

Requirement:

Build the ROC curve using the PSA variable alone for the original cohort, and
compare it against the ROC curve of the model for the original cohort.
 


It would be really great if you could help me with this.


 
Thanks and Regards
Rithesh
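
A minimal base-R sketch of this kind of comparison follows. Everything in it is an assumption for illustration: the data are simulated, the variable names `psa`, `age`, and `cancer` are made up, and the "model" is taken to be a logistic regression, since the thread does not say what the actual model is. The helper `roc_points()` is a hypothetical function written for this sketch, not part of any package, and it assumes no tied scores.

```r
# ROC points: sweep the cutoff from high to low and record (FPR, TPR).
# Assumes y is coded 0/1 and the scores have no ties.
roc_points <- function(score, y) {
  o   <- order(score, decreasing = TRUE)
  y   <- y[o]
  tpr <- cumsum(y == 1) / sum(y == 1)
  fpr <- cumsum(y == 0) / sum(y == 0)
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Trapezoidal area under the curve
auc_trap <- function(r) {
  sum(diff(r$fpr) * (head(r$tpr, -1) + tail(r$tpr, -1)) / 2)
}

# Simulated cohort (hypothetical variables and coefficients)
set.seed(4)
n      <- 500
age    <- rnorm(n, 65, 8)
psa    <- rlnorm(n, meanlog = 1.2, sdlog = 0.6)
cancer <- rbinom(n, 1, plogis(-8 + 0.08 * age + 1.2 * log(psa)))

fit <- glm(cancer ~ log(psa) + age, family = binomial)

roc_psa   <- roc_points(psa, cancer)           # marker alone
roc_model <- roc_points(fitted(fit), cancer)   # model-predicted probability

plot(roc_psa$fpr, roc_psa$tpr, type = "l", lty = 2,
     xlab = "False positive rate", ylab = "True positive rate")
lines(roc_model$fpr, roc_model$tpr, lty = 1)
abline(0, 1, col = "grey")                     # reference line for a useless marker
legend("bottomright", lty = c(2, 1), legend = c("PSA alone", "Model"))

c(auc_psa = auc_trap(roc_psa), auc_model = auc_trap(roc_model))
```

Both curves are computed on the same data used to fit the model, so the model's curve is optimistic; a fair comparison would use validation data or an overfitting correction.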



Re: ROC curve in R

Rithesh M. Mohan
In reply to this post by Rithesh M. Mohan
Sorry Gaurav,

 

I'll make sure I mark a copy to r-help also.

 

As I said, I'm new to R and even to statistics, so it will take some time for me to learn it.

Just help me get a simple ROC curve. Please give an example of your own and explain the steps; it doesn't matter whether it is from biology or any other field. I just need to understand the logic behind it.

 

Thanks & Regards

Rithesh M Mohan

 

 


Re: ROC curve in R

Gaurav Yadav
In reply to this post by Rithesh M. Mohan

Hi Rithesh,

What I understand of ROC analysis will follow in another mail :)
An excellent introduction can be found at
http://www.csee.usf.edu/~candamo/site/papers/ROCintro.pdf

http://rocr.bioinf.mpi-sb.mpg.de/

Take this zip file :)
http://rocr.bioinf.mpi-sb.mpg.de/ROCR_1.0-2.zip
Also see the ROCR and analogue R manuals :) They have good examples :)

Please read them alongside the papers given above; then it will be
really easy to interpret an ROC curve.
Just try to grasp one simple thing: what is on the x axis and what is on the y
axis, and whether the values are in ascending or descending order.
Then try to visualize how the ROC space is divided: a continuous (analog)
score is cut to give a binary (digital) classification :)
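
As a tiny worked illustration of the axes and the ordering, here is a base-R sketch with six made-up subjects (the `score` and `truth` values are invented for this example). Each observed score is tried as a cutoff, from highest to lowest; the false positive rate goes on the x axis and the true positive rate on the y axis.

```r
# Six hypothetical subjects: a score and the true class (1 = positive)
score <- c(0.9, 0.8, 0.6, 0.5, 0.3, 0.1)
truth <- c(1,   1,   0,   1,   0,   0)

# Try each observed score as a cutoff, from highest to lowest
cutoffs <- sort(unique(score), decreasing = TRUE)
roc <- t(sapply(cutoffs, function(cut) {
  pred <- as.integer(score >= cut)   # classify as positive at or above the cutoff
  c(cutoff = cut,
    fpr = sum(pred == 1 & truth == 0) / sum(truth == 0),   # x axis
    tpr = sum(pred == 1 & truth == 1) / sum(truth == 1))   # y axis
}))
roc <- rbind(c(cutoff = Inf, fpr = 0, tpr = 0), roc)   # start at the origin
roc

plot(roc[, "fpr"], roc[, "tpr"], type = "b",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)   # reference line for a classifier no better than chance
```

Lowering the cutoff moves the curve up when the newly included subject is a true positive and to the right when it is a false positive, so the curve runs from (0, 0) to (1, 1).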

######## code starts here, taken from the manual of analogue ########
library(analogue)

## continue the example from roc()
example(roc)

## draw the ROC curve
plot(swap.roc, 1)

## draw the four default diagnostic plots
opar <- par(mfrow = c(2,2))
plot(swap.roc)
par(opar)


#################end of code snippet###########################



############R software working session##################

> ## continue the example from roc()
> example(roc)
roc> ## continue the example from join()
roc> example(join)

join> ## load the example data
join> data(swapdiat)

join> data(swappH)

join> data(rlgh)

join> ## process so common set of columns for training and test
join> ## number of training set samples
join> n.train <- nrow(swapdiat)

join> ## merge training and test set on columns
join> dat <- join(swapdiat, rlgh, verbose = TRUE)

Summary:

            Rows Cols
Data set 1:  167  277
Data set 2:  101  139
Merged:      268  277


join> ## convert to proportions
join> dat <- dat / 100

join> ## subset data back into training and test sets
join> swapdiat <- dat[1:n.train, ]

join> rlgh <- dat[(n.train+1):nrow(dat), ]

roc> ## fit the MAT model using the squared chord distance measure
roc> swap.mat <- mat(swapdiat, swappH, method = "SQchord")

roc> ## fit the ROC curve to the SWAP diatom data using the MAT results
roc> ## Generate a grouping for the SWAP lakes
roc> clust <- hclust(as.dist(swap.mat$Dij), method = "ward")

roc> grps <- cutree(clust, 12)

roc> ## fit the ROC curve
roc> swap.roc <- roc(swap.mat, groups = grps)

roc> swap.roc

        ROC curve of dissimilarities

Optimal Dissimilarity = 0.894

AUC = 0.889, p-value: < 2.22e-16
No. within: 1214   No. outside: 12647

>
> ## draw the ROC curve
> plot(swap.roc, 1)
>
> ## draw the four default diagnostic plots
> opar <- par(mfrow = c(2,2))
> plot(swap.roc)
> par(opar)
>


##############end of demonstration session#########################




Re: ROC curve in R

Rithesh M. Mohan
In reply to this post by Rithesh M. Mohan
Thanks Gaurav,

 

I'll try this and get back to you.

 

Rithesh M Mohan

 

________________________________

From: [hidden email] [mailto:[hidden email]]
Sent: Monday, July 30, 2007 6:01 PM
To: Rithesh M. Mohan
Cc: [hidden email]
Subject: RE: [R] ROC curve in R




Hi Rithesh,

What I understand of ROC analysis will come in another mail :)
An excellent introduction can be found at http://www.csee.usf.edu/~candamo/site/papers/ROCintro.pdf

http://rocr.bioinf.mpi-sb.mpg.de/

Take this zip file :)
http://rocr.bioinf.mpi-sb.mpg.de/ROCR_1.0-2.zip
Also see the ROCR and analogue R manuals :) They have good examples :)

Please read them together with the papers given above; then it will be really easy to interpret an ROC curve.
Just try to grasp one simple thing: what is on the x axis and what is on the y axis, and whether the values are in ascending or descending order.
Then try to visualize how the ROC space is divided, analog-style, to give a digital classification :)

######## code starts here; taken from the manual of the analogue package ####################
library(analogue)

## continue the example from roc()
example(roc)

## draw the ROC curve
plot(swap.roc, 1)

## draw the four default diagnostic plots
opar <- par(mfrow = c(2,2))
plot(swap.roc)
par(opar)


#################end of code snippet###########################



############R software working session##################

>
> ## draw the ROC curve
> plot(swap.roc, 1)
>
> ## draw the four default diagnostic plots
> opar <- par(mfrow = c(2,2))
> plot(swap.roc)
> par(opar)
> ## continue the example from roc()
> example(roc)
roc> ## continue the example from join()
roc> example(join)

join> ## load the example data
join> data(swapdiat)

join> data(swappH)

join> data(rlgh)

join> ## process so common set of columns for training and test
join> ## number of training set samples
join> n.train <- nrow(swapdiat)

join> ## merge training and test set on columns
join> dat <- join(swapdiat, rlgh, verbose = TRUE)

Summary:

            Rows Cols
Data set 1:  167  277
Data set 2:  101  139
Merged:      268  277


join> ## convert to proportions
join> dat <- dat / 100

join> ## subset data back into training and test sets
join> swapdiat <- dat[1:n.train, ]

join> rlgh <- dat[(n.train+1):nrow(dat), ]

roc> ## fit the MAT model using the squared chord distance measure
roc> swap.mat <- mat(swapdiat, swappH, method = "SQchord")

roc> ## fit the ROC curve to the SWAP diatom data using the MAT results
roc> ## Generate a grouping for the SWAP lakes
roc> clust <- hclust(as.dist(swap.mat$Dij), method = "ward")

roc> grps <- cutree(clust, 12)

roc> ## fit the ROC curve
roc> swap.roc <- roc(swap.mat, groups = grps)

roc> swap.roc

        ROC curve of dissimilarities

Optimal Dissimilarity = 0.894

AUC = 0.889, p-value: < 2.22e-16
No. within: 1214   No. outside: 12647

>
> ## draw the ROC curve
> plot(swap.roc, 1)
>
> ## draw the four default diagnostic plots
> opar <- par(mfrow = c(2,2))
> plot(swap.roc)
> par(opar)
>


##############end of demonstration session#########################



Sorry Gaurav,
 
I'll make sure I mark a copy to r-help also.
 
As I have said, I'm new to R and even to statistics, so it will take some time for me to learn it.
 
Just help me get a simple ROC curve. Please give an example of your own and explain the steps; no matter if it's biology or any other field, I just need to get the logic behind it.
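
The logic asked for here can be sketched in a few lines of base R with no packages. The six observations below are toy values invented for this illustration, and the variable names are arbitrary. The idea is: slide a cutoff down through the scores, and at each cutoff record the fraction of true 1s called positive (y axis) and the fraction of true 0s called positive (x axis).

```r
## Toy data (made up for illustration): 1 = event, 0 = no event.
y     <- c(1, 1, 0, 1, 0, 0)
score <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)   # higher score = more likely a 1

## Step 1: the candidate cutoffs are the observed scores, highest first.
cutoffs <- sort(unique(score), decreasing = TRUE)

## Step 2: at each cutoff, call "score >= cutoff" positive and count.
tpr <- sapply(cutoffs, function(k) sum(score >= k & y == 1) / sum(y == 1))
fpr <- sapply(cutoffs, function(k) sum(score >= k & y == 0) / sum(y == 0))

## Step 3: add the (0, 0) corner and plot false positive rate (x)
## against true positive rate (y).
roc_pts <- data.frame(cutoff = c(Inf, cutoffs), fpr = c(0, fpr), tpr = c(0, tpr))
print(roc_pts)

plot(roc_pts$fpr, roc_pts$tpr, type = "b",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)   # diagonal = a score with no information
```

A curve that hugs the top-left corner means the scores separate the 1s from the 0s well; a curve along the dashed diagonal means they do not separate them at all.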
 
Thanks & Regards
Rithesh M Mohan
 
 



________________________________


From: [hidden email] [mailto:[hidden email]]
Sent: Monday, July 30, 2007 4:28 PM
To: Rithesh M. Mohan
Cc: [hidden email]
Subject: Re: [R] ROC curve in R
 

Hi Rithesh
***Please note, Rithesh: always mark a copy to the R-help mailing list :) ***

Please visit this link to get help in R
http://rocr.bioinf.mpi-sb.mpg.de/ROCR_Talk_Tobias_Sing.ppt#384,8,Examples (2/8): Precision/recall curves

Further :) what do you mean by PSA and cohort? :) After some googling I got this:

co·hort(khôrt)
n.
1. A group or band of people.
2. A companion or associate.
3. A generational group as defined in demographics, statistics, or market research: "The cohort of people aged 30 to 39 . . . were more conservative" American Demographics.
4.
a. One of the 10 divisions of a Roman legion, consisting of 300 to 600 men.
b. A group of soldiers.

And for PSA I got: prostate-specific antigen. A substance produced by the prostate that may be found in an increased amount in the blood of men who have prostate cancer, benign prostatic hyperplasia, or infection or inflammation of the prostate.

Now please clarify what you want to model :) Please don't take it otherwise, I am not from the biology field.


Regards,

Gaurav Yadav
+++++++++++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: [hidden email]
Bhagavad Gita:  Man is made by his Belief, as He believes, so He is

"Rithesh M. Mohan" <[hidden email]>

07/30/2007 01:30 PM

To: <[hidden email]>
cc:
Subject: Re: [R] ROC curve in R

Hi Gaurav,

I need your help. I'm relatively new to R, and even to statistics, so can you please give me step-by-step details for getting an ROC curve in R?

Requirement:

To build an ROC curve using PSA (a single variable) alone for the original cohort, and compare it against the ROC curve of the model for the original cohort.



It would be really great if you could help me with this.



Thanks and Regards
Rithesh


Re: ROC curve in R

Dylan Beaudette-2
In reply to this post by Frank Harrell
On Thursday 26 July 2007 10:45, Frank E Harrell Jr wrote:

> Dylan Beaudette wrote:
> > On Thursday 26 July 2007 06:01, Frank E Harrell Jr wrote:
> >> Note that even though the ROC curve as a whole is an interesting
> >> 'statistic' (its area is a linear translation of the
> >> Wilcoxon-Mann-Whitney-Somers-Goodman-Kruskal rank correlation
> >> statistics), each individual point on it is an improper scoring rule,
> >> i.e., a rule that is optimized by fitting an inappropriate model.  Using
> >> curves to select cutoffs is a low-precision and arbitrary operation, and
> >> the cutoffs do not replicate from study to study.  Probably the worst
> >> problem with drawing an ROC curve is that it tempts analysts to try to
> >> find cutoffs where none really exist, and it makes analysts ignore the
> >> whole field of decision theory.
> >>
> >> Frank Harrell
> >
> > Frank,
> >
> > This thread has caught my attention for a couple reasons, possibly
> > related to my novice-level experience.
> >
> > 1. in a logistic regression study, where i am predicting the probability
> > of the response being 1 (for example) - there exists a continuum of
> > probability values - and a finite number of {1,0} realities when i either
> > look within the original data set, or with a new 'verification' data set.
> > I understand that drawing a line through the probabilities returned from
> > the logistic regression is a loss of information, but there are times
> > when a 'hard' decision requiring prediction of {1,0} is required. I have
> > found that the ROCR package (not necessarily the ROC Curve) can be useful
> > in identifying the probability cutoff where accuracy is maximized. Is
> > this an unreasonable way of using logistic regression as a predictor?

Thanks for the detailed response Frank. My follow-up questions are below:

> Logistic regression (with suitable attention to not assuming linearity
> and to avoiding overfitting) is a great way to estimate P[Y=1].  Given
> good predicted P[Y=1] and utilities (losses, costs) for incorrect
> positive and negative decisions, an optimal decision is one that
> optimizes expected utility.  The ROC curve does not play a direct role
> in this regard.  

Ok.

> If per-subject utilities are not available, the analyst
> may make various assumptions about utilities (including the unreasonable
> but often used assumption that utilities do not vary over subjects) to
> find a cutoff on P[Y=1].

Can you elaborate on what exactly a "per-subject utility" is? In my case, I am
trying to predict the occurrence of specific soil features based on two
predictor variables: 1 continuous, the other categorical.  Thus far my
evaluation of how well this method works is based on how often I can
correctly predict (a categorical) quality.


> A very nice feature of P[Y=1] is that error
> probabilities are self-contained.  For example if P[Y=1] = .02 for a
> single subject and you predict Y=0, the probability of an error is .02
> by definition.  One doesn't need to compute an overall error probability
> over the whole distribution of subjects' risks.  If the cost of a false
> negative is C, the expected cost is .02*C in this example.

Interesting. The hang-up that I am having is that I need to predict from
{0,1}, as the direct users of this information are not currently interested
in raw probabilities. As far as I know, in order to predict a class from a
probability I need to use a cutoff... How else can I accomplish this without
imposing a cutoff on the entire dataset? One thought: identify a cutoff for
each level of the categorical predictor term in the model... (?)

> > 2. The ROC curve can be a helpful way of communicating false positives /
> > false negatives to other users who are less familiar with the output and
> > interpretation of logistic regression.
>
> What is more useful than that is a rigorous calibration curve estimate
> to demonstrate the faithfulness of predicted P[Y=1] and a histogram
> showing the distribution of predicted P[Y=1]

Ok. I can make that histogram - how would one go about making the 'rigorous
calibration curve' ? Note that I have a training set, from which the model is
built, and a smaller testing set for evaluation.


> .  Models that put a lot of
> predictions near 0 or 1 are the most discriminating.  Calibration curves
> and risk distributions are easier to explain than ROC curves.

By 'risk distributions' do you mean said histogram?

> Too often
> a statistician will solve for a cutoff on P[Y=1], imposing her own
> utility function without querying any subjects.

In this case I have picked a cutoff that resulted in the smallest number of
incorrectly classified observations, or the highest kappa / tau statistics;
the results were very close.


> > 3. I have been using the area under the ROC Curve, kendall's tau, and
> > cohen's kappa to evaluate the accuracy of a logistic regression based
> > prediction, the last two statistics based on some probability cutoff
> > identified beforehand.
>
> ROC area (equiv. to Wilcoxon-Mann-Whitney and Somers' Dxy rank
> correlation between pred. P[Y=1] and Y) is a measure of pure
> discrimination, not a measure of accuracy per se.  Rank correlation
> (concordance) measures do not require the use of cutoffs.

Ok. Hopefully I am not abusing the kappa and tau statistics too badly by using
them to evaluate a probability cutoff... (?)

> > How does the topic of decision theory relate to some of the circumstances
> > described above? Is there a better way to do some of these things?
>
> See above re: expected losses/utilities.
>
> Good questions.
>
> Frank

Thanks for the feedback.

Cheers,

Dylan


> > Cheers,
> >
> > Dylan
> >
> >> -
> >> Frank E Harrell Jr   Professor and Chair           School of Medicine
> >>                       Department of Biostatistics   Vanderbilt
> >> University


Re: ROC curve in R

Frank Harrell
Dylan Beaudette wrote:

> On Thursday 26 July 2007 10:45, Frank E Harrell Jr wrote:
>> Dylan Beaudette wrote:
>>> On Thursday 26 July 2007 06:01, Frank E Harrell Jr wrote:
>>>> Note that even though the ROC curve as a whole is an interesting
>>>> 'statistic' (its area is a linear translation of the
>>>> Wilcoxon-Mann-Whitney-Somers-Goodman-Kruskal rank correlation
>>>> statistics), each individual point on it is an improper scoring rule,
>>>> i.e., a rule that is optimized by fitting an inappropriate model.  Using
>>>> curves to select cutoffs is a low-precision and arbitrary operation, and
>>>> the cutoffs do not replicate from study to study.  Probably the worst
>>>> problem with drawing an ROC curve is that it tempts analysts to try to
>>>> find cutoffs where none really exist, and it makes analysts ignore the
>>>> whole field of decision theory.
>>>>
>>>> Frank Harrell
>>> Frank,
>>>
>>> This thread has caught my attention for a couple reasons, possibly
>>> related to my novice-level experience.
>>>
>>> 1. in a logistic regression study, where i am predicting the probability
>>> of the response being 1 (for example) - there exists a continuum of
>>> probability values - and a finite number of {1,0} realities when i either
>>> look within the original data set, or with a new 'verification' data set.
>>> I understand that drawing a line through the probabilities returned from
>>> the logistic regression is a loss of information, but there are times
>>> when a 'hard' decision requiring prediction of {1,0} is required. I have
>>> found that the ROCR package (not necessarily the ROC Curve) can be useful
>>> in identifying the probability cutoff where accuracy is maximized. Is
>>> this an unreasonable way of using logistic regression as a predictor?
>
> Thanks for the detailed response Frank. My follow-up questions are below:
>
>> Logistic regression (with suitable attention to not assuming linearity
>> and to avoiding overfitting) is a great way to estimate P[Y=1].  Given
>> good predicted P[Y=1] and utilities (losses, costs) for incorrect
>> positive and negative decisions, an optimal decision is one that
>> optimizes expected utility.  The ROC curve does not play a direct role
>> in this regard.  
>
> Ok.
>
>> If per-subject utilities are not available, the analyst
>> may make various assumptions about utilities (including the unreasonable
>> but often used assumption that utilities do not vary over subjects) to
>> find a cutoff on P[Y=1].
>
> Can you elaborate on what exactly a "per-subject utility" is? In my case, I am
> trying to predict the occurrence of specific soil features based on two
> predictor variables: 1 continuous, the other categorical.  Thus far my
> evaluation of how well this method works is based on how often I can
> correctly predict (a categorical) quality.

This could be called a per-unit utility in your case.  It is the
consequence of decisions at the point at which you decide Y=0 or Y=1.
If consequences are the same over all units, you just have to deal with
the single ratio of cost of false positive to cost of false negative.

One way to limit bad consequences is to not make any decision when the
predicted probability is in the middle, i.e., the decision is 'obtain
more data'.  That is a real advantage of having a continuous risk estimate.
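
A minimal sketch of the two ideas above, under stated assumptions: the costs (1 for a false positive, 4 for a false negative), the gray-zone limits, and the function name `decide()` are all hypothetical choices for illustration. With a single cost ratio for all units, predicting Y=1 has expected cost (1 - p) * cost_fp and predicting Y=0 has expected cost p * cost_fn, so Y=1 is the cheaper call whenever p > cost_fp / (cost_fp + cost_fn).

```r
## Hypothetical helper: turn predicted risks into decisions, given the two
## misclassification costs and an optional gray zone where no call is made.
decide <- function(p, cost_fp = 1, cost_fn = 4, gray = c(0.10, 0.40)) {
  ## Cutoff implied by the cost ratio: predict Y=1 when p * cost_fn exceeds
  ## (1 - p) * cost_fp.
  cutoff <- cost_fp / (cost_fp + cost_fn)
  out <- ifelse(p > cutoff, "Y=1", "Y=0")
  ## Inside the gray zone, defer the decision instead of forcing a class.
  out[p >= gray[1] & p <= gray[2]] <- "obtain more data"
  out
}

p <- c(0.02, 0.15, 0.25, 0.60, 0.95)      # made-up predicted risks
data.frame(p,
           decision            = decide(p),
           exp_cost_if_predict0 = p * 4,        # p * cost_fn
           exp_cost_if_predict1 = (1 - p) * 1)  # (1 - p) * cost_fp
```

With these made-up costs the implied cutoff is 0.2, far from the 0.5 that a "maximize accuracy" rule would tend to pick, which is the sense in which the cutoff comes from the utilities rather than from the ROC curve.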

>
>
>> A very nice feature of P[Y=1] is that error
>> probabilities are self-contained.  For example if P[Y=1] = .02 for a
>> single subject and you predict Y=0, the probability of an error is .02
>> by definition.  One doesn't need to compute an overall error probability
>> over the whole distribution of subjects' risks.  If the cost of a false
>> negative is C, the expected cost is .02*C in this example.
>
> Interesting. The hang-up that I am having is that I need to predict from
> {0,1}, as the direct users of this information are not currently interested
> in raw probabilities. As far as I know, in order to predict a class from a
> probability I need to use a cutoff... How else can I accomplish this without
> imposing a cutoff on the entire dataset? One thought: identify a cutoff for
> each level of the categorical predictor term in the model... (?)

You're right you have to ultimately use a cutoff (or better still,
educate the users about the meaning of probabilities and let them make
the decision without exposing the cutoff).  And see the comment
regarding gray zones above.

>
>>> 2. The ROC curve can be a helpful way of communicating false positives /
>>> false negatives to other users who are less familiar with the output and
>>> interpretation of logistic regression.
>> What is more useful than that is a rigorous calibration curve estimate
>> to demonstrate the faithfulness of predicted P[Y=1] and a histogram
>> showing the distribution of predicted P[Y=1]
>
> Ok. I can make that histogram - how would one go about making the 'rigorous
> calibration curve' ? Note that I have a training set, from which the model is
> built, and a smaller testing set for evaluation.

See the val.prob function in the Design package.  This assumes your test
samples and training samples are both large and are independent.
Otherwise data splitting is too noisy a method and you might consider
calibrate.lrm in Design, fitting all the data.
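
As a rough stand-in for what such a calibration check looks like, here is a base-R sketch that does not use `val.prob` itself (as far as I know the Design package has since been superseded by `rms`, where functions of that name live; check the package documentation for the actual arguments). The data are simulated, with one continuous and one categorical predictor to mimic the setup described, and the smoothing via `lowess()` is an assumption of this sketch rather than a statement of what `val.prob` does internally.

```r
## Simulated data (made up): one continuous and one categorical predictor.
set.seed(1)
n  <- 2000
x  <- rnorm(n)
g  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
lp <- -0.5 + 1.2 * x + c(a = 0, b = 0.7, c = -0.7)[as.character(g)]
y  <- rbinom(n, 1, plogis(lp))
dat   <- data.frame(y, x, g)
train <- dat[1:1500, ]
test  <- dat[1501:2000, ]

## Fit on the training set, predict P[Y=1] on the independent test set.
fit    <- glm(y ~ x + g, family = binomial, data = train)
p_test <- predict(fit, newdata = test, type = "response")

## Histogram of predicted risks: mass near 0 and 1 indicates discrimination.
hist(p_test, breaks = 20, main = "Distribution of predicted P[Y=1]",
     xlab = "Predicted probability")

## Smooth calibration curve: observed 0/1 outcome smoothed against predicted
## risk. A well-calibrated model tracks the dashed 45-degree line.
cal <- lowess(p_test, test$y, iter = 0)
plot(cal$x, cal$y, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Predicted P[Y=1]", ylab = "Smoothed observed proportion")
abline(0, 1, lty = 2)
```

With a test set much smaller than this one, the smoothed curve gets noisy, which is the concern about data splitting raised above.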

>
>
>> .  Models that put a lot of
>> predictions near 0 or 1 are the most discriminating.  Calibration curves
>> and risk distributions are easier to explain than ROC curves.
>
> By 'risk distributions' do you mean said histogram?

yes

>
>> Too often
>> a statistician will solve for a cutoff on P[Y=1], imposing her own
>> utility function without querying any subjects.
>
> In this case I have picked a cutoff that resulted in the smallest number of
> incorrectly classified observations, or the highest kappa / tau statistics;
> the results were very close.

Proportion of incorrect classifications is an improper scoring rule that
tells you about the average performance of the method over all of the
units.  It is not that helpful for an individual unit, as all units may
have different predicted probabilities.  Because it's improper, you will
find examples where a powerful variable is added to a model and the
percent classified correctly decreases.
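
One way to see the weakness of percent classified correctly is to compare it with a proper scoring rule on the same predictions. The sketch below uses the Brier score (mean squared difference between predicted probability and outcome) as the proper rule; that choice, the toy numbers, and the helper names are assumptions of this illustration and are not taken from the message above. It shows the milder point that accuracy cannot tell a vague model from a sharp one, not the full "accuracy decreases when a strong variable is added" example.

```r
## Toy outcomes and two made-up sets of predicted probabilities.
## Both models make the same calls at a 0.5 cutoff (and the same two errors),
## but the second is far more confident where it is right.
y       <- c(1, 1, 1, 0, 0, 0, 1, 0)
p_vague <- c(0.55, 0.60, 0.55, 0.45, 0.40, 0.45, 0.45, 0.55)
p_sharp <- c(0.95, 0.90, 0.95, 0.05, 0.10, 0.05, 0.45, 0.55)

## Proportion classified correctly at a cutoff (an improper rule).
accuracy <- function(p, y, cutoff = 0.5) mean((p > cutoff) == (y == 1))

## Brier score: a proper scoring rule; smaller is better.
brier <- function(p, y) mean((p - y)^2)

data.frame(model    = c("vague", "sharp"),
           accuracy = c(accuracy(p_vague, y), accuracy(p_sharp, y)),
           brier    = c(brier(p_vague, y),    brier(p_sharp, y)))
```

Accuracy is identical for the two models because it only looks at which side of the cutoff each prediction falls on, while the Brier score rewards the sharper, better-separated probabilities.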

>
>
>>> 3. I have been using the area under the ROC Curve, kendall's tau, and
>>> cohen's kappa to evaluate the accuracy of a logistic regression based
>>> prediction, the last two statistics based on some probability cutoff
>>> identified beforehand.
>> ROC area (equiv. to Wilcoxon-Mann-Whitney and Somers' Dxy rank
>> correlation between pred. P[Y=1] and Y) is a measure of pure
>> discrimination, not a measure of accuracy per se.  Rank correlation
>> (concordance) measures do not require the use of cutoffs.
>
> Ok. Hopefully I am not abusing the kappa and tau statistics too badly by using
> them to evaluate a probability cutoff... (?)

Kappa, tau, Dxy, gamma, ROC area are all functions of the continuous
predicted risks and the observed Y=0,1.  They don't deal with cutoffs.
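
For the ROC area and Dxy specifically, the "no cutoff needed" point can be checked directly in base R: the area can be computed from the ranks of the predicted risks alone. The toy data and variable names below are made up for illustration; the formulas are the standard Wilcoxon-Mann-Whitney form of the area and Dxy = 2 * (AUC - 0.5).

```r
## Toy data (made up): observed outcome and continuous predicted risk.
y    <- c(1, 1, 0, 1, 0, 0)
risk <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)

n1 <- sum(y == 1)
n0 <- sum(y == 0)

## Wilcoxon-Mann-Whitney form: rank all risks together and sum the ranks
## in the Y=1 group. No cutoff appears anywhere.
r   <- rank(risk)                       # midranks are used if there are ties
auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

## Equivalent view: the proportion of (Y=1, Y=0) pairs in which the Y=1
## unit has the higher predicted risk (ties count one half).
d           <- outer(risk[y == 1], risk[y == 0], "-")
concordance <- mean((d > 0) + 0.5 * (d == 0))

## Somers' Dxy rank correlation is a linear rescaling of the same number.
dxy <- 2 * (auc - 0.5)

c(auc = auc, concordance = concordance, dxy = dxy)
```

Here 8 of the 9 possible (Y=1, Y=0) pairs are ordered correctly, so the area is 8/9 and Dxy is 7/9.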

>
>>> How does the topic of decision theory relate to some of the circumstances
>>> described above? Is there a better way to do some of these things?
>> See above re: expected losses/utilities.

Decision theory helps you translate maximum current information (often
summarized in a predicted risk) and utilities/losses/costs to decisions.
  I'm looking for a great background article on this; some useful stuff
is in the Encyclopedia of Statistical Sciences but other people may find
some great references for us.

Frank

>>
>> Good questions.
>>
>> Frank
>
> Thanks for the feedback.
>
> Cheers,
>
> Dylan
>
>
>>> Cheers,
>>>
>>> Dylan
>>>
>>>> -
>>>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>>>                       Department of Biostatistics   Vanderbilt
>>>> University

Frank Harrell
Department of Biostatistics, Vanderbilt University