

Hello,

I'm working on a model to predict probabilities.

I don't really care about binary prediction accuracy.

I do really care about the accuracy of my probability predictions.

Frank was nice enough to point me to the val.prob function from the
Design library. It looks very promising for my needs.

I've put together some tests and run the val.prob analysis. It produces
some very informative graphs along with a bunch of performance measures.

Unfortunately, I'm not sure which measure, if any, is the "best" one.
I'm comparing hundreds of different models/parameter combinations/etc.
So ideally I'd like a single value or two as the "performance measure"
for each one. That way I can pick the "best" model from all my
experiments.

As mentioned above, I'm mainly interested in the accuracy of my
probability predictions.

Does anyone have an opinion about which measure I should look at?
(I see Dxy, C, R2, D, U, Brier, Emax, Eavg, etc.)

Thanks!

N
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Noah Silverman wrote:
> Does anyone have an opinion about which measure I should look at?
> (I see Dxy, C, R2, D, U, Brier, Emax, Eavg, etc.)
It all depends on the goal, i.e., the relative value you place on
absolute accuracy vs. discrimination ability. The Brier score combines
both and, other than interpretability, has many advantages.
Frank
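For concreteness, the Brier score is just the mean squared difference
between the predicted probabilities and the 0/1 outcomes. A base-R
sketch on simulated data (not the thread's actual model):

```r
## Brier score: mean squared error of probability forecasts (lower is
## better; a constant forecast of 0.5 scores exactly 0.25).
brier <- function(p, y) mean((p - y)^2)

set.seed(1)
p <- runif(1000)            # hypothetical predicted probabilities
y <- rbinom(1000, 1, p)     # outcomes drawn from those probabilities

brier(p, y)                 # calibrated forecasts: roughly 1/6 here
brier(rep(0.5, 1000), y)    # uninformative constant forecast: 0.25
```

With p uniform and y drawn from p, the expected score is E[p(1-p)] = 1/6;
the gap between that and 0.25 is what calibration plus discrimination
buy you.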

Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University


Thanks for the suggestion.

You explained that the Brier score combines both accuracy and
discrimination ability. If I understand you right, that is in relation
to binary classification.

I'm not concerned with binary classification, but with the accuracy of
the probability predictions.

Is there some kind of score that measures just the accuracy?
Thanks!
N
On 8/19/09 10:42 AM, Frank E Harrell Jr wrote:
> It all depends on the goal, i.e., the relative value you place on
> absolute accuracy vs. discrimination ability. The Brier score combines
> both and, other than interpretability, has many advantages.
>
> Frank


Noah Silverman wrote:
> Is there some kind of score that measures just the accuracy?
The Brier score has nothing to do with classification. It is a
probability accuracy score.
Frank



Frank,

That makes sense.

I just had a look at the actual algorithm calculating the Brier score.

One thing that confuses me is how the score is calculated.

If I understand the code correctly, it is just: sum((p - y)^2)/n

If I have an example with a label of 1 and a probability prediction of
.4, it is (.4 - 1)^2
(I know it is the average of these values across all the examples)

Wouldn't it make more sense to stratify the probabilities and then check
the accuracy of each level?

i.e. For predicted probabilities of .10 to .20, the data was actually
labeled true 18 percent of the time: mean(label)
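The stratification idea sketched in base R (simulated, calibrated
predictions; the bin width and data are illustrative, not from the
thread):

```r
## Bin the predicted probabilities and compare each bin's observed
## event rate, mean(label), to the bin's average prediction.
set.seed(2)
p <- runif(2000)
y <- rbinom(2000, 1, p)      # simulate a well-calibrated model

bins <- cut(p, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
observed  <- tapply(y, bins, mean)   # event rate per bin
predicted <- tapply(p, bins, mean)   # mean prediction per bin
round(cbind(predicted, observed), 2) # the two columns should roughly agree
```

Note that even with ~200 observations per bin, each observed rate
carries binomial noise of a few percentage points.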
On 8/19/09 11:51 AM, Frank E Harrell Jr wrote:
> The Brier score has nothing to do with classification. It is a
> probability accuracy score.
>
> Frank


Noah Silverman wrote:
> I just had a look at the actual algorithm calculating the Brier score.
> One thing that confuses me is how the score is calculated.
>
> If I understand the code correctly, it is just: sum((p - y)^2)/n
>
> If I have an example with a label of 1 and a probability prediction of
> .4, it is (.4 - 1)^2
> (I know it is the average of these values across all the examples)
Yes, and I seem to remember the original score is 1 minus that.
>
> Wouldn't it make more sense to stratify the probabilities and then check
> the accuracy of each level?
The stratification will bring a great deal of noise into the problem.
Better: loess calibration curves or decomposition of the Brier score
into discrimination and calibration components (which is not in the
software).
Frank
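A minimal sketch of the loess idea in base R, using lowess() on
simulated data (val.prob draws a similar curve itself; this is only to
show the mechanics):

```r
## Nonparametric calibration curve: smooth the 0/1 outcomes against the
## predicted probabilities instead of binning them.
set.seed(3)
p <- runif(2000)
y <- rbinom(2000, 1, p)         # simulate a well-calibrated model

cal <- lowess(p, y, iter = 0)   # iter = 0: skip the robustness steps,
                                # which misbehave on 0/1 responses
## cal$x is sorted p; cal$y estimates Pr(y = 1 | prediction = cal$x),
## so for a calibrated model the curve should track the identity line.
max(abs(cal$y - cal$x))         # worst-case estimated calibration error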



Frank,

Visually, the loess curve really helps me see how the model is doing.

That leads me to two more questions:

1) Can I somehow summarize the loess curve into a single value? (If I'm
comparing a few hundred models/parameters, it would be nice to have a
single "performance" value to use.)

2) Is there a way to focus in on a segment of the loess curve? With the
binning setup, I can quickly see that my model is very accurate for a
specific range of probabilities and then loses accuracy. For example,
with binning, my model is very accurate with probabilities from .1 to
.5. Above .5, it drops off significantly. This is actually very
useful for my application, as I know in the real world I can reliably
count on predictions below .5 and cannot count on predictions above .5.

Thanks for the continued help!

N
On 8/19/09 12:11 PM, Frank E Harrell Jr wrote:
> The stratification will bring a great deal of noise into the problem.
> Better: loess calibration curves or decomposition of the Brier score
> into discrimination and calibration components (which is not in the
> software).
>
> Frank


Noah Silverman wrote:
> That leads me to two more questions:
>
> 1) Can I somehow summarize the loess curve into a single value? (If I'm
> comparing a few hundred models/parameters, it would be nice to have a
> single "performance" value to use.)
Two measures are computed by the function: the mean absolute error and
the 0.9 quantile of the absolute error.
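Those two summaries can be reproduced from a smoothed calibration curve
along these lines (base R, simulated data; I'm assuming they correspond
to the Eavg and E90 entries in the val.prob output):

```r
## Mean and 0.9 quantile of the absolute calibration error, read off a
## lowess calibration curve -- two single numbers for ranking models.
set.seed(4)
p <- runif(2000)
y <- rbinom(2000, 1, p)

cal <- lowess(p, y, iter = 0)         # smoothed estimate of Pr(y=1 | p)
err <- abs(cal$y - cal$x)             # pointwise |calibration error|
c(Eavg = mean(err),
  E90  = quantile(err, 0.9, names = FALSE))

mean(err[cal$x > 0.5])                # same summary, restricted to the
                                      # segment above .5 (question 2)
```

Restricting err by cal$x, as in the last line, is one way to score just
the probability range you care about.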
>
> 2) Is there a way to focus in on a segment of the loess curve? With the
> binning setup, I can quickly see that my model is very accurate for a
> specific range of probabilities and then loses accuracy. For example,
> with binning, my model is very accurate with probabilities from .1 to
> .5. Above .5, it drops off significantly. This is actually very
> useful for my application, as I know in the real world I can reliably
> count on predictions below .5 and cannot count on predictions above .5.

That may be an artifact of binning. loess is much better for that.

Signing off for now,

Frank

>
> Thanks for the continued help!
>
> N


