Difference between R and SAS in Concordance index in ordinal logistic regression

blackscorpio81
Dear R users,

Please allow me to ask for your help.
I am currently using Frank Harrell Jr's package "rms" to fit an ordinal logistic regression with proportional odds. To assess the model's predictive ability, the C concordance index is displayed; it equals 0.963.

This is the code I used with the attached data (data.csv):

library(rms)
a <- read.csv2("/data.csv", row.names = 1, na.strings = c("", " "), dec = ".")
lrm(DA ~ SJ + TJ, data = a)

Logistic Regression Model

lrm(formula = DA~SJ+TJ, data = a)

Frequencies of Responses

 1  2  3  4
 6 13  9  4

                      Model Likelihood     Discrimination    Rank Discrim.
                         Ratio Test           Indexes           Indexes
Obs            32    LR chi2      53.14    R2       0.875    C       0.963
max |deriv| 6e-06    d.f.             2    g        8.690    Dxy     0.925
                     Pr(> chi2) <0.0001    gr    5942.469    gamma   0.960
                                           gp       0.486    tau-a   0.673
                                           Brier    0.022

         Coef   S.E.  Wald Z  Pr(>|Z|)
y>=2  -0.6161 0.6715   -0.92    0.3589
y>=3  -6.5949 2.3750   -2.78    0.0055
y>=4 -16.2358 5.3737   -3.02    0.0025
SJ     1.4341 0.5180    2.77    0.0056
TJ     0.5312 0.2483    2.14    0.0324

I wanted to compare the results with SAS. I found the same slopes and intercepts with opposite signs, which is expected since R models the probabilities P(Y>=k|X) whereas SAS models the probabilities P(Y<=k|X) (see the attached PDF, SAS_Report_-_Logistic_Regression.pdf, page 2, table "Association des probabilités prédites et des réponses observées", i.e. "Association of Predicted Probabilities and Observed Responses").
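The sign flip can be checked numerically. The following is only a sketch in base R: it uses the coefficients printed by lrm above, while the covariate values SJ = 3 and TJ = 5 are hypothetical and chosen purely for illustration. It shows that negating every intercept and slope turns P(Y>=k|X) into P(Y<=k-1|X), so both parameterisations give the same class probabilities P(Y=k|X).

```r
# Coefficients as printed by lrm above; R parameterisation:
# logit P(Y >= k | X) = alpha_k + X beta, for k = 2, 3, 4.
alpha_r <- c(-0.6161, -6.5949, -16.2358)
beta_r  <- c(SJ = 1.4341, TJ = 0.5312)

# Hypothetical covariate values (an assumption, not taken from data.csv).
x <- c(SJ = 3, TJ = 5)
xb <- sum(beta_r * x)

# R side: exceedance probabilities P(Y >= k | x), k = 2, 3, 4.
p_ge <- plogis(alpha_r + xb)

# Same fit with every sign flipped, the parameterisation attributed to SAS
# in this thread: cumulative probabilities P(Y <= k | x), k = 1, 2, 3.
p_le <- plogis(-alpha_r - xb)

# Class probabilities P(Y = k | x), k = 1..4, from each parameterisation.
prob_r   <- -diff(c(1, p_ge, 0))
prob_sas <- diff(c(0, p_le, 1))
names(prob_r) <- names(prob_sas) <- paste0("Y=", 1:4)

print(rbind(prob_r, prob_sas))
```

The two rows printed at the end agree, which is why identical class probabilities on their own say nothing about the direction in which each program ranks the observations.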

I set the order of the levels myself.

I checked that the corresponding probabilities P(Y=k|X) are the same in both programs. But I can't understand why the C index drops from 0.963 in R to 0.332 in SAS.

I have read a lot about this, and it seems to me that the two programs use slightly different techniques to compute the C index; it is nevertheless surprising to see such a large shift in the results.

Does anyone have a clue about this?
Thank you very much for your help,
Blackscorpio

Re: Difference between R and SAS in Concordance index in ordinal logistic regression

Frank Harrell
lrm does some binning to make the calculations faster.  The exact calculation is obtained by running

f <- lrm(...)
rcorr.cens(predict(f), DA)

which results in:

       C Index            Dxy           S.D.              n        missing
    0.96814404     0.93628809     0.03808336    32.00000000     0.00000000
    uncensored Relevant Pairs     Concordant      Uncertain
   32.00000000   722.00000000   699.00000000     0.00000000

I.e., C=.968 instead of .963.  But this is even farther away than the value from SAS you reported.

If you don't believe the rcorr.cens result, create a tiny example and do the calculations by hand.
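A tiny by-hand example along these lines might look as follows. This is a sketch in base R, not the rcorr.cens implementation: the function name c_by_hand and the five-observation data set are made up for illustration, and ties in the predictions are counted as one half, which is a common convention assumed here. The last lines redo the arithmetic of the output above: 699 / 722 gives 0.968, and 722 is twice the 361 unordered pairs with different responses implied by the frequencies 6, 13, 9, 4, which suggests (as an inference from the numbers only) that each pair is counted in both orders.

```r
# C index by brute force: loop over all pairs with different responses.
c_by_hand <- function(pred, y) {
  concordant <- 0
  tied <- 0
  relevant <- 0
  n <- length(y)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next                 # pairs with equal response are not used
      relevant <- relevant + 1
      s <- sign(pred[i] - pred[j]) * sign(y[i] - y[j])
      if (s > 0) concordant <- concordant + 1  # prediction and response agree in order
      if (s == 0) tied <- tied + 1             # tied predictions
    }
  }
  c(C = (concordant + 0.5 * tied) / relevant,
    relevant = relevant, concordant = concordant, tied = tied)
}

# Hypothetical toy data: 8 relevant pairs, 6 of them concordant.
y    <- c(1, 1, 2, 2, 3)
pred <- c(0.2, 0.6, 0.5, 0.9, 0.8)
res  <- c_by_hand(pred, y)
print(res)            # C = 0.75

# Arithmetic check against the rcorr.cens output quoted above.
freq    <- c(6, 13, 9, 4)
n_pairs <- (sum(freq)^2 - sum(freq^2)) / 2   # unordered pairs with different Y
print(n_pairs)        # 361, i.e. half of the 722 "Relevant Pairs"
print(699 / 722)      # 0.968144...
```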
Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University

Re: Difference between R and SAS in Concordance index in ordinal logistic regression

blackscorpio

Dear Dr Harrell,
Thank you very much for your answer. I had also tried to compute the C index by hand on these data using the mean probabilities, and I found 0.968, as you just showed.
I now understand why I had a slight difference from the output of lrm, and I am convinced that this result is correct.

I read in the SAS help that PROC LOGISTIC also does some binning (BINWIDTH option):

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect010.htm
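To illustrate how binning can move the C index, here is a sketch in base R. It is not the algorithm used by lrm or by PROC LOGISTIC: the helper names c_index and bin, the floor-based binning rule, and the five-observation data set are all assumptions made for illustration. Coarse bins create ties between predictions (counted as one half here), which changes C slightly; a very small bin width gives back the exact value.

```r
# C index over all pairs with different responses; tied predictions count 1/2.
c_index <- function(pred, y) {
  pairs <- utils::combn(length(y), 2)   # all index pairs i < j
  i <- pairs[1, ]
  j <- pairs[2, ]
  keep <- y[i] != y[j]
  dy <- sign(y[i] - y[j])[keep]
  dp <- sign(pred[i] - pred[j])[keep]
  (sum(dp == dy) + 0.5 * sum(dp == 0)) / sum(keep)
}

# Illustrative binning rule: map each probability to the index of its bin.
bin <- function(p, width) floor(p / width)

# Hypothetical responses and predicted probabilities.
y <- c(1, 1, 2, 2, 3)
p <- c(0.12, 0.31, 0.28, 0.74, 0.71)

c_exact  <- c_index(p, y)               # no binning
c_coarse <- c_index(bin(p, 0.1), y)     # 0.74 and 0.71 fall into the same bin
c_fine   <- c_index(bin(p, 0.001), y)   # bins too narrow to create ties

print(c(exact = c_exact, coarse = c_coarse, fine = c_fine))
```

In this toy case the coarse bins shift C from 0.75 to 0.8125, a small change of the same kind as 0.963 versus 0.968; binning of this sort would not by itself be expected to turn a value near 1 into a value near 0.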

But I cannot explain why the difference between the two programs is so large, especially since the class probabilities are the same.

Do you think it could be because the mean probabilities are computed differently?

Thanks for your help and best regards,
OC


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Difference between R and SAS in Concordance index in ordinal logistic regression

Frank Harrell
In reply to this post by blackscorpio81
Please define 'mean probabilities'.

To compute the C-index or Dxy you need anything that is monotonically
related to the prediction of interest, including the linear combination
of covariates ignoring all intercepts.   In other words you don't need
to go to the trouble of computing probabilities unless you are binning,
as the binning is usually done on a controllable 0-1 scale.   When I bin
I just choose the middle intercept, I seem to recall.  Also try running
SAS with a very tiny BINWIDTH and see if you get 1 - .968 as the answer
for C.  [I wrote the original algorithm SAS uses for this in the old SAS
PROC LOGIST.  Binning was just for speed.]
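Both points can be seen on a toy example. The sketch below uses base R only; the helper c_index and the data are hypothetical, and ties are counted as one half by assumption. Any increasing transformation of the linear predictor (adding an intercept, applying the logistic function) leaves C unchanged, whereas reversing the direction of the prediction turns C into 1 - C when there are no tied predictions.

```r
# C index from the signs of all pairwise differences (each pair appears in
# both orders, which does not change the ratio).
c_index <- function(pred, y) {
  dy <- sign(outer(y, y, "-"))
  dp <- sign(outer(pred, pred, "-"))
  relevant <- dy != 0
  (sum(dp[relevant] == dy[relevant]) + 0.5 * sum(dp[relevant] == 0)) /
    sum(relevant)
}

# Hypothetical responses and linear predictor values (covariate part only).
y  <- c(1, 1, 2, 2, 3)
lp <- c(-1.5, 0.4, 0.1, 2.2, 1.9)

c_lp       <- c_index(lp, y)             # linear predictor, no intercept
c_shifted  <- c_index(3 * lp + 10, y)    # increasing linear transformation
c_prob     <- c_index(plogis(lp), y)     # probability scale
c_reversed <- c_index(-lp, y)            # prediction pointing the other way

print(c(lp = c_lp, shifted = c_shifted, prob = c_prob, reversed = c_reversed))
```

Here the first three values are 0.75 and the last is 0.25. A C index well below 0.5 from one program and well above 0.5 from another is therefore a hint that the two are ranking the response in opposite directions.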

You might also re-run SAS after negating the response variable.
Frank

Re: Difference between R and SAS in Concordance index in ordinal logistic regression

blackscorpio

Dear Dr Harrell,

By "mean probabilities" I was referring to the values computed with the command predict(...,type="mean").
I set the BINWIDTH in SAS to 0.0001, as you suggested.
After negating the predictors, I found a C index of 0.968, which is exactly the same as rcorr.cens in R and almost the same as lrm, as you explained.
This solves the problem.
For information, I had tried changing the BINWIDTH value several times before posting my message. The problem was that I got the same results every time, whatever the bin width, which I couldn't understand.
I have just discovered that Enterprise Guide did not take these changes into account, whereas SAS did. This enabled me to find the correct results, thanks to your advice.

Thank you again for your help,
With best wishes,
Olivier

> Date: Thu, 24 Jan 2013 13:09:46 -0600
> From: [hidden email]
> To: [hidden email]
> Subject: Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression
>
> Please define 'mean probabilities'.
>
> To compute the C-index or Dxy you need anything that is monotonically
> related to the prediction of interest, including the linear combination
> of covariates ignoring all intercepts.   In other words you don't need
> to go to the trouble of computing probabilities unless you are binning,
> as the binning is usually done on a controllable 0-1 scale.   When I bin
> I just choose the middle intercept, I seem to recall.  Also try running
> SAS with a very tiny BINWIDTH and see if you get 1 - .968 as the answer
> for C.  [I wrote the original algorithm SAS uses for this in the old SAS
> PROC LOGIST.  Binning was just for speed.]
>
> You might also re-run SAS after negating the response variable.
> Frank
>
> blackscorpio wrote
> > Dear Dr Harrell,
> > Thank you very much for your answer. Actually I also tried to found the C
> > index by hand on these data using the mean probabilities and I found
> > 0.968, as you just showed.
> > I understand now why I had a slight difference with the outpout of lrm. I
> > am thus convinced that this result is correct.
> >
> > I read on the SAS help that the procedure logistic also proceed to some
> > binning (BINWIDTH option) :
> >
> > http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect010.htm
> >
> > But I cannot explain why the difference between the two softwares is that
> > huge, especially since the class probabilities are the same.
> >
> > Do you think it could be due to the fact that mean probabilities are
> > computed differently ?
> >
> > Thank for your help and best regards,
> > OC
> >
> >
> >> Date: Thu, 24 Jan 2013 05:28:13 -0800
> >> From:
>
> > f.harrell@
>
> >> To:
>
> > r-help@
>
> >> Subject: Re: [R] Difference between R and SAS in Corcordance index in
> >> ordinal logistic regression
> >>
> >> lrm does some binning to make the calculations faster.  The exact
> >> calculation
> >> is obtained by running
> >>
> >> f <- lrm(...)
> >> rcorr.cens(predict(f), DA), which results in:
> >>
> >>        C Index            Dxy           S.D.              n
> >> missing
> >>     0.96814404     0.93628809     0.03808336    32.00000000
> >> 0.00000000
> >>     uncensored Relevant Pairs     Concordant      Uncertain
> >>    32.00000000   722.00000000   699.00000000     0.00000000
> >>
> >> I.e., C=68 instead of .963.  But this is even farther away than the
> >> value
> >> from SAS you reported.
> >>
> >> If you don't believe the rcorr.cens result, create a tiny example and do
> >> the
> >> calculations by hand.
> >> Frank
> >>
> >>
> >> blackscorpio81 wrote
> >> > Dear R users,
> >> >
> >> > Please allow to me ask for your help.
> >> >  I am currently using Frank Harrell Jr package "rms" to model ordinal
> >> > logistic regression with proportional odds. In order to assess model
> >> > predictive ability, C concordance index is displayed and equals to
> >> 0.963.
> >> >
> >> > This is the code I used with the data attached
> >> > data.csv &lt;http://r.789695.n4.nabble.com/file/n4656409/data.csv&gt;
> >> >  :
> >> >
> >> >>require(rms)
> >> >>a<-read.csv2("/data.csv",row.names =,na.strings = c(""," "),dec=".")
> >> >>lrm(DA~SJ+TJ,data=
> >> >
> >> > Logistic Regression Model
> >> >
> >> > lrm(formula =A~SJ+TJ, data = a)
> >> >
> >> > Frequencies of Responses
> >> >
> >> >  1  2  3  4
> >> >  6 13  9  4
> >> >
> >> >                                               Model Likelihood
> >> > Discrimination                  Rank Discrim.
> >> >                                              Ratio Test
> >> > Indexes                               Indexes
> >> > Obs            32                      LR chi2      53.14
> >> R2
> >> > 0.875                      C       0.963
> >> > max |deriv| 6e-06             d.f.             2                    g
> >> > 8.690                Dxy     0.925
> >> >                                              Pr(> chi2) <0.0001
> >> gr
> >> > 5942.469                    gamma   0.960
> >> >
> >> > gp       0.486                      tau-a   0.673
> >> >
> >> > Brier    0.022
> >> >
> >> >                         Coef              S.E.        Wald  Z
> >> Pr(>|Z|)
> >> > y>=            -0.6161     0.6715        -0.92           0.3589
> >> > y>=            -6.5949     2.3750        -2.78          0.0055
> >> > y>=       -16.2358        5.3737         -3.02         0.0025
> >> > SJ                 1.4341      0.5180          2.77         0.0056
> >> > TJ                  0.5312      0.2483         2.14          0.0324
> >> >
> >> > I wanted to compare the results with SAS. I found the same slopes and
> >> > intercept with opposite signs, which is normal since R models the
> >> > probabilities P(Y>=X) whereas SAS models the probabilities P(Y<=k|X)
> >> > (see pdf attached, page 2 , table "Association des probabilités
> >> prédites
> >> > et des réponses observées").
> >> > SAS_Report_-_Logistic_Regression.pdf
> >> >
> >> &lt;http://r.789695.n4.nabble.com/file/n4656409/SAS_Report_-_Logistic_Regression.pdf&gt;
> >> >
> >> > I chose the order for levels.
> >> >
> >> > I controlled that the corresponding probabilities P(Y=X)  are the
> >> same
> >> > with both softwares. But I can't understand why in SAS the C index
> >> drops
> >> > from 0.963 down to 0.332.
> >> >
> >> > I read a lot of things about this and it seems to me that both
> >> softwares
> >> > use slightly different technique to compute the C index ; it is
> >> > nevertheless surprising to me to observe such a shift in the results.
> >> >
> >> > Does anyone have a clue on this ?
> >> > Thank you very much for you help
> >> > Blackscorpio
> >>
> >>
> >>
> >>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
         

Re: Difference between R and SAS in Corcordance index in ordinal logistic regression

Frank Harrell
This post has NOT been accepted by the mailing list yet.
For lrm fits, predict(fit, type='mean') predicts the mean Y, not a probability.
Frank
blackscorpio wrote
Dear Dr Harrell,

About the mean probabilities, I was referring to the ones computed with the command predict(...,type="mean").
I tried setting the binwidth in SAS to 0.0001 as you suggested.
After negating the predictors, I found a C index of 0.968, which is exactly the same as rcorr.cens in R and almost the same as lrm, as you explained.
This solves the problem.
For information, I tried changing the binwidth value several times before posting the message. The problem was that I got the same results every time, whatever the binwidth,
which I couldn't understand.
I just discovered that Enterprise Guide did not take these changes into account, whereas SAS did. This enabled me to find the correct results thanks to your advice.

Thank you again for your help,
With best wishes,
Olivier
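
For readers who want to follow Frank's suggestion of checking the numbers on a tiny example, here is a minimal base-R sketch. It is not the code used by lrm, rcorr.cens or SAS: the helper name c_index, the toy data, and the bin width of 0.25 are all made up for illustration, and ties in the score are counted as one half, which is a common convention but an assumption here. It illustrates three points from the exchange above: a monotone increasing transform of the predictor leaves C unchanged, negating the predictor gives 1 - C when there are no tied scores, and coarse binning can shift C by creating ties.

```r
# Hypothetical helper: C index over all pairs with different observed
# responses; pairs tied on the score contribute 1/2 (assumed convention).
c_index <- function(score, y) {
  dy <- outer(y, y, "-")           # y[i] - y[j] for every ordered pair
  ds <- outer(score, score, "-")   # score[i] - score[j]
  relevant <- dy > 0               # each unordered pair with y[i] != y[j], once
  concordant <- sum(ds[relevant] > 0)
  tied <- sum(ds[relevant] == 0)
  (concordant + 0.5 * tied) / sum(relevant)
}

# Toy data invented for illustration (NOT the data.csv from this thread)
y  <- c(1, 1, 2, 2, 3, 3, 4, 4)
lp <- c(-2.1, -0.4, -1.0, 0.3, 0.9, 0.1, 2.5, 1.7)

c_raw  <- c_index(lp, y)            # C on the linear predictor
c_prob <- c_index(plogis(lp), y)    # monotone increasing transform: same C
c_neg  <- c_index(-lp, y)           # direction reversed: 1 - C (no tied scores)

# Crude binning on the 0-1 scale; bin_width is a hypothetical value and this
# is not claimed to reproduce SAS's BINWIDTH algorithm.
bin_width <- 0.25
binned <- floor(plogis(lp) / bin_width) * bin_width
c_bin  <- c_index(binned, y)        # ties introduced by binning change C

print(c(raw = c_raw, prob = c_prob, negated = c_neg, binned = c_bin))
```

The point of the design is that only the ordering of the scores matters, so any discrepancy between two programs that agree on the fitted probabilities has to come from the direction of the score or from how ties and bins are handled.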

> Date: Thu, 24 Jan 2013 13:09:46 -0600
> From: [hidden email]
> To: [hidden email]
> Subject: Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression
>
> Please define 'mean probabilities'.
>
> To compute the C-index or Dxy you need anything that is monotonically
> related to the prediction of interest, including the linear combination
> of covariates ignoring all intercepts.   In other words you don't need
> to go to the trouble of computing probabilities unless you are binning,
> as the binning is usually done on a controllable 0-1 scale.   When I bin
> I just choose the middle intercept, I seem to recall.  Also try running
> SAS with a very tiny BINWIDTH and see if you get 1 - .968 as the answer
> for C.  [I wrote the original algorithm SAS uses for this in the old SAS
> PROC LOGIST.  Binning was just for speed.]
>
> You might also re-run SAS after negating the response variable.
> Frank
>
> blackscorpio wrote
> > Dear Dr Harrell,
> > Thank you very much for your answer. Actually I also tried to compute the C
> > index by hand on these data using the mean probabilities and I found
> > 0.968, as you just showed.
> > I now understand why I had a slight difference from the output of lrm. I
> > am thus convinced that this result is correct.
> >
> > I read in the SAS help that PROC LOGISTIC also does some
> > binning (BINWIDTH option):
> >
> > http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect010.htm
> >
> > But I cannot explain why the difference between the two programs is so
> > large, especially since the class probabilities are the same.
> >
> > Do you think it could be due to the fact that mean probabilities are
> > computed differently?
> >
> > Thanks for your help and best regards,
> > OC
> >
> >
> >> Date: Thu, 24 Jan 2013 05:28:13 -0800
> >> From: f.harrell@
> >> To: r-help@
> >> Subject: Re: [R] Difference between R and SAS in Corcordance index in
> >> ordinal logistic regression
> >>
> >> lrm does some binning to make the calculations faster.  The exact
> >> calculation is obtained by running
> >>
> >> f <- lrm(...)
> >> rcorr.cens(predict(f), DA)
> >>
> >> which results in:
> >>
> >>        C Index            Dxy           S.D.              n        missing
> >>     0.96814404     0.93628809     0.03808336    32.00000000     0.00000000
> >>     uncensored Relevant Pairs     Concordant      Uncertain
> >>    32.00000000   722.00000000   699.00000000     0.00000000
> >>
> >> I.e., C=.968 instead of .963.  But this is even farther away than the
> >> value from SAS you reported.
> >>
> >> If you don't believe the rcorr.cens result, create a tiny example and do
> >> the calculations by hand.
> >> Frank
> >>
> >>
> >> blackscorpio81 wrote
> >> > Dear R users,
> >> >
> >> > Please allow me to ask for your help.
> >> > I am currently using Frank Harrell Jr's package "rms" to fit an ordinal
> >> > logistic regression with proportional odds. To assess the model's
> >> > predictive ability, the C concordance index is displayed and equals
> >> > 0.963.
> >> >
> >> > This is the code I used with the attached data,
> >> > data.csv <http://r.789695.n4.nabble.com/file/n4656409/data.csv>:
> >> >
> >> >>require(rms)
> >> >>a<-read.csv2("/data.csv",row.names = 1,na.strings = c(""," "),dec=".")
> >> >>lrm(DA~SJ+TJ,data=a)
> >> >
> >> > Logistic Regression Model
> >> >
> >> > lrm(formula = DA~SJ+TJ, data = a)
> >> >
> >> > Frequencies of Responses
> >> >
> >> >  1  2  3  4
> >> >  6 13  9  4
> >> >
> >> >                      Model Likelihood     Discrimination    Rank Discrim.
> >> >                         Ratio Test           Indexes           Indexes
> >> > Obs            32    LR chi2      53.14    R2       0.875    C       0.963
> >> > max |deriv| 6e-06    d.f.             2    g        8.690    Dxy     0.925
> >> >                      Pr(> chi2) <0.0001    gr    5942.469    gamma   0.960
> >> >                                            gp       0.486    tau-a   0.673
> >> >                                            Brier    0.022
> >> >
> >> >          Coef      S.E.    Wald Z  Pr(>|Z|)
> >> > y>=2   -0.6161   0.6715   -0.92    0.3589
> >> > y>=3   -6.5949   2.3750   -2.78    0.0055
> >> > y>=4  -16.2358   5.3737   -3.02    0.0025
> >> > SJ      1.4341   0.5180    2.77    0.0056
> >> > TJ      0.5312   0.2483    2.14    0.0324
> >> >
> >> > I wanted to compare the results with SAS. I found the same slopes and
> >> > intercepts with opposite signs, which is expected since R models the
> >> > probabilities P(Y>=k|X) whereas SAS models the probabilities P(Y<=k|X)
> >> > (see the attached pdf, page 2, table "Association des probabilités
> >> > prédites et des réponses observées", i.e. "Association of Predicted
> >> > Probabilities and Observed Responses").
> >> > SAS_Report_-_Logistic_Regression.pdf
> >> > <http://r.789695.n4.nabble.com/file/n4656409/SAS_Report_-_Logistic_Regression.pdf>
> >> >
> >> > I chose the order for levels.
> >> >
> >> > I checked that the corresponding probabilities P(Y=k|X) are the same
> >> > with both programs. But I can't understand why in SAS the C index
> >> > drops from 0.963 down to 0.332.
> >> >
> >> > I have read a lot about this and it seems to me that the two programs
> >> > use slightly different techniques to compute the C index; it is
> >> > nevertheless surprising to me to observe such a shift in the results.
> >> >
> >> > Does anyone have a clue about this?
> >> > Thank you very much for your help
> >> > Blackscorpio
> >>
> >>
> >>
> >>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
         
Frank Harrell
Department of Biostatistics, Vanderbilt University

Re: Difference between R and SAS in Corcordance index in ordinal logistic regression

Frank Harrell
In reply to this post by blackscorpio81
For lrm fits, predict(fit, type='mean') predicts the mean Y, not a
probability.
Frank
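
To make the distinction between "mean Y" and a probability concrete, here is a rough base-R sketch of how an expected response on the 1..4 scale can be derived from a proportional odds fit. The intercepts and slopes are copied from the lrm output quoted earlier in the thread; the predictor values in x are hypothetical, and the calculation is only an illustration of the idea, not a copy of what predict(fit, type='mean') does internally.

```r
# Coefficients copied from the lrm output quoted in this thread
alpha <- c(-0.6161, -6.5949, -16.2358)   # intercepts for y>=2, y>=3, y>=4
beta  <- c(SJ = 1.4341, TJ = 0.5312)

# Hypothetical predictor values, chosen only for illustration
x <- c(SJ = 3, TJ = 4)

lp      <- sum(beta * x)                 # linear predictor without intercepts
p_ge    <- c(1, plogis(alpha + lp))      # P(Y>=1), P(Y>=2), P(Y>=3), P(Y>=4)
p_class <- p_ge - c(p_ge[-1], 0)         # P(Y=1), ..., P(Y=4)
mean_y  <- sum(1:4 * p_class)            # expected Y: a value in [1, 4], not a probability

print(round(p_class, 4))
print(mean_y)
```

Because every P(Y>=k) increases with lp, the mean computed this way is monotonically related to the linear predictor, which would explain why a C index computed by hand from these means agrees with the one from rcorr.cens(predict(f), DA).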

Frank Harrell
Department of Biostatistics, Vanderbilt University