changes in coxph in "survival" from older version?

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

changes in coxph in "survival" from older version?

Shi, Tao
Hi all,

I found that the two different versions of "survival" packages, namely 2.36-5
vs. 2.36-8 or later, give different results for coxph function.  Please see
below and the data is attached.  The second one was done on Linux, but Windows
gave the same results.  Could you please let me know which one I should trust?

Thanks,

...Tao





#####============================ R2.13.0, survival 2.36-9
=================================
> dat=read.csv("tmp1.csv", header=T)
> fit2 <- coxph(Surv(tt, cens) ~., data=dat)
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Ran out of iterations and did not converge
> summary(fit2)
## the estimates are different
.....
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    


attached base packages:
[1] grDevices datasets  splines   graphics  stats     tcltk     utils    
methods   base    


other attached packages:
[1] svSocket_0.9-51 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3    
survival_2.36-9

loaded via a namespace (and not attached):
[1] cluster_1.13.3  grid_2.13.0     lattice_0.19-23 svMisc_0.9-61  
tools_2.13.0  

#####============================================
=================================


#####============================ R2.12.2, survival 2.36-5
=================================
> dat=read.csv("tmp1.csv", header=T)
> fit2 <- coxph(Surv(tt, cens) ~., data=dat)
> summary(fit2)
## the estimates are different
.....
>
> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8  
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods  
[8] base    

other attached packages:
[1] survival_2.36-5
#####============================================
=================================
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Therneau, Terry M., Ph.D.

On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote:
> Hi all,
>
> I found that the two different versions of "survival" packages, namely 2.36-5
> vs. 2.36-8 or later, give different results for coxph function.  Please see
> below and the data is attached.  The second one was done on Linux, but Windows
> gave the same results.  Could you please let me know which one I should trust?
>
> Thanks,

 In your case, neither.  Your data set has 22 events and 17 predictors;
the rule of thumb for a reliable Cox model is 10-20 events per predictor
which implies no more than 2 for your data set.  As a result, the
coefficients of your model have very wide confidence intervals, the coef
for Male for instance has se of 3.26, meaning the CI goes from 1/26 to
26 times the estimate; i.e., there is no biological meaning to the
estimate.

  Nevertheless, why did coxph give a different answer?  The later
version 2.36-9 failed to converge (20 iterations) with a final
log-likelihood of -19.94, the earlier code converges in 10 iterations to
-19.91.  In version 2.36-6 an extra check was put into the maximizer for
coxph in response to an exceptional data set which caused the routine to
fail due to overflow of the exp function; the Newton-Raphson iteration
algorithm had made a terrible guess in it's iteration path, which can
happen with all NR based search methods.  
   I put a limit on the size the linear predictor in the Cox model of
21.  The basic argument is that exp(linear-predictor) = relative risk
for a subject, and that there is not much biological meaning for risks
to be less than exp(-21) ~ 1/(population of the earh).  There is more to
the reasoning, interested parties should look at the comments in
src/coxsafe.c, a 5 line routine with 25 lines of discussion.  I will
happily accept input the "best" value for the constant.

   I never expected to see a data set with both convergence of the LL
and linear predictors larger than +-15.  Looking at the fit (older code)
> round(fit2$linear.predictor, 2)
 [1]   2.26   0.89   4.96 -19.09 -12.10   1.39   2.82   3.10
 [9]  18.57 -25.25  22.94   8.75   5.52 -27.64  14.88 -23.41
[17]  13.70 -28.45  -1.84  10.04  12.62   2.54   6.33  -8.76
[25]   9.68   4.39   2.92   3.51   6.02 -17.24   5.97

This says that, if the model is to be believed, you have several near
immortals in the data set. (Everyone else on earth will perish first).

Terry Therneau

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Shi, Tao
Hi Terry,

Really appreciate your help!  Sorry for my late reply.

I did realize that there are way more predictors in the model.  My initial
thinking was use that as an initial model for stepwise model selection.  Now I
wonder if the model selection result is still valid if the initial model didn't
even converge?

Thanks!

...Tao




----- Original Message ----

> From: Terry Therneau <[hidden email]>
> To: "Shi, Tao" <[hidden email]>
> Cc: [hidden email]
> Sent: Thu, May 12, 2011 6:42:09 AM
> Subject: Re: changes in coxph in "survival" from older version?
>
>
> On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote:
> > Hi all,
> >
> > I found that the two different versions of "survival" packages, namely  
>2.36-5
>
> > vs. 2.36-8 or later, give different results for coxph  function.  Please see

> > below and the data is attached.  The  second one was done on Linux, but
>Windows
>
> > gave the same results.   Could you please let me know which one I should
>trust?
> >
> >  Thanks,
>
>  In your case, neither.  Your data set has 22 events and 17  predictors;
> the rule of thumb for a reliable Cox model is 10-20 events per  predictor
> which implies no more than 2 for your data set.  As a result,  the
> coefficients of your model have very wide confidence intervals, the  coef
> for Male for instance has se of 3.26, meaning the CI goes from 1/26  to
> 26 times the estimate; i.e., there is no biological meaning to  the
> estimate.
>
>   Nevertheless, why did coxph give a different  answer?  The later
> version 2.36-9 failed to converge (20 iterations)  with a final
> log-likelihood of -19.94, the earlier code converges in 10  iterations to
> -19.91.  In version 2.36-6 an extra check was put into the  maximizer for
> coxph in response to an exceptional data set which caused the  routine to
> fail due to overflow of the exp function; the Newton-Raphson  iteration
> algorithm had made a terrible guess in it's iteration path, which  can
> happen with all NR based search methods.  
>    I put a limit  on the size the linear predictor in the Cox model of
> 21.  The basic  argument is that exp(linear-predictor) = relative risk
> for a subject, and  that there is not much biological meaning for risks
> to be less than exp(-21)  ~ 1/(population of the earh).  There is more to
> the reasoning,  interested parties should look at the comments in
> src/coxsafe.c, a 5 line  routine with 25 lines of discussion.  I will
> happily accept input the  "best" value for the constant.
>
>    I never expected to see a data set  with both convergence of the LL
> and linear predictors larger than +-15.   Looking at the fit (older code)
> > round(fit2$linear.predictor, 2)
>   [1]   2.26   0.89   4.96 -19.09 -12.10   1.39    2.82   3.10
>  [9]  18.57 -25.25  22.94   8.75   5.52  -27.64  14.88 -23.41
> [17]  13.70 -28.45  -1.84   10.04  12.62   2.54   6.33  -8.76
> [25]   9.68    4.39   2.92   3.51   6.02 -17.24   5.97
>
> This says  that, if the model is to be believed, you have several near
> immortals in the  data set. (Everyone else on earth will perish first).
>
> Terry  Therneau
>
>
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Frank Harrell
Please don't be serious about doing variable selection with this dataset.
Frank
Shi, Tao wrote
Hi Terry,

Really appreciate your help!  Sorry for my late reply.

I did realize that there are way more predictors in the model.  My initial
thinking was use that as an initial model for stepwise model selection.  Now I
wonder if the model selection result is still valid if the initial model didn't
even converge?

Thanks!

...Tao




----- Original Message ----
> From: Terry Therneau <[hidden email]>
> To: "Shi, Tao" <[hidden email]>
> Cc: [hidden email]
> Sent: Thu, May 12, 2011 6:42:09 AM
> Subject: Re: changes in coxph in "survival" from older version?
>
>
> On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote:
> > Hi all,
> >
> > I found that the two different versions of "survival" packages, namely  
>2.36-5
>
> > vs. 2.36-8 or later, give different results for coxph  function.  Please see

> > below and the data is attached.  The  second one was done on Linux, but
>Windows
>
> > gave the same results.   Could you please let me know which one I should
>trust?
> >
> >  Thanks,
>
>  In your case, neither.  Your data set has 22 events and 17  predictors;
> the rule of thumb for a reliable Cox model is 10-20 events per  predictor
> which implies no more than 2 for your data set.  As a result,  the
> coefficients of your model have very wide confidence intervals, the  coef
> for Male for instance has se of 3.26, meaning the CI goes from 1/26  to
> 26 times the estimate; i.e., there is no biological meaning to  the
> estimate.
>
>   Nevertheless, why did coxph give a different  answer?  The later
> version 2.36-9 failed to converge (20 iterations)  with a final
> log-likelihood of -19.94, the earlier code converges in 10  iterations to
> -19.91.  In version 2.36-6 an extra check was put into the  maximizer for
> coxph in response to an exceptional data set which caused the  routine to
> fail due to overflow of the exp function; the Newton-Raphson  iteration
> algorithm had made a terrible guess in it's iteration path, which  can
> happen with all NR based search methods.  
>    I put a limit  on the size the linear predictor in the Cox model of
> 21.  The basic  argument is that exp(linear-predictor) = relative risk
> for a subject, and  that there is not much biological meaning for risks
> to be less than exp(-21)  ~ 1/(population of the earh).  There is more to
> the reasoning,  interested parties should look at the comments in
> src/coxsafe.c, a 5 line  routine with 25 lines of discussion.  I will
> happily accept input the  "best" value for the constant.
>
>    I never expected to see a data set  with both convergence of the LL
> and linear predictors larger than +-15.   Looking at the fit (older code)
> > round(fit2$linear.predictor, 2)
>   [1]   2.26   0.89   4.96 -19.09 -12.10   1.39    2.82   3.10
>  [9]  18.57 -25.25  22.94   8.75   5.52  -27.64  14.88 -23.41
> [17]  13.70 -28.45  -1.84   10.04  12.62   2.54   6.33  -8.76
> [25]   9.68    4.39   2.92   3.51   6.02 -17.24   5.97
>
> This says  that, if the model is to be believed, you have several near
> immortals in the  data set. (Everyone else on earth will perish first).
>
> Terry  Therneau
>
>
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Shi, Tao
Hi Frank,

I know it's kind of beyond the scope of R help, but I would appreciate it if you
can elaborate further on this.  Are you worrying about this variable selection
approach or you think that there is something wrong with the data (it's actually
a real dataset)?  If it's the first one, I believe I can always narrow down the
variables based on univarate analysis and build a multivariate model from that.

Many thanks in advance.

...Tao




----- Original Message ----

> From: Frank Harrell <[hidden email]>
> To: [hidden email]
> Sent: Mon, May 16, 2011 11:25:20 AM
> Subject: Re: [R] changes in coxph in "survival" from older version?
>
> Please don't be serious about doing variable selection with this  dataset.
> Frank
>
> Shi, Tao wrote:
> >
> > Hi Terry,
> >
> > Really appreciate your help!  Sorry for my late reply.
> >
> > I did realize that there are way more predictors in the model.  My  initial
> > thinking was use that as an initial model for stepwise model  selection.
> > Now I
> > wonder if the model selection result is still  valid if the initial model
> > didn't
> > even converge?
> >
> > Thanks!
> >
> > ...Tao
> >
> >
> >
> >
> > ----- Original Message ----
> >> From: Terry Therneau &lt;[hidden email]&gt;
> >> To:  "Shi, Tao" &lt;[hidden email]&gt;
> >> Cc: [hidden email]
> >> Sent:  Thu, May 12, 2011 6:42:09 AM
> >> Subject: Re: changes in coxph in  "survival" from older version?
> >>
> >>
> >> On Wed,  2011-05-11 at 16:11 -0700, Shi, Tao wrote:
> >> > Hi all,
> >>  >
> >> > I found that the two different versions of "survival"  packages, namely  
> >>2.36-5
> >>
> >> > vs.  2.36-8 or later, give different results for coxph  function.
> >>  Please see
> >
> >> > below and the data is attached.   The  second one was done on Linux, but
> >>Windows
> >>
> >> > gave the same results.   Could you please let  me know which one I
> >> should
> >>trust?
> >> >
> >> >  Thanks,
> >>
> >>  In your case,  neither.  Your data set has 22 events and 17  predictors;
> >>  the rule of thumb for a reliable Cox model is 10-20 events per   predictor
> >> which implies no more than 2 for your data set.  As a  result,  the
> >> coefficients of your model have very wide  confidence intervals, the  coef
> >> for Male for instance has se of  3.26, meaning the CI goes from 1/26  to
> >> 26 times the estimate;  i.e., there is no biological meaning to  the
> >>  estimate.
> >>
> >>   Nevertheless, why did coxph give a  different  answer?  The later
> >> version 2.36-9 failed to  converge (20 iterations)  with a final
> >> log-likelihood of  -19.94, the earlier code converges in 10  iterations to
> >>  -19.91.  In version 2.36-6 an extra check was put into the  maximizer  for
> >> coxph in response to an exceptional data set which caused  the  routine to
> >> fail due to overflow of the exp function; the  Newton-Raphson  iteration
> >> algorithm had made a terrible guess  in it's iteration path, which  can
> >> happen with all NR based  search methods.  
> >>    I put a limit  on the size  the linear predictor in the Cox model of
> >> 21.  The basic   argument is that exp(linear-predictor) = relative risk
> >> for a  subject, and  that there is not much biological meaning for  risks
> >> to be less than exp(-21)  ~ 1/(population of the  earh).  There is more to
> >> the reasoning,  interested  parties should look at the comments in
> >> src/coxsafe.c, a 5 line   routine with 25 lines of discussion.  I will
> >> happily accept  input the  "best" value for the constant.
> >>
> >>     I never expected to see a data set  with both convergence of the  LL
> >> and linear predictors larger than +-15.   Looking at the fit  (older code)
> >> > round(fit2$linear.predictor, 2)
> >>    [1]   2.26   0.89   4.96 -19.09 -12.10   1.39     2.82   3.10
> >>  [9]  18.57 -25.25  22.94    8.75   5.52  -27.64  14.88 -23.41
> >> [17]  13.70  -28.45  -1.84   10.04  12.62   2.54   6.33   -8.76
> >> [25]   9.68    4.39   2.92    3.51   6.02 -17.24   5.97
> >>
> >> This says   that, if the model is to be believed, you have several near
> >>  immortals in the  data set. (Everyone else on earth will perish  first).
> >>
> >> Terry  Therneau
> >>
> >>
> >>
> >>
> >
> >  ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the  posting guide
> > http://www.R-project.org/posting-guide.html
> > and  provide commented, minimal, self-contained, reproducible code.
> >
>
>
> -----
> Frank Harrell
> Department of Biostatistics, Vanderbilt  University
> --
> View this message in context:  
>http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3527017.html
>
> Sent  from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting  guide http://www.R-project.org/posting-guide.html
> and provide commented,  minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Frank Harrell
The problem is the use of variable selection without simultaneous shrinkage.  It will result in an entirely unreliable model and essentially will choose a random sample of the predictors.  See http://www.childrensmercy.org/stats/faq/faq12.asp
Frank
Shi, Tao wrote
Hi Frank,

I know it's kind of beyond the scope of R help, but I would appreciate it if you
can elaborate further on this.  Are you worrying about this variable selection
approach or you think that there is something wrong with the data (it's actually
a real dataset)?  If it's the first one, I believe I can always narrow down the
variables based on univarate analysis and build a multivariate model from that.

Many thanks in advance.

...Tao




----- Original Message ----
> From: Frank Harrell <[hidden email]>
> To: [hidden email]
> Sent: Mon, May 16, 2011 11:25:20 AM
> Subject: Re: [R] changes in coxph in "survival" from older version?
>
> Please don't be serious about doing variable selection with this  dataset.
> Frank
>
> Shi, Tao wrote:
> >
> > Hi Terry,
> >
> > Really appreciate your help!  Sorry for my late reply.
> >
> > I did realize that there are way more predictors in the model.  My  initial
> > thinking was use that as an initial model for stepwise model  selection.
> > Now I
> > wonder if the model selection result is still  valid if the initial model
> > didn't
> > even converge?
> >
> > Thanks!
> >
> > ...Tao
> >
> >
> >
> >
> > ----- Original Message ----
> >> From: Terry Therneau <[hidden email]>
> >> To:  "Shi, Tao" <[hidden email]>
> >> Cc: [hidden email]
> >> Sent:  Thu, May 12, 2011 6:42:09 AM
> >> Subject: Re: changes in coxph in  "survival" from older version?
> >>
> >>
> >> On Wed,  2011-05-11 at 16:11 -0700, Shi, Tao wrote:
> >> > Hi all,
> >>  >
> >> > I found that the two different versions of "survival"  packages, namely  
> >>2.36-5
> >>
> >> > vs.  2.36-8 or later, give different results for coxph  function.
> >>  Please see
> >
> >> > below and the data is attached.   The  second one was done on Linux, but
> >>Windows
> >>
> >> > gave the same results.   Could you please let  me know which one I
> >> should
> >>trust?
> >> >
> >> >  Thanks,
> >>
> >>  In your case,  neither.  Your data set has 22 events and 17  predictors;
> >>  the rule of thumb for a reliable Cox model is 10-20 events per   predictor
> >> which implies no more than 2 for your data set.  As a  result,  the
> >> coefficients of your model have very wide  confidence intervals, the  coef
> >> for Male for instance has se of  3.26, meaning the CI goes from 1/26  to
> >> 26 times the estimate;  i.e., there is no biological meaning to  the
> >>  estimate.
> >>
> >>   Nevertheless, why did coxph give a  different  answer?  The later
> >> version 2.36-9 failed to  converge (20 iterations)  with a final
> >> log-likelihood of  -19.94, the earlier code converges in 10  iterations to
> >>  -19.91.  In version 2.36-6 an extra check was put into the  maximizer  for
> >> coxph in response to an exceptional data set which caused  the  routine to
> >> fail due to overflow of the exp function; the  Newton-Raphson  iteration
> >> algorithm had made a terrible guess  in it's iteration path, which  can
> >> happen with all NR based  search methods.  
> >>    I put a limit  on the size  the linear predictor in the Cox model of
> >> 21.  The basic   argument is that exp(linear-predictor) = relative risk
> >> for a  subject, and  that there is not much biological meaning for  risks
> >> to be less than exp(-21)  ~ 1/(population of the  earh).  There is more to
> >> the reasoning,  interested  parties should look at the comments in
> >> src/coxsafe.c, a 5 line   routine with 25 lines of discussion.  I will
> >> happily accept  input the  "best" value for the constant.
> >>
> >>     I never expected to see a data set  with both convergence of the  LL
> >> and linear predictors larger than +-15.   Looking at the fit  (older code)
> >> > round(fit2$linear.predictor, 2)
> >>    [1]   2.26   0.89   4.96 -19.09 -12.10   1.39     2.82   3.10
> >>  [9]  18.57 -25.25  22.94    8.75   5.52  -27.64  14.88 -23.41
> >> [17]  13.70  -28.45  -1.84   10.04  12.62   2.54   6.33   -8.76
> >> [25]   9.68    4.39   2.92    3.51   6.02 -17.24   5.97
> >>
> >> This says   that, if the model is to be believed, you have several near
> >>  immortals in the  data set. (Everyone else on earth will perish  first).
> >>
> >> Terry  Therneau
> >>
> >>
> >>
> >>
> >
> >  ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the  posting guide
> > http://www.R-project.org/posting-guide.html
> > and  provide commented, minimal, self-contained, reproducible code.
> >
>
>
> -----
> Frank Harrell
> Department of Biostatistics, Vanderbilt  University
> --
> View this message in context:  
>http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3527017.html
>
> Sent  from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting  guide http://www.R-project.org/posting-guide.html
> and provide commented,  minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Therneau, Terry M., Ph.D.
In reply to this post by Shi, Tao
-- begin included message ---
I did realize that there are way more predictors in the model.  My
initial thinking was use that as an initial model for stepwise model
selection.  Now I  wonder if the model selection result is still valid
if the initial model didn't even converge?
--- end inclusion ---

You have 17 predictors with only 22 events.  All methods of "variable
selection" in such a scenario will give essentially random results.
There is simply not enough information present to determine a best
predictor or best subset of predictors.  

Terry Therneau

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Frank Harrell
It's worse if the model does converge because then you don't have a warning about the result being nonsense.
Frank

Terry Therneau-2 wrote
-- begin included message ---
I did realize that there are way more predictors in the model.  My
initial thinking was use that as an initial model for stepwise model
selection.  Now I  wonder if the model selection result is still valid
if the initial model didn't even converge?
--- end inclusion ---

You have 17 predictors with only 22 events.  All methods of "variable
selection" in such a scenario will give essentially random results.
There is simply not enough information present to determine a best
predictor or best subset of predictors.  

Terry Therneau

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Shi, Tao
Thank you, Frank and Terry, for all your answers!  I'll upgrade my "survival"
package for sure!

It seems to me that you two are pointing to two different issues: 1) Is stepwise
model selection a good approach (for any data)?  2) Whether the data I have has
enough information that even worth to model?  For #1, I'm not in a good position
to judge and need to read up on it.  For #2, I'm still a bit confused about
Terry's last comment.  If we forget about multivariate model building and just
look at variable one by one and select the best predictor (let's say it's highly
significant, e.g. p<0.0001), the resulting univariate model still can be wrong?

What if I use this data as a validation set to validate an existing model?  
Anything different?

Many thanks!

...Tao
   



----- Original Message ----

> From: Frank Harrell <[hidden email]>
> To: [hidden email]
> Sent: Tue, May 17, 2011 10:51:02 AM
> Subject: Re: [R] changes in coxph in "survival" from older version?
>
> It's worse if the model does converge because then you don't have a  warning
> about the result being nonsense.
> Frank
>
>
> Terry Therneau-2  wrote:
> >
> > -- begin included message ---
> > I did realize that  there are way more predictors in the model.  My
> > initial thinking  was use that as an initial model for stepwise model
> > selection.  Now  I  wonder if the model selection result is still valid
> > if the  initial model didn't even converge?
> > --- end inclusion ---
> >
> > You have 17 predictors with only 22 events.  All methods of  "variable
> > selection" in such a scenario will give essentially random  results.
> > There is simply not enough information present to determine a  best
> > predictor or best subset of predictors.  
> >
> >  Terry Therneau
> >
> >  ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the  posting guide
> > http://www.R-project.org/posting-guide.html
> > and  provide commented, minimal, self-contained, reproducible code.
> >
>
>
> -----
> Frank Harrell
> Department of Biostatistics, Vanderbilt  University
> --
> View this message in context:  
>http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3530024.html
>
> Sent  from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting  guide http://www.R-project.org/posting-guide.html
> and provide commented,  minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Frank Harrell
Hi Tao,

For you situation (and even MUCH larger number of events), multivariable modeling will be unreliable unless you use shrinkage, variable selection will select the wrong variables, and univariable screening leads to massive bias in later stages.

Terry converted me from SAS to S-Plus in 1991 when I visited Mayo Clinic and he showed me how natural the language was to put a loop around the kind of stepwise analyses requested by users.  The bootstrap showed that the list of predictors selected was very random.

Another demonstration of this is to bootstrap the ranks of the predictors, ranked by any measure you want (adjusted chi-square, univariable chi-square, ROC area).  The confidence intervals for the ranks will be extremely wide.
Frank

Shi, Tao wrote
Thank you, Frank and Terry, for all your answers!  I'll upgrade my "survival"
package for sure!

It seems to me that you two are pointing to two different issues: 1) Is stepwise
model selection a good approach (for any data)?  2) Whether the data I have has
enough information that even worth to model?  For #1, I'm not in a good position
to judge and need to read up on it.  For #2, I'm still a bit confused about
Terry's last comment.  If we forget about multivariate model building and just
look at variable one by one and select the best predictor (let's say it's highly
significant, e.g. p<0.0001), the resulting univariate model still can be wrong?

What if I use this data as a validation set to validate an existing model?  
Anything different?

Many thanks!

...Tao
   



----- Original Message ----
> From: Frank Harrell <[hidden email]>
> To: [hidden email]
> Sent: Tue, May 17, 2011 10:51:02 AM
> Subject: Re: [R] changes in coxph in "survival" from older version?
>
> It's worse if the model does converge because then you don't have a  warning
> about the result being nonsense.
> Frank
>
>
> Terry Therneau-2  wrote:
> >
> > -- begin included message ---
> > I did realize that  there are way more predictors in the model.  My
> > initial thinking  was use that as an initial model for stepwise model
> > selection.  Now  I  wonder if the model selection result is still valid
> > if the  initial model didn't even converge?
> > --- end inclusion ---
> >
> > You have 17 predictors with only 22 events.  All methods of  "variable
> > selection" in such a scenario will give essentially random  results.
> > There is simply not enough information present to determine a  best
> > predictor or best subset of predictors.  
> >
> >  Terry Therneau
> >
> >  ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the  posting guide
> > http://www.R-project.org/posting-guide.html
> > and  provide commented, minimal, self-contained, reproducible code.
> >
>
>
> -----
> Frank Harrell
> Department of Biostatistics, Vanderbilt  University
> --
> View this message in context:  
>http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3530024.html
>
> Sent  from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting  guide http://www.R-project.org/posting-guide.html
> and provide commented,  minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Therneau, Terry M., Ph.D.
In reply to this post by Shi, Tao

On Thu, 2011-05-19 at 17:03 -0700, Shi, Tao wrote:

> Thank you, Frank and Terry, for all your answers!  I'll upgrade my "survival"
> package for sure!
>
> It seems to me that you two are pointing to two different issues: 1) Is stepwise
> model selection a good approach (for any data)?  2) Whether the data I have has
> enough information that even worth to model?  For #1, I'm not in a good position
> to judge and need to read up on it.  For #2, I'm still a bit confused about
> Terry's last comment.  If we forget about multivariate model building and just
> look at variable one by one and select the best predictor (let's say it's highly
> significant, e.g. p<0.0001), the resulting univariate model still can be wrong?
>
> What if I use this data as a validation set to validate an existing model?  
> Anything different?
>
> Many thanks!

 Stepwise regression is a bad idea. Whether you let the machine do it or
you have a human do it (run all univariates, read the output, pick the
best) it is still stepwise selection.  It is still very unstable, even
with very large sample size.

  Terry T.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: changes in coxph in "survival" from older version?

Shi, Tao
Thank you very much, Frank and Terry, again, for all your answers!

...Tao




----- Original Message ----

> From: Terry Therneau <[hidden email]>
> To: "Shi, Tao" <[hidden email]>
> Cc: Frank Harrell <[hidden email]>; [hidden email]
> Sent: Fri, May 20, 2011 6:36:28 AM
> Subject: Re: [R] changes in coxph in "survival" from older version?
>
>
> On Thu, 2011-05-19 at 17:03 -0700, Shi, Tao wrote:
> > Thank you, Frank  and Terry, for all your answers!  I'll upgrade my
>"survival"
>
> >  package for sure!
> >
> > It seems to me that you two are pointing to  two different issues: 1) Is
>stepwise
>
> > model selection a good approach  (for any data)?  2) Whether the data I have
>has
>
> > enough information  that even worth to model?  For #1, I'm not in a good
>position
>
> > to  judge and need to read up on it.  For #2, I'm still a bit confused about

> > Terry's last comment.  If we forget about multivariate model  building and
>just
>
> > look at variable one by one and select the best  predictor (let's say it's
>highly
>
> > significant, e.g. p<0.0001), the  resulting univariate model still can be
>wrong?
> >
> > What if I use  this data as a validation set to validate an existing model?  

> >  Anything different?
> >
> > Many thanks!
>
>  Stepwise regression is  a bad idea. Whether you let the machine do it or
> you have a human do it (run  all univariates, read the output, pick the
> best) it is still stepwise  selection.  It is still very unstable, even
> with very large sample  size.
>
>   Terry T.
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.