

Hi all,
I found that two different versions of the "survival" package, namely 2.36-5
vs. 2.36-8 or later, give different results for the coxph function. Please see
below; the data is attached. The second run was done on Linux, but Windows
gave the same results. Could you please let me know which one I should trust?
Thanks,
...Tao
#####============================ R 2.13.0, survival 2.36-9 =================================
> dat=read.csv("tmp1.csv", header=T)
> fit2 <- coxph(Surv(tt, cens) ~ ., data=dat)
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
Ran out of iterations and did not converge
> summary(fit2)
## the estimates are different
.....
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
attached base packages:
[1] grDevices datasets splines graphics stats tcltk utils
methods base
other attached packages:
[1] svSocket_0.9-51 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.36-9
loaded via a namespace (and not attached):
[1] cluster_1.13.3  grid_2.13.0     lattice_0.19-23 svMisc_0.9-61   tools_2.13.0
#####=============================================================
#####============================ R 2.12.2, survival 2.36-5 =================================
> dat=read.csv("tmp1.csv", header=T)
> fit2 <- coxph(Surv(tt, cens) ~ ., data=dat)
> summary(fit2)
## the estimates are different
.....
>
> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] survival_2.36-5
#####=============================================================
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote:
> Hi all,
>
> I found that two different versions of the "survival" package, namely 2.36-5
> vs. 2.36-8 or later, give different results for the coxph function. Please see
> below and the data is attached. The second one was done on Linux, but Windows
> gave the same results. Could you please let me know which one I should trust?
>
> Thanks,
In your case, neither. Your data set has 22 events and 17 predictors;
the rule of thumb for a reliable Cox model is 10-20 events per predictor,
which implies no more than 2 predictors for your data set. As a result, the
coefficients of your model have very wide confidence intervals; the coef
for Male, for instance, has an se of 3.26, meaning the CI goes from 1/26 to
26 times the estimate; i.e., there is no biological meaning to the
estimate.
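A quick arithmetic check on the two numbers above (a base-R sketch; the 22/17 counts and the se of 3.26 are taken from this message):

```r
## Events-per-predictor arithmetic from the message above.
events <- 22
predictors <- 17
events / predictors   # about 1.3 events per predictor; the rule of thumb
                      # wants 10-20, i.e. at most 2 predictors here

## Why an se of 3.26 is hopeless: one standard error on the log-hazard
## scale is a multiplicative factor of exp(3.26) on the hazard-ratio scale.
exp(3.26)             # roughly 26, hence "from 1/26 to 26 times"
```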
Nevertheless, why did coxph give a different answer? The later
version 2.36-9 failed to converge (20 iterations) with a final
log-likelihood of -19.94; the earlier code converges in 10 iterations to
-19.91. In version 2.36-6 an extra check was put into the maximizer for
coxph in response to an exceptional data set which caused the routine to
fail due to overflow of the exp function; the Newton-Raphson iteration
algorithm had made a terrible guess in its iteration path, which can
happen with all NR-based search methods.
I put a limit on the size of the linear predictor in the Cox model of
21. The basic argument is that exp(linear predictor) = relative risk
for a subject, and there is not much biological meaning for risks
less than exp(-21) ~ 1/(population of the earth). There is more to
the reasoning; interested parties should look at the comments in
src/coxsafe.c, a 5-line routine with 25 lines of discussion. I will
happily accept input on the "best" value for the constant.
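The actual check lives in C, in src/coxsafe.c; the following is only an R sketch of the idea, with the function name reused purely for illustration:

```r
## Illustrative R version of the safeguard described above: clip the
## linear predictor at +/-21 before exponentiating, so exp() cannot
## overflow even when a Newton-Raphson step wanders far off.
coxsafe <- function(eta, limit = 21) pmin(pmax(eta, -limit), limit)

eta <- c(-40, -5, 0, 5, 40)   # a bad NR guess could easily produce these
exp(coxsafe(eta))             # all finite; note exp(-21) is about 7.6e-10,
                              # a vanishingly small relative risk
```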
I never expected to see a data set with both convergence of the LL
and linear predictors larger than ±15. Looking at the fit (older code):
> round(fit2$linear.predictor, 2)
[1] 2.26 0.89 4.96 19.09 12.10 1.39 2.82 3.10
[9] 18.57 25.25 22.94 8.75 5.52 27.64 14.88 23.41
[17] 13.70 28.45 1.84 10.04 12.62 2.54 6.33 8.76
[25] 9.68 4.39 2.92 3.51 6.02 17.24 5.97
This says that, if the model is to be believed, you have several near-immortals
in the data set. (Everyone else on earth will perish first.)
Terry Therneau


Hi Terry,
Really appreciate your help! Sorry for my late reply.
I did realize that there are far too many predictors in the model. My initial
thinking was to use that as an initial model for stepwise model selection. Now I
wonder whether the model selection result is still valid if the initial model didn't
even converge?
Thanks!
...Tao
----- Original Message -----
> From: Terry Therneau <[hidden email]>
> To: "Shi, Tao" <[hidden email]>
> Cc: [hidden email]
> Sent: Thu, May 12, 2011 6:42:09 AM
> Subject: Re: changes in coxph in "survival" from older version?


Please don't seriously consider doing variable selection with this dataset.
Frank
Frank Harrell
Department of Biostatistics, Vanderbilt University


Hi Frank,
I know it's kind of beyond the scope of R help, but I would appreciate it if you
could elaborate further on this. Are you worried about the variable selection
approach itself, or do you think there is something wrong with the data (it's
actually a real dataset)? If it's the former, I believe I can always narrow down the
variables based on univariate analysis and build a multivariate model from that.
Many thanks in advance.
...Tao
----- Original Message -----
> From: Frank Harrell <[hidden email]>
> To: [hidden email]
> Sent: Mon, May 16, 2011 11:25:20 AM
> Subject: Re: [R] changes in coxph in "survival" from older version?


The problem is the use of variable selection without simultaneous shrinkage. It will result in an entirely unreliable model and will essentially choose a random sample of the predictors. See http://www.childrensmercy.org/stats/faq/faq12.asp
Frank
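The effect of shrinkage can be seen even in the simpler linear-model setting. A small base-R sketch (simulated data; the ridge penalty of 5 is an arbitrary illustrative choice, not from the thread):

```r
## Ridge-style shrinkage in miniature: the penalized coefficient vector is
## pulled toward zero, trading a little bias for much less variance --
## the opposite of what unpenalized selection does.
set.seed(2)
n <- 25; p <- 10
x <- scale(matrix(rnorm(n * p), n, p))
y <- rnorm(n)                                    # pure-noise outcome

beta_ols   <- solve(crossprod(x), crossprod(x, y))               # unpenalized
beta_ridge <- solve(crossprod(x) + diag(5, p), crossprod(x, y))  # penalty = 5

sum(beta_ridge^2) < sum(beta_ols^2)   # TRUE: the penalized fit has smaller norm
```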
Frank Harrell
Department of Biostatistics, Vanderbilt University


----- begin included message -----
I did realize that there are way more predictors in the model. My
initial thinking was to use that as an initial model for stepwise model
selection. Now I wonder if the model selection result is still valid
if the initial model didn't even converge?
----- end inclusion -----
You have 17 predictors with only 22 events. All methods of "variable
selection" in such a scenario will give essentially random results.
There is simply not enough information present to determine a best
predictor or best subset of predictors.
Terry Therneau


It's worse if the model does converge because then you don't have a warning about the result being nonsense.
Frank
Frank Harrell
Department of Biostatistics, Vanderbilt University


Thank you, Frank and Terry, for all your answers! I'll upgrade my "survival"
package for sure!
It seems to me that you two are pointing to two different issues: 1) Is stepwise
model selection a good approach (for any data)? 2) Does the data I have contain
enough information to be worth modeling at all? For #1, I'm not in a good position
to judge and need to read up on it. For #2, I'm still a bit confused about
Terry's last comment. If we forget about multivariate model building and just
look at the variables one by one and select the best predictor (let's say it's highly
significant, e.g. p<0.0001), can the resulting univariate model still be wrong?
What if I use this data as a validation set to validate an existing model?
Anything different?
Many thanks!
...Tao
----- Original Message -----
> From: Frank Harrell <[hidden email]>
> To: [hidden email]
> Sent: Tue, May 17, 2011 10:51:02 AM
> Subject: Re: [R] changes in coxph in "survival" from older version?


Hi Tao,
For your situation (and even a MUCH larger number of events), multivariable modeling will be unreliable unless you use shrinkage, variable selection will select the wrong variables, and univariable screening leads to massive bias in later stages.
Terry converted me from SAS to S-Plus in 1991 when I visited the Mayo Clinic and he showed me how natural the language was for putting a loop around the kind of stepwise analyses requested by users. The bootstrap showed that the list of predictors selected was very random.
Another demonstration of this is to bootstrap the ranks of the predictors, ranked by any measure you want (adjusted chi-square, univariable chi-square, ROC area). The confidence intervals for the ranks will be extremely wide.
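A small simulation sketch of this bootstrap demonstration (invented data, not from the thread; univariate correlation stands in for the ranking measure):

```r
## Bootstrap the identity of the "best" predictor: with 17 noise
## predictors and a small sample, the winner changes from resample
## to resample.
set.seed(1)
n <- 30; p <- 17
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)                          # outcome unrelated to any predictor

top_pick <- function(idx) which.max(abs(cor(x[idx, ], y[idx])))

picks <- replicate(200, top_pick(sample(n, replace = TRUE)))
length(unique(picks))                  # many distinct "winners", not one
```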
Frank
Frank Harrell
Department of Biostatistics, Vanderbilt University


On Thu, 2011-05-19 at 17:03 -0700, Shi, Tao wrote:
> Thank you, Frank and Terry, for all your answers! I'll upgrade my "survival"
> package for sure!
>
> It seems to me that you two are pointing to two different issues: 1) Is stepwise
> model selection a good approach (for any data)? 2) Does the data I have contain
> enough information to be worth modeling at all? For #1, I'm not in a good position
> to judge and need to read up on it. For #2, I'm still a bit confused about
> Terry's last comment. If we forget about multivariate model building and just
> look at the variables one by one and select the best predictor (let's say it's highly
> significant, e.g. p<0.0001), can the resulting univariate model still be wrong?
>
> What if I use this data as a validation set to validate an existing model?
> Anything different?
>
> Many thanks!
Stepwise regression is a bad idea. Whether you let the machine do it or
you have a human do it (run all univariates, read the output, pick the
best), it is still stepwise selection. It is still very unstable, even
with a very large sample size.
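One way to see why even a "highly significant" univariate winner can mislead (a simulation sketch, not from the thread): the smallest of 17 p-values behaves very differently from a single p-value.

```r
## Screening 17 pure-noise predictors: how often does the *smallest*
## univariate p-value dip below 0.05, even though nothing is real?
set.seed(3)
min_p <- replicate(500, {
  y <- rnorm(30)
  min(sapply(1:17, function(j) cor.test(rnorm(30), y)$p.value))
})
mean(min_p < 0.05)   # near 1 - 0.95^17, i.e. roughly 0.58 -- not 0.05
```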
Terry T.


Thank you very much, Frank and Terry, again, for all your answers!
...Tao
----- Original Message -----
> From: Terry Therneau <[hidden email]>
> To: "Shi, Tao" <[hidden email]>
> Cc: Frank Harrell <[hidden email]>; [hidden email]
> Sent: Fri, May 20, 2011 6:36:28 AM
> Subject: Re: [R] changes in coxph in "survival" from older version?

