Survival Analysis and Predict time-to-death

survivalUser
Dear All,

I would like to build a model, based on survival analysis on some data, that is able to predict the expected time until death for a new data instance.

Data
For each individual in the population I have, for each unit of time, the status information and several continuous covariates for that particular time. The data are right-censored, since at the end of the analyzed time interval individuals may still be alive and die later.

Model
I created the model using R and the survreg function:

lfit <- survreg(Surv(time, status) ~ X)

where:
- time is the time vector
- status is the status vector (0 alive, 1 death)
- X is a matrix formed by binding multiple covariate vectors

Predict time to death
Given a new individual with some covariate values, I would like to predict the estimated time to death, in other words the number of time units for which the individual will remain alive.

I think I can use this:

ptime <- predict(lfit, newdata=data.frame(X=NEWDATA), type='response')

Is that correct? Am I going to get the expected-time-to-death that I would like to have?

In theory, I could also provide the time information (the time at which the individual has those covariate values). Should I simply add that to newdata:

ptime <- predict(lfit, newdata=data.frame(time=TIME, X=NEWDATA), type='response')

Is that correct? Is this going to improve the prediction? (For my data, the time already elapsed should be an important variable.)

Any other suggestions or comments?

Thank you!

Re: Survival Analysis and Predict time-to-death

David Winsemius

On Aug 17, 2015, at 12:10 PM, survivalUser wrote:

> Dear All,
>
> I would like to build a model, based on survival analysis on some data, that
> is able to predict the /*expected time until death*/ for a new data
> instance.

Are you sure you want to use life expectancy as the outcome? In order to establish a mathematical expectation you need to know the risk at all times in the future; as pointed out in the print.survfit help page, the mean is undefined unless the last observation is a death. Very few datasets support such an estimate. If, on the other hand, you have sufficient events in the future, then you may be able to more readily justify an estimate of median survival.

The print.survfit function does offer a "restricted mean survival" or a time-to-median-survival as estimate options. See that function's help page.
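
As a rough illustration of the restricted mean and median mentioned here, the sketch below fits a Kaplan-Meier curve and asks for both. It assumes the survival package is installed and uses its bundled `lung` data as a stand-in, since the poster's data are not shown; the 365-day cutoff is an arbitrary choice for the example.

```r
library(survival)

# Kaplan-Meier fit on the 'lung' data shipped with the survival package
# (an illustrative stand-in; the poster's data are not available).
km <- survfit(Surv(time, status) ~ 1, data = lung)

# Print the curve summary with the mean restricted to the first 365 days.
# The cutoff of 365 is an arbitrary choice for this example.
print(km, rmean = 365)

# The same quantities are available programmatically from the summary table.
tab <- summary(km, rmean = 365)$table
median_days <- unname(tab["median"])
median_days
```

The restricted mean answers "how many of the first 365 days does a subject survive on average", which is well defined even when the last observation is censored.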

> Data
> For each individual in the population I have the, for each unit of time, the
> status information and several continuous covariates for that particular
> time. The data is right censored since at the end of the time interval
> analyzed, instances could be still alive and die later.
>
> Model
> I created the model using R and the survreg function:
>
> lfit <- survreg(Surv(time, status) ~ X)
>
> where:
> - time is the time vector
> - status is the status vector (0 alive, 1 death)
> - X is a bind of multiple vectors of covariates
>
> Predict time to death
> Given a new individual with some covariates values, I would like to predict
> the estimated time to death. In other words, the number of time units for
> which the individual will be still alive till his death.
>
> I think I can use this:
>
> ptime <- predict(lfit, newdata=data.frame(X=NEWDATA), type='response')

I don't see type="response" as a documented option in the `?predict.survreg` help page. Were you suggesting that code on the basis of some tutorial?

> Is that correct? Am I going to get the expected-time-to-death that I would
> like to have?

Most people would be using `survfit` to construct survival estimates.

>
> In theory, I could provide also the time information (the time when the
> individual has those covariates values), should I simply add that in the
> newdata:
>
> ptime <- predict(lfit, newdata=data.frame(time=TIME, X=NEWDATA),
> type='response')
>
> Is that correct?

This sounds like you are considering time-varying predictors. Adding them as a 'newdata' argument is most definitely not the correct method. As such I would ask if you really wanted to use a parametric survival model in the first place? The coxph function has facilities for time-varying covariates.
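
For readers unfamiliar with how coxph takes time-varying covariates, a minimal sketch follows. It assumes the survival package and uses its bundled `heart` data (the Stanford heart transplant study) purely as an example of the counting-process layout; the choice of covariates is illustrative and not a recommendation for the poster's data.

```r
library(survival)

# 'heart' is already in counting-process form: one row per subject per
# interval (start, stop], and 'transplant' may change between the rows
# belonging to the same subject.
head(heart)

# Surv(start, stop, event) tells coxph that each row covers only part of a
# subject's follow-up, which is how time-varying covariates are supplied.
cfit <- coxph(Surv(start, stop, event) ~ age + surgery + transplant,
              data = heart)
print(cfit)
```

The key point is that the time-varying values enter through the layout of the fitting data, not through `newdata` at prediction time.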


> Is this going to improve the prediction?

It would most likely severely complicate prediction. Survival estimates may be more problematic in that case on theoretical grounds.

> (for my data, the
> time already passed should be an important variable).
>
> Any other suggestions or comments?
>
> Thank you!
>

R-help at r-project.org

The real Rhelp mailing list  ....   not the impostor Rhelp at Nabble

-- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--

David Winsemius
Alameda, CA, USA


Re: Survival Analysis and Predict time-to-death

survivalUser
Thank you David for your answer.

Some follow-up questions:

- So, do you think that trying to estimate the life expectancy would be risky and probably not justifiable? Is there some sort of 'confidence' that the model could give me for a prediction?

- type=response - I found it here:
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/predict.survreg.html

I have not tried it yet, but I was planning to use it because the page says it predicts on the "original scale of the data".

- Yes, I think they are time-varying predictors. Would you suggest other models? (coxph?)
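
To make the `type` options of predict.survreg concrete, here is a small sketch. It assumes the survival package and its bundled `lung` data as a stand-in for the poster's data; the new individual (age 60, sex 1) is hypothetical. It also shows `se.fit = TRUE`, which returns a standard error for the prediction and is one possible reading of the 'confidence' asked about above.

```r
library(survival)

# Weibull is the default distribution for survreg.
fit <- survreg(Surv(time, status) ~ age + sex, data = lung)
new <- data.frame(age = 60, sex = 1)  # hypothetical new individual

# type = "response" is on the original time scale; type = "lp" is the
# linear predictor on the log-time scale.
p_resp <- predict(fit, newdata = new, type = "response", se.fit = TRUE)
p_lp   <- predict(fit, newdata = new, type = "lp")

# For this fit the two differ only by exp().
all.equal(unname(p_resp$fit), unname(exp(p_lp)))

# Predicted quartiles of the survival time for the same individual.
q <- as.numeric(predict(fit, newdata = new, type = "quantile",
                        p = c(0.25, 0.5, 0.75)))
q
```

Note that the "response" value is a prediction on the time scale, not necessarily the mean survival time; later replies in the thread discuss that distinction.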

Overall, do you think this analysis is feasible/correct? Is predicting how long a new individual (with those covariates) will remain alive a reasonable thing to do with a survival model?

Thank you again!

Re: Survival Analysis and Predict time-to-death

Bert Gunter-2
In reply to this post by David Winsemius
David:

I may have misunderstood you here, specifically:

"As such I would ask if you really wanted to use a parametric survival
model in the first place? "

The K-M curve is, of course, a **non-parametric** fit, and that is
why there can be no mean survival time unless the last point is a
death.

If you use the sample data to estimate a **parametric** model, then,
of course, you can estimate mean survival time (at any covariate
value) as the mean of the predicted parameter estimates (e.g. through
a link function).

I would certainly agree that the OP seems pretty confused about all
this. And apologies if I have misunderstood.

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll



Re: Survival Analysis and Predict time-to-death

David Winsemius

David Winsemius
Alameda, CA, USA


Re: Survival Analysis and Predict time-to-death

David Winsemius
In reply to this post by David Winsemius

On Aug 17, 2015, at 1:51 PM, David Winsemius wrote:

>
> On Aug 17, 2015, at 12:10 PM, survivalUser wrote:
>
>> Dear All,
>>
>> I would like to build a model, based on survival analysis on some data, that
>> is able to predict the /*expected time until death*/ for a new data
>> instance.
>
> Are you sure you want to use life expectancy as the outcome? In order to establish a mathematical expectation  you need to have know the risk at all time in the future, which as pointed out in the print.survfit help page is undefined unless the last observation is a death. Very few datasets support such an estimate. If on the other hand you have sufficient events in the future, then you may be able to more readily justify an estimate of a median survival.

Dear survivalUser;

I've been reminded that you later asked for a parametric model built with survreg. The above commentary applies to coxph models and objects, not to survreg objects. If you do have a parametric model, even with incomplete observation, then calculating life expectancy should be a simple matter of plugging the estimated parameters into the formula for the distribution's mean, since life expectancy is the statistical mean. So maybe you do want such a model. The default survreg distribution is "weibull", so just go to your mathematical statistics text and look up the formula for the mean of a Weibull distribution with the estimated parameters.
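
A minimal sketch of that plug-in calculation, assuming the survival package and its bundled `lung` data as a stand-in for the poster's data (the new individual is hypothetical). It relies on the survreg convention that log(T) = lp + sigma * W for a Weibull fit, so that the mean is exp(lp) * Gamma(1 + sigma), where sigma is the `scale` component of the fit.

```r
library(survival)

fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
new <- data.frame(age = 60, sex = 1)  # hypothetical new individual

# Linear predictor on the log-time scale and survreg's scale parameter.
lp    <- predict(fit, newdata = new, type = "lp")
sigma <- fit$scale

# Mean of the fitted Weibull for this individual:
# Weibull scale b = exp(lp), Weibull shape a = 1 / sigma,
# so E(T) = b * gamma(1 + 1/a) = exp(lp) * gamma(1 + sigma).
mean_time <- unname(exp(lp) * gamma(1 + sigma))
mean_time
```

This is an extrapolation from the fitted distribution, so it is only as trustworthy as the Weibull assumption itself.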

--
David.

David Winsemius
Alameda, CA, USA


Re: Survival Analysis and Predict time-to-death

David Winsemius
In reply to this post by Bert Gunter-2
Ooops. I meant to drop that other message but hit the send icon instead.


On Aug 17, 2015, at 3:39 PM, Bert Gunter wrote:

> David:
>
> I may have misunderstood you here, specifically:
>
> "As such I would ask if you really wanted to use a parametric survival
> model in the first place? "

>
> The K-M curve is , of course, a **non-parametric** fit, and that is
> why there can be no mean survival time unless the last point is a
> death.
>
> If you use the sample data to estimate a **parametric** model, then,
> of course, you can estimate mean survival time (at any covariate
> value) as the mean of the predicted parameter estimates (e.g. through
> a link function).

Agree. I should have thought about that. I can post a clarification, since this also means my earlier comments about getting the mean and median were off-target.

Best;
David.

David Winsemius
Alameda, CA, USA


Re: Survival analysis and predict time-to-death

Therneau, Terry M., Ph.D.
In reply to this post by survivalUser
I read this list a day late as a digest so my answers are rarely the first.  (Which is
nice as David W answers most of the survival questions for me!)

What you are asking is reasonable, and in fact is common practice in the realm of
industrial reliability, e.g., Meeker and Escobar, Statistical Methods for Reliability
Data.  Extrapolation of the survival curve to obtain the mean and percentiles of the
lifetime distribution for some device (e.g. a washing machine) is their bread and butter,
used for instance to determine the right size for an inventory of spare parts.  For most
of us on this list who do medical statistics and live in the Kaplan-Meier/ Cox model world
the ideas are uncommon.  I was lucky enough to sit through one of Bill Meeker's short
courses and retain some (minimal) memory of it.

   1. You are correct that parametric models are essential.  If the extrapolation is
substantial (30% or more censored, say), then the choice of distribution can be critical.
If failure is due to repeated insult, e.g., the multi-hit model, then the Weibull tends to
be preferred; if it is from degradation, e.g., flexing of a diaphragm, then the log-normal.
Beyond this you need more guidance than mine.

   2. The survreg routine assumes that log(y) ~ covariates + error.  For a log-normal
distribution the error is Gaussian and thus predict(fit, type='response') will be
exp(predicted mean of log time), which is not the predicted mean time.  For the Weibull the
error distribution is asymmetric, so things are muddier.  Each is the MLE prediction for the
subject, just not interpretable as a mean.  To get the actual mean you need to look up the
formulas for the Weibull and/or lognormal in a textbook, and map from the survreg
parameterization to whichever one the textbook uses.  The two parameterizations are never
the same.

   3. Another option is predicted quantiles.  ?predict.survreg shows how to get the entire
survival curve.  The mean can be obtained as the area under the survival curve.  Relevant
to your question, the expected time remaining for a subject still alive at time = 10, say,
is integral(S(t), from 10 to infinity) / S(10), where S is the survival curve.  You can also
read off quantiles of the expected remaining life.
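
Points 2 and 3 can be sketched for the log-normal case as follows. This assumes the survival package and its bundled `lung` data as a stand-in for the poster's data; the new individual and the value t0 = 10 are illustrative. For a survreg log-normal fit, the `scale` component is the standard deviation of log time, so the mean is exp(mu + sigma^2 / 2).

```r
library(survival)

lnfit <- survreg(Surv(time, status) ~ age + sex, data = lung,
                 dist = "lognormal")
new <- data.frame(age = 60, sex = 1)  # hypothetical new individual

mu    <- unname(predict(lnfit, newdata = new, type = "lp"))  # mean of log time
sigma <- lnfit$scale                                         # sd of log time

# type = "response" is exp(mu), not the mean time.
resp <- unname(predict(lnfit, newdata = new, type = "response"))

# Mean from the log-normal formula, and again as the area under S(t).
mean_formula <- exp(mu + sigma^2 / 2)
S <- function(t) plnorm(t, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
mean_area <- integrate(S, lower = 0, upper = Inf)$value

# Expected remaining time for a subject still alive at t0.
t0 <- 10
remaining <- integrate(S, lower = t0, upper = Inf)$value / S(t0)

c(response = resp, mean = mean_formula, area = mean_area,
  remaining = remaining)
```

The gap between `response` and `mean` grows with sigma, which is why the two should not be used interchangeably.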

Terry Therneau
(author of the survival package)


Re: Survival Analysis and Predict time-to-death

Göran Broström-3
In reply to this post by David Winsemius


On 2015-08-18 01:44, David Winsemius wrote:

>
> On Aug 17, 2015, at 1:51 PM, David Winsemius wrote:
>
>>
>> On Aug 17, 2015, at 12:10 PM, survivalUser wrote:
>>
>>> Dear All,
>>>
>>> I would like to build a model, based on survival analysis on some
>>> data, that is able to predict the /*expected time until death*/
>>> for a new data instance.
>>
>> Are you sure you want to use life expectancy as the outcome? In
>> order to establish a mathematical expectation  you need to have
>> know the risk at all time in the future, which as pointed out in
>> the print.survfit help page is undefined unless the last
>> observation is a death. Very few datasets support such an estimate.
>> If on the other hand you have sufficient events in the future, then
>> you may be able to more readily justify an estimate of a median
>> survival.
>
> Dear survivalUser;
>
> I've been reminded that you later asked for a parametric model built
> with survreg. The above commentary applies to the coxph models and
> objects and not to survreg objects. If you do have a parametric
> model, even with incomplete observation, then calculating life
> expectancy should be a simple matter of plugging the estimated
> parameters into the formula for the distribution's mean, since
> life expectancy is the statistical mean. So maybe you do want such a
> model. The default survreg distribution is "weibull", so just go to
> your mathematical statistics text and look up the formula for the
> mean of a Weibull distribution with the estimated parameters.
>
No need for 'the mathematical statistics text': The necessary
information is found on the help page for the Weibull distribution:
E(T) = b Gamma(1 + 1/a), where 'b' is scale (really!) and 'a' is shape.
You must however take into account the special parametrization that is
used by 'survreg' (its reported scale is 1/a and its intercept is
log(b)); see its help page for how to do it.
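
As a minimal sketch of that conversion for a 'survreg' fit (not taken from the thread): the 'lung' data set shipped with the survival package and the covariate values for the new individual are assumptions chosen only for illustration. Because 'survreg' models log(T) = lp + scale * W, the linear predictor enters here as exp(lp), and for a Weibull fit predict(..., type = "response") returns that exp(lp), which is the Weibull scale 'b' rather than the mean.

```r
library(survival)

# Weibull AFT fit on the 'lung' data shipped with survival (illustrative only)
fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

# Hypothetical new individual
new_obs <- data.frame(age = 60, sex = 1)

# For dist = "weibull", type = "response" gives exp(lp): the Weibull scale 'b'
# for this individual, not the expected lifetime.
b <- unname(predict(fit, newdata = new_obs, type = "response"))

# survreg's scale is the reciprocal of the Weibull shape 'a'
a <- 1 / fit$scale

# E(T) = b * Gamma(1 + 1/a)
expected_life <- b * gamma(1 + 1 / a)

# The median is often easier to justify; predict() can return it directly
median_life <- unname(predict(fit, newdata = new_obs, type = "quantile", p = 0.5))

print(c(expected = expected_life, median = median_life))
```

The expectation extrapolates the fitted Weibull tail beyond the observed follow-up, so it is only as trustworthy as the distributional assumption.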

Alternatively, use 'aftreg' in the package 'eha' and get the same
parametrization as in base R.

After getting the baseline expectation by the formula above, simply
multiply that value by exp(-lp) to get the expected life for an
individual with linear predictor lp.

A useful alternative is simulation (use 'rweibull') or numerical
integration, especially for estimating remaining expected life 'later in
life', and for distributions other than the Weibull.
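
A rough sketch of both routes for the remaining expected life of someone who has already survived to time t0, i.e. E(T - t0 | T > t0) = (integral of S(u) from t0 to infinity) / S(t0). The shape, scale and t0 values below are made-up numbers in the 'rweibull' parametrization, not estimates from any data in this thread.

```r
# Hypothetical Weibull parameters (rweibull parametrization), illustration only
shape <- 1.5
scale <- 400
t0 <- 200   # time already survived

# Survival function S(t) = P(T > t)
surv <- function(t) pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)

# Numerical integration: integral of S(u) from t0 to Inf, divided by S(t0)
mrl_numeric <- integrate(surv, lower = t0, upper = Inf)$value / surv(t0)

# Simulation: draw lifetimes, keep those exceeding t0, average the excess
set.seed(1)
draws <- rweibull(2e5, shape = shape, scale = scale)
mrl_sim <- mean(draws[draws > t0] - t0)

print(c(numeric = mrl_numeric, simulated = mrl_sim))
```

Setting t0 to 0 recovers the ordinary expectation, and swapping 'pweibull'/'rweibull' for another distribution's functions gives the same calculation for non-Weibull models.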


Göran Broström
(author of the eha package)


Re: Survival Analysis and Predict time-to-death

David Winsemius
In reply to this post by survivalUser
It depends on several factors. You need answers to all these questions: How many events occurred? Was the period of observation long enough to cover a significant fraction of the life expectancy? Is there external evidence or theory that will help establish that this process should follow Weibull statistics?


David.

> On Aug 17, 2015, at 2:18 PM, survivalUser <[hidden email]> wrote:
>
> Thank you David for your answer.
>
> Some follow-up questions:
>
> - So, do you think that trying to estimate the life expectancy would be risky
> and probably not justifiable? Is there some sort of 'confidence' that the
> model could give me for a prediction?
>
> - type=response - I found it here:
> https://stat.ethz.ch/R-manual/R-devel/library/survival/html/predict.survreg.html
>
> I have not tried it yet, but I was planning to use that because it says it
> predicts on the "original scale of the data".
>
> - Yes, I think they are time-varying predictors. Would you suggest other
> models? (coxph?)
>
> Overall, do you think this analysis is feasible/correct? Is predicting how
> much time a new individual (with those covariates) will be alive till death
> a reasonable thing to do with a survival model?
>
> Thank you again!
>
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Survival-Analysis-and-Predict-time-to-death-tp4711198p4711207.html
> Sent from the R help mailing list archive at Nabble.com.