coxph linear.predictors

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

coxph linear.predictors

stephenb
I would like to be able to construct hazard rates (or unconditional death prob) for many subjects from a given survfit.

This will involve adjusting the ( n.event/n.risk)
with (coxph object )$linear.predictors
I must be having another silly day as I cannot reproduce the linear predictor:

fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
fit$linear.predictors[1]
[1] 2.612756

coef(fit)*model.matrix(fit)[1,1]
     age
11.69021

The above is based on the help listing for coxph.object
coefficients: the coefficients of the linear predictor, which multiply
          the columns of the model matrix.  If the model is
          over-determined there will be missing values in the vector
          corresponding to the redundant columns in the model matrix.

Also, please comment whether n.event/n.risk gives the baseline hazard exp(alpha) ?

Please, help. Thanks a lot.

Stephen B


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: coxph linear.predictors

David Winsemius

On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:

> I would like to be able to construct hazard rates (or unconditional  
> death prob)

Hazards are not probabilities (since probabilities are constrained to  
the range [0,1] and hazards are unbounded upward.)

> for many subjects from a given survfit.
>
> This will involve adjusting the ( n.event/n.risk)
> with (coxph object )$linear.predictors
> I must be having another silly day as I cannot reproduce the linear  
> predictor:
>
> fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
> fit$linear.predictors[1]
> [1] 2.612756

That's the linear predictor (the beta*X) and that particular number  
only applies to the first case.

>
> coef(fit)*model.matrix(fit)[1,1]
>     age
> 11.69021
>

I don't know what that might be and you are not telling us what you  
think it is.


> The above is based on the help listing for coxph.object
> coefficients: the coefficients of the linear predictor, which multiply
>          the columns of the model matrix.  If the model is
>          over-determined there will be missing values in the vector
>          corresponding to the redundant columns in the model matrix.
>
> Also, please comment whether n.event/n.risk

The Nelson-Aalen estimator of the cumulative hazard as a function of  
intervals prior to t is sum( n-event(t)/ n.risk(t))

> gives the baseline hazard exp(alpha) ?

No. The "baseline hazard", as you are calling this, would be an  
estimate for persons with all covariates = 0, so in this case is for  
women of age=0. (Not a particularly interpretable result in many  
situations. The baseline hazard following treated ovarian cancer for  
neonates is not medically sensible.)

What is the purpose of this request? Is someone telling you you need  
to provide estimates for instantaneous hazards?


--

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: coxph linear.predictors

stephenb
>>
>> coef(fit)*model.matrix(fit)[1,1]
>>     age
>> 11.69021
>>

> I don't know what that might be and you are not telling us what you  
> think it is.

I think this the calculation of the linear predictor, multiplying

> (the beta*X)

I expected that coef(fit)*model.matrix(fit)[1,1]= fit$linear.predictors[1]

And since it does not, I am lost.

Ultimately I need unconditional death probability from day 1 to day t in the end. Survfit takes too much time as it does a lot of things I do not need. I hope I should be able to get the death prob for every subject from a single call to survfit and then rescale with exp(linear.predictor) for each subject.

Just a comment: as there is not deltat argument, I think that coxph assumes deltat=1 so the hazard is the conditional probability over the next time interval ( t(i-1), t(i) ). With dense data sampled daily this is a day interval.

Your comments are appreciated.

Stephen Bond

-----Original Message-----
From: David Winsemius [mailto:[hidden email]]
Sent: Wednesday, October 27, 2010 1:15 PM
To: Bond, Stephen
Cc: [hidden email]
Subject: Re: [R] coxph linear.predictors


On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:

> I would like to be able to construct hazard rates (or unconditional  
> death prob)

Hazards are not probabilities (since probabilities are constrained to  
the range [0,1] and hazards are unbounded upward.)

> for many subjects from a given survfit.
>
> This will involve adjusting the ( n.event/n.risk)
> with (coxph object )$linear.predictors
> I must be having another silly day as I cannot reproduce the linear  
> predictor:
>
> fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
> fit$linear.predictors[1]
> [1] 2.612756

That's the linear predictor (the beta*X) and that particular number  
only applies to the first case.

>
> coef(fit)*model.matrix(fit)[1,1]
>     age
> 11.69021
>

I don't know what that might be and you are not telling us what you  
think it is.


> The above is based on the help listing for coxph.object
> coefficients: the coefficients of the linear predictor, which multiply
>          the columns of the model matrix.  If the model is
>          over-determined there will be missing values in the vector
>          corresponding to the redundant columns in the model matrix.
>
> Also, please comment whether n.event/n.risk

The Nelson-Aalen estimator of the cumulative hazard as a function of  
intervals prior to t is sum( n-event(t)/ n.risk(t))

> gives the baseline hazard exp(alpha) ?

No. The "baseline hazard", as you are calling this, would be an  
estimate for persons with all covariates = 0, so in this case is for  
women of age=0. (Not a particularly interpretable result in many  
situations. The baseline hazard following treated ovarian cancer for  
neonates is not medically sensible.)

What is the purpose of this request? Is someone telling you you need  
to provide estimates for instantaneous hazards?


--

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: coxph linear.predictors

David Winsemius

On Oct 27, 2010, at 1:45 PM, Bond, Stephen wrote:

>>>
>>> coef(fit)*model.matrix(fit)[1,1]
>>>    age
>>> 11.69021
>>>
>
>> I don't know what that might be and you are not telling us what you
>> think it is.
>
> I think this the calculation of the linear predictor, multiplying
>
>> (the beta*X)
>
> I expected that coef(fit)*model.matrix(fit)[1,1]= fit
> $linear.predictors[1]
>
> And since it does not, I am lost.
>
> Ultimately I need unconditional death probability from day 1 to day  
> t in the end.

Now it sounds like you do want cap-lambda, the cumulative hazard.  
Let's call it Lambda

Since S(t) = exp(-Lambda(t)), you can invert to get Lambda(t) = -
log(S(t))

?survfit

Possibly:
plot(-log( survfit(fit)$surv ))

> Survfit takes too much time as it does a lot of things I do not  
> need. I hope I should be able to get the death prob for every  
> subject from a single call to survfit and then rescale with  
> exp(linear.predictor) for each subject.

I have run into slowness with that approach myself. The rms package  
provides survest and Predict methods, but if you wanted to stay in  
survival, then why not create a survfit()$surv object based on a  
vector of means and then calculate an exp(newdata) by which you would  
multiply the S(mean|t)? There is also a predict.coxph function with  
its type="risk" parameter that you might want to check.

>
> Just a comment: as there is not deltat argument, I think that coxph  
> assumes deltat=1 so the hazard is the conditional probability over  
> the next time interval ( t(i-1), t(i) ). With dense data sampled  
> daily this is a day interval.

That's not the way I understand it. The Lambda() and S() functions  
only change at the particular event times on a continuous time so they  
are not assumed to have integer values. Pretty sure that Therneau  
avoids presenting extraction methods for instantaneous hazards from  
Cox models. In his book "Modeling Survival Data" he mostly sticks to  
discussing cumulative hazard functions rather than estimating an  
interval or instantaneous hazard. And fortunately he visits rhelp so  
it's quite possible that my errors in this area will get expert review  
and correction.


>
> Your comments are appreciated.
>
> Stephen Bond
>
> -----Original Message-----
> From: David Winsemius [mailto:[hidden email]]
> Sent: Wednesday, October 27, 2010 1:15 PM
> To: Bond, Stephen
> Cc: [hidden email]
> Subject: Re: [R] coxph linear.predictors
>
>
> On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:
>
>> I would like to be able to construct hazard rates (or unconditional
>> death prob)
>
> Hazards are not probabilities (since probabilities are constrained to
> the range [0,1] and hazards are unbounded upward.)
>
>> for many subjects from a given survfit.
>>
>> This will involve adjusting the ( n.event/n.risk)
>> with (coxph object )$linear.predictors
>> I must be having another silly day as I cannot reproduce the linear
>> predictor:
>>
>> fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
>> fit$linear.predictors[1]
>> [1] 2.612756
>
> That's the linear predictor (the beta*X) and that particular number
> only applies to the first case.
>
>>
>> coef(fit)*model.matrix(fit)[1,1]
>>    age
>> 11.69021
>>
>
> I don't know what that might be and you are not telling us what you
> think it is.
>
>
>> The above is based on the help listing for coxph.object
>> coefficients: the coefficients of the linear predictor, which  
>> multiply
>>         the columns of the model matrix.  If the model is
>>         over-determined there will be missing values in the vector
>>         corresponding to the redundant columns in the model matrix.
>>
>> Also, please comment whether n.event/n.risk
>
> The Nelson-Aalen estimator of the cumulative hazard as a function of
> intervals prior to t is sum( n-event(t)/ n.risk(t))
>
>> gives the baseline hazard exp(alpha) ?
>
> No. The "baseline hazard", as you are calling this, would be an
> estimate for persons with all covariates = 0, so in this case is for
> women of age=0. (Not a particularly interpretable result in many
> situations. The baseline hazard following treated ovarian cancer for
> neonates is not medically sensible.)
>
> What is the purpose of this request? Is someone telling you you need
> to provide estimates for instantaneous hazards?
>
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: coxph linear.predictors

stephenb
What you suggest is exactly what I intended. The big problem is that

X*beta  != linear.predictor  # it works for glm

That's a big loose end. The survival documentation on p41 implies that the above should be an equality, but it is not. And I could not produce the surv vector using a variety of formulas, so I guess this is the spirit of SAS disguising itself as an R package.

Stephen Bond

-----Original Message-----
From: David Winsemius [mailto:[hidden email]]
Sent: Wednesday, October 27, 2010 2:17 PM
To: Bond, Stephen
Cc: [hidden email]
Subject: Re: [R] coxph linear.predictors


On Oct 27, 2010, at 1:45 PM, Bond, Stephen wrote:

>>>
>>> coef(fit)*model.matrix(fit)[1,1]
>>>    age
>>> 11.69021
>>>
>
>> I don't know what that might be and you are not telling us what you
>> think it is.
>
> I think this the calculation of the linear predictor, multiplying
>
>> (the beta*X)
>
> I expected that coef(fit)*model.matrix(fit)[1,1]= fit
> $linear.predictors[1]
>
> And since it does not, I am lost.
>
> Ultimately I need unconditional death probability from day 1 to day  
> t in the end.

Now it sounds like you do want cap-lambda, the cumulative hazard.  
Let's call it Lambda

Since S(t) = exp(-Lambda(t)), you can invert to get Lambda(t) = -
log(S(t))

?survfit

Possibly:
plot(-log( survfit(fit)$surv ))

> Survfit takes too much time as it does a lot of things I do not  
> need. I hope I should be able to get the death prob for every  
> subject from a single call to survfit and then rescale with  
> exp(linear.predictor) for each subject.

I have run into slowness with that approach myself. The rms package  
provides survest and Predict methods, but if you wanted to stay in  
survival, then why not create a survfit()$surv object based on a  
vector of means and then calculate an exp(newdata) by which you would  
multiply the S(mean|t)? There is also a predict.coxph function with  
its type="risk" parameter that you might want to check.

>
> Just a comment: as there is not deltat argument, I think that coxph  
> assumes deltat=1 so the hazard is the conditional probability over  
> the next time interval ( t(i-1), t(i) ). With dense data sampled  
> daily this is a day interval.

That's not the way I understand it. The Lambda() and S() functions  
only change at the particular event times on a continuous time so they  
are not assumed to have integer values. Pretty sure that Therneau  
avoids presenting extraction methods for instantaneous hazards from  
Cox models. In his book "Modeling Survival Data" he mostly sticks to  
discussing cumulative hazard functions rather than estimating an  
interval or instantaneous hazard. And fortunately he visits rhelp so  
it's quite possible that my errors in this area will get expert review  
and correction.


>
> Your comments are appreciated.
>
> Stephen Bond
>
> -----Original Message-----
> From: David Winsemius [mailto:[hidden email]]
> Sent: Wednesday, October 27, 2010 1:15 PM
> To: Bond, Stephen
> Cc: [hidden email]
> Subject: Re: [R] coxph linear.predictors
>
>
> On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:
>
>> I would like to be able to construct hazard rates (or unconditional
>> death prob)
>
> Hazards are not probabilities (since probabilities are constrained to
> the range [0,1] and hazards are unbounded upward.)
>
>> for many subjects from a given survfit.
>>
>> This will involve adjusting the ( n.event/n.risk)
>> with (coxph object )$linear.predictors
>> I must be having another silly day as I cannot reproduce the linear
>> predictor:
>>
>> fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
>> fit$linear.predictors[1]
>> [1] 2.612756
>
> That's the linear predictor (the beta*X) and that particular number
> only applies to the first case.
>
>>
>> coef(fit)*model.matrix(fit)[1,1]
>>    age
>> 11.69021
>>
>
> I don't know what that might be and you are not telling us what you
> think it is.
>
>
>> The above is based on the help listing for coxph.object
>> coefficients: the coefficients of the linear predictor, which  
>> multiply
>>         the columns of the model matrix.  If the model is
>>         over-determined there will be missing values in the vector
>>         corresponding to the redundant columns in the model matrix.
>>
>> Also, please comment whether n.event/n.risk
>
> The Nelson-Aalen estimator of the cumulative hazard as a function of
> intervals prior to t is sum( n-event(t)/ n.risk(t))
>
>> gives the baseline hazard exp(alpha) ?
>
> No. The "baseline hazard", as you are calling this, would be an
> estimate for persons with all covariates = 0, so in this case is for
> women of age=0. (Not a particularly interpretable result in many
> situations. The baseline hazard following treated ovarian cancer for
> neonates is not medically sensible.)
>
> What is the purpose of this request? Is someone telling you you need
> to provide estimates for instantaneous hazards?
>
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: coxph linear.predictors

stephenb
In reply to this post by David Winsemius
I think I found the first piece of it. Typical SAS:

> fit$linear.predictors[1]-coef(fit)*(ovarian$age[1]-mean(ovarian$age))
              age
-4.4408920985e-16

Well misrepresented in several places...

Stephen Bond

-----Original Message-----
From: David Winsemius [mailto:[hidden email]]
Sent: Wednesday, October 27, 2010 1:15 PM
To: Bond, Stephen
Cc: [hidden email]
Subject: Re: [R] coxph linear.predictors


On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:

> I would like to be able to construct hazard rates (or unconditional  
> death prob)

Hazards are not probabilities (since probabilities are constrained to  
the range [0,1] and hazards are unbounded upward.)

> for many subjects from a given survfit.
>
> This will involve adjusting the ( n.event/n.risk)
> with (coxph object )$linear.predictors
> I must be having another silly day as I cannot reproduce the linear  
> predictor:
>
> fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
> fit$linear.predictors[1]
> [1] 2.612756

That's the linear predictor (the beta*X) and that particular number  
only applies to the first case.

>
> coef(fit)*model.matrix(fit)[1,1]
>     age
> 11.69021
>

I don't know what that might be and you are not telling us what you  
think it is.


> The above is based on the help listing for coxph.object
> coefficients: the coefficients of the linear predictor, which multiply
>          the columns of the model matrix.  If the model is
>          over-determined there will be missing values in the vector
>          corresponding to the redundant columns in the model matrix.
>
> Also, please comment whether n.event/n.risk

The Nelson-Aalen estimator of the cumulative hazard as a function of  
intervals prior to t is sum( n-event(t)/ n.risk(t))

> gives the baseline hazard exp(alpha) ?

No. The "baseline hazard", as you are calling this, would be an  
estimate for persons with all covariates = 0, so in this case is for  
women of age=0. (Not a particularly interpretable result in many  
situations. The baseline hazard following treated ovarian cancer for  
neonates is not medically sensible.)

What is the purpose of this request? Is someone telling you you need  
to provide estimates for instantaneous hazards?


--

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: coxph linear.predictors

stephenb
In reply to this post by David Winsemius
To close the issue:

> S.ave <- survfit(fit) # this is average survival
> S.1 <- (S.ave$surv)^exp(fit$linear.predictors[1]) # get subject 1
> S.1
 [1] 0.848450223861993 0.696973203043377 0.530604282995790 0.391136296063732 0.228370373409774 0.132991279576665 0.071742127376216
 [8] 0.030988671457923 0.012340713882587 0.004553665545169 0.000865566322530 0.000135746409544

> survfit(fit,new=ovarian[1,])$surv   # subject 1 from survfit
 [1] 0.848450223861993 0.696973203043377 0.530604282995791 0.391136296063732 0.228370373409774 0.132991279576665 0.071742127376216
 [8] 0.030988671457923 0.012340713882587 0.004553665545169 0.000865566322530 0.000135746409544

A single call to survfit will suffice for thousands of subjects, avoiding the iterative loop calculations inside survfit.

Stephen Bond

-----Original Message-----
From: David Winsemius [mailto:[hidden email]]
Sent: Wednesday, October 27, 2010 1:15 PM
To: Bond, Stephen
Cc: [hidden email]
Subject: Re: [R] coxph linear.predictors


On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:

> I would like to be able to construct hazard rates (or unconditional  
> death prob)

Hazards are not probabilities (since probabilities are constrained to  
the range [0,1] and hazards are unbounded upward.)

> for many subjects from a given survfit.
>
> This will involve adjusting the ( n.event/n.risk)
> with (coxph object )$linear.predictors
> I must be having another silly day as I cannot reproduce the linear  
> predictor:
>
> fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
> fit$linear.predictors[1]
> [1] 2.612756

That's the linear predictor (the beta*X) and that particular number  
only applies to the first case.

>
> coef(fit)*model.matrix(fit)[1,1]
>     age
> 11.69021
>

I don't know what that might be and you are not telling us what you  
think it is.


> The above is based on the help listing for coxph.object
> coefficients: the coefficients of the linear predictor, which multiply
>          the columns of the model matrix.  If the model is
>          over-determined there will be missing values in the vector
>          corresponding to the redundant columns in the model matrix.
>
> Also, please comment whether n.event/n.risk

The Nelson-Aalen estimator of the cumulative hazard as a function of  
intervals prior to t is sum( n-event(t)/ n.risk(t))

> gives the baseline hazard exp(alpha) ?

No. The "baseline hazard", as you are calling this, would be an  
estimate for persons with all covariates = 0, so in this case is for  
women of age=0. (Not a particularly interpretable result in many  
situations. The baseline hazard following treated ovarian cancer for  
neonates is not medically sensible.)

What is the purpose of this request? Is someone telling you you need  
to provide estimates for instantaneous hazards?


--

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: coxph linear.predictors

Therneau, Terry M., Ph.D.
In reply to this post by stephenb
Gentlemen,
  I read R-news in batch mode so I'm often a day behind.  Let me try to
answer some of the questions.

 1. X*beta  != linear.predictor.  
I'm sorry if the documentation isn't all it could be.  Between the book,
tech report, and help I've written about 400 pages, but this particular
topic isn't yet in it.  The final snipe about being "opaque like SAS"
was really unfair.
The Cox model is a relative risk model, if lp is a linear predictor then
so is lp +c for any constant; they are equally good and equally valid.
The linear.predictor component in a coxph fit is (X-means) * beta.  The
computation exp(lp) occurs multiple times downstream and this keep the
exp function from overflowing when there is something like a Date object
as a predictor.  Adding this constant changes not a single downstream
calcuation.

2. Survfit is too slow.
 I'd like to hear more about this.  My work mostly involves modest data
sets so perhaps I haven't seen it.  Accuracy and maintainability have
been my first worries.

3. Baseline survival.
 Let xbase be a particular set of values for the x covariates (one for
each).  The survival curve for a given xbase is obtained from survfit
   fit <- coxph(....
   sfit <- survfit(fit, newdata=xbase)
   chaz <- -log(sfit$surv)  #cumulative hazard
(The xbase vector will need to have variable names for the function to
know which value goes to which of course).

The cumulative hazard for any other subject will be
   newhaz <- chaz * exp(fit$coef%*% (x-xbase))
There is not a simple transformation of the standard error from one fit
to another, however.  You will need to call survfit with a data frame
for newdata, which will return one curve per row with the proper values.

In my view there is no such thing as "A" baseline survival curve.  Any
xbase you chose is a baseline.  However, it is wise to choose something
near the center of the data space in order to avoid numeric problems
with the exp function above.  I would never ever chose a vector of
zeros, although some text books do -- it saves them about 8 characters
of typing in the newhaz formula above.

Terry Therneau

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: coxph linear.predictors

stephenb
Re: 1. X*beta  != linear.predictor.  

The equality is stated in three different help docs, which is misleading, especially in light of the way glm is set up. I felt like was wrestling with SAS :-)
The relative risk was the original idea behind cox regression, but it can be used for many non-relative purposes. If we want to calculate death probability in each period, then lp is no longer shift invariant.
 
Re: 2. Survfit is too slow.
It seems that the implementation follows the procedure in the original Cox paper, which calls iterative optimization for each death time.
My subjects are mortgages and both the estimation and the prediction samples are several hundred thousand. The call appears to recalculate/optimize everything even though only the $surv changes. Since each subject belongs to a single strata, most of the calculations are redundant.
I am not much of a programmer and could never figure out how to use the R profiler, so cannot be exact here, but the simple exponentiation takes no time and survfit takes several secs for each subject.
So I did:

survlong <- survfit(modlong) # a single call suffices
bl1 <- c(1,cumsum(survlong$strata)+1)
bl2 <- cumsum(survlong$strata) # get the start and end of each strata
for (jj in 1:nrow(newapp)){

  strat=as.integer(newapp[jj,"termfac"])
  surv <- survlong$surv[(b1[strat]):(b2[strat])] # extract the strata
  risk <- predict(modlong,new=newapp[jj,],type="risk")# it seems there is no
                                                      # optimization here
  newsurv <- surv^risk # we done
... rest of code
}

As a package maintainer, you have to decide whether including any of the above and below is useful or users can figure out things on their own. Or maybe survfit can be made smart and subsequent calls on the same model will use the first call to survfit?? It's your call :-)

Kind regards

Stephen B
-----Original Message-----
From: Terry Therneau [mailto:[hidden email]]
Sent: Thursday, October 28, 2010 6:39 PM
To: Bond, Stephen; David Winsemius
Cc: [hidden email]
Subject: Re: [R] coxph linear.predictors

Gentlemen,
  I read R-news in batch mode so I'm often a day behind.  Let me try to
answer some of the questions.

 1. X*beta  != linear.predictor.  
I'm sorry if the documentation isn't all it could be.  Between the book,
tech report, and help I've written about 400 pages, but this particular
topic isn't yet in it.  The final snipe about being "opaque like SAS"
was really unfair.
The Cox model is a relative risk model, if lp is a linear predictor then
so is lp +c for any constant; they are equally good and equally valid.
The linear.predictor component in a coxph fit is (X-means) * beta.  The
computation exp(lp) occurs multiple times downstream and this keep the
exp function from overflowing when there is something like a Date object
as a predictor.  Adding this constant changes not a single downstream
calcuation.

2. Survfit is too slow.
 I'd like to hear more about this.  My work mostly involves modest data
sets so perhaps I haven't seen it.  Accuracy and maintainability have
been my first worries.

3. Baseline survival.
 Let xbase be a particular set of values for the x covariates (one for
each).  The survival curve for a given xbase is obtained from survfit
   fit <- coxph(....
   sfit <- survfit(fit, newdata=xbase)
   chaz <- -log(sfit$surv)  #cumulative hazard
(The xbase vector will need to have variable names for the function to
know which value goes to which of course).

The cumulative hazard for any other subject will be
   newhaz <- chaz * exp(fit$coef%*% (x-xbase))
There is not a simple transformation of the standard error from one fit
to another, however.  You will need to call survfit with a data frame
for newdata, which will return one curve per row with the proper values.

In my view there is no such thing as "A" baseline survival curve.  Any
xbase you chose is a baseline.  However, it is wise to choose something
near the center of the data space in order to avoid numeric problems
with the exp function above.  I would never ever chose a vector of
zeros, although some text books do -- it saves them about 8 characters
of typing in the newhaz formula above.

Terry Therneau

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.