exponential proportional hazard model

Daniel Gutknecht
Dear R-users,

I am looking for a function that can handle parametric proportional hazard models with a piecewise-constant baseline hazard (i.e. dummies for annual intervals) and time-dependent covariates, since I am especially interested in the effect of those covariates on the baseline hazard.
I tried survreg(), but this does not seem to work, since my data look like this:

id  start  stop  censoring  covariates
 1      0     5          0  ...
 1      5     8          1
 2      0     4          0

and survreg(Surv(start, stop, censoring) ~ covariates, ...) always returns "invalid survival type" (specifying type = "right" instead returns "wrong number of args...").
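For concreteness, here is a minimal sketch of what I tried (toy data as above; x stands in for my covariates):

library(survival)
d <- data.frame(id        = c(1, 1, 2),
                start     = c(0, 5, 0),
                stop      = c(5, 8, 4),
                censoring = c(0, 1, 0),
                x         = c(0.2, 0.2, -1.1))  # x is a made-up covariate
survreg(Surv(start, stop, censoring) ~ x, data = d, dist = "exponential")
## errors with "invalid survival type"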
Am I missing something, or is there a function that can handle this kind of data?
Thanks
Daniel  

Re: exponential proportional hazard model

skoval
Hi Daniel,

I came upon your question because I was also looking for a way to fit a piecewise exponential model in R using the survival package. Like you, I found survreg() to be a stumbling block, because it currently does not accept Surv objects of the "counting" type.

However, there is an alternative! Because of the direct relationship between the Poisson model with a log-time offset and the exponential model, we can use glm() to fit a Poisson model that includes a factor for the time intervals we want to allow to have separate rates. Doing this usually requires expanding the data set into counting-process form, with a start and stop time for each interval of the piecewise model.
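In case that equivalence is unfamiliar, here is the idea in miniature, with a toy check (all numbers made up): a spell observed for time t with constant hazard lambda and event indicator d contributes d*log(lambda) - lambda*t to the exponential log-likelihood, which is exactly the Poisson kernel with mean mu = lambda*t, i.e. log(mu) = log(lambda) + log(t). That is where the log-time offset comes from.

d <- c(0, 1, 0, 1)              # event indicators (made up)
t <- c(10, 6, 10, 8)            # exposure times (made up)
fit0 <- glm(d ~ offset(log(t)), family = "poisson")
exp(coef(fit0))                 # rate estimate from the Poisson fit
sum(d) / sum(t)                 # exponential MLE, events/exposure: identical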

Below I give an example using the heart data set, which has the same start/stop structure as your data. The intervals are a fixed width of 10 time units. Creating the expanded data set is not too complex when the intervals all have a constant width; irregular intervals would take a bit more thought. In this case, 10-unit intervals are probably finer than necessary for the later time periods. I wouldn't expect this to affect the estimated rates, but the standard errors will be larger than what we could get from a model with fewer interval slices.
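(As an aside, recent versions of the survival package provide survSplit(), which automates this kind of expansion. Its interface has changed across versions, so treat the following as a sketch and check ?survSplit. Note that it starts from one row per subject here, so the transplant rows of heart are collapsed first, just as the code below does.)

library(survival)
## one row per subject: the last stop time and the event status at that time
subjects <- do.call(rbind, lapply(split(heart, heart$id), function(d)
  d[which.max(d$stop), c("id", "year", "stop", "event")]))
pw <- survSplit(Surv(stop, event) ~ ., data = subjects,
                cut = seq(10, max(subjects$stop), by = 10),
                start = "start", episode = "interval")
pw$time <- pw$stop - pw$start   # exposure in each slice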

Even though your message was posted some time ago, I hope you find this reply useful.

Stephanie Kovalchik
Research Fellow
National Cancer Institute


> # FITTING PIECEWISE EXPONENTIAL
> library(survival)
>
> myheart <- heart # LOCAL COPY OF HEART
>
> myheart[1:10,] # ALL PERSONS START AT 0
   start stop event        age      year surgery transplant id
1      0   50     1 -17.155373 0.1232033       0          0  1
2      0    6     1   3.835729 0.2546201       0          0  2
3      0    1     0   6.297057 0.2655715       0          0  3
4      1   16     1   6.297057 0.2655715       0          1  3
5      0   36     0  -7.737166 0.4900753       0          0  4
6     36   39     1  -7.737166 0.4900753       0          1  4
7      0   18     1 -27.214237 0.6078029       0          0  5
8      0    3     1   6.595483 0.7008898       0          0  6
9      0   51     0   2.869268 0.7802875       0          0  7
10    51  675     1   2.869268 0.7802875       0          1  7
>
> # PIECEWISE EXPONENTIAL IN TIME INTERVALS OF 10 UNITS
> interval.width <- 10
>
> # THE FOLLOWING FUNCTION COMPUTES ALL THE TIME BREAKS GIVEN THE EXIT TIME = STOP
> person.breaks <- function(stop) unique(c(seq(0,stop,by=interval.width),stop))
>
> # NEXT WE GET A LIST OF EACH SUBJECT'S TIME PERIODS
> the.breaks <- lapply(unique(myheart$id), function(id){
+
+ person.breaks(max(myheart$stop[myheart$id==id]))
+ })
>
> the.breaks[1:3] # FIRST THREE SUBJECTS' TIME BREAKS
[[1]]
[1]  0 10 20 30 40 50

[[2]]
[1] 0 6

[[3]]
[1]  0 10 16

>
> # NOW WE OBTAIN THE EXPANDED PERIOD OF OBSERVATION
> start <- lapply(the.breaks, function(x) x[-length(x)])  # LEFT TIME POINTS
> stop <-  lapply(the.breaks, function(x) x[-1])    # RIGHT TIME POINTS
>
>
> # THE FOLLOWING ARE NEEDED TO COMPLETE THE LONG VERSION OF THE HEART DATA SET
> count.per.id <- sapply(start, length)
> index <- tapply(myheart$id, myheart$id, length)
> index <- cumsum(index) # INDEX OF LAST OBSERVATION FOR EACH SUBJECT
>
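> # AN EVENT CAN ONLY FALL IN THE LAST OF A SUBJECT'S EXPANDED INTERVALS,
> # SO event IS ZERO EVERYWHERE EXCEPT (POSSIBLY) AT EACH SUBJECT'S LAST ROW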
> event <- rep(0,sum(count.per.id))
> event[cumsum(count.per.id)] <- myheart$event[index]
>
>
> # BRING ALL OF THIS TOGETHER TO CREATE THE EXPANDED DATASET
> pw.heart <- data.frame(
+ id = rep(myheart$id[index],count.per.id),
+ year = rep(myheart$year[index],count.per.id),
+ start = unlist(start),
+ stop = unlist(stop),
+ event = event
+ )
>
> # CREATE TIME VARIABLE WHICH INDICATES THE PERIOD OF OBSERVATION
> # THIS WILL BE THE OFFSET IN THE POISSON MODEL FIT
> pw.heart$time <- pw.heart$stop-pw.heart$start # LENGTH OF OBSERVATION
>
> # NEXT WE CREATE A FACTOR FOR EACH INTERVAL THAT WILL ALLOW US TO HAVE A DIFFERENT RATE
> # FOR EACH PERIOD
> pw.heart$interval <- factor(pw.heart$start)
>
> pw.heart[1:30,] # HERE IS WHAT THE FINAL DATA FRAME LOOKS LIKE
   id      year start stop event time interval
1   1 0.1232033     0   10     0   10        0
2   1 0.1232033    10   20     0   10       10
3   1 0.1232033    20   30     0   10       20
4   1 0.1232033    30   40     0   10       30
5   1 0.1232033    40   50     1   10       40
6   2 0.2546201     0    6     1    6        0
7   3 0.2655715     0   10     0   10        0
8   3 0.2655715    10   16     1    6       10
9   4 0.4900753     0   10     0   10        0
10  4 0.4900753    10   20     0   10       10
11  4 0.4900753    20   30     0   10       20
12  4 0.4900753    30   39     1    9       30
13  5 0.6078029     0   10     0   10        0
14  5 0.6078029    10   18     1    8       10
15  6 0.7008898     0    3     1    3        0
16  7 0.7802875     0   10     0   10        0
17  7 0.7802875    10   20     0   10       10
18  7 0.7802875    20   30     0   10       20
19  7 0.7802875    30   40     0   10       30
20  7 0.7802875    40   50     0   10       40
21  7 0.7802875    50   60     0   10       50
22  7 0.7802875    60   70     0   10       60
23  7 0.7802875    70   80     0   10       70
24  7 0.7802875    80   90     0   10       80
25  7 0.7802875    90  100     0   10       90
26  7 0.7802875   100  110     0   10      100
27  7 0.7802875   110  120     0   10      110
28  7 0.7802875   120  130     0   10      120
29  7 0.7802875   130  140     0   10      130
30  7 0.7802875   140  150     0   10      140
>
> # FIT POISSON WITH SEPARATE RATES PER INTERVAL AND YEAR EFFECT
> fit <- glm(event~offset(log(time))+interval+year, data=pw.heart, family="poisson")
> summary(fit)

Call:
glm(formula = event ~ offset(log(time)) + interval + year, family = "poisson",
    data = pw.heart)

Deviance Residuals:
     Min        1Q    Median        3Q       Max  
-0.69397  -0.27125  -0.00004  -0.00003   3.13381  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.703e+00  3.375e-01 -10.970   <2e-16 ***
interval10   -6.614e-01  4.936e-01  -1.340   0.1802    
interval20   -9.994e-01  5.718e-01  -1.748   0.0805 .  
interval30   -3.760e-01  4.689e-01  -0.802   0.4226    
interval40   -1.099e+00  6.408e-01  -1.715   0.0863 .  
interval50   -1.021e+00  6.411e-01  -1.592   0.1113    

[... remaining interval coefficients truncated ...]

year         -1.934e-01  7.011e-02  -2.759   0.0058 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 679.28  on 3240  degrees of freedom
Residual deviance: 511.41  on 3060  degrees of freedom
AIC: 1023.4

Number of Fisher Scoring iterations: 19

>
>
> # NOW WE ESTIMATE THE SURVIVAL FOR THE FIRST SUBJECT
> predict.data <- subset(pw.heart, id==1)
> predict.data
  id      year start stop event time interval
1  1 0.1232033     0   10     0   10        0
2  1 0.1232033    10   20     0   10       10
3  1 0.1232033    20   30     0   10       20
4  1 0.1232033    30   40     0   10       30
5  1 0.1232033    40   50     1   10       40
>
> # OBTAIN THE HAZARD RATE IN EACH INTERVAL FOR PREDICTION DATASET
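> # (NOTE: predict() INCLUDES THE log(time) OFFSET, SO exp(predict(...)) IS
> # lambda_j * t_j, THE CUMULATIVE-HAZARD INCREMENT OF EACH INTERVAL, AND
> # exp(-cumsum(...)) BELOW IS THE PIECEWISE EXPONENTIAL SURVIVAL CURVE)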
> lambda <- exp(predict(fit, newdata=predict.data))
> surv.pois <- exp(-cumsum(lambda))
>
>
>
> # COMPARE TO COX MODEL, NONPARAMETRIC HAZARD
> fit.cox <- coxph(Surv(start, stop, event)~year, data=myheart)
> summary(fit.cox)
Call:
coxph(formula = Surv(start, stop, event) ~ year, data = myheart)

  n= 172, number of events= 75

         coef exp(coef) se(coef)      z Pr(>|z|)  
year -0.19105   0.82610  0.07005 -2.727  0.00639 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     exp(coef) exp(-coef) lower .95 upper .95
year    0.8261      1.211    0.7201    0.9477

Concordance= 0.611  (se = 0.037 )
Rsquare= 0.043   (max possible= 0.969 )
Likelihood ratio test= 7.52  on 1 df,   p=0.006087
Wald test            = 7.44  on 1 df,   p=0.006386
Score (logrank) test = 7.62  on 1 df,   p=0.005775

> surv.cox <- summary(survfit(fit.cox, newdata=predict.data, individual=TRUE), times=predict.data$stop)
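> # (individual=TRUE TELLS survfit TO FOLLOW SUBJECT 1'S COVARIATE PATH OVER
> # ITS (start, stop] ROWS, GIVING THE COX SURVIVAL AT THE SAME STOP TIMES)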
>
>
> cbind(surv.pois, surv.cox = surv.cox$surv)
  surv.pois  surv.cox
1 0.7859987 0.7904685
2 0.6941400 0.6985663
3 0.6352611 0.6398573
4 0.5384496 0.5410894
5 0.4969424 0.4997662
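The two columns agree to within about half a percentage point at every interval, so even the coarse 10-unit slices reproduce the Cox model's survival estimate for this subject quite closely.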