Hi Daniel,
I came upon your question because I was also looking for how to fit a piecewise exponential model in R using the survival package. Like you, I found survreg() to be a stumbling block, because it currently does not accept Surv objects of the "counting" type.
However, there is an alternative! Because of the direct relationship between a Poisson model with a log-time offset and the exponential model, we can use glm() to fit a Poisson model and include a factor for the time intervals that we want to have separate rates. Doing this will usually require expanding the data set into counting-process form, with a start and stop time for each interval of the piecewise model.
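If you want to convince yourself of the equivalence first, here is a minimal sketch that fits the same exponential model both ways on unexpanded, single-record data (I use the veteran data set from the survival package purely as an illustration). The Poisson coefficients come out as the negatives of the survreg coefficients, because survreg uses the accelerated failure time parameterization.

library(survival)
# Exponential AFT fit and the equivalent Poisson fit with a log-time offset
fit.exp  <- survreg(Surv(time, status) ~ karno, data = veteran, dist = "exponential")
fit.pois <- glm(status ~ karno + offset(log(time)), data = veteran, family = poisson)
cbind(poisson = coef(fit.pois), minus.survreg = -coef(fit.exp))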
Below I give an example using the heart data set, as in your original example. The intervals have a fixed width of 10 units. Creating the expanded data set is not too complex when the intervals all have a constant width; irregular intervals would take a bit more thought. In this case, 10-unit intervals are probably unnecessary for the later time periods. I wouldn't expect this to affect the estimated rates, but the standard errors will be larger than what we could get from a model with fewer interval slices.
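As an aside, newer versions of the survival package have a survSplit() function that can do this kind of expansion for you, including irregular cut points. A small sketch with its formula interface, again on single-record data (the cut points and the name "tgroup" are just illustrative choices):

library(survival)
# Split each subject's follow-up at 90 and 180 days; tgroup numbers the periods
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran,
                  cut = c(90, 180), episode = "tgroup", id = "id")
vet2$expo <- vet2$time - vet2$tstart   # exposure time within each period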
Even though this message was posted some time ago, I hope you find this reply useful.
Stephanie Kovalchik
Research Fellow
National Cancer Institute
> # FITTING PIECEWISE EXPONENTIAL
> library(survival)
>
> myheart <- heart # LOCAL COPY OF HEART
>
> myheart[1:10,] # EACH SUBJECT'S FIRST INTERVAL STARTS AT 0 (LATER ROWS ARE POST-TRANSPLANT)
start stop event age year surgery transplant id
1 0 50 1 -17.155373 0.1232033 0 0 1
2 0 6 1 3.835729 0.2546201 0 0 2
3 0 1 0 6.297057 0.2655715 0 0 3
4 1 16 1 6.297057 0.2655715 0 1 3
5 0 36 0 -7.737166 0.4900753 0 0 4
6 36 39 1 -7.737166 0.4900753 0 1 4
7 0 18 1 -27.214237 0.6078029 0 0 5
8 0 3 1 6.595483 0.7008898 0 0 6
9 0 51 0 2.869268 0.7802875 0 0 7
10 51 675 1 2.869268 0.7802875 0 1 7
>
> # PIECEWISE EXPONENTIAL IN TIME INTERVALS OF 10 UNITS
> interval.width <- 10
>
> # THE FOLLOWING FUNCTION COMPUTES ALL THE TIME BREAKS GIVEN THE EXIT TIME = STOP
> person.breaks <- function(stop) unique(c(seq(0,stop,by=interval.width),stop))
>
> # NEXT WE GET A LIST OF EACH SUBJECT'S TIME PERIODS
> the.breaks <- lapply(unique(myheart$id), function(id){
+
+ person.breaks(max(myheart$stop[myheart$id==id]))
+ })
>
> the.breaks[1:3] # FIRST THREE SUBJECTS' TIME BREAKS
[[1]]
[1] 0 10 20 30 40 50
[[2]]
[1] 0 6
[[3]]
[1] 0 10 16
>
> # NOW WE OBTAIN THE EXPANDED PERIOD OF OBSERVATION
> start <- lapply(the.breaks, function(x) x[-length(x)]) # LEFT TIME POINT
> stop <- lapply(the.breaks, function(x) x[-1]) # RIGHT TIME POINTS
>
>
> # THE FOLLOWING ARE NEEDED TO COMPLETE THE LONG VERSION OF THE HEART DATA SET
> count.per.id <- sapply(start, length)
> index <- tapply(myheart$id, myheart$id, length)
> index <- cumsum(index) # INDEX OF LAST OBSERVATION FOR EACH SUBJECT
>
> event <- rep(0,sum(count.per.id))
> event[cumsum(count.per.id)] <- myheart$event[index]
>
>
> # BRING ALL OF THIS TOGETHER TO CREATE THE EXPANDED DATASET
> pw.heart <- data.frame(
+ id = rep(myheart$id[index],count.per.id),
+ year = rep(myheart$year[index],count.per.id),
+ start = unlist(start),
+ stop = unlist(stop),
+ event = event
+ )
>
> # CREATE TIME VARIABLE WHICH INDICATES THE PERIOD OF OBSERVATION
> # THIS WILL BE THE OFFSET IN THE POISSON MODEL FIT
> pw.heart$time <- pw.heart$stop-pw.heart$start # LENGTH OF OBSERVATION
>
> # NEXT WE CREATE A FACTOR FOR EACH INTERVAL THAT WILL ALLOW US TO HAVE A DIFFERENT RATE
> # FOR EACH PERIOD
> pw.heart$interval <- factor(pw.heart$start)
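> # (WITH THIS FACTOR AND THE log(time) OFFSET, THE MODEL IS
> #  log(rate) = INTERCEPT + INTERVAL EFFECT + COVARIATE EFFECTS,
> #  I.E. A CONSTANT HAZARD WITHIN EACH 10-UNIT PERIOD)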
>
> pw.heart[1:30,] # HERE IS WHAT THE FINAL DATA FRAME LOOKS LIKE
id year start stop event time interval
1 1 0.1232033 0 10 0 10 0
2 1 0.1232033 10 20 0 10 10
3 1 0.1232033 20 30 0 10 20
4 1 0.1232033 30 40 0 10 30
5 1 0.1232033 40 50 1 10 40
6 2 0.2546201 0 6 1 6 0
7 3 0.2655715 0 10 0 10 0
8 3 0.2655715 10 16 1 6 10
9 4 0.4900753 0 10 0 10 0
10 4 0.4900753 10 20 0 10 10
11 4 0.4900753 20 30 0 10 20
12 4 0.4900753 30 39 1 9 30
13 5 0.6078029 0 10 0 10 0
14 5 0.6078029 10 18 1 8 10
15 6 0.7008898 0 3 1 3 0
16 7 0.7802875 0 10 0 10 0
17 7 0.7802875 10 20 0 10 10
18 7 0.7802875 20 30 0 10 20
19 7 0.7802875 30 40 0 10 30
20 7 0.7802875 40 50 0 10 40
21 7 0.7802875 50 60 0 10 50
22 7 0.7802875 60 70 0 10 60
23 7 0.7802875 70 80 0 10 70
24 7 0.7802875 80 90 0 10 80
25 7 0.7802875 90 100 0 10 90
26 7 0.7802875 100 110 0 10 100
27 7 0.7802875 110 120 0 10 110
28 7 0.7802875 120 130 0 10 120
29 7 0.7802875 130 140 0 10 130
30 7 0.7802875 140 150 0 10 140
>
> # FIT POISSON WITH SEPARATE RATES PER INTERVAL AND YEAR EFFECT
> fit <- glm(event ~ offset(log(time)) + interval + year, data=pw.heart, family="poisson")
> summary(fit)
Call:
glm(formula = event ~ offset(log(time)) + interval + year, family = "poisson",
data = pw.heart)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.69397 -0.27125 -0.00004 -0.00003 3.13381
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.703e+00 3.375e-01 -10.970 <2e-16 ***
interval10 -6.614e-01 4.936e-01 -1.340 0.1802
interval20 -9.994e-01 5.718e-01 -1.748 0.0805 .
interval30 -3.760e-01 4.689e-01 -0.802 0.4226
interval40 -1.099e+00 6.408e-01 -1.715 0.0863 .
interval50 -1.021e+00 6.411e-01 -1.592 0.1113
[... output truncated ...]
year -1.934e-01 7.011e-02 -2.759 0.0058 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 679.28 on 3240 degrees of freedom
Residual deviance: 511.41 on 3060 degrees of freedom
AIC: 1023.4
Number of Fisher Scoring iterations: 19
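> # THE ESTIMATED BASELINE RATE IN EACH INTERVAL (AT year = 0) IS
> # exp(INTERCEPT + INTERVAL COEFFICIENT), E.G. exp(-3.70) IN THE FIRST
> # INTERVAL AND exp(-3.70 - 0.66) IN THE SECOND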
>
>
> # NOW WE ESTIMATE THE SURVIVAL FOR THE FIRST SUBJECT
> predict.data <- subset(pw.heart, id==1)
> predict.data
id year start stop event time interval
1 1 0.1232033 0 10 0 10 0
2 1 0.1232033 10 20 0 10 10
3 1 0.1232033 20 30 0 10 20
4 1 0.1232033 30 40 0 10 30
5 1 0.1232033 40 50 1 10 40
>
> # OBTAIN EACH INTERVAL'S CONTRIBUTION TO THE CUMULATIVE HAZARD FOR THE PREDICTION DATASET
> # (exp(predict()) INCLUDES THE log(time) OFFSET, SO lambda IS rate * interval width)
> lambda <- exp(predict(fit, newdata=predict.data))
> surv.pois <- exp(-cumsum(lambda))
>
>
>
> # COMPARE TO COX MODEL, NONPARAMETRIC HAZARD
> fit.cox <- coxph(Surv(start, stop, event)~year, data=myheart)
> summary(fit.cox)
Call:
coxph(formula = Surv(start, stop, event) ~ year, data = myheart)
n= 172, number of events= 75
coef exp(coef) se(coef) z Pr(>|z|)
year -0.19105 0.82610 0.07005 -2.727 0.00639 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
year 0.8261 1.211 0.7201 0.9477
Concordance= 0.611 (se = 0.037 )
Rsquare= 0.043 (max possible= 0.969 )
Likelihood ratio test= 7.52 on 1 df, p=0.006087
Wald test = 7.44 on 1 df, p=0.006386
Score (logrank) test = 7.62 on 1 df, p=0.005775
> surv.cox <- summary(survfit(fit.cox, newdata=predict.data, individual=TRUE), times=predict.data$stop)
>
>
> cbind(surv.pois, surv.cox = surv.cox$surv)
  surv.pois  surv.cox
1 0.7859987 0.7904685
2 0.6941400 0.6985663
3 0.6352611 0.6398573
4 0.5384496 0.5410894
5 0.4969424 0.4997662
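> # THE PIECEWISE EXPONENTIAL (FIRST COLUMN) AND COX (SECOND COLUMN) SURVIVAL
> # ESTIMATES FOR SUBJECT 1 AGREE CLOSELY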