
Hi,
I've been learning about survival curves and working through David Collett's book on modelling survival data. In particular, I derived the Greenwood formula for the standard error of the survival function at a particular time t, and manually calculated the survival function and 95% confidence limits for the example given on page 28 of the book (reproducing the book's answers).
When I came to do it in R using survfit (setting error = "greenwood" just to be safe) I got the same survival function and the same standard errors, but different upper and lower 95% confidence bounds. The ones I got using R were somewhat wider than the results in the book.
Below is the data I put into R (as a tab-separated .txt file, read into a data frame called mock_data). I call the function as:
library(survival)
summary(survfit(Surv(time, died) ~ 1,
                data = mock_data,
                error = "greenwood",
                conf.int = 0.95))
mock_data:
time n_risk died
0 18 0
10 18 1
18 17 0
18 16 0
19 15 1
29 14 0
30 13 1
36 12 1
58 11 0
58 10 0
58 9 0
59 8 1
75 7 1
93 6 1
97 5 1
106 4 0
107 3 1
108 2 0
108 1 0
The final line of the output in R is:
time n.risk n.event survival std.err lower 95% CI upper 95% CI
107 3 1 0.249 0.1392 0.0829 0.745
whereas the answer the book and I get is:
time n.risk n.event survival std.err lower 95% CI upper 95% CI
107 3 1 0.249 0.1392 0.000 0.522
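For reference, the hand calculation behind the survival and std.err columns can be sketched in base R as below. This is only a minimal sketch, not Collett's or survfit's own code: the names death_times, n_j, d_j and km are illustrative, and the data frame is typed in directly rather than read from the .txt file.

```r
# Data as listed in the post (died = 1 for a death, 0 for a censored time).
mock_data <- data.frame(
  time = c(0, 10, 18, 18, 19, 29, 30, 36, 58, 58, 58, 59, 75, 93, 97,
           106, 107, 108, 108),
  died = c(0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0)
)

# Distinct death times, with the number at risk (n_j) and deaths (d_j) at each.
death_times <- sort(unique(mock_data$time[mock_data$died == 1]))
n_j <- sapply(death_times, function(t) sum(mock_data$time >= t))
d_j <- sapply(death_times,
              function(t) sum(mock_data$time == t & mock_data$died == 1))

# Kaplan-Meier estimate: running product of (n_j - d_j) / n_j.
S <- cumprod((n_j - d_j) / n_j)

# Greenwood's formula: se(S) = S * sqrt(sum of d_j / (n_j * (n_j - d_j))).
se <- S * sqrt(cumsum(d_j / (n_j * (n_j - d_j))))

km <- data.frame(time = death_times, n.risk = n_j, n.event = d_j,
                 survival = round(S, 3), std.err = round(se, 4))
print(km)
```

The last row of km should agree with the survival and std.err values quoted above for time 107, which is the part both calculations already agree on.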
The confidence intervals in the book are calculated as:
S +/- 1.96 * se(S)
Where: se = standard error, S = the survivor function.
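Applied to the last row, the book's formula can be sketched in R as follows. The second interval is a comparison only, and an assumption on my part about the cause: the survfit documentation describes a conf.type argument whose default is "log", i.e. an interval built on the log(S) scale and transformed back. The names plain and log_scale are illustrative, and S and se are the rounded values from the output above.

```r
# Values taken from the last line of the output quoted in the post.
S  <- 0.249          # estimated survivor function at t = 107
se <- 0.1392         # Greenwood standard error of S
z  <- qnorm(0.975)   # about 1.96 for a 95% interval

# Plain (linear) interval as in the book: S +/- z * se(S), truncated to [0, 1].
plain <- c(lower = max(0, S - z * se),
           upper = min(1, S + z * se))

# Assumption: interval on the log(S) scale, exp(log(S) +/- z * se(S) / S),
# which is how conf.type = "log" is described in the survfit help page.
log_scale <- c(lower = S * exp(-z * se / S),
               upper = min(1, S * exp(z * se / S)))

print(round(rbind(plain, log_scale), 3))
```

The plain row reproduces the book's limits (the negative lower bound is cut off at 0), while the log_scale row comes out close to the limits R printed. If that is the explanation, passing conf.type = "plain" to survfit should request the book's style of interval.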
I'm not sure where the discrepancy could lie, as we both get the same standard errors. Hopefully someone understands this; please let me know if I've left out any information I should have given.
Yours
