Thank you, Chris! That makes very good sense; I was just so in the weeds I
could not see it. I will experiment with survreg after I am done teaching today.

Thank you for the fresh set of eyes and advice.

Mike.

On Oct 17 2012, Andrews, Chris wrote:

>Mike,
>
>My guess is that you have censored observations in the middle.
>When using the minimum time, the events happen before the censorings,
>so the risk set is large and the curve decreases only slightly.
>When using the maximum time, the events happen after the censorings,
>so the risk set is small and the curve decreases quickly.

>

>For example, moving the first event from time 1 to time 5 makes the final
>survival estimate lower when using max time (.375) than min time (.533):

>

>library(survival)
>df <- data.frame(mintime = c(1,2,3,4,6), maxtime = c(5,2,3,4,6),
>                 Delta = c(1,0,1,0,0))
>plot(survfit(Surv(mintime,Delta)~1,data=df), conf.int=FALSE, xlim=c(0,7))
>lines(survfit(Surv(maxtime,Delta)~1,data=df), col=2)

>

>> summary(survfit(Surv(mintime,Delta)~1,data=df))
>Call: survfit(formula = Surv(mintime, Delta) ~ 1, data = df)
>
> time n.risk n.event survival std.err lower 95% CI upper 95% CI
>    1      5       1    0.800   0.179        0.516            1
>    3      3       1    0.533   0.248        0.214            1
>
>> summary(survfit(Surv(maxtime,Delta)~1,data=df))
>Call: survfit(formula = Surv(maxtime, Delta) ~ 1, data = df)
>
> time n.risk n.event survival std.err lower 95% CI upper 95% CI
>    3      4       1    0.750   0.217       0.4259            1
>    5      2       1    0.375   0.286       0.0839            1
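
To make the risk-set arithmetic behind those two summaries explicit, here is a minimal base-R sketch of the product-limit calculation. The helper name km_by_hand is made up for illustration and is not part of the survival package; it assumes the status column is coded 1 for an event and 0 for a censoring, as Delta is in the example.

```r
# Toy data from the example: same events, first event moved from time 1 to 5
df <- data.frame(mintime = c(1, 2, 3, 4, 6),
                 maxtime = c(5, 2, 3, 4, 6),
                 Delta   = c(1, 0, 1, 0, 0))

# Hypothetical helper: Kaplan-Meier estimate computed step by step.
# At each event time the curve is multiplied by (1 - events / number at risk).
km_by_hand <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  n_risk  <- vapply(event_times, function(t) sum(time >= t), integer(1))
  n_event <- vapply(event_times, function(t) sum(time == t & status == 1),
                    integer(1))
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event,
             survival = cumprod(1 - n_event / n_risk))
}

km_min <- km_by_hand(df$mintime, df$Delta)  # risk sets of 5 and 3
km_max <- km_by_hand(df$maxtime, df$Delta)  # risk sets of 4 and 2
print(km_min)
print(km_max)
```

With the minimum times the two events remove 1 of 5 and then 1 of 3 animals at risk (0.8 * 2/3 = 0.533). With the maximum times the censorings at 2 and 4 have already shrunk the risk set, so the same two events remove 1 of 4 and then 1 of 2 (0.75 * 0.5 = 0.375).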

>

>Given that you have interval-censored data, you can consider fitting the
>survival curve with interval-censoring techniques. For example, survreg
>fits a parametric curve.
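
As a rough sketch of what that suggestion could look like, the toy data can be recoded so that the first animal's loss is only known to fall between day 1 and day 5. The column names left and right, the data frame name tags, and the helper weibull_surv are assumptions made for this illustration, and the Weibull distribution is just one possible choice, not something the thread prescribes. With only five observations the fit is for illustration only.

```r
library(survival)

# Assumed recoding of the toy example for Surv(type = "interval2"):
#   left < right   -> event somewhere in the interval (interval censored)
#   left == right  -> event observed exactly at that time
#   right = NA     -> right censored at 'left'
tags <- data.frame(left  = c(1, 2, 3, 4, 6),
                   right = c(5, NA, 3, NA, NA))

# Intercept-only parametric fit; dist = "weibull" is an assumption
fit <- survreg(Surv(left, right, type = "interval2") ~ 1,
               data = tags, dist = "weibull")

# survreg models log(T) = mu + sigma * W, so for the Weibull case
# S(t) = exp(-(t / exp(mu))^(1 / sigma))
weibull_surv <- function(t, fit) {
  exp(-(t / exp(coef(fit)[[1]]))^(1 / fit$scale))
}

s_hat <- weibull_surv(c(1, 3, 5), fit)  # fitted survival at days 1, 3, 5
print(s_hat)
```

The appeal of this encoding is that neither the minimum nor the maximum time has to be chosen; each loss contributes only the information that it happened somewhere inside its interval.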

>

>Chris

>

>-----Original Message-----
>From: Michael Rentz [mailto:[hidden email]]
>Sent: Tuesday, October 16, 2012 12:36 PM
>To: [hidden email]
>Subject: [R] R Kaplan-Meier plotting quirks?

>

>Hello. I apologize in advance for the VERY lengthy e-mail. I endeavor to
>include enough detail.

>

>I have a question about survival curves that I have been battling off and
>on for a few months. No one local seems to be able to help, so I turn here.
>The issue seems to be either how R calculates Kaplan-Meier plots or
>something about the underlying statistic itself that I am misunderstanding.
>Basically, longer survival times are yielding steeper drops in survival
>than a set of shorter survival times with the same number of loss and
>retention events.

>

>As a minor part of my research I have been comparing tag survival in
>marked wild rodents. I am comparing a standard ear tag with a relatively
>new technique. The newer tag clearly “wins” using survival tests, but
>the resultant Kaplan-Meier plot does not seem to make sense. Since I am
>dealing with a wild animal and only trapped a few days out of each month,
>the data are fairly messy, with gaps in capture history that require
>assumptions about tag survival. An animal that is tagged, recaptured 2 days
>later with a tag, and recaptured 30 days later without one could have an
>assumed tag retention of 2 days (minimum confirmed) or 30 days (maximum
>possible).
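
To pin down the bookkeeping described in that paragraph, here is a small base-R sketch of how the two candidate times could be derived from capture records. Everything in it is hypothetical: the data frame captures, its column names, and the three made-up animals are illustrations only, not the actual study data.

```r
# Made-up capture records, in days since tagging (illustration only)
captures <- data.frame(
  animal            = c("a", "b", "c"),
  last_with_tag     = c(2, 0, 45),    # last capture with the tag present
  first_without_tag = c(30, 28, NA)   # first capture with the tag missing;
                                      # NA means the tag was never seen lost
)

# 1 = tag loss observed, 0 = censored (animal last seen still tagged)
captures$lost <- as.integer(!is.na(captures$first_without_tag))

# Minimum confirmed retention: last day the tag was seen in place
captures$mintime <- captures$last_with_tag

# Maximum possible retention: first day the tag was seen missing;
# censored animals keep their last confirmed day
captures$maxtime <- ifelse(captures$lost == 1,
                           captures$first_without_tag,
                           captures$last_with_tag)
print(captures)
```

Animal "a" is the case from the paragraph: retention of 2 days under the minimum rule and 30 days under the maximum rule, with the true loss somewhere in between.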

>

>Both are significant with a survtest, but the K-M plots differ. A plot of
>the minimum confirmed times (overall harsher data, lots of 0 days and 1 or
>2 days) yields a curve with a steep initial drop in “survival”, then a
>leveling off and a straight line thereafter at about 80% survival. Plotting
>the maximum possible times (same number of losses/retentions, but retention
>times are longer: the length to the next capture without a tag, typically
>25-30 days or more) does not show as steep a drop in the first few days,
>but at about the point where the minimum estimate levels off, this one
>begins dropping steeply. At 400 days out, the plot with the minimum
>estimates has tag survival of about 80%, whereas the plot with the same
>loss rate but longer assumed survival times shows only 20% assumed
>survival. Complicating this, of course, is the fact that the great majority
>of the animals die before the tag is lost; survival of the rodents is on
>the order of months.

>

>I really am not sure what is going on, unless somehow the high number of
>events in the first few days followed by few events thereafter leads to the
>assumption that after the initial few days survival of the tag is high. The
>plot of maximum lengths has a more even distribution of events, rather
>than a clumping in the first few days, so I guess the model assumes
>relatively constant hazards? As an aside, a plot of the mean of the
>minimum and maximum almost mirrors the maximum plot. Adding five days to
>the minimum when the minimum plus 5 is less than the maximum returns a plot
>with a steeper initial drop that is constant thereafter, mimicking the
>minimum plot but at a lower final survival rate.

>

>Basically, I am at a loss as to why surviving longer would *decrease* the
>survival rate.

>

>My co-author wants to drop the K-M graph given the confusion, but I think
>it would be odd to publish a survival paper without one. I am not sure
>which graph to use. They say very different things, while the actual
>statistics do not differ that greatly.

>

>I am more than happy to provide the data and code for anyone who would
>like to help if the above is not explanation enough. Thank you in advance.

>

>Mike.

>

>

>--

>Michael S. Rentz

>PhD Candidate, Conservation Biology

>University of Minnesota

>5122 Idlewild Street

>Duluth, MN 55804

>(218) 525-3299

>

>[hidden email]
>

>


>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.