Survival::coxph (clogit), survConcordance vs. summary(fit) concordance

jCeradini
Hi,

I'm running conditional logistic regression with survival::clogit. I have
"1-1 case-control" data, i.e., there is 1 case and 1 control in each stratum.

Model:
fit <- clogit(resp ~ x1 + x2, strata(ID), cluster(site), method ="efron",
data = dat)
Where resp is 1's and 0's, and x1 and x2 are both continuous.

Predictors are both significant. A snippet of summary(fit):
Concordance= 0.763  (se = 0.5 )
Rsquare= 0.304   (max possible= 0.5 )
Likelihood ratio test= 27.54  on 2 df,   p=1.047e-06
Wald test            = 17.19  on 2 df,   p=0.0001853
Score (logrank) test = 17.43  on 2 df,   p=0.0001644,   Robust = 6.66  p=0.03574

The concordance estimate seems good but the SE is HUGE.

I get a very different estimate from the survConcordance function, which I
know says it computes concordance for a "single continuous covariate", but it
runs on my model with 2 continuous covariates....

survConcordance(Surv(rep(1, 76L), resp) ~ predict(fit), dat)
n= 76
Concordance= 0.9106648 se= 0.09365047
concordant  discordant   tied.risk   tied.time    std(c-d)
 1315.0000   129.0000     0.0000   703.0000   270.4626

Are both of these concordance estimates valid but providing different
information?
Is one more appropriate for measuring "performance" (in the AUC sense) of
conditional logistic models?
Is it possible that the HUGE SE estimate represents a convergence problem
(no warnings were thrown when I fit the model), or is this model just useless?

Thanks!
--
Cooperative Fish and Wildlife Research Unit
Zoology and Physiology Dept.
University of Wyoming
[hidden email] / 914.707.8506
wyocoopunit.org

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Survival::coxph (clogit), survConcordance vs. summary(fit) concordance

Andrews, Chris
I only get the digest, sorry if this has already been answered.

When I run your code (after creating some data) I get a warning that "weights are ignored in clogit".  This is a result of miscalling the clogit function.  The first 2 commas should be +s.

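library(survival)
set.seed(1)  # added here so the simulated data are reproducible
nn <- 1000
dat <- data.frame(resp = rbinom(nn, 1, 0.5), x1 = rnorm(nn), x2 = rnorm(nn),
                  ID = rep(seq(nn/2), each = 2), site = rep(seq(nn/10), each = 10))
# strata() and cluster() passed as separate arguments
fit <- clogit(resp ~ x1 + x2, strata(ID), cluster(site), method = "efron", data = dat) # warning
# strata() and cluster() as terms inside the formula
fit <- clogit(resp ~ x1 + x2 + strata(ID) + cluster(site), method = "efron", data = dat) # no warning
summary(fit)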

Chris
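
The warning follows from how R matches unnamed arguments. Below is a minimal sketch, assuming that the leading arguments of clogit are formula, data, weights and subset (as listed in the survival documentation). The function clogit_args is a hypothetical stand-in that only mirrors that argument order. Because data is passed by name, the two unnamed terms after the formula fall into the next free slots:

```r
# Hypothetical stand-in mirroring the assumed argument order of clogit();
# it only reports how the call was matched and fits nothing.
clogit_args <- function(formula, data, weights, subset, na.action, method, ...) {
  match.call()
}

# Same shape as the call with commas: strata() and cluster() are unnamed.
# match.call() does not evaluate its arguments, so no data are needed here.
mc <- clogit_args(resp ~ x1 + x2, strata(ID), cluster(site),
                  method = "efron", data = dat)

mc$weights  # strata(ID): matched to 'weights', not to the model formula
mc$subset   # cluster(site): matched to 'subset'
```

With + in place of the commas, both terms stay inside the formula, which is where the stratification and clustering information is expected.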

-----Original Message-----
From: Joe Ceradini [mailto:[hidden email]]
Sent: Tuesday, January 19, 2016 12:48 PM
To: [hidden email]
Subject: [R] Survival::coxph (clogit), survConcordance vs. summary(fit) concordance

Re: Survival::coxph (clogit), survConcordance vs. summary(fit) concordance

jCeradini
Thanks for pointing that out, Chris. That was a thoughtless typo on my part
when I was simplifying my model for the sake of posting.

I've run a whole set of models without any problems or warnings. My main
question is about the difference between the concordance estimate that
summary(fit) reports and the one estimated with survConcordance,
particularly for assessing clogit model performance. I'd also like to know
whether I should be concerned about the giant SE that summary(fit) reports
for the concordance. This is within the context of a 1:1
case-control study (1 case and 1 control per stratum).

Corrected model:
fit <- clogit(resp ~ x1 + x2 + strata(ID) + cluster(site), method ="efron",
data = dat)
Where resp is 1's and 0's, and x1 and x2 are both continuous.

The rest of the code and output details should be in my original post.

Thanks.
Joe

On Wed, Jan 20, 2016 at 6:11 AM, Andrews, Chris <[hidden email]>
wrote:

> I only get the digest, sorry if this has already been answered.
>
> When I run your code (after creating some data) I get a warning that
> "weights are ignored in clogit".  This is a result of miscalling the clogit
> function.  The first 2 commas should be +s.
>
> library(survival)
> nn <- 1000
> dat <- data.frame(resp = rbinom(nn, 1, 0.5), x1=rnorm(nn), x2=rnorm(nn),
> ID = rep(seq(nn/2), e=2), site = rep(seq(nn/10), e=10))
> fit <- clogit(resp ~ x1 + x2, strata(ID), cluster(site), method ="efron",
> data = dat) # warning
> fit <- clogit(resp ~ x1 + x2 + strata(ID) + cluster(site), method
> ="efron", data = dat) # no warning
> summary(fit)
>
> Chris

Re: Survival::coxph (clogit), survConcordance vs. summary(fit) concordance

Therneau, Terry M., Ph.D.
In reply to this post by jCeradini
I read the digest form which puts me behind, plus the last 2 days have been solid meetings
with an external advisory group so I missed the initial query.   Three responses.

1. The clogit routine sets the data up properly and then calls a stratified Cox model.  If
you want the survConcordance routine to give the same answer, it also needs to know about
the strata:
     survConcordance(Surv(rep(1, 76L), resp) ~ predict(fit) + strata(ID), data=dat)
I'm not surprised that you get a very different answer with/without strata.

2. I've never thought of using a robust variance for the matched case/control model.  I'm
having a hard time wrapping my head around what you would expect that to accomplish
(statistically).  Subjects are already matched on someone from the same site, so where
does a per-site effect creep in?  Assuming there is a good reason and I just don't see it
(not an unwarranted assumption), I'm not aware of any work on what an appropriate variance
would be for the concordance in that case.

3. I need to think about the large variance issue.

Terry Therneau
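
The effect of the strata in point 1 can be sketched in base R. This is an illustration under stated assumptions, not the survival package's implementation: the data frame toy and both helper functions are hypothetical, and ties in the score are counted as one half. Without strata, every case is compared with every control (in the original output, 1315 + 129 = 1444 = 38 x 38 pairs, consistent with 38 cases and 38 controls). With strata, a case is compared only with the control in its own stratum.

```r
# Toy 1:1 matched data (hypothetical values, not the poster's data).
toy <- data.frame(
  ID   = rep(1:4, each = 2),
  resp = rep(c(1, 0), times = 4),
  lp   = c(2.0, 1.5, 0.2, -1.0, 3.0, 3.5, -0.5, -2.0)  # model score
)

# Fraction of (case, control) pairs in which the case has the larger score.
pair_concordance <- function(case_lp, control_lp) {
  d <- outer(case_lp, control_lp, "-")
  (sum(d > 0) + 0.5 * sum(d == 0)) / length(d)
}

# Same idea, but pairs are formed only within a stratum.
stratified_concordance <- function(lp, resp, id) {
  per_stratum <- lapply(split(data.frame(lp, resp), id), function(s) {
    d <- outer(s$lp[s$resp == 1], s$lp[s$resp == 0], "-")
    c(agree = sum(d > 0) + 0.5 * sum(d == 0), pairs = length(d))
  })
  totals <- Reduce(`+`, per_stratum)
  unname(totals["agree"] / totals["pairs"])
}

# All 4 x 4 case-control pairs, ignoring the matching.
c_all <- pair_concordance(toy$lp[toy$resp == 1], toy$lp[toy$resp == 0])

# Only the 4 within-ID pairs.
c_strat <- stratified_concordance(toy$lp, toy$resp, toy$ID)

c(unstratified = c_all, stratified = c_strat)
```

In this toy example the two numbers differ (10 of 16 pairs versus 3 of 4 pairs), which is the kind of gap seen between the two estimates in the original post.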


Re: Survival::coxph (clogit), survConcordance vs. summary(fit) concordance

jCeradini
Thanks Terry!

I thought that since I was providing survConcordance with the model object
that the same formula would be applied. But I was obviously wrong. I just
ran survConcordance with the addition of the strata argument, as you
suggested, and got the same answer as summary(fit)....with the same scary
SE.

This is a wildlife habitat selection analysis. Each individual animal has
habitat features that they used (1) and habitat that was available but that
they did not use (0). The habitat that is available is different for each
individual, hence the need for strata(ID of individual). However, all the
habitat data are collected from multiple discrete sites and each site has
multiple individuals on it. For all these analyses of these data, I've
assumed that individuals within a site may be more correlated than
individuals between sites, hence addition of cluster(site).

I was able to reproduce the concordance estimate from summary(fit) by
estimating predicted probabilities using:
risk <- predict(fit, type='risk')
risk / (1+risk)
I then used a probability cut-off of 0.5 to decide whether an observed point
was correctly classified, which returned the same 0.76 as the concordance
estimate.
So, can I just think of this concordance as a classification table (or
confusion matrix) with a 0.5 threshold, so that the classification error
would be 1 - 0.76?
Was I mistaken in thinking concordance was more akin to AUC in
unconditional logistic regression?

Thanks.
Joe
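
One possible reading of why the 0.5 cut-off reproduces the concordance, sketched in base R. The assumption here is that the predicted risk is computed relative to the mean within each stratum, which I believe is the default reference for stratified models in predict.coxph; the data frame toy is hypothetical. Under that assumption, in a 1:1 design the case gets risk / (1 + risk) > 0.5 exactly when its score is higher than its matched control's, so the classification rate and the within-pair concordance count the same thing.

```r
# Toy 1:1 matched data (hypothetical values, not the poster's data).
toy <- data.frame(
  ID   = rep(1:4, each = 2),
  resp = rep(c(1, 0), times = 4),
  lp   = c(2.0, 1.5, 0.2, -1.0, 3.0, 3.5, -0.5, -2.0)  # model score
)

# Assumed behaviour: scores are centered on the mean of their own stratum.
lp_centered <- toy$lp - ave(toy$lp, toy$ID)
risk <- exp(lp_centered)
p    <- risk / (1 + risk)

# Classification with a 0.5 cut-off, scored against the observed response.
accuracy <- mean((p > 0.5) == (toy$resp == 1))

# Within-pair concordance: case score minus control score, per stratum.
diff_by_id  <- tapply(ifelse(toy$resp == 1, toy$lp, -toy$lp), toy$ID, sum)
within_pair <- mean(diff_by_id > 0)

c(accuracy = accuracy, within_pair = within_pair)
```

If this reading is right, the stratified concordance in a 1:1 design is a within-pair comparison rather than a pooled AUC over all cases and controls; it does not by itself explain the large SE.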


On Thu, Jan 21, 2016 at 8:01 AM, Therneau, Terry M., Ph.D. <
[hidden email]> wrote:

> I read the digest form which puts me behind, plus the last 2 days have
> been solid meetings with an external advisory group so I missed the initial
> query.   Three responses.
>
> 1. The clogit routine sets the data up properly and then calls a
> stratified Cox model.  If you want the survConcordance routine to give the
> same answer, it also needs to know about the strata
>     survConcordance (Surv(rep(1, 76L), resp) ~ predict(fit) + strata(ID),
> data=dat)
> I'm not surprised that you get a very different answer with/without strata.
>
> 2. I've never thought of using a robust variance for the matched
> case/control model.  I'm having a hard time wrapping my head around what
> you would expect that to accomplish (statistically).  Subjects are already
> matched on someone from the same site, so where does a per-site effect
> creep in?  Assuming there is a good reason and I just don't see it (not an
> unwarranted assumption), I'm not aware of any work on what an appropriate
> variance would be for the concordance in that case.
>
> 3. I need to think about the large variance issue.
>
> Terry Therneau