Quantcast

differences between survival models between STATA and R

classic Classic list List threaded Threaded
9 messages Options
JPF
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

differences between survival models between STATA and R

JPF
This post was updated on .
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: differences between survival models between STATA and R

Joshua Wiley-2
Hi J,

You have not provided nearly enough information for us to evaluate
whether the results should be similar.  You are talking about two
completely different packages, with no information on your data, only
a small amount of information about your model ("the same analysis")
but clearly the analysis is *not* the same or you would get the same
results, so really you have "I think I am running the same analysis
but either the underlying code/model, the data, or my syntax is
actually running a different model and what I really want is someone
to help me synchronize the results from R and Stata so I feel more
confident, but please do this without data, code, or output".

Please do not take this the wrong way, I am not trying to be harsh,
just point out the difficulty of the question you have asked us.  I
primarily use R, but I work at a consulting center and have access to
Stata and some interest in survival models, so if you sent us your
data (or found/created some example data that replicated the
discrepancy you note), I would try to check out both your R and Stata
code, but I can only do this with some help from you.  Otherwise, I
have to find my own data, examine the functions in R and Stata to
compare what they are doing, and try many tests on my own to hopefully
try to replicate what you might be seeing and then see if I can
actually get the same output.

Perhaps a more saintly version of myself would do that, but unlike my
officemate, there is no special place waiting for me in heaven, or
perhaps I have yet to find my wings.  More detailed help provided when
you provide a reproducible example as the posting guide requests.

Cheers,

Josh


On Fri, Jul 6, 2012 at 2:12 PM, JPF <[hidden email]> wrote:

> Dear Community,
>
> I have been using two types of survival programs to analyse a data set.
>
> The first one is an R function called aftreg. The second one an STATA
> function called streg.
>
> Both of them include the same analyisis with a weibull distribution. Yet,
> results are very different.
>
> Shouldn't the results be the same?
>
> Kind regards,
> J
>
> --
> View this message in context: http://r.789695.n4.nabble.com/differences-between-survival-models-between-STATA-and-R-tp4635670.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

it's this bit right here I am referring to:

> and provide commented, minimal, self-contained, reproducible code.
                 -------------------------->                      ^^^^^^^^^^^^^


--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: differences between survival models between STATA and R

Terry Therneau-2
In reply to this post by JPF
Without more information, we can only guess what you did, or what you
are seeing on the page that is "different".

I'll make a random guess though.  There are about 5 ways to paramaterize
the Weibull distribution.  The standard packages that I know, however,
tend to use the one found in the Kalbfleisch and Prentice book The
Statistical Analysis of Failure time Data.  This includes the survreg
funciton in R and lifereg in SAS, and likely stata tthought I don't know
that package.  The aftreg function in the eha package uses something
different.

About 1/2 the weibull questions I see are due to a change in parameters.

Terry T.

---- begin included message -----



Dear Community,

I have been using two types of survival programs to analyse a data set.

The first one is an R function called aftreg. The second one an STATA
function called streg.

Both of them include the same analyisis with a weibull distribution. Yet,
results are very different.

Shouldn't the results be the same?

Kind regards,
J

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
JPF
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: differences between survival models between STATA and R

JPF
This post was updated on .
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: differences between survival models between STATA and R

Robert Baer
On 7/9/2012 9:17 AM, Javier Palacios Fenech wrote:
> Please.
After, Terry's response I guess I was expecting to hear how your
comparison between R and STATA went when you used the R function,
survreg() for your analysis.

We still don't know what your data look like.  The posting guide asks
for a "reproducible example".  This typically means including at least a
"toy dataset" if not the actual data you are using.

To learn more:
library(survival)
?survreg

>
> find an example here. With exactly the same data set, I run two hazard
> models following the instructions for each function.
>
> aftreg(formula = Surv(sta, sto, S) ~ a + b + c + d +   e + f + g
> , factor(F), data = data.frame(SURV),
>      dist = "weibull", id = ID)
>
> streg  f1 f2 f3 f4 f5 a b c d g f g,  dist(weibull)   time nolog    (note:
> F= f1, f2,f3,f4,f5)
>
> Results are different. Really different. With aftreg some estimates are
> significant, and with STATA they are not. Many estimates do not even have
> the same sign, therefore predicting contrary effects. Which model should I
> trust?
>
> Best,
>
> J
>
>
>
>
>
> On Mon, Jul 9, 2012 at 9:59 AM, Terry Therneau <[hidden email]> wrote:
>
>> Without more information, we can only guess what you did, or what you are
>> seeing on the page that is "different".
>>
>> I'll make a random guess though.  There are about 5 ways to paramaterize
>> the Weibull distribution.  The standard packages that I know, however, tend
>> to use the one found in the Kalbfleisch and Prentice book The Statistical
>> Analysis of Failure time Data.  This includes the survreg funciton in R and
>> lifereg in SAS, and likely stata tthought I don't know that package.  The
>> aftreg function in the eha package uses something different.
>>
>> About 1/2 the weibull questions I see are due to a change in parameters.
>>
>> Terry T.
>>
>> ---- begin included message -----
>>
>>
>>
>>
>> Dear Community,
>>
>> I have been using two types of survival programs to analyse a data set.
>>
>> The first one is an R function called aftreg. The second one an STATA
>> function called streg.
>>
>> Both of them include the same analyisis with a weibull distribution. Yet,
>> results are very different.
>>
>> Shouldn't the results be the same?
>>
>> Kind regards,
>> J
>>
>>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
JPF
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: differences between survival models between STATA and R

JPF
This post was updated on .
In reply to this post by JPF
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: differences between survival models between STATA and R

jthetzel
In reply to this post by JPF
UCLA's Advanced Technical Services' Statistical Computing website often has very good resources for comparing analyses between R, Stata, and SAS ( http://www.ats.ucla.edu/stat ). For accelerated failure time models, I believe that it has some examples for Stata ( http://www.ats.ucla.edu/stat/stata/examples/asa/asastata8.htm ), but not for R.  However, the Stata examples, to the best of my knowledge, use Hosmer and Lemeshow's HMO HIV data from their 1998 Applied Survival Analysis text, which happens to be available at http://www.ats.ucla.edu/stat/R/examples/asa/hmohiv.csv .  Thus, you can try reproducing UCLA's Stata AFT examples in R, and post the results here as a reproducible example to illustrate what is confusing you.

For example:
## Require eha package for aftreg function
require("eha")

## Hosmer and Lemeshow's HMO HIV data
hmohiv <- read.csv("http://www.ats.ucla.edu/stat/R/examples/asa/hmohiv.csv") 

## Example model
aft1 <- aftreg(Surv(time, censor) ~ drug, data = hmohiv, dist = "weibull", shape = 1)
summary(aft1)

Yields:
> summary(aft1)
Call:
aftreg(formula = Surv(time, censor) ~ drug, data = hmohiv, dist = "weibull", 
    shape = 1)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
drug                0.239     1.056     2.874     0.224     0.000 

log(scale)                    3.024    20.571     0.112     0.000 

Which you can compare to the following from the UCLA website:
streg drug, dist(exp) time

Exponential regression -- accelerated failure-time form 

No. of subjects =          100                     Number of obs   =       100
No. of failures =           80
Time at risk    =         1136
                                                   LR chi2(1)      =     20.93
Log likelihood  =   -146.79209                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |  -1.055687   .2238868    -4.72   0.000    -1.494497   -.6168771
       _cons |   3.023903   .1543033    19.60   0.000     2.721474    3.326332
------------------------------------------------------------------------------

And then describe to r-help which discrepancies between the R and Stata output are confusing you.

Regards,
Jeremy

JPF wrote
Dear Community,

I have been using two types of survival programs to analyse a data set.

The first one is an R function called aftreg. The second one an STATA function called streg.

Both of them include the same analyisis with a weibull distribution. Yet, results are very different.

Shouldn't the results be the same?

Kind regards,
J
Jeremy T. Hetzel
Boston University
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: differences between survival models between STATA and R

jthetzel
In reply to this post by JPF
Be sure you reply or forward your message to me to the r-help listhost.  I might not have time to review it tonight, while others might.

Jeremy

On Mon, Jul 9, 2012 at 2:38 PM, JPF [via R] <[hidden email]> wrote:
Please, find an example with a data set here.

The data sets are prepared to be read with R (dataR) and STATA (dataSTATA)
directly. The only difference is that STATA treats blanks as NAs.

*in R, by using aftreg:*

dataR<-read.table("clipboard",h=T)
> su<-aftreg(Surv(sta,sto,S) ~
 a+b+c+d+g+h+j+k+l+m+factor(pcont),dist="weibull",
data=data.frame(dataR),id=ID)
> summary(su)

Call:
aftreg(formula = Surv(sta, sto, S) ~ a + b + c + d + g + h +
    j + k + l + m + factor(pcont), data = data.frame(thern),
    dist = "weibull", id = ID)

Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
a                  39.416     0.001     1.001     0.002     0.598
b                  55.879     0.002     1.002     0.001     0.002
c                  49.554    -0.001     0.999     0.001     0.251
d                  51.266     0.000     1.000     0.000     0.758
g                  14.701    -0.006     0.994     0.002     0.011
h                  32.358     0.005     1.005     0.001     0.000
j                  51.768    -0.022     0.978     0.001     0.000
k                  53.832    -0.004     0.996     0.001     0.000
l                  18.851     0.016     1.017     0.001     0.000
m                   0.809     0.907     2.478     0.132     0.000
factor(pcont)
               1    0.007     0         1           (reference)
               2    0.272    -0.051     0.950     0.158     0.745
               3    0.201     0.653     1.922     0.155     0.000
               4    0.253     0.181     1.198     0.156     0.246
               5    0.267     0.691     1.996     0.162     0.000

log(scale)                    3.369    29.038     0.225     0.000
log(shape)                    1.925     6.858     0.055     0.000

Events                    190
Total time at risk          3129
Max. log. likelihood      -298.47
LR test statistic         681
Degrees of freedom        14
Overall p-value           0



*in STATA, by using streg:*

 insheet using "dataSTATA.txt"
(18 vars, 4862 obs)

.
. stset ntime, failure(s) id(id)
                id:  id
     failure event:  s != 0 & s < .
obs. time interval:  (ntime[_n-1], ntime]
 exit on or before:  failure
------------------------------------------------------------------------------
     4862  total obs.
        0  exclusions
------------------------------------------------------------------------------
     4862  obs. remaining, representing
      351  subjects
      283  failures in single failure-per-subject data
     4862  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        30

.
. gen f1 = 0
. replace f1 = 1 if pcont==1
(520 real changes made)

.
. gen f2 = 0
. replace f2= 1 if pcont==2
(1267 real changes made)


. gen f3 = 0
. replace f3= 1 if pcont==3
(771 real changes made)


. gen f4 = 0
. replace f4= 1 if pcont==4
(960 real changes made)


. gen f5 = 0
. replace f5= 1 if pcont==5
(1344 real changes made)


. streg  f1 f2 f3 f4 f5 a b c d g h j k l m, dist(weibull)  time nolog

         failure _d:  s
   analysis time _t:  ntime
                 id:  id
note: f5 dropped due to collinearity

Weibull regression -- accelerated failure-time form

No. of subjects =          228                     Number of obs   =
 3129
No. of failures =          190
Time at risk    =         3129
                                                   LR chi2(14)     =
 226.94
Log likelihood  =   -47.541886                     Prob > chi2     =
 0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf.
Interval]
-------------+----------------------------------------------------------------
          f1 |   .7194758   .2271213     3.17   0.002     .2743262
 1.164625
          f2 |   .4011278   .0604573     6.63   0.000     .2826336
.519622
          f3 |   .0142088   .0676573     0.21   0.834    -.1183972
 .1468148
          f4 |   .2984225   .0814102     3.67   0.000     .1388614
 .4579835
           a |   .0030444   .0024399     1.25   0.212    -.0017377
 .0078265
           b |  -.0008451   .0009987    -0.85   0.397    -.0028026
 .0011124
           c |   .0015207   .0009827     1.55   0.122    -.0004055
 .0034468
           d |   .0005143   .0007139     0.72   0.471    -.0008848
 .0019135
           g |   .0024349   .0025024     0.97   0.331    -.0024698
 .0073395
           h |  -.0024076   .0012125    -1.99   0.047    -.0047842
-.0000311
           j |    .013318   .0012565    10.60   0.000     .0108553
 .0157806
           k |    .002104   .0010438     2.02   0.044     .0000581
 .0041498
           l |  -.0017612   .0010071    -1.75   0.080     -.003735
 .0002127
           m |  -.0694492    .090727    -0.77   0.444    -.2472708
 .1083724
       _cons |   1.812124   .2000892     9.06   0.000     1.419956
 2.204291
-------------+----------------------------------------------------------------
       /ln_p |   1.536971   .0756138    20.33   0.000      1.38877
 1.685171
-------------+----------------------------------------------------------------
           p |   4.650481   .3516405                      4.009916
 5.393373
         1/p |   .2150315   .0162593                      .1854127
 .2493818
------------------------------------------------------------------------------

I do an accelerated failure time model with both programs. The problem of
"survreg" is that it cannot handle time-dependent covariates:
https://stat.ethz.ch/pipermail/r-help/2010-July/247216.html



Thanks,



On Mon, Jul 9, 2012 at 10:19 AM, Terry Therneau <[hidden email]> wrote:

>  No input data, no output listed......   my crytal ball is too cloudy to
> answer this.
>
> Terry T
>
>
> On 07/09/2012 09:17 AM, Javier Palacios Fenech wrote:
>
> Please.
>
>  find an example here. With exactly the same data set, I run two hazard
> models following the instructions for each function.
>
>  aftreg(formula = Surv(sta, sto, S) ~ a + b + c + d +   e + f + g
> , factor(F), data = data.frame(SURV),
>     dist = "weibull", id = ID)
>
>  streg  f1 f2 f3 f4 f5 a b c d g f g,  dist(weibull)   time nolog
>  (note: F= f1, f2,f3,f4,f5)
>
>  Results are different. Really different. With aftreg some estimates are
> significant, and with STATA they are not. Many estimates do not even have
> the same sign, therefore predicting contrary effects. Which model should I
> trust?
>
>  Best,
>
>  J
>
>
>
>
>
> On Mon, Jul 9, 2012 at 9:59 AM, Terry Therneau <[hidden email]> wrote:
>
>> Without more information, we can only guess what you did, or what you are
>> seeing on the page that is "different".
>>
>> I'll make a random guess though.  There are about 5 ways to paramaterize
>> the Weibull distribution.  The standard packages that I know, however, tend
>> to use the one found in the Kalbfleisch and Prentice book The Statistical
>> Analysis of Failure time Data.  This includes the survreg funciton in R and
>> lifereg in SAS, and likely stata tthought I don't know that package.  The
>> aftreg function in the eha package uses something different.
>>
>> About 1/2 the weibull questions I see are due to a change in parameters.
>>
>> Terry T.
>>
>> ---- begin included message -----
>>
>>
>>
>>
>> Dear Community,
>>
>> I have been using two types of survival programs to analyse a data set.
>>
>> The first one is an R function called aftreg. The second one an STATA
>> function called streg.
>>
>> Both of them include the same analyisis with a weibull distribution. Yet,
>> results are very different.
>>
>> Shouldn't the results be the same?
>>
>> Kind regards,
>> J
>>
>>
>
>
>  --
> *Javier Palacios Fenech*
> PhD in Management
> Research Area: Marketing
>  www.javierpalacios.com
>
>
>
>
--
*Javier Palacios Fenech*
PhD in Management
Research Area: Marketing
www.javierpalacios.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

dataR.txt (447K) Download Attachment
dataSTATA.txt (434K) Download Attachment



If you reply to this email, your message will be added to the discussion below:
http://r.789695.n4.nabble.com/differences-between-survival-models-between-STATA-and-R-tp4635670p4635888.html
To unsubscribe from differences between survival models between STATA and R, click here.
NAML

Jeremy T. Hetzel
Boston University
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: differences between survival models between STATA and R

Göran Broström
In reply to this post by Terry Therneau-2
On Mon, Jul 9, 2012 at 3:59 PM, Terry Therneau <[hidden email]> wrote:

> Without more information, we can only guess what you did, or what you are
> seeing on the page that is "different".
>
> I'll make a random guess though.  There are about 5 ways to paramaterize
> the Weibull distribution.  The standard packages that I know, however, tend
> to use the one found in the Kalbfleisch and Prentice book The Statistical
> Analysis of Failure time Data.  This includes the survreg funciton in R and
> lifereg in SAS, and likely stata tthought I don't know that package.  The
> aftreg function in the eha package uses something different.
>
The main difference is that aftreg estimates AFT factors, i.e., the
(exponential) score of an individual accelerates time with that factor. In
survreg, expected life is multiplied by the corresponding factor (the
inverse of the previous).

>
> About 1/2 the weibull questions I see are due to a change in parameters.
>
> For this reason I have introduced an additional parameter to aftreg,
'param', in the latest version of eha. The default value is 'default'
(everything is as before) and an alternative is 'survreg' (everything is as
in survreg).

Göran

> Terry T.
>
> ---- begin included message -----
>
>
>
>
> Dear Community,
>
> I have been using two types of survival programs to analyse a data set.
>
> The first one is an R function called aftreg. The second one an STATA
> function called streg.
>
> Both of them include the same analyisis with a weibull distribution. Yet,
> results are very different.
>
> Shouldn't the results be the same?
>
> Kind regards,
> J
>
> ______________________________**________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help>
> PLEASE do read the posting guide http://www.R-project.org/**
> posting-guide.html <http://www.R-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>


--
Göran Broström

        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Loading...