Help understanding why glm and lrm.fit runs with my data, but lrm does not

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Help understanding why glm and lrm.fit runs with my data, but lrm does not

Bonnett, Laura
Dear all,

I am using the publically available GustoW dataset.  The exact version I am using is available here: https://drive.google.com/open?id=0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk

I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP, HRT and ANT.  I have successfully fitted a logistic regression model using the "glm" function as shown below.

library(rms)
gusto <- spss.get("GustoW.sav")
fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)

However, my review of the literature and other websites suggest I need to use "lrm" for the purposes of producing a nomogram.  When I run the command using "lrm" (see below) I get an error message saying:
Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
  Unable to fit model using "lrm.fit"

My code is as follows:
gusto2 <- gusto[,c(1,3,5,8,9,10)]
gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
var.labels=c(DAY30="30-day Mortality", AGE="Age in Years", KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior Infarct Location")
label(gusto2)=lapply(names(var.labels),function(x) label(gusto2[,x])=var.labels[x])

ddist = datadist(gusto2)
options(datadist='ddist')

fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)

Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
  Unable to fit model using "lrm.fit"

Online solutions to this problem involve checking whether any variables are redundant.  However, the results for my data suggest  that none are.
redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)

Redundancy Analysis

redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)

n: 2188         p: 5    nk: 3

Number of NAs:   0

Transformation of target variables forced to be linear

R-squared cutoff: 0.9   Type: ordinary

R^2 with which each variable can be predicted from all other variables:

   AGE    HYP KILLIP    HRT    ANT
 0.028  0.032  0.053  0.046  0.040

No redundant variables

I've also tried just considering "lrm.fit" and that code seems to run without error too:
lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,gusto2$HRT,gusto2$ANT),gusto2$DAY30)

Logistic Regression Model

 lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
     gusto2$ANT), y = gusto2$DAY30)

                       Model Likelihood     Discrimination    Rank Discrim.
                          Ratio Test           Indexes           Indexes
 Obs          2188    LR chi2     233.59    R2       0.273    C       0.846
  0           2053    d.f.             5    g        1.642    Dxy     0.691
  1            135    Pr(> chi2) <0.0001    gr       5.165    gamma   0.696
 max |deriv| 4e-09                          gp       0.079    tau-a   0.080
                                            Brier    0.048

           Coef     S.E.   Wald Z Pr(>|Z|)
 Intercept -13.8515 0.9694 -14.29 <0.0001
 x[1]        0.0989 0.0103   9.58 <0.0001
 x[2]        0.9030 0.1510   5.98 <0.0001
 x[3]        1.3576 0.2570   5.28 <0.0001
 x[4]        0.6884 0.2034   3.38 0.0007
 x[5]        0.6327 0.2003   3.16 0.0016

I was therefore hoping someone would explain why the "lrm" code is producing an error message, while "lrm.fit" and "glm" do not.  In particular I would welcome a solution to ensure I can produce a nomogram.

Kind regards,
Laura

Dr Laura Bonnett
NIHR Post-Doctoral Fellow

Department of Biostatistics,
Waterhouse Building, Block F,
1-5 Brownlow Street,
University of Liverpool,
Liverpool,
L69 3GL

0151 795 9686
[hidden email]



        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help understanding why glm and lrm.fit runs with my data, but lrm does not

Jan van der LAan-2

With lrm.fit you are fitting a completely different model. One of the
things lrm does, is preparing the input for lrm.fit which in this case
means that dummy variables are generated for categorical variables such
as 'KILLIP'.

The error message means that model did not converge after the maximum
number of iterations. One possible solution is to try to increase the
maximum number of iterations, e.g.:

fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT, data = gusto2, maxit = 100)

HTH,

Jan



On 14-09-17 09:30, Bonnett, Laura wrote:

> Dear all,
>
> I am using the publically available GustoW dataset.  The exact version I am using is available here: https://drive.google.com/open?id=0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk
>
> I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP, HRT and ANT.  I have successfully fitted a logistic regression model using the "glm" function as shown below.
>
> library(rms)
> gusto <- spss.get("GustoW.sav")
> fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)
>
> However, my review of the literature and other websites suggest I need to use "lrm" for the purposes of producing a nomogram.  When I run the command using "lrm" (see below) I get an error message saying:
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
>    Unable to fit model using "lrm.fit"
>
> My code is as follows:
> gusto2 <- gusto[,c(1,3,5,8,9,10)]
> gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
> gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
> gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
> gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
> var.labels=c(DAY30="30-day Mortality", AGE="Age in Years", KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior Infarct Location")
> label(gusto2)=lapply(names(var.labels),function(x) label(gusto2[,x])=var.labels[x])
>
> ddist = datadist(gusto2)
> options(datadist='ddist')
>
> fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
>    Unable to fit model using "lrm.fit"
>
> Online solutions to this problem involve checking whether any variables are redundant.  However, the results for my data suggest  that none are.
> redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Redundancy Analysis
>
> redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)
>
> n: 2188         p: 5    nk: 3
>
> Number of NAs:   0
>
> Transformation of target variables forced to be linear
>
> R-squared cutoff: 0.9   Type: ordinary
>
> R^2 with which each variable can be predicted from all other variables:
>
>     AGE    HYP KILLIP    HRT    ANT
>   0.028  0.032  0.053  0.046  0.040
>
> No redundant variables
>
> I've also tried just considering "lrm.fit" and that code seems to run without error too:
> lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,gusto2$HRT,gusto2$ANT),gusto2$DAY30)
>
> Logistic Regression Model
>
>   lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
>       gusto2$ANT), y = gusto2$DAY30)
>
>                         Model Likelihood     Discrimination    Rank Discrim.
>                            Ratio Test           Indexes           Indexes
>   Obs          2188    LR chi2     233.59    R2       0.273    C       0.846
>    0           2053    d.f.             5    g        1.642    Dxy     0.691
>    1            135    Pr(> chi2) <0.0001    gr       5.165    gamma   0.696
>   max |deriv| 4e-09                          gp       0.079    tau-a   0.080
>                                              Brier    0.048
>
>             Coef     S.E.   Wald Z Pr(>|Z|)
>   Intercept -13.8515 0.9694 -14.29 <0.0001
>   x[1]        0.0989 0.0103   9.58 <0.0001
>   x[2]        0.9030 0.1510   5.98 <0.0001
>   x[3]        1.3576 0.2570   5.28 <0.0001
>   x[4]        0.6884 0.2034   3.38 0.0007
>   x[5]        0.6327 0.2003   3.16 0.0016
>
> I was therefore hoping someone would explain why the "lrm" code is producing an error message, while "lrm.fit" and "glm" do not.  In particular I would welcome a solution to ensure I can produce a nomogram.
>
> Kind regards,
> Laura
>
> Dr Laura Bonnett
> NIHR Post-Doctoral Fellow
>
> Department of Biostatistics,
> Waterhouse Building, Block F,
> 1-5 Brownlow Street,
> University of Liverpool,
> Liverpool,
> L69 3GL
>
> 0151 795 9686
> [hidden email]
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help understanding why glm and lrm.fit runs with my data, but lrm does not

David Winsemius
In reply to this post by Bonnett, Laura

> On Sep 14, 2017, at 12:30 AM, Bonnett, Laura <[hidden email]> wrote:
>
> Dear all,
>
> I am using the publically available GustoW dataset.  The exact version I am using is available here: https://drive.google.com/open?id=0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk
>
> I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP, HRT and ANT.  I have successfully fitted a logistic regression model using the "glm" function as shown below.
>
> library(rms)
> gusto <- spss.get("GustoW.sav")
> fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)
>
> However, my review of the literature and other websites suggest I need to use "lrm" for the purposes of producing a nomogram.  When I run the command using "lrm" (see below) I get an error message saying:
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
>  Unable to fit model using "lrm.fit"
>
> My code is as follows:
> gusto2 <- gusto[,c(1,3,5,8,9,10)]
> gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
> gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
> gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
> gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
> var.labels=c(DAY30="30-day Mortality", AGE="Age in Years", KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior Infarct Location")
> label(gusto2)=lapply(names(var.labels),function(x) label(gusto2[,x])=var.labels[x])
>
> ddist = datadist(gusto2)
> options(datadist='ddist')
>
> fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
>  Unable to fit model using "lrm.fit"
>
> Online solutions to this problem involve checking whether any variables are redundant.  However, the results for my data suggest  that none are.
> redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Redundancy Analysis
>
> redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)
>
> n: 2188         p: 5    nk: 3
>
> Number of NAs:   0
>
> Transformation of target variables forced to be linear
>
> R-squared cutoff: 0.9   Type: ordinary
>
> R^2 with which each variable can be predicted from all other variables:
>
>   AGE    HYP KILLIP    HRT    ANT
> 0.028  0.032  0.053  0.046  0.040
>
> No redundant variables
>
> I've also tried just considering "lrm.fit" and that code seems to run without error too:
> lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,gusto2$HRT,gusto2$ANT),gusto2$DAY30)
>
> Logistic Regression Model
>
> lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
>     gusto2$ANT), y = gusto2$DAY30)
>
>                       Model Likelihood     Discrimination    Rank Discrim.
>                          Ratio Test           Indexes           Indexes
> Obs          2188    LR chi2     233.59    R2       0.273    C       0.846
>  0           2053    d.f.             5    g        1.642    Dxy     0.691
>  1            135    Pr(> chi2) <0.0001    gr       5.165    gamma   0.696
> max |deriv| 4e-09                          gp       0.079    tau-a   0.080
>                                            Brier    0.048
>
>           Coef     S.E.   Wald Z Pr(>|Z|)
> Intercept -13.8515 0.9694 -14.29 <0.0001
> x[1]        0.0989 0.0103   9.58 <0.0001
> x[2]        0.9030 0.1510   5.98 <0.0001
> x[3]        1.3576 0.2570   5.28 <0.0001
> x[4]        0.6884 0.2034   3.38 0.0007
> x[5]        0.6327 0.2003   3.16 0.0016
>
> I was therefore hoping someone would explain why the "lrm" code is producing an error message, while "lrm.fit" and "glm" do not.  In particular I would welcome a solution to ensure I can produce a nomogram.

Try this:

lrm  # look at code, do a search on "fail"
?lrm.fit  # read the structure of the returned value of lrm.fit

my.fit <- lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
    gusto2$ANT), y = gusto2$DAY30)

print(my.fit$fail)  # the error message you got from the lrm call means convergence failed

Documentation bug: The documentation of the cause of the 'fail'- value incorrectly gives the name of this parameter as 'maxiter' in the  Value section.

--
David.



>
> Kind regards,
> Laura
>
> Dr Laura Bonnett
> NIHR Post-Doctoral Fellow
>
> Department of Biostatistics,
> Waterhouse Building, Block F,
> 1-5 Brownlow Street,
> University of Liverpool,
> Liverpool,
> L69 3GL
>
> 0151 795 9686
> [hidden email]
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help understanding why glm and lrm.fit runs with my data, but lrm does not

Frank Harrell
Fixed 'maxiter' in the help file.  Thanks.

Please give the original source of that dataset.

That dataset is a tiny sample of GUSTO-I and not large enough to fit this
model very reliably.

A nomogram using the full dataset (not publicly available to my knowledge)
is already available in http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf

Use lrm, not lrm.fit for this.  Adding maxit=20 will probably make it work
on the small dataset but still not clear on why you are using this dataset.

Frank


------------------------------
Frank E Harrell Jr      Professor      School of Medicine

Department of *Biostatistics*      *Vanderbilt University*

On Thu, Sep 14, 2017 at 10:48 AM, David Winsemius <[hidden email]>
wrote:

>
> > On Sep 14, 2017, at 12:30 AM, Bonnett, Laura <
> [hidden email]> wrote:
> >
> > Dear all,
> >
> > I am using the publically available GustoW dataset.  The exact version I
> am using is available here: https://na01.safelinks.
> protection.outlook.com/?url=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%
> 3D0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk&data=02%7C01%7Cf.harrell%40vanderbilt.edu%
> 7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80fa
> ecad%7C0%7C0%7C636410009046132507&sdata=UZgX3%2Ba%
> 2FU2Eeh8ybHMI6JnF0Npd2XJPXAzlmtEhDgOY%3D&reserved=0
> >
> > I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP,
> HRT and ANT.  I have successfully fitted a logistic regression model using
> the "glm" function as shown below.
> >
> > library(rms)
> > gusto <- spss.get("GustoW.sav")
> > fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=
> binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)
> >
> > However, my review of the literature and other websites suggest I need
> to use "lrm" for the purposes of producing a nomogram.  When I run the
> command using "lrm" (see below) I get an error message saying:
> > Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
> >  Unable to fit model using "lrm.fit"
> >
> > My code is as follows:
> > gusto2 <- gusto[,c(1,3,5,8,9,10)]
> > gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
> > gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
> > gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
> > gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
> > var.labels=c(DAY30="30-day Mortality", AGE="Age in Years",
> KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior
> Infarct Location")
> > label(gusto2)=lapply(names(var.labels),function(x)
> label(gusto2[,x])=var.labels[x])
> >
> > ddist = datadist(gusto2)
> > options(datadist='ddist')
> >
> > fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)
> >
> > Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
> >  Unable to fit model using "lrm.fit"
> >
> > Online solutions to this problem involve checking whether any variables
> are redundant.  However, the results for my data suggest  that none are.
> > redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)
> >
> > Redundancy Analysis
> >
> > redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)
> >
> > n: 2188         p: 5    nk: 3
> >
> > Number of NAs:   0
> >
> > Transformation of target variables forced to be linear
> >
> > R-squared cutoff: 0.9   Type: ordinary
> >
> > R^2 with which each variable can be predicted from all other variables:
> >
> >   AGE    HYP KILLIP    HRT    ANT
> > 0.028  0.032  0.053  0.046  0.040
> >
> > No redundant variables
> >
> > I've also tried just considering "lrm.fit" and that code seems to run
> without error too:
> > lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,
> gusto2$HRT,gusto2$ANT),gusto2$DAY30)
> >
> > Logistic Regression Model
> >
> > lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
> >     gusto2$ANT), y = gusto2$DAY30)
> >
> >                       Model Likelihood     Discrimination    Rank
> Discrim.
> >                          Ratio Test           Indexes           Indexes
> > Obs          2188    LR chi2     233.59    R2       0.273    C
>  0.846
> >  0           2053    d.f.             5    g        1.642    Dxy
>  0.691
> >  1            135    Pr(> chi2) <0.0001    gr       5.165    gamma
>  0.696
> > max |deriv| 4e-09                          gp       0.079    tau-a
>  0.080
> >                                            Brier    0.048
> >
> >           Coef     S.E.   Wald Z Pr(>|Z|)
> > Intercept -13.8515 0.9694 -14.29 <0.0001
> > x[1]        0.0989 0.0103   9.58 <0.0001
> > x[2]        0.9030 0.1510   5.98 <0.0001
> > x[3]        1.3576 0.2570   5.28 <0.0001
> > x[4]        0.6884 0.2034   3.38 0.0007
> > x[5]        0.6327 0.2003   3.16 0.0016
> >
> > I was therefore hoping someone would explain why the "lrm" code is
> producing an error message, while "lrm.fit" and "glm" do not.  In
> particular I would welcome a solution to ensure I can produce a nomogram.
>
> Try this:
>
> lrm  # look at code, do a search on "fail"
> ?lrm.fit  # read the structure of the returned value of lrm.fit
>
> my.fit <- lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP,
> gusto2$HRT,
>     gusto2$ANT), y = gusto2$DAY30)
>
> print(my.fit$fail)  # the error message you got from the lrm call means
> convergence failed
>
> Documentation bug: The documentation of the cause of the 'fail'- value
> incorrectly gives the name of this parameter as 'maxiter' in the  Value
> section.
>
> --
> David.
>
>
>
> >
> > Kind regards,
> > Laura
> >
> > Dr Laura Bonnett
> > NIHR Post-Doctoral Fellow
> >
> > Department of Biostatistics,
> > Waterhouse Building, Block F,
> > 1-5 Brownlow Street,
> > University of Liverpool,
> > Liverpool,
> > L69 3GL
> >
> > 0151 795 9686
> > [hidden email]
> >
> >
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://na01.safelinks.protection.outlook.com/?url=
> https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&
> data=02%7C01%7Cf.harrell%40vanderbilt.edu%7Cadb58b13c3994f89209708d4fb88
> 07f0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%
> 7C636410009046132507&sdata=GAPis8GXCfundLz48dX66AZfVTzxs%
> 2BNBUmG1kgpx2Ro%3D&reserved=0
> > PLEASE do read the posting guide https://na01.safelinks.
> protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.
> org%2Fposting-guide.html&data=02%7C01%7Cf.harrell%40vanderbilt.edu%
> 7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80fa
> ecad%7C0%7C0%7C636410009046132507&sdata=C8xd7UizYeLM6bylOyad8bumQTsYOz
> FYZu2IcMo%2BUII%3D&reserved=0
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
>  -Gehm's Corollary to Clarke's Third Law
>
>
>
>
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Frank Harrell
Department of Biostatistics, Vanderbilt University
Reply | Threaded
Open this post in threaded view
|

Re: Help understanding why glm and lrm.fit runs with my data, but lrm does not

Bonnett, Laura
Many thanks for the assistance.  I am using a small sample of GUSTO-1 as a teaching demonstration.  The Gusto-1 dataset in various smaller subsets is available from this website: http://clinicalpredictionmodels.org/doku.php?id=rcode_and_data:start  which is associated with the Clinical Prediction Models book by Steyerberg.

Many thanks again for your assistance.

Kind regards,
Laura

From: [hidden email] [mailto:[hidden email]] On Behalf Of Frank Harrell
Sent: 14 September 2017 17:22
To: David Winsemius <[hidden email]>
Cc: Bonnett, Laura <[hidden email]>; [hidden email]
Subject: Re: [R] Help understanding why glm and lrm.fit runs with my data, but lrm does not

Fixed 'maxiter' in the help file.  Thanks.

Please give the original source of that dataset.

That dataset is a tiny sample of GUSTO-I and not large enough to fit this model very reliably.

A nomogram using the full dataset (not publicly available to my knowledge) is already available in http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf

Use lrm, not lrm.fit for this.  Adding maxit=20 will probably make it work on the small dataset but still not clear on why you are using this dataset.

Frank


________________________________
Frank E Harrell Jr

Professor

School of Medicine


Department of Biostatistics

Vanderbilt University


On Thu, Sep 14, 2017 at 10:48 AM, David Winsemius <[hidden email]<mailto:[hidden email]>> wrote:

> On Sep 14, 2017, at 12:30 AM, Bonnett, Laura <[hidden email]<mailto:[hidden email]>> wrote:
>
> Dear all,
>
> I am using the publically available GustoW dataset.  The exact version I am using is available here: https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%3D0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk&data=02%7C01%7Cf.harrell%40vanderbilt.edu%7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636410009046132507&sdata=UZgX3%2Ba%2FU2Eeh8ybHMI6JnF0Npd2XJPXAzlmtEhDgOY%3D&reserved=0
>
> I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP, HRT and ANT.  I have successfully fitted a logistic regression model using the "glm" function as shown below.
>
> library(rms)
> gusto <- spss.get("GustoW.sav")
> fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)
>
> However, my review of the literature and other websites suggest I need to use "lrm" for the purposes of producing a nomogram.  When I run the command using "lrm" (see below) I get an error message saying:
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
>  Unable to fit model using "lrm.fit"
>
> My code is as follows:
> gusto2 <- gusto[,c(1,3,5,8,9,10)]
> gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
> gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
> gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
> gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
> var.labels=c(DAY30="30-day Mortality", AGE="Age in Years", KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior Infarct Location")
> label(gusto2)=lapply(names(var.labels),function(x) label(gusto2[,x])=var.labels[x])
>
> ddist = datadist(gusto2)
> options(datadist='ddist')
>
> fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
>  Unable to fit model using "lrm.fit"
>
> Online solutions to this problem involve checking whether any variables are redundant.  However, the results for my data suggest  that none are.
> redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Redundancy Analysis
>
> redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)
>
> n: 2188         p: 5    nk: 3
>
> Number of NAs:   0
>
> Transformation of target variables forced to be linear
>
> R-squared cutoff: 0.9   Type: ordinary
>
> R^2 with which each variable can be predicted from all other variables:
>
>   AGE    HYP KILLIP    HRT    ANT
> 0.028  0.032  0.053  0.046  0.040
>
> No redundant variables
>
> I've also tried just considering "lrm.fit" and that code seems to run without error too:
> lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,gusto2$HRT,gusto2$ANT),gusto2$DAY30)
>
> Logistic Regression Model
>
> lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
>     gusto2$ANT), y = gusto2$DAY30)
>
>                       Model Likelihood     Discrimination    Rank Discrim.
>                          Ratio Test           Indexes           Indexes
> Obs          2188    LR chi2     233.59    R2       0.273    C       0.846
>  0           2053    d.f.             5    g        1.642    Dxy     0.691
>  1            135    Pr(> chi2) <0.0001    gr       5.165    gamma   0.696
> max |deriv| 4e-09                          gp       0.079    tau-a   0.080
>                                            Brier    0.048
>
>           Coef     S.E.   Wald Z Pr(>|Z|)
> Intercept -13.8515 0.9694 -14.29 <0.0001
> x[1]        0.0989 0.0103   9.58 <0.0001
> x[2]        0.9030 0.1510   5.98 <0.0001
> x[3]        1.3576 0.2570   5.28 <0.0001
> x[4]        0.6884 0.2034   3.38 0.0007
> x[5]        0.6327 0.2003   3.16 0.0016
>
> I was therefore hoping someone would explain why the "lrm" code is producing an error message, while "lrm.fit" and "glm" do not.  In particular I would welcome a solution to ensure I can produce a nomogram.

Try this:

lrm  # look at code, do a search on "fail"
?lrm.fit  # read the structure of the returned value of lrm.fit

my.fit <- lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
    gusto2$ANT), y = gusto2$DAY30)

print(my.fit$fail)  # the error message you got from the lrm call means convergence failed

Documentation bug: The documentation of the cause of the 'fail'- value incorrectly gives the name of this parameter as 'maxiter' in the  Value section.

--
David.



>
> Kind regards,
> Laura
>
> Dr Laura Bonnett
> NIHR Post-Doctoral Fellow
>
> Department of Biostatistics,
> Waterhouse Building, Block F,
> 1-5 Brownlow Street,
> University of Liverpool,
> Liverpool,
> L69 3GL
>
> 0151 795 9686
> [hidden email]<mailto:[hidden email]>
>
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email]<mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=02%7C01%7Cf.harrell%40vanderbilt.edu%7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636410009046132507&sdata=GAPis8GXCfundLz48dX66AZfVTzxs%2BNBUmG1kgpx2Ro%3D&reserved=0
> PLEASE do read the posting guide https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html&data=02%7C01%7Cf.harrell%40vanderbilt.edu%7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636410009046132507&sdata=C8xd7UizYeLM6bylOyad8bumQTsYOzFYZu2IcMo%2BUII%3D&reserved=0
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law






        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.