Question on approximations of full logistic regression model

Question on approximations of full logistic regression model

khosoda
Hi,
I am trying to construct a logistic regression model from my data (104
patients and 25 events). I built a full model consisting of five
predictors, using penalization with the rms package (lrm, pentrace,
etc.) because of the events-per-variable issue. Then I tried to
approximate the full model by a step-down technique, predicting the
linear predictor L from all of the component variables using ordinary
least squares (ols in the rms package), as shown below. I would like to
know whether I am doing this correctly.
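
(For reference, the penalized full model itself was fit roughly along
these lines; the data-frame name, outcome name, and penalty grid below
are placeholders, not my actual code.)

library(rms)
dd <- datadist(mydata); options(datadist = "dd")   # needed later for summary()/nomogram()

# unpenalized fit first, keeping the design matrix and response for resampling
full.unpen <- lrm(outcome ~ stenosis + x1 + x2 + ClinicalScore + procedure,
                  data = mydata, x = TRUE, y = TRUE)

# search a grid of penalties and refit with the best one
pen        <- pentrace(full.unpen, seq(0, 10, by = 0.5))
full.model <- update(full.unpen, penalty = pen$penalty)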

> library(rms)
> plogit <- predict(full.model)
> full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
> fastbw(full.ols, aics=1e10)

 Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2
 stenosis       1.41  1    0.2354   1.41   1    0.2354  -0.59 0.991
 x2            16.78  1    0.0000  18.19   2    0.0001  14.19 0.882
 procedure     26.12  1    0.0000  44.31   3    0.0000  38.31 0.711
 ClinicalScore 25.75  1    0.0000  70.06   4    0.0000  62.06 0.544
 x1            83.42  1    0.0000 153.49   5    0.0000 143.49 0.000

Then I fitted an approximation to the full model using the most
important variables (keeping variables until R^2 for predictions from
the reduced model against the full model's predicted values would drop
below 0.95), which meant dropping only "stenosis".

> full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
> full.ols.approx$stats
          n  Model L.R.        d.f.          R2           g       Sigma
104.0000000 487.9006640   4.0000000   0.9908257   1.3341718   0.1192622

This approximate model had an R^2 against the full model of 0.99.
Therefore, I updated the original full logistic model, dropping
"stenosis" as a predictor.

> full.approx.lrm <- update(full.model, ~ . -stenosis)

> validate(full.model, bw=F, B=1000)
          index.orig training    test optimism index.corrected    n
Dxy           0.6425   0.7017  0.6131   0.0887          0.5539 1000
R2            0.3270   0.3716  0.3335   0.0382          0.2888 1000
Intercept     0.0000   0.0000  0.0821  -0.0821          0.0821 1000
Slope         1.0000   1.0000  1.0548  -0.0548          1.0548 1000
Emax          0.0000   0.0000  0.0263   0.0263          0.0263 1000

> validate(full.approx.lrm, bw=F, B=1000)
          index.orig training    test optimism index.corrected    n
Dxy           0.6446   0.6891  0.6265   0.0626          0.5820 1000
R2            0.3245   0.3592  0.3428   0.0164          0.3081 1000
Intercept     0.0000   0.0000  0.1281  -0.1281          0.1281 1000
Slope         1.0000   1.0000  1.1104  -0.1104          1.1104 1000
Emax          0.0000   0.0000  0.0444   0.0444          0.0444 1000

Validation showed that this approximation was not bad.
Then, I made a nomogram.

> full.approx.lrm.nom <- nomogram(full.approx.lrm,
fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
> plot(full.approx.lrm.nom)

Another nomogram using ols model,

> full.ols.approx.nom <- nomogram(full.ols.approx,
fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
> plot(full.ols.approx.nom)

These two nomograms are very similar, but slightly different.

My questions are:

1. Am I doing this correctly?

2. Which nomogram is correct?

Thank you in advance for your help.

--
KH


Re: Question on approximations of full logistic regression model

Frank Harrell
I think you are doing this correctly except for one thing. The validation and other inferential calculations should be done on the full model. Use the approximate model to get a simpler nomogram, but not to get standard errors. Since you are dropping only one variable, you might consider just running the nomogram on the entire model.
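
In code, that suggestion amounts to something like the following sketch
(reusing the object names from the original post):

# keep validation and other inference on the full (penalized) model
validate(full.model, bw = FALSE, B = 1000)

# and, since only one variable would be dropped, draw the nomogram from
# the full model as well rather than from the approximation
plot(nomogram(full.model, fun = plogis,
              fun.at = c(0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95)))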
Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University

Re: Question on approximations of full logistic regression model

khosoda
Thank you for your reply, Prof. Harrell.

I agree with you. Dropping only one variable does not actually help a lot.

I have one more question.
While analyzing this model I found that the confidence intervals (CIs)
of some coefficients provided by bootstrapping (the bootcov function in
the rms package) were narrower than the CIs provided by the usual
variance-covariance matrix, while the CIs of other coefficients were
wider. My data have no cluster structure. I am wondering which CIs are
better. I would guess the bootstrap ones, but is that right?

Thank you in advance for your help.
--
KH





Re: Question on approximations of full logistic regression model

Frank Harrell
The choice is not clear, and requires some simulations to estimate the average absolute error of the covariance matrix estimators.
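
One possible sketch of such a simulation (the sample size, true
coefficients, and numbers of replicates below are arbitrary choices for
illustration): simulate data from a known logistic model, then compare
the model-based and bootstrap standard errors of a coefficient with the
empirical sampling standard deviation of its estimate.

library(rms)
set.seed(1)
n <- 104; nsim <- 100                       # roughly the size of the original data set

one.sim <- function() {
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(-1.5 + 1 * x))   # assumed "true" model
  f <- lrm(y ~ x, x = TRUE, y = TRUE)
  b <- bootcov(f, B = 200)
  c(est      = unname(coef(f)["x"]),
    se.model = sqrt(vcov(f)["x", "x"]),     # usual information-matrix SE
    se.boot  = sqrt(vcov(b)["x", "x"]))     # bootstrap SE from bootcov
}
res <- replicate(nsim, one.sim())           # slow: nsim * B model fits

true.sd <- sd(res["est", ])                 # empirical sampling SD of the estimate
mean(abs(res["se.model", ] - true.sd))      # average absolute error, model-based SE
mean(abs(res["se.boot", ] - true.sd))       # average absolute error, bootstrap SE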
Frank

Frank Harrell
Department of Biostatistics, Vanderbilt University

Re: Question on approximations of full logistic regression model

khosoda
Thank you for your comment, Prof. Harrell.
I would appreciate it very much if you could show me how to set up such
a simulation. For reference, the following is what I did (bootcov,
summary, and validate).

MyFullModel.boot <- bootcov(MyFullModel, B=1000, coef.reps=T)

 > summary(MyFullModel, stenosis=c(70, 80), x1=c(1.5, 2.0), x2=c(1.5, 2.0))
              Effects              Response : outcome

  Factor              Low  High Diff. Effect S.E. Lower 0.95 Upper 0.95
  stenosis            70.0 80   10.0  -0.11  0.24 -0.59      0.37
   Odds Ratio         70.0 80   10.0   0.90    NA  0.56      1.45
  x1                   1.5  2    0.5   1.21  0.37  0.49      1.94
   Odds Ratio          1.5  2    0.5   3.36    NA  1.63      6.95
  x2                   1.5  2    0.5  -0.29  0.19 -0.65      0.08
   Odds Ratio          1.5  2    0.5   0.75    NA  0.52      1.08
  ClinicalScore        3.0  5    2.0   0.61  0.38 -0.14      1.36
   Odds Ratio          3.0  5    2.0   1.84    NA  0.87      3.89
  procedure - CA:CE    2.0  1     NA   0.83  0.46 -0.07      1.72
   Odds Ratio          2.0  1     NA   2.28    NA  0.93      5.59

 > summary(MyFullModel.boot, stenosis=c(70, 80), x1=c(1.5, 2.0),
x2=c(1.5, 2.0))
              Effects              Response : outcome

  Factor              Low  High Diff. Effect S.E. Lower 0.95 Upper 0.95
  stenosis            70.0 80   10.0  -0.11  0.28 -0.65      0.43
   Odds Ratio         70.0 80   10.0   0.90    NA  0.52      1.54
  x1                   1.5  2    0.5   1.21  0.29  0.65      1.77
   Odds Ratio          1.5  2    0.5   3.36    NA  1.92      5.89
  x2                   1.5  2    0.5  -0.29  0.16 -0.59      0.02
   Odds Ratio          1.5  2    0.5   0.75    NA  0.55      1.02
  ClinicalScore        3.0  5    2.0   0.61  0.45 -0.28      1.50
   Odds Ratio          3.0  5    2.0   1.84    NA  0.76      4.47
  procedure - CAS:CEA  2.0  1     NA   0.83  0.38  0.07      1.58
   Odds Ratio          2.0  1     NA   2.28    NA  1.08      4.85

 > validate(MyFullModel, bw=F, B=1000)
           index.orig training    test optimism index.corrected    n
Dxy           0.6425   0.7054  0.6122   0.0932          0.5493 1000
R2            0.3270   0.3745  0.3330   0.0415          0.2855 1000
Intercept     0.0000   0.0000  0.0683  -0.0683          0.0683 1000
Slope         1.0000   1.0000  1.0465  -0.0465          1.0465 1000
Emax          0.0000   0.0000  0.0221   0.0221          0.0221 1000
D             0.2715   0.2795  0.2424   0.0371          0.2345 1000
U            -0.0192  -0.0192 -0.0035  -0.0157         -0.0035 1000
Q             0.2908   0.2987  0.2460   0.0528          0.2380 1000
B             0.1265   0.1164  0.1332  -0.0168          0.1433 1000
g             1.3366   1.5041  1.5495  -0.0455          1.3821 1000
gp            0.2082   0.2172  0.2258  -0.0087          0.2169 1000

 > validate(MyFullModel.boot, bw=F, B=1000)
           index.orig training    test optimism index.corrected    n
Dxy           0.6425   0.7015  0.6139   0.0877          0.5549 1000
R2            0.3270   0.3738  0.3346   0.0392          0.2878 1000
Intercept     0.0000   0.0000  0.0613  -0.0613          0.0613 1000
Slope         1.0000   1.0000  1.0569  -0.0569          1.0569 1000
Emax          0.0000   0.0000  0.0226   0.0226          0.0226 1000
D             0.2715   0.2805  0.2438   0.0367          0.2348 1000
U            -0.0192  -0.0192 -0.0039  -0.0153         -0.0039 1000
Q             0.2908   0.2997  0.2477   0.0521          0.2387 1000
B             0.1265   0.1177  0.1329  -0.0153          0.1417 1000
g             1.3366   1.5020  1.5523  -0.0503          1.3869 1000
gp            0.2082   0.2191  0.2263  -0.0072          0.2154 1000





--
*************************************************
 Department of Neurosurgery,
 Kobe University Graduate School of Medicine
 Kohkichi Hosoda

 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
     Phone: 078-382-5966
     Fax  : 078-382-5979
     E-mail address
         Office: [hidden email]
         Home  : [hidden email]


Re: Question on approximations of full logistic regression model

Tim Hesterberg
In reply to this post by khosoda
My usual rule is that whatever gives the widest confidence intervals
in a particular problem is most accurate for that problem :-)

Bootstrap percentile intervals tend to be too narrow.
Consider the case of the sample mean; the usual formula CI is
    xbar +- t_alpha sqrt( (1/(n-1)) sum((x_i - xbar)^2)) / sqrt(n)
The bootstrap percentile interval for symmetric data is roughly
    xbar +- z_alpha sqrt( (1/(n  )) sum((x_i - xbar)^2)) / sqrt(n)
It is narrower than the formula CI because
  * z quantiles rather than t quantiles
  * standard error uses divisor of n rather than (n-1)

In stratified sampling, the narrowness factor depends on the
stratum sizes, not the overall n.
In regression, estimates for some quantities may be based on a small
subset of the data (e.g. coefficients related to rare factor levels).
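
A quick numerical illustration of that narrowness for the sample mean
(the data and number of resamples are arbitrary):

set.seed(1)
x <- rnorm(15)                            # small sample, where the effect is most visible
boot.means <- replicate(5000, mean(sample(x, replace = TRUE)))

t.test(x)$conf.int                        # formula (t-based) interval
quantile(boot.means, c(0.025, 0.975))     # percentile interval, typically a bit narrower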

This doesn't mean we should give up on the bootstrap.
There are remedies for the bootstrap biases, see e.g.
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling
vs. Smoothing, Proceedings of the Section on Statistics and the
Environment, American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf

And other methods have their own biases, particularly in nonlinear
applications such as logistic regression.

Tim Hesterberg



Re: Question on approximations of full logistic regression model

khosoda
Thank you for your advice, Tim.

I am reading your paper and the other materials on your website.
I could not find an R package implementing your bootknife method. Is
there an R package for this procedure?
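
My reading of the bootknife idea from the abstract is: delete one
observation at random, then draw a bootstrap sample of size n from the
remaining n-1 values. A toy sketch for the sample mean (not taken from
any package, and possibly not faithful to every detail of the paper):

bootknife.means <- function(x, B = 1000) {
  n <- length(x)
  replicate(B, {
    xj <- x[-sample(n, 1)]                # jackknife step: drop one observation at random
    mean(sample(xj, n, replace = TRUE))   # bootstrap step: resample n from the remaining n-1
  })
}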



Re: Question on approximations of full logistic regression model

khosoda
In reply to this post by Frank Harrell
I tried to make a histogram of the bootstrap distribution for my logistic
model following "Regression Modeling Strategies" (pp. 197-200). Attached is
the histogram I made. The figure shows the bootstrap distribution of the
log odds ratio from my logistic model. The solid curve is a kernel
density estimate and the dashed curve is a normal density with the same
mean and standard deviation as the bootstrapped values. Vertical lines
indicate asymmetric 0.90, 0.95, and 0.99 two-sided confidence limits for
the log odds ratio based on quantiles of the bootstrap values.

It seems to me that the bootstrap distribution is normal and that the
confidence interval estimates are reasonably accurate.

Am I right?

The code I used is as follows:

 > library(rms)
 > b <- bootcov(MyModel.penalized, B=1000, coef.reps=T)
 > s <- summary(MyModel.penalized, x1=c(1.0, 1.5), x2=c(1.0, 1.5),
ClinicalScore=c(4,6), procedure=c("E", "A"))
 > X <- predict(MyModel.penalized, data.frame(x1=c(1.0, 1.5), x2=c(1.0,
1.5), ClinicalScore=c(4,6), procedure=c("E", "A")), type="x")
 > X
   Intercept  x1  x2  ClinicalScore   procedure=E
1         1 1.0 1.0              4             1
2         1 1.5 1.5              6             0
 > Xdif <- X[2, , drop=FALSE] - X[1, , drop=FALSE]  # difference of the two design rows
 > Xdif
   Intercept  x1  x2  ClinicalScore   procedure=E
2         0 0.5 0.5              2            -1
 > conf <- c(.9, .95, .99)
 > bootplot(b, X=Xdif, conf.int=conf, xlim=c(0, 6))

 > boot.log.odds.ratio <- b$boot.Coef %*% t(Xdif)
 > sd <- sqrt(var(boot.log.odds.ratio))
 > sd
           2
2 0.7412509
 > z <- seq(0, 6, length=104)
 > lines(z, dnorm(z, mean=mean(boot.log.odds.ratio), sd = sd), lty=2)
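
The numeric limits marked by the vertical lines can also be read
directly from the same bootstrap values, e.g.

 > quantile(boot.log.odds.ratio, probs=c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995))
 > exp(quantile(boot.log.odds.ratio, probs=c(0.05, 0.95)))  # 0.90 limits on the odds-ratio scale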



[Attachment: x5factor_final-CI-histogram.pdf (138K)]