Question on approximations of full logistic regression model

8 messages
Open this post in threaded view
|

Question on approximations of full logistic regression model

 Hi, I am trying to construct a logistic regression model from my data (104 patients and 25 events). I build a full model consisting of five predictors with the use of penalization by rms package (lrm, pentrace etc) because of events per variable issue. Then, I tried to approximate the full model by step-down technique predicting L from all of the componet variables using ordinary least squares (ols in rms package) as the followings. I would like to know whether I am doing right or not. > library(rms) > plogit <- predict(full.model) > full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1) > fastbw(full.ols, aics=1e10)  Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2  stenosis       1.41  1    0.2354   1.41   1    0.2354  -0.59 0.991  x2            16.78  1    0.0000  18.19   2    0.0001  14.19 0.882  procedure     26.12  1    0.0000  44.31   3    0.0000  38.31 0.711  ClinicalScore 25.75  1    0.0000  70.06   4    0.0000  62.06 0.544  x1            83.42  1    0.0000 153.49   5    0.0000 143.49 0.000 Then, fitted an approximation to the full model using most imprtant variable (R^2 for predictions from the reduced model against the original Y drops below 0.95), that is, dropping "stenosis". > full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure) > full.ols.approx\$stats           n  Model L.R.        d.f.          R2           g       Sigma 104.0000000 487.9006640   4.0000000   0.9908257   1.3341718   0.1192622 This approximate model had R^2 against the full model of 0.99. Therefore, I updated the original full logistic model dropping "stenosis" as predictor. > full.approx.lrm <- update(full.model, ~ . -stenosis) > validate(full.model, bw=F, B=1000)           index.orig training    test optimism index.corrected    n Dxy           0.6425   0.7017  0.6131   0.0887          0.5539 1000 R2            0.3270   0.3716  0.3335   0.0382          0.2888 1000 Intercept     0.0000   0.0000  0.0821  -0.0821          0.0821 1000 Slope         1.0000   1.0000  1.0548  -0.0548          1.0548 1000 Emax          0.0000   0.0000  0.0263   0.0263          0.0263 1000 > validate(full.approx.lrm, bw=F, B=1000)           index.orig training    test optimism index.corrected    n Dxy           0.6446   0.6891  0.6265   0.0626          0.5820 1000 R2            0.3245   0.3592  0.3428   0.0164          0.3081 1000 Intercept     0.0000   0.0000  0.1281  -0.1281          0.1281 1000 Slope         1.0000   1.0000  1.1104  -0.1104          1.1104 1000 Emax          0.0000   0.0000  0.0444   0.0444          0.0444 1000 Validatin revealed this approximation was not bad. Then, I made a nomogram. > full.approx.lrm.nom <- nomogram(full.approx.lrm, fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis) > plot(full.approx.lrm.nom) Another nomogram using ols model, > full.ols.approx.nom <- nomogram(full.ols.approx, fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis) > plot(full.ols.approx.nom) These two nomograms are very similar but a little bit different. My questions are; 1. Am I doing right? 2. Which nomogram is correct I would appreciate your help in advance. -- KH ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code.
Open this post in threaded view
|

Re: Question on approximations of full logistic regression model

 I think you are doing this correctly except for one thing.  The validation and other inferential calculations should be done on the full model.  Use the approximate model to get a simpler nomogram but not to get standard errors.  With only dropping one variable you might consider just running the nomogram on the entire model. Frank 細田弘吉 wrote Hi, I am trying to construct a logistic regression model from my data (104 patients and 25 events). I build a full model consisting of five predictors with the use of penalization by rms package (lrm, pentrace etc) because of events per variable issue. Then, I tried to approximate the full model by step-down technique predicting L from all of the componet variables using ordinary least squares (ols in rms package) as the followings. I would like to know whether I am doing right or not. > library(rms) > plogit <- predict(full.model) > full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1) > fastbw(full.ols, aics=1e10)  Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2  stenosis       1.41  1    0.2354   1.41   1    0.2354  -0.59 0.991  x2            16.78  1    0.0000  18.19   2    0.0001  14.19 0.882  procedure     26.12  1    0.0000  44.31   3    0.0000  38.31 0.711  ClinicalScore 25.75  1    0.0000  70.06   4    0.0000  62.06 0.544  x1            83.42  1    0.0000 153.49   5    0.0000 143.49 0.000 Then, fitted an approximation to the full model using most imprtant variable (R^2 for predictions from the reduced model against the original Y drops below 0.95), that is, dropping "stenosis". > full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure) > full.ols.approx\$stats           n  Model L.R.        d.f.          R2           g       Sigma 104.0000000 487.9006640   4.0000000   0.9908257   1.3341718   0.1192622 This approximate model had R^2 against the full model of 0.99. Therefore, I updated the original full logistic model dropping "stenosis" as predictor. > full.approx.lrm <- update(full.model, ~ . -stenosis) > validate(full.model, bw=F, B=1000)           index.orig training    test optimism index.corrected    n Dxy           0.6425   0.7017  0.6131   0.0887          0.5539 1000 R2            0.3270   0.3716  0.3335   0.0382          0.2888 1000 Intercept     0.0000   0.0000  0.0821  -0.0821          0.0821 1000 Slope         1.0000   1.0000  1.0548  -0.0548          1.0548 1000 Emax          0.0000   0.0000  0.0263   0.0263          0.0263 1000 > validate(full.approx.lrm, bw=F, B=1000)           index.orig training    test optimism index.corrected    n Dxy           0.6446   0.6891  0.6265   0.0626          0.5820 1000 R2            0.3245   0.3592  0.3428   0.0164          0.3081 1000 Intercept     0.0000   0.0000  0.1281  -0.1281          0.1281 1000 Slope         1.0000   1.0000  1.1104  -0.1104          1.1104 1000 Emax          0.0000   0.0000  0.0444   0.0444          0.0444 1000 Validatin revealed this approximation was not bad. Then, I made a nomogram. > full.approx.lrm.nom <- nomogram(full.approx.lrm, fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis) > plot(full.approx.lrm.nom) Another nomogram using ols model, > full.ols.approx.nom <- nomogram(full.ols.approx, fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis) > plot(full.ols.approx.nom) These two nomograms are very similar but a little bit different. My questions are; 1. Am I doing right? 2. Which nomogram is correct I would appreciate your help in advance. -- KH ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide http://www.R-project.org/posting-guide.htmland provide commented, minimal, self-contained, reproducible code. Frank Harrell Department of Biostatistics, Vanderbilt University
Open this post in threaded view
|

Re: Question on approximations of full logistic regression model

Open this post in threaded view
|

Re: Question on approximations of full logistic regression model

Open this post in threaded view
|

Re: Question on approximations of full logistic regression model

Open this post in threaded view
|