[R] logistic regression model + Cross-Validation

nitin jindal
Hi,

I am trying to cross-validate a logistic regression model.
I am using the lrm function (logistic regression model) from the Design package.

f <- lrm( cy ~ x1 + x2, x=TRUE, y=TRUE)
val <- validate.lrm(f, method="cross", B=5)

My class cy has values 0 and 1.

The "val" variable gives me indices such as the slope and AUC. But I also need
the vector of predicted values of the class variable "cy" for each record during
cross-validation, so that I can inspect the results manually. Is there
any way to get the probabilities assigned to each class?

regards,
Nitin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] logistic regression model + Cross-Validation

Frank Harrell
nitin jindal wrote:
> Hi,
>
> I am trying to cross-validate a logistic regression model.
> I am using logistic regression model (lrm) of package Design.
>
> f <- lrm( cy ~ x1 + x2, x=TRUE, y=TRUE)
> val <- validate.lrm(f, method="cross", B=5)

val <- validate(f, ...)    # .lrm not needed

>
> My class cy has values 0 and 1.
>
> "val" variable will give me indicators like slope and AUC. But, I also need
> the vector of predicted values of class variable "cy" for each record while
> cross-validation, so that I can manually look at the results. So, is there
> any way to get those probabilities assigned to each class.
>
> regards,
> Nitin

No, validate.lrm does not have that option.  Manually looking at the
results will not be easy when you do enough cross-validations.  A single
5-fold cross-validation does not provide accurate estimates.  Either use
the bootstrap or repeat k-fold cross-validation between 20 and 50 times.
k is often 10 but the optimum value may not be 10.  Code for averaging
repeated cross-validations is in
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf
along with simulations of bootstrap vs. a few cross-validation methods
for binary logistic models.
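
A minimal sketch of the repeated k-fold cross-validation described above. It is
not the code from the linked PDF. As assumptions for illustration, it uses base
R's glm() in place of lrm() so that it runs without the Design package, it uses
simulated data, and it takes the Brier score as the accuracy measure; the names
dat, cv_brier and brier_reps are hypothetical.

```r
set.seed(1)
# Simulated data standing in for the poster's cy, x1, x2 (an assumption)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$cy <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.8 * dat$x2))

# One k-fold cross-validation: returns the out-of-fold Brier score
cv_brier <- function(data, k = 10) {
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))  # balanced random folds
  p    <- numeric(nrow(data))
  for (i in seq_len(k)) {
    # fit on all folds except i, predict the held-out fold i
    fit <- glm(cy ~ x1 + x2, family = binomial, data = data[fold != i, ])
    p[fold == i] <- predict(fit, newdata = data[fold == i, ], type = "response")
  }
  mean((p - data$cy)^2)
}

# Repeat the whole procedure 20 times with fresh random folds, then average
reps       <- 20
brier_reps <- replicate(reps, cv_brier(dat, k = 10))
brier_mean <- mean(brier_reps)
brier_mean
```

The spread of brier_reps shows how much a single k-fold run can vary from one
random split to the next, which is the reason for averaging.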

Frank
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Re: [R] logistic regression model + Cross-Validation

nitin jindal
If validate.lrm does not have this option, does any other function have it?
I will certainly look into your advice on cross-validation. Thanks.

nitin

Re: [R] logistic regression model + Cross-Validation

Frank Harrell
nitin jindal wrote:
> If validate.lrm does not has this option, do any other function has it.
> I will certainly look into your advice on cross validation. Thnx.
>
> nitin

Not that I know of, but easy to program.
Frank

Re: [R] logistic regression model + Cross-Validation

Weiwei Shi
In reply to this post by nitin jindal
Why not use lda {MASS}? It has a CV=TRUE option; it only does leave-one-out
("loo") cross-validation, though. Or use randomForest.
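
A minimal sketch of the lda route, assuming simulated data in place of the
poster's cy, x1 and x2 (the names dat and loo are hypothetical). Note that this
swaps linear discriminant analysis for logistic regression, so the
probabilities are not those of an lrm fit.

```r
library(MASS)  # recommended package shipped with standard R installations
set.seed(2)
# Simulated data (an assumption for illustration)
n   <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$cy <- factor(rbinom(n, 1, plogis(dat$x1 - dat$x2)))

# With CV = TRUE, lda() returns leave-one-out results rather than a model fit
loo <- lda(cy ~ x1 + x2, data = dat, CV = TRUE)

head(loo$posterior)  # leave-one-out class probabilities, one row per record
table(observed = dat$cy, predicted = loo$class)
```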

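If you have to use lrm, then the following code might help:

n.fold <- 5    # 5-fold CV
n.sample <- 50 # assumed 50 samples; use nrow(YOURWHOLEDATAFRAME) in practice
# balanced random fold labels, so every fold gets n.sample/n.fold records
# (sampling labels with replacement can give very unequal or even empty folds)
s <- sample(rep(1:n.fold, length.out = n.sample))
for (i in 1:n.fold) {
  # create your training data and validation data for each fold
  trn <- YOURWHOLEDATAFRAME[s != i, ]
  val <- YOURWHOLEDATAFRAME[s == i, ]
  # now do your own modeling using lrm
  # todo
}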
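
The loop above leaves the modelling step as a todo. One way to fill it in so
that every record gets its out-of-fold predicted probability is sketched below.
As assumptions for illustration, it uses simulated data in place of
YOURWHOLEDATAFRAME and base R's glm() in place of lrm(), so that it runs without
the Design package; the column names fold and p.cv are hypothetical. With lrm,
the corresponding step would presumably be a fit on trn followed by
predict(fit, val, type="fitted").

```r
set.seed(3)
# Simulated stand-in for YOURWHOLEDATAFRAME (an assumption for illustration)
n.sample <- 50
dat <- data.frame(x1 = rnorm(n.sample), x2 = rnorm(n.sample))
dat$cy <- rbinom(n.sample, 1, plogis(dat$x1 + dat$x2))

n.fold <- 5
s <- sample(rep(1:n.fold, length.out = n.sample))  # balanced fold labels

dat$fold <- s
dat$p.cv <- NA_real_  # out-of-fold predicted P(cy = 1), filled in fold by fold
for (i in 1:n.fold) {
  trn <- dat[s != i, ]  # training data: all folds except i
  val <- dat[s == i, ]  # validation data: fold i only
  fit <- glm(cy ~ x1 + x2, family = binomial, data = trn)
  dat$p.cv[s == i] <- predict(fit, newdata = val, type = "response")
}

# Each record now carries a prediction from a model that never saw it
head(dat[, c("cy", "fold", "p.cv")])
```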

HTH,

weiwei

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
