how to do an external validation with R

beginner
I would like to do external validation in R. So far I have used packages such as "Design" and "DAAG", but they perform internal rather than external validation. To perform external validation, I would have to split my data beforehand into a training set and a test set, hold the test set out, and use only the training set to select a model. I would then test the selected model on the held-out test set. I would like to repeat this process several times so that every sample is included in a test set at least once. I think I need a loop in R to do this automatically, but as I am new to R I don't know how to write one. Could you please help me with this or suggest an R package? I would be very grateful for help with this task!
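
The repeated split described above, where every sample lands in a test set, is k-fold cross-validation. A minimal base-R sketch of the loop follows, assuming (hypothetically) a data frame `d` with a binary outcome `y` and predictors `x1`, `x2` fitted by logistic regression; the simulated data, the choice of k = 10, and the Brier score as the accuracy measure are all illustrative assumptions. As the reply below points out, this is still internal validation.

```r
# k-fold cross-validation loop in base R (illustrative sketch).
# Hypothetical simulated data: binary outcome y, predictors x1 and x2.
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x1 - 0.5 * d$x2))

k <- 10
# Assign each row to exactly one fold, so every row is tested exactly once
fold <- sample(rep(seq_len(k), length.out = n))

cv_pred <- numeric(n)  # out-of-fold predicted probabilities
for (i in seq_len(k)) {
  train <- d[fold != i, ]
  test  <- d[fold == i, ]
  # Any model selection must be repeated inside this loop, using `train` only
  fit <- glm(y ~ x1 + x2, family = binomial, data = train)
  cv_pred[fold == i] <- predict(fit, newdata = test, type = "response")
}

# Brier score of the out-of-fold predictions (lower is better)
cv_brier <- mean((cv_pred - d$y)^2)
cv_brier
```

Assigning folds with `sample(rep(...))` rather than drawing random splits independently is what guarantees each sample appears in a test set exactly once.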

Re: how to do an external validation with R

Frank Harrell
Splitting one dataset into training and test sets is still internal validation, and it typically requires an enormous sample size (around 20,000) to be competitive with the bootstrap.
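
For comparison, here is a minimal base-R sketch of bootstrap internal validation in the Efron-Gong "optimism" style: refit the model on each bootstrap sample, measure how much better it looks on that sample than on the original data, and subtract that average optimism from the apparent performance. The simulated data, the logistic model, B = 200, and the Brier score as the performance measure are illustrative assumptions, not the only choices.

```r
# Bootstrap optimism correction of the Brier score (illustrative sketch).
# Hypothetical simulated data: binary outcome y, predictors x1 and x2.
set.seed(2)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x1 - 0.5 * d$x2))

# Brier score of a fitted logistic model evaluated on `data`
brier <- function(fit, data) {
  p <- predict(fit, newdata = data, type = "response")
  mean((p - data$y)^2)
}

form <- y ~ x1 + x2
full_fit <- glm(form, family = binomial, data = d)
apparent <- brier(full_fit, d)  # performance on the data used for fitting

B <- 200
optimism <- numeric(B)
for (b in seq_len(B)) {
  boot <- d[sample(n, replace = TRUE), ]
  # Refit on the bootstrap sample, repeating any model selection steps
  boot_fit <- glm(form, family = binomial, data = boot)
  # Optimism: Brier on original data minus Brier on the bootstrap sample
  optimism[b] <- brier(boot_fit, d) - brier(boot_fit, boot)
}

# Lower Brier is better, so the average optimism is added back
corrected <- apparent + mean(optimism)
c(apparent = apparent, optimism = mean(optimism), corrected = corrected)
```

Unlike a single split, every observation is used both for fitting and for estimating performance. In rms the same idea is automated for several indexes at once, e.g. `validate(lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE), method = "boot", B = 200)`.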

Note that the Design package has been replaced by rms. rms has two functions for external validation (val.prob and val.surv), which you could use if you had external data (unlike your case), and many more for internal validation.
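
If genuinely external data were available, a minimal sketch of using `val.prob` could look like the following. It assumes the rms package is installed (`install.packages("rms")`); both data sets are simulated, hypothetical stand-ins, with `ext` playing the role of data from another site or period that is never used for fitting. The model is fitted with base `glm` here, since `val.prob` only needs predicted probabilities and observed 0/1 outcomes.

```r
# External validation of predicted probabilities with rms::val.prob (sketch).
library(rms)

set.seed(3)
# Hypothetical simulator standing in for real development/external data
sim <- function(n) {
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  d$y <- rbinom(n, 1, plogis(-0.5 + d$x1 - 0.5 * d$x2))
  d
}
dev <- sim(300)  # development data: model fitting and selection happen here only
ext <- sim(300)  # external data: used only to evaluate the frozen model

fit <- glm(y ~ x1 + x2, family = binomial, data = dev)
p_ext <- predict(fit, newdata = ext, type = "response")

# pl = FALSE suppresses the calibration plot; the result is a named
# vector of accuracy indexes
stats <- val.prob(p_ext, ext$y, pl = FALSE)
round(stats[c("C (ROC)", "Brier", "Intercept", "Slope")], 3)
```

A calibration slope near 1 and intercept near 0 indicate that the predicted probabilities transport well to the new data; `val.surv` plays the analogous role for survival models.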
Frank
Frank Harrell
Department of Biostatistics, Vanderbilt University