I am building a predictive model for customer churn. I have around 100k customers in the database, 15% of whom have already churned. My plan is to split the data into a train and a test set. My confusion is this: if every one of the 100k records ends up in either the train or the test set, the model will already have seen them all, so I won't have any customers left to get predictions on and then contact to make sure they stay with us.
So, out of the 100k customers, should I take, say, 50k and split those 50/50 into a train and a test set, and keep the other 50k (current customers only) to run through the model once I have tested its accuracy?
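
To make the plan concrete, here is a rough sketch of the split I have in mind. It uses only the Python standard library and synthetic data. The function name `split_for_plan`, the `(customer_id, churned)` tuple layout, and the assumption that the first 15k IDs are the churners are all made up for illustration and are not my real schema.

```python
import random

def split_for_plan(customers, n_scoring, test_frac=0.5, seed=0):
    """Split (customer_id, churned) pairs into train, test and scoring sets.

    The scoring set holds current (non-churned) customers only and is
    never used for training or testing.
    """
    rng = random.Random(seed)
    current = [c for c in customers if not c[1]]
    churned = [c for c in customers if c[1]]

    # Set aside current customers the model will never see.
    rng.shuffle(current)
    scoring = current[:n_scoring]

    # Everything else (remaining current customers + all churners) is for modelling.
    modelling = current[n_scoring:] + churned
    rng.shuffle(modelling)

    n_test = int(len(modelling) * test_frac)
    test = modelling[:n_test]
    train = modelling[n_test:]
    return train, test, scoring

# Synthetic stand-in for my data: 100k customers, 15% churned (assumed layout).
customers = [(i, i < 15_000) for i in range(100_000)]
train, test, scoring = split_for_plan(customers, n_scoring=50_000)
print(len(train), len(test), len(scoring))  # 25000 25000 50000
```

One thing I notice from this sketch: if the 50k scoring set is drawn from current customers only, all 15k churners land in the 50k modelling half, so the churn rate the model trains and tests on is 30% rather than the 15% in the full database.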