Advice on how to split my data

Hi all.

I am building a predictive model for customer churn. I have around 100k customers in the database, 15% of whom have already churned. My plan is to split the data into a train set and a test set. My confusion is this: if all 100k records go into either the train set or the test set, the model will already have seen every customer, so I won't have anyone left to get fresh predictions on and then contact to make sure they stay with us.

So, out of the 100k customers, should I take, say, 50k and split them 50/50 into a train set and a test set, and keep the other 50k, made up only of current customers, to run through the model once I have tested its accuracy?
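
To make the plan concrete, here is a rough sketch of the partition I have in mind. It is only an illustration: the data is synthetic, and the function name `split_customers`, the `(customer_id, churned)` tuple layout and the fixed seed are placeholders I made up, not anything from my real database.

```python
import random

def split_customers(customers, n_score=50_000, test_fraction=0.5, seed=0):
    """Partition customers into train, test and scoring sets.

    customers: list of (customer_id, churned) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    current = [c for c in customers if not c[1]]
    churned = [c for c in customers if c[1]]

    # Hold out current customers only; the model never sees these
    # during fitting or testing, so their predictions are fresh.
    rng.shuffle(current)
    score_set = current[:n_score]

    # Everything else (remaining current customers plus all churned ones)
    # is shuffled and split into train and test.
    labelled = current[n_score:] + churned
    rng.shuffle(labelled)
    n_test = int(len(labelled) * test_fraction)
    test_set = labelled[:n_test]
    train_set = labelled[n_test:]
    return train_set, test_set, score_set

# Synthetic stand-in for the database: 100k customers, 15% churned.
customers = [(i, i < 15_000) for i in range(100_000)]
train_set, test_set, score_set = split_customers(customers)
print(len(train_set), len(test_set), len(score_set))  # 25000 25000 50000
```

One thing this makes visible: because the 50k I keep aside contains only current customers, all 15k churned customers land in the other 50k, so the churn rate in the train and test sets would be about 30% rather than the 15% in the full database.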

Many thanks for any help.