Hi Hemant,
Let's take it one step at a time. Save this code as "qdrfm.R" in your R working directory. It includes the comments I added last time and fixes a bug in the recency scoring.

qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,
 date.format="%Y-%m-%d",weights=c(1,1,1),finish=NA) {

 # if no finish date is specified, use current date
 if(is.na(finish)) finish<-as.Date(date(), "%a %b %d %H:%M:%S %Y")
 x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
 cat("Range of purchase recency",range(x$rscore),"\n")
 cat("Range of purchase frequency",range(table(x[,1])),"\n")
 cat("Range of purchase amount",range(by(x[,2],x[,1],sum)),"\n")
 custIDs<-unique(x[,1])
 ncust<-length(custIDs)
 # initialize a data frame to hold the output
 rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
  fscore=rep(0,ncust),mscore=rep(0,ncust))
 # categorize the minimum number of days
 # since last purchase for each customer
 rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
 # categorize the number of purchases
 # recorded for each customer
 rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
 # categorize the amount purchased
 # by each customer
 rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
 # calculate the RFM score from the
 # optionally weighted average of the above
 rfmout$cscore<-round((weights[1]*rfmout$rscore+
  weights[2]*rfmout$fscore+
  weights[3]*rfmout$mscore)/sum(weights),2)
 return(rfmout[order(rfmout$cscore),])
}

Now you can load the function into your workspace like this:

source("qdrfm.R")

Load your data:

df<-read.csv("df.csv")

Run the function with the defaults except for the finish date:

df.rfm<-qdrfm(df,finish=as.Date("2017-08-31"))
Range of purchase recency 31 122
Range of purchase frequency 1 4
Range of purchase amount 5.97 127.65

Your problem is now apparent.
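To see why the ranges matter, here is a minimal sketch of how cut() treats values that fall outside the break points. The recency and frequency vectors are made-up illustrations chosen to lie inside the ranges printed above; they are not taken from df.csv.

```r
# Hypothetical days-since-last-purchase values inside the 31-122 range
# (illustrative only, not read from df.csv)
recency <- c(31, 45, 60, 90, 122)

# Breaks that stop at 50: anything above 50 falls outside every
# interval, so cut() returns NA for it
narrow <- cut(recency, breaks = c(10, 30, 50), labels = FALSE)

# Breaks that span the whole range: every value gets a category
wide <- cut(recency, breaks = c(0, 75, 150), labels = FALSE)

print(data.frame(recency, narrow, wide))

# Intervals are open on the left by default, so a value equal to the
# lowest break is also NA unless include.lowest=TRUE is passed
frequency <- c(1, 2, 3, 4)
fcat <- cut(frequency, breaks = c(1, 2, 3), labels = FALSE)
print(fcat)
```

With breaks ending at 50, only the first two recency values are categorized. In the frequency example, both a customer with one purchase and a customer with four purchases come back as NA.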
If I use the following breaks, I will generate NA values in all three scores:

df.rfm2<-qdrfm(df,rbreaks=c(10,30,50),fbreaks=c(1,2,3),
 mbreaks=c(8,14,400),finish=as.Date("2017-08-31"))
head(df.rfm2)

As I wrote before, the breaks _must_ cover the range of values if you want a sensible analysis:

df.rfm3<-qdrfm(df,rbreaks=c(0,75,150),fbreaks=c(0,2,5),
 mbreaks=c(0,75,150),finish=as.Date("2017-08-31"))
head(df.rfm3)

Looking at df.rfm3, it seems that the recency score is the only one discriminating users. This suggests to me that the data distributions are causing a problem. First, you have 946 users in a dataset of 1000 rows, meaning that almost all made only one transaction. Second, your purchase amounts are concentrated in the 0-20 range. Therefore if I change the breaks to reflect this, I get a much better separation of customers:

df.rfm4<-qdrfm(df,rbreaks=c(0,75,150),fbreaks=c(0,1,5),
 mbreaks=c(0,10,150),finish=as.Date("2017-08-31"))

Maybe this will get you going.

Jim

On Wed, Oct 11, 2017 at 4:43 PM, Hemant Sain <[hidden email]> wrote:
> Also try to put finish date as 2017-08-31.
> and help me with the complete running r code.
>
> On 11 October 2017 at 10:36, Hemant Sain <[hidden email]> wrote:
>>
>> Hey Jim,
>> i'm attaching you the actual dataset i'm working on and i want RFM breaks as
>> r=(10,30,50), f=(1,2,3),m=(8,14,400).