Help RFM analysis in R (i want a code where i can define my own breaks instead of system defined breaks used in auto_RFM package)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Help RFM analysis in R (i want a code where i can define my own breaks instead of system defined breaks used in auto_RFM package)

hemantsain55
I'm trying to perform an RFM analysis on the attached dataset,
I'm able to get the results using the auto_rfm function but i want to
define my own breaks for RFM.
as follow
r <-c(30,60,90)
 f <-c(2,5,8)
 m <-c(10,20,30)
but when i tried to define my own breaks i got the identical result for RFM
i.e 111 for every ID.
please help me with this with working R script so that i can get
output according to my own defined values of RFM instead of system defined
breaks.
Thanks


*Dataset:*

*user_id subtotal_amount created_at*

*3451945 19.32 6/11/2017 17:40*
*5404261 20 6/16/2017 22:45*

*3572177 9.78 7/6/2017 0:41*
*1197515 11.97 5/20/2017 17:48*
*7288355 14.76 6/5/2017 17:48*
*3071276 7.99 6/11/2017 0:13*
*8568400 15.98 6/22/2017 0:59*
*429475 7.99 6/8/2017 18:14*
*6805938 13.97 7/1/2017 23:30*
*561442 11.67 6/22/2017 18:13*
*1127373 11.27 6/11/2017 16:43*
*5973764 12.07 6/19/2017 22:43*
*683302 12.37 6/19/2017 17:18*
*391019 26.64 6/23/2017 23:57*
*580790 22.85 5/26/2017 23:34*
*1315314 6.29 6/18/2017 23:10*
*6980574 8.67 5/13/2017 20:21*
*8279240 17.26 6/2/2017 2:46*
*8700821 9.48 6/26/2017 1:47*
*778933 13.05 5/11/2017 17:50*
*1028301 9.47 5/31/2017 20:56*
*8305179 8.49 6/16/2017 0:17*
*8294420 12.65 6/11/2017 18:04*
*5775051 11.28 6/26/2017 17:13*
*3527917 7.99 6/4/2017 21:31*
*7434689 9.78 5/5/2017 16:43*
*6299124 20.65 5/25/2017 22:55*
*407736 6.88 6/5/2017 23:40*
*6207916 11.48 6/17/2017 16:31*
*2284913 10.08 7/21/2017 16:22*
*5833389 15.67 7/31/2017 17:19*
*1537907 22.63 7/4/2017 17:23*
*8791577 12.97 7/27/2017 0:09*
*7390743 18.36 5/7/2017 16:22*
*8562057 17.64 5/27/2017 19:07*
*393153 7.98 7/19/2017 16:42*
*6764358 15.37 7/14/2017 20:20*
*6444042 9.28 5/3/2017 19:54*
*442647 15.96 7/22/2017 0:01*
*6665810 7.99 7/3/2017 16:20*
*3318928 7.99 6/11/2017 22:36*
*565493 24.56 7/11/2017 16:06*
*3337179 16.86 5/31/2017 0:20*
*394651 21.67 5/22/2017 23:30*
*421849 23.57 5/24/2017 17:12*
*404111 22.06 5/22/2017 23:05*
*3967182 8.29 7/23/2017 23:00*
*8380345 10.38 6/22/2017 15:11*
*6843512 6.49 6/16/2017 2:11*
*3562940 8.18 7/18/2017 17:09*
*678953 6.99 7/7/2017 16:37*
*477935 8.47 7/19/2017 16:07*
*8069635 6.79 7/27/2017 17:29*
*435287 15.98 7/25/2017 21:39*
*7210916 11.47 7/12/2017 17:30*
*320190 16.86 6/10/2017 23:16*
*7101677 9.28 6/6/2017 20:50*
*1358520 16.65 7/10/2017 17:52*
*485601 5.99 7/18/2017 17:46*
*7288355 24.32 7/28/2017 0:01*
*8657204 6.46 6/26/2017 22:32*
*368087 15.26 6/17/2017 17:20*
*5532715 11.78 7/24/2017 16:54*
*318181 11.16 5/14/2017 23:01*
*457094 10.92 6/15/2017 18:43*
*8733533 9.67 6/16/2017 17:06*
*2229405 15.37 6/16/2017 16:59*
*654301 11.77 6/24/2017 0:16*
*440110 26.85 5/3/2017 0:58*
*478324 8.79 7/10/2017 17:02*
*927885 17.05 5/20/2017 15:41*
*1489397 8.47 7/26/2017 17:55*
*454200 13.94 7/26/2017 1:11*
*7235501 13.81 5/19/2017 1:13*
*527673 9.97 5/20/2017 0:37*
*2438553 7.99 6/18/2017 23:09*
*2592988 39.8 6/24/2017 17:37*
*538511 15.15 6/11/2017 16:58*
*481081 14.87 5/19/2017 17:57*
*1999017 18.06 7/21/2017 16:28*
*3925889 8.08 6/26/2017 16:17*
*1046802 10.67 7/19/2017 23:07*
*434850 7.49 7/2/2017 23:00*
*370681 8.47 7/14/2017 18:19*
*3554336 21.54 7/28/2017 16:50*
*5731193 9.67 5/22/2017 16:41*
*1062134 15.05 6/5/2017 21:13*
*408175 17.94 5/23/2017 20:18*
*8733533 8.38 7/15/2017 18:16*
*588771 13.87 6/25/2017 23:49*
*6338225 6.79 7/1/2017 18:59*
*6340638 17.95 6/15/2017 23:25*
*6926244 9.07 7/17/2017 19:02*
*8880079 10.58 6/14/2017 20:37*
*7333070 20.76 5/6/2017 17:22*
*6409065 13.76 7/30/2017 18:00*
*946735 7.57 7/21/2017 19:33*
*386328 15.98 7/21/2017 1:50*
*377431 9.16 5/30/2017 22:53*
*1870101 10.28 6/1/2017 17:00*
*318128 10.74 7/15/2017 22:16*
*3122611 7.68 6/27/2017 18:01*
*7528015 17.16 6/26/2017 21:10*
*6993335 8.57 6/13/2017 17:14*
*3424400 9.27 5/11/2017 18:51*
*7515441 15.77 6/26/2017 22:25*
*3962258 10.26 6/21/2017 1:33*
*6470596 12.16 7/31/2017 17:57*
*3415331 8.08 7/17/2017 19:19*
*3301515 10 7/2/2017 20:09*
*5359396 9.47 6/26/2017 19:44*
*7975103 13.87 6/29/2017 17:08*
*440393 6.18 5/3/2017 16:48*
*2170350 17.26 7/29/2017 23:11*
*370726 19.25 6/2/2017 16:03*
*761305 21.82 7/14/2017 17:25*
*7849625 12.03 7/24/2017 16:50*
*321713 7.68 7/12/2017 16:49*
*7300770 9.57 6/27/2017 16:06*
*1095644 8.47 5/9/2017 14:40*
*321315 17.94 5/4/2017 16:07*
*7579307 11.78 7/6/2017 19:42*
*7981910 7.99 6/11/2017 20:30*
*2731720 56.86 7/21/2017 16:35*
*628140 17.45 7/20/2017 17:31*
*321926 9.48 7/25/2017 17:25*
*7000119 9.37 5/28/2017 23:37*
*9101259 14.86 6/26/2017 16:31*
*8682720 17.73 7/15/2017 17:29*
*445002 30.25 5/4/2017 23:05*
*7319495 13.48 6/26/2017 17:09*
*7900556 7.99 5/28/2017 20:17*
*927932 11.37 5/7/2017 22:35*
*674966 6.89 6/28/2017 21:27*
*323736 33.25 5/17/2017 18:02*
*1096148 12.46 7/28/2017 17:36*
*3195598 6.98 5/16/2017 19:30*
*685341 7.83 6/10/2017 17:43*
*7006511 11.15 7/10/2017 20:34*
*320245 20.47 5/20/2017 16:03*
*2387580 7.36 6/26/2017 16:54*
*492746 7.99 6/25/2017 0:59*
*974050 8.49 7/16/2017 19:01*
*1706322 14.99 5/24/2017 20:22*
*7288355 12.47 5/19/2017 1:56*
*50496250 16.58 7/26/2017 0:24*
*447509 6.29 5/5/2017 16:32*
*1330217 8.27 5/19/2017 17:34*
*2154446 7.99 7/6/2017 16:41*
*1038646 8.69 5/13/2017 22:35*
*314670 8.49 5/29/2017 23:09*
*563231 26.32 7/31/2017 18:16*
*699366 12 7/4/2017 17:46*
*8306831 6.99 5/22/2017 21:58*
*4378079 10.17 7/10/2017 18:07*
*8307283 15.87 5/31/2017 19:22*
*6493978 17.32 7/18/2017 16:13*
*1299335 17.32 7/19/2017 23:24*
*1041199 11.07 5/25/2017 17:32*
*956047 7.27 5/24/2017 16:35*
*377134 19.25 7/31/2017 18:03*
*3395660 20.56 7/24/2017 18:25*
*482106 51.29 7/14/2017 17:02*
*521363 16.27 7/18/2017 22:03*
*537518 10.98 6/13/2017 21:48*
*1943828 11.77 6/29/2017 18:30*
*606395 10.98 7/11/2017 1:15*
*1228153 15.34 5/1/2017 18:48*
*6437041 10.27 5/12/2017 15:39*
*3109401 10.28 6/26/2017 16:43*
*530302 8.57 7/10/2017 18:10*
*3109401 28.94 6/22/2017 15:43*
*6461282 12.58 6/9/2017 22:52*
*8296976 16.15 5/29/2017 23:11*
*2018954 9.98 7/14/2017 23:33*
*6241196 19.56 7/14/2017 22:54*
*8217936 11.47 6/16/2017 17:10*
*463122 8.29 7/15/2017 18:28*
*532110 10.46 7/26/2017 16:53*
*9100252 15.07 6/26/2017 15:43*
*439030 9.37 7/19/2017 17:23*
*326157 11.73 6/13/2017 0:27*
*1146325 21.85 5/16/2017 0:30*
*1502399 8.29 7/6/2017 18:57*
*2623130 7.99 5/22/2017 15:57*
*2747554 11.28 5/7/2017 17:15*
*1645383 7.77 5/31/2017 18:16*
*2739083 6.66 7/3/2017 17:53*
*6004810 10.08 6/4/2017 19:39*
*3759866 14.06 7/1/2017 18:19*
*770582 10.59 5/11/2017 22:01*
*1186104 19.92 6/12/2017 17:19*
*636778 7.99 5/5/2017 17:42*
*6147540 12.82 7/23/2017 22:26*
*5813054 14.88 6/26/2017 17:00*
*4115178 12 7/3/2017 21:24*
*8964829 7.99 6/27/2017 21:28*
*3944025 9.96 5/16/2017 16:33*
*8227862 10.48 7/23/2017 19:14*
*7104071 14.46 6/14/2017 18:53*
*4255115 7.38 5/21/2017 21:36*
*2550433 8.49 6/8/2017 17:30*
*6824172 13.17 6/14/2017 20:36*
*455032 6.97 5/17/2017 17:38*
*1206605 10.67 6/24/2017 21:08*
*337571 8.67 7/7/2017 18:52*
*633665 11.98 7/9/2017 17:16*
*2202055 8.47 6/1/2017 17:54*
*3581705 15.15 6/4/2017 17:48*
*50484076 8.28 7/26/2017 18:44*
*1145712 14.85 7/12/2017 17:48*
*599316 13.47 5/5/2017 17:35*
*321097 17.85 5/10/2017 16:17*
*316951 6.88 7/11/2017 16:59*
*566170 8.49 6/11/2017 17:19*
*4183643 9.67 6/20/2017 16:54*
*3312216 27.02 6/15/2017 20:22*
*507997 8.78 5/31/2017 20:20*
*1564977 10.77 5/15/2017 18:26*
*1063879 10.67 5/11/2017 16:56*
*6768611 10.08 6/18/2017 16:02*
*466723 9.52 5/12/2017 0:21*
*461771 9.78 6/9/2017 19:00*
*1160016 15.05 6/27/2017 16:31*
*7808075 7.99 5/10/2017 18:05*
*3388025 7.28 5/26/2017 21:04*
*367393 16.94 7/31/2017 16:09*
*748047 8.37 7/12/2017 20:35*
*5312887 7.18 7/9/2017 23:32*
*9503792 44.46 7/16/2017 23:59*
*794037 16.66 7/28/2017 23:19*
*7742605 13.28 5/19/2017 17:45*
*424303 8.47 7/3/2017 16:56*
*423285 10.07 5/16/2017 16:30*
*744532 15.45 5/10/2017 21:47*
*1315758 12.45 5/4/2017 17:17*
*5768484 9.28 6/1/2017 16:47*
*1749414 7.88 7/23/2017 18:19*
*1943828 11.77 5/24/2017 18:45*
*6554452 9.58 7/14/2017 20:16*
*7333070 6.18 5/17/2017 22:34*
*6335238 9.27 5/23/2017 0:12*
*7898943 14.96 7/27/2017 18:50*
*439121 8.78 6/28/2017 16:56*
*321315 9.47 6/2/2017 16:43*
*970063 9.67 5/15/2017 13:20*
*744532 17.54 6/11/2017 19:27*
*380324 9.47 6/24/2017 1:00*
*50512368 9.47 7/27/2017 19:13*
*475428 9.48 6/17/2017 18:44*
*378535 17.36 7/6/2017 17:08*
*483547 11.29 7/31/2017 17:53*
*7825238 9.87 5/15/2017 17:55*
*3881856 8.08 6/26/2017 17:39*
*1188351 6.77 7/18/2017 17:29*
*3955854 9.37 5/1/2017 17:18*
*431483 7.88 7/2/2017 21:45*
*6906319 7.97 5/31/2017 17:05*
*2414336 17.26 5/10/2017 18:20*
*838785 9.47 6/28/2017 17:59*
*317459 9.97 7/10/2017 16:50*
*439914 9.67 7/14/2017 18:20*
*760177 17.94 6/27/2017 16:09*
*318091 14.77 5/14/2017 17:26*
*1274436 10.28 7/26/2017 17:29*
*2626118 8.49 7/7/2017 17:29*
*3224795 13.28 7/27/2017 23:22*
*7210916 10.78 6/30/2017 17:09*
*374293 13.58 7/28/2017 23:46*
*466723 8.07 5/16/2017 16:34*
*1591942 7.99 6/22/2017 17:04*
*1967943 6.49 5/5/2017 16:42*
*477759 6.38 5/12/2017 16:50*
*331106 10.67 7/5/2017 17:05*
*1170093 7.99 5/8/2017 18:37*
*3993066 22.26 6/4/2017 23:34*
*385427 10.78 5/10/2017 21:07*
*3563453 8.29 6/6/2017 16:43*
*992545 18.35 5/3/2017 15:35*
*430623 15.17 5/5/2017 16:10*
*7288355 14.76 6/3/2017 18:38*
*9124832 8.49 7/26/2017 21:53*
*7737402 14.54 5/17/2017 23:30*
*3913032 8.18 6/19/2017 17:37*
*335627 17.35 5/31/2017 16:34*
*1340737 11.87 7/12/2017 21:47*
*468557 7.98 5/10/2017 16:47*
*458337 7.49 5/23/2017 16:36*
*606287 10.48 5/1/2017 16:33*
*2321336 8.38 5/22/2017 16:25*





*R code*


*library(data.table)*
*df = fread("rfm_90.csv", header = T , stringsAsFactors = FALSE,
check.names = FALSE)[-1,]*

*sum(is.na <http://is.na/>(df))*

*df$user_id = as.integer(df$user_id)*
*df$subtotal_amount = as.numeric(df$subtotal_amount)*
*df$created_at = as.POSIXct(df$created_at)*
*df$created_at = as.Date(df$created_at)*


*occurences = table(unlist(df$user_id))*
*occurences = as.data.frame(occurences)*
*occurences = occurences[order(-occurences$Freq),]*

* r <-c(30,60,90)*
* f <-c(2,5,8)*
* m <-c(10,20,30)*

* getScoreWithBreaks <- function(df,r,f,m) {*

*   ## scoring the Recency*
*   len = length(r)*
*   R_Score <- c(rep(1,length(df[,1])))*
*   df <- cbind(df,R_Score)*
*   for(i in 1:len){*
*     if(i == 1){*
*       p1=0*
*     }else{*
*       p1=r[i-1]*
*     }*
*     p2=r[i]*

*     if(dim(df[p1<df$Recency & df$Recency<=p2,])[1]>0) df[p1<df$Recency &
df$Recency<=p2,]$R_Score = len - i+ 2*
*   }*

*   ## scoring the Frequency*
*   len = length(f)*
*   F_Score <- c(rep(1,length(df[,1])))*
*   df <- cbind(df,F_Score)*
*   for(i in 1:len){*
*     if(i == 1){*
*       p1=0*
*     }else{*
*       p1=f[i-1]*
*     }*
*     p2=f[i]*

*     if(dim(df[p1<df$Frequency & df$Frequency<=p2,])[1]>0)
df[p1<df$Frequency & df$Frequency<=p2,]$F_Score = i*
*   }*
*   if(dim(df[f[len]<df$Frequency,])[1]>0) df[f[len]<df$Frequency,]$F_Score
= len+1*

*   ## scoring the Monetary*
*   len = length(m)*
*   M_Score <- c(rep(1,length(df[,1])))*
*   df <- cbind(df,M_Score)*
*   for(i in 1:len){*
*     if(i == 1){*
*       p1=0*
*     }else{*
*       p1=m[i-1]*
*     }*
*     p2=m[i]*

*     if(dim(df[p1<df$Monetary & df$Monetary<=p2,])[1]>0) df[p1<df$Monetary
& df$Monetary<=p2,]$M_Score = i*
*   }*
*   if(dim(df[m[len]<df$Monetary,])[1]>0) df[m[len]<df$Monetary,]$M_Score =
len+1*

*   #order the dfframe by R_Score, F_Score, and M_Score desc*
*   df <- df[order(-df$R_Score,-df$F_Score,-df$M_Score),]*

*   # caculate the total score*
*   Total_Score <- c(100*df$R_Score + 10*df$F_Score+df$M_Score)*

*   df <- cbind(df,Total_Score)*

*   return(df)*

* }*



* df2<-getScoreWithBreaks(df,r,f,m)*

*# *
*library(easyRFM)*
*result <- rfm_auto(df, id="user_id", payment ="subtotal_amount",
date="created_at")*

*breaks = result$breaks*
*classes = result$classes*
*summary = result$get_table*

*classes = as.data.frame(classes)*
*breaks = as.data.frame(breaks)*
*summary = as.data.frame(summary)*
--
hemantsain.com

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help RFM analysis in R (i want a code where i can define my own breaks instead of system defined breaks used in auto_RFM package)

Jim Lemon-4
Hi Hemant,
I had a chance to look at this. Here is a function that allows you to
rank customers on the raw values of recency, frequency and monetary. I
added the code for cutting the raw values into intervals, but haven't
had a chance to test it and I can't test it right now. It may be
helpful.

rfm_df<-read.table("rfm.csv",header=TRUE,sep="\t",stringsAsFactors = FALSE)

# expects a three (or more) column data frame where
# column 1 is customer ID, column 2 is amount of purchase
# and column 3 is date of purchase
qdrfm<-function(x,rbreaks=NULL,fbreaks=NULL,mbreaks=NULL,
 date.format="%Y-%m-%d") {

 today<-as.Date(date(), "%a %b %d %H:%M:%S %Y")
 x$rscore<-today-as.Date(x[,3],date.format)
 if(!is.null(rbreaks))
  x$rscore<-cut(x$rscore,breaks=rbreaks,labels=FALSE)
  custIDs<-unique(x[,1])
 ncust<-length(custIDs)
 rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
  fscore=rep(0,ncust),mscore=rep(0,ncust))
 rfmout$rscore=rank(by(x$rscore,x[,1],min))
 if(!is.null(fbreaks)) rfmout$fscore<-rank(by(x[,3],x[,1],length))
 else
  rfmout$fscore<-cut(by(x[,3],x[,1],length),breaks=fbreaks,labels=FALSE)
 if(!is.null(mbreaks)) rfmout$mscore<-rank(by(x[,2],x[,1],sum))
 else
  rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
 rfmout$cscore<-rank((order(rfmout$rscore)+
  order(rfmout$fscore,decreasing=TRUE)+
  order(rfmout$mscore,decreasing=TRUE))/3)
 rfmout$cscore<-rfmout$cscore-min(rfmout$cscore)+1
 return(rfmout[order(rfmout$cscore),])
}

qdrfm(rfm_df,date.format="%m/%d/%Y")

Jim

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help RFM analysis in R (i want a code where i can define my own breaks instead of system defined breaks used in auto_RFM package)

Jim Lemon-4
Hi Hemant,
Here is an example that might answer your questions. Please don't run
previous code as it might not work.

I define the break values as arguments to the function
(rbreaks,fbreaks,mbreaks) If you want the breaks to work, make sure that
they cover the range of the input values, otherwise you get NAs.

# expects a three (or more) column data frame where
# column 1 is customer ID, column 2 is amount of purchase
# and column 3 is date of purchase
qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,date.format="%Y-%m-%d",
 weights=c(1,1,1),finish=NA) {

 # if no finish date is specified, use current date
 if(is.na(finish)) finish<-as.Date(date(), "%a %b %d %H:%M:%S %Y")
 x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
 x$rscore<-as.numeric(cut(x$rscore,breaks=rbreaks,labels=FALSE))
 custIDs<-unique(x[,1])
 ncust<-length(custIDs)
 rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
  fscore=rep(0,ncust),mscore=rep(0,ncust))
 rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
 rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
 rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
 rfmout$cscore<-(weights[1]*rfmout$rscore+
  weights[2]*rfmout$fscore+
  weights[3]*rfmout$mscore)/sum(weights)
 return(rfmout[order(rfmout$cscore),])
}

set.seed(12345)
x2<-data.frame(ID=sample(1:50,250,TRUE),
 purchase=round(runif(250,5,100),2),
 date=paste(rep(2016,250),sample(1:12,250,TRUE),
  sample(1:28,250,TRUE),sep="-"))

# example 1
qdrfm(x2)

# example 2
qdrfm(x2,rbreaks=c(0,200,400),fbreaks=c(0,5,10),mbreaks=c(0,350,700),
 finish=as.Date("2017-01-01"))

Jim

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help RFM analysis in R (i want a code where i can define my own breaks instead of system defined breaks used in auto_RFM package)

hemantsain55
I'm getting all the rows as NA in  Cscore and almost most of the
observation in R and F and M are also NA.
what can be the reason for this. also suggest me the appropriate solution.

On 9 October 2017 at 15:51, Jim Lemon <[hidden email]> wrote:

> Hi Hemant,
> Here is an example that might answer your questions. Please don't run
> previous code as it might not work.
>
> I define the break values as arguments to the function
> (rbreaks,fbreaks,mbreaks) If you want the breaks to work, make sure that
> they cover the range of the input values, otherwise you get NAs.
>
> # expects a three (or more) column data frame where
> # column 1 is customer ID, column 2 is amount of purchase
> # and column 3 is date of purchase
> qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,date.format="%Y-%m-%d",
>  weights=c(1,1,1),finish=NA) {
>
>  # if no finish date is specified, use current date
>  if(is.na(finish)) finish<-as.Date(date(), "%a %b %d %H:%M:%S %Y")
>  x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
>  x$rscore<-as.numeric(cut(x$rscore,breaks=rbreaks,labels=FALSE))
>  custIDs<-unique(x[,1])
>  ncust<-length(custIDs)
>  rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
>   fscore=rep(0,ncust),mscore=rep(0,ncust))
>  rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
>  rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
>  rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
>  rfmout$cscore<-(weights[1]*rfmout$rscore+
>   weights[2]*rfmout$fscore+
>   weights[3]*rfmout$mscore)/sum(weights)
>  return(rfmout[order(rfmout$cscore),])
> }
>
> set.seed(12345)
> x2<-data.frame(ID=sample(1:50,250,TRUE),
>  purchase=round(runif(250,5,100),2),
>  date=paste(rep(2016,250),sample(1:12,250,TRUE),
>   sample(1:28,250,TRUE),sep="-"))
>
> # example 1
> qdrfm(x2)
>
> # example 2
> qdrfm(x2,rbreaks=c(0,200,400),fbreaks=c(0,5,10),mbreaks=c(0,350,700),
>  finish=as.Date("2017-01-01"))
>
> Jim
>
>


--
hemantsain.com

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help RFM analysis in R (i want a code where i can define my own breaks instead of system defined breaks used in auto_RFM package)

Jim Lemon-4
I seriously doubt that you are running the code I sent. What you have
probably done is to run your data, which has a different date format,
without changing the breaks or the date format arguments. As you
haven't provided any example that shows what you are doing, I can't
guess what the problem is.

Jim


On Mon, Oct 9, 2017 at 9:40 PM, Hemant Sain <[hidden email]> wrote:

> I'm getting all the rows as NA in  Cscore and almost most of the observation
> in R and F and M are also NA.
> what can be the reason for this. also suggest me the appropriate solution.
>
> On 9 October 2017 at 15:51, Jim Lemon <[hidden email]> wrote:
>>
>> Hi Hemant,
>> Here is an example that might answer your questions. Please don't run
>> previous code as it might not work.
>>
>> I define the break values as arguments to the function
>> (rbreaks,fbreaks,mbreaks) If you want the breaks to work, make sure that
>> they cover the range of the input values, otherwise you get NAs.
>>
>> # expects a three (or more) column data frame where
>> # column 1 is customer ID, column 2 is amount of purchase
>> # and column 3 is date of purchase
>> qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,date.format="%Y-%m-%d",
>>  weights=c(1,1,1),finish=NA) {
>>
>>  # if no finish date is specified, use current date
>>  if(is.na(finish)) finish<-as.Date(date(), "%a %b %d %H:%M:%S %Y")
>>  x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
>>  x$rscore<-as.numeric(cut(x$rscore,breaks=rbreaks,labels=FALSE))
>>  custIDs<-unique(x[,1])
>>  ncust<-length(custIDs)
>>  rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
>>   fscore=rep(0,ncust),mscore=rep(0,ncust))
>>  rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
>>  rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
>>  rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
>>  rfmout$cscore<-(weights[1]*rfmout$rscore+
>>   weights[2]*rfmout$fscore+
>>   weights[3]*rfmout$mscore)/sum(weights)
>>  return(rfmout[order(rfmout$cscore),])
>> }
>>
>> set.seed(12345)
>> x2<-data.frame(ID=sample(1:50,250,TRUE),
>>  purchase=round(runif(250,5,100),2),
>>  date=paste(rep(2016,250),sample(1:12,250,TRUE),
>>   sample(1:28,250,TRUE),sep="-"))
>>
>> # example 1
>> qdrfm(x2)
>>
>> # example 2
>> qdrfm(x2,rbreaks=c(0,200,400),fbreaks=c(0,5,10),mbreaks=c(0,350,700),
>>  finish=as.Date("2017-01-01"))
>>
>> Jim
>>
>
>
>
> --
> hemantsain.com

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help RFM analysis in R (i want a code where i can define my own breaks instead of system defined breaks used in auto_RFM package)

hemantsain55
Hello Jim,
i have converted all my variable data type according to your attached
example including date, and my dataset looks like this.


ID                 purchase                 date
1234             10.2                         2017-02-18
3453             18.9                         2017-03-22
7689              8                             2017-03-24



but when I'm passing the data into the function it is giving me same values
for entire observations i. r=2, f=2, m=2

and which part of your code is responsible to calculate recency
and frequency score i mean how it will determine how many times a user made
a purchase in last 30 days so that we can put that user into our own
defined category.

one more thing it would be great if you can explain lil bit about finish
date. because i'm not able to understand what do you meant by finish date.

Thanks

On 10 October 2017 at 02:27, Jim Lemon <[hidden email]> wrote:

> I seriously doubt that you are running the code I sent. What you have
> probably done is to run your data, which has a different date format,
> without changing the breaks or the date format arguments. As you
> haven't provided any example that shows what you are doing, I can't
> guess what the problem is.
>
> Jim
>
>
> On Mon, Oct 9, 2017 at 9:40 PM, Hemant Sain <[hidden email]>
> wrote:
> > I'm getting all the rows as NA in  Cscore and almost most of the
> observation
> > in R and F and M are also NA.
> > what can be the reason for this. also suggest me the appropriate
> solution.
> >
> > On 9 October 2017 at 15:51, Jim Lemon <[hidden email]> wrote:
> >>
> >> Hi Hemant,
> >> Here is an example that might answer your questions. Please don't run
> >> previous code as it might not work.
> >>
> >> I define the break values as arguments to the function
> >> (rbreaks,fbreaks,mbreaks) If you want the breaks to work, make sure that
> >> they cover the range of the input values, otherwise you get NAs.
> >>
> >> # expects a three (or more) column data frame where
> >> # column 1 is customer ID, column 2 is amount of purchase
> >> # and column 3 is date of purchase
> >> qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,date.format="%Y-%m-%d",
> >>  weights=c(1,1,1),finish=NA) {
> >>
> >>  # if no finish date is specified, use current date
> >>  if(is.na(finish)) finish<-as.Date(date(), "%a %b %d %H:%M:%S %Y")
> >>  x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
> >>  x$rscore<-as.numeric(cut(x$rscore,breaks=rbreaks,labels=FALSE))
> >>  custIDs<-unique(x[,1])
> >>  ncust<-length(custIDs)
> >>  rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
> >>   fscore=rep(0,ncust),mscore=rep(0,ncust))
> >>  rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
> >>  rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
> >>  rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
> >>  rfmout$cscore<-(weights[1]*rfmout$rscore+
> >>   weights[2]*rfmout$fscore+
> >>   weights[3]*rfmout$mscore)/sum(weights)
> >>  return(rfmout[order(rfmout$cscore),])
> >> }
> >>
> >> set.seed(12345)
> >> x2<-data.frame(ID=sample(1:50,250,TRUE),
> >>  purchase=round(runif(250,5,100),2),
> >>  date=paste(rep(2016,250),sample(1:12,250,TRUE),
> >>   sample(1:28,250,TRUE),sep="-"))
> >>
> >> # example 1
> >> qdrfm(x2)
> >>
> >> # example 2
> >> qdrfm(x2,rbreaks=c(0,200,400),fbreaks=c(0,5,10),mbreaks=c(0,350,700),
> >>  finish=as.Date("2017-01-01"))
> >>
> >> Jim
> >>
> >
> >
> >
> > --
> > hemantsain.com
>



--
hemantsain.com

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help RFM analysis in R (i want a code where i can define my own breaks instead of system defined breaks used in auto_RFM package)

Jim Lemon-4
Hi Hemant,
see inline below.

On Tue, Oct 10, 2017 at 4:19 PM, Hemant Sain <[hidden email]> wrote:

> Hello Jim,
> i have converted all my variable data type according to your attached
> example including date, and my dataset looks like this.
>
>
> ID                 purchase                 date
> 1234             10.2                         2017-02-18
> 3453             18.9                         2017-03-22
> 7689              8                             2017-03-24
>
As I don't have your data set, I can't tell you why you are getting
the same values. First, I'll create a data set that looks something
like your example, except with 50 customers and 250 transactions, all
in 2017:

set.seed(12345)
x3<-data.frame(ID=sample(1234:1283,250,TRUE),
 purchase=round(runif(250,5,100),2),
 date=paste(rep(2017,250),sample(1:9,250,TRUE),
  sample(1:28,250,TRUE),sep="-"))

Look at it carefully. Is there anything that you think is wrong?

>
>
> but when I'm passing the data into the function it is giving me same values
> for entire observations i. r=2, f=2, m=2
>
> and which part of your code is responsible to calculate recency and
> frequency score i mean how it will determine how many times a user made a
> purchase in last 30 days so that we can put that user into our own defined
> category.
>
Here is the function commented for easier understanding.

Recency is calculated as the most recent purchase for each customer
from the "finish" date, which defaults to the current date. If you are
examining historical data, you may want to set a different "finish"
date.

Frequency is simply the number of purchases recorded for each customer.

Monetary is the sum of the purchase amounts for each customer

The default breaks for each score are those calculated by the "cut"
function. If you want specific breaks, they _must_ cover the range of
the values or cut will generate NAs. I have added a printout of the
ranges of the raw recency, frequency and monetary scores so that you
can enter your own breaks.

qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,
 date.format="%Y-%m-%d",weights=c(1,1,1),finish=NA) {

 # if no finish date is specified, use current date
 if(is.na(finish)) finish<-as.Date(date(), "%a %b %d %H:%M:%S %Y")
 x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
 cat("Range of purchase recency",range(x$rscore),"\n")
 x$rscore<-as.numeric(cut(x$rscore,breaks=rbreaks,labels=FALSE))
 cat("Range of purchase freqency",range(table(x[,1])),"\n")
 cat("Range of purchase amount",range(by(x[,2],x[,1],sum)),"\n")
 custIDs<-unique(x[,1])
 ncust<-length(custIDs)
 # initialize a data frame to hold the output
 rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
  fscore=rep(0,ncust),mscore=rep(0,ncust))
 # categorize the minimum number of days
 # since last purchase for each customer
 rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
 # categorize the number of purchases
 # recorded for each customer
 rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
 # categorize the amount purchased
 # by each customer
 rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
 # calculate the RFM score from the
 # optionally weighted average of the above
 rfmout$cscore<-round((weights[1]*rfmout$rscore+
  weights[2]*rfmout$fscore+
  weights[3]*rfmout$mscore)/sum(weights),2)
 return(rfmout[order(rfmout$cscore),])
}

# run the dataset with default breaks
qdrfm(x3)

# now specify breaks with respect to the printout of the raw scores
qdrfm(x3,rbreaks=c(0,150,300),fbreaks=c(0,5,11),mbreaks=c(0,300,600))

# now give the total amount purchased twice the weight
qdrfm(x3,rbreaks=c(0,150,300),fbreaks=c(0,5,11),
 mbreaks=c(0,300,600),weights=c(1,1,2))

I hope that this will explain the function better.

Jim

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.