
How to deploy statistical models built in R in real-time?


Guazzelli, Alex
I am framing this as a question because I would like to know how people
are currently deploying the models they build in R. Say you want to
use the results of your model inside another application in real time
or on demand: how do you do it? And how do you use the decisions you
get back from your models?

As you may know, a PMML package is available for R that allows many
mining models to be exported to the Predictive Model Markup Language.
PMML is the standard way to represent models and can be exported from
most statistical packages (including SPSS, SAS, KNIME, ...). Once your
model is represented as a PMML file, it can easily be moved between
systems; PMML allows for true interoperability. We recently published
an article about PMML in The R Journal that describes the PMML
language and the package itself. If you are interested in finding out
more about PMML and how to benefit from this standard, please check
the link below.

http://journal.r-project.org/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf
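For anyone who wants to try this, here is a minimal sketch in R. The
model-fitting and scoring lines are base R; the export lines assume the
pmml and XML packages from CRAN and are shown commented out:

```r
# Fit a simple model in base R; this part runs anywhere.
fit <- lm(dist ~ speed, data = cars)

# On-demand scoring inside R itself:
new_obs <- data.frame(speed = 15)
score <- predict(fit, newdata = new_obs)

# Export the same model to PMML so other systems can score it.
# Assumes the 'pmml' and 'XML' packages are installed:
# library(pmml)
# library(XML)
# saveXML(pmml(fit), file = "cars_lm.pmml")
```

Once the PMML file exists, it is just an XML document and can be handed
to any consumer that understands the standard.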

We also wrote a paper about open standards and cloud computing for the
SIGKDD Explorations newsletter. In it, we describe the ADAPA Scoring
Engine, which executes PMML models and is available as a service on the
Amazon cloud. ADAPA can be used to deploy R models in real time from
anywhere in the world. I believe it represents a revolution in data
mining, since it allows anyone who uses R to make effective use of
predictive models at a cost of less than $1/hour.

http://www.zementis.com/docs/SIGKDD_ADAPA.pdf

Thanks!

Alex

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to deploy statistical models built in R in real-time?

cybaea
>
> I am framing this as a question since I would like to know how folks
> are currently deploying the models they build in R. Say, you want to
> use the results of your model inside another application in real-time
> or on-demand, how do you do it? How do you use the decisions you get
> back from your models?

Late answer, sorry.  I love PMML (and have been advocating it since at
least version 2.0), but I rarely see it deployed in commercial
companies.  What I see, in decreasing order of importance:

1. Pre-scoring.  That is, pre-calculate your model's score for each
customer and stuff the scores into a database that your operational
system can access.  Example: customer churn in mobile telco.

2. Convert the model to SQL.  This is obviously easier for some model
types (trees, k-nearest neighbour, ...) than others, and it is
surprisingly common.  Example: A Big Famous Data Insights Company
created a global customer segmentation model (really: 'cause all markets
and cultures are the same....) for a multi-national company and
distributed it as a Word document with pseudo-SQL fragments for each
country to implement.  This gets around the problem of different
technologies in different countries.

3. Pre-scoring for multiple likely events.  Example: For cross- and
up-sell in a call centre (which is phenomenally effective) you really
want to include the outcome of the original call as an input to the
propensity model.  A badly handled complaint call does not offer the
same opportunities for flogging more products as a successful upgrade to
a higher price plan (but might be an opportunity to extend an
(expensive) retention offer).  The Right Way to do this is to run the
model in real time which would usually mean PMML if you have created the
model in R.  At least one vendor recommended “just” pre-scoring the
model for each possible (relevant) call outcome and storing that in the
operational database.  That vendor also sold databases :-)

4. Use PL/R to embed R within your (PostgreSQL) RDBMS.  (Rare.)

5. Embed R within your operational application and run the model that
way (I have done this exactly once).
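Option 1 above can be sketched in a few lines of R. The data set and
the score semantics here are stand-ins (mtcars is obviously not a churn
data set), and the database write is commented out since it assumes the
DBI package and a live connection:

```r
# Batch pre-scoring: score every "customer" once, offline.
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

scores <- data.frame(
  customer_id = rownames(mtcars),
  score       = predict(fit, newdata = mtcars, type = "response")
)

# Then load the table where the operational system can read it,
# e.g. with the DBI package and an open connection 'con':
# dbWriteTable(con, "customer_scores", scores, overwrite = TRUE)
```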
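And option 2, for a model simple enough to flatten: the coefficients of
a fitted linear model can be pasted straight into a SQL expression. The
table name ("obs") and column name are hypothetical:

```r
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)

# Emit a SQL fragment the warehouse can evaluate directly; assumes a
# table "obs" with a numeric column "speed" matching the predictor.
sql <- sprintf("SELECT %.6f + %.6f * speed AS predicted_dist FROM obs",
               b[["(Intercept)"]], b[["speed"]])
```

For trees the same idea produces nested CASE expressions, one branch
per leaf.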

Somewhere between 1 and 2 is an approach that doesn’t really fit with
the way you framed the question (and is probably OT for this list).  It
is simply this: if you want to deploy models for real-time or fast
on-demand usage, usually you don’t implement them in R (or SAS or
SPSS).  In Marketing, which is my main area, there are dedicated tools
for real-time decisioning and marketing like Oracle RTD [1], Unica
inbound marketing [2], Chordiant Recommendation Advisor, and others [3],
though only the first of these can realistically be described as
“modelling”.


Happy to discuss this more offline if you want.  And I really like your
approach - hope to actually use it some day.


Allan.
More at http://www.pcagroup.co.uk/ and http://www.cybaea.net/Blogs/Data/


[1]
http://www.oracle.com/appserver/business-intelligence/real-time-decisions.html
[2] http://www.unica.com/products/Inbound_Marketing.htm [web site down
at time of writing]
[3] E.piphany and SPSS Interaction Builder appear to be nearly dead in
the market.


On 08/07/09 23:38, Guazzelli, Alex wrote:

> [...]



Re: How to deploy statistical models built in R in real-time?

Guazzelli, Alex
Hi Allan,

Thanks a lot for your reply. I am glad you are an advocate for PMML. I
discovered PMML in 2004 and was fascinated by the idea of representing
my models in a standard language that could actually be moved across
different platforms.  PMML started small, but it is now a mature
standard. The latest version, PMML 4.0, was released just last month. I
wrote a blog post about everything the new version offers. If you are
interested in taking a look at it, please follow the link below.

http://adapasupport.zementis.com/2009/06/pmml-40-is-here.html

BTW, thanks for the link to your blog. I believe it is a great source  
of information for the R community and the data mining community in  
general.

From what you wrote in your reply, most of the deployment options for R
users rely on batch-mode scoring. Your conclusion is also important:
if you want to deploy models in real time or on demand, you usually
would not use R, or tools such as SAS or SPSS. That is most probably
because, until recently, there were no deployment platforms that could
execute models built in these tools in real time or on demand. I
believe the ADAPA Scoring Engine is such a platform. Since it is
available in the cloud as a service, it is highly scalable and
cost-effective; the small instance costs less than $1/hour. It offers a
web console for batch scoring as well as web services for real-time or
on-demand scoring.
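To make the real-time path concrete, here is a sketch of how a scoring
request might be built from R. The endpoint URL and parameter names are
hypothetical, not ADAPA's documented interface; the actual HTTP call
would use a package such as RCurl:

```r
# Build a GET-style scoring request URL from a list of model inputs.
# Endpoint and parameter names are illustrative only.
score_request <- function(endpoint, inputs) {
  query <- paste(names(inputs), unlist(inputs), sep = "=", collapse = "&")
  paste0(endpoint, "?", query)
}

url <- score_request("https://scoring.example.com/score",
                     list(model = "churn_v1", hp = 110, wt = 2.62))
# The response (the score) would then be fetched with, e.g.,
# RCurl::getURL(url) and parsed by the calling application.
```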

Thanks again for your reply.

Best,

Alex


On Jul 15, 2009, at 1:12 AM, Allan Engelhardt wrote:

> [...]

Alex Guazzelli, Ph.D.
Vice President of Analytics

Zementis, Inc.
6125 Cornerstone Court East, Suite 250
San Diego, CA  92121
T: 619 330 0780  x1011
F: 858 535 0227
E: [hidden email]
www.zementis.com








