By David Smith | September 19, 2012 11:00 AM EDT
This guest post is by Alex Guazzelli, VP of Analytics at Zementis Inc. -- ed.
PMML, the Predictive Model Markup Language, is the de facto standard for
representing predictive analytics and data mining models. With PMML, it is
easy to move a predictive solution from one system to another, since it avoids
proprietary formats and the incompatibilities that come with them.
Companies around the globe use PMML to put their predictive solutions to work
right away. With PMML, there is no need for custom coding: you can move your
solution from the scientist’s desktop, where it was built, to the production
environment, where it is operationally deployed. Companies also use PMML as the
common language between service providers and external vendors. In this way, it
defines a single, clear process for the exchange of predictive solutions. It
becomes the bridge not only between data analysis, model building, and
deployment systems, but also between all the people and teams involved in the
analytical process. This matters because PMML is also used to disseminate
knowledge and best practices, and to ensure transparency.
All the top analytical tools, commercial and open-source, support PMML, and the
language itself has reached a great level of maturity and refinement. PMML 4.1,
its latest version, makes it easy for predictive solutions to be represented in
an open and standard way. With PMML, you can represent a myriad of pre- and
post-processing steps in addition to the predictive modeling techniques
themselves. PMML 4.1 allows multiple models (model composition, chaining,
segmentation, and ensembles, including random forest models) to be represented
by a single, concise language element. It also allows model outputs to be
transformed into business decisions. Therefore, a PMML file is able to represent
the entire solution, from raw data to business decision, with one or multiple
predictive models.
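To make the idea of a single file carrying the whole solution more concrete, here is a minimal sketch in Python (standard library only). The PMML document in it is hand-written for illustration and was not exported from R or any other tool; the field names (age, income, income_k, score), the model name, the coefficients, and the helper describe are all hypothetical. It simply lists the top-level sections of the file: a data dictionary for the raw inputs, a transformation dictionary holding one pre-processing step, and the model itself.

```python
import xml.etree.ElementTree as ET

# XML namespace used by PMML 4.1 documents.
NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hand-written, illustrative PMML (hypothetical fields and coefficients).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="income_k" optype="continuous" dataType="double">
      <Apply function="/">
        <FieldRef field="income"/>
        <Constant>1000</Constant>
      </Apply>
    </DerivedField>
  </TransformationDictionary>
  <RegressionModel modelName="demo" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.25"/>
      <NumericPredictor name="income_k" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def describe(pmml_text):
    """Summarize the top-level structure of a PMML document."""
    root = ET.fromstring(pmml_text)
    return {
        "version": root.get("version"),
        # Strip the "{namespace}" prefix ElementTree adds to tag names.
        "sections": [child.tag.split("}", 1)[1] for child in root],
        "raw_fields": [
            f.get("name")
            for f in root.findall("p:DataDictionary/p:DataField", NS)
        ],
        "derived_fields": [
            f.get("name")
            for f in root.findall("p:TransformationDictionary/p:DerivedField", NS)
        ],
    }


if __name__ == "__main__":
    for key, value in describe(PMML_DOC).items():
        print(key, "=", value)
```

Because the raw inputs, the derived field, and the model sit in one XML document, a consuming system needs only this file to see what data the solution expects and how it is transformed before scoring.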
The availability of a standard such as PMML, combined with scoring solutions in
the cloud, for Hadoop, and in-database, makes it possible for predictive
analytics to fulfill its promise and crack the big data code. Zementis, Inc. has
been at the forefront of PMML-based scoring, first through its ADAPA Scoring
Engine, which is available for on-site deployment or as a service in the cloud
(Amazon and IBM), and lately through its Universal PMML Plug-in, which is offered
for a range of databases and for Hadoop. Zementis has partnered with Revolution
Analytics so that predictive solutions built in R can benefit from the vast
scoring infrastructure already in place. I am proud to be associated with
Zementis and excited to be part of an ever-growing PMML community.
A PMML package for R that exports all kinds of predictive models is available
directly from CRAN. Traditionally, the PMML package offered support for the
following data mining algorithms:
ksvm (kernlab): Support Vector Machines
nnet: Neural Networks
rpart: C&RT Decision Trees
lm & glm (stats): Linear and Binary Logistic Regression Models
arules: Association Rules
kmeans and hclust: Clustering Models
Recently, it has been expanded to support:
multinom (nnet): Multinomial Logistic Regression Models
glm (stats): Generalized Linear Models for classification and regression with a wide variety of link functions
randomForest: Random Forest Models for classification and regression
rsf (randomSurvivalForest): Random Survival Forest Models
This expansion is ongoing as the R community implements support for other
packages and techniques. For more on the PMML package, please take a look at
the paper we published with Graham Williams from Togaware in The R Journal,
“PMML: An Open Standard for Sharing Models”.
There may be quite a few reasons for you to move your predictive solution from R
to an independent deployment platform. Among them, you may want parallel
execution on big data or real-time scoring for applications such as fraud
detection or recommender systems. With PMML you can easily move your model to
the cloud or inside the database for scoring, or even have it executed on
Hadoop. It is really up to you! On top of that, PMML allows for side-by-side
deployment of predictive assets from R and from commercial data mining tools,
supporting a multi-vendor environment as well as platform-independent
deployment.
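Platform-independent scoring is possible because any consumer that understands the standard can evaluate the model without the tool that built it. The sketch below, in Python with only the standard library, scores a hand-written PMML linear regression. It is an illustration under stated assumptions, not a description of how ADAPA, the Universal PMML Plug-in, or any other scoring engine works: the field names (age, tenure, score), the coefficients, and the helpers load_regression and score are hypothetical, and only plain NumericPredictor terms are handled.

```python
import xml.etree.ElementTree as ET

# XML namespace used by PMML 4.1 documents.
NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hand-written, illustrative model (hypothetical fields and coefficients).
MODEL = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="tenure"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.25"/>
      <NumericPredictor name="tenure" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def load_regression(pmml_text):
    """Read the intercept and per-field coefficients from the regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(pmml_text, record):
    """Evaluate intercept + sum(coefficient * value) for one input record."""
    intercept, coefficients = load_regression(pmml_text)
    return intercept + sum(c * record[name] for name, c in coefficients.items())


if __name__ == "__main__":
    # 1.5 + 0.25 * 40 + 0.5 * 6
    print(score(MODEL, {"age": 40.0, "tenure": 6.0}))
```

The scoring code never refers to R or to the tool that produced the model; it relies only on the element and attribute names defined by the standard, which is why the same file can be consumed by different systems.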
More and more companies and individuals are using the PMML
standard for the obvious benefits it provides, putting their predictive
solutions on the fast track. With PMML, the speed of predictive solutions can
be on par with the speed of business.
Dr. Alex Guazzelli is the VP of Analytics
at Zementis Inc. where he is responsible for developing core technology and
predictive solutions under ADAPA, a PMML-based decisioning platform. With more
than 20 years of experience in predictive analytics, Dr. Guazzelli holds a PhD
in Computer Science from the University of Southern California and has co-authored
the book PMML in Action: Unleashing the Power of Open Standards for Data Mining
and Predictive Analytics, now in its second edition (paperback and Kindle). You
can follow him at @DrAlexGuazzelli.
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products. David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.