


Allstate compares SAS, Hadoop and R for Big-Data Insurance Models

At the Strata conference in New York today, Steve Yun (Principal Predictive Modeler at Allstate's Research and Planning Center) described the various ways he tackled the problem of fitting a generalized linear model to 150M records of insurance data. He evaluated several approaches:

1. Proc GENMOD in SAS
2. Installing a Hadoop cluster
3. Using open-source R (both on the full data set, and using sampling)
4. Running the data through Revolution R Enterprise

Steve described each of the approaches as follows.

Approach 1 is current practice. (Allstate is a big user of SAS, but there's a growing contingent of R users.) Proc GENMOD takes around 5 hours to return results for a Poisson model with 150 million observations and 70 degrees of freedom. "It's difficult to be productive on a tight schedule if it takes over 5 hours to fit one candidate model!", Steve said.

Approach 2: It was hoped that installing a Hadoop cluster and running the model there would improve performance. According to Steve, a lot of plumbing was required: this involved coding the matrix equations for iteratively-reweighted least squares as a map-reduce task (using the rmr package), and manually coding the factor variables as indicator columns in the design matrix. Unfortunately, each iteration took about 1.5 hours, with 5-10 iterations required for convergence. (Even then, there were problems with singularities in the design matrix.)

Approach 3: Perhaps installing R on a server with lots of RAM would help. (Because open-source R runs in-memory, you need RAM on the order of several times the size of the data to make it work.) Alas, not even a 250 GB server was sufficient: even after waiting three days, the data couldn't be loaded. Sampling the data down into 10 partitions was more successful, and allowed for the use of the glmnet package and L1 regularization to automate the variable selection process. But each glmnet fit on a partition still took over 30 minutes, and Steve said it would be difficult for managers to accept a process that involved sampling.

Approach 4: Steve turned to Revolution Analytics' Joe Rickert to evaluate how long the same model would take using the big-data RevoScaleR package in Revolution R Enterprise. Joe loaded the data onto a 5-node cluster (20 cores total), and used the distributed rxGlm function, which was able to process the data in 5.7 minutes. Joe demonstrated this process live during the session.

So in summary, here's how the four approaches fared:

Approach              Platform                                   Time to fit
1: SAS                16-core Sun Server                         5 hours
2: rmr / map-reduce   10-node (8 cores / node) Hadoop cluster    > 10 hours
3: Open source R      250 GB Server                              Impossible (> 3 days)
4: RevoScaleR         5-node (4 cores / node) LSF cluster        5.7 minutes

That's quite a difference! So what have we learned?

- SAS works, but is slow.
- It's possible to program the model in Hadoop, but it's even slower.
- The data is too big for open-source R, even on a very large server.
- Revolution R Enterprise gets the same results as SAS, but about 50x faster.

Steve and Joe's slides and video of the presentation, Start Small Before Going Big, will be available on the Strata website in due course.
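To make the "plumbing" in Approach 2 concrete, here is a minimal sketch of iteratively reweighted least squares for a Poisson model with a log link, arranged in map-reduce style: each chunk of rows contributes its own X'WX and X'Wz, and the pieces are summed before solving for the coefficients. This is not the rmr code used at Allstate. It is plain Python on a tiny made-up data set, and the names chunk_stats, solve and fit_poisson_irls are hypothetical, chosen for illustration only. The factor variable is hand-coded as an indicator column, as described in the talk.

```python
import math

def chunk_stats(rows, beta):
    """Map step: accumulate X'WX and X'Wz for one chunk of (x, y) rows."""
    p = len(beta)
    xtwx = [[0.0] * p for _ in range(p)]
    xtwz = [0.0] * p
    for x, y in rows:
        eta = sum(b * xi for b, xi in zip(beta, x))
        mu = math.exp(eta)           # Poisson mean under the log link
        z = eta + (y - mu) / mu      # working response
        for i in range(p):
            xtwz[i] += x[i] * mu * z             # IRLS weight is mu here
            for j in range(p):
                xtwx[i][j] += x[i] * mu * x[j]
    return xtwx, xtwz

def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_poisson_irls(chunks, p, max_iter=25, tol=1e-8):
    """Reduce step plus update: sum the chunk statistics, then solve."""
    beta = [0.0] * p
    for iteration in range(1, max_iter + 1):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for rows in chunks:          # each chunk could live on its own node
            a, b = chunk_stats(rows, beta)
            for i in range(p):
                xtwz[i] += b[i]
                for j in range(p):
                    xtwx[i][j] += a[i][j]
        new_beta = solve(xtwx, xtwz)
        delta = max(abs(n - o) for n, o in zip(new_beta, beta))
        beta = new_beta
        if delta < tol:
            break
    return beta, iteration

# Toy data (invented for illustration): intercept plus one indicator column.
chunks = [
    [((1.0, 0.0), 1), ((1.0, 0.0), 3)],   # factor level A, mean count 2
    [((1.0, 1.0), 4), ((1.0, 1.0), 8)],   # factor level B, mean count 6
]
beta, n_iter = fit_poisson_irls(chunks, p=2)
print(beta, n_iter)
```

With one indicator for a two-level factor the model is saturated, so the fitted coefficients should settle at log 2 for the intercept and log 3 for the indicator. Every iteration makes a full pass over all chunks, which is consistent with the per-iteration cost dominating on 150 million records, and a redundant indicator column would trip the singularity check in solve.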


More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
