Click here to close now.

Welcome!

Eclipse Authors: XebiaLabs Blog, Ken Fogel, Sematext Blog, Marcin Warpechowski, Trevor Parsons

Blog Feed Post

Quick History: glm()

by Joseph Rickert I recently wrote about some R resources that are available for generalized linear models (GLMs). Looking over the material, I was amazed by the amount of effort that is continuing to go into GLMs, both with with respect to new theoretical developments and also in response to practical problems such as the need to deal with very large data sets. (See packages biglm, ff, ffbase, RevoScaleR for example.) This led me to wonder about the history of the GLM and its implementations. An adequate exploration of this topic would occupy a serious science historian (which I am definitely not) for a considerable amount of time. However, I think even a brief look at what apears to be the main line of the development of the GLM in R provides some insight into how good software influences statistical practice. A convenient place to start is with the 1972 paper Generalized Linear Models by Nelder and Wedderburn This seems to be the first paper  to give the GLM a life of its own.  The authors pulled things together by: grouping the Normal, Poisson, Binomial (probit) and gamma distributions together as members of the exponential family applying maximum likelihood estimation via the iteratively reweighted least squares algorithm to the family introducing the terminology “generalized linear models” suggesting  that this unification would be a pedagogic improvement that would “simplify the teaching of the subject to both specialists and non-specialists” It is clear that the GLM was not “invented” in 1972. But, Nelder and Wedderburn were able to package up statistical knowledge and a tradition of analysis going pretty far back in a way that will forever shape how statisticians think about generalizations of linear models. For a brief, but fairly detailed account of the history of the major developments in the in categorical data analysis, logistic regression and loglinear models in the early 20th century leading up to the GLM see Chapter 10 of Agresti 1996. (One very interesting fact highlighted by Agresti is that the iteratively reweighted least squares algorithm that Nelder and Weddergurn used to fit GLMs is the method that R.A. Fisher introduced in 1935 to for fitting probit models by means of maximum likelihood.) The first generally available software to implement a wide range of GLMs seems to have been the Fortran based GLIM system which was developed by the Royal Statistical Society’s Working Party on Statistical Computing, released in 1974 and developed through 1993. My guess is that GLIM dominated the field for nearly 20 years until it was eclipsed by the growing popularity of the 1991 version of S, and the introduction of PROC GENMOD in version 6.09 of SAS that was released in the 1993 timeframe. (Note that the first edition of the manual for the MatLab Statistics Toolbox also dates from 1993.) In any event, in the 1980s, the GLM became the “go to” statistical tool that it is today. In the chapter on Generalized Linear Models that they contributed to Chambers and Hastie’s landmark 1992 book, Hastie and Pregibon write that “GLMS have become popular over the past 10 years, partly due to the computer package GLIM …” It is dangerous temptation to attribute more to a quotation like this than the authors intended. Nevertheless, I think it does offer some support for the idea that in a field such as statistics, theory shapes the tools and then the shape of the tools exerts some influence on how the theory develops. R’s glm() function was, of course,  modeled on the S implementation, The stats package documentation states: The original R implementation of glm was written by Simon Davies working for Ross Ihaka at the University of Auckland, but has since been extensively re-written by members of the R Core team.The design was inspired by the S function of the same name described in Hastie & Pregibon (1992). I take this to mean that the R implementation of glm() was much more than just a direct port of the S code. glm() has come a long way. It is very likely that only the SAS PROC GENMOD implementation of the GLM has matched R’s glm()in popularity over the past decade. However, SAS’s closed environment has failed to match open-source R’s ability to foster growth and stimulate creativity. The performance, stability and rock solid reliability of glm() has contributed to making GLMs a basic tool both for statisticians and for the new generation of data scientists as well.   How GLM implementations will develop outside of R in the future is not clear at all. Python’s evolving glm implementation appears to be in the GLIM tradition. (The Python documentation references the paper by Green (1984) which, in-turn, references GLIM.) Going back to first principles is always a good idea, however Python's GLM function apparently only supports one parameter exponential families. The Python developers have a long way to go before they can match R's rich functionality.The Julia glm function is clearly being modeled after R and shows much promise. However, recent threads on the julia-stats google group forum indicate that the Julia developers are just now beginning to work on basic glm() functionality. ReferencesAgresti, Alan, An Introduction to Categorical Data Analysis: John Wiley and Sons (1996)Chambers, John M. and Trevor J. Hastie (ed.), Statistical Models In S: Wadsworth & Brooks /Cole (1992)Green, P.J., Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives: Journal of the Royal Statistical Society, Series (1984)McCullagh, P. and J. A. Nelder. Generalized Linear Models: Chapman & Hall (1990)Nelder, J.A and R.W.M. Wedderburn, Generalized Linear Models: K. R. Statist Soc A (1972), 135, part 3, p. 370

Read the original blog entry...

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.<

David smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on twitter at @RevoDavid

@ThingsExpo Stories
“With easy-to-use SDKs for Atmel’s platforms, IoT developers can now reap the benefits of realtime communication, and bypass the security pitfalls and configuration complexities that put IoT deployments at risk,” said Todd Greene, founder & CEO of PubNub. PubNub will team with Atmel at CES 2015 to launch full SDK support for Atmel’s MCU, MPU, and Wireless SoC platforms. Atmel developers now have access to PubNub’s secure Publish/Subscribe messaging with guaranteed ¼ second latencies across PubNub’s 14 global points-of-presence. PubNub delivers secure communication through firewalls, proxy ser...
SYS-CON Events announced today that Vicom Computer Services, Inc., a provider of technology and service solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. They are located at booth #427. Vicom Computer Services, Inc. is a progressive leader in the technology industry for over 30 years. Headquartered in the NY Metropolitan area. Vicom provides products and services based on today’s requirements around Unified Networks, Cloud Computing strategies, Virtualization around Software defined Data Ce...
SYS-CON Events announced today that Gridstore™, the leader in hyper-converged infrastructure purpose-built to optimize Microsoft workloads, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Gridstore™ is the leader in hyper-converged infrastructure purpose-built for Microsoft workloads and designed to accelerate applications in virtualized environments. Gridstore’s hyper-converged infrastructure is the industry’s first all flash version of HyperConverged Appliances that include both compute and storag...
Chuck Piluso will present a study of cloud adoption trends and the power and flexibility of IBM Power and Pureflex cloud solutions. Speaker Bio: Prior to Data Storage Corporation (DSC), Mr. Piluso founded North American Telecommunication Corporation, a facilities-based Competitive Local Exchange Carrier licensed by the Public Service Commission in 10 states, serving as the company's chairman and president from 1997 to 2000. Between 1990 and 1997, Mr. Piluso served as chairman & founder of International Telecommunications Corporation, a facilities-based international carrier licensed by t...
There are lots of challenges in IoT around secure, scalable and business friendly infrastructure for enterprises. For large corporations, IoT implementations are one of the top priorities of the decade. All industries are seeing a competitive need to sustain by investing in IoT initiatives. The value addition comes from improved customer service, innovative product and additional revenue streams. The data from these IP-connected devices can be leveraged for a variety of business applications as well as responsive action controls. The various architectural building blocks of an IoT ...
“In the past year we've seen a lot of stabilization of WebRTC. You can now use it in production with a far greater degree of certainty. A lot of the real developments in the past year have been in things like the data channel, which will enable a whole new type of application," explained Peter Dunkley, Technical Director at Acision, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
WebRTC is an up-and-coming standard that enables real-time voice and video to be directly embedded into browsers making the browser a primary user interface for communications and collaboration. WebRTC runs in a number of browsers today and is currently supported in over a billion installed browsers globally, across a range of platform OS and devices. Today, organizations that choose to deploy WebRTC applications and use a host machine that supports audio through USB or Bluetooth can use Plantronics products to connect and transit or receive the audio associated with the WebRTC session.
The best mobile applications are augmented by dedicated servers, the Internet and Cloud services. Mobile developers should focus on one thing: writing the next socially disruptive viral app. Thanks to the cloud, they can focus on the overall solution, not the underlying plumbing. From iOS to Android and Windows, developers can leverage cloud services to create a common cross-platform backend to persist user settings, app data, broadcast notifications, run jobs, etc. This session provides a high level technical overview of many cloud services available to mobile app developers, includi...
SYS-CON Media announced today that @WebRTCSummit Blog, the largest WebRTC resource in the world, has been launched. @WebRTCSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. @WebRTCSummit Blog can be bookmarked ▸ Here @WebRTCSummit conference site can be bookmarked ▸ Here
SYS-CON Events announced today that Ciqada will exhibit at SYS-CON's @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Ciqada™ makes it easy to connect your products to the Internet. By integrating key components - hardware, servers, dashboards, and mobile apps - into an easy-to-use, configurable system, your products can quickly and securely join the internet of things. With remote monitoring, control, and alert messaging capability, you will meet your customers' needs of tomorrow - today! Ciqada. Let your products take flight. For more inform...
Health care systems across the globe are under enormous strain, as facilities reach capacity and costs continue to rise. M2M and the Internet of Things have the potential to transform the industry through connected health solutions that can make care more efficient while reducing costs. In fact, Vodafone's annual M2M Barometer Report forecasts M2M applications rising to 57 percent in health care and life sciences by 2016. Lively is one of Vodafone's health care partners, whose solutions enable older adults to live independent lives while staying connected to loved ones. M2M will continue to gr...
Dave will share his insights on how Internet of Things for Enterprises are transforming and making more productive and efficient operations and maintenance (O&M) procedures in the cleantech industry and beyond. Speaker Bio: Dave Landa is chief operating officer of Cybozu Corp (kintone US). Based in the San Francisco Bay Area, Dave has been on the forefront of the Cloud revolution driving strategic business development on the executive teams of multiple leading Software as a Services (SaaS) application providers dating back to 2004. Cybozu's kintone.com is a leading global BYOA (Build Your O...
As enterprises move to all-IP networks and cloud-based applications, communications service providers (CSPs) – facing increased competition from over-the-top providers delivering content via the Internet and independently of CSPs – must be able to offer seamless cloud-based communication and collaboration solutions that can scale for small, midsize, and large enterprises, as well as public sector organizations, in order to keep and grow market share. The latest version of Oracle Communications Unified Communications Suite gives CSPs the capability to do just that. In addition, its integration ...
The 17th International Cloud Expo has announced that its Call for Papers is open. 17th International Cloud Expo, to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, APM, APIs, Microservices, Security, Big Data, Internet of Things, DevOps and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal today!
While not quite mainstream yet, WebRTC is starting to gain ground with Carriers, Enterprises and Independent Software Vendors (ISV’s) alike. WebRTC makes it easy for developers to add audio and video communications into their applications by using Web browsers as their platform. But like any market, every customer engagement has unique requirements, as well as constraints. And of course, one size does not fit all. In her session at WebRTC Summit, Dr. Natasha Tamaskar, Vice President, Head of Cloud and Mobile Strategy at GENBAND, will explore what is needed to take a real time communications ...
The IoT Bootcamp is coming to Cloud Expo | @ThingsExpo on June 9-10 at the Javits Center in New York. Instructor. Registration is now available at http://iotbootcamp.sys-con.com/ Instructor Janakiram MSV previously taught the famously successful Multi-Cloud Bootcamp at Cloud Expo | @ThingsExpo in November in Santa Clara. Now he is expanding the focus to Janakiram is the founder and CTO of Get Cloud Ready Consulting, a niche Cloud Migration and Cloud Operations firm that recently got acquired by Aditi Technologies. He is a Microsoft Regional Director for Hyderabad, India, and one of the f...
In 2015, 4.9 billion connected "things" will be in use. By 2020, Gartner forecasts this amount to be 25 billion, a 410 percent increase in just five years. How will businesses handle this rapid growth of data? Hadoop will continue to improve its technology to meet business demands, by enabling businesses to access/analyze data in real time, when and where they need it. Cloudera's Chief Technologist, Eli Collins, will discuss how Big Data is keeping up with today's data demands and how in the future, data and analytics will be pervasive, embedded into every workflow, application and infra...
From telemedicine to smart cars, digital homes and industrial monitoring, the explosive growth of IoT has created exciting new business opportunities for real time calls and messaging. In his session at @ThingsExpo, Ivelin Ivanov, CEO and Co-Founder of Telestax, shared some of the new revenue sources that IoT created for Restcomm – the open source telephony platform from Telestax. Ivelin Ivanov is a technology entrepreneur who founded Mobicents, an Open Source VoIP Platform, to help create, deploy, and manage applications integrating voice, video and data. He is the co-founder of TeleStax, a...
As Marc Andreessen says software is eating the world. Everything is rapidly moving toward being software-defined – from our phones and cars through our washing machines to the datacenter. However, there are larger challenges when implementing software defined on a larger scale - when building software defined infrastructure. In his session at 16th Cloud Expo, Boyan Ivanov, CEO of StorPool, will provide some practical insights on what, how and why when implementing "software-defined" in the datacenter.
How is unified communications transforming the way businesses operate? In his session at WebRTC Summit, Arvind Rangarajan, Director of Product Marketing at BroadSoft, will discuss how to extend unified communications experience outside the enterprise through WebRTC. He will also review use cases across different industry verticals. Arvind Rangarajan is Director, Product Marketing at BroadSoft. He has over 19 years of experience in the telecommunications industry in various roles such as Software Development, Product Management and Product Marketing, applied across Wireless, Unified Communic...