Welcome!

Eclipse Authors: Carmen Gonzalez, Lori MacVittie, Mark R. Hinkle, Natalie Lerner, Roger Strukhoff

Blog Feed Post

Quick History: glm()

by Joseph Rickert I recently wrote about some R resources that are available for generalized linear models (GLMs). Looking over the material, I was amazed by the amount of effort that is continuing to go into GLMs, both with with respect to new theoretical developments and also in response to practical problems such as the need to deal with very large data sets. (See packages biglm, ff, ffbase, RevoScaleR for example.) This led me to wonder about the history of the GLM and its implementations. An adequate exploration of this topic would occupy a serious science historian (which I am definitely not) for a considerable amount of time. However, I think even a brief look at what apears to be the main line of the development of the GLM in R provides some insight into how good software influences statistical practice. A convenient place to start is with the 1972 paper Generalized Linear Models by Nelder and Wedderburn This seems to be the first paper  to give the GLM a life of its own.  The authors pulled things together by: grouping the Normal, Poisson, Binomial (probit) and gamma distributions together as members of the exponential family applying maximum likelihood estimation via the iteratively reweighted least squares algorithm to the family introducing the terminology “generalized linear models” suggesting  that this unification would be a pedagogic improvement that would “simplify the teaching of the subject to both specialists and non-specialists” It is clear that the GLM was not “invented” in 1972. But, Nelder and Wedderburn were able to package up statistical knowledge and a tradition of analysis going pretty far back in a way that will forever shape how statisticians think about generalizations of linear models. For a brief, but fairly detailed account of the history of the major developments in the in categorical data analysis, logistic regression and loglinear models in the early 20th century leading up to the GLM see Chapter 10 of Agresti 1996. (One very interesting fact highlighted by Agresti is that the iteratively reweighted least squares algorithm that Nelder and Weddergurn used to fit GLMs is the method that R.A. Fisher introduced in 1935 to for fitting probit models by means of maximum likelihood.) The first generally available software to implement a wide range of GLMs seems to have been the Fortran based GLIM system which was developed by the Royal Statistical Society’s Working Party on Statistical Computing, released in 1974 and developed through 1993. My guess is that GLIM dominated the field for nearly 20 years until it was eclipsed by the growing popularity of the 1991 version of S, and the introduction of PROC GENMOD in version 6.09 of SAS that was released in the 1993 timeframe. (Note that the first edition of the manual for the MatLab Statistics Toolbox also dates from 1993.) In any event, in the 1980s, the GLM became the “go to” statistical tool that it is today. In the chapter on Generalized Linear Models that they contributed to Chambers and Hastie’s landmark 1992 book, Hastie and Pregibon write that “GLMS have become popular over the past 10 years, partly due to the computer package GLIM …” It is dangerous temptation to attribute more to a quotation like this than the authors intended. Nevertheless, I think it does offer some support for the idea that in a field such as statistics, theory shapes the tools and then the shape of the tools exerts some influence on how the theory develops. R’s glm() function was, of course,  modeled on the S implementation, The stats package documentation states: The original R implementation of glm was written by Simon Davies working for Ross Ihaka at the University of Auckland, but has since been extensively re-written by members of the R Core team.The design was inspired by the S function of the same name described in Hastie & Pregibon (1992). I take this to mean that the R implementation of glm() was much more than just a direct port of the S code. glm() has come a long way. It is very likely that only the SAS PROC GENMOD implementation of the GLM has matched R’s glm()in popularity over the past decade. However, SAS’s closed environment has failed to match open-source R’s ability to foster growth and stimulate creativity. The performance, stability and rock solid reliability of glm() has contributed to making GLMs a basic tool both for statisticians and for the new generation of data scientists as well.   How GLM implementations will develop outside of R in the future is not clear at all. Python’s evolving glm implementation appears to be in the GLIM tradition. (The Python documentation references the paper by Green (1984) which, in-turn, references GLIM.) Going back to first principles is always a good idea, however Python's GLM function apparently only supports one parameter exponential families. The Python developers have a long way to go before they can match R's rich functionality.The Julia glm function is clearly being modeled after R and shows much promise. However, recent threads on the julia-stats google group forum indicate that the Julia developers are just now beginning to work on basic glm() functionality. ReferencesAgresti, Alan, An Introduction to Categorical Data Analysis: John Wiley and Sons (1996)Chambers, John M. and Trevor J. Hastie (ed.), Statistical Models In S: Wadsworth & Brooks /Cole (1992)Green, P.J., Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives: Journal of the Royal Statistical Society, Series (1984)McCullagh, P. and J. A. Nelder. Generalized Linear Models: Chapman & Hall (1990)Nelder, J.A and R.W.M. Wedderburn, Generalized Linear Models: K. R. Statist Soc A (1972), 135, part 3, p. 370

Read the original blog entry...

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.<

David smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on twitter at @RevoDavid

@ThingsExpo Stories
BSQUARE is a global leader of embedded software solutions. We enable smart connected systems at the device level and beyond that millions use every day and provide actionable data solutions for the growing Internet of Things (IoT) market. We empower our world-class customers with our products, services and solutions to achieve innovation and success. For more information, visit www.bsquare.com.
How do APIs and IoT relate? The answer is not as simple as merely adding an API on top of a dumb device, but rather about understanding the architectural patterns for implementing an IoT fabric. There are typically two or three trends: Exposing the device to a management framework Exposing that management framework to a business centric logic • Exposing that business layer and data to end users. This last trend is the IoT stack, which involves a new shift in the separation of what stuff happens, where data lives and where the interface lies. For instance, it’s a mix of architectural style...
Almost everyone sees the potential of Internet of Things but how can businesses truly unlock that potential. The key will be in the ability to discover business insight in the midst of an ocean of Big Data generated from billions of embedded devices via Systems of Discover. Businesses will also need to ensure that they can sustain that insight by leveraging the cloud for global reach, scale and elasticity.
Connected devices are changing the way we go about our everyday life, from wearables to driverless cars, to smart grids and entire industries revolutionizing business opportunities through smart objects, capable of two-way communication. But what happens when objects are given an IP-address, and we rely on that connection, sometimes with our lives? How do we secure those vast data infrastructures and safe-keep the privacy of sensitive information? This session will outline how each and every connected device can uphold a core root of trust via a unique cryptographic signature – a “bir...
SYS-CON Events announced today that Utimaco will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Utimaco is a leading manufacturer of hardware based security solutions that provide the root of trust to keep cryptographic keys safe, secure critical digital infrastructures and protect high value data assets. Only Utimaco delivers a general-purpose hardware security module (HSM) as a customizable platform to easily integrate into existing software solutions, embed business logic and build s...
SYS-CON Events announced today that Red Hat, the world's leading provider of open source solutions, will exhibit at Internet of @ThingsExpo, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Red Hat is the world's leading provider of open source software solutions, using a community-powered approach to reliable and high-performing cloud, Linux, middleware, storage and virtualization technologies. Red Hat also offers award-winning support, training, and consulting services. As the connective hub in a global network of enterprises, partners, a...
SYS-CON Events announced today that SOA Software, an API management leader, will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. SOA Software is a leading provider of API Management and SOA Governance products that equip business to deliver APIs and SOA together to drive their company to meet its business strategy quickly and effectively. SOA Software’s technology helps businesses to accelerate their digital channels with APIs, drive partner adoption, monetize their assets, and achieve a...
From a software development perspective IoT is about programming "things," about connecting them with each other or integrating them with existing applications. In his session at @ThingsExpo, Yakov Fain, co-founder of Farata Systems and SuranceBay, will show you how small IoT-enabled devices from multiple manufacturers can be integrated into the workflow of an enterprise application. This is a practical demo of building a framework and components in HTML/Java/Mobile technologies to serve as a platform that can integrate new devices as they become available on the market.
SYS-CON Events announced today that Matrix.org has been named “Silver Sponsor” of Internet of @ThingsExpo, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Matrix is an ambitious new open standard for open, distributed, real-time communication over IP. It defines a new approach for interoperable Instant Messaging and VoIP based on pragmatic HTTP APIs and WebRTC, and provides open source reference implementations to showcase and bootstrap the new standard. Our focus is on simplicity, security, and supporting the fullest feature set.
Internet of @ThingsExpo Silicon Valley announced on Thursday its first 12 all-star speakers and sessions for its upcoming event, which will take place November 4-6, 2014, at the Santa Clara Convention Center in California. @ThingsExpo, the first and largest IoT event in the world, debuted at the Javits Center in New York City in June 10-12, 2014 with over 6,000 delegates attending the conference. Among the first 12 announced world class speakers, IBM will present two highly popular IoT sessions, which will take place November 4-6, 2014 at the Santa Clara Convention Center in Santa Clara, Calif...
P2P RTC will impact the landscape of communications, shifting from traditional telephony style communications models to OTT (Over-The-Top) cloud assisted & PaaS (Platform as a Service) communication services. The P2P shift will impact many areas of our lives, from mobile communication, human interactive web services, RTC and telephony infrastructure, user federation, security and privacy implications, business costs, and scalability. In his session at Internet of @ThingsExpo, Robin Raymond, Chief Architect at Hookflash Inc., will walk through the shifting landscape of traditional telephone a...
WebRTC defines no default signaling protocol, causing fragmentation between WebRTC silos. SIP and XMPP provide possibilities, but come with considerable complexity and are not designed for use in a web environment. In his session at Internet of @ThingsExpo, Matthew Hodgson, technical co-founder of the Matrix.org, will discuss how Matrix is a new non-profit Open Source Project that defines both a new HTTP-based standard for VoIP & IM signaling and provides reference implementations.

SUNNYVALE, Calif., Oct. 20, 2014 /PRNewswire/ -- Spansion Inc. (NYSE: CODE), a global leader in embedded systems, today added 96 new products to the Spansion® FM4 Family of flexible microcontrollers (MCUs). Based on the ARM® Cortex®-M4F core, the new MCUs boast a 200 MHz operating frequency and support a diverse set of on-chip peripherals for enhanced human machine interfaces (HMIs) and machine-to-machine (M2M) communications. The rich set of periphera...

SYS-CON Events announced today that Aria Systems, the recurring revenue expert, has been named "Bronze Sponsor" of SYS-CON's 15th International Cloud Expo®, which will take place on November 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Aria Systems helps leading businesses connect their customers with the products and services they love. Industry leaders like Pitney Bowes, Experian, AAA NCNU, VMware, HootSuite and many others choose Aria to power their recurring revenue business and deliver exceptional experiences to their customers.
The Internet of Things (IoT) is going to require a new way of thinking and of developing software for speed, security and innovation. This requires IT leaders to balance business as usual while anticipating for the next market and technology trends. Cloud provides the right IT asset portfolio to help today’s IT leaders manage the old and prepare for the new. Today the cloud conversation is evolving from private and public to hybrid. This session will provide use cases and insights to reinforce the value of the network in helping organizations to maximize their company’s cloud experience.
The Internet of Things (IoT) is making everything it touches smarter – smart devices, smart cars and smart cities. And lucky us, we’re just beginning to reap the benefits as we work toward a networked society. However, this technology-driven innovation is impacting more than just individuals. The IoT has an environmental impact as well, which brings us to the theme of this month’s #IoTuesday Twitter chat. The ability to remove inefficiencies through connected objects is driving change throughout every sector, including waste management. BigBelly Solar, located just outside of Boston, is trans...
SYS-CON Events announced today that Matrix.org has been named “Silver Sponsor” of Internet of @ThingsExpo, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Matrix is an ambitious new open standard for open, distributed, real-time communication over IP. It defines a new approach for interoperable Instant Messaging and VoIP based on pragmatic HTTP APIs and WebRTC, and provides open source reference implementations to showcase and bootstrap the new standard. Our focus is on simplicity, security, and supporting the fullest feature set.
Predicted by Gartner to add $1.9 trillion to the global economy by 2020, the Internet of Everything (IoE) is based on the idea that devices, systems and services will connect in simple, transparent ways, enabling seamless interactions among devices across brands and sectors. As this vision unfolds, it is clear that no single company can accomplish the level of interoperability required to support the horizontal aspects of the IoE. The AllSeen Alliance, announced in December 2013, was formed with the goal to advance IoE adoption and innovation in the connected home, healthcare, education, aut...
SYS-CON Events announced today that Red Hat, the world's leading provider of open source solutions, will exhibit at Internet of @ThingsExpo, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Red Hat is the world's leading provider of open source software solutions, using a community-powered approach to reliable and high-performing cloud, Linux, middleware, storage and virtualization technologies. Red Hat also offers award-winning support, training, and consulting services. As the connective hub in a global network of enterprises, partners, a...
The only place to be June 9-11 is Cloud Expo & @ThingsExpo 2015 East at the Javits Center in New York City. Join us there as delegates from all over the world come to listen to and engage with speakers & sponsors from the leading Cloud Computing, IoT & Big Data companies. Cloud Expo & @ThingsExpo are the leading events covering the booming market of Cloud Computing, IoT & Big Data for the enterprise. Speakers from all over the world will be hand-picked for their ability to explore the economic strategies that utility/cloud computing provides. Whether public, private, or in a hybrid form, clo...