Welcome!

Eclipse Authors: Liz McMillan, David H Deans, JP Morgenthal, Mano Marks, Yeshim Deniz

Blog Feed Post

An Introduction to SAS for R Programmers

by Joseph Rickert Life decisions are usually much too complicated to be attributed to any single cause, but one important reason that I am here at Revolution today is that I ignored suggestions from well-meaning faculty back in graduate school to work more in SAS rather than doing everything in R. There was a heavy emphasis on SAS then: the faculty were worried about us getting jobs. This was before the rise of the data scientist and the the corporate model my professors had in mind was: PhD statisticians do statistics and everyone else writes SAS code. I would not be surprised if this is still not the prevailing model in traditional Statistics programs. My bet is there are statisticians everywhere who have yet to come to grips with the concept of a “data scientist”.  Anyway, because of the great cosmic balance, or the bad karma that comes from ignoring well-intentioned advice and the fact that there are quite a few companies out there that want to convert their SAS code to R, I occasionally get to look at SAS code. In the process of interviewing candidates for this kind of work it struck me that there are many people coming to data science through the programming or machine learning routes who have some R knowledge as well as experience with Java, Python and C++ who have never worked with SAS. To this group I offer the following very brief “Introduction to SAS for R Programmers”. So what is SAS exactly? Originally, SAS  stood for “Statistical Analysis System”. Indeed, towards the beginning of his invaluable book, “R for SAS and SPSS Users”, Bob Muenchen characterizes SAS as a system for statistical computation that has five main components: A data management system for reading, transforming and organizing data (The Data Step) A large number of procedures (PROCs) for statistical analysis and graphics The Output Delivery System for extracting output from PROCs and customizing printed output A macro language for programming in the data step and calling PROCS The Interactive Matrix programming language (IML) for developing new algorithms SAS is not a single programming language. It is an entire ecosystem of products (not all seamlessly integrated) that contains at least two languages! While becoming a competent SAS programmer clearly requires mastering an impressive number of skills, quite a bit can be accomplished in SAS with a basic knowledge of the Data Step and the more common procedures (PROCs) in the base and Stat packages. Moreover, as it turns out, these two foundational components of SAS are the very two things that an R programmer is likely to find most strange about SAS. There is really only one data structure in SAS, a file with rows of observations and columns of variables that always gets processed by means of an implied loop. A Data Step “program” starts with the first row of a SAS file executes all of the code it encounters until it comes to a run; statement then looks at the second row of the file and runs through the code again. The Data Step proceeds sequentially through the entire file in this fashion. An excellent presentation from Steven J. First illustrates the process nicely. See slides 36 through 45 for an example of SAS code with a very clear PowerPoint animation of how this all works. It is true that SAS programmers can work with arrays, but this is actually a computational sleight of hand. Arrays are actually special columns in a data set. R programmers are used to an interactive computational experience. Within a session, at any point in time the objects that resulted from a previous computation are available as inputs to the next calculation. There is always a sense of moving forward. If you didn’t compute something as part of the last function you ran, just write another function and compute it now. In SAS, however, one uses the various PROCS to conjure the results in a methodical, premeditated way. For example, something like the following code would run a simple regression in SAS sending the results to the console. proc reg data = myData;model Y = X;run:  However, if you wanted to have the fitted values and residuals available for a further computation, you would have to rerun the regression specifying an output file and the keywords for computing the fitted values and residuals. proc reg DATA = myData;MODEL Y = X / stb clb;OUTPUT OUT=OUTREG P=PREDCIT R=RESED;run; Kathy Welch a statistical consultant at the University of Michigan, provides a very clear example of this linear way of working. Most SAS programming probably gets done by writing SAS macros. Look at Bob Muenchen’s book (or this article) for practical examples of R functions to replace SAS macros. For more advanced work,the SAS/Tool Kit (yet another add on) allows SAS probrammers to write custom procedures. But, from a R programmer’s perspective probably the most exciting SAS product is the IML System which provides the ability to call R from within an IML procedure. The documentation  provides an example of transferring data stored in SAS/IML vectors to R, running a model in R and then, importing the results back into SAS/IML vectors. Actually, if you are an R programmer, all you might really want to do is import data from SAS to R. Thre are at least five ways to do this using functions from various open source R libraries. (Note that some of these methods require preparation steps to be done in SAS.) The document “An Introduction to S and The Hmisc and Design Libraries” on CRAN is also helpful. However, I recommend using rxImport feature in RevoScaleR package that ships with Revolution R Enterprise. Importing a SAS file with rxImport looks like this: rxImport(inData=data,outFile="sasFileName") Not only is it a one step process that does not require having SAS installed on your system, but it reads .sas7bdat files directly into Revolution Analytics' .xdf file format. You can easily work with SAS files that are too large to fit into memory Once in .xdf file format the data can be worked on with RevoScaleR’s parallel external memory algorithms (PEMAs) or written to .csv files or data frames.

Read the original blog entry...

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.<

David smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on twitter at @RevoDavid

@ThingsExpo Stories
SYS-CON Events announced today that Outscale, a global pure play Infrastructure as a Service provider and strategic partner of Dassault Systèmes, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Founded in 2010, Outscale simplifies infrastructure complexities and boosts the business agility of its customers. Outscale delivers a secure, reliable and industrial strength solution for its customers, which in...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend @CloudExpo | @ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA. Learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
SYS-CON Events announced today that EARP Integration will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. EARP Integration is a passionate software house. Since its inception in 2009 the company successfully delivers smart solutions for cities and factories that start their digital transformation. EARP provides bespoke solutions like, for example, advanced enterprise portals, business intelligence systems an...
Existing Big Data solutions are mainly focused on the discovery and analysis of data. The solutions are scalable and highly available but tedious when swapping in and swapping out occurs in disarray and thrashing takes place. The resolution for thrashing through machine learning algorithms and support nomenclature is through simple techniques. Organizations that have been collecting large customer data are increasingly seeing the need to use the data for swapping in and out and thrashing occurs ...
Amazon started as an online bookseller 20 years ago. Since then, it has evolved into a technology juggernaut that has disrupted multiple markets and industries and touches many aspects of our lives. It is a relentless technology and business model innovator driving disruption throughout numerous ecosystems. Amazon’s AWS revenues alone are approaching $16B a year making it one of the largest IT companies in the world. With dominant offerings in Cloud, IoT, eCommerce, Big Data, AI, Digital Assis...
SYS-CON Events announced today that Progress, a global leader in application development, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Enterprises today are rapidly adopting the cloud, while continuing to retain business-critical/sensitive data inside the firewall. This is creating two separate data silos – one inside the firewall and the other outside the firewall. Cloud ISVs oft...
The 21st International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Digital Transformation, Machine Learning and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding busin...
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with the 21st International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. @ThingsExpo Silicon Valley Call for Papers is now open.
As cloud adoption continues to transform business, today's global enterprises are challenged with managing a growing amount of information living outside of the data center. The rapid adoption of IoT and increasingly mobile workforce are exacerbating the problem. Ensuring secure data sharing and efficient backup poses capacity and bandwidth considerations as well as policy and regulatory compliance issues.
SYS-CON Events announced today that Interoute has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Interoute is the owner operator of Europe's largest network and a global cloud services platform, which encompasses over 70,000 km of lit fiber, 15 data centers, 17 virtual data centers and 33 colocation centers, with connections to 195 additional partner data centers. Our full-service Unifie...
In order to meet the rapidly changing demands of today’s customers, companies are continually forced to redefine their business strategies in order to meet these needs, stay relevant and continue to see profitable growth. IoT deployment and development is integral in this transformation, and today businesses are increasingly seeing the value of investing their resources into IoT deployments. These technologies are able increase ROI through projects such as connecting supply chains or enabling sm...
SYS-CON Events announced today that Progress, a global leader in application development, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Enterprises today are rapidly adopting the cloud, while continuing to retain business-critical/sensitive data inside the firewall. This is creating two separate data silos – one inside the firewall and the other outside the firewall. Cloud ISVs ofte...
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm.
SYS-CON Events announced today that DivvyCloud will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. DivvyCloud software enables organizations to achieve their cloud computing goals by simplifying and automating security, compliance and cost optimization of public and private cloud infrastructure. Using DivvyCloud, customers can leverage programmatic Bots to identify and remediate common cloud problems in rea...
SYS-CON Events announced today that Cloudistics, an on-premises cloud computing company, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Cloudistics delivers a complete public cloud experience with composable on-premises infrastructures to medium and large enterprises. Its software-defined technology natively converges network, storage, compute, virtualization, and management into a ...
New competitors, disruptive technologies, and growing expectations are pushing every business to both adopt and deliver new digital services. This ‘Digital Transformation’ demands rapid delivery and continuous iteration of new competitive services via multiple channels, which in turn demands new service delivery techniques – including DevOps. In this power panel at @DevOpsSummit 20th Cloud Expo, moderated by DevOps Conference Co-Chair Andi Mann, panelists will examine how DevOps helps to meet th...
SYS-CON Events announced today that A&I Solutions has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Founded in 1999, A&I Solutions is a leading information technology (IT) software and services provider focusing on best-in-class enterprise solutions. By partnering with industry leaders in technology, A&I assures customers high performance levels across all IT environments including: mai...
Every successful software product evolves from an idea to an enterprise system. Notably, the same way is passed by the product owner's company. In his session at 20th Cloud Expo, Oleg Lola, CEO of MobiDev, will provide a generalized overview of the evolution of a software product, the product owner, the needs that arise at various stages of this process, and the value brought by a software development partner to the product owner as a response to these needs.
Most technology leaders, contemporary and from the hardware era, are reshaping their businesses to do software in the hope of capturing value in IoT. Although IoT is relatively new in the market, it has already gone through many promotional terms such as IoE, IoX, SDX, Edge/Fog, Mist Compute, etc. Ultimately, irrespective of the name, it is about deriving value from independent software assets participating in an ecosystem as one comprehensive solution.
SYS-CON Events announced today that EARP will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. "We are a software house, so we perfectly understand challenges that other software houses face in their projects. We can augment a team, that will work with the same standards and processes as our partners' internal teams. Our teams will deliver the same quality within the required time and budget just as our partn...