The ACM 2013 Mining Big Data Camp and "Un-Conference"

by Joseph Rickert
The 2013 Mining Big Data Camp was held last Saturday at eBay’s Town Hall Conference Center in San Jose. The San Francisco chapter of the ACM has been sponsoring this data-mining-themed “un-conference” event since 2009. Attendance this year was lighter than I remember from past years; however, the event continues to be a viable way to find out what’s hot in the Silicon Valley Data Mining scene. The buzz this year was about Deep Learning, Data Science and R.
I stumbled into the hall just in time to watch the un-conference take shape. Greg Makowski and his team of ACM volunteers do a superb job of managing chaos. An un-conference self-organizes: people propose sessions, a show of hands decides which will fly, people volunteer or are gently prodded into leading the sessions, and a quick count decides which sessions get the larger rooms.
I found myself leading two sessions: “An Overview and Introduction to R”, and a discussion on “How to become a Data Scientist”. The attitude of the participants in the R session was strictly business: “How is R organized?”, “What is the best way to learn R?”, “Show me some code.” The pragmatism and enthusiasm reflected exactly what the polls indicate: R skills have become essential to Data Mining and Data Science.
In addition to the “Data Scientist” session in which I participated, there was a parallel session led by eBay hiring managers on getting hired as a Data Scientist. I think the tremendous interest in this topic, at the un-conference and elsewhere, reflects how much momentum has built up towards establishing “Data Scientist” as a distinct job position. It also indicates how useful the title has become as a label for a fairly extensive set of interdisciplinary skills.
My take is that a Data Scientist needs to be proficient in four areas:
Statistical Inference: an understanding of sampling and experimental design, at minimum.
Programming: sufficient skills to acquire and manipulate large data sets and implement machine learning algorithms.
IT skills: some knowledge of Linux and big data architectures, and of how to connect to databases, clusters, clouds and Hadoop.
Business skills: how to take an insufficiently articulated business problem and shape it into a series of relevant technical questions.
These are not all that different from Drew Conway’s original Venn diagram, but they include the ability to ask the right questions that Hilary Mason always so eloquently emphasizes.
While R and Data Science are in the realm of the here and now, the buzz around Deep Learning is that it might be the next really big thing. “Deep Learning” refers to using multi-layer neural nets, including Restricted Boltzmann Machines, to solve difficult tasks in machine vision, audio processing and Natural Language Processing. Apparently, the basic ideas have been around for quite some time, but recent advances in training these multi-layer networks have made them practical for certain classes of problems. Python seems to be the language of choice for working in this area: for example, NuPIC (the Numenta Platform for Intelligent Computing, which recently became an open-source project) is a mix of Python and C++. The two very knowledgeable eBay engineers who led the un-conference session worked through an example based on code that I think relied on the Pylearn2 library.
For me, the ACM un-conference brought some clarity to the complementary roles R and Python play in Data Mining, and provided concrete examples that illustrate why KDnuggets advises would-be Data Scientists to learn both languages (and SQL).
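To make “multi-layer neural net” a little more concrete, here is a minimal sketch in plain Python of a forward pass through a small two-layer network. It is not the code shown at the session and does not use Pylearn2 or NuPIC; the function names (`sigmoid`, `layer`, `forward`) and the hand-picked weights are hypothetical, chosen only for illustration, and the training step that makes deep networks hard is left out entirely.

```python
import math


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer.

    Each unit computes a weighted sum of the inputs plus a bias,
    then passes the result through a sigmoid.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = layer(activation, weights, biases)
    return activation


# Hypothetical weights, picked by hand rather than learned from data.
network = [
    # 2 inputs -> 3 hidden units
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, -1.0, 0.5]),
    # 3 hidden units -> 1 output unit
    ([[1.0, -1.0, 0.5]], [0.0]),
]

output = forward([1.0, 0.0], network)
print(output)
```

Stacking more entries in `network` makes the net “deeper”; the recent advances mentioned above concern how to learn good weights for such stacks, which this sketch does not attempt.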
If you are interested in learning more about Deep Learning and its role in reviving the dream of Artificial Intelligence, have a look at the two Google Tech Talks by Geoff Hinton and Andrew Ng.

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
