Eclipse Authors: Pat Romanski, Elizabeth White, Liz McMillan, David H Deans, JP Morgenthal

Blog Feed Post

Starting at the Basics: What is Hadoop and what problems does it solve?



With this post I start with the basics on Hadoop, including its history.

The story starts with the early days of Google. Engineers needed to design new ways to store and process and retrieve data that would scale to very large sizes. The published two papers on their design in 2003, and the highly regarded community-focused Doug Cutting produced an open source version of the software called Hadoop.

Along with that open source project came many other related open source capabilities, and soon an entire big data framework was created. New methods of storing, processing and retrieving data were now available, free, from the Apache Software Foundation. And innovation continued as a firm called Cloudera stood up to continue to accelerate innovation into the open source project.

Hadoop is a single data platform infrastructure that is more simplified, efficient, and runs on affordable commodity hardware.

Hadoop is designed to handle the three V’s of Big Data: volume, variety, velocity. First lets look at volume, Hadoop is a distributed architecture that scales cost effectively. In other words, Hadoop was designed to scale out, and it is much more cost effective to grow the system. As you need more storage or computing capacity, all you need to do is add more nodes to the cluster. Second is variety, Hadoop allows you to store data in any format, be that structured or unstructured data. This means that you will not need to alter your data to fit any single schema before putting it into Hadoop. Next is velocity, with Hadoop you can load raw data into the system and then later define how you want to view it. Because of the flexibility of the system, you are able to avoid many network and processing bottlenecks associated with loading raw data. Since data is always changing, the flexibility of the system makes it much easier to integrate any changes.

Hadoop will allow you to process massive amounts of data very quickly. Hadoop is known as a distributing processing engine which leverages data locality. That means it was designed to execute transformations and processes where the data actually exists. Another benefit of value is from an analytics perspective, Hadoop allows you load raw data and then define the structure of the data at the time of query. This means that Hadoop is quick, flexible, and able to handle any type of analysis you want to conduct.

Organizations begin to utilize Hadoop when they need faster processing on large data sets, and often find they save the organization some money too. Large users of Hadoop include: Facebook, Amazon, Adobe, EBay, and LinkedIn. It is also in use throughout the financial sector and the US government. These organizations are a testament to what can be done at internet speed by utilizing big data to its fullest extent

To read more about Hadoop, click here.

Read the original blog entry...

More Stories By Bob Gourley

Bob Gourley writes on enterprise IT. He is a founder of Crucial Point and publisher of CTOvision.com

IoT & Smart Cities Stories
According to Forrester Research, every business will become either a digital predator or digital prey by 2020. To avoid demise, organizations must rapidly create new sources of value in their end-to-end customer experiences. True digital predators also must break down information and process silos and extend digital transformation initiatives to empower employees with the digital resources needed to win, serve, and retain customers.
Early Bird Registration Discount Expires on August 31, 2018 Conference Registration Link ▸ HERE. Pick from all 200 sessions in all 10 tracks, plus 22 Keynotes & General Sessions! Lunch is served two days. EXPIRES AUGUST 31, 2018. Ticket prices: ($1,295-Aug 31) ($1,495-Oct 31) ($1,995-Nov 12) ($2,500-Walk-in)
Business professionals no longer wonder if they'll migrate to the cloud; it's now a matter of when. The cloud environment has proved to be a major force in transitioning to an agile business model that enables quick decisions and fast implementation that solidify customer relationships. And when the cloud is combined with the power of cognitive computing, it drives innovation and transformation that achieves astounding competitive advantage.
Machine learning has taken residence at our cities' cores and now we can finally have "smart cities." Cities are a collection of buildings made to provide the structure and safety necessary for people to function, create and survive. Buildings are a pool of ever-changing performance data from large automated systems such as heating and cooling to the people that live and work within them. Through machine learning, buildings can optimize performance, reduce costs, and improve occupant comfort by ...
René Bostic is the Technical VP of the IBM Cloud Unit in North America. Enjoying her career with IBM during the modern millennial technological era, she is an expert in cloud computing, DevOps and emerging cloud technologies such as Blockchain. Her strengths and core competencies include a proven record of accomplishments in consensus building at all levels to assess, plan, and implement enterprise and cloud computing solutions. René is a member of the Society of Women Engineers (SWE) and a m...
IoT is rapidly becoming mainstream as more and more investments are made into the platforms and technology. As this movement continues to expand and gain momentum it creates a massive wall of noise that can be difficult to sift through. Unfortunately, this inevitably makes IoT less approachable for people to get started with and can hamper efforts to integrate this key technology into your own portfolio. There are so many connected products already in place today with many hundreds more on the h...
Digital Transformation: Preparing Cloud & IoT Security for the Age of Artificial Intelligence. As automation and artificial intelligence (AI) power solution development and delivery, many businesses need to build backend cloud capabilities. Well-poised organizations, marketing smart devices with AI and BlockChain capabilities prepare to refine compliance and regulatory capabilities in 2018. Volumes of health, financial, technical and privacy data, along with tightening compliance requirements by...
Charles Araujo is an industry analyst, internationally recognized authority on the Digital Enterprise and author of The Quantum Age of IT: Why Everything You Know About IT is About to Change. As Principal Analyst with Intellyx, he writes, speaks and advises organizations on how to navigate through this time of disruption. He is also the founder of The Institute for Digital Transformation and a sought after keynote speaker. He has been a regular contributor to both InformationWeek and CIO Insight...
Digital Transformation is much more than a buzzword. The radical shift to digital mechanisms for almost every process is evident across all industries and verticals. This is often especially true in financial services, where the legacy environment is many times unable to keep up with the rapidly shifting demands of the consumer. The constant pressure to provide complete, omnichannel delivery of customer-facing solutions to meet both regulatory and customer demands is putting enormous pressure on...
Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settlement products to hedge funds and investment banks. After, he co-founded a revenue cycle management company where he learned about Bitcoin and eventually Ethereal. Andrew's role at ConsenSys Enterprise is a mul...