Cloudera and Cleversafe: A Strategic Combination For Enterprise IT

By Bob Gourley

Cloudera and Cleversafe are totally different companies addressing different challenges. But the two firms have quite a bit in common. Here are key commonalities I’ve observed:

  • Both invest in real engineering and deliver enterprise-grade capabilities
  • Both are proven to work at scale (including very large scale when required)
  • Both are led by CEOs who are highly regarded by their peers and the community, and both CEOs are very likeable people (I’ve met and worked with both Mike Olson and Chris Gladwin).
  • Both have used the services of my firm, Crucial Point (and that is most appreciated by me, by the way!).
  • Both are in the In-Q-Tel portfolio and are known to the national security community because of that.
  • Both firms partner with Carahsoft (which is, by the way, another strategic partner of Crucial Point’s).
  • Both are key thought leaders in the domain of Big Data: Cloudera is known for its open source distribution of Apache Hadoop (CDH4) and its management capabilities for CDH, and Cleversafe is known for fielding modern object storage with the lowest cost/TB of any system on the market, plus agile access and impressive I/O.
That said, these two firms really address different areas of enterprise data needs, and have built different capabilities that can be used by enterprises to address separate aspects of Big Data challenges.
That is part of the reason I was excited to learn of cooperation between these two firms. When firms addressing different parts of a hard challenge collaborate, it can mean great things for enterprise missions.
Here are a few thoughts on the well-engineered solution that came out of their work together:
  • In July 2012, Cleversafe announced that it is now working with Cloudera’s Distribution Including Apache Hadoop (CDH) to deliver new capabilities that combine the benefits of Cleversafe’s data storage and security with the power of MapReduce. With this well-engineered combination, an enterprise’s data is not stored in HDFS. The benefits HDFS would normally provide, such as fault tolerance and speed, already come from other Cleversafe functionality. The combination goes further still, eliminating single points of failure without the need for HDFS’s full, multi-copy replication.
  • In short, you can store data with Cleversafe technology, keep all of its benefits, and still run MapReduce jobs over that data with Hadoop, without using HDFS.
  • This well-engineered solution stores data in conventional format on the nodes where it is expected to be used for computation, so MapReduce jobs can run against it. It also brings the many other benefits of Cleversafe, including the ability to protect data without the overhead of massive network traffic and costly backup storage. And it removes NameNode issues, since a Cleversafe cluster’s accesser nodes federate and cover for each other.
  • The bottom line of Cleversafe’s use of Cloudera’s CDH: incredible cost benefits, fantastic disaster recovery/continuity of operations features, fast access to data from multiple locations, and the ability to run MapReduce jobs and Hadoop-centric applications without using HDFS.
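The overhead argument in these bullets comes down to simple arithmetic. The Python sketch below compares HDFS-style three-way replication with a dispersal layout in which each object is cut into n slices, any k of which suffice to read it back. The 16-slice, threshold-10 layout and the names `ProtectionScheme` and `raw_capacity_tb` are hypothetical, chosen only for illustration; they are not a documented Cleversafe configuration, and the model ignores metadata, spare capacity and filesystem overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionScheme:
    """A data-protection layout: each object is stored as `width` pieces,
    any `threshold` of which are enough to read it back."""
    name: str
    width: int       # pieces written per object (n)
    threshold: int   # pieces needed to reconstruct the object (k)

    @property
    def expansion(self) -> float:
        # Raw bytes stored per byte of user data.
        return self.width / self.threshold

    @property
    def tolerated_losses(self) -> int:
        # Pieces that can be lost while the object stays readable.
        return self.width - self.threshold


def raw_capacity_tb(usable_tb: float, scheme: ProtectionScheme) -> float:
    """Raw disk needed to hold `usable_tb` of user data under `scheme`."""
    return usable_tb * scheme.expansion


# HDFS-style replication: 3 full copies, any 1 copy is enough to read.
REPLICATION_3X = ProtectionScheme("3x replication", width=3, threshold=1)
# Hypothetical dispersal layout: 16 slices, any 10 reconstruct the object.
DISPERSAL_16_10 = ProtectionScheme("16/10 dispersal", width=16, threshold=10)

if __name__ == "__main__":
    for scheme in (REPLICATION_3X, DISPERSAL_16_10):
        print(f"{scheme.name}: {scheme.expansion:.1f}x raw, "
              f"survives {scheme.tolerated_losses} lost pieces, "
              f"{raw_capacity_tb(1000, scheme):,.0f} TB raw for 1,000 TB usable")
```

Under these assumptions the dispersal layout tolerates more simultaneous losses than three-way replication while storing roughly half as many raw bytes, which is the shape of the cost claim made above.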
I liked the context Andrew Brust provided on this topic at zdnet.com. He writes:

Cleversafe swaps out HDFS
Assuming it works as advertised, Cleversafe’s company name is a fair reflection of its Hadoop architecture. While other HDFS alternatives exist for Hadoop (for example, MapR’s Hadoop distro, which can mount HDFS-compatible NFS volumes), Cleversafe’s Slicestor appliance nodes retain HDFS’ distributed nature and maintain fault tolerance too. Cleversafe does this with “information dispersal” slices: spreading the data around different nodes in the cluster, employing Erasure Coding – a scheme that allows reconstruction of data from a subset of storage nodes, and eliminates single points of failure without the overhead of HDFS’ complete replication.

Meanwhile, the data is also stored in conventional format on the nodes where it is expected to be used for computation.  The conventional storage assures fast MapReduce operations, and the striped storage assures fault tolerance, without the need (and network traffic and management overhead) to keep multiple full copies of the data.

Namenode issues disappear as well, since a Cleversafe cluster’s accesser nodes federate and cover for each other, and the meta data is split up along with the data itself.  Although various high availability namenode technologies are appearing in the major Hadoop distributions now, they nonetheless still use a single central namenode at any given time.  Keeping a warm spare around is not the same thing as having meta data/directory services responsibilities shared among a collection of active nodes.

Although Cleversafe clusters are appliance-based, the appliances nonetheless use commodity processors and storage. The added value comes from tuning and optimization, and the unique storage software subsystem. Cleversafe storage runs about $500 per Terabyte, and can be less depending on total storage size. On the MapReduce side, Cleversafe uses Cloudera’s Distribution Including Apache Hadoop (CDH).
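Brust’s description of erasure coding, where data can be rebuilt from a subset of the nodes holding it, is easiest to see with the simplest possible erasure code: k data slices plus one XOR parity slice, any k of which rebuild the original. This is a swapped-in illustration, not Cleversafe’s Information Dispersal Algorithm, which uses wider codes that survive several simultaneous node losses; the functions `encode` and `decode` are hypothetical names.

```python
from functools import reduce
from typing import List, Optional


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split `data` into k equal-size slices and append one XOR parity slice."""
    slice_len = max(1, -(-len(data) // k))          # ceiling division
    padded = data.ljust(slice_len * k, b"\0")       # zero-pad to k * slice_len
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(_xor, slices)                   # XOR of all data slices
    return slices + [parity]


def decode(pieces: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one of the k + 1 pieces is missing (None)."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can recover only one lost piece")
    pieces = list(pieces)
    if missing:
        # The XOR of all k + 1 pieces is zero, so the lost piece equals the
        # XOR of the k survivors (this works for a data or a parity slice).
        survivors = [p for p in pieces if p is not None]
        pieces[missing[0]] = reduce(_xor, survivors)
    return b"".join(pieces[:-1])[:original_len]     # drop parity, strip padding


if __name__ == "__main__":
    message = b"dispersed storage demo"
    pieces = encode(message, k=4)
    pieces[2] = None                                # simulate one failed node
    print(decode(pieces, len(message)))
```

Storing 4 data slices plus 1 parity slice costs 1.25x the raw data instead of 3x for triple replication, but this toy code survives only one lost piece; production dispersal codes trade more slices for tolerance of several failures.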

For more information see this July 2012 press release from Cleversafe:

Cleversafe First to Deliver Breakthrough Capabilities for Combined Storage and Massive Computation

First System to Support Storage and Analysis of Datasets at Previously Unattainable Scale with Unparalleled Reliability and Efficiency

Chicago, July 10, 2012 – Cleversafe Inc., the solution for limitless data storage, today announced plans to build the first Dispersed Compute Storage solution by combining the power of Hadoop MapReduce with Cleversafe’s highly scalable Object-based Dispersed Storage System. This solution will significantly alter the Big Data landscape by decreasing infrastructure costs for separate servers dedicated to analytical processes, reducing required storage capacity, and simultaneously improving data integrity. In addition, the company’s solution will reduce network bottlenecks by bringing together computation and storage at any scale, petabytes to exabytes and beyond.

Traditional storage systems are not designed for large-scale distributed computation and data analysis. Present implementations treat data storage and analysis of that data separately, transferring data from Storage Area Networks (SANs) or Network Attached Storage (NASs) across the network to perform the computations used to gather insight. In this manner, the network quickly becomes the bottleneck, making multi-site computation over the WAN particularly challenging. Cleversafe solves this problem by combining Hadoop MapReduce with its Dispersed Storage Network (dsNet) system on the same platform and replacing the Hadoop Distributed File System (HDFS), which relies on 3 copies to protect data, with Information Dispersal Algorithms, thereby significantly improving reliability and allowing analytics at a scale previously unattainable through traditional HDFS configurations.

“For any company, the movement, management and storage of massive data stores for analytical purposes is already unmanageable,” said Chris Gladwin, CEO and President of Cleversafe. “Many companies have had to invest significant resources in both CAPEX and OPEX to manage the challenge of Big Data and to try and capitalize on the opportunity to gather insights from that data,” said Gladwin. “The key to reducing both cost and complexity is to combine computation with dispersed storage,” said Gladwin. “Cleversafe’s solution will provide infinitely scalable, reliable, and cost effective storage for data to support massive computation while enhancing the analysis workflow.”

Hadoop MapReduce, which is already being used broadly throughout the industry, represents only a partial solution to this problem. While it lends itself naturally to enabling computations where the data exists rather than transferring data to computation nodes, it has inherent scalability and reliability limitations. Current HDFS deployments utilize a single server for all metadata operations and 3 copies of the data for protection. Failure of the single metadata node could render stored data inaccessible or result in a permanent loss of data. Maintaining 3 copies of data at massive scale for protection leads to skyrocketing overhead and management costs.

Cleversafe’s dsNet system protects both data and metadata equally and is inherently more reliable. By applying the company’s unique Information Dispersal technology to slice and disperse data, single points of failure are eliminated. As data is distributed evenly across all Slicestor nodes, metadata can scale linearly and infinitely as new nodes are added, thus reducing any scalability bottlenecks and increasing performance. Cleversafe’s unique approach delivers the powerful combination of analytics and storage in a geographically distributed single system, allowing organizations to efficiently scale their Big Data environments to hundreds of petabytes and even exabytes today.

“There isn’t an industry today that’s untouched by Big Data or a company that wouldn’t benefit from the intrinsic value of that data if they could collect, organize, store and analyze it in a cost-effective manner,” said John Webster, Senior Partner at Evaluator Group. “Cleversafe’s approach to combining dispersed storage and Hadoop for analytics is a groundbreaking step for the industry and for any company to effectively bridge storage and large-scale computation,” said Webster.

No market segment has a more critical need to harness Big Data than the Government sector. Lockheed Martin is partnering with Cleversafe to develop a federal version of the Cleversafe Dispersed Compute Storage solution designed for the unique needs of federal government agencies.

“By combining the power of Hadoop analytics with Cleversafe’s Object-based Dispersed Storage solution, government entities will be able to significantly reduce their total cost of infrastructure as the amount of their mission critical data grows,” said Tom Gordon, CTO & VP of Engineering of Lockheed Martin’s Information Systems and Global Solutions-National. “The Federal community has been out in front of Big Data, well ahead of many other market segments, and needs technology solutions today that are well suited for Exabyte scale storage as well as massive computation,” said Gordon. “Taken Cleversafe’s approach with Hadoop across commodity hardware, these features deliver a new approach to bring the true potential of Big Data analytics into reach.”

Cleversafe’s object-based storage solution is 100 million times more reliable than traditional RAID-based systems and it doesn’t rely on replication to protect information. Its information dispersal capabilities reduce storage costs up to 90 percent while meeting compliance requirements and ensuring protection against data loss, whether it’s latent hardware errors, data corruption or malicious threats. With the combination of limitless scale, highly reliable storage and efficient analytics in the same platform, Cleversafe is solving the most challenging Big Data problems for customers in a very efficient manner.

About Cleversafe Inc.

Cleversafe has created a breakthrough technology that solves petabyte and beyond big data storage problems. This solution drives up to 90 percent of the storage cost out of the business while enabling secure and reliable global access and collaboration. The world’s largest data repositories rely on Cleversafe. To learn more about Cleversafe and its solutions, please visit www.cleversafe.com or call 312-423-6640.
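The press release says metadata is spread across all Slicestor nodes instead of sitting on a single NameNode, but it does not say how placement works. One common way to get that property is rendezvous (highest-random-weight) hashing: every accesser can compute, from the node list alone, which nodes own a given object’s metadata, so no central lookup server is needed. The sketch below assumes that technique purely for illustration; it is not a description of Cleversafe’s actual design, and the function names, node names and object names are hypothetical.

```python
import hashlib
from typing import List


def _score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owners(key: str, nodes: List[str], width: int = 3) -> List[str]:
    """Rendezvous hashing: the `width` highest-scoring nodes own `key`'s metadata.

    Any accesser holding the same node list computes the same answer,
    so no central metadata server has to be consulted.
    """
    ranked = sorted(nodes, key=lambda n: _score(n, key), reverse=True)
    return ranked[:width]


if __name__ == "__main__":
    nodes = [f"slicestor-{i:02d}" for i in range(8)]     # hypothetical names
    keys = [f"object-{i}" for i in range(1000)]
    before = {k: owners(k, nodes, width=1)[0] for k in keys}
    grown = nodes + ["slicestor-08"]
    after = {k: owners(k, grown, width=1)[0] for k in keys}
    moved = [k for k in keys if before[k] != after[k]]
    # Every relocated key lands on the new node; all other keys stay put.
    assert all(after[k] == "slicestor-08" for k in moved)
    print(f"{len(moved)} of {len(keys)} keys moved after adding one node")
```

Because each node’s score for a key never changes, adding a node can only move a key onto the new node; with `width=1`, roughly one key in N+1 relocates on average rather than the whole namespace being reshuffled, which is the kind of linear scaling the release describes.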

About Bob Gourley

Bob Gourley, former CTO of the Defense Intelligence Agency (DIA), is Founder and CTO of Crucial Point LLC, a technology research and advisory firm providing fact-based technology reviews in support of venture capital, private equity and emerging technology firms. He has extensive industry experience in intelligence and security, was awarded an intelligence community meritorious achievement award by AFCEA in 2008, and has also been recognized as an InfoWorld Top 25 CTO and as one of the most fascinating communicators in Government IT by GovFresh.