At the recent Strata Data conference in New York, the Cloudera team released Cloudera Data Platform, representing our renewed pledge to open-source, large-scale data processing. Offering the best of the Hortonworks and Cloudera portfolios, Cloudera Data Platform provides organizations with the ability to uniformly run large-scale data processing on any cloud or on-premises infrastructure. This marks not only an enormous milestone for us as a company but also a big leap for users across the world.
However, to describe what Cloudera Data Platform really means, we must go back to the beginnings of open source.
The evolution of open source
There was a time when customers had to make a single strategic choice about which software to use for a specific purpose. Picking a winner was not always easy, and making the wrong choice often resulted in years of technical debt, costly migrations, failed projects, or worst of all, perpetual lock-in.
Today, however, enterprise IT organizations can make wiser choices.
Over the last 20 years, the majority of innovation within the data processing software space has happened through open source. Open source allows companies to break free from vendor lock-in, develop long-term, sustainable community standards, and maintain an open ecosystem that drives innovation. While proprietary software is still the most common business model for IT products, those products almost always build on open source libraries and frameworks themselves. And no CIO will be easily persuaded to bet on a single proprietary software offering, especially when it comes to decisions of a global scale. Similarly, investors are likely to raise concerns over how proprietary business strategies will compete against the rapid innovation cycles that open source models facilitate.
Evidently, contributing to open source projects can be a winning formula for many organizations – but in a competitive business landscape, why would they want to share the benefits of innovation and competitive advantage?
The answer is that many organizations are happy to share the wins, especially those who truly believe in the value they add on top of open-source frameworks. Any single business will inevitably work to improve certain aspects of the software – naturally with the aim of serving their own specific goals – while simultaneously relying on other players to maintain other aspects of the software, which ultimately strengthens their own offering. While there is a significant level of cross-industry investment, these businesses recognize it is an investment into their own ability to create value. Contributing to open source comes with a cost, but it’s certainly one that is well worth paying.
Other companies that do not contribute directly to the open-source space typically refrain because their core business lies outside of IT. However, for this group of users, knowing that a community of companies is united behind an open-source framework remains an incredibly powerful concept. This encourages them to adopt the technology, resulting in a tremendous rush of innovation for services and products throughout various industries.
For these organizations that cannot directly contribute to open source projects, choosing an open-source framework that is suitable for the long term is vital, and they are turning to open-source distributors like Cloudera to reap the benefits of open innovation. In that sense, the cost of open source for Cloudera’s customers turns into a business transaction, but the benefit is exactly the same as before, with Cloudera representing the interests of enterprise IT and their use cases.
We have been serving these customers for over 10 years with tremendous success. The innovation in the data space has been exhilarating to be a part of, and seeing our customers succeed with our technology fuels our continuous innovation.
Creating Cloudera Data Platform
After merging with Hortonworks, our first goal as the new Cloudera was to provide a modern blended formula: a distribution of our software that provided the best of the Cloudera and Hortonworks portfolios.
However, to create this best-of-breed offering, we first needed to optimize our product’s form factor to deal with the ever-recurring challenge around cloud: Which cloud would it be available on? Private cloud? Public cloud? If we did choose public cloud, which public cloud would it be? How could we do both in a hybrid cloud setting?
We understood that our customers should be able to run all types of data workloads without being limited to a single cloud environment. Those requirements carry a lot of weight and call for an entirely new category: the Enterprise Data Cloud, a blueprint that allows users and operators alike to do anything they need with data, in any cloud environment. We have implemented the Enterprise Data Cloud with the Cloudera Data Platform (CDP), which allows IT departments to respond quickly to project requirements without compromises. That enables them to say “Yes” to users and allows our customers to focus on business outcomes, wherever those relate to data.
CDP’s most powerful features are its multi-function and multi-cloud capabilities. Let’s drill down to what this means:
Multi-function depends on a distributor’s ability to make the right application choices for their customers. For instance, it’s the distributor’s job to select the most relevant application frameworks and provide solid integration for their customers. However, multi-function with CDP goes beyond that: with CDP, Cloudera exports product functionality into analytical experiences that allow you to focus on the use case and give you the autonomy to make decisions about use-case design and the underlying platform yourself, through graphical and programmatic self-service capabilities.
For instance, rather than debating with the finance department on how to give them a better SLA for their monthly business reports, you can simply give them a larger data warehouse – purely for the end of the month – in a cloud environment of your choice. Or, rather than having to teach researchers in your marketing team how to access a data warehouse, you can quickly provision a machine learning workspace with Cloudera Machine Learning within CDP to give them secure, self-service access to enterprise data.
With CDP, we also offer the unique ability to run data workloads on-premises or on any cloud, whether that be hybrid cloud, private cloud, or multiple public cloud environments. The common denominator for this is Kubernetes, the de facto standard for containerized execution environments. We work tremendously hard to bring a wide range of software frameworks into our open source portfolio, and we ensure all of those can run on the Kubernetes services of all major clouds (such as AWS, Azure, and Google Cloud Platform), as well as on-premises on bare-metal servers or on-premises Kubernetes.
All of this, by the way, is secured, authorized and governed in the very same way via Cloudera’s Shared Data Experience offering.
With CDP, enterprise IT gains the ability to say “Yes” to business units, giving control to operations and agility to users, based on the data processing tools that people want and need.
Navigating the cloud vendor landscape
The cloud market is becoming increasingly saturated, leaving some organizations anxious to execute a cloud strategy quickly. The hyper-scale cloud providers’ offerings around data-intensive software frameworks are mostly verbatim subsets of the many open source initiatives that we deliver with CDP, but the providers’ involvement is mostly unidirectional: they consume open-source assets while being light on the contributions discussed above.
Cloudera, on the other hand, has tremendous manufacturing depth – in other words, the ability to drive critical fixes and influence the strategy of open-source frameworks. We are deeply invested in the communities of developers who drive innovation collaboratively, and we add value by quickly transporting large-scale data around the software ecosystem and data centers, on or off premises, in the most efficient way possible.
Cloudera, with CDP, is now also committed to maintaining form factor flexibility: the ability to deploy diverse and deep functionality everywhere. We believe in infrastructure independence and have chosen a path that allows us to support hybrid and multi-cloud architectures.
Just as you would not want to be boxed into a single decision around proprietary software, you should not be constrained to a single decision around proprietary cloud technology.
With Cloudera, you don’t worry about picking the winner or making wrong choices; you largely pick the origin of the functionality you need: Apache Hadoop, Apache Spark, Apache Flink, Apache NiFi, … The list goes on. Far more importantly, whether or not you are a customer, you can rely on us to improve the ecosystem at large, and you gain the ability to provide a layer of abstraction around competing infrastructure offerings.
The enterprise data cloud is a category our customers created. Through hard work, we connected the dots for them, yielding CDP, which we believe sets the industry standard for a unified, integrated portfolio with the most relevant and modern data processing tools across all data centers and cloud environments.
A recent 451 Research survey found that 62% of organizations are all-in on hybrid, with only 17% all-in on a public cloud. Read this report to see more findings on the impact of hybrid cloud for digital transformation and the business impact you can expect: Hybrid Cloud – From Happenstance to an Explicit IT Strategy