When it comes to furnishing our living spaces, it seems we go through phases. When I was just setting out and leaving home, IKEA was my preferred furniture store. You make your choice, collect all the flat-pack boxes, lug them home, and after some hex key gymnastics: voilà. You’ve truly made it! Since then, I’ve drifted from the “some assembly required” phase to the “ready-made” one. Furniture these days is something that gets delivered, fully assembled, and ready to go. Together with flat-pack pieces, it gives me the right mix. Organizations have gone through similar phases when it comes to their enterprise analytics and data management platforms.
Building loosely on Arun Murthy’s “Hadoop: Decade Two, Day Zero” blog, legacy Cloudera and Hortonworks platforms were a bit like flat-pack furniture. All the pieces needed to build your organization’s analytical capabilities were there; they just needed to be connected. Although that work was considerably less complex than bringing together the open-source projects that made up the distributions in the first place, it had to be done before the platform could be let loose on the data to deliver insight and value. It also meant that customers were each, independently of one another, assembling their own data warehouses, machine learning, data engineering, and multi-stage data pipelines on the platform to cover the complete data lifecycle.
This realization was a key consideration when we architected the new Cloudera Data Platform (CDP). Of course, its solid foundation was always going to be the enterprise data cloud:
- Providing the complete range of analytics for the data lifecycle, from the edge to AI
- Flexibly deployed to cloud and data center
- With consistent data security and governance across all
- Based 100% on both open source and open standards
However, one of our key objectives was to make it easier to deploy specific analytics and remove the overhead and effort associated with self-assembly. Cluster Definitions and experiences do just that.
Templates in office applications like Microsoft PowerPoint not only greatly accelerate the creation of new materials but also ensure consistency and encourage best practices. Cluster Definitions are CDP’s equivalent, enabling organizations to quickly create clusters by using one of the prescriptive cluster definitions included by default or by leveraging their own customised ones. Select one from the growing list and CDP self-assembles the right projects from its runtime. Need a Data Mart? Apache Hue and Apache Impala are readily selected and configured. Want it to be real-time? HDFS, Apache Kudu, Apache Spark, and YARN are automatically added. It is child’s play for all data lifecycle analytics, deployed to any cloud. Of course, IT teams can still step into the advanced options to tweak those if need be.
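To make the template idea concrete, here is a minimal Python sketch of how a definition might resolve to a list of services. It is purely illustrative: `ClusterDefinition`, `assemble`, and the definition names are hypothetical and are not CDP's actual API or template format. Only the service lists come from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterDefinition:
    """Hypothetical, simplified stand-in for a cluster definition."""
    name: str
    services: tuple


# Service lists follow the Data Mart examples in the text;
# the names and structure are assumptions made for illustration.
DATA_MART = ClusterDefinition("data-mart", ("Apache Hue", "Apache Impala"))
REAL_TIME_DATA_MART = ClusterDefinition(
    "real-time-data-mart",
    DATA_MART.services + ("HDFS", "Apache Kudu", "Apache Spark", "YARN"),
)


def assemble(definition, extra_services=()):
    """Return the ordered, de-duplicated services for a cluster.

    `extra_services` stands in for the advanced options an IT team
    might use to tweak a prescriptive definition.
    """
    ordered = []
    for service in (*definition.services, *extra_services):
        if service not in ordered:
            ordered.append(service)
    return ordered


print(assemble(DATA_MART))
print(assemble(REAL_TIME_DATA_MART, extra_services=("Apache Spark",)))
```

The point of the sketch is the design choice: the user picks a named definition and the service list follows from it, while overrides remain possible without rebuilding the list by hand.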
CDP analytic experiences take this one step further by giving authorized end users the ability to start their own analytics clusters. The choices are even more straightforward: choose the analytical capability of interest (data warehousing and machine learning will shortly be joined by the rest of the data lifecycle analytics: data flow, data engineering, and operational database) and select the desired T-shirt-sized capacity (S/M/L). Again, the advanced configuration of elements like auto-scaling can be tuned if needed, although the majority of users will feel quite at ease with the defaults.
Either approach helps organizations move faster, from a developer-oriented approach to data and analytics to an enterprise-oriented, solution-based focus. Business users gain self-service access to data and analytics, within the boundaries and guardrails that enterprise IT has set to ensure safety and compliance. This shift in data and analytics is exactly what analysts have identified too: Gartner describes Cloud Data Ecosystems; 451 Research talks about the Enterprise Intelligence Platform. We call it an Enterprise Data Cloud, and Cloudera Data Platform is our implementation. To see how you can move from managing data and analytics to gaining insight from them, read the Gartner Report, Cloud Data Ecosystems Emerge as the New Data and Analytics Battleground, as well as 451 Research’s The rise of the Enterprise Intelligence Platform.