Late last year, news of the merger between Hortonworks and Cloudera shook the industry and gave birth to the new Cloudera: a combined company focused on being an Enterprise Data Cloud leader, with a product offering that spans from edge to AI. One of the most promising technology areas in this merger is the Data-in-Motion platform called Hortonworks DataFlow (HDF), which already had high growth potential and is now poised for even more. It is a key capability that addresses the needs of our combined customer base in real-time streaming architectures and the Internet of Things (IoT). HDF is already a highly successful product offering, with hundreds of customers such as ClearSense, Trimble, and Hilton.
So, what happens to HDF in the new Cloudera? What should customers expect? The good news is that it remains strategic to our company as well as to our customers: HDF is now reborn as Cloudera DataFlow (CDF).
What is Cloudera DataFlow?
Cloudera DataFlow (CDF) is a scalable, real-time streaming data platform that collects, curates, and analyzes data so customers gain key insights for immediate actionable intelligence. It meets the challenges faced with data-in-motion, such as real-time stream processing, data provenance, and data ingestion from IoT devices and other streaming sources. Built on 100% open source technology, CDF helps you deliver a better customer experience, boost your operational efficiency and stay ahead of the competition across all your strategic digital initiatives.
With the rise of streaming architectures and digital transformation initiatives everywhere, enterprises are struggling to find comprehensive data management tools that can handle high volumes of high-velocity streaming data. CDF, as an end-to-end streaming data platform, emerges as a clear solution for managing data from the edge all the way to the enterprise. It can handle edge data collection, data ingestion, transformation, curation, data enrichment, and content routing, and it can process multiple streams at IoT scale and analyze them in real time to gain actionable intelligence. CDF does all of this within a common framework that offers unified security, governance, and management.
The key aspects of the CDF platform are:
- Edge Data Management – Set up hundreds of MiNiFi agents in or near edge devices to enable edge data collection, content filtering, and routing. This allows you to take on complex, distributed use cases such as connecting hundreds of retail stores across the country or collecting data from thousands of utility sensors at your consumer edge. This will be a significant area of investment for us, given customer interest, industry trends, and market potential.
- Flow Management – Adopt a no-code approach to building complex data ingestion and transformation flows visually, with drag-and-drop ease. Powered by Apache NiFi and its 260+ pre-built processors, CDF lets you take on extremely high-scale, high-volume, and high-speed data ingestion use cases with simplicity.
- Stream Processing – Manage and process multiple streams of real-time data using Apache Kafka, the most advanced distributed stream processing system. Process millions of real-time messages per second to feed your data lake or to drive immediate streaming analytics.
- Streaming Analytics – Analyze millions of streams of data in real time using advanced techniques such as aggregations, time-based windowing, and content filtering to generate key insights and actionable intelligence for predictive and prescriptive analytics. CDF is the only streaming platform to offer a choice of three different streaming analytics solutions: Apache Storm, Kafka Streams, and Apache Spark Streaming.
- Enterprise Services – Leverage a common set of enterprise services for unified security, governance, and single sign-on across the entire Cloudera DataFlow platform. Because every component relies on the same set of services, interoperability between components is seamless.
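To make the streaming analytics techniques named in the list above more concrete, here is a minimal, illustrative sketch of content filtering followed by a time-based (tumbling) window aggregation. It is plain Python and does not use CDF, Apache Storm, Kafka Streams, or Spark Streaming; the `Reading` type, the `windowed_averages` function, the sensor names, and the threshold are all hypothetical and exist only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Reading(NamedTuple):
    """One hypothetical sensor event; the field names are illustrative."""
    timestamp: float  # event time in seconds
    sensor_id: str
    value: float


def windowed_averages(
    readings: Iterable[Reading],
    window_seconds: float,
    min_value: float,
) -> Dict[Tuple[int, str], float]:
    """Average readings per (window index, sensor) over tumbling windows.

    Readings below min_value are dropped first (content filtering).
    """
    buckets: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for r in readings:
        if r.value < min_value:
            continue  # content filter: ignore readings under the threshold
        # Tumbling window: each event belongs to exactly one fixed-size window.
        window_index = int(r.timestamp // window_seconds)
        buckets[(window_index, r.sensor_id)].append(r.value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


sample = [
    Reading(0.5, "pump-1", 10.0),
    Reading(3.0, "pump-1", 20.0),
    Reading(4.0, "pump-2", 1.0),   # filtered out: below the threshold
    Reading(12.0, "pump-1", 40.0),
]
result = windowed_averages(sample, window_seconds=10.0, min_value=5.0)
print(result)
```

A production streaming engine applies the same idea continuously to unbounded streams and also has to deal with late or out-of-order events, which this sketch deliberately ignores.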
Why Cloudera DataFlow?
CDF addresses a wide range of use cases, including Customer 360, data movement between data centers (on-premises and cloud), data ingestion from real-time streaming sources, log data ingestion and processing, and streaming analytics. CDF also addresses a wide spectrum of IoT-specific use cases such as Predictive Maintenance, Asset Tracking, Patient Monitoring, Utility Monitoring, and Smart Cities. CDF is the only comprehensive streaming data platform in the market that is 100% open source and also offers a choice of three streaming analytics engines. CDF is also the only platform in the market to offer out-of-the-box data provenance on streaming data. With an extremely strong community behind it, Apache NiFi powers CDF’s Flow Management capabilities with more than 260 pre-built processors for data source connectivity, ingestion, transformation, and content routing.
To learn more about Cloudera DataFlow, attend our upcoming webinar on Feb 13th, 2019.
Dinesh Chandrasekhar (@AppInt4All) is a technology evangelist, a thought leader and a seasoned product marketer with more than 24 years of industry experience. He has an impressive track record of taking new integration/mobile/IoT/Big Data products to market with a clear GTM strategy of pre- and post-launch activities. Dinesh has extensive experience working on enterprise software as well as SaaS products, delivering sophisticated solutions for customers with complex architectures. As a Lean Six Sigma Green Belt, he has been the champion for Digital Transformation at companies like Software AG, CA Technologies and IBM. Dinesh’s areas of expertise include IoT, Application/Data integration, BPM, Analytics, B2B, API management, Microservices, and Mobility. He is proficient in use cases across multiple industry verticals like retail, manufacturing, utilities, and healthcare. He is a prolific speaker, blogger, and a weekend coder. He currently works at Cloudera, managing their Data-in-Motion product line. He is fascinated by new technology trends including blockchain and deep learning. Dinesh holds an MBA degree from Santa Clara University and a Master’s degree in Computer Applications from the University of Madras.