Today’s data landscape is characterized by exponentially increasing volumes of structured, semi-structured, and unstructured data originating from an expanding number of disparate sources located on-premises, in the cloud, and at the edge. Alongside this evolving ecosystem, businesses demand reliable, trustworthy, up-to-date data to enable real-time, actionable insights. Big Data Fabric has emerged in response to these challenges facing today’s enterprises.
What is Big Data Fabric?
Big Data Fabric offers a comprehensive approach to overcoming the challenges of rapidly growing data, ever-changing application requirements and distributed processing needs. Forrester describes Big Data Fabric as, “A unified, trusted, and comprehensive view of business data produced by orchestrating data sources automatically, intelligently, and securely, then preparing and processing them in big data platforms such as Hadoop and Apache Spark, data lakes, in-memory, and NoSQL.”
In practice, Big Data Fabric comprises the following six layers, which work together to provide seamless, real-time integration of Big Data:
- Data ingestion
- Data processing and persistence
- Data orchestration
- Data discovery
- Data management and intelligence
- Data access
A Big Data Fabric can cover a wide variety of data locations and sources. It enables processing, management, analysis, and storage of virtually any amount of data from a multitude of sources, as well as access to these data by applications and tools employing a variety of interfaces.
Who should care about Big Data Fabric?
Enterprises for whom data is crucial should care about Big Data Fabric if legacy systems that are too slow, too segregated, and inefficient keep them from gleaning reliable, actionable insights from corporate and external data in a timely manner.
So should large companies whose business units store large volumes of data from separate systems in different formats. The resulting Big Data silos produce large datasets that must be integrated manually, which erodes the value of corporate Big Data investments.
What are the benefits of Big Data Fabric?
Big Data Fabric gives a business a permanent, scalable mechanism for consolidating all of its data on one unified platform. It draws on the storage and processing power of multiple heterogeneous nodes to provide enterprise-wide access to all of the enterprise's data assets. According to Forrester, a Big Data Fabric helps enterprises “…quickly ingest, transform, curate, and prepare streaming and batch data to support a real-time trusted view of the customer and the business.”*
Furthermore, Big Data Fabric enables companies to:
- Effectively consolidate Big Data assets with on-premises and cloud data sources, for a complete view of enterprise-wide information.
- Gain access to the latest data in real-time.
- Easily onboard new Big Data systems and retire legacy systems, while keeping business systems running continuously without disruption.
From a problem-solving perspective, Big Data Fabric overcomes the challenges of insufficient data availability, unreliability of data storage and security, siloed data, poor scalability, and reliance on underperforming legacy systems.
What use cases does Big Data Fabric support?
Big Data Fabric supports a variety of use cases ranging from real-time insights and machine learning to streaming and advanced analytics. It orchestrates data flow and curates data across various big data platforms (such as data lakes, Hadoop, and NoSQL) to support a single version of the truth, customer personalization, and advanced big data analytics. The top Big Data Fabric use cases recognized by Forrester are a 360-degree view of the customer, Internet of Things (IoT) analytics, and real-time and advanced analytics.
Cloudera Enterprise Platform as Big Data Fabric
Forrester names Cloudera (as well as Hortonworks, which merged with Cloudera in January 2019) among the top 15 providers of Big Data Fabric**. With annual revenues exceeding $100 million, Cloudera is positioned as a “large player” in Big Data Fabric*, which Forrester conceptualizes in Figure 1 below.
Figure 2 below depicts how the Cloudera Enterprise Platform enables the six layers of the big data fabric architecture. Four key benefits of the Cloudera Enterprise Platform as a Big Data Fabric are described in the following paragraphs.
Enterprises seek high-value, agile analytics. An organization needs a unified data management and analytics platform that can support its business objectives, and enterprises are looking for greater agility to detect change and respond proactively. Cloudera Enterprise is a one-stop shop for running analytics models and algorithms against multiple data sources, on-premises and in the cloud, including real-time sources in some cases.
Cloudera enables high-value analytical use cases from the edge to AI, including proactive and predictive maintenance, usage-based analytics for targeted communications, recommendation engines, enterprise risk management, anti-money laundering (AML), fraud detection and prevention, cybersecurity, and machine learning models.
Shared Data Catalog to preserve context and enforce Governance
Data catalogs are essential for sharing knowledge about distributed data, and for improving data quality, trust, and governance. Data catalogs provide (1) a unified view of data and metadata to facilitate search and discovery of data assets, (2) the ability to track and manage data use and sharing, and (3) consistent data context across the analytics life cycle. Cloudera’s shared data catalog defines and preserves the structure and business context of distributed, heterogeneous data across the analytics life cycle, enabling data consumers to extract business value. This shared data catalog enforces data security, privacy, and retention policies and processes to ensure continued trust by consumers, and facilitates compliance with regulatory and legal requirements.
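To make the first two catalog capabilities concrete, here is a minimal, hypothetical Python sketch of an in-memory catalog that supports keyword search over metadata and records who accessed each asset. It is not Cloudera's implementation or API; the class names, fields, and sample assets are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one data asset. All field names are illustrative."""
    name: str
    location: str      # e.g. "on-premises", "cloud", "edge"
    description: str
    tags: set = field(default_factory=set)
    access_log: list = field(default_factory=list)  # users who read the asset


class DataCatalog:
    """Toy catalog: one shared view of metadata plus simple usage tracking."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Add or overwrite the metadata for an asset, keyed by name.
        self._entries[entry.name] = entry

    def search(self, keyword):
        # Case-insensitive match against name, description, and tags.
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw in t.lower() for t in e.tags)
        )

    def record_access(self, name, user):
        # Track data use: append the user to the asset's access log.
        self._entries[name].access_log.append(user)

    def usage(self, name):
        # Return a copy so callers cannot mutate the log.
        return list(self._entries[name].access_log)


# Hypothetical assets spread across different locations.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    "customer_orders", "cloud", "Order history per customer", {"sales", "pii"}))
catalog.register(CatalogEntry(
    "sensor_readings", "edge", "Raw IoT telemetry", {"iot"}))
catalog.record_access("customer_orders", "analyst_1")

print(catalog.search("iot"))
print(catalog.usage("customer_orders"))
```

A production catalog would persist this metadata and attach security and retention policies to each entry; the sketch only shows why a single metadata view makes assets in different locations discoverable and auditable from one place.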
High Performance for all Analytical Workloads
Legacy data platforms are unable to handle the scale, variety, and velocity that the modern data ecosystem demands. Furthermore, analytical workloads are becoming bigger and more complex, which makes high-performance analytics harder to deliver. Cloudera Enterprise provides on-demand elastic scale and multi-tenant capabilities across a spectrum of analytical workloads in a cost-effective manner. It offers a full range of capabilities, including data ingestion, data processing, data discovery, and descriptive, prescriptive, and predictive analytics, all in a single platform.
Enterprises are seeking location transparency. Consequently, they need a flexible infrastructure to run analytics anywhere and everywhere, including more applications in the cloud, on all devices, and on multiple cloud platforms. Cloudera enables applications and business solutions at the edge, in data centers, in public clouds, or any combination thereof.
Big Data Fabric offers enterprises the opportunity to replace their legacy systems with a viable solution for satisfying business requirements to unify data and both simplify and accelerate workloads in today’s complex data landscape. Cloudera Enterprise as a Big Data Fabric is an ideal choice for creating a unified data environment for modernizing and accelerating complex data management and analytical workloads.
*Now Tech: Big Data Fabric, Q2 2018