Since we announced the general availability of Apache Iceberg in Cloudera Data Platform (CDP), Cloudera customers, such as Teranet, have built open lakehouses to future-proof their data platforms for all their analytical workloads. Cloudera partners are also benefiting from Apache Iceberg in CDP. For example, Modak Nabu is helping their enterprise customers accelerate data ingestion, […]
Introduction Stream processing is about creating business value by applying logic to your data while it is in motion. Many times that involves combining data sources to enrich a data stream. Flink SQL does this and directs the results of whatever functions you apply to the data into a sink. Business use cases, such as […]
Innovation Accelerator Spotlight: Data teams can collaborate to streamline data transformation and analytics pipelines in the open data lakehouse using any engine, and in any form factor to produce high quality data that your business can trust.
The promise of a modern data lakehouse architecture Imagine having self-service access to all business data, anywhere it may be, and being able to explore it all at once. Imagine quickly answering burning business questions nearly instantly, without waiting for data to be found, shared, and ingested. Imagine independently discovering rich new business insights from […]
Cloudera Machine Learning (CML) is a cloud-native and hybrid-friendly machine learning platform. It unifies self-service data science and data engineering in a single, portable service as part of an enterprise data cloud for multi-function analytics on data anywhere. CML empowers organizations to build and deploy machine learning and AI capabilities for business at scale, efficiently […]
In June 2022, Cloudera announced the general availability of Apache Iceberg in the Cloudera Data Platform (CDP). Iceberg is a 100% open-table format, developed through the Apache Software Foundation, which helps users avoid vendor lock-in and implement an open lakehouse. The general availability covers Iceberg running within some of the key data services in CDP, […]
Fine grained access control (FGAC) with Spark Apache Spark with its rich data APIs has been the processing engine of choice in a wide range of applications from data engineering to machine learning, but its security integration has been a pain point. Many enterprise customers need finer granularity of control, in particular at the column […]
Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake. […]