Data Engineering Archives - Page 2 of 9

December 15, 2022 | Business

Implement a Multi-Cloud Open Lakehouse with Apache Iceberg in Cloudera Data Platform

Since we announced the general availability of Apache Iceberg in Cloudera Data Platform (CDP), Cloudera customers, such as Teranet, have built open lakehouses to future-proof their data platforms for all their analytical workloads. Cloudera partners are also benefiting from Apache Iceberg in CDP. For example, Modak Nabu is helping their enterprise customers accelerate data ingestion, […]

by Bill Zhang, Shaun Ahmadian, Zoltán Borók-Nagy, Vincent Kulandaisamy | 5 min read

CDP Public Cloud Cloudera Data Platform (CDP) Data Engineering Data Warehouse Machine Learning SDX Technologies Governance Machine Learning Modernize Architecture Performance Security, Risk, & Compliance

November 18, 2022 | Technical

Enriching Streams with Hive tables via Flink SQL

Introduction: Stream processing is about creating business value by applying logic to your data while it is in motion. Often that involves combining data sources to enrich a data stream. Flink SQL does this and directs the results of whatever functions you apply to the data into a sink. Business use cases, such as […]

by Jimit Patel, Ferenc Csaky | 5 min read

Cloudera Data Platform (CDP) Cloudera Data Science Workbench Data Engineering Data Warehouse DataFlow Customer Analytics Modernize Architecture Streaming

October 7, 2022 | Technical

Cloudera’s Open Data Lakehouse Supercharged with dbt Core™

Innovation Accelerator Spotlight: Data teams can collaborate to streamline data transformation and analytics pipelines in the open data lakehouse, using any engine and any form factor, to produce high-quality data that your business can trust.

by Raghotham Murthy | 4 min read

Cloudera Data Platform (CDP) Data Engineering

September 9, 2022 | Business

The Modern Data Lakehouse: An Architectural Innovation

The promise of a modern data lakehouse architecture: Imagine having self-service access to all business data, wherever it may be, and being able to explore it all at once. Imagine answering burning business questions nearly instantly, without waiting for data to be found, shared, and ingested. Imagine independently discovering rich new business insights from […]

by David Dichmann, Navita Sood | 5 min read

CDP Public Cloud Cloudera Data Platform (CDP) Data Engineering Data Warehouse Machine Learning SDX Technologies Governance Machine Learning Modernize Architecture Performance Security, Risk, & Compliance

August 24, 2022 | Technical

Building Custom Runtimes with Editors in Cloudera Machine Learning

Cloudera Machine Learning (CML) is a cloud-native and hybrid-friendly machine learning platform. It unifies self-service data science and data engineering in a single, portable service as part of an enterprise data cloud for multi-function analytics on data anywhere. CML empowers organizations to build and deploy machine learning and AI capabilities for business at scale, efficiently […]

by Oleksandr Akulov | 6 min read

Cloudera Data Science Workbench Data Engineering Machine Learning

August 8, 2022 | Technical

How to Use Apache Iceberg in CDP’s Open Lakehouse

In June 2022, Cloudera announced the general availability of Apache Iceberg in the Cloudera Data Platform (CDP). Iceberg is a 100% open table format, developed through the Apache Software Foundation, which helps users avoid vendor lock-in and implement an open lakehouse. The general availability covers Iceberg running within some of the key data services in CDP, […]

by Bill Zhang, Peter Ableda, Shaun Ahmadian, Manish Maheshwari | 7 min read

CDP Public Cloud Cloudera Data Platform (CDP) Data Engineering Data Warehouse Machine Learning SDX Technologies Governance Machine Learning Modernize Architecture Performance Security, Risk, & Compliance

August 3, 2022 | Technical

Applying Fine Grained Security to Apache Spark

Fine grained access control (FGAC) with Spark: Apache Spark, with its rich data APIs, has been the processing engine of choice in a wide range of applications, from data engineering to machine learning, but its security integration has been a pain point. Many enterprise customers need finer granularity of control, in particular at the column […]

by Shaun Ahmadian, Bill Zhang | 4 min read

CDP Private Cloud Cloudera Data Platform (CDP) Data Engineering Data Ingestion Governance Ops and DevOps Security, Risk, & Compliance

June 30, 2022 | Technical

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera Technology Spotlight

by Bill Zhang, Shaun Ahmadian, Cloudera Contributors | 5 min read

CDP Public Cloud Cloudera Data Platform (CDP) Data Engineering Data Warehouse Machine Learning SDX Technologies Governance Machine Learning Modernize Architecture Performance Security, Risk, & Compliance

June 17, 2022 | Business

The Future of the Data Lakehouse – Open

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake. […]

by Ram Venkatesh, Priyank Patel | 4 min read

CDP Public Cloud Cloudera Data Platform (CDP) Data Engineering Data Warehouse Machine Learning SDX Technologies Governance Machine Learning Modernize Architecture Performance Security, Risk, & Compliance

May 9, 2022 | Technical

Optimizing Hive on Tez Performance

A guide to tuning and troubleshooting Hive on Tez performance after upgrading to CDP

by Jay Desai | 8 min read

Apache Hive CDP Private Cloud Cloudera Data Platform (CDP) Data Engineering
