Apache Hive Archives | Cloudera Blog

May 9, 2022 | Technical

Optimizing Hive on Tez Performance

A guide to tuning and troubleshooting Hive on Tez performance after upgrading to CDP

by Cloudera 8 min read

November 22, 2021 | Business

Addressing the Three Scalability Challenges in Modern Data Platforms

How the Cloudera Data Platform helps organizations overcome scalability challenges

by Cloudera 6 min read

Apache Hive Apache Impala Apache Spark Cloudera Data Platform Private Cloud Public Cloud

September 21, 2021 | Technical

Supercharge your Airflow Pipelines with the Cloudera Provider Package

Many customers looking to modernize their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With hundreds of open source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premises, in the cloud, and across cloud providers for […]

by Philippe Lanoe, Shaun Ahmadian 5 min read

Apache Airflow Apache Hive Apache Spark Data Engineering Ops and DevOps

August 26, 2021 | Technical

Apache Ozone Powers Data Science in CDP Private Cloud

Apache Ozone is a scalable distributed object store that can efficiently manage billions of small and large files. Ozone provides Amazon S3 and Hadoop Filesystem compatible endpoints in addition to its own native object store API endpoint, and is designed to work seamlessly with enterprise-scale data warehousing, machine learning, and streaming workloads. The […]

by Cloudera, Aravindan Vijayan 7 min read

Apache Hadoop Apache Hive Apache Impala Apache NiFi Apache Ozone Apache Spark Data Science Workbench Ozone AI Private Cloud Data Ingestion Data Science Enterprise AI

August 17, 2021 | Technical

Automating Data Pipelines in CDP with CDE Managed Airflow Service

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. By leveraging Spark on Kubernetes as the foundation, along with a first-class job management API, many of our customers have been able to […]

by Shaun Ahmadian 4 min read

Apache Airflow Apache Hive Apache Spark Data Engineering Private Cloud Public Cloud Data Ingestion Data Science Modernize Architecture

August 10, 2021 | Technical

Generating and Viewing Lineage through Apache Ozone

Follow your data in object storage on-premises. As businesses look to scale-out storage, they need a storage layer that is performant, reliable, and scalable. With Apache Ozone on the Cloudera Data Platform (CDP), they can implement a scale-out model and build out their next-generation storage architecture without sacrificing security, governance, and lineage. CDP integrates […]

by Cloudera, Srinivas Sudhindra 6 min read

Apache Atlas Apache Hive Apache Ozone Apache Spark Shared Data Experience (SDX) Cloudera Data Platform Private Cloud Metadata & Lineage

June 25, 2021 | Technical

Migrate Hive data from CDH to CDP Public Cloud

Many Cloudera customers are making the transition from being completely on-prem to the cloud, either by backing up their data in the cloud or by running multi-functional analytics on CDP Public Cloud in AWS or Azure. The Replication Manager service facilitates both disaster recovery and data migration across different environments. Using easy-to-define policies, Replication Manager solves […]

by Shailesh Shiwalkar 7 min read

Apache Hive Apache Impala Apache Ranger Apache Sentry Cloudera Enterprise Migrate to CDP Shared Data Experience (SDX) Cloudera Data Platform Data Engineering Data Hub Public Cloud Data Ingestion Governance Metadata & Lineage

June 7, 2021 | Technical

What is new in Cloudera Streaming Analytics 1.4?

At the end of March, we released the first version of Cloudera SQL Stream Builder as part of Cloudera Streaming Analytics 1.3. It enabled users to easily write, run, and manage real-time SQL queries on streams from Apache Kafka with an exceptionally smooth user experience. Since then, we have been working hard to expose the […]

by Cloudera, Marton Balassi 3 min read

Apache Flink Apache Hive Apache Kafka Apache Kudu Cloudera Data Platform DataFlow Customer Analytics Modernize Architecture

March 24, 2021 | Technical

Filter more, pay less with the latest Cloudera Data Warehouse runtime!

One of the most effective ways to improve performance and minimize cost in database systems today is by avoiding unnecessary work, such as data reads from the storage layer (e.g., disks, remote storage), transfers over the network, or even data materialization during query execution. Since its early days, Apache Hive improves distributed query execution […]

by Cloudera 5 min read

Apache Hive Cloudera Data Platform Data Warehouse Public Cloud Technology Data Ingestion

February 9, 2021 | Technical

A Cost-Effective Data Warehouse Solution in CDP Public Cloud – Part 1

Today’s customers have a growing need for faster end-to-end data ingestion to meet the expected speed of insights and overall business demand. This ‘need for speed’ drives a rethink on building a more modern data warehouse solution, one that balances speed with platform cost management, performance, and reliability. A typical approach that […]

by Cloudera 8 min read

Apache Hive Apache Kafka Cloud Cloud Data Warehouse Cloudera Data Platform Data Warehouse Public Cloud Modernize Architecture Streaming data
