Apache Yarn Archives | Cloudera Blog

September 1, 2020 | Technical

Discover and Explore Data Faster with the CDP DDE Template

From A to Z in 10 minutes! If you have previous experience with setting up, sizing, and deploying a distributed search engine service, it is hard to believe that this is possible. Imagine how many times IT has lost valuable time spending hours trying to understand Apache Solr application requirements and map them into how to […]

by Cloudera 7 min read

July 16, 2020 | Technical

Fair Scheduler to Capacity Scheduler conversion tool

Introduction In Apache Hadoop YARN 3.x (YARN for short), switching to Capacity Scheduler has considerable benefits and only a few drawbacks. To bring these features to users who are currently using Fair Scheduler, we created a tool with the upstream YARN community to help the migration process. Why switching to Capacity Scheduler What can we […]

by Peter Bacsko, Rudolf Reti 10 min read

Apache Hadoop Apache Yarn Cloudera Enterprise Cloudera Data Platform Data Science

April 30, 2020 | Technical

Operational Database Integrity

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This blog post provides an overview of the OpDB data integrity capabilities that help you achieve ACID […]

by Cloudera, Liliana Kadar 3 min read

Apache Hadoop Apache Kafka Apache NiFi Apache Phoenix Apache Ranger Apache Spark Apache Yarn Cloudera Manager Hue Operational DB Ops and DevOps

April 16, 2020 | Technical

Operational Database Management

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This blog post gives you an overview of the OpDB management tools and features in the Cloudera […]

by Cloudera, Liliana Kadar, Krishna Maheshwari 2 min read

Apache Hadoop Apache Kafka Apache NiFi Apache Phoenix Apache Ranger Apache Spark Apache Yarn Cloudera Manager Hue Cloudera Data Platform Operational DB Ops and DevOps

April 6, 2020 | Business

Hadoop: Decade Two, Day Zero*

This blog was originally published on Medium The Data Cloud — Powered By Hadoop One key aspect of the Cloudera Data Platform (CDP), which is just beginning to be understood, is how much of a recombinant-evolution it represents, from an architectural standpoint, vis-à-vis Hadoop in its first decade. I’ve been having a blast showing CDP to […]

by Cloudera 6 min read

Apache Hadoop Apache Hive Apache Impala Apache Yarn Cloud Hortonworks Data Platform Hybrid Cloud Cloudera Data Platform Data Hub Private Cloud Public Cloud Data Science Modernize Architecture

February 14, 2020 | Technical

Benchmarking Ozone: Cloudera’s next generation Storage for CDP

Apache Hadoop Ozone was designed to address the scale limitation of HDFS with respect to small files and the total number of file system objects. On current data center hardware, HDFS has a limit of about 350 million files and 700 million file system objects. Ozone’s architecture addresses these limitations[4]. This article compares the performance […]

by Istvan Fajth, Mukul Kumar Singh 4 min read

Apache HDFS Apache Hive Apache Ozone Apache Yarn Cloudera Data Platform Data Hub Data Warehouse Modernize Architecture Performance

January 8, 2020 | Technical

Introducing Apache Spark on Docker on top of Apache YARN with CDP DataCenter release

Editor’s Note, August 2020: CDP Data Center is now called CDP Private Cloud Base. You can learn more about it here. Introduction Motivation Bringing your own libraries to run a Spark job on a shared YARN cluster can be a huge pain. In the past, you had to install the dependencies independently on each host […]

by Cloudera 11 min read

Apache Spark Apache Yarn Docker Cloudera Data Platform Data Engineering Enterprise AI Modernize Architecture

July 17, 2019 | Technical

YuniKorn: a universal resources scheduler

Hello world, it’s been a while! We are super excited today to announce the open-sourcing of one of the exciting new projects we’ve been working on behind the scenes at the intersection of big-data and computation platforms: YuniKorn! YuniKorn is a new standalone universal resource scheduler responsible for allocating and managing resources for big-data workloads including batch jobs and […]

by Cloudera, Wangda Tan, Vinod Kumar Vavilapalli, Sunil Govindan, Wilfred Spiegelenburg 4 min read

Apache Hadoop Apache Yarn Cloud

January 14, 2019 | Business

Open Hybrid Architecture Initiative: Game Changing User Experience Powering the Cloud Native Journey

This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. This is part seven of an ongoing series about the Open Hybrid Architecture Initiative. You can learn more about the vision, key tenets, real-world use case, new storage environment of O3, participation […]

by Cloudera 7 min read

Apache Yarn

December 20, 2018 | Technical

{Submarine} : Running deep learning workloads on Apache Hadoop

This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate. (This blog post is coauthored by Xun Liu and Quan Zhou from Netease.) Introduction Hadoop is the most popular open source framework for the distributed processing of large, enterprise data sets. It is heavily […]

by Cloudera 8 min read

Apache Hadoop Apache Yarn Hortonworks Data Platform
