Author Archives: Yanpei Chen

What Do Real-Life Apache Hadoop Workloads Look Like?

Categories: CDH Hadoop HBase HDFS Hive MapReduce Oozie Ops and DevOps Pig Testing Use Case

Organizations in diverse industries have adopted Apache Hadoop-based systems for large-scale data processing. As a leading force in Hadoop development, with customers in half of the Fortune 50, Cloudera is in a unique position to characterize and compare real-life Hadoop workloads. Such insights are essential for developers, data scientists, and decision makers who look at current use cases to anticipate technology trends.

Recently we collaborated with researchers at UC Berkeley to collect and analyze a set of Hadoop traces.