Apache Hadoop Archives | Cloudera Blog

September 7, 2022 | Business

Large Scale Industrialization Key to Open Source Innovation

Cloudera’s open source licensing policies have evolved with the changing dynamics in open source innovation. For more information on Cloudera’s current policy, please contact OSSQuestions@cloudera.com. We are now well into 2022, and the megatrends that drove the last decade in data: the Apache Software Foundation as a primary innovation vehicle for big data, the arrival of […]

by Cloudera 5 min read

August 26, 2021 | Technical

Apache Ozone Powers Data Science in CDP Private Cloud

Apache Ozone is a scalable distributed object store that can efficiently manage billions of small and large files. Ozone natively provides Amazon S3 and Hadoop Filesystem compatible endpoints in addition to its own native object store API endpoint and is designed to work seamlessly with enterprise scale data warehousing, machine learning and streaming workloads. The […]

by Cloudera, Aravindan Vijayan 7 min read

Apache Hadoop Apache Hive Apache Impala Apache NiFi Apache Ozone Apache Spark Data Science Workbench Ozone AI Private Cloud Data Ingestion Data Science Enterprise AI

December 7, 2020 | Technical

Global View Distributed File System with Mount Points

Apache Hadoop Distributed File System (HDFS) is the most popular file system in the big data world. The Apache Hadoop FileSystem interface provides integration with many other popular storage systems, such as Apache Ozone, S3 and Azure Data Lake Storage. Some HDFS users want to extend NameNode capacity by configuring Federation of […]

by Cloudera 9 min read

Apache Hadoop Apache Ozone Data Science Workbench S3Guard Cloudera Data Platform Data Engineering Data Warehouse Modernize Architecture

August 21, 2020 | Business

Dancing with Elephants in 5 Easy Steps

The Corner Office is pressing their direct reports across the company to “Move To The Cloud” to increase agility and reduce costs. And next to those legacy ERP, HCM, SCM and CRM systems, that mysterious elephant in the room – that “Big Data” platform running in the data center that is driving much of the […]

by Cloudera 6 min read

Apache Hadoop Cloud Enterprise data cloud Hybrid Cloud Data Science Enterprise AI Streaming data

August 6, 2020 | Business

Industry Transformation: the new business as usual

The Industry Transformation category at the Data Impact Awards has never been more timely. While the business world is mostly focused on digital transformation, Cloudera and our customers know that true, data-driven change is reshaping whole enterprises and entire industries. We are also, at the time of writing, in the middle of a global pandemic – […]

by Cloudera 2 min read

Apache Hadoop Data Science Deep Learning Enterprise data cloud Cloudera Data Platform Customer Analytics Data Ingestion Data Science Enterprise AI IoT/ Connected Products Modernize Architecture Security, Risk, & Compliance Streaming data

July 27, 2020 | Technical

Enabling high-speed Spark direct reader for Apache Hive ACID tables

Apache Hive supports transactional tables, which provide ACID guarantees. A significant amount of work has gone into Hive to make these transactional tables highly performant. Apache Spark can access Hive external tables, but it cannot access Hive managed tables. To access Hive managed tables from Spark, Hive Warehouse […]

by Cloudera, Anurag Shekhar, Shubham Chaurasia 5 min read

Apache Hadoop Apache Hive Apache Spark Data Science Workbench Cloudera Data Platform Data Engineering Customer Analytics Enterprise AI Performance

July 20, 2020 | Technical

Cloudera Operational Database experience (dbPaaS) available as Technical Preview

The Cloudera Operational Database (COD) experience is a managed dbPaaS solution that abstracts the underlying cluster instance as a database. It can auto-scale based on the workload utilization of the cluster, and it will add the ability to auto-tune (better performance within the existing infrastructure footprint) and auto-heal (resolve operational problems automatically) later this year. […]

by Cloudera, Josh Elser, Amit Virmani 3 min read

Apache Hadoop Cloudera Data Platform Operational DB Ops and DevOps

July 16, 2020 | Technical

Fair Scheduler to Capacity Scheduler conversion tool

Introduction. In Apache Hadoop YARN 3.x (YARN for short), switching to Capacity Scheduler has considerable benefits and only a few drawbacks. To bring these features to users who currently run Fair Scheduler, we created a tool with the upstream YARN community to help with the migration. Why switch to Capacity Scheduler? What can we […]

by Peter Bacsko, Rudolf Reti 10 min read

Apache Hadoop Apache Yarn Cloudera Enterprise Cloudera Data Platform Data Science

July 10, 2020 | Technical

Apache Hadoop YARN in CDP Data Center 7.1: What’s new and how to upgrade

Editor’s Note, August 2020: CDP Data Center is now called CDP Private Cloud Base. You can learn more about it here. Background. This blog post covers how customers can migrate clusters and workloads to the new Cloudera Data Platform – Data Center 7.1 (CDP DC 7.1 onwards), plus highlights of this new release. CDP DC […]

by Szilard Nemeth, Wilfred Spiegelenburg 6 min read

Apache Hadoop Hortonworks Data Platform Cloudera Data Platform Data Science

June 24, 2020 | Technical

Multi-Raft – Boost up write performance for Apache Hadoop-Ozone

This blog post was written by guest blogger Li Cheng, Software Engineer, Tencent Inc. Using Hadoop-Ozone in Prod. Apache Hadoop-Ozone is a new-era object storage solution for big data platforms. It is scalable with strong consistency. Ozone uses the Raft protocol, implemented by Apache Ratis (Incubating), to achieve high availability in its distributed system. My team […]

by Cloudera 8 min read

Apache Hadoop Data Science Workbench Ozone Cloudera Data Platform Data Engineering Technology Modernize Architecture Performance
