Cloudera’s open source licensing policies have evolved with the changing dynamics in open source innovation. For more information on Cloudera’s current policy, please contact OSSQuestions@cloudera.com. We are now well into 2022 and the megatrends that drove the last decade in data—The Apache Software Foundation as a primary innovation vehicle for big data, the arrival of […]
Apache Ozone is a scalable distributed object store that can efficiently manage billions of small and large files. Ozone natively provides Amazon S3 and Hadoop Filesystem compatible endpoints in addition to its own native object store API endpoint and is designed to work seamlessly with enterprise scale data warehousing, machine learning and streaming workloads. The […]
Apache Hadoop Distributed File System (HDFS) is the most popular file system in the big data world. The Apache Hadoop File System interface has provided integration to many other popular storage systems like Apache Ozone, S3, Azure Data Lake Storage etc. Some HDFS users want to extend the HDFS Namenode capacity by configuring Federation of […]
The Corner Office is pressing their direct reports across the company to “Move To The Cloud” to increase agility and reduce costs. And next to those legacy ERP, HCM, SCM and CRM systems, that mysterious elephant in the room – that “Big Data” platform running in the data center that is driving much of the […]
The Industry Transformation category at the Data Impact Awards has never been more timely. While the business world is mostly focused on digital transformation, Cloudera and our customers know that true, data-driven change is reshaping whole enterprises and entire industries. We are also, at time of writing, in the middle of a global pandemic – […]
Apache Hive supports transactional tables which provide ACID guarantees. There has been a significant amount of work that has gone into hive to make these transactional tables highly performant. Apache Spark provides some capabilities to access hive external tables but it cannot access hive managed tables. To access hive managed tables from spark Hive Warehouse […]
The Cloudera Operational Database (COD) experience is a managed dbPaaS solution which abstracts the underlying cluster instance as a Database. It can auto-scale based on the workload utilization of the cluster and will be adding the ability to auto-tune (better performance within the existing infrastructure footprint) and auto-heal (resolve operational problems automatically) later this year. […]
Introduction In Apache Hadoop YARN 3.x (YARN for short), switching to Capacity Scheduler has considerable benefits and only a few drawbacks. To bring these features to users who are currently using Fair Scheduler, we created a tool with the upstream YARN community to help the migration process. Why switching to Capacity Scheduler What can we […]
Editor’s Note, August 2020: CDP Data Center is now called CDP Private Cloud Base. You can learn more about it here. Background This blogpost will cover how customers can migrate clusters and workloads to the new Cloudera Data Platform – Data Center 7.1 (CDP DC 7.1 onwards) plus highlights of this new release. CDP DC […]
This blog post was written by Guest Blogger Li Cheng, Software Engineer, Tencent Inc. Using Hadoop-Ozone in Prod Apache Hadoop-Ozone is a new-era object storage solution for Big Data platform. It is scalable with strong consistency. Ozone uses Raft protocol, implemented by Apache Ratis (Incubating), to achieve high availability in its distributed system. My team […]