Has your organization considered upgrading from Hortonworks DataFlow (HDF) to Cloudera Flow Management (CFM), but thought the migration process would be too disruptive to your mission-critical dataflows? In truth, many NiFi dataflows can be migrated from HDF to CFM quickly and easily, with no data loss and without any service interruption. Here we […]
CDP for Azure introduces fine-grained authorization for access to Azure Data Lake Storage using Apache Ranger policies. Cloudera and Microsoft have been working together closely on this integration, which greatly simplifies the security administration of access to ADLS-Gen2 cloud storage. Apache Ranger provides a centralized console to manage authorization and view audits of access to […]
From A to Z in 10 minutes! If you have had previous experience setting up, sizing, and deploying a distributed search engine service, this is hard to believe. Imagine how many times IT has lost hours of valuable time trying to understand Apache Solr application requirements and map them into how to […]
Apache Hadoop Ozone is a distributed key-value store that can manage small and large files alike. Ozone was designed to address the scale limitations of HDFS with respect to small files. HDFS is designed to store large files; the recommended limit is about 300 million files per NameNode, and it doesn't […]
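As a quick illustration of the key-value model, here is a minimal sketch using the Ozone Java client API, assuming an Ozone cluster is reachable through the configuration on the classpath; the volume, bucket, and key names are placeholders:

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.ObjectStore;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.OzoneVolume;
import org.apache.hadoop.ozone.client.io.OzoneOutputStream;

import java.nio.charset.StandardCharsets;

public class OzonePutKey {
  public static void main(String[] args) throws Exception {
    // Connect to the Ozone Manager named in the ozone-site.xml on the classpath.
    OzoneConfiguration conf = new OzoneConfiguration();
    try (OzoneClient client = OzoneClientFactory.getRpcClient(conf)) {
      ObjectStore store = client.getObjectStore();

      // Keys live in buckets, and buckets live in volumes ("vol1"/"bucket1" are placeholders).
      store.createVolume("vol1");
      OzoneVolume volume = store.getVolume("vol1");
      volume.createBucket("bucket1");
      OzoneBucket bucket = volume.getBucket("bucket1");

      // Write a small key; small and large values go through the same API.
      byte[] value = "hello ozone".getBytes(StandardCharsets.UTF_8);
      try (OzoneOutputStream out = bucket.createKey("greeting.txt", value.length)) {
        out.write(value);
      }
    }
  }
}
```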
Apache Hadoop Ozone was designed to address the scale limitations of HDFS with respect to small files and the total number of file system objects. On current data center hardware, HDFS has a limit of about 350 million files and 700 million file system objects, since each file typically consumes at least two namespace objects (an inode plus at least one block). Ozone's architecture addresses these limitations[4]. This article compares the performance […]
This blog answers two common sizing questions: what is the right disk size for a datanode, and what is the right total capacity for a datanode? A few of our customers have asked us about using dense storage nodes. It is certainly possible to use dense nodes for archival storage, because I/O bandwidth requirements are usually lower […]
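To make the trade-off concrete, here is a back-of-the-envelope sketch of the recovery window for a dense datanode; every input below is an illustrative assumption, not a sizing recommendation. When a datanode fails, HDFS must re-replicate everything it held, so re-replication time grows with per-node density:

```java
/**
 * Back-of-the-envelope re-replication estimate for a dense datanode.
 * All inputs are illustrative assumptions, not sizing recommendations.
 */
public class DenseNodeEstimate {
  public static void main(String[] args) {
    int disks = 24;                  // assumed disks per datanode
    double diskTb = 8.0;             // assumed disk size in TB
    double fillFraction = 0.70;      // assumed average fill level
    double rebuildGbitPerSec = 20;   // assumed aggregate bandwidth devoted to recovery

    double dataTb = disks * diskTb * fillFraction;       // data lost with the node
    double dataGbit = dataTb * 8 * 1000;                 // TB -> gigabits (decimal units)
    double hours = dataGbit / rebuildGbitPerSec / 3600;  // recovery window

    System.out.printf("Node holds ~%.0f TB; re-replication takes ~%.1f hours%n",
        dataTb, hours);
  }
}
```

Under these assumed numbers the node holds roughly 134 TB, and re-replicating it takes on the order of 15 hours, which hints at why dense nodes are a better fit for archival tiers than for hot data.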
Small files are a common challenge in the Apache Hadoop world and, when not handled with care, they can lead to a number of complications. The Apache Hadoop Distributed File System (HDFS) was developed to store and process large data sets in the range of terabytes and petabytes. However, HDFS stores small files inefficiently, leading […]
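A rule of thumb often cited for HDFS is that each namespace object (a file, directory, or block) costs on the order of 150 bytes of NameNode heap. The sketch below uses that figure, purely as an assumption, to show why the same data volume stored as many small files is far more expensive to track:

```java
/** Rough NameNode heap estimate; ~150 bytes per namespace object is a common rule of thumb. */
public class NameNodeHeapEstimate {
  static double heapGb(long files, long blocksPerFile) {
    long objects = files + files * blocksPerFile;  // one inode plus its blocks per file
    return objects * 150.0 / 1e9;                  // ~150 bytes of heap per object
  }

  public static void main(String[] args) {
    // The same data volume: 100M large files (8 blocks each) vs. 800M one-block small files.
    System.out.printf("100M x 8-block files : ~%.0f GB heap%n", heapGb(100_000_000L, 8));
    System.out.printf("800M x 1-block files : ~%.0f GB heap%n", heapGb(800_000_000L, 1));
  }
}
```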
Enterprises are increasingly moving parts of their datacenters, or entire datacenters, to the cloud in order to reduce their physical footprint, minimize operational overhead, and shorten their infrastructure acquisition cycles. An incidental benefit is that cloud services, like cloud-based object storage, bring a new set of tools to a Hadoop architect. At Hortonworks, our customers use a number […]
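One concrete example of those new tools: the familiar Hadoop FileSystem API also speaks to cloud object stores, so existing code can often be pointed at object storage just by changing the URI scheme. Below is a minimal sketch using the s3a connector; the bucket name is a placeholder, and credentials are assumed to be configured elsewhere (core-site.xml or the environment):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListCloudStorage {
  public static void main(String[] args) throws Exception {
    // Same FileSystem API as HDFS; only the scheme changes ("my-bucket" is a placeholder).
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf)) {
      for (FileStatus status : fs.listStatus(new Path("s3a://my-bucket/data/"))) {
        System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
      }
    }
  }
}
```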
In Part 1: Infrastructure Considerations of this revamped three-part series on deploying clusters like a boss, we covered how nodes are classified, along with the disk layout configurations and network topologies to consider when deploying your clusters. In this Part 2: Service and Role Layouts segment of the series, we take a […]