Has your organization considered upgrading from Hortonworks DataFlow (HDF) to Cloudera Flow Management (CFM), but thought the migration process would be too disruptive to your mission-critical dataflows? In truth, many NiFi dataflows can be migrated from HDF to CFM quickly and easily, with no data loss and without any service interruption. Here we […]
CDP for Azure introduces fine-grained authorization for access to Azure Data Lake Storage using Apache Ranger policies. Cloudera and Microsoft have been working together closely on this integration, which greatly simplifies the security administration of access to ADLS-Gen2 cloud storage. Apache Ranger provides a centralized console to manage authorization and view audits of access to […]
From A to Z in 10 minutes! If you have had previous experience setting up, sizing, and deploying a distributed search engine service, this is hard to believe. Imagine how many times IT has lost hours of valuable time trying to understand Apache Solr application requirements and map them into how to […]
Apache Hadoop Ozone is a distributed key-value store that can manage small and large files alike. Ozone was designed to address the scale limitations of HDFS with respect to small files. HDFS is designed to store large files; the recommended limit is about 300 million files per NameNode, and it doesn't […]
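As a quick illustration of the key-value model, here is a minimal sketch using the Ozone Java client API, assuming an Ozone cluster is reachable through the configuration on the classpath; the volume, bucket, and key names are placeholders:

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.ObjectStore;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.OzoneVolume;
import org.apache.hadoop.ozone.client.io.OzoneOutputStream;

import java.nio.charset.StandardCharsets;

public class OzonePutKey {
  public static void main(String[] args) throws Exception {
    // Connect to the Ozone Manager named in the ozone-site.xml on the classpath.
    OzoneConfiguration conf = new OzoneConfiguration();
    try (OzoneClient client = OzoneClientFactory.getRpcClient(conf)) {
      ObjectStore store = client.getObjectStore();

      // Keys live in buckets, and buckets live in volumes ("vol1"/"bucket1" are placeholders).
      store.createVolume("vol1");
      OzoneVolume volume = store.getVolume("vol1");
      volume.createBucket("bucket1");
      OzoneBucket bucket = volume.getBucket("bucket1");

      // Write a small key; small and large values go through the same API.
      byte[] value = "hello ozone".getBytes(StandardCharsets.UTF_8);
      try (OzoneOutputStream out = bucket.createKey("greeting.txt", value.length)) {
        out.write(value);
      }
    }
  }
}
```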
Apache Hadoop Ozone was designed to address the scale limitations of HDFS with respect to small files and the total number of file system objects. On current data center hardware, HDFS has a limit of about 350 million files and 700 million file system objects, since each file typically consumes at least two namespace objects (an inode plus at least one block). Ozone's architecture addresses these limitations[4]. This article compares the performance […]
This blog answers two common sizing questions: what is the right disk size for a datanode, and what is the right total capacity for a datanode? A few of our customers have asked us about using dense storage nodes. It is certainly possible to use dense nodes for archival storage, because I/O bandwidth requirements are usually lower […]
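To make the trade-off concrete, here is a back-of-the-envelope sketch of the recovery window for a dense datanode; every input below is an illustrative assumption, not a sizing recommendation. When a datanode fails, HDFS must re-replicate everything it held, so re-replication time grows with per-node density:

```java
/**
 * Back-of-the-envelope re-replication estimate for a dense datanode.
 * All inputs are illustrative assumptions, not sizing recommendations.
 */
public class DenseNodeEstimate {
  public static void main(String[] args) {
    int disks = 24;                  // assumed disks per datanode
    double diskTb = 8.0;             // assumed disk size in TB
    double fillFraction = 0.70;      // assumed average fill level
    double rebuildGbitPerSec = 20;   // assumed aggregate bandwidth devoted to recovery

    double dataTb = disks * diskTb * fillFraction;       // data lost with the node
    double dataGbit = dataTb * 8 * 1000;                 // TB -> gigabits (decimal units)
    double hours = dataGbit / rebuildGbitPerSec / 3600;  // recovery window

    System.out.printf("Node holds ~%.0f TB; re-replication takes ~%.1f hours%n",
        dataTb, hours);
  }
}
```

Under these assumed numbers the node holds roughly 134 TB, and re-replicating it takes on the order of 15 hours, which hints at why dense nodes are a better fit for archival tiers than for hot data.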
Small files are a common challenge in the Apache Hadoop world and, when not handled with care, they can lead to a number of complications. The Apache Hadoop Distributed File System (HDFS) was developed to store and process large data sets in the range of terabytes and petabytes. However, HDFS stores small files inefficiently, leading […]
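A rule of thumb often cited for HDFS is that each namespace object (a file, directory, or block) costs on the order of 150 bytes of NameNode heap. The sketch below uses that figure, purely as an assumption, to show why the same data volume stored as many small files is far more expensive to track:

```java
/** Rough NameNode heap estimate; ~150 bytes per namespace object is a common rule of thumb. */
public class NameNodeHeapEstimate {
  static double heapGb(long files, long blocksPerFile) {
    long objects = files + files * blocksPerFile;  // one inode plus its blocks per file
    return objects * 150.0 / 1e9;                  // ~150 bytes of heap per object
  }

  public static void main(String[] args) {
    // The same data volume: 100M large files (8 blocks each) vs. 800M one-block small files.
    System.out.printf("100M x 8-block files : ~%.0f GB heap%n", heapGb(100_000_000L, 8));
    System.out.printf("800M x 1-block files : ~%.0f GB heap%n", heapGb(800_000_000L, 1));
  }
}
```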
Enterprises are increasingly moving parts of their datacenters, or entire datacenters, to the cloud in order to reduce their physical footprint, minimize operational overhead, and shorten their infrastructure acquisition cycles. An incidental benefit is that cloud services, like cloud-based object storage, bring a new set of tools to a Hadoop architect. At Hortonworks, our customers use a number […]
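One concrete example of those new tools: the familiar Hadoop FileSystem API also speaks to cloud object stores, so existing code can often be pointed at object storage just by changing the URI scheme. Below is a minimal sketch using the s3a connector; the bucket name is a placeholder, and credentials are assumed to be configured elsewhere (core-site.xml or the environment):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListCloudStorage {
  public static void main(String[] args) throws Exception {
    // Same FileSystem API as HDFS; only the scheme changes ("my-bucket" is a placeholder).
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf)) {
      for (FileStatus status : fs.listStatus(new Path("s3a://my-bucket/data/"))) {
        System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
      }
    }
  }
}
```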
In Part 1: Infrastructure Considerations of this revamped three-part series on deploying clusters like a boss, we covered how nodes are classified, along with the disk layout configurations and network topologies to consider when deploying your clusters. In this Part 2: Service and Role Layouts segment of the series, we take a […]