Category Archives: HDFS

Small Files, Big Foils: Addressing the Associated Metadata and Application Challenges

Categories: Hadoop HDFS

Small files are a common challenge in the Apache Hadoop world and when not handled with care, they can lead to a number of complications. The Apache Hadoop Distributed File System (HDFS) was developed to store and process large data sets over the range of terabytes and petabytes. However, HDFS stores small files inefficiently, leading to inefficient Namenode memory utilization and RPC calls, block scanning throughput degradation, and reduced application layer performance. In this blog post,

Read more

Next Generation Data Warehousing at Santander UK

Categories: CDH HBase HDFS Kafka Kudu Use Case

Timely data is crucial to businesses in the Big Data age: This blog post outlines how Santander UK utilises the latest Cloudera technologies and superior software development capability to create the next generation of data warehousing and streaming analytics to support intelligence that can improve relationships with customers and follow the mantra of ‘we want to help people grow and prosper.

Santander UK’s big data journey started around four years ago.

Read more

Deploy Cloudera EDH Clusters Like a Boss Revamped – Part 2

Categories: CDH Hadoop HDFS

In Part 1: Infrastructure Considerations in this three part revamped series on deploying clusters like a boss, we provided a general explanation for how nodes are classified, disk layout configurations and network topologies to think about when deploying your clusters.

In this Part 2: Service and Role Layouts segment of the series, we take a step higher up the stack looking at the various services and roles that make up your Cloudera Enterprise deployment.  

Read more

Hadoop Delegation Tokens Explained

Categories: CDH Hadoop HDFS Platform Security & Cybersecurity

Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, due to a lack of documentation around this area, it’s hard to understand or debug when problems arise. Delegation tokens were designed and are widely used in the Hadoop ecosystem as an authentication method. This blog post introduces the concept of Hadoop Delegation Tokens in the context of Hadoop Distributed File System (HDFS) and Hadoop Key Management Server (KMS),

Read more

Deploy Cloudera EDH Clusters Like a Boss Revamped – Part 1

Categories: CDH Hadoop HDFS

We at Cloudera believe that all companies should have the power to leverage data for financial gain, to lower operational costs, and to avoid risk. We enable this by providing an enterprise grade platform that allows customers to easily manage, store, process, and analyze all of your data, regardless of volume and variety.

Cloudera’s Enterprise Data Hub (EDH), a modern machine learning and analytics platform that is optimized for the cloud,

Read more