Small files are a common challenge in the Apache Hadoop world and when not handled with care, they can lead to a number of complications. The Apache Hadoop Distributed File System (HDFS) was developed to store and process large data sets over the range of terabytes and petabytes. However, HDFS stores small files inefficiently, leading to inefficient Namenode memory utilization and RPC calls, block scanning throughput degradation, and reduced application layer performance. In this blog post,
Timely data is crucial to businesses in the Big Data age: This blog post outlines how Santander UK utilises the latest Cloudera technologies and superior software development capability to create the next generation of data warehousing and streaming analytics to support intelligence that can improve relationships with customers and follow the mantra of ‘we want to help people grow and prosper.‘
Santander UK’s big data journey started around four years ago.
In Part 1: Infrastructure Considerations in this three part revamped series on deploying clusters like a boss, we provided a general explanation for how nodes are classified, disk layout configurations and network topologies to think about when deploying your clusters.
In this Part 2: Service and Role Layouts segment of the series, we take a step higher up the stack looking at the various services and roles that make up your Cloudera Enterprise deployment.
Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, due to a lack of documentation around this area, it’s hard to understand or debug when problems arise. Delegation tokens were designed and are widely used in the Hadoop ecosystem as an authentication method. This blog post introduces the concept of Hadoop Delegation Tokens in the context of Hadoop Distributed File System (HDFS) and Hadoop Key Management Server (KMS),
We at Cloudera believe that all companies should have the power to leverage data for financial gain, to lower operational costs, and to avoid risk. We enable this by providing an enterprise grade platform that allows customers to easily manage, store, process, and analyze all of your data, regardless of volume and variety.
Cloudera’s Enterprise Data Hub (EDH), a modern machine learning and analytics platform that is optimized for the cloud,