Tag Archives: HDFS

Partition Management in Hadoop

Categories: Hadoop Hive

Guest blog post written by Adir Mashiach

In this post I’ll talk about the problem of Hive tables with a lot of small partitions and files, and describe my solution in detail.

A little background

In my organization, we keep a lot of our data in HDFS. Most of it is raw data, but a significant amount is the final product of many data enrichment processes.

Read more

Transparent Hierarchical Storage Management with Apache Kudu and Impala

Categories: CDH Impala Kudu Parquet

When choosing storage for an application, it is common to pick the single option whose features best fit your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. For that reason, there is a need for a solution that lets you leverage the best features of multiple storage options.

Read more

Hadoop Delegation Tokens Explained

Categories: CDH Hadoop HDFS Platform Security & Cybersecurity

Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, because this area is poorly documented, problems are hard to understand and debug when they arise. Delegation tokens were designed and are widely used in the Hadoop ecosystem as an authentication method. This blog post introduces the concept of Hadoop Delegation Tokens in the context of Hadoop Distributed File System (HDFS) and Hadoop Key Management Server (KMS),

Read more

Apache Impala is now a Top-Level Apache Project

Categories: CDH Hadoop Impala

Five years ago, Cloudera shared with the world our plan to transfer the lessons from decades of relational database research to the Apache Hadoop platform via a new SQL engine, Apache Impala: the first and fastest open source MPP SQL engine for Hadoop. Impala enabled SQL users to operate on vast amounts of data in open formats, stored on HDFS originally (with Apache Kudu, Amazon S3, and Microsoft ADLS now also native storage options),

Read more

The Value of Certification

Categories: Careers Training

Each year in early November, my inbox fills up with people asking advice about certification. Some are reflecting on their careers and looking to move on or move up; others have given themselves or their managers the goal of getting certified this year. They awake one morning in early November and realize the clock is ticking.

The first thing they ask for is a discount, of course. Beyond that, they want to know what a certification is going to do for them more generally,

Read more