Category Archives: Hadoop

Hadoop Delegation Tokens Explained

Categories: CDH Hadoop HDFS Platform Security & Cybersecurity

Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, due to a lack of documentation around this area, it’s hard to understand or debug when problems arise. Delegation tokens were designed and are widely used in the Hadoop ecosystem as an authentication method. This blog post introduces the concept of Hadoop Delegation Tokens in the context of Hadoop Distributed File System (HDFS) and Hadoop Key Management Server (KMS),

Read more

Apache Impala is now a Top-Level Apache Project

Categories: CDH Hadoop Impala

Five years ago, Cloudera shared with the world our plan to transfer the lessons from decades of relational database research to the Apache Hadoop platform via a new SQL engine — Apache Impala — the first and fastest open source MPP SQL engine for Hadoop.  Impala enabled SQL users to operate on vast amounts of data in open formats, stored on HDFS originally (with Apache Kudu, Amazon S3, and Microsoft ADLS now also native storage options),

Read more

Deploy Cloudera EDH Clusters Like a Boss Revamped – Part 1

Categories: CDH Hadoop HDFS

We at Cloudera believe that all companies should have the power to leverage data for financial gain, to lower operational costs, and to avoid risk. We enable this by providing an enterprise grade platform that allows customers to easily manage, store, process, and analyze all of your data, regardless of volume and variety.

Cloudera’s Enterprise Data Hub (EDH), a modern machine learning and analytics platform that is optimized for the cloud,

Read more

A Look at ADLS Performance – Throughput and Scalability

Categories: CDH Cloud Hadoop HDFS Performance

Overview

Azure Data Lake Store (ADLS) is a highly scalable cloud-based data store that is designed  for collecting, storing and analyzing large amounts of data, and is ideal for enterprise-grade applications.  Data can originate from almost any source, such as Internet applications and mobile devices; it is stored securely and durably, while being highly available in any geographic region.  ADLS is performance-tuned for big data analytics and can be easily accessed from many components of the Apache Hadoop ecosystem,

Read more

Deep Learning with Intel’s BigDL and Apache Spark

Categories: CDH Data Science Hadoop Spark

Cloudera recently published a blog post on how to use Deeplearning4J (DL4J) along with Apache Hadoop and Apache Spark to get state-of-the-art results on an image recognition task. Continuing on a similar stream of work, in this post we discuss a viable alternative that is specifically designed to be used with Spark, and data available in Spark and Hadoop clusters via a Scala or Python API.

The Deep Learning landscape is still evolving.

Read more