Category Archives: Hadoop

Deep Learning with Intel’s BigDL and Apache Spark

Categories: CDH, Data Science, Hadoop, Spark

Cloudera recently published a blog post on how to use Deeplearning4J (DL4J) along with Apache Hadoop and Apache Spark to get state-of-the-art results on an image recognition task. Continuing in a similar vein, this post discusses a viable alternative that is designed specifically for use with Spark, and with data available in Spark and Hadoop clusters, via a Scala or Python API.

The Deep Learning landscape is still evolving.

Read more

Accessing Secure Cluster from Web Applications

Categories: CDH, Hadoop, How-to

As customers use Apache Hadoop clusters in ways other than through Hue and the Hadoop command-line interface (CLI), and integrate them closely with the applications they develop, we are often asked how to access a secure Hadoop cluster from within those custom applications. Many customers have their application access the cluster with a fixed service account. However, other customers would like to access the cluster as the end users who have authenticated to the application.

Read more

Implementing Temporal Graphs with Apache TinkerPop and HGraphDB

Categories: Graph Processing, Hadoop, HBase, How-to

When most people think of Big Data, they imagine loads of unstructured data. However, there is always some sort of structure or set of relationships within this data. Based on these relationships, one or more representation schemes are best suited to handling this type of data. A common pattern seen in the field is hierarchy/relationship representation. This form of representation is adept at handling scenarios like complex business models, chains of events or plans, chains of stock orders in banks,

Read more

New in Cloudera Enterprise 5.12: Hue 4 Interface and Query Assistant

Categories: CDH, Cloudera Manager, Cloudera Navigator, Hadoop, Hue

When it comes to self-service business intelligence and exploratory analytics, Cloudera has continued to push limits and innovate to help our customers expedite this journey and get the most value from their data. Over the past year, we have made a number of significant advancements in Hue to provide a more powerful user experience for SQL developers and make them more productive in their everyday self-service BI tasks and workflows.

With the recent release of Cloudera 5.12,

Read more

Introducing S3Guard: S3 Consistency for Apache Hadoop

Categories: Altus, CDH, Cloud, Hadoop

Synopsis

This article introduces a new Apache Hadoop feature called S3Guard. S3Guard addresses one of the major challenges with running Hadoop on Amazon’s Simple Storage Service (S3): eventual consistency. We outline the problem of S3’s eventual consistency, describe how it affects Hadoop workloads, and explain how S3Guard works.

Problem

Although Apache Hadoop has support for using Amazon Simple Storage Service (S3) as a Hadoop filesystem, S3 behaves differently from HDFS. One of the key differences is in the level of consistency provided by the underlying filesystem.

Read more