Author Archives: Cy Jervis

About Cy Jervis

Community Manager, Cloudera Community

Accessing Secure Cluster from Web Applications

Categories: CDH Hadoop How-to

As customers use Apache Hadoop clusters in ways other than through HUE and the Hadoop Command Line Interface (CLI), and integrate them closely with the applications they develop, we are often asked how to access a secure Hadoop cluster from within custom applications. Many customers have their application access the cluster with a fixed service account. Others, however, would like to access the cluster as the end users who have authenticated to the application.

Read more

Implementing Temporal Graphs with Apache TinkerPop and HGraphDB

Categories: Graph Processing Hadoop HBase How-to

When most people think of Big Data, they often imagine loads of unstructured data. However, there is always some sort of structure or set of relationships within this data. Based on these relationships, one or more representation schemes are best suited to handle this type of data. A common pattern seen in the field is hierarchy/relationship representation. This form of representation is adept at handling scenarios like complex business models, chains of events or plans, chains of stock orders in banks,

Read more

New in Cloudera Enterprise 5.12: Hue 4 Interface and Query Assistant

Categories: CDH Cloudera Manager Cloudera Navigator Hadoop Hue

When it comes to self-service business intelligence and exploratory analytics, Cloudera has continued to push limits and innovate to help our customers expedite this journey and get the most value from their data. Over the past year, we have made a number of significant advancements in Hue to provide a more powerful user experience for SQL developers and make them more productive in their everyday self-service BI tasks and workflows.

With the recent release of Cloudera 5.12,

Read more

Cloudera Director and Spot Instances: Resilience and Repair

Categories: CDH Cloud Testing

Cloudera Director enables self-service provisioning and management of CDH and Cloudera Enterprise Data Hub in the cloud. Running Cloudera Enterprise on top of public cloud infrastructure allows you to pay only for the resources you need to meet your data processing demands.

Amazon Web Services (AWS) provides the ability to bid on spare Amazon EC2 computing capacity at a discount through Amazon EC2 Spot instances. With Cloudera Director, you can configure clusters to use Spot instances to improve workload execution time and save costs.

Read more

Introducing S3Guard: S3 Consistency for Apache Hadoop

Categories: Altus CDH Cloud Hadoop

Synopsis

This article introduces a new Apache Hadoop feature called S3Guard. S3Guard addresses one of the major challenges of running Hadoop on Amazon’s Simple Storage Service (S3): eventual consistency. We outline the problem of S3’s eventual consistency, describe how it affects Hadoop workloads, and explain how S3Guard works.

Problem

Although Apache Hadoop supports using Amazon Simple Storage Service (S3) as a Hadoop filesystem, S3 behaves differently from HDFS. One of the key differences is in the level of consistency provided by the underlying filesystem.

Read more