Category Archives: Hadoop

Introducing S3Guard: S3 Consistency for Apache Hadoop

Categories: Altus, CDH, Cloud, Hadoop

Synopsis

This article introduces a new Apache Hadoop feature called S3Guard. S3Guard addresses one of the major challenges of running Hadoop on Amazon’s Simple Storage Service (S3): eventual consistency. We outline the problem of S3’s eventual consistency, describe how it affects Hadoop workloads, and explain how S3Guard works.

Problem

Although Apache Hadoop supports using Amazon Simple Storage Service (S3) as a Hadoop filesystem, S3 behaves differently from HDFS. One of the key differences is the level of consistency provided by the underlying filesystem.

Read more

Deep learning on Apache Spark and Apache Hadoop with Deeplearning4j

Categories: Data Science, Hadoop, Spark

In late 2016, Ben Lorica of O’Reilly Media declared that “2017 will be the year the data science and big data community engage with AI technologies.” Deep learning on GPUs had pervaded universities and research organizations before 2017, but distributed deep learning on CPUs is now beginning to gain widespread adoption across a diverse set of companies and domains. While GPUs provide top-of-the-line performance in numerical computing, CPUs are also becoming more efficient, and much of today’s existing hardware already has CPU computing power available in bulk.

Read more

How-to: Backup and disaster recovery for Apache Solr (part I)

Categories: Hadoop, How-to, Search

Cloudera Search (that is, Apache Solr integrated with the Apache Hadoop ecosystem) now supports, as of C5.9, a backup and disaster recovery capability for Solr collections.

In this post we will cover the basics of the backup and disaster recovery capability in Solr and hence in Cloudera Search. In the next post we will cover the design of the Solr snapshots functionality and its integration with the Hadoop ecosystem as well as public cloud platforms (e.g.

Read more

New in Cloudera Enterprise 5.11: Hue Data Search and Tagging

Categories: CDH, Hadoop, Hue

Self-service business intelligence and exploratory analytics continue to be a primary use case for Cloudera’s customers. Over the past year, we have made a number of significant advancements in Hue, the intelligent SQL editor, to provide a more powerful user experience for SQL developers and make them even more productive for those use cases.

The recent release of Cloudera 5.11 furthers this effort with new enhancements around embedded search and tagging for faster data discovery,

Read more

The Benefits of Migrating HPC Workloads To Apache Spark

Categories: CDH, Data Science, Hadoop, Spark

Overview

Recently we worked with a customer that needed to run a very large number of models each day to satisfy internal and government-regulated risk requirements. Several thousand model executions had to be supported per hour, so total execution time was very important to this client. In the past, the customer used thousands of servers to meet the demand. They needed to run many derivations of this model with different economic factors to satisfy their requirements.

Read more