Tag Archives: Amazon S3

Apache Impala is now a Top-Level Apache Project

Categories: CDH Hadoop Impala

Five years ago, Cloudera shared with the world our plan to transfer the lessons from decades of relational database research to the Apache Hadoop platform via a new SQL engine — Apache Impala — the first and fastest open source MPP SQL engine for Hadoop.  Impala enabled SQL users to operate on vast amounts of data in open formats, stored on HDFS originally (with Apache Kudu, Amazon S3, and Microsoft ADLS now also native storage options),

Read more

New in Cloudera Enterprise 5.12: Hue 4 Interface and Query Assistant

Categories: CDH Cloudera Manager Cloudera Navigator Hadoop Hue

When it comes to self-service business intelligence and exploratory analytics, Cloudera has continued to push limits and innovate to help our customers expedite this journey and get the most value from their data. Over the past year, we have made a number of significant advancements in Hue to provide a more powerful user experience for SQL developers and make them more productive for their every day self-service BI tasks and workflows.

With the recent release of Cloudera 5.12,

Read more

Introducing S3Guard: S3 Consistency for Apache Hadoop

Categories: Altus CDH Cloud Hadoop

Synopsis

This article introduces a new Apache Hadoop feature called S3Guard. S3Guard addresses one of the major challenges with running Hadoop on Amazon’s Simple Storage Service (S3), eventual consistency. We outline the problem of S3’s eventual consistency, how it affects Hadoop workloads, and explain how S3Guard works.

Problem

Although Apache Hadoop has support for using Amazon Simple Storage Service (S3) as a Hadoop filesystem, S3 behaves different than HDFS.  One of the key differences is in the level of consistency provided by the underlying filesystem.

Read more

Using Amazon S3 with Cloudera BDR

Categories: CDH Cloud Cloudera Manager HDFS Hive

More of you are moving to public cloud services for backup and disaster recovery purposes, and Cloudera has been enhancing the capabilities of Cloudera Manager and CDH to help you do that. Specifically, Cloudera Backup and Disaster Recovery (BDR) now supports backup to and restore from Amazon S3 for Cloudera Enterprise customers.

BDR lets you replicate Apache HDFS data from your on-premise cluster to or from Amazon S3 with full fidelity (all file and directory metadata is replicated along with the data).

Read more