Tag Archives: Hive

Using Amazon S3 with Cloudera BDR

Categories: CDH Cloud Cloudera Manager HDFS Hive

More of you are moving to public cloud services for backup and disaster recovery purposes, and Cloudera has been enhancing the capabilities of Cloudera Manager and CDH to help you do that. Specifically, Cloudera Backup and Disaster Recovery (BDR) now supports backup to and restore from Amazon S3 for Cloudera Enterprise customers.

BDR lets you replicate Apache HDFS data from your on-premise cluster to or from Amazon S3 with full fidelity (all file and directory metadata is replicated along with the data).

Read more

Data Engineering with Cloudera Altus

Categories: Altus Cloud Hive Spark

With modern businesses dealing with an ever-increasing volume of data, and an expanding set of data sources, the data engineering process that enables analysis, visualization, and reporting only becomes more important.

When considering running data engineering workloads in the public cloud, there are capabilities which enable different operational models from on-premises deployments. The key factors here are the presence of a distinct storage layer within the cloud environment, and the ability to provision compute resources on-demand (e.g.: with Amazon’s S3 and EC2 respectively).

Read more

New in Cloudera Enterprise 5.10: Hue SQL Editor and Security Improvements

Categories: Hadoop Hue Oozie

Cloudera Enterprise 5.10 includes the latest updates of Hue, the intelligent editor for SQL Developers and Analysts.

As part of Cloudera’s continuing investments in user experience and productivity, Cloudera Enterprise 5.10 includes an updated version of Hue. We provide a summary of the main enhancements in the following part of this blog post. (Hue from C5.10 is also available for a quick try in one click on demo.gethue.com.)

SQL Improvements

The Hue editor keeps getting better with these major improvements:

Row Count

The number of rows returned is displayed so you can quickly see the size of the dataset.

Read more

Introducing Cloudera Navigator Optimizer: For Optimal SQL Workload Efficiency on Apache Hadoop

Categories: Cloudera Navigator Impala Performance

Cloudera Navigator Optimizer, a new (beta) component of Cloudera Enterprise, helps optimize inefficient query workloads for best results on Apache Hadoop.

With the proliferation of Apache Hadoop deployments, more and more customers are looking to reduce operational overheads in their enterprise data warehouse (EDW) installations by exploiting low-cost, highly scalable, open source SQL-on-Hadoop frameworks such as Impala and Apache Hive. Processing portions of SQL workloads better suited to Hadoop on these frameworks,

Read more