With modern businesses dealing with an ever-increasing volume of data, and an expanding set of data sources, the data engineering process that enables analysis, visualization, and reporting only becomes more important.
When considering running data engineering workloads in the public cloud, there are capabilities that enable operational models distinct from on-premises deployments. The key factors here are the presence of a distinct storage layer within the cloud environment, and the ability to provision compute resources on demand (for example, Amazon S3 and EC2, respectively).
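The decoupling of storage from compute can be made concrete. The sketch below (a minimal illustration; the bucket and table names are hypothetical) builds the kind of stable, date-partitioned S3 key layout that lets transient EC2 compute clusters come and go while the data itself stays put:

```python
from datetime import date

def partition_key(bucket: str, table: str, day: date) -> str:
    """Build an S3 key prefix for a date-partitioned dataset.

    Because storage (S3) is decoupled from compute (EC2), any
    cluster provisioned on demand can read the same key layout;
    no data needs to move when clusters are created or torn down.
    """
    return (f"s3://{bucket}/{table}/"
            f"year={day.year}/month={day.month:02d}/day={day.day:02d}/")

print(partition_key("example-data-lake", "events", date(2017, 1, 31)))
# -> s3://example-data-lake/events/year=2017/month=01/day=31/
```

A transient cluster can then be pointed at such a prefix for a batch job and terminated afterwards, paying for compute only while the job runs.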
Cloudera Enterprise 5.10 includes the latest updates of Hue, the intelligent editor for SQL Developers and Analysts.
As part of Cloudera’s continuing investment in user experience and productivity, Cloudera Enterprise 5.10 includes an updated version of Hue. We summarize the main enhancements below. (Hue from C5.10 is also available for a quick try, in one click, on demo.gethue.com.)
The Hue editor keeps getting better with these major improvements:
The number of rows returned is displayed so you can quickly see the size of the dataset.
Starting in Cloudera Enterprise 5.5, Cloudera Navigator offers interactive visual analytics that help answer important questions about the data that’s in your CDH clusters.
The new analytics system in Cloudera Navigator shows the distribution of data along various metadata dimensions and supports interactive filtering and grouping with a simple point-and-click interface. This new functionality is a great complement to Cloudera Navigator’s search capabilities and is integrated with Navigator’s policy engine, so you can easily understand the impact of data management policies before applying them to your data.
Cloudera Navigator Optimizer, a new (beta) component of Cloudera Enterprise, helps optimize inefficient query workloads for best results on Apache Hadoop.
With the proliferation of Apache Hadoop deployments, more and more customers are looking to reduce operational overhead in their enterprise data warehouse (EDW) installations by exploiting low-cost, highly scalable, open source SQL-on-Hadoop frameworks such as Impala and Apache Hive. Processing the portions of SQL workloads better suited to Hadoop on these frameworks frees up EDW capacity and reduces cost.
Impala is designed to deliver insight on data in Apache Hadoop in real time. As data often lands in Hadoop continuously in certain use cases (such as time-series analysis, real-time fraud detection, real-time risk detection, and so on), it’s desirable for Impala to query this new “fast” data with minimal delay and without interrupting running queries.
In this blog post, you will learn an approach for continuous loading of data into Impala via HDFS that makes newly landed data queryable with minimal delay and without interrupting running queries.