Category Archives: Hadoop

Skool: An Open Source Data Integration Tool for Apache Hadoop from BT Group

Categories: Data Ingestion, Guest, Hadoop

In this guest post, Skool’s architects at BT Group explain its origins, design, and functionality.

With increased adoption of big data comes the challenge of integrating existing data sitting in various relational and file-based systems with Apache Hadoop infrastructure. Although open source connectors (such as Apache Sqoop) and utilities (such as HttpFS and curl on Linux) make it easy to exchange data, data engineering teams often spend an inordinate amount of time writing code for this purpose.


How-to: Ingest Email into Apache Hadoop in Real Time for Analysis

Categories: Data Ingestion, Flume, Hadoop, Kafka, Search, Spark, Use Case

Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, allow users to easily ingest structured and semi-structured data without requiring the creation of custom code. Unstructured data, however, is a more challenging subset of data that typically lends itself to batch-ingestion methods. Although such methods are suitable for many use cases,


Cloudera Enterprise 5.8 is Now Available

Categories: CDH, Cloudera Manager, Hadoop

Cloudera Enterprise 5.8 (comprising CDH 5.8, Cloudera Manager 5.8, and Cloudera Navigator 2.7) is now generally available.

Cloudera is excited to announce the general availability of Cloudera Enterprise 5.8! Main highlights of this release include Impala read/write support on Amazon S3, a redesigned SQL query editor GUI, the expansion of role-based access control functionality to Cloudera Search, and the GA of Cloudera Navigator Optimizer to facilitate and optimize workload migrations.


Cloudera Navigator Optimizer Graduates from Beta, is Now Generally Available

Categories: Cloud, Cloudera Navigator, Hadoop

This new release includes, among other things, support for “slicing and dicing” workloads by user/application/report, workload breakdown by similar queries, and alerts for Apache Hive and Apache Impala (incubating) best practices.

Cloudera Navigator Optimizer enables database architects and database administrators (DBAs) to gain an in-depth understanding of their SQL workloads running in data warehouse environments or on Apache Hadoop. Navigator Optimizer makes planning offload projects more predictable by assessing risk and reducing development costs.
