Category Archives: Impala

BI and SQL Analytics with Apache Impala (Incubating) in CDH 5.8: 3x Faster on Secure Clusters

Categories: CDH Impala

Released with CDH 5.8, Impala 2.6 brings solid performance improvements, particularly for clusters secured by Kerberos running BI workloads on Apache Hadoop.

Just a few months back, we showed you how Impala 2.5 delivered a 4x performance boost compared to Impala 2.3 for BI workloads on Hadoop via the introduction of several features like runtime filters. Here’s an update: Compared to two releases ago, Impala 2.6 delivers 12x better performance on secure workloads and continues this drumbeat of consistent performance improvement.

Read More

How-to: Detect and Report Web-Traffic Anomalies in Near Real-Time

Categories: CDH Flume Impala Spark Use Case

This framework, based on Apache Flume, Apache Spark Streaming, and Apache Impala (incubating), can detect and report on abnormal bad HTTP requests within seconds.

Website performance and availability are mission-critical for companies of all types and sizes, not just those with a revenue stream directly tied to the web. Web pages can become unavailable for many reasons, including overburdened backing data stores or content-management systems, or delays in loading third-party content such as advertisements.

Read More

Announcing hs2client, A Fast New C++ / Python Thrift Client for Impala and Hive

Categories: Data Science Hive Impala Tools

This new (alpha) C++ client library for Apache Impala (incubating) and Apache Hive provides high-performance data access from Python.

Earlier this year, members of the Python data tools and Impala teams at Cloudera began collaborating on a new C++ library intended to eventually become a faster, more memory-efficient replacement for impyla, PyHive, and other (largely pure Python) client libraries for talking to Hive and Impala.

Read More

How-to: Analyze Fantasy Sports using Apache Spark and SQL

Categories: Hive How-to Impala Spark Use Case

As part of the drumbeat for Spark Summit West in San Francisco (June 6-8), learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL.

In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis,

Read More

Guide to Configuring Apache Impala (incubating) for HA with F5 BIG-IP

Categories: Impala

This new guide walks you through the process of configuring F5 BIG-IP to manage client connections to Impala for HA in business-critical BI applications.

For production deployments of Apache Impala (incubating), using a load-balancing proxy server has several advantages:

  • Applications connect to a single well-known host and port, rather than keeping track of the hosts where the impalad daemon is running.
  • If any host running the impalad daemon becomes unavailable,

Read More