Released with CDH 5.8, Impala 2.6 brings solid performance improvements, particularly for Kerberos-secured clusters running BI workloads on Apache Hadoop.
Just a few months back, we showed you how Impala 2.5 delivered a 4x performance boost compared to Impala 2.3 for BI workloads on Hadoop via the introduction of several features like runtime filters. Here’s an update: Compared to two releases ago, Impala 2.6 delivers 12x better performance on secure workloads and continues this drumbeat of consistent performance improvement.
This framework, based on Apache Flume, Apache Spark Streaming, and Apache Impala (incubating), can detect and report on abnormal bad HTTP requests within seconds.
Website performance and availability are mission-critical for companies of all types and sizes, not just those with a revenue stream directly tied to the web. Web pages can become unavailable for many reasons, including overburdened backing data stores or content-management systems, or delays in loading third-party content such as advertisements.
This new (alpha) C++ client library for Apache Impala (incubating) and Apache Hive provides high-performance data access from Python.
Earlier this year, members of the Python data tools and Impala teams at Cloudera began collaborating on a new C++ library intended to eventually become a faster, more memory-efficient replacement for impyla, PyHive, and other (largely pure Python) client libraries for talking to Hive and Impala.
As part of the drumbeat for Spark Summit West in San Francisco (June 6-8), learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL.
In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis,
This new guide steps you through the process of configuring that platform to manage client connections to Impala for HA in business-critical BI applications.
For production deployments of Apache Impala (incubating), using a load-balancing proxy server has several advantages:
- Applications connect to a single well-known host and port, rather than keeping track of the hosts where the impalad daemon is running.
- If any host running the impalad daemon becomes unavailable,