Category Archives: Impala

How-to: Detect and Report Web-Traffic Anomalies in Near Real-Time

Categories: CDH, Flume, Impala, Spark, Use Case

This framework, based on Apache Flume, Apache Spark Streaming, and Apache Impala (incubating), can detect and report abnormal levels of bad HTTP requests within seconds.

Website performance and availability are mission-critical for companies of all types and sizes, not just those whose revenue is directly tied to the web. Web pages can become unavailable for many reasons, including overburdened backing data stores or content-management systems, or slow-loading third-party content such as advertisements.
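The excerpt does not show how the framework decides that a burst of bad requests is abnormal. As a rough illustration only, the Python sketch below flags a time window (for example, one Spark Streaming micro-batch) whose count of 4xx/5xx responses exceeds the mean of earlier windows by more than `k` standard deviations. The function names, the 4xx/5xx definition of "bad", and the threshold rule are all assumptions, not the framework's actual logic.

```python
from statistics import mean, pstdev


def is_bad_status(status: int) -> bool:
    """Treat any 4xx or 5xx HTTP status as a "bad" request (assumption)."""
    return 400 <= status <= 599


def detect_anomalies(windows, k=3.0, min_history=3, min_sigma=1.0):
    """Return the indices of windows whose bad-request count looks abnormal.

    Hypothetical helper, not part of the framework described in the post.
    windows: list of lists of HTTP status codes, one inner list per time
    window. Window i is flagged when its bad-request count exceeds
    mean + k * stdev of windows 0..i-1.
    """
    counts = [sum(1 for s in w if is_bad_status(s)) for w in windows]
    flagged = []
    for i in range(min_history, len(counts)):
        history = counts[:i]
        mu = mean(history)
        # Floor the spread so a perfectly flat history does not flag tiny bumps.
        sigma = max(pstdev(history), min_sigma)
        if counts[i] > mu + k * sigma:
            flagged.append(i)
    return flagged


windows = [
    [200, 200, 404],
    [200, 500],
    [200, 200, 200, 404],
    [200, 301, 404],
    [500, 503, 500, 404, 502, 200],
]
print(detect_anomalies(windows))  # [4]
```

Here four quiet windows with one error each are followed by a window with five errors, so only the last window is flagged. Because flagged windows stay in the history, a sustained outage gradually raises the baseline; a real deployment might exclude them or use a fixed-length rolling window instead.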

Announcing hs2client, A Fast New C++ / Python Thrift Client for Impala and Hive

Categories: Data Science, Hive, Impala, Tools

This new (alpha) C++ client library for Apache Impala (incubating) and Apache Hive provides high-performance data access from Python.

Earlier this year, members of the Python data tools and Impala teams at Cloudera began collaborating on a new C++ library intended to eventually become a faster, more memory-efficient replacement for impyla, PyHive, and other (largely pure-Python) client libraries for talking to Hive and Impala.

How-to: Analyze Fantasy Sports using Apache Spark and SQL

Categories: Hive, How-to, Impala, Spark, Use Case

As part of the drumbeat for Spark Summit West in San Francisco (June 6-8), learn how analyzing stats from professional sports leagues makes an instructive use case for data analytics with Apache Spark and SQL.

In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis…
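The full post uses Apache Spark with SQL, and its exact queries are not in this excerpt. To show the kind of aggregation involved without requiring a Spark cluster, the sketch below swaps in Python's built-in `sqlite3` module and an in-memory table. The `stats` schema, player names, and numbers are invented toy data; a similar `GROUP BY` query would be the starting point in Spark SQL against a real table.

```python
import sqlite3

# Invented toy data: (player, season, games_played, points). Not real stats.
ROWS = [
    ("Player A", 2014, 75, 1800),
    ("Player A", 2015, 80, 2000),
    ("Player B", 2015, 70, 1540),
    ("Player C", 2015, 82, 1230),
]


def top_scorers(rows, limit=2):
    """Rank players by points per game across all seasons, using plain SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE stats (player TEXT, season INTEGER,"
            " games_played INTEGER, points INTEGER)"
        )
        conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)", rows)
        cur = conn.execute(
            """
            SELECT player,
                   ROUND(CAST(SUM(points) AS REAL) / SUM(games_played), 1) AS ppg
            FROM stats
            GROUP BY player
            HAVING SUM(games_played) > 0
            ORDER BY ppg DESC
            LIMIT ?
            """,
            (limit,),
        )
        return cur.fetchall()
    finally:
        conn.close()


print(top_scorers(ROWS))  # [('Player A', 24.5), ('Player B', 22.0)]
```

Aggregating across seasons with `SUM(points) / SUM(games_played)`, rather than averaging per-season averages, weights each game equally, which is usually what a points-per-game ranking intends.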

Guide to Configuring Apache Impala (incubating) for HA with F5 BIG-IP

Categories: Impala

This new guide steps you through configuring F5 BIG-IP to manage client connections to Impala for high availability (HA) in business-critical BI applications.

For production deployments of Apache Impala (incubating), using a load-balancing proxy server has several advantages:

  • Applications connect to a single well-known host and port, rather than keeping track of the hosts where the impalad daemon is running.
  • If any host running the impalad daemon becomes unavailable…
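To make the benefits in this list concrete, here is a toy Python model of what a load-balancing proxy does for Impala clients: callers ask one object for a host, and it rotates through the impalad hosts round-robin while skipping any host that fails a health check. This is only a conceptual sketch. It is not F5 BIG-IP configuration, and the class, host names, and `is_healthy` callback are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy, hypothetical model of a load-balancing proxy in front of impalad hosts."""

    def __init__(self, hosts, is_healthy):
        if not hosts:
            raise ValueError("at least one impalad host is required")
        self._hosts = list(hosts)
        self._ring = cycle(self._hosts)
        self._is_healthy = is_healthy  # callable: host name -> bool

    def pick(self):
        """Return the next healthy host, skipping any that fail the check."""
        # Try each host at most once per call before giving up.
        for _ in range(len(self._hosts)):
            host = next(self._ring)
            if self._is_healthy(host):
                return host
        raise RuntimeError("no healthy impalad hosts available")


# Clients only ever talk to the balancer, never to individual hosts.
down = {"impalad-2"}
lb = RoundRobinBalancer(["impalad-1", "impalad-2", "impalad-3"],
                        lambda host: host not in down)
print([lb.pick() for _ in range(4)])
# ['impalad-1', 'impalad-3', 'impalad-1', 'impalad-3']
```

With `impalad-2` marked down, traffic simply alternates between the two remaining hosts, and clients never need to know which host failed. A production proxy does the same at the network level, with periodic health probes rather than a Python callback.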

How-to: Configure SAP HANA with Apache Impala (incubating)

Categories: How-to, Impala

Combining HANA and Impala can unlock a variety of new use cases that span the full range of enterprise data. Here’s how to do it.

Information is growing at an exponential rate, driven by enterprise applications and databases, and often takes the form of new types of data from sources such as social media, sensors, and mobile devices. Because it is not cost-effective to store and process all this information in an in-memory database…
