Before CDH 5.10, every CDH cluster had to have its own Apache Hive Metastore (HMS) backend database. This model is ideal for clusters where each cluster contains the data locally along with the metadata. In the cloud, however, many CDH clusters run directly on a shared object store (like Amazon S3), making it possible for the data to live across multiple clusters and beyond any cluster’s lifespan. In this scenario clusters need to regenerate and coordinate metadata for the underlying shared data individually.
This new (alpha) C++ client library for Apache Impala (incubating) and Apache Hive provides high-performance data access from Python.
Earlier this year, members of the Python data tools and Impala teams at Cloudera began collaborating to create a new C++ library to eventually become a faster, more memory-efficient replacement for impyla, PyHive, and other (largely pure Python) client libraries for talking to Hive and Impala.
As part of the drumbeat for Spark Summit West in San Francisco (June 6-8), learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL.
In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis,
Cloudera Enterprise 5.7 is now generally available (comprising CDH 5.7, Cloudera Manager 5.7, and Cloudera Navigator 2.6).
Cloudera is excited to announce the general availability of Cloudera Enterprise 5.7! Main highlights of this release include production-ready Hive-on-Spark functionality, which will help users accelerate their use of Apache Spark as a data processing standard; 4x performance gains for Apache Impala (incubating); easier cluster configuration and utilization reporting; and end-to-end encryption for Apache Spark data.
The recently-released Apache Hive 2.0 contains some exciting improvements, many of which are already available in CDH 5.x.
Recently, the Apache Hive community announced Hive 2.0.0. This is a larger release compared to the previous one (covered here), with a lengthy list of new features (many experimental), enhancements, and bug fixes. Cloudera’s Hive team have been working with the community for months to drive toward this significant release.