Combining HANA and Impala can unlock a variety of new use cases that span the full range of enterprise data. Here’s how to do it.
Information is growing at an exponential rate, driven by enterprise applications and databases, and it often takes the form of new types of data from sources such as social media, sensors, and mobile devices. Because it is not cost-effective to store and process all this information in an in-memory database,
Thanks to Richard Williamson of Silicon Valley Data Science for allowing us to republish the following post about his sample application based on Apache Spark, Apache Kudu (incubating), and Apache Impala (incubating).
Why should your infrastructure maintain a linear growth pattern when your business scales up and down during the day based on natural human cycles? There is an obvious need to maintain a steady baseline infrastructure to keep the lights on for your business,
Impala 2.5, now shipping in CDH 5.7, brings significant performance improvements and some highly requested features.
Impala has been a high-performance analytics query engine since the beginning. Even in its initial production release in 2013, it demonstrated performance 2x faster than a traditional DBMS, and each subsequent release has continued to demonstrate the wide performance gap between Impala’s analytic-database architecture and SQL-on-Apache Hadoop alternatives.
Using Apache Impala (incubating) on top of Apache Kudu (incubating) has significant performance benefits.
Apache Kudu (incubating) is the newest addition to the set of storage engines that integrate with the Apache Hadoop ecosystem. The promise of Kudu is to deliver high-scan performance, targeting analytical workloads, while allowing users to concurrently insert, update, and delete records. With these properties, Kudu becomes a viable alternative to existing combinations of HDFS and/or Apache HBase to achieve similar results with less complicated ETL pipelines,
Cluster admins will love the new cluster utilization reporting available in Cloudera Manager 5.7.
Enterprise data hub clusters are often shared by several teams. In such multi-tenant environments, cluster administrators must ensure that resources are shared fairly so that one tenant cannot run jobs that starve others. To give better visibility into resource consumption in multi-tenant environments, Cloudera Manager 5.7 (in Cloudera Enterprise Flex and Data Hub Editions) has a new feature for reporting cluster utilization that provides information about overall cluster usage,