The super-active Apache Spark community is exerting a strong gravitational pull within the Apache Hadoop ecosystem. I recently had that opportunity to ask Cloudera’s Apache Spark committers (Sean Owen, Imran Rashid [PMC], Sandy Ryza, and Marcelo Vanzin) for their perspectives about how the Spark community has worked and is working together, and the work to be done via the One Platform initiative to make the Spark stack enterprise-ready.
Recent Impala testing demonstrates its scalability to a large number of concurrent users.
Impala, the open source MPP query engine designed for high-concurrency SQL over Apache Hadoop, has seen tremendous adoption across enterprises in industries such as financial services, telecom, healthcare, retail, gaming, government, and advertising. Impala has unlocked the ability to use business intelligence (BI) applications on Hadoop; these applications support critical business needs such as data discovery,
Live updates about your query’s progress in the Impala Shell? That’s a win!
The Impala Shell is a great tool for quickly running exploratory queries, or testing new features in Impala. While Impala is pretty fast, some queries can still take several seconds or longer to complete. It’s therefore useful to be able to see how much progress the query has made and to get an idea of how long the query will take.
YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs.
Many factors go into deciding which data store should be used for production applications, including basic features, data model, and the performance characteristics for a given type of workload. It’s critical to have the ability to compare multiple data stores intelligently and objectively so that you can make sound architectural decisions.
As is their custom, Cloudera Engineering’s interns made innovation, especially for Apache Spark, the theme of the Summer season.
Cloudera has a long-time tradition of searching far and wide for the smartest summer engineering interns that it can find. Alumni of the program have become start-up co-founders, faculty at top-tier CS departments, employees at other prominent technology companies (including Google, Databricks, Uber, LinkedIn), as well as many current employees at Cloudera.