Learn how to build an Impala table around data that comes from non-Impala, or even non-SQL, sources.
As data pipelines increasingly incorporate NoSQL stores and loosely specified schemas, you might encounter data files (particularly in Apache Parquet format) whose precise table definition you do not know. This tutorial shows how you can build an Impala table around data that comes from non-Impala or even non-SQL sources,
YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs.
Many factors go into deciding which data store to use for production applications, including basic features, data model, and performance characteristics for a given type of workload. To make sound architectural decisions, you need to be able to compare multiple data stores intelligently and objectively.
Learn about the architecture of Ibis, the roadmaps for Ibis and Impala, and how to get started and contribute.
We created Ibis, a new Python data analysis framework now incubating in Cloudera Labs, with the goal of enabling data scientists and data engineers to be as productive working with big data as they are working with small and medium data today. In doing so, we will enable Python to become a true first-class language for Apache Hadoop,
This new Cloudera Labs project promises to deliver the great Python user experience and ecosystem at Hadoop scale.
Across the user community, you will find general agreement that the Apache Hadoop stack has progressed dramatically in just the past few years. For example, Search and Impala have moved Hadoop beyond batch processing, while developers are seeing significant productivity gains and additional use cases by transitioning from MapReduce to Apache Spark.
Thanks to such advances in the ecosystem,
This year will close out with new features for reliability, usability, and nested types, and in 2016, performance-related enhancements promise >20x gains.
It’s been roughly a year since we provided an update about the Impala roadmap. During that time, a number of milestones have been reached:
- Most Cloudera customers have deployed Impala to production across industries including financial services, retail, healthcare, gaming, government, advertising, and telecom.