Thanks to former Cloudera intern Jose Cambronero for the post below about his summer project, which involved contributions to MLlib in Apache Spark.
Data can come in many shapes and forms, and can be described in many ways. Statistics like the mean and standard deviation of a sample describe some of its important qualities. Less commonly used statistics such as skewness and kurtosis offer additional perspective on the data’s profile.
This new open source complement to HDFS and Apache HBase is designed to fill gaps in Hadoop’s storage layer that have given rise to stitched-together, hybrid architectures.
The set of data storage and processing technologies that defines the Apache Hadoop ecosystem is expansive and ever-improving, covering a very diverse set of customer use cases in mission-critical enterprise applications. At Cloudera, we’re constantly pushing the boundaries of what’s possible with Hadoop, making it faster,
This new core security layer provides a unified data access path for all Hadoop ecosystem components, while improving performance.
We’re thrilled to announce the beta availability of RecordService, a distributed, scalable data access service for unified access control and enforcement in Apache Hadoop. RecordService is Apache-licensed open source software that we intend to transition to the Apache Software Foundation. In this post, we’ll explain the motivation, system architecture,
Learn about the architecture of Ibis, the roadmaps for Ibis and Impala, and how to get started and contribute.
We created Ibis, a new Python data analysis framework now incubating in Cloudera Labs, with the goal of enabling data scientists and data engineers to be as productive working with big data as they are working with small and medium data today. In doing so, we will enable Python to become a true first-class language for Apache Hadoop,
This year will close out with new features for reliability, usability, and nested types, and in 2016, performance-related enhancements promise gains of more than 20x.
It’s been roughly a year since we provided an update about the Impala roadmap. During that time, a number of milestones have been reached:
- Most Cloudera customers have deployed Impala to production across industries including financial services, retail, healthcare, gaming, government, advertising, and telecom.