Category Archives: Community

Apache Hadoop 3.0.0-alpha2 Released

Categories: Community Hadoop

The Apache Hadoop project announced the release of 3.0.0-alpha2 on January 25th, 2017. This is the second alpha release in the 3.0.0 release series leading up to 3.0.0 GA, and incorporates 857 new fixes, improvements, and features since 3.0.0-alpha1 last September. It’s worth reading our previous blog post about 3.0.0-alpha1; in this post, we’ll discuss the new improvements that landed in alpha2.

Classpath Isolation for Hadoop Client Jars

Many Java developers have experienced the pain of classpath isolation.

Read more

Apache HBase is Everywhere

Categories: Community Events HBase

For Cloudera, Apache HBase has grown into a stable, scalable, mature, and critical component of the Apache Hadoop stack.

HBase adds the ability to do low-latency random reads and writes across your big data. While it is a key piece of the Apache Hadoop ecosystem, HBase itself has an ecosystem of projects and products that use it as a storage engine for systems such as time-series databases (OpenTSDB) or SQL-style databases (Apache Phoenix,

Read more

Meet Cloudera’s Apache Spark Committers

Categories: Community General Meet the Engineer Spark

The super-active Apache Spark community is exerting a strong gravitational pull within the Apache Hadoop ecosystem. I recently had the opportunity to ask Cloudera’s Apache Spark committers (Sean Owen, Imran Rashid [PMC], Sandy Ryza, and Marcelo Vanzin) for their perspectives about how the Spark community has worked and is working together, and the work to be done via the One Platform initiative to make the Spark stack enterprise-ready.

Apache Spark has recently become the most active project in the Apache Hadoop ecosystem (measured by number of contributors/commits over time),

Read more

Apache Spark Comes to Apache HBase with HBase-Spark Module

Categories: Cloudera Labs Community HBase Spark

The SparkOnHBase project in Cloudera Labs was recently merged into the Apache HBase trunk. In this post, learn the project’s history and what the future looks like for the new HBase-Spark module.

SparkOnHBase was first pushed to GitHub in July 2014, just six months after Spark Summit 2013 and five months after Apache Spark first shipped in CDH. That conference was a big turning point for me,

Read more

The New Wrangle Conference: Solving the Hardest Data Science Challenges from Startup to Enterprise

Categories: Community Data Science Events

Wrangle, a new conference dedicated to the practice of data science from startup to enterprise, debuts in San Francisco on Oct. 22, 2015.

Even as Cloudera introduces new tools for analytics and machine learning into its platform (like the recently announced Ibis project, for example), we are mindful of the fact that many of the hardest problems in data science cannot be solved by technology alone. From the smallest startups to the largest enterprises,

Read more