This Month in the Ecosystem
The ecosystem is evolving at such a rapid pace that important developments often pass through the public’s attention too quickly. So we think it might be helpful to bring you a regular digest (by no means complete!) of our favorite highlights. (This effort, by the way, has different goals than the fine Hadoop Weekly newsletter, which takes a more expansive view and which, as far as we’re concerned, you should subscribe to immediately.)
Find the first installment below. Although it covers considerably more than a month, we have some catching up to do before we can settle into a truly monthly cadence.
- The HBase Community Has Its Day: HBaseCon 2013
On June 13, more than 700 Apache HBase enthusiasts converged in San Francisco for deep dives into HBase operations, internals, case studies, and the ecosystem, including sessions from users such as Pinterest, Groupon, Yahoo!, Box, Salesforce.com, and Twitter. Unquestionably, the memes of the day were SQL-over-HBase (hello, Cloudera Impala), Search-over-HBase (hello, Cloudera Search), and enterprise readiness (metrics, scalability, reliability).
See session presentations and videos
- Search Comes to Hadoop
Cloudera Search, introduced as a beta release in late June, marries Apache Solr with Hadoop to bring a familiar, Google-like search experience to analysts who prefer to avoid Java and SQL, and to free the IT department from maintaining dedicated search clusters. It is yet another example of bringing the app/workload to the data, not the other way around.
Explore Cloudera Search
- Hadoop Memes: Hadoop Summit San Jose
As at HBaseCon, SQL-over-Hadoop was the prevailing theme (catch this panel). The imminent Hadoop 2.0 beta, including YARN, was another popular topic. (By the way, did you know that Hadoop 2.0, including MR2/YARN, has been shipping as a deployment option in CDH4 for over a year? Indeed, it has.) “Data is the New Bacon” T-shirts proved to be the fashion statement of the show.
See session presentations and videos
- Morphlines Graduates into the CDK
Cloudera Morphlines is a fascinating new open source framework for building and integrating ETL apps that move/transform data between Hadoop, Solr, data warehouses, and so on. As of July 10, the Morphlines libraries are part of the Cloudera Development Kit (CDK).
Learn more about Morphlines
- Concurrency and Authentication for Apache Hive: HiveServer2
For all its usefulness since Facebook donated it to the ecosystem, Hive has lacked several enterprise features, most notably support for concurrency and authentication. To close those gaps, Cloudera contributed HiveServer2, which brings concurrency and authentication to Hive, to Hive 0.11 (and ships it inside CDH).
Technical overview of HiveServer2
- UC Berkeley Creates “Masters of Big Data”
We think this is great: UC Berkeley’s School of Information announced that it will offer the country’s first fully online Master of Information and Data Science (MIDS) degree. The more people who have the skills to use Hadoop as end users, the bigger the Hadoop ecosystem will grow.
Learn more about the MIDS
- It’s All About the Use Case: Data Impact Awards
Finally, users of CDH have their very own award. Any intriguing use case (running in production, of course) is eligible for the awards, which will be announced at Strata + Hadoop World 2013.
Explore the Data Impact Awards
- Sentry Fills Hadoop’s Enterprise Security Gap
While Hadoop has strong security at the filesystem level, it lacks the fine-grained authorization needed to adequately secure access to data by users and BI applications. Sentry, a new authorization module for Hadoop (now shipping with CDH and Impala and recently proposed for the Apache Incubator), aims to close that gap.
Technical overview of Sentry
- OSCON Attendees Reveal Big Interest in Big Data
Although OSCON has historically not focused on Big Data, the 2013 conference proved to be a hidden reservoir of interest in the topic, if visitors to the Cloudera exhibit were any indication. The QuickStart VM and “Data is the New Bacon/Tofu” T-shirts were powerful draws.
Read more about Cloudera’s presence at OSCON 2013
- It’s All About You: Community Forums for Cloudera Users
This is one of my favorite developments. As a complement to tried-and-true mailing lists, users of Cloudera Standard and customers of Cloudera Enterprise can now ask questions, get answers, and build their online reputations in a new community forums environment.
Join the conversation at community.cloudera.com
- Columnar Storage for Hadoop: Parquet 1.0
Parquet, the open source columnar storage project co-founded by Cloudera and Twitter, hit the 1.0 milestone. And Impala users worldwide rejoiced, because Impala performance will only improve from here.
Learn more about Parquet 1.0
- Making Data Integration Delicious: Apache Sqoop Cookbook
Sqoop is an invaluable tool for integrating Hadoop clusters with traditional, relational infrastructure. This new book from Apache Sqoop Committers/PMC Members Kathleen Ting and Jarek Jarcec Cecho will help you use that tool to great effect.
Peek inside the Apache Sqoop Cookbook
The next installment of “This Month in the Ecosystem” will publish in early September.
Justin Kestelyn is Cloudera’s developer outreach director.