Category Archives: Hadoop

Community Meetups during Strata + Hadoop World 2014

Categories: Community Events General Hadoop

The meetup opportunities during the conference week are more expansive than ever — spanning Impala, Spark, HBase, Kafka, and more.

Strata + Hadoop World 2014 is a kaleidoscope of experiences for attendees, and those experiences aren’t contained within the conference center’s walls. For example, the meetups that occur during the conference week (which is concurrent with NYC DataWeek) are a virtual track for developers, and with Strata + Hadoop World being bigger than ever,

Read More

Getting Started with Big Data Architecture

Categories: Hadoop

What does a “Big Data engineer” do, and what does “Big Data architecture” look like? In this post, you’ll get answers to both questions.

Apache Hadoop has come a long way in its relatively short lifespan. It began as a reliable storage pool with integrated batch processing using the scalable, parallelizable (though inherently sequential) MapReduce framework. More recently, it has gained real-time components such as Impala for interactive SQL queries, along with integration with Apache Solr as a search engine for free-form text exploration.

Read More

What’s Next for Impala: Focus on Advanced SQL Functionality

Categories: Hadoop Impala

Impala 2.0 will add much more complete SQL functionality to what is already the fastest SQL-on-Hadoop solution available.

In September 2013, we provided a roadmap for Impala — the open source MPP SQL query engine for Apache Hadoop, which was on release 1.1 at the time — that documented planned functionality through release 2.0 and beyond.

Impala is now on release 1.4, with many major features delivered since our previous roadmap update,

Read More

Big Data Benchmarks: Toward Real-Life Use Cases

Categories: Guest Hadoop Ops and DevOps Performance

The Transaction Processing Performance Council (TPC), working with Cloudera, recently announced the new TPCx-HS benchmark, a good first step toward providing a Big Data benchmark.

In this interview by Roberto Zicari with Francois Raab, the original author of the TPC-C Benchmark, and Yanpei Chen, a Performance Engineer at Cloudera, the interviewees share their thoughts on the next step for benchmarks that reflect real-world use cases.

This interview was originally published at ODBMS.org;

Read More