Author Archives: Brock Noland

Native Parquet Support Comes to Apache Hive

Categories: Hive Impala Parquet

Bringing Parquet support to Hive was a community effort that deserves congratulations!

Previously, this blog introduced Parquet, an efficient ecosystem-wide columnar storage format for Apache Hadoop. As discussed in that blog post, Parquet encodes data extremely efficiently, as described in Google’s original Dremel paper. (For more technical details on the Parquet format, read “Dremel made simple with Parquet”, or go directly to the open, community-driven Parquet Format specification.)
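
The toy Java sketch below (standard library only) makes the columnar idea concrete. When records are stored column by column, values from the same column sit next to each other, so repeated values collapse into short runs. The class and method names (`ColumnarSketch`, `column`, `runLengthEncode`) are hypothetical, and this is not Parquet's file layout or API. The real format uses more involved encodings, such as dictionary encoding and a run-length/bit-packing hybrid, as documented in the Parquet Format specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Toy illustration only: NOT the Parquet format. It shows how a
// column-oriented layout groups similar values so they encode compactly.
final class ColumnarSketch {

    // Extract one column from row-oriented records (each row is a String[]).
    static List<String> column(List<String[]> rows, int index) {
        List<String> values = new ArrayList<>();
        for (String[] row : rows) {
            values.add(row[index]);
        }
        return values;
    }

    // Run-length encode a column as "value xCount" entries.
    static List<String> runLengthEncode(List<String> column) {
        List<String> runs = new ArrayList<>();
        int i = 0;
        while (i < column.size()) {
            String value = column.get(i);
            int j = i;
            // Advance j past every consecutive copy of the current value.
            while (j < column.size() && Objects.equals(column.get(j), value)) {
                j++;
            }
            runs.add(value + " x" + (j - i));
            i = j;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Hypothetical rows of (country, year).
        List<String[]> rows = List.of(
            new String[] {"US", "2013"},
            new String[] {"US", "2013"},
            new String[] {"US", "2014"},
            new String[] {"DE", "2014"});
        List<String> countries = column(rows, 0);
        System.out.println(runLengthEncode(countries)); // prints [US x3, DE x1]
    }
}
```

In a row-oriented layout the country values would be interleaved with the years, so no long runs would form. Grouping values by column is what exposes them to the encoder.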

Before discussing the Parquet Hive integration, …

About Apache Flume FileChannel

Categories: Data Ingestion Flume General

The post below was originally published on blogs.apache.org and is republished here for your reading pleasure.

This blog post is about Apache Flume’s File Channel. Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault-tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms.
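
In Flume, a source puts events into a channel and a sink takes them out. Each side works inside a transaction provided by the channel, so events leave the channel only once they have been handed off successfully. The sketch below illustrates that put/take/commit/rollback idea with a hypothetical, single-threaded, in-memory `ToyChannel`. It is not Flume's `Channel` or `Transaction` API, and unlike the File Channel it keeps nothing on disk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy channel: in-memory, not durable, not thread-safe.
final class ToyChannel {
    // Events that committed put-transactions have made visible to sinks.
    private final Deque<String> committed = new ArrayDeque<>();

    // A transaction stages puts and remembers takes so both can be undone.
    final class Tx {
        private final List<String> staged = new ArrayList<>();
        private final List<String> taken = new ArrayList<>();

        // Source side: staged events stay invisible until commit().
        void put(String event) {
            staged.add(event);
        }

        // Sink side: removes the oldest committed event, or returns null if empty.
        String take() {
            String event = committed.pollFirst();
            if (event != null) {
                taken.add(event);
            }
            return event;
        }

        // Publish staged puts and make takes permanent.
        void commit() {
            committed.addAll(staged);
            staged.clear();
            taken.clear();
        }

        // Discard staged puts and return taken events to the head of the
        // queue, iterating backwards so their original order is preserved.
        void rollback() {
            for (int i = taken.size() - 1; i >= 0; i--) {
                committed.addFirst(taken.get(i));
            }
            staged.clear();
            taken.clear();
        }
    }

    Tx begin() {
        return new Tx();
    }

    int size() {
        return committed.size();
    }

    public static void main(String[] args) {
        ToyChannel channel = new ToyChannel();

        ToyChannel.Tx source = channel.begin();
        source.put("e1");
        source.put("e2");
        source.commit(); // both events are now visible to sinks

        ToyChannel.Tx sink = channel.begin();
        sink.take();     // takes "e1"
        sink.rollback(); // delivery failed, so "e1" goes back
        System.out.println(channel.size()); // prints 2
    }
}
```

A sink-side rollback puts taken events back at the head of the queue in their original order. That property lets a failed delivery be retried without losing data. A durable channel such as the File Channel must also persist this state so that it survives an agent restart.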

Apache MRUnit Is Now A Top Level Project

Categories: MapReduce

This post was originally published on the Apache Software Foundation’s MRUnit blog.

Apache MRUnit has graduated from the Apache Incubator to become an Apache TLP (Top Level Project)! MRUnit is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that can automatically verify that the software you write performs as intended.
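
MRUnit's drivers (for example `MapDriver`, configured with `withInput(...)` and `withOutput(...)` and checked with `runTest()`) run a real Hadoop `Mapper` against given input and compare its emitted pairs with the expected ones. The sketch below mimics that pattern using only the Java standard library, so it runs without Hadoop on the classpath. `WordCountMapSketch`, its `map` function, and its `runTest` helper are hypothetical stand-ins, not MRUnit code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free illustration of unit testing a map function.
final class WordCountMapSketch {

    // Word-count style map step: emit (word, 1) for each whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) { // a blank line splits into a single "" token
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // "Given this input, expect exactly these outputs, in this order."
    static void runTest(String input, List<Map.Entry<String, Integer>> expected) {
        List<Map.Entry<String, Integer>> actual = map(input);
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        runTest("hadoop mrunit hadoop", List.of(
            Map.entry("hadoop", 1),
            Map.entry("mrunit", 1),
            Map.entry("hadoop", 1)));
        System.out.println("map test passed");
    }
}
```

Because the expected output is checked in order, the test also pins down that the map step emits one pair per token, duplicates included. Summing the counts is left to the reduce step.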

Apache MRUnit 0.9.0-incubating has been released!

Categories: Community Testing

This post was originally published on the Apache Software Foundation’s blog.

We (the Apache MRUnit team) have just released Apache MRUnit 0.9.0-incubating (tarball, nexus, javadoc). Apache MRUnit, an Apache Incubator project, is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that can automatically verify that the software you write performs as intended.

Apache MRUnit 0.8.1-incubating has been released!

Categories: General Hadoop

This post was originally published on the Apache Software Foundation’s MRUnit blog.

We (the Apache MRUnit team) have just released Apache MRUnit 0.8.1-incubating. Apache MRUnit is an Apache Incubator project. MRUnit is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that can automatically verify that the software you write performs as intended.