Apache Avro was added to the Hadoop family last April, and last year there were three Avro releases: 1.0.0 in July, 1.1.0 in September and 1.2.0 in October. After the 1.2.0 release, Doug Cutting introduced Avro: a New Format for Data Interchange on this blog, and the Avro team went right to work building the next release of Avro.
It’s a new year and there’s a new Avro: 1.3.0.
This summer I sent the following tweet, “Had lunch today at Twitter HQ. Thanks for the invite, @! Great lunch conversation. Smart, friendly and fun team.” Kevin Weil leads the analytics team at Twitter and is an active member of the Hadoop community, and his colleague Eric Maland leads Operations. Needless to say, Twitter is doing amazing things with Hadoop. This guest blog post from Kevin and Eric covers one of Twitter’s open-source projects, which provides splittable LZO compression for Hadoop.
In March of this year, we released our distribution for Apache Hadoop. Our initial focus was on stability and making Hadoop easy to install. This original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time: 0.18.3. We packaged Apache Hadoop, Pig and Hive into RPMs and Debian packages to make managing Hadoop installations easier. For the first time ever, Hadoop cluster managers were able to bring up a deployment by running one of the following commands, depending on their Linux distribution:
# yum install hadoop
# apt-get install hadoop
As proof of this,
We asked Brian Bockelman, a Post Doc Research Associate in the Computer Science & Engineering Department at the University of Nebraska–Lincoln, to tell us how Hadoop is being used to process the results from High-Energy Physics experiments. His response gives insights into the kind and volume of data that High-Energy Physics experiments generate and how Hadoop is being used at the University of Nebraska. -Matt
In the least technical language,