Tag Archives: hadoop user group

Apache Hive on Apache Spark: The First Demo

Categories: Community Hive MapReduce Spark

The community effort to make Apache Spark an execution engine for Apache Hive is making solid progress.

Apache Spark is quickly becoming the programmatic successor to MapReduce for data processing on Apache Hadoop. Over the course of its short history, it has become one of the most popular projects in the Hadoop ecosystem, and is now supported by multiple industry vendors—ensuring its status as an emerging standard.

Two months ago Cloudera,

Read more

Community Meetups at Strata Conference + Hadoop World 2013

Categories: Community Events General

Strata Conference + Hadoop World 2013 (Oct. 28-30 in New York City) is approaching (register here for an automatic 20% discount), which means it’s time to sort out your meetup schedule!

A variety of meetups are planned throughout the week (something for everyone!), both onsite at the conference hotel and offsite. Use the links below to RSVP.

(YES, there will be food, adult refreshments,

Read more

Process a Million Songs with Apache Pig

Categories: CDH Community MapReduce Pig

The following is a guest post kindly offered by Adam Kawa, a 26-year-old Hadoop developer from Warsaw, Poland. This post was originally published in a slightly different form on his blog, Hakuna MapData!

I recently found an interesting dataset called the Million Song Dataset (MSD), which contains detailed acoustic and contextual data about a million songs. For each song we can find information like title, hotness,

Read more

Apache HBase Do’s and Don’ts

Categories: CDH Community HBase

I recently gave a talk at the LA Hadoop User Group about Apache HBase Do’s and Don’ts. The audience was excellent and asked very informed, well-articulated questions. Jody from Shopzilla was an excellent host, and I owe him a big thanks for giving me the opportunity to speak with over 60 LA Hadoopers. Since not everyone lives in LA or could make it to the meetup, I’ve summarized some of the salient points here.

Read more

Introducing Sqoop

Categories: Data Ingestion General Hadoop Hive

In addition to providing you with a dependable release of Hadoop that is easy to configure, at Cloudera we also focus on developing tools to extend Hadoop’s usability, and make Hadoop a more central component of your data infrastructure. In this vein, we’re proud to announce the availability of Sqoop, a tool designed to easily import information from SQL databases into your Hadoop cluster.

Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:

  • Imports individual tables or entire databases to files in HDFS
  • Generates Java classes to allow you to interact with your imported data
  • Provides the ability to import from SQL databases straight into your Hive data warehouse
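As a rough illustration of the capabilities listed above, here is a minimal Python sketch that assembles a single-table import command. It assumes the `sqoop import` subcommand and the `--connect`, `--table`, and `--hive-import` options as they appear in later Sqoop 1.x releases; the release announced here may use a different syntax. The helper name, JDBC URL, and table name are hypothetical placeholders.

```python
import shlex


def build_sqoop_import(jdbc_url, table, hive_import=False):
    """Return the argument list for a single-table Sqoop import (sketch)."""
    # Assumed Sqoop 1.x syntax: sqoop import --connect <jdbc-url> --table <name>
    args = ["sqoop", "import", "--connect", jdbc_url, "--table", table]
    if hive_import:
        # Assumed flag: also load the imported table into Hive
        args.append("--hive-import")
    return args


# Hypothetical database URL and table name, for illustration only
cmd = build_sqoop_import("jdbc:mysql://db.example.com/corp", "employees",
                         hive_import=True)

# Print the command as a shell-safe string rather than running it
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list and only printing it keeps the sketch runnable on a machine without Sqoop installed; on a real cluster the list could be handed to `subprocess.run`.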

After setting up an import job in Sqoop,

Read more