Category Archives: Hive

Grouping Related Trends with Hadoop and Hive

Categories: Community General Hadoop Hive

(guest blog post by Pete Skomoroch)

In a previous post, I outlined how to build a basic trend tracking site called trendingtopics.org with Cloudera’s Distribution for Hadoop and Hive. TrendingTopics uses Hadoop to identify the top articles trending on Wikipedia and displays related news stories and charts. The data powering the site was pulled from an Amazon EBS Wikipedia Public Dataset containing 8 months of hourly pageview logfiles.

Read more

CDH2: Cloudera’s Distribution for Apache Hadoop 2

Categories: Community Hadoop Hive Pig

In March of this year, we released our distribution for Apache Hadoop. Our initial focus was on stability and making Hadoop easy to install. This original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time: 0.18.3. We packaged Apache Hadoop, Pig, and Hive into RPMs and Debian packages to make managing Hadoop installations easier. For the first time, Hadoop cluster managers were able to bring up a deployment by running one of the following commands, depending on their Linux distribution:

As proof of this,

Read more

Introducing Sqoop

Categories: Data Ingestion General Hadoop Hive

In addition to providing you with a dependable release of Hadoop that is easy to configure, at Cloudera we also focus on developing tools that extend Hadoop’s usability and make Hadoop a more central component of your data infrastructure. In this vein, we’re proud to announce the availability of Sqoop, a tool designed to easily import information from SQL databases into your Hadoop cluster.

Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:

  • Imports individual tables or entire databases to files in HDFS
  • Generates Java classes to allow you to interact with your imported data
  • Provides the ability to import from SQL databases straight into your Hive data warehouse

After setting up an import job in Sqoop,

Read more

Hive and JobTracker Needed Logos…

Categories: Hadoop Hive

In the process of working on a few things here, I wanted to add some links to launch Apache Hive and the Hadoop JobTracker. At first I considered just adding the links, but I found myself wanting a button of some sort: an icon for them. I didn’t want to just use the (awesomely cute) Apache Hadoop logo elephant because these things are related to and part of Hadoop, but they aren’t Hadoop itself…

Read more