Cloudera Developer Blog

Big Data best practices, how-to's, and internals from Cloudera Engineering and the community


Migrating from Hive CLI to Beeline: A Primer

Migrating from the Hive CLI to Beeline isn’t as simple as changing the executable name, but this post makes it easy nonetheless.

In its original form, Apache Hive was a heavyweight command-line tool that accepted queries and executed them utilizing MapReduce. Later, the tool split into a client-server model, in which HiveServer1 is the server (responsible for compiling and monitoring MapReduce jobs) and Hive CLI is the command-line interface (sends SQL to the server).

This Month in the Ecosystem (January 2014)

Welcome to our fifth edition of “This Month in the Ecosystem,” a digest of highlights from January 2014 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

With the close of 2013, we also thought it appropriate to include some high points from across the year (not listed in any particular order):

How-to: Write and Run Giraph Jobs on Hadoop

Create a test environment for writing and testing Giraph jobs, or just for playing around with Giraph and small sample datasets.

Apache Giraph is a scalable, fault-tolerant implementation of graph-processing algorithms in Apache Hadoop clusters of up to thousands of computing nodes. Giraph is in use at companies like Facebook and PayPal, for example, to help represent and analyze the billions (or even trillions) of connections across massive datasets. Giraph was inspired by Google’s Pregel framework and integrates well with Apache Accumulo, Apache HBase, Apache Hive, and Cloudera Impala.

Spark is Now Generally Available for Cloudera Enterprise

Cloudera is announcing the general availability of support for Spark, bringing interactive machine learning and stream processing to enterprise data hubs.

Cloudera is pleased to announce the immediate availability of its first release of Apache Spark for Cloudera Enterprise (comprising CDH and Cloudera Manager).

How Wajam Answers Business Questions Faster With Hadoop

Thanks to Xavier Clements of Wajam for allowing us to re-publish his blog post about Wajam’s Hadoop experiences below!

Wajam is a social search engine that gives you access to the knowledge of your friends. We gather your friends’ recommendations from Facebook, Twitter, and other social platforms and serve these back to you on supported sites like Google, eBay, TripAdvisor, and Wikipedia.

How-to: Create a Simple Hadoop Cluster with VirtualBox

Set up a CDH-based Hadoop cluster in less than an hour using VirtualBox and Cloudera Manager.

Thanks to Christian Javet for his permission to republish his blog post below!

Pro Tips for Pitching an HBaseCon Talk

These suggestions from the Program Committee offer an inside track to getting your talk accepted!

With HBaseCon 2014 (in San Francisco on May 5) Call for Papers closing in just over three weeks (on Feb. 14 — sooner than you think), there’s no better time than “now” to start thinking about your proposal.

How-to: Get Started Writing Impala UDFs

Cloudera provides docs and a sample build environment to help you get easily started writing your own Impala UDFs.

User-defined functions (UDFs) let you code your own application logic for processing column values during a Cloudera Impala query. For example, a UDF could perform calculations using an external math library, combine several column values into one, do geospatial calculations, or other kinds of tests and transformations that are outside the scope of the built-in SQL operators and functions.

Meet the Engineer: Romain Rigaux

In this installment of “Meet the Engineer” we speak with Romain Rigaux, a Software Engineer on the Hue team.

What do you do at Cloudera, and in which project are you involved?
Currently I work on Hue, the open source Web interface that lets users do Big Data analysis directly from their browser. Its goal is to make that process easier, so that more users can get more insights, more quickly.

NYU, Analytics, and Cloudera’s QuickStart VM

The Cloudera QuickStart VM is an important platform for learning any Hadoop-related curriculum.

In the Fall 2013 semester, more than 30 NYU graduate students completed the Real-time and Big Data Analytics course at the NYU Courant Institute of Mathematical Sciences, for which I served as instructor.

Newer Posts Older Posts