Cloudera Developer Blog
Big Data best practices, how-to's, and internals from Cloudera Engineering and the community
Migrating from the Hive CLI to Beeline isn’t as simple as changing the executable name, but this post makes it easy nonetheless.
In its original form, Apache Hive was a heavyweight command-line tool that accepted queries and executed them using MapReduce. Later, the tool was split into a client-server model, in which HiveServer1 is the server (responsible for compiling and monitoring MapReduce jobs) and Hive CLI is the command-line interface (which sends SQL to the server).
With the close of 2013, we also thought it appropriate to include some high points from across the year (not listed in any particular order):
Create a test environment for writing and testing Giraph jobs, or just for playing around with Giraph and small sample datasets.
Apache Giraph is a scalable, fault-tolerant implementation of graph-processing algorithms in Apache Hadoop clusters of up to thousands of computing nodes. Companies such as Facebook and PayPal use Giraph to help represent and analyze the billions (or even trillions) of connections across massive datasets. Giraph was inspired by Google’s Pregel framework and integrates well with Apache Accumulo, Apache HBase, Apache Hive, and Cloudera Impala.
Cloudera is announcing the general availability of support for Spark, bringing interactive machine learning and stream processing to enterprise data hubs.
Cloudera is pleased to announce the immediate availability of its first release of Apache Spark for Cloudera Enterprise (comprising CDH and Cloudera Manager).
Thanks to Xavier Clements of Wajam for allowing us to re-publish his blog post about Wajam’s Hadoop experiences below!
Wajam is a social search engine that gives you access to the knowledge of your friends. We gather your friends’ recommendations from Facebook, Twitter, and other social platforms and serve these back to you on supported sites like Google, eBay, TripAdvisor, and Wikipedia.
Set up a CDH-based Hadoop cluster in less than an hour using VirtualBox and Cloudera Manager.
Thanks to Christian Javet for his permission to republish his blog post below!
These suggestions from the Program Committee offer an inside track to getting your talk accepted!
With the Call for Papers for HBaseCon 2014 (in San Francisco on May 5) closing in just over three weeks (on Feb. 14, sooner than you think), there’s no better time than now to start thinking about your proposal.
Cloudera provides docs and a sample build environment to help you get started easily writing your own Impala UDFs.
User-defined functions (UDFs) let you code your own application logic for processing column values during a Cloudera Impala query. For example, a UDF could perform calculations using an external math library, combine several column values into one, do geospatial calculations, or perform other kinds of tests and transformations that are outside the scope of the built-in SQL operators and functions.
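To make the idea of a scalar UDF concrete, here is a minimal, language-agnostic sketch in Python. It is not the Impala UDF API: it only illustrates the shape of the logic, a function that takes several column values from one row and returns a single value. The names haversine_km and apply_udf, the column names, and the Earth-radius constant are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Hypothetical scalar 'UDF': combine four column values into one distance (km)."""
    # Propagate NULL (None) inputs, as SQL functions commonly do.
    if None in (lat1, lon1, lat2, lon2):
        return None
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def apply_udf(rows, udf, columns):
    """Stand-in for the query engine: call the UDF once per row on the named columns."""
    return [udf(*(row[c] for c in columns)) for row in rows]


rows = [
    {"lat1": 0.0, "lon1": 0.0, "lat2": 0.0, "lon2": 180.0},
    {"lat1": 10.0, "lon1": 20.0, "lat2": None, "lon2": 20.0},
]
distances = apply_udf(rows, haversine_km, ["lat1", "lon1", "lat2", "lon2"])
```

In a real query the engine, not a helper like apply_udf, would invoke the function once per row, and the function itself would be written against the engine's own UDF interface.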
In this installment of “Meet the Engineer” we speak with Romain Rigaux, a Software Engineer on the Hue team.
What do you do at Cloudera, and in which project are you involved?
Currently I work on Hue, the open source Web interface that lets users do Big Data analysis directly from their browser. Its goal is to make that process easier, so that more users can get more insights, more quickly.
The Cloudera QuickStart VM is an important platform for learning any Hadoop-related curriculum.
In the Fall 2013 semester, more than 30 NYU graduate students completed the Real-time and Big Data Analytics course at the NYU Courant Institute of Mathematical Sciences, for which I served as instructor.