Cloudera Engineering Blog · Hue Posts
Learn how to set up Hue, the open source GUI that makes Apache Hadoop easier to use, on your Mac.
You might already have all the prerequisites installed, but we are going to show how to start from a fresh Yosemite (10.10) install and end up with Hue running on your Mac in almost no time!
Cloudera recently announced formal support for Apache Kafka. This simple use case illustrates how to make web log analysis, powered in part by Kafka, one of your first steps in a pervasive analytics journey.
If you are not looking at your company’s operational logs, then you are at a competitive disadvantage in your industry. Web server logs, application logs, and system logs are all valuable sources of operational intelligence, uncovering potential revenue opportunities and helping drive down costs. Whether your firm is an advertising agency that analyzes clickstream logs for customer insight, or you are responsible for protecting the firm’s information assets by preventing cyber-security threats, you should strive to get the most value from your data as soon as possible.
Thanks to new improvements in Hue, CDH 5.2 offers the best GUI yet for using Hadoop.
CDH 5.2 includes important new usability functionality via Hue, the open source GUI that makes Apache Hadoop easy to use. In addition to shipping a brand-new app for managing security permissions, this release is particularly feature-packed, and is becoming a great complement to BI tools from Cloudera partners like Tableau, MicroStrategy, and Zoomdata because a more usable Hadoop translates into better BI overall across your organization!
This new feature, jointly developed by Cloudera and Intel engineers, makes management of role-based security much easier in Apache Hive, Impala, and Hue.
Apache Sentry (incubating) provides centralized authorization for services and applications in the Apache Hadoop ecosystem, allowing administrators to set up granular, role-based protection on resources, and to review them in one place. Previously, Sentry allowed only designated administrators to GRANT and REVOKE privileges on an authorizable object. In Apache Sentry 1.5.0 (shipping inside CDH 5.2), we have implemented a new feature (SENTRY-327) that allows admin users to delegate the GRANT privilege to other users using WITH GRANT OPTION. If a user has the GRANT OPTION privilege on a specific resource, the user can now grant the GRANT privilege to other users on the same resource. Apache Hive, Impala, and Hue have all been updated to take advantage of this new Sentry functionality.
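The delegation rule described above can be illustrated with a small in-memory model. This is only a sketch of the semantics, not Sentry's actual API or data model: the `PrivilegeStore` class, its method names, and the user and resource names are all hypothetical and were invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PrivilegeStore:
    """Toy model of privilege delegation (hypothetical, not Sentry's API)."""
    admins: set = field(default_factory=set)
    # Maps (user, resource, action) -> True if the user may re-grant it.
    grants: dict = field(default_factory=dict)

    def can_grant(self, user, resource, action):
        # Admins can always grant; anyone else needs the grant option
        # on this exact resource and action.
        return user in self.admins or self.grants.get((user, resource, action), False)

    def grant(self, grantor, grantee, resource, action, with_grant_option=False):
        if not self.can_grant(grantor, resource, action):
            raise PermissionError(f"{grantor} cannot grant {action} on {resource}")
        key = (grantee, resource, action)
        # Never silently downgrade an existing grant option.
        self.grants[key] = self.grants.get(key, False) or with_grant_option

    def has_privilege(self, user, resource, action):
        return (user, resource, action) in self.grants


store = PrivilegeStore(admins={"admin"})
# The admin delegates SELECT on one resource, including the right to re-grant it.
store.grant("admin", "alice", "db.sales", "SELECT", with_grant_option=True)
# alice can now pass SELECT on the same resource to bob, without the grant option.
store.grant("alice", "bob", "db.sales", "SELECT")
```

In this model, bob ends up with the privilege but cannot pass it on, and alice's grant option does not extend to any other resource.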
An improved Search app in Hue 3.6 makes the Hadoop user experience even better.
Hue 3.6 (now packaged in CDH 5.1) has brought the second version of the Search App up to even higher standards. The user experience has been greatly improved, as the app now provides a very easy way to build custom dashboards and visualizations.
Get started with Apache Hadoop and use-case examples online in just seconds.
Today, we announced the Cloudera Live Read-Only Demo, a new online service for developers and analysts (currently in public beta) that makes it easy to learn, explore, and try out CDH, Cloudera’s open source software distribution containing Apache Hadoop and related projects. No downloads, no installations, no waiting — just point-and-play!
Our thanks to Amar Parkash, a Software Developer at Goibibo, a leading travel portal in India, for the enthusiastic support of Hue you’ll read below.
At Goibibo, we use Hue in our production environment. I came across Hue while looking for a near real-time log search tool and got to know about Cloudera Search and the interface provided by Hue. I tried it on my machine and was really impressed by the UI it provides for Apache Hive, Apache Pig, HDFS, job browser, and basically everything in the Big Data domain. We immediately deployed Hue in production, and that has been one of the best decisions we have ever made for our data platform at Goibibo.
Hue users can learn a lot about new features by following a steady stream of new demos.
Hue, the open source Web UI that makes Apache Hadoop easier to use, is now a standard across the ecosystem — shipping within multiple software distributions and sandboxes. One of the reasons for its success is an agile developer community behind it that is constantly rolling out new features to its users.
Integrating Hue with LDAP can help make your secure Hadoop apps as widely consumed as possible.
Hue, the open source Web UI that makes Apache Hadoop easier to use, easily integrates with your corporation’s existing identity management systems and provides authentication mechanisms for SSO providers. So, by changing a few configuration parameters, your employees can start analyzing Big Data in their own browsers under an existing security policy.
In this installment of “Meet the Engineer” we speak with Romain Rigaux, a Software Engineer on the Hue team.
What do you do at Cloudera, and in which project are you involved?
Currently I work on Hue, the open source Web interface that lets users do Big Data analysis directly from their browser. Its goal is to make that process easier, so that more users can get more insights, more quickly.
The team behind Hue, the open source Web UI that makes Apache Hadoop easier to use, strikes again with a new Spark app.
Editor’s note: This post was recently published on the Hue blog. We republish it here for your convenience.
You can use Hue and Cloudera Search to build your own integrated Big Data search app.
In a previous post, you learned how to analyze data using Apache Hive via Hue’s Beeswax and Catalog apps. This time, you’ll see how to make Yelp Dataset Challenge data searchable by indexing it and building a customizable UI with the Hue Search app.
Indexing Data in Cloudera Search
Hue, the open source Web UI that makes Apache Hadoop easier to use, has a brand-new application that enables transferring data between relational databases and Hadoop. This new application is driven by Apache Sqoop 2 and has several user experience improvements, to boot.
Sqoop is a batch data migration tool for transferring data between traditional databases and Hadoop. The first version of Sqoop is a heavy client that drives and oversees data transfer via MapReduce. In Sqoop 2, the majority of the work was moved to a server that a thin client communicates with. Also, any client can communicate with the Sqoop 2 server over its JSON-REST protocol. Sqoop 2 was chosen instead of its predecessors because of its client-server design.
Importing from MySQL to HDFS
There’s good news for users of Hue, the open source web UI that makes Apache Hadoop easier to use: A new SAML 2.0-compliant backend, which is scheduled to ship in the next release of the Cloudera platform, will provide a better authentication experience for users as well as IT.
With this new feature, single sign-on (SSO) authentication can be achieved instead of using Hue credentials – thus, user credentials can be managed centrally (a big benefit for IT), and users needn’t log in to Hue if they have already logged in to another Web application sharing the SSO (a big benefit for users).
The following post was originally published by the Hue Team at the Hue blog in a slightly different form.
Hue, the open source web GUI that makes Apache Hadoop easy to use, has supported Cloudera Impala since its inception to enable fast, interactive SQL queries from within your browser. In this post, you’ll see a demo of Hue’s Impala app in action and explore its impressive query speed for yourself.
Impala App Demo
The following post was originally published by the Hue Team at the Hue blog in a slightly different form.
In this post, we’ll take a look at the new Apache HBase Browser app, which was added in Hue 2.5 and has improved significantly since then. To get the Hue HBase browser, grab Hue via CDH 4.4 packages, via Cloudera Manager, or build it directly from GitHub.
Few projects within the Apache Hadoop umbrella have as much end-user visibility as Hue, the open source Web UI that makes Hadoop easier to use. Due to the great number of potential end users, it is useful to add a degree of fault tolerance to your deployment. This how-to describes how to achieve higher availability by placing several Hue instances behind a load balancer.
This tutorial demonstrates how to set up high availability by:
This installment of the Hue demo series is about accessing the Hive Metastore from Hue, as well as using HCatalog with Hue. (Hue, of course, is the open source Web UI that makes Apache Hadoop easier to use.)
What is HCatalog?
HCatalog is a module in Apache Hive that enables non-Hive scripts to access Hive tables. You can then directly load tables with Apache Pig or MapReduce without having to redefine the input schemas or to track or duplicate the data’s location.
For those who are unfamiliar with it, Hue is a very popular, end-user focused, fully open source Web UI designed for interaction with Apache Hadoop and its ecosystem components. Founded by Cloudera employees, Hue has been around for quite some time, but only in the last 12 months has it evolved into the great ramp-up and interaction tool it is today. It’s fair to say that Hue is the most popular open source GUI for the Hadoop ecosystem among beginners — as well as a valuable tool for seasoned Hadoop users (and users generally in an enterprise environment) – and it is the only end-user tool that ships with Hadoop distributions today. In fact, Hue is even redistributed and marketed as part of other user-experience and ramp-up-on-Hadoop VMs in the market.
In version 2.4 of Hue, the open source Web UI that makes Apache Hadoop easier to use, a new app was added in addition to more than 150 fixes: Search!
Using this app, which is based on Apache Solr, you can now search across Hadoop data just like you would do keyword searches with Google or Yahoo! In addition, a wizard lets you tweak the result snippets and tailors the search experience to your needs.
For years, Cloudera has provided virtual machines that give you a working Apache Hadoop environment out-of-the-box. It’s the quickest way to learn and experiment with Hadoop right from your desktop.
We’re constantly updating and improving the QuickStart VM, and the latest release includes two new Cloudera products that give you easier and faster access to your data: Cloudera Search and Cloudera Impala. We’ve also added corresponding applications to Hue, an open source web-based interface for Hadoop and the easiest way to interact with your data.
In the previous installment of the demo series about Hue — the open source Web UI that makes Apache Hadoop easier to use — you learned how to analyze data with Hue using Apache Hive via Hue’s Beeswax and Catalog applications. In this installment, we’ll focus on using the new editor for Apache Pig in Hue 2.3.
Complementing the editors for Hive and Cloudera Impala, the Pig editor provides a great starting point for exploration and real-time interaction with Hadoop. This new application lets you edit and run Pig scripts interactively in an editor tailored for a great user experience. Features include:
We’re very happy to announce the 2.3 release of Hue, the open source Web UI that makes Apache Hadoop easier to use.
Hue 2.3 comes only two months after 2.2 but contains more than 100 improvements and fixes. In particular, two new apps were added (including an Apache Pig editor) and the query editors are now easier to use.
In the first installment of the demo series about Hue — the open source Web UI that makes Apache Hadoop easier to use — you learned how file operations are simplified via the File Browser application. In this installment, we’ll focus on analyzing data with Hue, using Apache Hive via Hue’s Beeswax and Catalog applications (based on Hue 2.3 and later).
The Yelp Dataset Challenge provides a good use case. This post explains, through a video and tutorial, how you can get started doing some analysis and exploration of Yelp data with Hue. The goal is to find the coolest restaurants in Phoenix!
Dataset Challenge with Hue
Managing and viewing data in HDFS is an important part of Big Data analytics. Hue, the open source web-based interface that makes Apache Hadoop easier to use, helps you do that through a GUI in your browser — instead of logging into a Hadoop gateway host with a terminal program and using the command line.
The first episode in a new series of Hue demos, the video below demonstrates how to get up and running quickly with HDFS file operations via Hue’s File Browser application.
Hue 2.2, the open source web-based interface that makes Apache Hadoop easier to use, lets you interact with Hadoop services from within your browser without having to go to a command-line interface. It features different applications like an Apache Hive editor and Apache Oozie dashboard and workflow builder.
This post is based on our “Analyzing Twitter Data with Hadoop” sample app and details how the same results can be achieved through Hue in a simpler way. Moreover, all the code and examples of the previous series have been updated to the recent CDH4.2 release.
Hue is an open-source web interface for Apache Hadoop packaged with CDH that focuses on improving the overall experience for the average user. The Apache Oozie application in Hue provides an easy-to-use interface to build workflows and coordinators. Basic management of workflows and coordinators is available through the dashboards with operations such as killing, suspending, or resuming a job.
Prior to Hue 2.2 (included in CDH 4.2), workflows created outside of Hue could not be managed within Hue. As of Hue 2.2, you can import a pre-existing Oozie workflow by its XML definition.
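To give a sense of what an Oozie workflow XML definition contains, the sketch below parses a small definition with Python's standard library and lists its actions and transitions. It does not show how Hue performs the import. The workflow itself (its name, action, and transitions) and the helper names `local_name` and `summarize_workflow` are hypothetical and were written for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow definition, written for this example only.
WORKFLOW_XML = """
<workflow-app xmlns="uri:oozie:workflow:0.4" name="daily-report">
  <start to="aggregate"/>
  <action name="aggregate">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Aggregation failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_workflow(xml_text):
    """Return the workflow name plus each action's type and transitions."""
    root = ET.fromstring(xml_text.strip())
    actions = []
    for node in root:
        if local_name(node.tag) != "action":
            continue
        children = {local_name(child.tag): child for child in node}
        # The action type is the child element that is not a transition.
        kind = next(name for name in children if name not in ("ok", "error"))
        actions.append({
            "name": node.get("name"),
            "type": kind,
            "ok": children["ok"].get("to"),
            "error": children["error"].get("to"),
        })
    return {"name": root.get("name"), "actions": actions}


summary = summarize_workflow(WORKFLOW_XML)
print(summary)
```

A summary like this is the kind of information a workflow editor would need to reconstruct the graph of actions from an existing definition.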
How to import a workflow
Hue lets you interact with Hadoop services from within your browser without having to go to a command-line interface. It features a file browser for HDFS, an Apache Oozie Application for creating workflows of data processing jobs, a job designer/browser for MapReduce, Apache Hive and Cloudera Impala query editors, a Shell, and a collection of Hadoop APIs.
For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.
In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS experienced major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the Beta release of Cloudera Impala, a real-time query engine for analytics across HDFS and Apache HBase data.
Hue is a web interface for Apache Hadoop that simplifies common Hadoop tasks such as running MapReduce jobs, browsing HDFS, and creating Apache Oozie workflows. In this post, we’re going to focus on the dynamic workflow builder that Hue provides for Oozie, which will be released in Hue 2.2.0. (For a high-level description of Oozie integration in Hue, see this blog post.)
Basic Operations on Actions
Hue is a web interface for Apache Hadoop that simplifies common Hadoop tasks such as running MapReduce jobs, browsing HDFS, and creating Apache Oozie workflows. (To learn more about the integration of Oozie and Hue, see this blog post.) In this post, we’re going to focus on how one of the fundamental components in Hue, Useradmin, has matured.
New User and Permission Features
User and permission management in Hue has changed drastically over the past year. Oozie workflows, Apache Hive queries, and MapReduce jobs can be shared with other users or kept private. Permissions exist at the app level. Access to particular apps can be restricted, as well as certain sections of the apps. For instance, access to the shell app can be restricted, as well as access to the Apache HBase, Apache Pig, and Apache Flume shells themselves. Access privileges are defined for groups, and users can be members of one or more groups.
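The group-based model described above (privileges attached to groups, users belonging to one or more groups) can be sketched in a few lines. This is an illustration of the idea only, not Hue's Useradmin code: the group names, user names, app identifiers, and function names are all assumptions made up for this example.

```python
# Hypothetical group and permission data, invented for this example.
GROUP_PERMISSIONS = {
    "analysts": {"hive", "pig"},
    "admins": {"hive", "pig", "shell", "useradmin"},
}
USER_GROUPS = {
    "alice": {"analysts"},
    "bob": {"analysts", "admins"},
}


def allowed_apps(user):
    """Return the union of app permissions across all of the user's groups."""
    apps = set()
    for group in USER_GROUPS.get(user, set()):
        apps |= GROUP_PERMISSIONS.get(group, set())
    return apps


def can_access(user, app):
    """A user can open an app if any of their groups grants access to it."""
    return app in allowed_apps(user)


print(sorted(allowed_apps("alice")))
print(can_access("bob", "shell"))
```

Because privileges are attached to groups rather than to individual users, changing what a whole team can see means editing one group entry instead of every member's account.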
Changes to Users, Groups, and Permissions
Hue is a Web-based interface that makes it easier to use Apache Hadoop. Hue 2.1 (included in CDH4.1) provides a new application on top of Apache Oozie (a workflow scheduler system for Apache Hadoop) for creating workflows and scheduling them repetitively. For example, Hue makes it easy to group a set of MapReduce jobs and Hive scripts and run them every day of the week.
In this post, we’re going to focus on the Workflow component of the new application.
Yesterday’s post gave an overview of the HUE (a.k.a. Hadoop User Experience) project, which was released in CDH3b2 and is available on GitHub. HUE is a graphical “desktop”-style web application that runs in modern browsers (Firefox, Chrome, Safari, and IE8+) and allows users to interact with a Hadoop installation as if it were just another computer. They can browse the file system, create and manage user accounts, view and edit files, upload files, and then use Hadoop-specific applications like the Job Browser and Beeswax (our Hive app). Here’s a quick demo from yesterday’s post running through Beeswax. It’s about 10 minutes long, but even if you only watch the first 2 or 3 you’ll get an idea of what HUE is and what it can do.
[Video: Beeswax demo, Vimeo clip 13463965]
The HUE (a.k.a. Hadoop User Experience) project started as Cloudera Desktop about a year ago. The old name “Desktop” really refers to a desktop look-and-feel, since HUE is a web UI for Hadoop. Beyond delivering a suite of web applications, it is also a platform for building custom applications with a nice UI library. Gradually, we realized how much value such a UI platform would bring to the community, and I am very excited that Cloudera contributed HUE as an open source project.