Demo: Using Hue to Access Hive Data Through Pig

This installment of the Hue demo series is about accessing the Hive Metastore from Hue, as well as using HCatalog with Hue. (Hue, of course, is the open source Web UI that makes Apache Hadoop easier to use.) 

What is HCatalog?

HCatalog is a module in Apache Hive that enables non-Hive scripts to access Hive tables. You can then load tables directly with Apache Pig or MapReduce without redefining the input schemas and without having to know, or duplicate, the data’s location.
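To make the difference concrete, here is a minimal sketch contrasting a plain Pig load with an HCatalog load. The table name `my_table`, its warehouse path, its tab delimiter, and its three columns are all hypothetical. The loader class is assumed to be `org.apache.hcatalog.pig.HCatLoader`, the package used by standalone HCatalog releases; later Hive releases ship it as `org.apache.hive.hcatalog.pig.HCatLoader`.

```pig
-- Without HCatalog: the script repeats the file location, delimiter, and schema.
-- The path and the column list are hypothetical.
raw = LOAD '/user/hive/warehouse/my_table'
      USING PigStorage('\t')
      AS (id:chararray, label:chararray, amount:int);

-- With HCatalog: only the Hive table name is given; the location and the
-- schema are looked up in the Hive metastore.
managed = LOAD 'my_table' USING org.apache.hcatalog.pig.HCatLoader();

-- Print the schema that the loader picked up from the metastore.
DESCRIBE managed;
```

If the table is later moved or gains a column in Hive, only the first statement would need to be edited by hand.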

Hue contains a Web application for accessing the Hive metastore called Metastore Browser, which lets you explore, create, or delete databases and tables using wizards. (You can see a demo of these wizards in a previous tutorial about how to analyze Yelp data.) However, Hue uses HiveServer2 for accessing the metastore instead of HCatalog. This is because HiveServer2 is the new secure and concurrent server for Hive and it includes a fast Hive Metastore API.

HCatalog connectors are still useful for accessing Hive data through Pig, though. The tutorial below shows how to access the Hive example tables from the Pig Editor.

Tutorial

To try this yourself, first install HCatalog via Cloudera Manager (or do it the manual way). If you are using a fully distributed cluster (i.e., not a demo VM), make sure that the Hive Metastore is remote; otherwise you will hit the metastore error described in the warning later in this tutorial. Then, upload the three jars from /usr/lib/hcatalog/share/hcatalog/ and all the Hive jars from /usr/lib/hive/lib to the Oozie Pig sharelib in /user/oozie/share/lib/pig. This takes only a few clicks in the File Browser while logged in as ‘oozie’ or ‘hdfs’.

Keep in mind that jars placed in the sharelib are included with every future Pig script, which might be unnecessary. An alternative is to upload these jars to a directory under your HDFS home directory and then point to that directory with the Hadoop property ‘oozie.libpath’ in the Properties section of the Pig Editor.

Then, confirm that the Beeswax examples are installed (Step #2 in the Hue Quick Start Wizard), open the Pig Editor, and write a script that computes the average salary in the example table (the equivalent of an AVG() aggregate query in Hive).
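A minimal sketch of such a Pig script follows. It assumes that the example table is the Beeswax sample `sample_07` with a numeric `salary` column, and that the loader class is `org.apache.hcatalog.pig.HCatLoader` (later Hive releases use the `org.apache.hive.hcatalog.pig` package instead).

```pig
-- Load the Hive table through HCatalog; no AS (...) schema clause is needed.
-- The table name 'sample_07' is an assumption about the installed examples.
sample_07 = LOAD 'sample_07' USING org.apache.hcatalog.pig.HCatLoader();

-- Put every row into a single group so that AVG runs over the whole table.
all_rows = GROUP sample_07 ALL;
avg_salary = FOREACH all_rows GENERATE AVG(sample_07.salary);

-- Print the single resulting value to the job log.
DUMP avg_salary;
```

`GROUP ... ALL` is the usual Pig idiom for a table-wide aggregate, since `AVG` operates on the bag of rows inside each group.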

 

Because HCatalog needs to access the metastore, you must supply hive-site.xml. Go to Properties > Resources and add a ‘File’ pointing to a copy of hive-site.xml uploaded to HDFS. Then, submit the script by pressing CTRL + ENTER. The result (47963.62637362637) will appear at the end of the log output. (Notice that you don’t need to redefine the schema, as the loader picks it up automatically.) If you use the Oozie App, you can now freely use HCatalog in your Pig actions.

Warning! If the script fails because it cannot access the metastore database, it means that your metastore is owned by the Hive user and is not remote.

 

A workaround is to make sure that Beeswax is shut down and then change the permissions of the embedded metastore database (Apache Derby by default) so that the Pig job can open it.

 

Just as HCatLoader reads a Hive table, HCatStorer writes a Pig relation back to one.
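As a rough sketch, assuming the Beeswax sample table `sample_07` as the source and a hypothetical target table `sample_07_high` that already exists in Hive with matching columns (HCatStorer does not create the table), a store could look like this. The filter threshold is illustrative only, and the class names follow the `org.apache.hcatalog.pig` package of standalone HCatalog releases.

```pig
-- Read the source table through HCatalog.
sample_07 = LOAD 'sample_07' USING org.apache.hcatalog.pig.HCatLoader();

-- Keep only the rows to store (hypothetical filter on the salary column).
high_paid = FILTER sample_07 BY salary > 100000;

-- Write into an existing Hive table whose columns match the relation.
-- 'sample_07_high' is a hypothetical table that must be created in Hive first.
STORE high_paid INTO 'sample_07_high' USING org.apache.hcatalog.pig.HCatStorer();
```

The same hive-site.xml resource is needed here, since the storer also talks to the metastore.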

 

Conclusion

Here you have seen how Hue makes it easy to access Hive’s metastore and how it supports the HCatalog connectors for Pig. Hue 3.0 will simplify things even more by automatically copying the required jar files and making the table names auto-complete.

As usual, we welcome any feedback – via our new Hue Community Forum or the user group!

1 Response
  • Slavo Nagy / May 20, 2014 / 4:50 AM

    Dear Hue Team,
    we are trying to follow your recipe, but cannot find the JARs in the /usr/lib directories. Where can these be found when our cluster is installed using Cloudera Manager?
    And how can we log on to Hue as hdfs if our cluster is using Kerberos and the Hue user is set to the OS user in AD?
    Thank you in advance.
    Best regards,
    Slavo
