Migrating from Hive CLI to Beeline: A Primer

Migrating from the Hive CLI to Beeline isn’t as simple as changing the executable name, but this post makes it easy nonetheless.

In its original form, Apache Hive was a heavyweight command-line tool that accepted queries and executed them utilizing MapReduce. Later, the tool split into a client-server model, in which HiveServer1 is the server (responsible for compiling and monitoring MapReduce jobs) and Hive CLI is the command-line interface (sends SQL to the server).

Recently, the Hive community (with Cloudera engineers leading the charge) introduced HiveServer2, an enhanced Hive server designed for multi-client concurrency and improved authentication that also provides better support for clients connecting through JDBC and ODBC. Now HiveServer2, with Beeline as the command-line interface, is the recommended solution; HiveServer1 and Hive CLI are deprecated and the latter won’t even work with HiveServer2.

Beeline was developed specifically to interact with the new server. Unlike Hive CLI, which is an Apache Thrift-based client, Beeline is a JDBC client based on the SQLLine CLI — although the JDBC driver used communicates with HiveServer2 using HiveServer2’s Thrift APIs.

As Hive development has shifted from the original Hive server (HiveServer1) to the new server (HiveServer2), users and developers accordingly need to switch to the new client tool. However, there’s more to this process than simply switching the executable name from “hive” to “beeline”.

In this post, you’ll learn how to make this migration as smooth as possible, along with the differences and similarities between the two clients. While Beeline offers a few additional, non-essential options such as coloring, this post focuses mainly on how to achieve with Beeline what you used to do with Hive CLI.

Use Cases: Hive CLI versus Beeline

The following section focuses on the common uses of Hive CLI/HiveServer1 and how you can migrate to Beeline/HiveServer2 in each case.

Server Connection

Hive CLI connects to a remote HiveServer1 instance using the Thrift protocol. To connect to a server, you specify the host name and optionally the port number of the remote server:

> hive -h <hostname> -p <port>

 

In contrast, Beeline connects to a remote HiveServer2 instance using JDBC. Thus, the connection parameter is a JDBC URL that’s common in JDBC-based clients:

> beeline -u <url> -n <username> -p <password>

Here are a few URL examples:

jdbc:hive2://ubuntu:11000/db2?hive.cli.conf.printheader=true;hive.exec.mode.local.auto.inputbytes.max=9999#stab=salesTable;icol=customerID
jdbc:hive2://?hive.cli.conf.printheader=true;hive.exec.mode.local.auto.inputbytes.max=9999#stab=salesTable;icol=customerID
jdbc:hive2://ubuntu:11000/db2;user=foo;password=bar
jdbc:hive2://server:10001/db;user=foo;password=bar?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2
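
The examples above suggest a general layout: host and port, an optional database, session variables after ";", Hive configuration properties after "?", and Hive variables after "#". As a rough sketch under that assumption, a small shell helper can assemble such a URL. The name hs2_url is made up for illustration and is not part of Hive or Beeline:

```shell
# Hypothetical helper: assemble a HiveServer2 JDBC URL from its parts.
# Assumed layout, inferred from the examples above:
#   jdbc:hive2://<host:port>/<db>;<session vars>?<hive conf>#<hive vars>
# Each list argument is a ';'-separated string of key=value pairs; pass ""
# to leave a part out. The body runs in a subshell, so the variables it
# sets do not leak into the calling shell.
hs2_url() (
    host_port=$1 db=$2 session_vars=$3 hive_conf=$4 hive_vars=$5
    url="jdbc:hive2://${host_port}"
    # The path separator is only needed when a database or session vars follow.
    if [ -n "$db" ] || [ -n "$session_vars" ]; then
        url="${url}/${db}"
    fi
    if [ -n "$session_vars" ]; then url="${url};${session_vars}"; fi
    if [ -n "$hive_conf" ]; then url="${url}?${hive_conf}"; fi
    if [ -n "$hive_vars" ]; then url="${url}#${hive_vars}"; fi
    printf '%s\n' "$url"
)

# Example (not run here):
#   beeline -u "$(hs2_url ubuntu:11000 db2 '' '' '')" -n <username> -p <password>
```

Whichever way the URL is built, quote it on the command line: ";", "?" and "#" all have special meaning to the shell when left unquoted.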

 

Query Execution

Executing queries in Beeline works much as it does in Hive CLI. In Hive CLI:

> hive -e <query in quotes>
> hive -f <query file name>

 

In Beeline:

> beeline -e <query in quotes>
> beeline -f <query file name>

 

If neither -e nor -f is given, both client tools enter interactive mode, in which you can type and execute queries or commands line by line.
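
Because -e and -f carry over unchanged, scripts that wrapped Hive CLI need little more than a new executable name and a connection URL. The following is a minimal sketch, not an official tool: query_flag and run_query are hypothetical names, and the BEELINE_URL environment variable is an assumption of this example.

```shell
# Pick the option for a query argument: -f if it names an existing file,
# -e if it should be treated as a query string.
query_flag() {
    if [ -f "$1" ]; then
        printf '%s\n' -f
    else
        printf '%s\n' -e
    fi
}

# Hypothetical wrapper: run a query string or a query file through Beeline.
# It stops with an error message if BEELINE_URL is not set.
run_query() {
    beeline -u "${BEELINE_URL:?set BEELINE_URL to a JDBC URL}" \
        "$(query_flag "$1")" "$1"
}

# Examples (not run here):
#   run_query "show databases;"
#   run_query report.sql
```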

Embedded Mode

Running Hive client tools with embedded servers is a convenient way to test a query or debug a problem. While both Hive CLI and Beeline can embed a Hive server instance, you would start them in embedded mode in slightly different ways.

To start Hive CLI in embedded mode, just launch the client without giving any connection parameters:

> hive

 

To start Beeline in embedded mode, a little more work is required. Basically, a connection URL of jdbc:hive2:// needs to be specified:

> beeline -u jdbc:hive2://

 

At this point, Beeline enters interactive mode, in which queries and commands against the embedded HiveServer2 instance can be executed.

Variables

Perhaps the most interesting difference between the clients concerns the use of Hive variables. There are four namespaces for variables:

  • hiveconf for Hive configuration variables
  • system for system variables
  • env for environment variables
  • hivevar for Hive variables (HIVE-1096)

A variable is expressed as <namespace>:<variable_name>. For Hive configuration variables, the namespace hiveconf can be omitted. The value of a variable can be referenced using dollar notation, such as ${hivevar:var}.

There are two ways to define a variable: as a command-line argument or using the set command in interactive mode.

Defining Hive variables on the command line in Hive CLI:

> hive -d key=value
> hive --define key=value
> hive --hivevar key=value

 

Defining Hive variables on the command line in Beeline:

> beeline --hivevar key=value
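
One practical wrinkle when scripting this: the dollar notation looks like shell syntax, so the shell may try to expand ${hivevar:...} itself unless the text is quoted. The sketch below writes a query file through a quoted heredoc so that the reference stays literal. The table name salesTable is borrowed from the URL examples, and the query itself is only an illustration.

```shell
# Create a query file that references a Hive variable. Quoting the heredoc
# delimiter ('EOF') stops the shell from expanding ${hivevar:tbl}, so the
# reference is written to the file exactly as typed.
query_file=$(mktemp)
cat > "$query_file" <<'EOF'
SELECT COUNT(*) FROM ${hivevar:tbl};
EOF

# The value would then be supplied on the command line (not run here):
#   beeline -u <url> --hivevar tbl=salesTable -f "$query_file"
```

The same caution applies to -e: wrap the query in single quotes rather than double quotes if it contains a variable reference.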

 

Defining Hive configuration variables on the command line in Hive CLI:

> hive --hiveconf key=value

 

At the time of this writing, it’s not possible to define Hive configuration variables on the command line in Beeline (HIVE-6173).
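
When porting scripts, the flag differences above can be handled mechanically. The following sketch (hive_var_flags_to_beeline is a hypothetical name) rewrites only the variable-related flags: -d, --define and --hivevar become --hivevar, while --hiveconf pairs are dropped with a warning because Beeline has no equivalent flag yet. Every other argument passes through untouched, so connection options such as -h and -p still need to be converted to a JDBC URL by hand.

```shell
# Print the Beeline equivalents of Hive CLI variable flags, one word per line.
# Only -d, --define, --hivevar and --hiveconf are interpreted; everything
# else is echoed back unchanged.
hive_var_flags_to_beeline() {
    while [ "$#" -gt 0 ]; do
        case $1 in
            -d|--define|--hivevar|--hiveconf)
                if [ "$#" -lt 2 ]; then
                    echo "missing key=value after $1" >&2
                    return 1
                fi
                if [ "$1" = "--hiveconf" ]; then
                    # No Beeline flag at the time of writing (HIVE-6173).
                    echo "no Beeline flag for --hiveconf $2; skipped" >&2
                else
                    printf '%s\n' --hivevar "$2"
                fi
                shift 2
                ;;
            *)
                printf '%s\n' "$1"
                shift
                ;;
        esac
    done
}

# Example (not run here):
#   hive_var_flags_to_beeline -d year=2014 --hiveconf a.b=c -f report.sql
```

One possible workaround for the skipped pairs, judging by the URL examples earlier in this post, is to place them in the connection URL after "?".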

In both Hive CLI and Beeline, you set variables in interactive mode the same way, using the set command:

hive> set system:os.name=OS2;
0: jdbc:hive2://> set system:os.name=OS2;

 

Show the value of a variable:

hive> set env:TERM;
env:TERM=xterm
0: jdbc:hive2://> set env:TERM;
(Currently displays nothing; see HIVE-6174.)

 

Note that environment variables cannot be set:

hive> set env:TERM=xterm;
env:* variables cannot be set.
0: jdbc:hive2://> set env:TERM=xterm;
env:* variables can not be set.

 

The set command without any arguments lists all variables with their values:

hive> set;
datanucleus.autoCreateSchema=true
...
0: jdbc:hive2://> set;
+----------------------------------------------------------------+
|                                                                |
+----------------------------------------------------------------+
| datanucleus.autoCreateSchema=true

 

Command-Line Help

Of course, you can always find help on the command-line arguments:

> hive -H
> beeline -h
> beeline --help

 

Interactive Mode

In Hive CLI interactive mode, you can execute any SQL query that is supported by HiveServer. For example:

hive> show databases;
OK
default

 

Furthermore, you can execute a shell command without leaving Hive CLI:

hive> !cat myfile.txt;
This is my file!
hive>

 

In Beeline, you can execute any SQL query as you would in Hive CLI. For example:

0: jdbc:hive2://> show databases;
14/01/31 16:50:47 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
...
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+

1 row selected (0.026 seconds)

...

 

The above command is equivalent to:

0: jdbc:hive2://> !sql show databases;

 

As you can see, in Beeline “!” runs Beeline commands rather than shell commands. Of these, !connect is one of the most important; it allows you to connect to a database:

beeline> !connect jdbc:hive2://
scan complete in 2ms
Connecting to jdbc:hive2://
Enter username for jdbc:hive2://:
Enter password for jdbc:hive2://:
...
Connected to: Apache Hive (version 0.13.0-SNAPSHOT)
Driver: Hive JDBC (version 0.13.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://>

 

Another important command is !quit (or !q), which allows you to exit interactive mode:

0: jdbc:hive2://> !quit
Closing: org.apache.hive.jdbc.HiveConnection

 

For a list of all Beeline commands, refer to the SQLLine documentation.

Conclusion

As you can see, the Hive community is working hard to make Beeline as similar to Hive CLI as possible in terms of functionality as well as syntax. The comparisons above should help make your transition relatively painless.

Xuefu Zhang is a Software Engineer at Cloudera and a Hive Committer.

2 Responses
  • Christian / February 12, 2014 / 3:55 AM

    How can I install beeline separately/stand-alone, i.e. on a gateway machine?

  • Prateek Rungta / March 12, 2014 / 1:25 PM

    Worth pointing out: `--hivevar` is supported in CDH5 and up; it’s not in CDH4.6 yet.
