Migrating from Hive CLI to Beeline: A Primer
- by Xuefu Zhang
- February 10, 2014
Migrating from the Hive CLI to Beeline isn’t as simple as changing the executable name, but this post makes it easy nonetheless.
In its original form, Apache Hive was a heavyweight command-line tool that accepted queries and executed them utilizing MapReduce. Later, the tool split into a client-server model, in which HiveServer1 is the server (responsible for compiling and monitoring MapReduce jobs) and Hive CLI is the command-line interface (sends SQL to the server).
Recently, the Hive community (with Cloudera engineers leading the charge) introduced HiveServer2, an enhanced Hive server designed for multi-client concurrency and improved authentication that also provides better support for clients connecting through JDBC and ODBC. Now HiveServer2, with Beeline as the command-line interface, is the recommended solution; HiveServer1 and Hive CLI are deprecated and the latter won’t even work with HiveServer2.
Beeline was developed specifically to interact with the new server. Unlike Hive CLI, which is an Apache Thrift-based client, Beeline is a JDBC client based on the SQLLine CLI — although the JDBC driver used communicates with HiveServer2 using HiveServer2’s Thrift APIs.
As Hive development has shifted from the original Hive server (HiveServer1) to the new server (HiveServer2), users and developers accordingly need to switch to the new client tool. However, there’s more to this process than simply switching the executable name from “hive” to “beeline”.
In this post, you’ll learn how to make this migration as smooth as possible, along with the differences and similarities between the two clients. While Beeline offers some non-essential extras such as coloring, this post focuses mainly on how to achieve with Beeline what you used to do with Hive CLI.
Use Cases: Hive CLI versus Beeline
The following section focuses on the common uses of Hive CLI/HiveServer1 and how you can migrate to Beeline/HiveServer2 in each case.
Hive CLI connects to a remote HiveServer1 instance using the Thrift protocol. To connect to a server, you specify the host name and optionally the port number of the remote server:
> hive -h <hostname> -p <port>
In contrast, Beeline connects to a remote HiveServer2 instance using JDBC. Thus, the connection parameter is a JDBC URL that’s common in JDBC-based clients:
> beeline -u <url> -n <username> -p <password>
Here are a few URL examples:
jdbc:hive2://ubuntu:11000/db2?hive.cli.conf.printheader=true;hive.exec.mode.local.auto.inputbytes.max=9999#stab=salesTable;icol=customerID
jdbc:hive2://?hive.cli.conf.printheader=true;hive.exec.mode.local.auto.inputbytes.max=9999#stab=salesTable;icol=customerID
jdbc:hive2://ubuntu:11000/db2;user=foo;password=bar
jdbc:hive2://server:10001/db;user=foo;password=bar?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2
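Judging from the examples above, the URL appears to follow the layout jdbc:hive2://<host>:<port>/<db>;<session settings>?<Hive configuration>#<Hive variables>. The Python sketch below assembles such a URL from its parts. The function name build_hs2_url and its parameters are illustrative and not part of Hive, and the layout is inferred from the examples rather than taken from a specification.

```python
def build_hs2_url(host="", port=None, database="", session_vars=None,
                  hive_conf=None, hive_vars=None):
    """Assemble a HiveServer2 JDBC URL (hypothetical helper, not a Hive API).

    Assumed layout, inferred from the examples in this post:
    jdbc:hive2://<host>:<port>/<db>;<session>?<hive conf>#<hive vars>
    """
    def join(pairs):
        # Settings within one section are separated by semicolons.
        return ";".join(f"{key}={value}" for key, value in pairs.items())

    authority = host
    if host and port is not None:
        authority = f"{host}:{port}"

    url = f"jdbc:hive2://{authority}"
    if database:
        url += f"/{database}"
    if session_vars:
        url += ";" + join(session_vars)   # e.g. user=foo;password=bar
    if hive_conf:
        url += "?" + join(hive_conf)      # Hive configuration settings
    if hive_vars:
        url += "#" + join(hive_vars)      # Hive variables
    return url


url = build_hs2_url("ubuntu", 11000, "db2",
                    session_vars={"user": "foo", "password": "bar"})
```

Called with no arguments, the sketch returns the bare jdbc:hive2:// URL, which is the form used for embedded mode.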
Executing queries in Beeline is very similar to doing so in Hive CLI. In Hive CLI:
> hive -e <query in quotes>
> hive -f <query file name>
In Beeline:
> beeline -e <query in quotes>
> beeline -f <query file name>
In either case, if no -e or -f options are given, both client tools go into an interactive mode in which you can give and execute queries or commands line by line.
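If you launch queries from scripts, it can help to build the Beeline argument list in one place. The Python sketch below uses only the options shown in this post (-u, -n, -p, -e, -f). The helper name beeline_argv is hypothetical, and rejecting -e together with -f is a choice made for this sketch, not a documented Beeline rule.

```python
import shlex


def beeline_argv(url, username=None, password=None, query=None, script=None):
    """Build an argument list for a non-interactive Beeline run.

    Hypothetical helper: it only assembles the list and does not start Beeline.
    """
    if query is not None and script is not None:
        # Restriction of this sketch: one query source per invocation.
        raise ValueError("pass either query or script, not both")

    argv = ["beeline", "-u", url]
    if username is not None:
        argv += ["-n", username]
    if password is not None:
        argv += ["-p", password]
    if query is not None:
        argv += ["-e", query]      # query given inline
    if script is not None:
        argv += ["-f", script]     # query file name
    return argv


argv = beeline_argv("jdbc:hive2://ubuntu:11000/db2", "foo", "bar",
                    query="show databases;")
# shlex.join (Python 3.8+) renders the list as a shell-quoted command line.
print(shlex.join(argv))
```

The list can be handed to subprocess.run as is, which avoids quoting the query by hand.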
Running Hive client tools with embedded servers is a convenient way to test a query or debug a problem. While both Hive CLI and Beeline can embed a Hive server instance, you would start them in embedded mode in slightly different ways.
To start Hive CLI in embedded mode, just launch the client without giving any connection parameters:
> hive
To start Beeline in embedded mode, a little more work is required. Basically, a connection URL of jdbc:hive2:// needs to be specified:
> beeline -u jdbc:hive2://
At this point, Beeline enters interactive mode, in which queries and commands against the embedded HiveServer2 instance can be executed.
Perhaps the most interesting difference between the clients concerns the use of Hive variables. There are four namespaces for variables:
hiveconf for Hive configuration variables
system for system variables
env for environment variables
hivevar for Hive variables (HIVE-1096)
A variable is expressed as <namespace>:<variable_name>. For Hive configuration variables, the namespace hiveconf can be skipped. The value of a variable can be referenced using dollar notation, such as ${<namespace>:<variable_name>}.
There are two ways to define a variable: as a command-line argument or using the set command in interactive mode.
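To make the namespaces and the dollar notation concrete, here is a toy Python model of how a reference such as ${hivevar:stab} could be replaced by its value. It is not Hive's actual substitution code: the function name substitute is made up for illustration, it handles only references that name their namespace explicitly, and it leaves anything it cannot resolve untouched.

```python
import re

# The four namespaces described in this post.
NAMESPACES = ("hiveconf", "system", "env", "hivevar")

# Matches ${namespace:variable_name}.
_REFERENCE = re.compile(r"\$\{(\w+):([^}]+)\}")


def substitute(text, variables):
    """Replace ${namespace:name} references using a dict of dicts.

    Toy model for illustration only; `variables` maps a namespace to its
    own {name: value} dictionary.
    """
    def replace(match):
        namespace, name = match.group(1), match.group(2)
        if namespace in NAMESPACES and name in variables.get(namespace, {}):
            return str(variables[namespace][name])
        return match.group(0)  # unknown reference: keep it as written

    return _REFERENCE.sub(replace, text)


query = substitute(
    "SELECT ${hivevar:icol} FROM ${hivevar:stab}",
    {"hivevar": {"icol": "customerID", "stab": "salesTable"}},
)
```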
Defining Hive variables on the command line in Hive CLI:
> hive -d key=value
> hive --define key=value
> hive --hivevar key=value
Defining Hive variables on the command line in Beeline:
> beeline --hivevar key=value
Defining Hive configuration variables on the command line in Hive CLI:
> hive --hiveconf key=value
At the time of this writing, it’s not possible to define Hive configuration variables on the command line in Beeline (HIVE-6173).
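When porting an existing Hive CLI invocation, the variable flags can be translated mechanically. The Python sketch below maps -d, --define, and --hivevar to Beeline's --hivevar and collects --hiveconf settings separately, since Beeline has no command-line equivalent at the time of writing. The function name translate_variable_flags is hypothetical. Placing the collected settings in the ? part of the connection URL, as the URL examples earlier do, is one possible route, but treat that as an assumption to verify against your server.

```python
def translate_variable_flags(hive_args):
    """Split Hive CLI arguments into Beeline flags and leftover pieces.

    Returns (beeline_flags, hive_conf, remaining):
      beeline_flags: --hivevar pairs that Beeline accepts directly
      hive_conf:     --hiveconf settings with no Beeline flag (HIVE-6173)
      remaining:     everything else, left for the caller to handle
    Hypothetical helper for illustration only.
    """
    beeline_flags, hive_conf, remaining = [], {}, []
    args = iter(hive_args)
    for arg in args:
        if arg in ("-d", "--define", "--hivevar", "--hiveconf"):
            setting = next(args, None)
            if setting is None or "=" not in setting:
                raise ValueError(f"{arg} expects key=value")
            if arg == "--hiveconf":
                key, _, value = setting.partition("=")
                hive_conf[key] = value
            else:
                beeline_flags += ["--hivevar", setting]
        else:
            # Not passed through blindly: -h and -p mean host and port in
            # Hive CLI but help and password in Beeline.
            remaining.append(arg)
    return beeline_flags, hive_conf, remaining


flags, conf, rest = translate_variable_flags(
    ["-d", "a=1", "--hiveconf", "hive.cli.conf.printheader=true",
     "-e", "show databases;"]
)
```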
In both Hive CLI and Beeline, you set variables in interactive mode the same way, using the set command:
hive> set system:os.name=OS2;
0: jdbc:hive2://> set system:os.name=OS2;
Show the value of a variable:
hive> set env:TERM;
env:TERM=xterm
0: jdbc:hive2://> set env:TERM;
(Currently displays nothing. HIVE-6174)
Note that environment variables cannot be set:
hive> set env:TERM=xterm;
env:* variables cannot be set.
0: jdbc:hive2://> set env:TERM=xterm;
env:* variables can not be set.
The set command without any arguments lists all variables with their values:
hive> set;
datanucleus.autoCreateSchema=true
...
0: jdbc:hive2://> set;
+----------------------------------------------------------------+
|                                                                |
+----------------------------------------------------------------+
| datanucleus.autoCreateSchema=true
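Scripts that parsed the plain key=value output of Hive CLI may stumble over Beeline's table borders. The Python sketch below reads both shapes into a dictionary. The function name parse_set_output is hypothetical, and the parsing rules are based only on the output shown above, so any other line containing an equals sign (a log line, for example) would be picked up as well.

```python
def parse_set_output(output):
    """Collect key=value settings from `set;` output of either client.

    Handles plain lines (Hive CLI) and table rows wrapped in `|`
    (Beeline). Hypothetical helper based on the sample output in this post.
    """
    settings = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("+"):
            continue                      # table border
        if line.startswith("|"):
            line = line.strip("|").strip()  # drop cell delimiters
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


hive_cli = "datanucleus.autoCreateSchema=true\n"
beeline = (
    "+-----------------------------------+\n"
    "|                                   |\n"
    "+-----------------------------------+\n"
    "| datanucleus.autoCreateSchema=true |\n"
)
same = parse_set_output(hive_cli) == parse_set_output(beeline)
```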
Of course, you can always find help on the command-line arguments:
> hive -H
> beeline -h
> beeline --help
In Hive CLI interactive mode, you can execute any SQL query that is supported by HiveServer. For example:
hive> show databases;
OK
default
Furthermore, you can execute shell commands without leaving Hive CLI:
hive> !cat myfile.txt;
This is my file!
hive>
In Beeline, you can execute any SQL query as you would in Hive CLI. For example:
0: jdbc:hive2://> show databases;
14/01/31 16:50:47 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
...
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (0.026 seconds)
...
The above command is equivalent to:
0: jdbc:hive2://> !sql show databases;
As you can see, in Beeline “!” executes Beeline commands rather than shell commands. Of these, !connect is one of the most important; it allows you to connect to a database:
beeline> !connect jdbc:hive2://
scan complete in 2ms
Connecting to jdbc:hive2://
Enter username for jdbc:hive2://:
Enter password for jdbc:hive2://:
...
Connected to: Apache Hive (version 0.13.0-SNAPSHOT)
Driver: Hive JDBC (version 0.13.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://>
Another important command is !quit (or !q), which allows you to exit interactive mode:
0: jdbc:hive2://> !quit
Closing: org.apache.hive.jdbc.HiveConnection
For a list of all Beeline commands, please refer to the SQLLine documentation.
As you can see, the Hive community is working hard to make Beeline as similar to Hive CLI as possible in terms of functionality as well as syntax. The comparisons above should help make your transition relatively painless.
Xuefu Zhang is a Software Engineer at Cloudera and a Hive Committer.