Migrating from Hive CLI to Beeline: A Primer

Categories: Hive

Migrating from the Hive CLI to Beeline isn’t as simple as changing the executable name, but this post makes it easy nonetheless.

In its original form, Apache Hive was a heavyweight command-line tool that accepted queries and executed them utilizing MapReduce. Later, the tool split into a client-server model, in which HiveServer1 is the server (responsible for compiling and monitoring MapReduce jobs) and Hive CLI is the command-line interface (sends SQL to the server).

Recently, the Hive community (with Cloudera engineers leading the charge) introduced HiveServer2, an enhanced Hive server designed for multi-client concurrency and improved authentication that also provides better support for clients connecting through JDBC and ODBC. Now HiveServer2, with Beeline as the command-line interface, is the recommended solution; HiveServer1 and Hive CLI are deprecated and the latter won’t even work with HiveServer2.

Beeline was developed specifically to interact with the new server. Unlike Hive CLI, which is an Apache Thrift-based client, Beeline is a JDBC client based on the SQLLine CLI — although the JDBC driver used communicates with HiveServer2 using HiveServer2’s Thrift APIs.

As Hive development has shifted from the original Hive server (HiveServer1) to the new server (HiveServer2), users and developers accordingly need to switch to the new client tool. However, there’s more to this process than simply switching the executable name from “hive” to “beeline”.

In this post, you’ll learn how to make this migration as smooth as possible, and learn the differences and similarities between the two clients. While Beeline offers some more non-essential options such as coloring, this post mainly focuses on how to achieve with Beeline what you used to do with Hive CLI.

Use Cases: Hive CLI versus Beeline

The following section focuses on the common uses of Hive CLI/HiveServer1 and how you can migrate to Beeline/HiveServer2 in each case.

Server Connection

Hive CLI connects to a remote HiveServer1 instance using the Thrift protocol. To connect to a server, you specify the host name and optionally the port number of the remote server:

In contrast, Beeline connects to a remote HiveServer2 instance using JDBC. Thus, the connection parameter is a JDBC URL that’s common in JDBC-based clients:

Here are a few URL examples:

Query Execution

Executing queries in Beeline is very similar to that in Hive CLI. In Hive CLI:

In Beeline:

In either case, if no -e or -f options are given, both client tools go into an interactive mode in which you can give and execute queries or commands line by line.

Embedded Mode

Running Hive client tools with embedded servers is a convenient way to test a query or debug a problem. While both Hive CLI and Beeline can embed a Hive server instance, you would start them in embedded mode in slightly different ways.

To start Hive CLI in embedded mode, just launch the client without giving any connection parameters:

To start Beeline in embedded mode, a little more work is required. Basically, a connection URL of jdbc:hive2:// needs to be specified:

At this point, Beeline enters interactive mode, in which queries and commands against the embedded HiveServer2 instance can be executed.

Variables

Perhaps the most interesting difference between the clients concerns the use of Hive variables. There are four namespaces for variables:

  • hiveconf for Hive configuration variables
  • system for system variables
  • env for environment variables
  • hivevar for Hive variables (HIVE-1096)

A variable is expressed as &namespace>:. For Hive configuration variables, the name space hiveconf can be skipped. The value of the variable can be referenced using dollar notation, such as ${hivevar:var}.

There are two ways to define a variable: as a command-line argument or using the set command in interactive mode.

Defining Hive variables in command line in Hive CLI:

Defining Hive variables in command line in Beeline (works only for CDH 5 as HIVE-4568 is not backported to CDH 4):

Defining Hive configuration variables in command line in Hive CLI:

At the time of this writing, in Beeline it’s not possible to define Hive configuration variables in command line (HIVE-6173).

In either Hive CLI and Beeline, you would set variables in interactive mode the same way using the set command:

Show the value of a variable:

Note that environment variables cannot be set:

The set command without any arguments lists all variables with their values:

Command-Line Help

Of course, you can always find help on the command-line arguments:

Interactive Mode

In Hive CLI interactive mode, you can execute any SQL query that is supported by HiveServer. For example:

Furthermore, you can execute shell command without leaving Hive CLI:

In Beeline, you can execute any SQL query as you would in Hive CLI. For example:

The above command is equivalent to:

As you can see, you use “!” to execute Beeline commands instead of shell commands. Among Beeline commands, !connect is among the most important; it allows you to connect to a database:

Another important command is !quit (or !q), which allows you to exit interactive mode:

For a list of all Beeline commands, please refer to the SQLLine document here.

Conclusion

As you can see, the Hive community is working hard to make Beeline as similar to Hive CLI as possible in terms of functionality as well as syntax. The comparisons above should help make your transition relatively painless.

Xuefu Zhang is a Software Engineer at Cloudera and a Hive Committer.

facebooktwittergoogle_pluslinkedinmailfacebooktwittergoogle_pluslinkedinmail

19 responses on “Migrating from Hive CLI to Beeline: A Primer

  1. Prateek Rungta

    Worth pointing out: --hivevar is supported in CDH5 and up, it’s not in CDH4.6 yet.

  2. Stephen Boesch

    How does one execute shell command from beeline? Why was “!” chosen to run “normal” beeline commands – when doing so is the natural way to run shell commands?

  3. Andrey

    Some Beeline/HS2 issues that may effect people:
    – Relative paths for ‘ADD JAR/FILE’ are not supported
    – Resource files for ‘ADD JAR/FILE’ must be present on the HS2 machine (HIVE-9302)
    – Inline comments in SQL statements are broken (HIVE-7340)
    – No hiverc support until 0.14 (HIVE-5160)

  4. msciwoj

    Once --hivevar KEY1=VALUE1 is specified, how to refer to the KEY1 values in the scripts?

  5. Bruce

    Are there any plans to support additional hadoop commands (such as hdfs) within beeline? Hive supports this today and it’s often necessary (or at least very inconvenient without it) with external tables.

  6. Eric

    I used to be able to use below hive command to run query file in batch mode and output the result to a CSV file. I used -S option to suppress the query log so only the query output is exported to the CSV file. I also used nohup to let the big job run even I logged out the system.

    nohup hive -S -f /home/wj19670/pad.sql >pad.csv &

    However under beeline, I can use the query in batch mode using below. But seems I can not use the -S option and nohup anymore. beeline is new to me. Does that beeline does not support -S and nohup?

    beeline -u ‘jdbc:hive2://server.domain..com:10000/default;principal=hive/server.domain.com@SERVER.DOMAIN.COM’ -f filename.hql

  7. Xuefu Zhang

    -S issue has been fixed in latest Hive. There seems to be an issue with nohup beeline when running in the background. This will be addressed.

  8. kishore

    Hi,

    I am using hive 1.1.0 and was able to update and delete the records after modifying the hive-site.xml. The same is not working in beeline and with jav-jdbc connectivity to hive.

    Do we need to configure any property to make beeline work.

    Please help me on this.

    Thanks
    Kishore

    1. Justin Kestelyn (@kestelyn) Post author

      For faster response, I recommend that you post this question in the “Hive” area at community.cloudera.com.

  9. Sanjeev

    I am trying tto use jason serde jar. But How to add it in beeline. I am facing issue while adding it. Any knowledge document to add serde in beeline.

  10. Rajesh J

    Hi

    How to pass parameter values to a hq1 file using beeline.
    I have a Hql file with Hive insert scripts.
    Another sh file calling the hql file with beeline as below
    beeline -u ‘jdbc:hive2://cm:10000/default;principal=hive/wuwcw0hd3mn01.wudip.com@HADOOP1.WUDIP.COM’ -f STAGE3.hql.
    Here the hql file runs based on date criteria.
    How to pass values to STAGE3.hql file from beeline.

    Kindly advice.

    1. Justin Kestelyn Post author

      Rahesh, best if you post this question to the “Hive” area at community.cloudera.com.

  11. Pasha

    The biggest issue for me right now is the use of the ‘source’ command, that executes another script inside the current one. It is very useful when there is a number of external functions and macros that can be reused in multiple files. In Hive, it was used as:
    source /path/to/udfs.hql
    In Beeline or HiveServer2, I’d have to paste the same code over and over to each script.

    It would be great if there was a workaround or solution to use ‘source’, just like in Hive CLI.

  12. Kenz

    I am using beeline version 1.1.0-cdh5.4.7 to run job from a file using “-f” option and when it executes, display the whole file code on the screen or logs. And if I put –silent=true then its displays empty screen of same length as code in file. Which is not a case with Hiveserver1 CLI. I would like to know if there is any way to avoid this displaying whole code on screen/logs.

    beeline -u “jdbc:hive2://” –silent=true -f filename.hql

    1. Justin Kestelyn Post author

      Can you please post this issue in the “Hive” area at community.cloudera.com?

      1. Kenz

        Thanks Justin for quick response. I have posted this question at community.cloudera.com and waiting for a response.

  13. kalyan

    Hi,
    I am trying to execute ADD JAR command from beeline and am ending up with error as below.
    Error: Insufficient privileges to execute ADD (state=42000,code=0)
    I am able to get it added with out issues using the hive terminal.
    Note 1. Kerberos is enabled. 2. have multiple roles like admin_role, hive_role etc…

    could you please give your thoughts to resolve.
    Thanks. :)