Testing Apache Kudu Applications on the JVM

Categories: Kudu Testing

Although the Kudu server is written in C++ for performance and efficiency, developers can write client applications in C++, Java, or Python. To make it easier for Java developers to create reliable client applications, we’ve added new utilities in Kudu 1.9.0 that allow you to write tests using a Kudu cluster without needing to build Kudu yourself, without any knowledge of C++, and without any complicated coordination around starting and stopping Kudu clusters for each test. This post describes how the new testing utilities work and how you can use them in your application tests.

User Guide

Note: This blog post may become outdated. For the latest documentation on using the JVM testing utilities, see the Kudu documentation.

Requirements

In order to use the new testing utilities, the following requirements must be met:

  • OS
    • macOS El Capitan (10.11) or later
    • CentOS 6.6+, Ubuntu 14.04+, or another recent distribution of Linux supported by Kudu
  • JVM
    • Java 8+
    • Note: Java 7 is deprecated, but still supported
  • Build Tool
    • Maven or Gradle, configured as described in the sections below

Build Configuration

In order to use the Kudu testing utilities, add two dependencies to your classpath:

  • The kudu-test-utils dependency
  • The kudu-binary dependency

The kudu-test-utils dependency has useful utilities for testing applications that use Kudu. Primarily, it provides the KuduTestHarness class to manage the lifecycle of a Kudu cluster for each test. The KuduTestHarness is a JUnit TestRule that not only starts and stops a Kudu cluster for each test, but also has methods to manage the cluster and get pre-configured KuduClient instances for use while testing.

The kudu-binary dependency contains the native Kudu (server and command-line tool) binaries for a specific operating system. To download the right artifact for the running operating system, it is easiest to use a plugin, such as the os-maven-plugin or osdetector-gradle-plugin, to detect the current runtime environment. The KuduTestHarness will automatically find and use the kudu-binary jar on the classpath.
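To make the idea of an OS-specific classifier concrete, here is a rough sketch of how a classifier string such as linux-x86_64 or osx-x86_64 can be derived from JVM system properties. This is a simplified approximation written for illustration, not the actual logic of the os-maven-plugin or osdetector-gradle-plugin; the class name OsClassifierSketch and its methods are hypothetical, and only a few common OS and architecture names are handled.

```java
import java.util.Locale;

// Hypothetical helper: approximates how a build plugin might turn the JVM's
// os.name and os.arch properties into a classifier like "linux-x86_64".
class OsClassifierSketch {

    // Lowercase the value and drop everything except letters and digits,
    // so "Mac OS X" becomes "macosx" and "x86_64" becomes "x8664".
    static String normalize(String value) {
        if (value == null) {
            return "";
        }
        return value.toLowerCase(Locale.US).replaceAll("[^a-z0-9]+", "");
    }

    // Build "<os>-<arch>", falling back to "unknown" for anything unrecognized.
    static String classifier(String osName, String osArch) {
        String os = normalize(osName);
        String arch = normalize(osArch);

        String osPart;
        if (os.startsWith("linux")) {
            osPart = "linux";
        } else if (os.startsWith("macosx") || os.startsWith("osx")) {
            osPart = "osx";
        } else {
            osPart = "unknown";
        }

        String archPart;
        if (arch.matches("x8664|amd64|ia32e|em64t|x64")) {
            archPart = "x86_64";
        } else if (arch.equals("aarch64")) {
            archPart = "aarch_64";
        } else {
            archPart = "unknown";
        }

        return osPart + "-" + archPart;
    }

    public static void main(String[] args) {
        // Print the classifier for the machine running this JVM.
        System.out.println(
            classifier(System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

In a real build you would let the plugin compute this value and reference it as a property, rather than computing it yourself.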

WARNING: The kudu-binary module should only be used to run Kudu for integration testing purposes. It should never be used to run an actual Kudu service, in production or development, because the kudu-binary module includes native security-related dependencies that have been copied from the build system and will not be patched when the operating system on the runtime host is patched.

Maven Configuration

If you are using Maven to build your project, add the following entries to your project’s pom.xml file:
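A sketch of what those entries might look like, assuming Kudu 1.9.0 (the release discussed in this post) and the os-maven-plugin for classifier detection. The os-maven-plugin version shown is an assumption; check for the current release before copying it.

```xml
<build>
  <extensions>
    <!-- Sets the Maven property ${os.detected.classifier}, used below to
         select the kudu-binary artifact for the current OS.
         The plugin version here is an assumption. -->
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.6.2</version>
    </extension>
  </extensions>
</build>

<dependencies>
  <!-- Test utilities, including KuduTestHarness. -->
  <dependency>
    <groupId>org.apache.kudu</groupId>
    <artifactId>kudu-test-utils</artifactId>
    <version>1.9.0</version>
    <scope>test</scope>
  </dependency>
  <!-- Native Kudu binaries for the detected OS. Test scope only. -->
  <dependency>
    <groupId>org.apache.kudu</groupId>
    <artifactId>kudu-binary</artifactId>
    <version>1.9.0</version>
    <classifier>${os.detected.classifier}</classifier>
    <scope>test</scope>
  </dependency>
</dependencies>
```

Both dependencies use the test scope, in line with the warning above that kudu-binary is for integration testing only.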

Gradle Configuration

If you are using Gradle to build your project, add the following entries to your project’s build.gradle file:
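A sketch of the equivalent Gradle entries, assuming Kudu 1.9.0 and the osdetector-gradle-plugin. The plugin version is an assumption, and testImplementation assumes a reasonably recent Gradle; older Gradle versions used testCompile instead.

```groovy
plugins {
  // Exposes osdetector.classifier, used below to select the kudu-binary
  // artifact for the current OS. The plugin version here is an assumption.
  id "com.google.osdetector" version "1.6.2"
}

dependencies {
  // Test utilities, including KuduTestHarness.
  testImplementation "org.apache.kudu:kudu-test-utils:1.9.0"
  // Native Kudu binaries for the detected OS. Test configuration only.
  testImplementation "org.apache.kudu:kudu-binary:1.9.0:${osdetector.classifier}"
}
```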

Test Setup

Once your project is configured correctly, you can start writing tests using the kudu-test-utils and kudu-binary artifacts. One line of code will ensure that each test automatically starts and stops a real Kudu cluster and that cluster logging is output through slf4j:
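A minimal sketch of that line in context, assuming JUnit 4 and the two Kudu test dependencies are on the test classpath. The class name MyKuduTest and the test method are placeholders.

```java
import static org.junit.Assert.assertFalse;

import org.apache.kudu.test.KuduTestHarness;
import org.junit.Rule;
import org.junit.Test;

public class MyKuduTest {

  // The one line that matters: as a JUnit rule, the harness starts a Kudu
  // cluster before each test method and stops it afterwards.
  @Rule
  public KuduTestHarness harness = new KuduTestHarness();

  @Test
  public void clusterIsRunning() {
    // The harness exposes the master addresses of the running cluster.
    assertFalse(harness.getMasterAddressesAsString().isEmpty());
  }
}
```

Because the rule is declared per instance with @Rule, each test method gets its own fresh cluster, so tests do not share state.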

The KuduTestHarness has methods to get pre-configured clients, start and stop servers, and more. Below is an example test to showcase some of the capabilities:
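The following is a sketch of such a test, assuming JUnit 4 and kudu-test-utils 1.9.0. The table name, column names, and row count are illustrative choices, and the tablet server restart in the middle is just one way to exercise the harness's cluster-management methods.

```java
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collections;

import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduScanner;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.PartialRow;
import org.apache.kudu.test.KuduTestHarness;
import org.junit.Rule;
import org.junit.Test;

public class MyKuduTest {

  @Rule
  public KuduTestHarness harness = new KuduTestHarness();

  @Test
  public void testWriteRestartAndScan() throws Exception {
    // Get a KuduClient already configured to talk to the test cluster.
    KuduClient client = harness.getClient();

    // Create a simple two-column table (names are illustrative).
    String tableName = "my_table";
    Schema schema = new Schema(Arrays.asList(
        new ColumnSchema.ColumnSchemaBuilder("key", Type.INT32).key(true).build(),
        new ColumnSchema.ColumnSchemaBuilder("value", Type.STRING).build()));
    CreateTableOptions opts = new CreateTableOptions()
        .setRangePartitionColumns(Collections.singletonList("key"));
    client.createTable(tableName, schema, opts);
    KuduTable table = client.openTable(tableName);

    // Write a few rows.
    KuduSession session = client.newSession();
    for (int i = 0; i < 10; i++) {
      Insert insert = table.newInsert();
      PartialRow row = insert.getRow();
      row.addInt("key", i);
      row.addString("value", String.valueOf(i));
      session.apply(insert);
    }
    session.close();

    // Use the harness to restart every tablet server.
    harness.killAllTabletServers();
    harness.startAllTabletServers();

    // Scan the table and check that the rows survived the restart.
    KuduScanner scanner = client.newScannerBuilder(table).build();
    int rowCount = 0;
    while (scanner.hasMoreRows()) {
      rowCount += scanner.nextRows().getNumRows();
    }
    assertEquals(10, rowCount);
  }
}
```

The harness offers similar methods for the master servers, such as killLeaderMasterServer() and startAllMasterServers(), and getAsyncClient() for the asynchronous client.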

For a complete example of a project using the KuduTestHarness, see the java-example project in the Kudu source code repository. The Kudu project itself uses the KuduTestHarness for all of its own integration tests. For more complex examples, you can explore the various Kudu integration tests in the Kudu source code repository.

Feedback

Kudu 1.9.0 is the first release to have these testing utilities available. Although these utilities simplify testing of Kudu applications, there is always room for improvement. Please report any issues, ideas, or feedback to the Kudu user mailing list, Jira, or Slack channel and we will try to incorporate your feedback quickly. See the Kudu community page for details.

Thank you

We would like to give a special thank you to everyone who helped contribute to the kudu-test-utils and kudu-binary artifacts. We would especially like to thank Brian McDevitt at phData and Tim Robertson at GBIF who helped us tremendously.

 

Grant Henke is a Software Engineer at Cloudera
Mike Percy is a Software Engineer at Cloudera
