New in CDH 5.2: Improvements for Running Multiple Workloads on a Single HBase Cluster


These new Apache HBase features in CDH 5.2 make multi-tenant environments easier to manage.

Historically, Apache HBase has treated all tables, users, and workloads with equal weight. This approach is sufficient for a single workload, but when multiple users and workloads share the same cluster or table, conflicts can arise. Fortunately, starting with HBase in CDH 5.2 (HBase 0.98 + backports), workloads and users can now be prioritized.

The approaches to this multi-tenancy problem fall into three categories:

  • Physical isolation or partitioning – each application or workload operates on its own table and each table can be assigned to a set of machines.
  • Scheduling – applications and workloads are scheduled based on access patterns or time and resources needed to complete.
  • Quotas – ad-hoc queries against a table are rate-limited, so the table can be shared safely with other applications.

In this post, I’ll explain three new HBase mechanisms (see umbrella JIRA HBASE-10994 – HBase Multitenancy) focused on enabling some of the approaches above.

Throttling

In a multi-tenant environment, it is useful to enforce manual limits that prevent users from abusing the system. (A simple example: “Let MyApp run as fast as possible and limit all the other users to 100 requests per second.”)

The new throttling feature in CDH 5.2 (HBASE-11598 – Add rpc throttling) allows an admin to enforce a limit on the number of requests, or the amount of data, per unit of time for a specified user, table, or namespace. Some examples are:

  • Throttle Table A to X req/min
  • Throttle Namespace B to Y req/hour
  • Throttle User K on Table Z to KZ MB/sec

An admin can also change the throttle at runtime. The change propagates after the quota refresh period expires; the default period is 5 minutes and is configurable via the hbase.quota.refresh.period property in hbase-site.xml. In future releases, a notification will be sent to apply the changes instantly.
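For example, to shorten the refresh period to one minute, you could set the property in hbase-site.xml as follows (a minimal sketch; the value is illustrative, not a recommendation):

    <property>
      <name>hbase.quota.refresh.period</name>
      <value>60000</value> <!-- refresh period in milliseconds (default: 300000, i.e. 5 minutes) -->
    </property>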

In the chart below, you can see an example of the results of throttling. Initially, User 1 and User 2 are unlimited; then the admin decides that the User 1 job is more important and throttles the User 2 job, reducing contention with User 1's requests.

The shell allows you to specify the limit in a descriptive way (e.g. LIMIT => 10req/sec or LIMIT => 50M/sec). To remove the limit, use LIMIT => NONE.

Examples:
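The following HBase shell commands sketch the three scenarios above, plus removing a limit (the table, namespace, and user names and the limit values are illustrative):

    hbase> set_quota TYPE => THROTTLE, TABLE => 'TableA', LIMIT => '1000req/min'
    hbase> set_quota TYPE => THROTTLE, NAMESPACE => 'NamespaceB', LIMIT => '10000req/hour'
    hbase> set_quota TYPE => THROTTLE, USER => 'userK', TABLE => 'TableZ', LIMIT => '10M/sec'
    hbase> set_quota TYPE => THROTTLE, USER => 'userK', TABLE => 'TableZ', LIMIT => NONE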

You can also place a global limit and exclude a user or a table from the limit by applying the GLOBAL_BYPASS property. Consider a situation with a production workload and many ad-hoc workloads. You can choose to set a limit for all the workloads except the production one, reducing the impact of the ad-hoc queries on the production workload.
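For instance, assuming a production user named prod_user (the name is illustrative), you could exempt it from all limits with:

    hbase> set_quota USER => 'prod_user', GLOBAL_BYPASS => true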

Note that the throttle is always enforced: even when the production workload is inactive, the ad-hoc requests are still throttled.

Request Queues

Assuming no throttling policy is in place, when the RegionServer receives multiple requests, they are now placed into a queue waiting for a free execution slot (HBASE-6721 – RegionServer Group based Assignment).

The simplest queue is a FIFO queue, which means that each request has to wait for the completion of all the requests in the queue before it. And, as you can see from the picture below, fast/interactive queries can get stuck behind large requests. (To keep the example simple, let’s assume that there is a single executor.)

One solution would be to divide large requests into smaller ones and interleave each chunk with other requests, allowing multiple requests to make progress. The current infrastructure doesn’t allow that; however, if you can estimate how long a request will take to be served, you can reorder requests, pushing long requests to the end of the queue and allowing short requests to jump in front of longer ones. At some point you still have to execute the large requests, and new requests will be prioritized behind them. But since the short requests are newer, the result is not as bad as the FIFO case, though still suboptimal compared to the splitting approach described above.

Deprioritizing Long-running Scanners

Along the lines of what we described above, CDH 5.2 has a “fifo” queue and a new queue type called “deadline,” configurable by setting the hbase.ipc.server.callqueue.type property (HBASE-10993 – Deprioritize long-running scanners). Currently there is no way to estimate how long each request may take, so de-prioritization affects only scans and is based on the number of “next” calls a scan request has made. The assumption is that a full table scan is probably not interactive, so if there are concurrent requests, long-running scans can be delayed up to a limit tunable via the hbase.ipc.server.queue.max.call.delay property. The slope of the delay is computed as a simple square root of (numNextCall * weight), where the weight is configurable via the hbase.ipc.server.scan.vtime.weight property.
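A minimal hbase-site.xml sketch enabling the deadline queue (the delay and weight values below are illustrative, not recommendations):

    <property>
      <name>hbase.ipc.server.callqueue.type</name>
      <value>deadline</value>
    </property>
    <property>
      <name>hbase.ipc.server.queue.max.call.delay</name>
      <value>5000</value> <!-- maximum delay applied to a long-running scan, in milliseconds -->
    </property>
    <property>
      <name>hbase.ipc.server.scan.vtime.weight</name>
      <value>1.0</value> <!-- weight applied to the number of "next" calls -->
    </property>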

Multiple-Typed Queues

Another way you can prioritize/deprioritize different kinds of requests is by having a specified number of dedicated handlers and queues. That way you can segregate the scan requests in a single queue with a single handler, and all the other available queues can service short Get requests.

Currently, some static tuning options are available to adjust the IPC queues/handlers based on the type of workload. This approach is an interim first step; future releases will allow you to change the settings at runtime, as you can for throttling, and to adjust values dynamically based on the load.

Multiple Queues

To avoid contention and separate different kinds of requests, a new property, hbase.ipc.server.callqueue.handler.factor, allows admins to increase the number of queues and decide how many handlers share the same queue (HBASE-11355 – Multiple Queues / Read-Write Queues).

Having more queues, such as one queue per handler, reduces contention when adding a task to a queue or selecting it from a queue. The trade-off is that if you have some queues with long-running tasks, a handler may end up waiting to execute from that queue rather than stealing from another queue which has waiting tasks.
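As a rough sketch (the values are illustrative): with hbase.regionserver.handler.count set to 30 and the factor below set to 0.1, the RegionServer creates about 30 × 0.1 = 3 call queues, so 10 handlers share each queue; a factor of 1 gives each handler its own queue.

    <property>
      <name>hbase.ipc.server.callqueue.handler.factor</name>
      <value>0.1</value> <!-- number of queues ~ handler count x factor -->
    </property>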

Read and Write

With multiple queues, you can now divide read and write requests, giving more priority (queues) to one or the other type. Use the hbase.ipc.server.callqueue.read.ratio property to choose to serve more reads or writes (HBASE-11724 Short-Reads/Long-Reads Queues).

Similar to the read/write split, you can split gets and scans by tuning the hbase.ipc.server.callqueue.scan.ratio property to give more priority to gets or to scans. The chart below shows the effect of the settings.

A scan ratio of 0.1 gives more queues/handlers to incoming gets, which means that more gets can be processed at the same time and fewer scans can be executed concurrently. A value of 0.9 gives more queues/handlers to scans, so the number of scan requests served will increase and the number of gets will decrease.
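Continuing the sketch above (the values are illustrative): with 10 call queues, a read ratio of 0.6 reserves roughly 4 queues for writes and 6 for reads, and a scan ratio of 0.1 then dedicates roughly one of those 6 read queues to scans and the rest to gets.

    <property>
      <name>hbase.ipc.server.callqueue.read.ratio</name>
      <value>0.6</value> <!-- ~60% of queues serve reads, the rest serve writes -->
    </property>
    <property>
      <name>hbase.ipc.server.callqueue.scan.ratio</name>
      <value>0.1</value> <!-- ~10% of the read queues serve scans, the rest serve gets -->
    </property>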

Future Work

Aside from addressing the current limitations mentioned above (static configuration, unsplittable large requests, and so on) and doing things like limiting the number of tables that a user can create or making greater use of namespaces, a couple of major new features on the roadmap will further improve interaction between multiple workloads:

  • Per-user queues: Instead of a global setting for the system, a more advanced way to schedule requests is to allow each user to have its own “scheduling policy,” allowing each user to define priorities for each table, and each table to define request-type priorities. This would be administered in a similar way to throttling.
  • Cost-based scheduling: Request execution can take advantage of the known system state to prioritize and optimize scheduling. For example, one could prioritize requests that are known to be served from cache, prefer concurrent execution of requests that are hitting two different disks, prioritize requests that are known to be short, and so on.
  • Isolation/partitioning: Separating workloads onto different machines is useful in situations where the admin understands the workload of each table and how to manually separate them. The basic idea is to reserve enough resources to run everything smoothly. (The only way to achieve that today is to set up one cluster per use case.)

Conclusion

Based on the above, you should now understand how to improve the interaction between different workloads using this new functionality. Note, however, that these features are only down payments on what will become more robust functionality in future releases.

Matteo Bertozzi is a Software Engineer at Cloudera and an HBase committer/PMC member.
