(guest blog post by Matei Zaharia)
As Hadoop clusters grow in size and data volume, it becomes more and more useful to share them between multiple users and to isolate those users from one another. If User 1 is running a ten-hour machine learning job, for example, this should not prevent User 2 from running a two-minute Hive query. In November, I blogged about how Hadoop 0.19 supports pluggable job schedulers, and how we worked with Facebook to implement a Fair Scheduler for Hadoop using this new functionality. The Fair Scheduler gives each user a configurable share of the cluster while they have running jobs, but reassigns those resources to other users when they are inactive. Since last fall, the Fair Scheduler has been picked up by Hadoop users outside Facebook, including the Google/IBM academic Hadoop cluster. It’s also received extensive testing and patches from Yahoo!. Furthermore, we’ve included the Fair Scheduler in Cloudera’s Distribution for Hadoop, where it is integrated right into the JobTracker management UI. Through production experience, testing, and feedback from users, we’ve made a lot of improvements to the Fair Scheduler, some of which are available now and others of which will come out in the next major version, which I’m calling “Fair Scheduler 2.0”. Here is a summary of the upcoming functionality:
- Fair sharing has changed from giving equal shares to each job to giving equal shares to each user. This means that users who submit many jobs don’t gain an advantage over users running only a few. It’s also possible to give different weights to different users.
- The Fair Scheduler now supports killing tasks from other pools’ jobs when those jobs do not give up resources on their own. Each pool (by default there is one pool per user, but specially named pools can also be created) has a configurable timeout, after which the scheduler may kill other jobs’ tasks so that the pool’s jobs can start running. This makes it possible to provide “service guarantees” for production jobs that share a cluster with experimental queries.
- The scheduler can now assign multiple tasks per heartbeat, which is important for maintaining high utilization in large clusters.
- A technique called delay scheduling increases data locality for small jobs, improving performance in a data warehouse workload with many small jobs such as Facebook’s.
- The internal logic has been simplified so that the scheduler can support different scheduling policies within each pool, and in particular we plan to support FIFO pools. Many users have requested FIFO pools because they want to be able to queue up batch workflows on the same cluster that’s running more interactive jobs.
- Many bug fixes and performance improvements were contributed or suggested by a team stress-testing the scheduler at Yahoo!.
- The same team has also contributed Forrest web-based documentation for the fair scheduler (to be available in Hadoop 0.20).
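To make the per-user sharing idea concrete, here is a small Python sketch of weighted max-min fair sharing, the general principle behind dividing task slots among active users. The slot-based model and all names here are illustrative only, not the scheduler’s actual code:

```python
def fair_shares(total_slots, demands, weights=None):
    """Split slots among active users in proportion to their weights.

    `demands` maps user -> number of tasks the user could run right now;
    users with no runnable tasks receive no share. A user never receives
    more than its demand; leftover slots are redistributed to the rest.
    (Illustrative sketch only, not the Fair Scheduler's actual code.)
    """
    weights = weights or {}
    shares = {u: 0.0 for u in demands}
    active = {u for u, d in demands.items() if d > 0}
    slots_left = float(total_slots)
    while active and slots_left > 1e-9:
        total_w = sum(weights.get(u, 1.0) for u in active)
        satisfied = set()
        distributed = 0.0
        # Tentatively hand each active user a weight-proportional slice.
        for u in active:
            slice_ = slots_left * weights.get(u, 1.0) / total_w
            take = min(slice_, demands[u] - shares[u])
            shares[u] += take
            distributed += take
            if shares[u] >= demands[u] - 1e-9:
                satisfied.add(u)
        active -= satisfied
        slots_left -= distributed
        if not satisfied:
            break  # everyone took a full slice; nothing left to redistribute
    return shares
```

With equal weights, two busy users splitting 100 slots get 50 each; a user who only needs 20 slots gets exactly 20, and the remainder flows to the others. Note that the share is per user, not per job, which is exactly why submitting many jobs no longer buys extra capacity.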
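The timeout-based killing for starved pools can be sketched in a few lines. The field names (`fair_share`, `starved_since`, `timeout`) are hypothetical stand-ins for the real configuration, not the actual Fair Scheduler API:

```python
import time

def tasks_to_preempt(pool, pools, now=None):
    """Decide how many tasks a starved pool may reclaim by preemption.

    Each pool dict carries: 'running' (tasks running now), 'fair_share',
    'starved_since' (timestamp when it dropped below its share, or None),
    and 'timeout' (seconds to wait before killing). Field names are
    illustrative; the real scheduler's configuration differs.
    """
    now = time.time() if now is None else now
    deficit = pool["fair_share"] - pool["running"]
    if deficit <= 0 or pool["starved_since"] is None:
        return 0
    if now - pool["starved_since"] < pool["timeout"]:
        return 0  # still within the grace period; don't kill yet
    # Only reclaim slots from pools running above their own fair share.
    surplus = sum(max(0, p["running"] - p["fair_share"])
                  for p in pools if p is not pool)
    return int(min(deficit, surplus))
```

The timeout is what turns this into a service guarantee: a production pool configured with a short timeout is assured of reaching its share within that window, while a long (or infinite) timeout keeps the scheduler from ever killing tasks on behalf of low-priority pools.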
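Delay scheduling itself is a simple idea: when a node heartbeats, prefer a task whose input data lives on that node, and let a job skip a bounded number of seconds of scheduling opportunities rather than immediately accept a non-local slot. Here is a minimal sketch assuming a dict-based job representation; none of these names are the real Hadoop interfaces:

```python
def pick_task(node, job, now, max_delay):
    """Delay scheduling: prefer a data-local task on `node`; otherwise let
    the job wait up to `max_delay` seconds before going non-local.

    `job` holds 'pending' (task -> set of preferred hosts) and
    'waiting_since' (timestamp of the first skipped opportunity, or None).
    Names are illustrative, not the actual Hadoop API.
    """
    for task, hosts in job["pending"].items():
        if node in hosts:
            job["waiting_since"] = None
            return task  # data-local: always take it
    if job["waiting_since"] is None:
        job["waiting_since"] = now
    if now - job["waiting_since"] >= max_delay:
        job["waiting_since"] = None
        # waited long enough: run any pending task, even non-locally
        return next(iter(job["pending"]), None)
    return None  # skip this heartbeat, hoping for a local slot soon
```

Small jobs benefit the most because they have few tasks and therefore few nodes holding their input blocks; waiting a second or two for one of those nodes to free a slot is usually far cheaper than reading the input over the network.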
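The per-pool policy support described above boils down to each pool ordering its own runnable jobs by its own rule. A rough sketch of what a pluggable per-pool ordering could look like, with purely illustrative job fields:

```python
def sort_jobs(jobs, policy):
    """Order a pool's runnable jobs by its scheduling policy.

    'fifo' pools run jobs in submission order (ties broken by id), while
    'fair' pools favor the job farthest below its share of the pool.
    Job dicts and field names here are illustrative only.
    """
    if policy == "fifo":
        return sorted(jobs, key=lambda j: (j["submit_time"], j["id"]))
    if policy == "fair":
        # Fewest running tasks per unit of weight goes first.
        return sorted(jobs, key=lambda j: j["running"] / j.get("weight", 1.0))
    raise ValueError("unknown policy: " + policy)
```

This is what makes FIFO pools useful for batch workflows: within the pool, jobs queue up and run strictly in order, while the pool as a whole still gets only its fair share of the cluster alongside the interactive pools.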
As a grad student and the original developer of the Fair Scheduler, I’ve had a great experience interacting with the Hadoop community to improve the scheduler. The fact that production experience at Facebook, large-scale testing at Yahoo!, and wishes from other users are being combined into this single piece of software is a testament to the strength of Hadoop’s open-source model. The next release of the Fair Scheduler (likely in Hadoop 0.21, although we will also release back-ports to older Hadoop versions) will make it easier to manage multi-user clusters, give FIFO scheduling to users who desire it, improve performance and reduce the need for manual intervention with misbehaving jobs. You can also be sure that we’ll continue supporting the scheduler in Cloudera’s Distribution for Hadoop.