A month ago, we publicly announced Cloudera Altus, our new platform-as-a-service offering. Today, we are expanding the Altus data engineering service to support AWS EC2 Spot instances. Cloud infrastructure is the most costly component of running Altus data engineering workloads in the cloud. Altus EC2 Spot instance support makes it easy to significantly reduce that cost by allowing users to provision Altus data engineering clusters backed by excess EC2 compute capacity at reduced prices. As a result, running data engineering jobs on Altus clusters is more cost-effective than ever.
What are AWS EC2 Spot instances?
AWS EC2 provides the ability to purchase Spot instances at lower prices than On-Demand instance prices. EC2 Spot instances are backed by excess EC2 capacity and can be revoked when demand for the instance type spikes. Spot instance prices fluctuate based on the supply and demand of a particular EC2 instance type and are specific to each region and availability zone.
This is in contrast to EC2 On-Demand instances, which are guaranteed for the lifetime of the cluster and have fixed prices. Because Spot instances opportunistically use spare EC2 capacity and lack lifecycle guarantees, the cost savings from using a Spot instance instead of an On-Demand instance can be substantial, reaching 80% or more off the On-Demand price.
When bidding on Spot instances, you choose the maximum price you’re willing to pay per EC2 instance hour. If your bid meets or exceeds the current market price for the Spot instance type, then you will get the requested Spot instances. When your Spot instances start running, you pay the Spot market price (not your bid amount).
For the first hour, you pay the market price in effect at the time of launch. EC2 then re-evaluates this market price every hour. If the Spot market price rises above your bid price at any point, your EC2 instance is automatically terminated, and you are not charged for that last partial hour. If you terminate your Spot instances yourself, you are billed for the full last hour.
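The billing rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this post, not AWS's actual billing code; the function name `spot_charge` and its parameters are assumptions made for the example.

```python
import math
from typing import Sequence


def spot_charge(hourly_prices: Sequence[float], hours_run: float,
                revoked_by_ec2: bool) -> float:
    """Charge for one Spot instance under the hourly rules described above.

    hourly_prices[i] is the Spot market price in effect during hour i of the
    instance's life (a hypothetical input; you pay this price, not your bid).
    """
    if revoked_by_ec2:
        # Revoked by EC2: the last partial hour is not charged.
        billable_hours = math.floor(hours_run)
    else:
        # Terminated by the user: the last partial hour is billed in full.
        billable_hours = math.ceil(hours_run)
    if billable_hours > len(hourly_prices):
        raise ValueError("not enough hourly prices for the billed period")
    return sum(hourly_prices[:billable_hours])


prices = [0.25, 0.5, 0.25]  # illustrative market prices in USD per hour
print(spot_charge(prices, hours_run=2.5, revoked_by_ec2=True))
print(spot_charge(prices, hours_run=2.5, revoked_by_ec2=False))
```

With these illustrative prices, an instance revoked after 2.5 hours is billed for two hours, while one you terminate yourself at the same point is billed for three.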
Cluster Topology & Lifecycle
Cloudera Altus data engineering clusters consist of the following:
- Cloudera Manager node
- Master node
- Worker nodes
In order to support Spot instances, we expanded the Altus data engineering cluster topology to optionally include compute-only worker nodes. Altus compute nodes augment the compute power of a cluster while running stateless processes such as YARN NodeManager or Spark worker.
Actively running Altus data engineering jobs are able to gracefully recover from unplanned instance termination if stateless processes are lost. Because Altus compute nodes run only stateless processes, compute nodes can be backed by either On-Demand instances (guaranteed instance lifecycle at a higher price point) or Spot instances (possible instance revocation at a lower price point). Other Altus cluster node types are stateful and must be backed by On-Demand instances with guaranteed instance lifecycles in order to ensure cluster stability and efficiency.
For an Altus data engineering cluster, Altus requires three or more workers, optionally augmented by compute workers. Altus automatically attempts to provision or replace the Spot instances backing compute workers when they:
- cannot be acquired due to AWS Spot instance availability
- cannot be acquired due to the Spot bid price
- are terminated due to Spot price fluctuations.
As new compute nodes backed by Spot instances join the cluster, your jobs should accelerate as they begin to use the newly acquired compute capacity.
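The topology rules in this section can be summarized as a small validation sketch. The `Topology` class and `validate` function are hypothetical names for illustration only; they are not part of the Altus API or CLI.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_WORKERS = 3  # from the text: three or more workers are required


@dataclass
class Topology:
    workers: int                       # stateful workers, always On-Demand
    compute_workers: int = 0           # optional stateless compute nodes
    compute_on_spot: bool = False      # True: back compute nodes with Spot
    bid_price: Optional[float] = None  # max price per instance hour (Spot only)


def validate(t: Topology) -> List[str]:
    """Return a list of problems; an empty list means the topology is acceptable."""
    problems = []
    if t.workers < MIN_WORKERS:
        problems.append(f"need at least {MIN_WORKERS} workers")
    if t.compute_workers < 0:
        problems.append("compute_workers cannot be negative")
    if t.compute_on_spot and (t.bid_price is None or t.bid_price <= 0):
        problems.append("Spot compute workers need a positive bid price")
    return problems


print(validate(Topology(workers=3, compute_workers=5,
                        compute_on_spot=True, bid_price=0.30)))
print(validate(Topology(workers=2)))
```

The Cloudera Manager and master nodes are left out of the sketch because they are always present and always On-Demand.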
How to Use Spot Instances in Altus
It’s simple to use Spot Instances with Cloudera Altus. Just select the size of the compute workers group, decide on a bid price, and we’ll do the rest.
This capability is available in the UI, API, and CLI (version 1.1.0 or newer). You can learn more about Spot instance support from the Altus documentation.
The instance type and number of instances for a cluster should be selected to meet the SLA of a given workload. All workloads are different, and you will need to determine a cluster topology that satisfies your use case.
There are two important considerations when adding compute nodes backed by Spot instances to a cluster topology:
- Mix of On-Demand workers vs. Spot compute workers
- Maximum bid price per EC2 instance hour
A possible strategy for picking the number of On-Demand workers is to determine the minimum number of On-Demand workers needed to ensure that your workload completes. Then augment the cluster with compute instances backed by Spot to reduce overall execution time and minimize cost.
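To get a feel for how the mix affects cost, here is a rough arithmetic sketch. All prices are made-up illustrative values, the function names are assumptions, and the calculation covers worker and compute nodes only (it ignores the Cloudera Manager and master nodes and assumes a constant Spot price).

```python
def hourly_cost(on_demand_workers: int, spot_compute_workers: int,
                on_demand_price: float, spot_price: float) -> float:
    """Hourly EC2 cost of the worker and compute nodes in a mixed cluster."""
    return (on_demand_workers * on_demand_price
            + spot_compute_workers * spot_price)


def savings_vs_all_on_demand(on_demand_workers: int, spot_compute_workers: int,
                             on_demand_price: float, spot_price: float) -> float:
    """Fraction saved compared with running the same node count all On-Demand."""
    total_nodes = on_demand_workers + spot_compute_workers
    baseline = total_nodes * on_demand_price
    mixed = hourly_cost(on_demand_workers, spot_compute_workers,
                        on_demand_price, spot_price)
    return 1 - mixed / baseline


# Hypothetical example: 3 On-Demand workers plus 5 Spot compute workers,
# with On-Demand at $1.00/hour and Spot at $0.25/hour.
print(hourly_cost(3, 5, 1.0, 0.25))
print(savings_vs_all_on_demand(3, 5, 1.0, 0.25))
```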
When using Spot instances, you pay the market price for your EC2 Spot instances as long as that price does not exceed your maximum bid price. A high bid price provides a greater ability to tolerate Spot price spikes without instance revocation. A lower bid price guards against paying for sustained periods of high Spot prices. However, regardless of whether you submit a high or low bid, you pay market price rates (not your bid amount) for your Spot instances.
Determining the optimal Spot bidding strategy can be complicated. However, a simple strategy of bidding around 75% of the On-Demand instance price works well most of the time. Remember that since you are paying the market rate for your Spot instances, a bid of 75% and a bid of 35% would result in the same bill if the market rate remains under 35% for the lifetime of the instances. Bidding 75% allows you to ride out Spot price fluctuations of up to 75% of the On-Demand price. Bidding 35% protects you against paying sustained prices above 35% of the On-Demand price. AWS Spot Advisor is a useful tool that can help you pick instance types and a bid price based on historical Spot market prices.
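The 75% versus 35% comparison can be checked with a small simulation. This is a simplified sketch: prices are hypothetical, the market price is sampled once per hour, and the automatic replacement of revoked instances that Altus attempts is not modeled. The helper `run_with_bid` is an illustrative name, not an AWS or Altus function.

```python
from typing import Sequence, Tuple


def run_with_bid(bid: float, market_prices: Sequence[float]) -> Tuple[float, int]:
    """Return (total cost, hours completed) for one Spot instance.

    The instance keeps running while the hourly market price stays at or
    below the bid, and each completed hour is billed at the market price.
    """
    cost, hours = 0.0, 0
    for price in market_prices:
        if price > bid:
            break  # market price exceeded the bid: the instance is revoked
        cost += price  # you pay the market price, not the bid
        hours += 1
    return cost, hours


ON_DEMAND = 1.00                  # hypothetical On-Demand price, USD per hour
calm = [0.25, 0.25, 0.25, 0.25]   # market stays under 35% of On-Demand
spiky = [0.25, 0.5, 0.5, 0.25]    # market spikes to 50% of On-Demand

for label, prices in (("calm", calm), ("spiky", spiky)):
    for fraction in (0.75, 0.35):
        print(label, fraction, run_with_bid(fraction * ON_DEMAND, prices))
```

In the calm market both bids produce the same bill and the same runtime. In the spiky market the 75% bid keeps the instance running at a higher total cost, while the 35% bid loses the instance at the first spike.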
With AWS EC2 Spot instance support, Cloudera Altus makes it even easier to manage cost-effective Cloudera data engineering clusters on cloud infrastructure. Next time you create a cluster, try adding compute nodes backed by Spot instances to see the impact that the extra compute capacity has on your workload. We have made Spot instances easy to use so that you can focus on your workloads and avoid dealing with the complexity of the underlying infrastructure.
Sound useful? Get onboard!