Running spot instances effectively with Amazon EKS

Since we started working on HEY, one of the things I’ve been a big proponent of is keeping as much of the app-side compute infrastructure on spot instances as possible (front-end and async job processing; excluding the database, Redis, and Elasticsearch). Coming out of our first two weeks running the app under real production traffic, we’re sitting at ~90% of our compute running on spot instances.





Especially around the launch of a new product, you typically don’t know what traffic and load levels are going to look like, making purchases of reserved instances or savings plans a risky proposition. Spot instances let us get the compute we need at an even deeper discount than a 1-year RI or savings plan rate, without the same commitment. Combine that pricing with seamless integration into auto-scaling groups and they’re a no-brainer for most of our workloads.
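For context, here’s roughly what a spot nodegroup can look like with eksctl. This is a minimal sketch, not our actual config; the cluster name, sizes, labels, and instance types are all illustrative. The instancesDistribution block is what turns the underlying auto-scaling group into a diversified, 100%-spot group.

```yaml
# Sketch of an eksctl nodegroup backed entirely by spot capacity.
# Names, sizes, and instance types are illustrative, not HEY's real values.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: us-east-1
nodeGroups:
  - name: spot-workers
    minSize: 3
    maxSize: 30
    desiredCapacity: 10
    labels:
      lifecycle: spot                  # used later to steer pods onto spot nodes
    instancesDistribution:
      instanceTypes: ["c5.2xlarge", "c5a.2xlarge", "c4.2xlarge"]  # diversify to reduce reclaim risk
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0   # 0% on-demand above base, i.e. 100% spot
      spotAllocationStrategy: capacity-optimized
```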





The big catch with spot instances? AWS can take them back from you with a two-minute notice.









Spot-appropriate workloads



Opting to run the bulk of your workloads on spot only works well if those workloads can handle being torn down and recreated with ease (in our case, via Kubernetes pods). What does “with ease” mean, though? For us it means without causing exceptions (customer-facing or internal) or job failures.
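As a sketch of what “with ease” translates to at the Kubernetes level (assuming a generic Puma-style web Deployment; names and numbers are illustrative): give pods enough of a grace period to finish in-flight work inside the two-minute window, a preStop pause so load balancers stop sending new requests before the process gets SIGTERM, and a PodDisruptionBudget so a node drain never takes out too much of the fleet at once.

```yaml
# Illustrative only: a web Deployment that tolerates being drained off a reclaimed node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      terminationGracePeriodSeconds: 60   # finish in-flight requests well inside the 2-minute notice
      containers:
        - name: puma
          image: example/app:latest        # hypothetical image
          ports:
            - containerPort: 3000
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "10"]   # let load balancer targets drop before SIGTERM arrives
---
# Keep most replicas up while nodes are being drained.
apiVersion: policy/v1                      # use policy/v1beta1 on clusters older than 1.21
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  minAvailable: 80%
  selector:
    matchLabels:
      app: web
```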





For HEY, we are able to get away with running the entire front-end stack (OpenResty, Puma) and the bulk of our async background jobs (Resque) on spot.
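Scheduling those pods onto the spot nodegroup is just a label match; a sketch, assuming the spot nodes carry the lifecycle: spot label from the earlier nodegroup example:

```yaml
# Pod template fragment: run this workload on spot-labeled nodes.
spec:
  nodeSelector:
    lifecycle: spot   # hypothetical label applied to the spot nodegroup
```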





Enduring workloads



What doesn’t fit well on spot instances? For HEY, we’ve found three main categories: long-running jobs, Redis instances for branch deploys, and anything requiring a persistent volume (like certain pieces of the mail pipeline). We put these special cases on what we refer to as the “enduring” nodegroup, which uses regular on-demand instances (and probably RIs or savings plans in the future).
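The enduring nodegroup itself is nothing exotic: plain on-demand capacity with a label we can target. Again a sketch with illustrative names, sitting alongside the spot nodegroup in the earlier eksctl example:

```yaml
# Added to the nodeGroups list above: on-demand capacity for workloads that can't tolerate reclaims.
  - name: enduring
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 6
    desiredCapacity: 2
    labels:
      workload: enduring   # target for nodeSelectors on long-running jobs, branch-deploy Redis, PVC-backed pods
```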





Let’s take a look at each:





- Jobs that have the possibility of running for more than a minute and a half to two minutes are an automatic push over to the enduring nodegroup. Currently for HEY this is just account exports.
- Redis is a primary infrastructure component for HEY. We use it for a bunch of things: view caching, Resque, inbound mail pipeline info storage, etc. HEY has a dynamic branch deploy system that deploys any branch in GitHub to a unique hostname, and each of those branch deploys needs its own Redis instances so that they don’t step on each other (using the Redis databases feature doesn’t quite work here). For a while we tried running these Redis instances on spot, but ugh: the monitoring noise and random breakage from Redis pods coming and going, and then the app connecting to the read-only pod and throwing exceptions… it was too much. The fix: get them off of spot instances (a pinning sketch follows this list). [This only affects beta/staging; those environments use a vendored Redis Helm chart that we run in the cluster, while production uses ElastiCache.]
- We’ve successfully run lots of things that require PVCs in Kubernetes. Heck, for several months we ran the entire Elasticsearch clusters for Basecamp 2 and 3 on Kubernetes without any major issues. But that doesn’t mean I’d recommend it, and I especially do not recommend it when using spot instances. One recurring issue we saw was that a node with a pod using a PVC would get reclaimed, and that pod would have no node to launch on in the same AZ as its existing PVC. Kubernetes surfaces this as a volume affinity error, and it frequently required manual clean-up to get the pods launching again. Rolling the dice on having to intervene whenever a spot reclamation happens is not worth it to our team.
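Getting those workloads onto the enduring nodes is the same nodeSelector trick in the other direction. Here’s an illustrative sketch of a branch-deploy Redis StatefulSet pinned there; the names are made up, and (as noted above) this only applies to beta/staging since production Redis lives in ElastiCache.

```yaml
# Illustrative: pin a branch-deploy Redis (and its PVC) to the on-demand "enduring" nodes.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: branch-redis
spec:
  serviceName: branch-redis
  replicas: 1
  selector:
    matchLabels:
      app: branch-redis
  template:
    metadata:
      labels:
        app: branch-redis
    spec:
      nodeSelector:
        workload: enduring          # keep the pod and its volume off spot capacity
      containers:
        - name: redis
          image: redis:5
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```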



Getting it right



node-termination-handler



Arguably the most important piece of the spot-with-Kubernetes puzzle is aws-node-termination-handler. It runs as a DaemonSet and continuously watches the EC2 metadata service to see if the current node has been issued any spot termination notifications (or scheduled maintenance notifications). If one is found, it’ll (attempt to) gracefully remove the running pods so that Kubernetes can schedule them elsewhere with no user impact.
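Installing it via the eks/aws-node-termination-handler Helm chart is mostly a matter of turning on the right drain options. A sketch of the values involved (check the chart’s documented values, since the names can shift between versions, and the spot-only nodeSelector assumes the label from the earlier example):

```yaml
# values.yaml sketch for the eks/aws-node-termination-handler chart (IMDS mode).
# Installed with something like:
#   helm upgrade --install aws-node-termination-handler eks/aws-node-termination-handler \
#     --namespace kube-system -f values.yaml
enableSpotInterruptionDraining: true   # cordon and drain on the 2-minute spot notice
enableScheduledEventDraining: true     # also drain ahead of scheduled maintenance events
nodeSelector:
  lifecycle: spot                      # hypothetical label: only run the DaemonSet on spot nodes
```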





cluster-autoscaler



cluster-autoscaler doesn’t need many changes to run with a spot auto-scaling group. Since we’re talking about handling instance termination appropriately, though, you should know about the pain we went through to get ASG rebalancing handled properly. During rebalancing, we’d see nodes simply die while the ALB continued to send traffic to them via kube-proxy, because kubelet got no warning that they were about to go away. aws-node-termination-handler cannot handle rebalancing currently, though I believe EKS managed nodegroups can (at the expense of not supporting spot instances). A project called lifecycle-manager proved critical for us in handling rebalancing with ease (even though we ended up just disabling rebalancing altogether).
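For reference, the cluster-autoscaler side is the standard AWS setup; a sketch of the relevant container args (the flag names come from upstream cluster-autoscaler, the cluster name and tag values are placeholders), plus one way to disable AZ rebalancing on an ASG entirely:

```yaml
# cluster-autoscaler container args (sketch; substitute your own cluster name in the tag filter).
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/example-cluster
  - --balance-similar-node-groups      # treat equivalent spot nodegroups across AZs as one pool
  - --expander=least-waste
# Disabling AZ rebalancing on an ASG can be done by suspending the AZRebalance process:
#   aws autoscaling suspend-processes --auto-scaling-group-name <asg-name> --scaling-processes AZRebalance
```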
