Between predictable and practical - on Kubernetes limits

By Noam Levy
Founding Engineer
7 min read

You might think that limits are a Kubernetes admin’s best friend. By restricting the CPU and memory that Pods can consume, you can avoid “noisy neighbor” issues, streamline resource allocation and keep your clusters humming smoothly, right?

Well, not necessarily. Limits have their limitations. While it’s all well and good to set limits for workloads that are predictable enough to benefit from resource consumption restrictions, limits can do more harm than good when workloads are very “spiky.”

That’s why, on the whole, we think the use of limits should itself be limited. Because working with CPU limits requires a deep understanding of their implications, some approaches recommend avoiding CPU limits altogether. While it’s critical to get to know your workloads well before deciding whether to set limits - and which kinds to apply if so - we’d like to shed some light on the use cases where CPU limits can really come in handy.

Let’s dive in…

How Kubernetes limits work

Limits in Kubernetes work in a pretty straightforward way: they restrict how much CPU and/or memory a Pod or container can consume. For example, if you want to set a CPU limit of 500m for a container inside your Pod, you’d describe it with YAML such as the following:
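A minimal manifest along these lines (the Pod, container, and image names here are just placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod        # illustrative name
spec:
  containers:
  - name: app
    image: nginx           # placeholder image
    resources:
      limits:
        cpu: "500m"        # container is throttled beyond half a CPU core
```

With this in place, the kubelet enforces the cap via the container runtime: the container can burst up to 500 millicores but gets CPU-throttled beyond that.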

Using limits to achieve resource consumption predictability

The benefit of limits is obvious enough: by capping the CPU or memory that a container or Pod can consume, you avoid situations where one workload sucks up resources, and you achieve resource consumption predictability - something your SREs will definitely love to see. When CPU and memory limits are set, you gain predictable resource consumption for the workload, trading off SLOs in cases where the workload demands more resources to fulfill its goal.

In more extreme scenarios, memory and CPU limits also help to stop an application from triggering a chain reaction that could cause your entire environment to crash. Want to make sure that the buggy development version of the app that you deployed into your dev namespace doesn’t suck up CPU to the point that your other production workloads are deprived? Limits will help to do that, by restricting how many resources the hungry app can consume.

Why CPU limits don’t come easy

The fact that limits can help workloads to share resources appropriately, and prevent performance issues in one workload from bleeding over into others, doesn’t mean that every container should have a limit.

On the contrary, limits are a bad idea for certain types of Kubernetes workloads – namely, those that have spiky, unpredictable resource consumption patterns due to fluctuations in user requests. 

Now, you may be thinking: “OK, but if I can benchmark my spiky workloads accurately to determine their maximum resource needs, then I can just cap the limits to the maximum and call it a day. Right?”

Probably not. Although in theory accurate benchmarking would be a way to configure safe limits, in practice – as the wise Tim Hockin of Google observed recently on Hacker News – “accurate benchmarking is DAMN hard… Even within Google we only have a few apps that we REALLY trust the benchmarks on.”

In other words, even if you think your benchmark tests have determined the maximum amount of CPU or memory that your workloads may require, they probably haven’t. And if you set limits based on those tests, things are probably going to happen that you don’t want to happen – namely, your limits are likely to end up being insufficient for real-world instances of peak demand. Meanwhile, if you set limits too high, you risk depriving your other workloads of the CPU and memory they require.

The bottom line here is that CPU limits tend to be less useful than a lot of people think, unless you truly know your workload requirements (which, again, you probably don’t, even if you think you do because you did a bunch of benchmarking).

Squaring the circle: Balancing requests and limits

You may also be thinking: “OK, since limits are really hard to get right, I’ll instead use requests to manage resource allocation for Pods.”

That’s not a bad thought. Requests are a helpful feature for controlling resource allocation.

A request tells the Kubernetes scheduler the minimum amount of memory or CPU that should be available to your workload. For example, to request at least 64 mebibytes of memory for a container, you could describe a Pod as follows:
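A minimal sketch of such a Pod (names and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: requesting-pod     # illustrative name
spec:
  containers:
  - name: app
    image: nginx           # placeholder image
    resources:
      requests:
        memory: "64Mi"     # scheduler only places the Pod on a node with 64 MiB unreserved
```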

The scheduler will then try to ensure that minimum is available by scheduling the Pod on a node with the requisite resource availability.

The problem with requests, though, is that on their own they’re not always enough to manage resource allocation properly. Whether requests solve your resource-management woes depends - as does everything - on your workload requirements.

To illustrate the point, let’s look at two common ways of leveraging requests and/or limits to improve resource allocation in Kubernetes.

Playing it safe: setting high requests

As a rule of thumb, memory limits are encouraged. Defining high requests usually safeguards the node without sacrificing app performance: it signals to the scheduler that the application will utilize a lot of memory before it actually starts consuming it. The caveat of this approach is that applications often don’t actually utilize the memory they request, while other workloads remain under-provisioned because there aren’t enough resources left over for them. This leads to underutilized nodes, which can increase expenses, especially when node autoscaling is implemented.

Set low requests, and set limits… only for spiky applications

In the case of a spiky workload with varying resource consumption, you can create a margin between the requests and the limits.
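For example, a resources stanza like the following (the specific values are illustrative) reserves a modest baseline while leaving headroom for bursts:

```yaml
resources:
  requests:
    cpu: "250m"        # what the scheduler reserves on the node
    memory: "128Mi"
  limits:
    cpu: "1"           # room to spike up to a full core, but a hard ceiling
    memory: "512Mi"    # exceeding this gets the container OOM-killed
```

The gap between requests and limits is exactly the margin discussed here: the scheduler packs nodes based on the small requests, while the limits cap how disruptive a spike can be.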

The result is a cluster where nodes are less likely to go underutilized and Pods are less likely to have to wait to be assigned because your requests are not too high. At the same time, however, the limits serve as a protection against those workloads that may become especially disruptive to other containers.

However, creating a large margin between requests and limits can lead to a very nasty problem called memory overcommit. In this case, workloads understate their minimum requirements to the node, but once scheduled onto it, they completely drain its resources - quickly leading to unpredictable instability and crashes. It’s the classic scenario where a kid at Six Flags wants to ride the rollercoaster but doesn’t meet the minimum height requirement, so he stands on his parent’s feet to fake the extra two inches and gets on.

What about autoscaling?

Another creative way to try to leverage limits while minimizing the risk of running out of resources is to configure horizontal or vertical Pod autoscaling for your workloads. Although each type of autoscaling works differently, both serve the purpose of automatically making more resources available to your Pods in situations where demand outgrows what’s currently allocated.
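As one example, a HorizontalPodAutoscaler can add replicas when average CPU utilization (measured relative to requests) crosses a threshold. The target Deployment name and the numbers below are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa              # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app                # assumes a Deployment named "app" exists
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU passes 70% of requests
```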

In theory, then, autoscaling can compensate for inadequate limit settings. The problem with this idea in practice, though, is that horizontal and vertical Pod autoscalers “don't act instantly - certainly not on the timescale of seconds.” Your app may already be dropping requests by the time autoscaling rides in to save the day.

This doesn’t mean you shouldn’t use autoscaling. You certainly should in situations where it makes sense. But you shouldn’t assume autoscaling has your back in the event that limit settings turn out to be insufficient.

The bottom line: when it comes to K8s limits, know your app

Whichever approach you take, what matters most is knowing your application. You can’t determine which requests or limits to set unless you know what the consumption patterns of your application are likely to be. This requires tracking workload behavior over time, learning which resources the workload consumes during normal and peak activity and then setting requests and/or limits accordingly.

At the end of the day, Kubernetes is only as effective as the configurations you apply to it. It’s not an observability tool, so it can’t automatically determine which resources your apps need to run reliably. That task is on you, and it requires you to glean application insights with the help of a robust observability platform which will help you understand the resource consumption behavior of workloads in Kubernetes and beyond - so that you can make informed decisions about limit and request settings.