Memory in Kubernetes is one of those things, like many other ‘adulting tasks’, that you need to keep track of, because running out has very real consequences. When a node or container runs short of memory, some of your workloads may simply be terminated.
In the case of Kubernetes, the thing that hauls workloads away is OOM Killer, a central component of the Linux kernel's memory management system. When a node that hosts containers starts to run low on memory, OOM Killer steps in to identify processes that it thinks are expendable, then terminates them to free up memory.
OOMKilled errors and K8s memory management
In theory, OOM Killer is there to save the most important Kubernetes workloads from situations where they'll crash because the nodes hosting them run out of available memory. But in practice, OOM Killer can sometimes make decisions that admins don't want – such as terminating critical Pods and creating what are colloquially known as OOMKilled errors. OOM Killer doesn't have a good way on its own of determining which workloads to terminate first.
This is true in part because Kubernetes itself does not trigger OOMKilled events (which are events where OOM Killer terminates processes). Instead, it relies on the Linux kernel of each node to do that work behind the scenes in order to prevent nodes and containers from using more memory than is available to them. You need to know how to configure Kubernetes in such a way that the Linux kernel's memory manager makes the right decisions, without being able to tell the memory manager explicitly what to do.
Plus, memory management in general isn't exactly a straightforward task because it's tricky for the kernel to predict exactly when the system is running out of memory, let alone determine which processes are the most expendable and can therefore be terminated. If OOM Killer makes the wrong choice about which processes to kill – or if it starts killing processes because it thinks your workloads are running out of memory, when in actuality they are not – it can cause more harm than good.
Add to that complexity the fact that OOM Killer operates on each individual node, while Kubernetes workloads may span multiple nodes, and memory management in Kubernetes becomes truly complicated.
To clear all of this up, let's unpack how OOM Killer works within the context of Kubernetes: what OOM Killer does, why OOMKilled errors happen, how to work with memory management in Kubernetes, and best practices for keeping memory-related problems from getting in the way of Kubernetes performance.
Kubernetes memory management: An overview
Although Kubernetes doesn't let you directly control how the Linux kernel manages memory, it does allow you to configure memory management within your Kubernetes cluster. Here's a look at the three main memory management concepts in Kubernetes.
Memory requests are the minimum amounts of memory that you reserve for containers. The Kubernetes scheduler uses requests to place each Pod on a node with enough unreserved memory, which is how requests help guarantee that containers receive sufficient memory to function properly. That said, if you set requests too high, you may tie up more memory resources than a container actually requires, while depriving other containers of the memory they need to run.
Memory limits define the maximum amount of memory that a container can consume. Setting memory limits prevents a container from utilizing excessive memory and potentially causing performance issues or triggering OOMKilled events due to insufficient memory on the node. When a container exceeds its memory limit, the Linux kernel's OOM Killer can terminate a process in that container, even if the node as a whole still has free memory.
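To make requests and limits a little more concrete, here is a minimal Python sketch that converts memory quantities such as `256Mi` into bytes and sanity-checks a request against a limit. The `resources` dictionary mirrors the `requests` and `limits` fields of a container spec, but the helper names (`to_bytes`, `check_memory`) are hypothetical, and the parser only covers the common suffixes, so treat it as an illustration rather than a full quantity parser.

```python
# Multipliers for the binary (Ki, Mi, ...) and decimal (k, M, ...) suffixes.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '256Mi' or '1G' to a number of bytes."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: the value is already a byte count


def check_memory(resources: dict) -> str:
    """Describe how a container's memory request relates to its limit."""
    request = resources.get("requests", {}).get("memory")
    limit = resources.get("limits", {}).get("memory")
    if request is None and limit is None:
        return "unset: no memory request or limit"
    if limit is None:
        return "unbounded: request without a limit"
    if request is None:
        return "defaulted: the request falls back to the limit"
    if to_bytes(request) > to_bytes(limit):
        return "invalid: request exceeds limit"
    return "ok: request is within the limit"


if __name__ == "__main__":
    example = {"requests": {"memory": "256Mi"}, "limits": {"memory": "512Mi"}}
    print(check_memory(example))
```

Note how `256Mi` (mebibytes) and `256M` (megabytes) are different amounts; mixing them up is an easy way to end up with a limit that is slightly lower than you intended.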
Quality of Service (QoS) classes
QoS classes govern the level of priority that Kubernetes assigns to Pods. There are three possible classes:
- Guaranteed: Pods get the Guaranteed class when every container has both memory and CPU requests and limits defined, and each request equals the corresponding limit. The Kubernetes scheduler ensures that sufficient memory is available on the node for these Pods, and they are the last to be evicted when a node runs low on memory (although a container that exceeds its own limit can still be OOMKilled).
- Burstable: Pods get the Burstable class when at least one container has a memory or CPU request or limit defined, but the Pod doesn't meet the criteria for Guaranteed (for example, because a memory request is lower than the limit). These Pods can use more memory than their requests if available, but they are not guaranteed to receive the full limit. If the node runs out of memory, Pods in the burstable class are more likely to be evicted compared to guaranteed Pods.
- Best-effort: Pods in the best-effort class (`BestEffort`) do not have any memory or CPU requests or limits defined. They are considered low-priority and can consume any available memory on the node. When the node experiences memory pressure, best-effort Pods are the first to be evicted.
Note that you don't set QoS classes for Pods explicitly. Instead, Kubernetes assigns them automatically based on resource requests and limits that you set (or don't set).
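As a rough illustration of that automatic assignment, the sketch below derives a QoS class from the requests and limits of a Pod's containers, following the rules in the Kubernetes documentation. The function name `qos_class` and the input shape (a list of dictionaries with optional `requests` and `limits`) are assumptions made for this example, and values are compared as literal strings, so `1` and `1000m` CPU would be treated as different here even though Kubernetes considers them equal.

```python
def qos_class(containers: list) -> str:
    """Return the QoS class for a Pod, given its containers' resources."""
    anything_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            request = requests.get(resource)
            limit = limits.get(resource)
            if request is not None or limit is not None:
                anything_set = True
            # Guaranteed needs a limit for every resource, and any explicit
            # request must equal it (a missing request defaults to the limit).
            if limit is None or (request is not None and request != limit):
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"requests": {"memory": "256Mi"}, "limits": {"memory": "512Mi"}}]
    print(qos_class(pod))
```

One consequence worth noticing: a Pod with a memory request and limit but no CPU settings is Burstable, not Guaranteed, because both resources count.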
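You can check which class Kubernetes has assigned to a given Pod by reading the Pod's `status.qosClass` field, for example by running `kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'`.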
How the pieces fit together
Understanding the relationship between memory requests, limits, and QoS classes is essential for effectively managing memory resources in Kubernetes. By accurately defining these parameters based on application requirements, you can optimize memory utilization, reduce the risk of OOMKilled events, and ensure the stability and performance of your Kubernetes cluster.
Why does the OOMKilled error happen in Kubernetes?
If you don't set the correct memory limits and requests, and/or if the QoS classes that Kubernetes assigns don't reflect how important your workloads really are, you could end up with an OOMKilled error (or Out of Memory Killed error). OOMKilled errors occur when the Linux kernel's OOM Killer decides to terminate a process because it needs to free up some memory.
Again, one of the tricky things about memory management in Kubernetes is that you can't explicitly tell the kernel which processes to kill or not to kill (at least not from within Kubernetes). Instead, the best you can do is manage memory inside Kubernetes via limits and requests in such a way that OOMKilled errors won't happen because memory will always be distributed properly between containers and Pods.
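One way your requests and limits do reach the kernel is through the `oom_score_adj` value that the kubelet sets on container processes according to the Pod's QoS class: the higher the value, the more attractive the process is to OOM Killer. The sketch below follows the formula given in the Kubernetes node-pressure eviction documentation. The function name is hypothetical, and the constants (including the lower clamp of 2 for burstable Pods) are taken from that documentation as we understand it, so the kubelet in your version may differ in the details.

```python
def oom_score_adj(qos: str, memory_request_bytes: int = 0,
                  node_capacity_bytes: int = 1) -> int:
    """Approximate the oom_score_adj a container gets for a given QoS class."""
    if qos == "Guaranteed":
        return -997  # very unlikely to be picked by OOM Killer
    if qos == "BestEffort":
        return 1000  # first in line when the node runs out of memory
    # Burstable: the larger the request relative to node capacity,
    # the lower (safer) the score, clamped to stay between the other classes.
    score = 1000 - (1000 * memory_request_bytes) // node_capacity_bytes
    return min(max(2, score), 999)


if __name__ == "__main__":
    GIB = 2**30
    print(oom_score_adj("Burstable", 4 * GIB, 8 * GIB))
```

The practical takeaway is that a burstable container with a tiny request looks almost as expendable to the kernel as a best-effort one, which is another reason to set requests that reflect real usage.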
Of course, a complicating factor is that Kubernetes clusters are distributed environments that typically include many nodes. Each node runs its own kernel, which can terminate processes running on that node. But each kernel doesn't know what's happening on other nodes, so it's not as if one node can look at a neighboring node and say "Hey, I'm going to kill this Pod because I'm short on memory but it looks like you have memory to spare, so I'll move the Pod to you." The Linux kernel can't do that because it's not an orchestration system for clusters or servers.
Kubernetes, however, can track how much memory is allocatable and already requested on each node in a cluster, and schedule Pods onto nodes accordingly. This means that as long as you set the right memory limits and requests, and assuming your cluster's total memory is sufficient to meet the needs of your workloads, you should be able to avoid most OOMKilled errors. Kubernetes doesn't move running Pods between nodes, but when a node comes under memory pressure, the kubelet can evict Pods before the node's kernel has to start terminating processes, and controllers such as Deployments then recreate the evicted Pods on nodes that have memory to spare.
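The placement logic described here boils down to comparing a Pod's memory request with what is still unreserved on each node. Here is a deliberately simplified sketch of that check. The node names, the numbers and the `nodes_that_fit` helper are all hypothetical, and the real scheduler weighs many more factors (CPU, affinity rules, taints and so on).

```python
def nodes_that_fit(pod_request_mi: int, nodes: dict) -> list:
    """Return the names of nodes whose unreserved memory covers the request.

    `nodes` maps a node name to its allocatable memory and the sum of the
    memory requests of the Pods already scheduled there, both in MiB.
    """
    return sorted(
        name
        for name, node in nodes.items()
        if node["allocatable_mi"] - node["requested_mi"] >= pod_request_mi
    )


if __name__ == "__main__":
    cluster = {
        "node-a": {"allocatable_mi": 8192, "requested_mi": 7680},
        "node-b": {"allocatable_mi": 8192, "requested_mi": 4096},
    }
    print(nodes_that_fit(1024, cluster))
```

Notice that the check uses requests, not actual usage. A node can look half empty to the scheduler while its containers are in fact using far more than they requested, which is exactly the situation in which OOM Killer ends up getting involved.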
Memory monitoring and debugging
Unfortunately, even when you do assign memory limits and requests that you think make sense, there's no guarantee that the unexpected won't happen. That's why it's important to monitor memory usage in Kubernetes. Memory monitoring allows you to detect potential issues, identify resource bottlenecks and proactively address memory concerns before you run into OOMKilled events or similar errors.
There are a number of tools for monitoring memory in Kubernetes. Our favorite solutions include:
- kubectl top: The `kubectl top` command allows you to retrieve real-time resource utilization metrics for Pods, nodes and containers within your Kubernetes cluster. By running `kubectl top` with the appropriate options, such as `kubectl top pods` or `kubectl top nodes`, you can obtain memory usage statistics that help you understand how the cluster is using resources.
- Metrics Server: Metrics Server is a Kubernetes component that collects resource metrics, including memory usage, from all nodes and Pods in the cluster. By deploying and configuring Metrics Server, you can enable cluster-wide monitoring of memory utilization. Metrics Server is not enabled by default in most Kubernetes distributions, so you should check your distribution's documentation to determine whether you need to turn it on manually and, if so, how to do it. Once Metrics Server is up and running, you can utilize the Kubernetes API to retrieve memory metrics programmatically.
- Prometheus and Grafana: Prometheus is a popular open source monitoring system widely used in the Kubernetes ecosystem. By deploying Prometheus in your cluster, you can collect and store detailed metrics, including memory usage, over time. From there, you can integrate Grafana, an open source visualization and analytics tool, with Prometheus to create custom dashboards and visualizations for monitoring memory-related metrics.
- Murre: Our very own Murre is an open source on-demand, scalable source of container resource metrics for Kubernetes. Murre fetches CPU and memory resource metrics directly from the kubelet on each K8s node and enriches the resources with the relevant K8s requests and limits from each PodSpec.
Each of these tools helps you maintain continuous visibility into Kubernetes memory usage trends and identify anomalies (such as sudden spikes in memory consumption) that could lead to a problem. In addition, these solutions allow you to drill down into the memory usage of individual Pods and nodes, which is helpful in situations where you need to figure out what's consuming memory and whether high rates of memory usage reflect legitimate workload demand or result from a problem like a memory leak bug.
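As a small example of the kind of check these tools make possible, the following sketch parses text in the shape of `kubectl top pods` output and flags Pods whose memory usage is close to their limit. The sample output, the Pod names and the limits are made up for illustration, and the parser assumes memory is reported in `Mi`, so adapt it before pointing it at a real cluster.

```python
# Hypothetical output in the shape printed by `kubectl top pods`.
SAMPLE = """\
NAME        CPU(cores)   MEMORY(bytes)
api-7d9f    12m          410Mi
worker-x2   250m         180Mi
"""


def parse_top(text: str) -> dict:
    """Map each Pod name to its memory usage in MiB."""
    usage = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, _cpu, memory = line.split()
        if not memory.endswith("Mi"):
            raise ValueError(f"unexpected memory unit: {memory}")
        usage[name] = int(memory[:-2])
    return usage


def near_limit(usage_mi: dict, limits_mi: dict, threshold: float = 0.8) -> list:
    """Return Pods using at least `threshold` of their memory limit."""
    return sorted(
        name
        for name, used in usage_mi.items()
        if name in limits_mi and used / limits_mi[name] >= threshold
    )


if __name__ == "__main__":
    limits = {"api-7d9f": 512, "worker-x2": 512}  # MiB, assumed values
    print(near_limit(parse_top(SAMPLE), limits))
```

A Pod that sits near its limit for long stretches is a good candidate for either a higher limit or a closer look for a memory leak.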
Analyzing Logs and Events for OOMKilled Indicators
In addition to monitoring Kubernetes memory usage, analyzing logs and events is another essential step in detecting OOMKilled indicators and understanding the circumstances leading up to them. By examining various types of logs and events within your Kubernetes cluster, you can gain valuable insights into memory-related issues and troubleshoot OOMKilled occurrences.
Here are the key sources to investigate:
• Kubernetes API Events: Kubernetes generates API events for various activities occurring within the cluster. By monitoring these events along with the status of your Pods, you can identify containers that a node's OOM Killer terminated; in the output of `kubectl describe pod`, an affected container shows a last state of Terminated with the reason OOMKilled. This information tells you which pod and container were affected, allowing you to narrow down the scope of investigation.
• Kernel message logs: The Linux kernel generates logs that capture system-level events, including out-of-memory situations. Kernel message logs, which you can access in the /var/log/messages file on some distributions (others use /var/log/syslog or /var/log/kern.log) or by running the `dmesg` command, may contain valuable information about memory-related issues leading to OOMKilled events. These logs can also provide insight into the system's overall memory pressure and potential kernel-level factors that influence OOM decisions.
To gain the fullest possible context into OOMKilled events, you should analyze logs and events in conjunction with other monitoring techniques, such as monitoring the resource metrics and utilization data that we mentioned earlier. By correlating memory-related events with resource utilization patterns, you can better understand the context surrounding OOMKilled incidents.
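Both of the sources above lend themselves to a little scripting. The sketch below looks for OOM kills in two places: kernel log text (matching the `Killed process <pid> (<name>)` wording that the kernel prints) and a Pod object as returned by `kubectl get pod -o json`, where an affected container is terminated with the reason `OOMKilled`. The sample log line and the sample Pod are invented for illustration, and real kernel messages vary between kernel versions, so take the pattern as a starting point.

```python
import json
import re

# Matches the part of the kernel message that names the killed process.
KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")

# Hypothetical inputs, shaped like a kernel log line and a Pod object.
SAMPLE_LOG = (
    "[12345.678] Memory cgroup out of memory: "
    "Killed process 4321 (java) total-vm:4096000kB, anon-rss:1048000kB\n"
)
SAMPLE_POD = json.loads("""
{"status": {"containerStatuses": [
  {"name": "app", "state": {"running": {}},
   "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
  {"name": "sidecar", "state": {"running": {}}, "lastState": {}}
]}}
""")


def kernel_oom_kills(log_text: str) -> list:
    """Return (pid, process name) pairs for each kill found in the log."""
    return [(int(pid), name) for pid, name in KILL_RE.findall(log_text)]


def oomkilled_containers(pod: dict) -> list:
    """Return (container name, exit code) for containers that were OOMKilled."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                hits.append((status["name"], terminated.get("exitCode")))
    return hits


if __name__ == "__main__":
    print(kernel_oom_kills(SAMPLE_LOG))
    print(oomkilled_containers(SAMPLE_POD))
```

The exit code of 137 in the sample is 128 plus 9, the signal number of SIGKILL, which is what you would typically see for a container that was killed this way.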
Best Practices for Preventing Kubernetes OOMKilled Errors
As with any type of error, it's better to prevent an OOMKilled problem from happening in the first place than to troubleshoot it after the event. Here are tips for managing memory in Kubernetes effectively and avoiding memory-related problems.
Set appropriate memory limits and requests
As we explained above, memory limits and requests play a central role in determining how Kubernetes allocates memory to Pods, as well as how it prioritizes different Pods in situations where free memory is in short supply.
To set the right limits and requests for your workloads, strive to:
• Calculate optimal memory values: Understand your application's memory usage patterns and profile its resource requirements. By benchmarking and analyzing memory consumption, you can determine suitable values for memory limits and requests. Consider factors such as peak usage, scalability and any memory-intensive tasks performed by the application.
• Avoid the risks of overcommitting: Be cautious when overcommitting memory resources. Overcommitment in Kubernetes means letting the memory limits of the containers on a node add up to more than what is physically available, with the assumption (or hope, to be more blunt) that not all containers will consume their full limit simultaneously. While overcommitting can increase resource utilization, it carries the risk of triggering OOM events if containers demand more memory than available. Assess your workload's characteristics and the impact of overcommitment on performance and stability before taking this route.
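To put a number on overcommitment, you can compare the sum of memory limits on a node with the memory the node can actually hand out. The following sketch does that for a made-up node. The `limit_overcommit` helper and all of the figures are hypothetical, and values are in MiB.

```python
def limit_overcommit(node_allocatable_mi: int, pods: list) -> dict:
    """Summarize how far the Pods' requests and limits commit a node."""
    total_requests = sum(pod["request_mi"] for pod in pods)
    total_limits = sum(pod["limit_mi"] for pod in pods)
    return {
        "requests_mi": total_requests,
        "limits_mi": total_limits,
        # A ratio above 1.0 means the limits cannot all be honored at once.
        "limit_ratio": total_limits / node_allocatable_mi,
    }


if __name__ == "__main__":
    pods = [
        {"request_mi": 2048, "limit_mi": 4096},
        {"request_mi": 2048, "limit_mi": 4096},
        {"request_mi": 1024, "limit_mi": 4096},
    ]
    print(limit_overcommit(8192, pods))
```

In this example the requests fit comfortably on the node, but the limits add up to one and a half times its memory, so the node only stays healthy as long as the three Pods don't all burst at the same time.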
Use autoscaling
Autoscaling and eviction are essential mechanisms in Kubernetes for efficiently managing resources and ensuring optimal utilization within a cluster. Autoscaling, which is facilitated by tools like Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA), dynamically adjusts the number of Pods or the resource allocations of Pods based on workload demands and resource utilization.
HPA scales the number of Pods up or down to maintain desired CPU and memory utilization thresholds, ensuring that applications have the necessary resources without unnecessary resource allocation. VPA, on the other hand, adjusts the resource requests and limits of containers based on actual resource usage, optimizing resource allocation and minimizing the risk of OOMKilled events.
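The scaling decision that HPA makes can be written down in a couple of lines. The sketch below implements the formula from the Kubernetes documentation, where the desired replica count is the current count multiplied by the ratio of the current metric value to the target, rounded up, and where no change is made if that ratio is within a tolerance of 1.0. The function name is our own, the 0.1 tolerance is the documented default as we understand it, and the real controller adds further safeguards (such as stabilization windows) that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Compute the replica count HPA would aim for, in simplified form."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: do nothing
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Four Pods averaging 90% memory utilization against a 60% target.
    print(desired_replicas(4, 90, 60))
```

Keep in mind that adding replicas only relieves memory pressure when the load is actually spread across Pods. If every replica loads the same large dataset into memory, each new Pod will use just as much memory as the existing ones.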
Where possible, take advantage of HPA and VPA to help ensure that your cluster never runs short of memory.
Define custom eviction policies
Defining custom eviction policies and thresholds allows Kubernetes to prioritize critical workloads and evict lower-priority Pods during periods of low resource availability, which prevents starvation and maintains cluster stability. In practice, this means tuning the kubelet's eviction thresholds (for example, on the `memory.available` signal) and assigning priorities to your Pods. Together with autoscaling, these mechanisms let clusters adapt dynamically to changing workloads while keeping resource utilization efficient.
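To give a feel for how thresholds and priorities interact, here is a simplified sketch of an eviction decision. It checks available memory against a threshold and then ranks Pods in the order described in the Kubernetes node-pressure eviction documentation: Pods using more than they requested go first, then lower-priority Pods, then Pods that exceed their requests by the most. The helper names and sample Pods are hypothetical, and the 100 MiB default is an assumption based on the documented default hard threshold for `memory.available` on Linux.

```python
def under_memory_pressure(available_mi: int, threshold_mi: int = 100) -> bool:
    """Return True when available memory has dropped below the threshold."""
    return available_mi < threshold_mi


def eviction_order(pods: list) -> list:
    """Rank Pod names from first to last candidate for eviction."""
    ranked = sorted(
        pods,
        key=lambda pod: (
            pod["usage_mi"] <= pod["request_mi"],    # over-request Pods first
            pod["priority"],                         # then lowest priority
            -(pod["usage_mi"] - pod["request_mi"]),  # then largest excess
        ),
    )
    return [pod["name"] for pod in ranked]


if __name__ == "__main__":
    pods = [
        {"name": "a", "usage_mi": 300, "request_mi": 100, "priority": 0},
        {"name": "b", "usage_mi": 150, "request_mi": 200, "priority": 0},
        {"name": "c", "usage_mi": 500, "request_mi": 100, "priority": 1000},
        {"name": "d", "usage_mi": 400, "request_mi": 100, "priority": 0},
    ]
    if under_memory_pressure(available_mi=80):
        print(eviction_order(pods))
```

In this ranking, Pod `b` is the safest because it stays within its request, and Pod `c` outlasts `a` and `d` despite using the most memory, because its priority is higher.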
Not today, OOM Killer
Although OOM Killer's job is to try to keep the overall state of servers and workloads as stable as possible, your goal as a Kubernetes admin should be to ensure that OOM Killer never has to kill. In a well-managed cluster (which means one where resource requests and limits are properly configured, autoscaling rules help the cluster manage resources efficiently and admins constantly monitor for unexpected memory-related errors and events), nodes should not run out of allocatable memory, and OOM Killer should never have to terminate processes.
Check out our Kubernetes Troubleshooting Guide for more errors.