
Kubernetes Memory Leaks: Detection, Impact & Fixes

groundcover Team
March 14, 2026

A Kubernetes memory leak can look like ordinary memory growth until it starts disrupting the workload. A pod keeps consuming more memory than expected. Restarts begin, containers get OOMKilled, and if the leak is not contained, the node starts running low on available memory. Kubernetes only enforces memory limits reactively, so the failure may not be visible until long after the memory leak has started.

In this guide, you’ll learn what a Kubernetes memory leak is, what causes it in workloads running on Kubernetes, how Kubernetes handles memory allocation and limits, which metrics help detect it, how it affects performance and stability, and how to debug, contain, and prevent it more effectively.

What a Kubernetes Memory Leak Is and Why It’s Hard to Catch

In Kubernetes, a memory leak occurs when a process or component keeps allocated memory after it is no longer needed. When this happens, usage grows over time instead of returning to a steady state. Kubernetes does not expose that retention directly. What it surfaces are the effects, such as sustained memory growth in resource metrics, pod restarts, `OOMKilled` containers, and node `MemoryPressure`.

That is what makes a leak difficult to catch early. A rising memory graph shows that usage is increasing, but it does not indicate what is holding memory. Memory limits also do not solve the detection problem because Kubernetes enforces them reactively. A container may keep consuming memory until the kernel detects pressure and triggers an OOM kill, meaning the visible failure can occur well after the leak has already been building.

Kubernetes Memory Leak vs Memory Pressure vs OOMKills

To correctly diagnose a Kubernetes memory problem, you need to distinguish between a memory leak, MemoryPressure, and OOMKilled. All three can appear in the same incident, but each represents a different part of the failure path.

| Issue | Definition | Scope | Kubernetes Signal | What It Doesn't Confirm |
| --- | --- | --- | --- | --- |
| Kubernetes memory leak | Memory keeps increasing because allocated memory is being retained instead of released | Application, dependency, runtime, or another cluster component | Sustained memory growth in metrics; pod restarts and OOMKilled events may appear later | Whether the node is already under pressure, or exactly which component is responsible |
| MemoryPressure | A node-level condition declared when available memory drops below a configured eviction threshold | Node level | Node condition shows MemoryPressure; kubelet may reclaim resources and may evict pods depending on policy | That a memory leak is definitely the cause |
| OOMKilled | A container termination result after memory exhaustion crosses a hard memory boundary and the kernel OOM killer is invoked | Container or cgroup level, then reported by Kubernetes | Container state shows OOMKilled; exit code is 137; the pod may restart based on policy | Whether the cause was a leak, a low memory limit, a traffic spike, or another memory issue |

Because Kubernetes surfaces the effects rather than the retained memory itself, a memory leak is usually recognized by the symptoms it creates.

Common Symptoms of a Kubernetes Memory Leak

A Kubernetes memory leak usually becomes apparent through recurring symptoms rather than a single clear indicator. Some symptoms appear in resource usage first, while others emerge later as the leak starts affecting workload and node stability.

  • Sustained memory growth in pod or container metrics: This is usually the earliest sign. `kubectl top` shows recent CPU and memory consumption for pods and nodes, but those metrics are designed to provide a stable autoscaler signal rather than pinpoint accuracy.
  • Containers showing OOMKilled and pods restarting: As retained memory continues to grow, a container can exceed its hard memory limit and be terminated by the kernel. In Kubernetes, that typically shows up as OOMKilled, repeated restarts, or `CrashLoopBackOff` after the container fails again soon after restart.
  • Nodes reporting MemoryPressure: Kubernetes ties MemoryPressure to the `memory.available` eviction signal. The kubelet sets the condition once available memory falls below an eviction threshold.
  • Pods being evicted from the node: When eviction thresholds are met, the kubelet may evict pods to protect node stability. This is a node-level response to pressure, not the same as a container being OOMKilled after hitting its memory limit.
  • New BestEffort pods no longer being scheduled on the affected node: When a node enters MemoryPressure, the control plane adds the `node.kubernetes.io/memory-pressure` taint. Kubernetes adds the matching toleration automatically to pods whose QoS class is Burstable or Guaranteed (i.e., not BestEffort), so new BestEffort pods are not scheduled onto a node under memory pressure.

Not every rise in memory is a leak. Kubernetes tracks memory-backed `tmpfs` emptyDir volumes as container memory use. Without a `sizeLimit`, they may consume up to the pod’s memory limit, or all available node memory if no memory limit is set.
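One way to rule out that failure mode is to cap the volume explicitly. The sketch below shows a memory-backed scratch volume with a `sizeLimit`; the pod name, image, and 128Mi cap are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo            # illustrative name
spec:
  containers:
    - name: app
      image: nginx              # placeholder image
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory          # tmpfs; usage is charged as container memory
        sizeLimit: 128Mi        # hard cap so the volume cannot grow unbounded
```

With the cap in place, a pod whose volume exceeds 128Mi is evicted rather than silently inflating the memory graph, which makes it easier to tell volume growth apart from a true leak.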

What Causes Memory Leaks in Kubernetes?

Kubernetes itself usually does not cause the leak. In most cases, the retained memory comes from the application code, a library the workload depends on, or a lower-level platform component. The main causes of memory leaks in workloads running on Kubernetes are:

Application Code That Keeps References Alive

This is the most common cause. Objects stay in memory because the code still holds references to them after they are no longer needed, so usage keeps growing instead of returning to a steady range.

Unbounded Caches, Queues, and Buffers

A cache is not a leak by design, but it becomes one when it has no size limit, expiry, or eviction policy. The same pattern applies to in-memory queues, retry backlogs, and request buffers that keep growing under load and never shrink back to a stable baseline.

Background Workers, Goroutines, Threads, and Listeners That Never Exit

Leaks do not always sit on the main request path. Background workers, timers, listeners, and long-lived goroutines or threads can keep memory alive quietly while request handling still looks normal. That is why some leaks only appear after the service has been running for a while. Node.js even warns about a “possible EventEmitter memory leak” when listener buildup crosses its default threshold.

Third-Party Libraries, Agents, and Sidecars

The retained memory is not always in your own code. It can live in a client library, a monitoring agent, a service-mesh sidecar, or another component running in the same pod. In practice, this is one reason memory incidents are easy to misread at first. The pod appears to be the source, but the actual leak may reside in a dependency attached to it. Shopify documented a production leak that was eventually traced to a memcached client library rather than the application itself.

Native Extensions and Off-Heap Allocations

Some leaks happen outside the part of memory your language runtime exposes most clearly. Native extensions and off-heap allocations can keep growing while heap-focused tools show only part of the picture. That makes these leaks harder to confirm and easier to misdiagnose until you profile beyond the managed heap.

Container Runtime, Kernel, and Platform Bugs

Some incidents that look like application leaks come from lower layers of the stack. A container runtime bug, a kernel memory issue, or another platform defect can produce the same symptoms inside Kubernetes even though the workload itself is not the source. containerd, for example, has had cases where goroutine leaks in its CRI implementation led to host memory exhaustion.

How Kubernetes Handles Memory Allocation and Limits

Kubernetes handles memory in stages. It does not treat memory as a single setting that controls everything from scheduling to runtime protection. Instead, it uses requests to decide placement, limits to constrain runtime behavior, QoS classes to influence how Pods are treated under pressure, and eviction rules to protect the node when memory becomes scarce. Here is how Kubernetes handles memory allocation and limits:

Memory Requests Decide Placement

A memory request tells the scheduler how much memory a container needs for placement. The scheduler compares that request against the node’s allocatable memory and only places the Pod if enough requested memory is still available. The request is not a cap. It affects where the Pod can run, not how much memory the container can consume later. If you set a limit but omit a request, Kubernetes can copy the limit into the request.
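In the manifest, that separation shows up as a request below the limit. A minimal sketch with illustrative values:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: request-demo            # illustrative name
spec:
  containers:
    - name: app
      image: nginx              # placeholder image
      resources:
        requests:
          memory: "256Mi"       # what the scheduler reserves for placement
        limits:
          memory: "512Mi"       # runtime boundary enforced later via cgroups
```

Here the scheduler only needs a node with 256Mi of unreserved allocatable memory, while the container can still grow toward 512Mi at runtime before the OOM path becomes a risk.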

Limits Set the Runtime Kill Boundary

A memory limit is passed to the runtime and enforced by the kernel through cgroups. Unlike CPU limits, memory limits are not enforced by throttling. They are enforced reactively, which means a container can keep allocating memory until the kernel detects pressure and cannot reclaim enough memory. That is when the OOM path is triggered. This is why a memory leak often shows up first as rising usage and only later as restarts or OOMKilled containers.

QoS Classification Shapes Eviction Priority

Kubernetes uses requests and limits to classify pods as Guaranteed, Burstable, or BestEffort. That classification influences eviction priority under memory pressure: BestEffort pods are the first candidates for eviction, while Guaranteed pods are the last. So requests and limits do more than control placement and boundaries. They also affect how Kubernetes treats the pod when memory becomes scarce.
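The classification itself is derived from the spec. For example, a pod is `Guaranteed` only when every container sets CPU and memory limits equal to its requests (values below are illustrative):

```yaml
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "500m"      # equal to the request
    memory: "256Mi"  # equal to the request -> qosClass: Guaranteed
```

You can confirm the resulting class with `kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'`.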

Eviction Thresholds Protect the Node

Kubernetes also protects the node itself. Kubelet watches signals, such as `memory.available`, and compares them against configured eviction thresholds. If those thresholds are met, kubelet can evict Pods to reclaim memory before the node reaches a system-wide OOM condition. This is a different failure path from a single container crossing its own memory limit. One happens at the node level. The other happens at the container level.
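Those thresholds come from kubelet configuration. A sketch of a `KubeletConfiguration` fragment follows; the values are examples, not recommendations:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"     # evict immediately once available memory drops below this
evictionSoft:
  memory.available: "500Mi"     # evict only after the grace period below
evictionSoftGracePeriod:
  memory.available: "90s"
```

Hard thresholds act immediately, which is why one fast-growing leak can trigger evictions with no warning window at all.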

How Kubernetes Memory Leaks Impact Performance and Stability

A Kubernetes memory leak affects more than memory consumption. As retained memory continues to grow, the impact spreads beyond the original allocation problem and begins to affect the behavior and stability of the workload.

Performance Degrades Before the Process Dies

The first impact is often slower application behavior rather than an immediate crash. As retained memory grows, the process has less room to work with, and garbage-collected runtimes can spend more time reclaiming memory and less time serving requests. That is why leaks often first appear as rising latency or lower throughput before any restart occurs.

Container Stability Starts to Break

If memory keeps growing, the container eventually reaches a point where the kernel cannot reclaim enough memory and terminates the process. In Kubernetes, that shows up as `OOMKilled`, repeated restarts, and sometimes `CrashLoopBackOff` when the container fails again soon after coming back up. At that point, the problem is no longer just higher memory usage. It has become an availability problem for the workload itself.

Memory Pressure Reaches the Node

A leak becomes more serious when it stops being contained within one workload. Kubernetes tracks node memory pressure through the `memory.available` eviction signal, and once that threshold is met, kubelet can start evicting pods to protect the node. Under hard eviction thresholds, that can happen immediately. One leaking workload can then start affecting other pods on the same node, even if those pods are healthy.

Recovery Turns Into Churn

When retained memory keeps growing until a container is OOMKilled or a pod is evicted, Kubernetes can restart the container or create a replacement pod through the workload controller. That may restore service temporarily, but it does not remove the memory growth that caused the failure. The new instance can follow the same path and fail again later, which turns recovery into repeated churn.

Scheduling and Scaling Become Less Reliable

Memory leaks also interfere with normal scheduling and scaling behavior. When a node is under MemoryPressure, Kubernetes starts restricting which new pods can be placed there: new BestEffort pods are blocked by default, while non-BestEffort pods still tolerate the pressure taint automatically. If the workload also scales on memory, the leak can look like persistent demand because the autoscaler works from average memory utilization or an average memory value. That may add replicas, but it does not free up memory already retained by the leaking pods.
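The memory-driven scaling failure is easy to reproduce: an HPA targeting average memory utilization treats leaked memory as demand. A sketch assuming an `autoscaling/v2` HPA against a Deployment named `app` (both names illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-memory-hpa          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app                   # illustrative target
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # a leak keeps this high, driving scale-out
```

Because the leaking replicas never release memory, utilization stays above target and the HPA keeps adding replicas instead of surfacing the leak.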

Key Metrics to Monitor for Kubernetes Memory Leak Detection

No single metric proves the existence of a Kubernetes memory leak. You need to track memory growth in the workload, how close it is to its request and limit, and whether the same growth is starting to affect the node.

  • Sustained workload memory growth: Start here. In the Kubernetes Metrics API, memory usage is reported as the memory working set. A leak usually appears as a sustained upward trend that does not return to its original level after work is completed. One reading is not enough. The trend over time is what matters.
  • Memory usage relative to the request: This shows how far the workload has drifted above the memory level at which it was scheduled. It helps answer whether the pod is still behaving like the scheduler expected, or whether retained memory has pushed it well beyond its planned footprint. Kubernetes schedules on requests, not on limits.
  • Memory usage relative to the limit: This tells you how close the workload is to its runtime kill boundary. It is one of the best indicators of imminent instability because memory limits are enforced reactively. A leak can keep growing for some time, but the closer the working set or RSS gets to the limit, the closer the container gets to OOMKilled.
  • Resident memory growth: If your monitoring stack exposes cAdvisor metrics, `container_memory_rss` is one of the best supporting signals because it tracks anonymous memory and, on cgroup v1, swap cache memory. A steady rise here is usually a stronger leak signal than working set alone because working set can still include reclaimable memory, such as file cache.
  • Available memory at the node level: This is the key node-side metric. Kubelet uses `memory.available` for eviction decisions, and `MemoryPressure` is tied directly to that signal. When this starts falling alongside a leaking workload, the problem is no longer just inside one container.
  • Memory stall pressure: PSI is valuable because it measures stalled time, not just bytes used. Watch `some` as an early sign that tasks are stalling on memory, and `full` as a sign of a more severe shortage where all non-idle tasks are stalled at once. Kubernetes exposes PSI at the node, pod, and container levels through the Summary API and the kubelet’s `/metrics/cadvisor` endpoint.

These metrics give you the evidence you need to decide whether rising memory is a leak, a boundary problem, or the start of node pressure.
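If you run the Prometheus Operator, the usage-versus-limit signal can be turned into an alert. The rule below is a sketch that assumes cAdvisor and kube-state-metrics are being scraped; the 90% threshold and 15-minute window are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-leak-signals     # illustrative name
spec:
  groups:
    - name: memory
      rules:
        - alert: ContainerNearMemoryLimit
          expr: |
            max by (namespace, pod, container) (
              container_memory_working_set_bytes{container!=""}
            )
            /
            max by (namespace, pod, container) (
              kube_pod_container_resource_limits{resource="memory"}
            ) > 0.90
          for: 15m              # sustained, not a one-off spike
          labels:
            severity: warning
```

The `for: 15m` clause matters here: it encodes the point above that a single reading is not enough, only a sustained trend is.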

Challenges of Detecting Kubernetes Memory Leaks With Traditional Monitoring

Traditional monitoring shows that memory is rising, but it does not make it easy to determine whether the problem is a true memory leak. Here are the main reasons why.

Resource Metrics Are Limited

The built-in metrics pipeline is intentionally narrow. The Metrics API exposes the minimum CPU and memory data needed for autoscaling and similar use cases, and `kubectl top` is designed to provide a stable autoscaler signal rather than a full diagnostic view. That makes it useful for spotting growth, but not for proving why memory is growing.

Memory Usage Does Not Show Ownership

The memory value exposed through the Metrics API is the working set. That is useful as a starting point, but it does not tell you whether the retained memory comes from heap objects, native allocations, page cache, or memory-backed storage. Kubernetes also tracks `tmpfs` `emptyDir` volumes as container memory use, so some growth that looks like a leak may be coming from memory-backed volume usage instead.

The Clearest Signals Arrive Late

Some of the most visible signals appear only after the leak has already been building. `OOMKilled` appears after the container exceeds its memory limit and the kernel triggers the OOM path. `MemoryPressure` appears only after node memory falls far enough to satisfy an eviction threshold. Traditional monitoring is better at showing that failure occurred than at showing when retained memory began to build.

Container and Node Signals Follow Different Paths

Kubernetes splits memory trouble across layers. One path is container-level, where a process gets `OOMKilled` and restarted. The other is node-level, where kubelet watches `memory.available` and may evict pods to protect the host. Traditional dashboards often show these as separate signals, which makes it easy to confuse a limit problem with a node-pressure problem.

Crash-Looping Pods and Minimal Images Make Live Debugging Harder

Even when monitoring indicates a likely leak, gathering sufficient evidence from the running pod can still be difficult. `kubectl exec` is often not enough when the container is crash-looping or the image does not include a shell or debugging tools. In those cases, you may need `kubectl debug` with an ephemeral container attached to the pod to inspect the environment and collect evidence more directly.

Best Practices to Prevent Kubernetes Memory Leaks

In Kubernetes, prevention is mostly about guardrails. Kubernetes does not stop application code from retaining memory, but it can stop a leak from running without limits, staying hidden for too long, or spreading into a node-wide problem.

| Kubernetes Practice | How It Helps With Memory Leaks |
| --- | --- |
| Set realistic memory requests and limits | Requests reserve memory for scheduling, and limits define the runtime boundary enforced by the kernel later. Together, they stop leaks from growing without clear placement and runtime constraints. |
| Use QoS classes deliberately | Requests and limits determine whether a pod is `Guaranteed`, `Burstable`, or `BestEffort`. That affects how much protection the pod keeps when the node is short on memory. |
| Enforce defaults and boundaries with LimitRange | A `LimitRange` sets default requests and limits, enforces minimum and maximum values, and constrains the request-to-limit ratio. This prevents workloads from entering a namespace without meaningful memory boundaries. |
| Require memory declarations with ResourceQuota | A [`ResourceQuota`](https://www.groundcover.com/blog/kubernetes-resource-quota) can require pods in a namespace to declare memory requests or limits and can cap total requested or limited memory. This reduces the risk that unconstrained workloads hide or worsen leaks. |
| Use Vertical Pod Autoscaler for rightsizing | VPA corrects chronic under-requesting by adjusting requests and, in the default mode, scaling limits proportionally. It does not fix the leak, but it makes memory pressure easier to interpret and harder to ignore. |
| Put size limits on memory-backed `emptyDir` volumes | A `sizeLimit` prevents tmpfs-backed storage from growing to the point where it appears to be a leak or worsens a real one. |
| Isolate high-risk workloads onto dedicated nodes | Taints, tolerations, and node affinity keep memory-sensitive workloads away from unrelated pods. That limits how far a bad leak can spread across the cluster. |

These controls do not remove the leak itself, but they are the Kubernetes mechanisms that keep leaks bounded, visible, and less likely to destabilize the rest of the cluster.
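Two of these guardrails are plain namespace objects. A sketch with illustrative values — the defaults, maximums, and quota totals should be tuned per namespace:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: memory-defaults         # illustrative name
spec:
  limits:
    - type: Container
      defaultRequest:
        memory: 256Mi           # applied when a container declares no request
      default:
        memory: 512Mi           # applied when a container declares no limit
      max:
        memory: 2Gi             # upper bound on any single container's limit
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: memory-quota            # illustrative name
spec:
  hard:
    requests.memory: 8Gi        # cap on total requested memory in the namespace
    limits.memory: 16Gi         # cap on total limited memory
```

Applied together in a namespace, these ensure every container gets a memory boundary even when the workload author forgets to declare one.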

Debugging and Fixing a Kubernetes Memory Leak in Live Environments

In a live cluster, leak debugging has two goals. Preserve Kubernetes evidence before restarts or evictions erase it, then collect enough runtime data to identify what is retaining memory and verify the fix after rollout. Here is how to achieve this.

Capture the Kubernetes Evidence First

Start with the evidence Kubernetes already has. `kubectl describe pod` shows status changes, events, and restart history. `kubectl logs --previous` matters when the container has already restarted because it gives you logs from the previous instance.

# Show pod events, restart history, and last known state
kubectl describe pod <pod-name>

# Get logs from the previous container instance after a restart
kubectl logs <pod-name> -c <container-name> --previous

# Dump full pod status for exact reason, exit code, and container state
kubectl get pod <pod-name> -o yaml

Look for the restart count, `OOMKilled` exit code `137`, eviction messages, and the last logs from the failed container. That gives you the failure window before you change anything.

Use Kubernetes Debugging Tools to Reach the Pod Safely

If the image already includes the tools you need, `kubectl exec` is the simplest option. When it does not, `kubectl debug` lets you attach an ephemeral container to a running pod or copy a crashing pod with a different command or image.

# Attach an ephemeral container to a running pod
kubectl debug -it <pod-name> --image=busybox --target=<container-name>

# Copy a crashing pod and change its command
kubectl debug <pod-name> -it --copy-to=<pod-name>-debug --container=<container-name> -- sh

# Copy a pod and replace one container image with a debug image
kubectl debug <pod-name> --copy-to=<pod-name>-debug --set-image=<container-name>=<debug-image>

Once inside, look for process count, open files, temporary files, cache directories, mounted volumes, and any runtime state that matches the memory growth you saw in Kubernetes. The goal here is to connect the cluster symptom to a concrete element within the pod.

Collect Runtime-Specific Memory Evidence

Once you have access, move from Kubernetes symptoms to runtime evidence. In Go, heap and goroutine profiles are the main tools. In Node.js, the Heap Profiler traces every allocation and is not recommended for production because of overhead, while the Sampling Heap Profiler is low-overhead enough for production use.

// Expose pprof endpoints from a Go service
import _ "net/http/pprof"

At this stage, look for allocation paths that keep growing across captures, object types that do not drop after work completes, or goroutines and background work that keep accumulating. The useful result is not “memory is high,” but “this specific allocation path is still growing.”

Move to the Node When Pod-Level Evidence Does Not Explain the Leak

Not every memory incident lives inside the workload. If several unrelated pods on the same node are affected, or pod-level evidence does not match the memory growth, move to the node.

# Start a debug pod on the node
kubectl debug node/<node-name> -it --image=ubuntu

# Start a node debug pod with elevated access
kubectl debug node/<node-name> -it --image=ubuntu --profile=sysadmin

Look for runtime and host-level evidence, such as container runtime issues, kubelet issues, or memory growth that exceeds what the affected pod alone can explain. The node filesystem is mounted at `/host`, but some inspection may fail unless you use `--profile=sysadmin`.

Fix the Source, Then Verify the Rollout

Use the evidence you collected to fix the specific source of retained memory. If the profiles point to a cache that keeps growing, add a bound or eviction policy. If the leak is in a library, update or replace it. If the problem comes from a worker, listener, or retry loop that keeps building up state, change that code path. If the investigation showed memory-backed volume growth instead of a true leak, fix the workload configuration instead. After that, verify the rollout.

# Roll out the fixed container image
kubectl set image deployment/<deployment-name> <container-name>=<new-image>

# Watch the rollout until the new revision is available
kubectl rollout status deployment/<deployment-name>

After rollout, look for the pattern to change. Memory growth should flatten, restarts should stop, `OOMKilled` should disappear, and node pressure should clear if the leak had already spread that far.

Faster Kubernetes Memory Leak Detection With eBPF-Powered Observability Using groundcover

groundcover speeds up Kubernetes memory leak detection with eBPF-based observability. Its eBPF sensor, Flora, runs as a DaemonSet in the cluster and provides application metrics and traces with zero code changes. You can then investigate memory growth alongside logs, traces, and Kubernetes events in one place instead of piecing the evidence together across separate tools.

  • eBPF-based collection with zero code changes: Flora collects application metrics and traces without manual instrumentation, so you can start from usable telemetry instead of first wiring up custom collection.
  • One place for metrics, logs, traces, and events: groundcover’s Data Explorer & Monitors query builder supports Metrics, Logs, Traces, and Events, which makes it easier to move from rising memory to the surrounding logs, traces, and Kubernetes events without switching tools.
  • Kubernetes metadata enrichment: groundcover enriches transactions with Kubernetes context, such as participating pods, nodes, and container state, and it can enrich logs, traces, events, and metrics with selected pod labels and annotations. That makes it easier to keep the investigation scoped to the right workload, pod, or namespace.
  • Monitor Issues and Monitor Catalog for restart and OOM signals: groundcover’s current workflow is built around Monitor Catalog and Monitor Issues, which surface active problems and provide ready-made monitors to start from. That includes restart-related issues such as exit code 137 for OOMKilled, which helps narrow the investigation to the affected workload faster.
  • Log and trace correlation through shared trace context: groundcover can correlate logs and traces when they share trace context, most commonly a `trace_id`. It does not inject that context into logs automatically, so correlation depends on services being instrumented with OpenTelemetry or Datadog SDK and on the logging setup, including a trace_id field in the log payload.
  • Filtering and grouping that keep the investigation focused: the query builder supports filtering, grouping, and breakdowns across telemetry. This helps keep a memory investigation centered on the affected workload and time window instead of the whole cluster.

groundcover gives you the Kubernetes context and connected telemetry needed to narrow memory leaks faster. That makes it easier to move from cluster-level signals to runtime confirmation.

Conclusion

Kubernetes memory leaks are difficult to detect and trace because the visible failure often appears long after memory usage starts rising. Reliable diagnosis depends on watching the right signals, confirming the leak inside the workload, and then verifying the fix after rollout. groundcover helps speed up that process by bringing Kubernetes context, metrics, logs, traces, and events into one investigation flow, so you can narrow the problem faster and move to runtime debugging with clear evidence.

FAQs

How long can a Kubernetes memory leak go undetected?

There is no fixed timeframe. A leak can go unnoticed as long as memory usage keeps rising without hitting a limit, causing restarts, or creating enough pressure to affect the node. How soon it becomes visible depends on how fast memory is growing, how much memory the workload normally uses, the request and limit around it, and how much free memory the node still has.

Can a memory leak still happen when memory limits are set correctly?

Yes. Correct memory limits do not prevent a leak. They only define the boundary where the container becomes unstable or gets killed if retained memory keeps growing, and Kubernetes enforces that boundary reactively rather than immediately.

How does groundcover help with Kubernetes memory leak detection?

groundcover speeds up the cluster-side investigation with eBPF-based observability. Its eBPF sensor, Flora, runs as a DaemonSet, provides application metrics and traces with zero code changes, and helps you investigate logs, metrics, traces, events, and Kubernetes context in one flow. That makes it faster to narrow the problem to the affected workload and time window before confirming the retained-memory source with runtime debugging.
