Why can a pod get OOMKilled even when RSS looks healthy?

Kubernetes evaluates container memory limits against the working set size, not the Resident Set Size (RSS). Heavy file I/O operations—such as loading large models, scanning logs, or database reads—can cause the active file cache to swell. This pushes the working set over the configured limit and triggers an OOMKill, even if application heap usage remains perfectly stable.

How should teams set Kubernetes memory requests and limits using these metrics?

Teams should use working set metrics over a normal business cycle to establish a P95 baseline. Set resource requests close to this sustained usage to assist the scheduler with accurate placement. Then, add 20–30% headroom above typical working set values for limits to safely absorb unexpected cache growth and transient traffic bursts.

Why is RSS usually a better indicator of memory leaks than the working set?

RSS isolates anonymous memory growth because it excludes the fluctuating file system cache, which can introduce noise into working set trends. Tracking long-term RSS growth across deployment versions and measuring it before and after garbage collection cycles makes it much easier to distinguish a true code-level memory leak from temporary allocation pressure.

What operational risks arise from relying only on container_memory_usage_bytes?

The raw usage metric includes reclaimable page cache, which can make a completely healthy container look dangerously over-utilized. Relying solely on this metric can lead to unnecessary cluster overprovisioning. For accurate capacity forecasting and efficiency audits, teams should compare raw usage, working set, and RSS together.

How does eBPF help explain why a working set spike occurred instead of simply showing that it happened?

While standard metrics merely signal that memory has spiked, eBPF-based telemetry tracks kernel-level activities to reveal the cause. By correlating infrastructure memory pressure directly with application request streams, eBPF helps teams determine if cache growth was driven by an expensive endpoint, background cron jobs, or intensive storage operations.


Memory Working Set vs RSS: Key Differences & When to Use Each

groundcover Team

June 10, 2026


Key Takeaways

Kubernetes enforces memory limits using the working set, making it the key metric for OOM prevention.
The working set includes active file cache, while RSS mainly reflects application memory, making RSS better for detecting leaks.
High file I/O can increase the working set without increasing RSS, so a growing working set does not always indicate a memory leak.
Alert on working set to avoid OOMs, and use RSS trends to debug application memory behavior.

What Is Memory Working Set

The memory working set is the portion of a process's memory that is actively in use and cannot be reclaimed without disrupting application behavior. Think of it as the minimum physical memory a process needs to run, without causing page faults. In the Linux kernel, the working set maps to:

Anonymous memory (heap allocations, stack, shared memory)
Active page cache (file-backed pages that have been recently accessed and promoted to the active LRU list by the kernel)

It deliberately excludes inactive file-backed pages: cached data that hasn't been touched recently and that the kernel is free to evict without impacting the application. This makes the working set a lean, application-centric view of memory pressure.

In Kubernetes, the metric you see is container_memory_working_set_bytes, reported by cAdvisor. Under the hood, it's calculated as:

working_set = container_memory_usage_bytes - inactive_file_cache

Or equivalently, from the cgroup memory.stat file:

# Read raw cgroup memory stats inside a container
cat /sys/fs/cgroup/memory/memory.stat

# Key fields:
# total_rss          → anonymous memory
# total_active_file  → active file cache
# total_inactive_file → reclaimable cache (excluded from working set)

Because inactive cache is excluded, the working set tends to be a tighter, more conservative estimate of what the application actually needs.
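The subtraction above can be sketched in Python. This is a minimal illustration rather than cAdvisor's actual code: the helper names `parse_memory_stat` and `working_set_bytes` are hypothetical, the sample values are made up, and clamping the result at zero is an assumption about how to handle an inconsistent snapshot. The fallback to `inactive_file` assumes a cgroup v2 host, where memory.stat drops the `total_` prefix.

```python
def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse 'key value' lines from a cgroup memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes: int, stats: dict[str, int]) -> int:
    """Usage minus inactive file cache, never below zero (assumed clamp)."""
    # cgroup v1 exposes total_inactive_file; cgroup v2 exposes inactive_file.
    inactive = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return max(usage_bytes - inactive, 0)


# Hypothetical sample values, for illustration only.
SAMPLE = """\
total_rss 1000
total_active_file 200
total_inactive_file 300
"""

stats = parse_memory_stat(SAMPLE)
print(working_set_bytes(1500, stats))  # 1500 - 300 = 1200
```

Keeping the parser separate from the arithmetic makes it easy to feed the same function from a file read, a test fixture, or a metrics scrape.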

What Is RSS (Resident Set Size)

RSS, or Resident Set Size, is the portion of a process's memory that is held in physical RAM right now: not swapped out, and not merely mapped but never touched. In the Linux context, the traditional definition from /proc/<pid>/status includes:

Anonymous memory (heap, stack)
File-mapped memory currently loaded into RAM

In the Kubernetes/cAdvisor context, container_memory_rss means something more specific:

The amount of anonymous memory and swap cache memory (including transparent hugepages), pulled directly from total_rss in the cgroup memory.stat file.

// From cAdvisor source:
ret.Memory.RSS = s.MemoryStats.Stats["total_rss"]

Importantly, this does not include file-backed page cache at all. So cAdvisor's container_memory_rss is actually narrower than the traditional Linux RSS definition. It covers purely the non-file-backed memory.

The practical result: RSS in Kubernetes is relatively stable. It grows when your application allocates heap memory and shrinks when it frees it. It doesn't fluctuate with file I/O caching the way the working set does.
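To see the gap between the traditional per-process RSS and an anonymous-only figure, recent Linux kernels split VmRSS in /proc/<pid>/status into RssAnon, RssFile, and RssShmem. The sketch below parses such a dump; `parse_status_kb` is a hypothetical helper, the excerpt is invented for illustration, and treating RssAnon as the closest per-process analogue of container_memory_rss is an approximation rather than an exact equivalence.

```python
def parse_status_kb(text: str) -> dict[str, int]:
    """Collect the 'kB' fields of a /proc/<pid>/status dump, keyed by name."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


# Hypothetical excerpt; real files contain many more fields.
SAMPLE_STATUS = """\
Name:\tmy-app
VmRSS:\t   48000 kB
RssAnon:\t   33000 kB
RssFile:\t   14000 kB
RssShmem:\t    1000 kB
"""

fields = parse_status_kb(SAMPLE_STATUS)
traditional_rss = fields["VmRSS"]   # anonymous + file-backed + shmem
anonymous_only = fields["RssAnon"]  # heap, stack, other anonymous pages
print(traditional_rss - anonymous_only)  # file-backed and shmem pages, in kB
```

The difference printed at the end is the part of traditional RSS that an anonymous-only metric leaves out.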

Memory Working Set vs RSS: Core Differences Explained

Here's how the two metrics stack up across the dimensions that actually matter for day-to-day operations:

| Dimension | Memory Working Set | RSS (container_memory_rss) |
| --- | --- | --- |
| Definition | Anonymous memory + active file cache | Anonymous memory + swap cache only |
| Includes file cache? | Yes (active portion only) | No |
| Reclaimable pages included? | No (inactive file cache excluded) | No |
| Kubernetes metric | container_memory_working_set_bytes | container_memory_rss |
| OOM trigger | Yes, used to evaluate memory limits | No, not directly checked |
| Volatility | Higher (fluctuates with [I/O patterns](https://www.admin-magazine.com/HPC/Articles/Tuning-I-O-Patterns-in-C)) | Lower (tracks heap allocations) |
| Memory leak signal | Good, but noisy | Better signal-to-noise for leaks |
| cAdvisor source field | usage - total_inactive_file | total_rss |
| Best for | Capacity planning, OOM prevention | Application heap debugging |

The single most important difference: memory limits in Kubernetes track the working set far more closely than RSS. The limit itself is enforced by the kernel's cgroup memory controller, which reclaims inactive page cache before resorting to an OOM kill, so container_memory_working_set_bytes is the metric that best shows how close a container is to its limit.

How Reclaimable Cache Impacts Memory Working Set and RSS

Understanding the role of the page cache is what separates engineers who truly understand memory from those who just read dashboards. When your application reads files (config files, libraries, database files), the kernel stores those pages in the page cache to speed up future reads. This cache is split into two tiers:

Active file cache: Pages accessed recently (at least twice). The kernel is reluctant to evict these.
Inactive file cache: Pages accessed once and not since. These are prime candidates for reclamation under memory pressure.

Here's how these tiers flow through the two metrics:

Total Memory Usage
├── Anonymous Memory (heap, stack, shared)     → in both RSS and working set
├── Active File Cache                          → in working set only
└── Inactive File Cache                        → in neither (reclaimable)

The practical implication: If your service starts reading a lot of data from disk, for instance, loading a large model file or scanning logs, the active file cache will spike. Your working set goes up. RSS doesn't budge. If the kernel later decides those pages are inactive, the working set comes back down automatically, without your application doing anything.

This is why the working set can appear to "self-heal" and why RSS is a more stable measure of what your application code is actually holding onto.
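The spike-then-recover behavior can be modeled with a small Python sketch. It is a simplified model of the three tiers described above, not kernel behavior: the `MemorySnapshot` class and all of the sizes are hypothetical, and the working set is approximated as anonymous memory plus active file cache.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemorySnapshot:
    anon: int           # heap, stack, shared memory
    active_file: int    # recently used page cache
    inactive_file: int  # reclaimable page cache

    @property
    def rss(self) -> int:
        return self.anon

    @property
    def working_set(self) -> int:
        return self.anon + self.active_file

    @property
    def usage(self) -> int:
        return self.anon + self.active_file + self.inactive_file


MIB = 1024 * 1024
idle = MemorySnapshot(anon=300 * MIB, active_file=20 * MIB, inactive_file=50 * MIB)
# A large file is read repeatedly, so its pages land on the active list.
scanning = replace(idle, active_file=idle.active_file + 150 * MIB)
# Later the kernel demotes those pages to the inactive list.
settled = replace(scanning, active_file=20 * MIB,
                  inactive_file=scanning.inactive_file + 150 * MIB)

for name, snap in [("idle", idle), ("scanning", scanning), ("settled", settled)]:
    print(name, snap.rss // MIB, snap.working_set // MIB, snap.usage // MIB)
```

Across the three snapshots RSS never moves, the working set rises and falls back, and raw usage stays high because the demoted pages are still cached.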

How Memory Working Set and RSS Behave in Kubernetes and cAdvisor Metrics

cAdvisor (Container Advisor) is the component that actually collects memory statistics from the Linux cgroup filesystem and exposes them to Prometheus. Both metrics originate from /sys/fs/cgroup/memory/memory.stat.

Here's a simplified view of what a real memory.stat file looks like inside a running container:

$ cat /sys/fs/cgroup/memory/memory.stat
cache 32575488
rss 33964032
mapped_file 16625664
active_anon 33927168
inactive_anon 1757184
active_file 2433024
inactive_file 27709440
total_cache 32575488
total_rss 33964032
total_active_file 2433024
total_inactive_file 27709440

From these raw values, cAdvisor derives the Prometheus metrics:

container_memory_rss
  = total_rss
  = 33,964,032 bytes (~32 MB)

container_memory_usage_bytes
  = total_rss + total_cache + other overhead
  ≈ 33,964,032 + 32,575,488 = 66,539,520 bytes (~63 MB)

container_memory_working_set_bytes
  = container_memory_usage_bytes - total_inactive_file
  ≈ 66,539,520 - 27,709,440 = 38,830,080 bytes (~37 MB)

Notice that the working set is meaningfully higher than RSS but lower than raw usage. It sits in the middle and gives you a picture of "memory that matters."

In your Prometheus queries, you'd typically monitor these like:

# Working set usage as a % of limit
# (the "> 0" filter skips containers with no limit, which report 0)
container_memory_working_set_bytes{container!=""}
  / (container_spec_memory_limit_bytes{container!=""} > 0)
  * 100

# RSS trend over time (useful for leak detection)
# deriv() rather than rate(), because container_memory_rss is a gauge, not a counter
deriv(container_memory_rss{namespace="production"}[5m])
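A growth trend for a gauge such as RSS can be estimated as a least-squares slope over recent samples, which is the same idea PromQL's deriv() applies. The Python sketch below illustrates the calculation outside Prometheus; `slope_per_second` is a hypothetical helper and both sample series are invented.

```python
def slope_per_second(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, value) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


MIB = 1024 * 1024
# Hypothetical RSS samples, one per minute.
leaky = [(60.0 * i, (200 + i) * MIB) for i in range(6)]  # grows 1 MiB per minute
flat = [(60.0 * i, 200 * MIB) for i in range(6)]         # no growth

print(slope_per_second(leaky) * 3600 / MIB)  # growth in MiB per hour
print(slope_per_second(flat))
```

A slope that stays positive across long windows is a stronger leak signal than any single high reading, because short allocation bursts average out.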

Why Memory Working Set Drives OOM and Memory Pressure Decisions

This is the part that has operational teeth. Kubernetes uses container_memory_working_set_bytes as the reference memory figure when evaluating whether a container has hit its memory limit. When a container's working set exceeds the configured resources.limits.memory, the kernel's cgroup OOM killer steps in and terminates the process.

| Scenario | Working Set Behavior | RSS Behavior | OOM Risk |
| --- | --- | --- | --- |
| Normal operation | Stable or gradual growth | Stable | Low |
| Heavy file I/O (e.g., log scanning) | Spikes (active cache grows) | Unchanged | Medium, can trigger OOM falsely |
| Memory leak (heap growth) | Grows steadily | Grows steadily | High, both metrics increase |
| After GC (Java/.NET) | May drop sharply | May drop sharply | Drops after GC pause |
| Cache eviction by kernel | Drops (inactive pages reclaimed) | Unchanged | Reduces pressure |
| Near memory limit | Critical, triggers OOM | Not directly checked | Working set is the trigger |

This table reveals a subtle danger: High I/O workloads can inflate the working set even when the application itself isn't leaking memory. You can get OOMKilled not because your app is broken, but because the kernel pulled a lot of data into the active file cache, and your memory limit is set too tight.

The kubelet also uses the working set for node-level memory pressure eviction. When a node is under memory pressure, pods are ranked partly by how far their working set exceeds their memory request, and pods with the highest excess are evicted first.
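The ranking idea can be sketched in Python. This is a simplified model under stated assumptions, not the kubelet's implementation: the `PodMemory` type, the pod names, and the sizes are hypothetical, and the sort key assumes pods over their request come first, then lower priority, then larger excess over the request.

```python
from dataclasses import dataclass


@dataclass
class PodMemory:
    name: str
    working_set: int  # bytes
    request: int      # bytes
    priority: int = 0


def eviction_order(pods: list[PodMemory]) -> list[str]:
    """Order pods from most to least likely to be evicted (simplified)."""
    def key(p: PodMemory):
        exceeds = p.working_set > p.request
        excess = p.working_set - p.request
        # Pods over their request first, then lower priority, then larger excess.
        return (not exceeds, p.priority, -excess)
    return [p.name for p in sorted(pods, key=key)]


MIB = 1024 * 1024
pods = [
    PodMemory("batch", working_set=300 * MIB, request=400 * MIB),
    PodMemory("api", working_set=600 * MIB, request=500 * MIB),
    PodMemory("critical", working_set=1200 * MIB, request=500 * MIB, priority=1000),
    PodMemory("cache-heavy", working_set=900 * MIB, request=500 * MIB),
]
print(eviction_order(pods))
```

In this model a pod that stays under its request is ranked last, which is one reason to set requests close to sustained working set usage.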

When to Use Memory Working Set vs RSS for Monitoring and Debugging

Neither metric is universally better; the right choice depends on what question you're trying to answer.

Use the memory working set when:

Setting or validating memory limits and requests for pods
Diagnosing OOMKill events (the working set is what the kernel compares against the limit)
Capacity planning across a node (node memory pressure uses the working set)
Building alerts for containers approaching their memory limit

Use RSS when:

Investigating a suspected memory leak in application code (steady RSS growth points to heap behavior rather than file cache)
Profiling applications with high file I/O, where the working set is noisy
Comparing heap behavior between application versions in a controlled benchmark
Debugging JVM-based apps where GC patterns affect the heap but not file cache

Practical rule of thumb: Alert on working set (it's what causes OOM), debug with RSS (it's what reveals application-level memory behavior).

Common Mistakes When Interpreting Memory Working Set and RSS

A few patterns come up again and again when engineers are troubleshooting memory issues:

Treating RSS as the OOM trigger. It isn't. Kubernetes enforces limits against the working set. You can have a low RSS and still get OOMKilled if your active file cache is large enough to push the working set over the limit.
Assuming a growing working set always means a memory leak. Not necessarily: it could be the active page cache growing due to increased file reads. Compare with RSS: if RSS is flat but the working set is climbing, look at I/O patterns first.
Conflating container_memory_rss with traditional Linux RSS. In cAdvisor, container_memory_rss excludes file-mapped memory. The traditional Linux RSS from /proc/<pid>/status includes it. These are not the same numbers.
Setting memory limits based on container_memory_usage_bytes. This metric includes inactive file cache, which can be reclaimed. Setting limits here leads to over-provisioning. Base limits on working set with headroom.
Ignoring total_inactive_file when analyzing raw cgroup stats. If you're reading memory.stat directly and summing values to estimate usage, forgetting to subtract inactive file pages will make the container look like it's using more memory than it really needs.
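The comparison described in these mistakes (working set trend versus RSS trend) can be written down as a small triage helper. The Python sketch below is illustrative only: `classify_growth` is a hypothetical function and the 5% tolerance is an arbitrary assumption, not a recommended threshold.

```python
def classify_growth(ws_start: int, ws_end: int, rss_start: int, rss_end: int,
                    tolerance: float = 0.05) -> str:
    """Label a memory trend from start/end samples of working set and RSS."""
    if ws_start <= 0 or rss_start <= 0:
        raise ValueError("start values must be positive")
    ws_growth = (ws_end - ws_start) / ws_start
    rss_growth = (rss_end - rss_start) / rss_start
    if ws_growth <= tolerance:
        return "stable"
    if rss_growth <= tolerance:
        # Working set climbed while RSS stayed flat: page cache is the suspect.
        return "file cache growth: check I/O patterns"
    return "anonymous memory growth: check heap usage"


# Hypothetical samples in MiB.
print(classify_growth(400, 410, 300, 305))
print(classify_growth(400, 600, 300, 305))
print(classify_growth(400, 600, 300, 480))
```

A helper like this only narrows the search; confirming a leak still needs heap profiling or longer RSS history.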

How to Monitor Working Set and RSS

The most common setup is Prometheus + cAdvisor + Grafana, which gives you both metrics out of the box.

Prometheus alerting rules for working set:

groups:
  - name: memory.rules
    rules:
      - alert: ContainerMemoryHighWorkingSet
        expr: |
          container_memory_working_set_bytes{container!=""}
          / (container_spec_memory_limit_bytes{container!=""} > 0)
          > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} working set at {{ $value | humanizePercentage }} of limit"

      - alert: ContainerMemoryLeakSuspected
        expr: |
          deriv(container_memory_rss{container!="", namespace="production"}[30m]) > 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "RSS for {{ $labels.container }} growing steadily -- possible leak"

PromQL for a quick working set vs RSS comparison:

# Side-by-side for a specific pod
{__name__=~"container_memory_working_set_bytes|container_memory_rss",
 pod="my-app-abc123", container="my-app"}

Best Practices for Interpreting Memory Metrics

| Practice | Rationale |
| --- | --- |
| Alert on working set, not raw usage | [Working set](https://community.ibm.com/community/user/blogs/leo-varghese/2024/06/04/kubernetes-memory-metrics) is what the kernel enforces against limits |
| Set limits with ~20–30% headroom above working set baseline | Accounts for active cache spikes and burst traffic |
| Use RSS trend (rate of change) to detect leaks | RSS growing steadily without spikes → allocation without release |
| Check inactive_file when working set is high | Large inactive cache means pressure is reclaimable, not application-driven |
| Use requests ≈ working set at P95 | Scheduler uses requests for bin-packing; under-setting causes evictions |
| Don't rely on container_memory_usage_bytes for limits | Includes reclaimable cache; leads to over-provisioning |
| Monitor per-namespace and per-node totals | Node-level eviction decisions affect all pods, not just the one that is over limit |
| Correlate memory spikes with deployment events | New code, new config, or increased traffic are common triggers |
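The sizing rows in this table (requests near the P95 working set, limits with 20–30% headroom) can be turned into a quick calculation. The Python sketch below is one possible reading: the function names are hypothetical, the nearest-rank percentile method and the 25% default headroom are assumptions, and the limit is derived from the P95 value rather than from some other baseline.

```python
import math


def percentile_nearest_rank(values: list[int], pct: int) -> int:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def suggest_memory(working_set_samples: list[int], headroom: float = 0.25) -> dict[str, int]:
    """Suggest a request at P95 and a limit with headroom above it."""
    p95 = percentile_nearest_rank(working_set_samples, 95)
    return {"request": p95, "limit": math.ceil(p95 * (1 + headroom))}


MIB = 1024 * 1024
# Hypothetical working set samples collected over a business cycle.
samples = [i * MIB for i in range(1, 101)]
suggestion = suggest_memory(samples)
print(suggestion["request"] // MIB, suggestion["limit"] / MIB)
```

Re-running the calculation after each release keeps requests and limits aligned with how the workload actually behaves.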

Deeper Memory Visibility Across Working Set and RSS with groundcover

Standard Prometheus + cAdvisor setups expose metrics like container_memory_working_set_bytes and container_memory_rss, but they leave an important observability gap: they show that memory usage increased, not why. Correlating a working set spike with a specific request path, workload behavior, or service dependency often requires jumping across multiple tools such as Grafana, Jaeger, Kubernetes events, and application logs.

groundcover is a Kubernetes-native observability platform that uses eBPF to collect telemetry directly from the Linux kernel with very low overhead. Its eBPF sensor provides infrastructure metrics, Kubernetes context, traces, logs, and network visibility without requiring sidecars, code changes, or heavy manual instrumentation. For memory specifically, this matters because:

Working set spikes are automatically correlated with traces. When container_memory_working_set_bytes climbs on a specific pod, you can immediately drill into what request traffic was happening at that moment, without manually joining data across Grafana and Jaeger.
Infrastructure monitoring provides node-level and pod-level memory visibility together, making it easier to determine whether memory pressure is isolated to a container or affecting node scheduling and cluster stability. groundcover's BYOC architecture keeps all telemetry inside your own cloud environment, which matters for teams with data residency or privacy requirements.
Distributed tracing and APM capabilities help connect sustained RSS growth to specific services, endpoints, or workload behaviors, which is valuable when investigating potential memory leaks.
Kubernetes monitoring surfaces OOMKill events, restart counts, and memory limit proximity in a unified workflow instead of requiring separate investigation across Prometheus, kubectl, and logging systems.

Conclusion

The memory working set vs RSS distinction isn't academic; it directly shapes how Kubernetes makes OOM and eviction decisions, and getting it wrong leads to either unnecessary OOMKills or missed memory leaks. The working set is what gets checked against your memory limits; RSS is the purer signal for what your application code is actually holding. Use both, understand what each excludes, and you'll spend a lot less time debugging memory incidents at 2 AM.
