
Node Disk Pressure in Kubernetes: Causes, Detection, and Fixes

groundcover Team
April 13, 2026

Key Takeaways

  • Node disk pressure happens when a Kubernetes node runs low on disk space, triggering pod evictions and stopping new workloads from being scheduled, which can quickly destabilize a cluster.
  • It’s mainly caused by buildup from container images, logs, and ephemeral storage, especially in high-churn or data-heavy environments like CI pipelines and microservices.
  • Kubernetes detects this through kubelet thresholds on disk space and inodes, marking nodes with DiskPressure=True and automatically reclaiming space by evicting lower-priority pods first.
  • Diagnosing the issue requires combining Kubernetes signals (node conditions) with actual node-level inspection to find what’s consuming storage (e.g., logs, images, or volumes).
  • Preventing disk pressure depends on proactive controls like log rotation, image cleanup, storage limits, and monitoring—so issues are caught early before they impact reliability.

As Kubernetes scales to meet the demands of AI and distributed microservices, the volume of logs, container images, and ephemeral data has reached an all-time high. According to the 2025 CNCF Annual Survey, 82% of organizations now run Kubernetes in production, which makes effective storage management a prerequisite for cluster stability rather than an optional practice. Node disk pressure is a critical signal that a node's storage is exhausted, a condition that, if ignored, triggers aggressive pod evictions and halts new scheduling.

When a Kubernetes node runs low on available disk space, the kubelet sets the node's DiskPressure condition. This condition signals that the node cannot safely accept additional workloads because disk usage has crossed configured thresholds. Left unresolved, disk pressure leads to pod evictions, degraded application performance, and eventually cluster instability.

Understanding node disk pressure, how Kubernetes detects it, and how to resolve it matters a lot when you're trying to keep production clusters from falling over. This guide walks through the causes, how detection actually works, and what good prevention looks like in practice.

What Is Node Disk Pressure in Kubernetes

Node disk pressure is a node condition in Kubernetes indicating that the node is running out of disk space or ephemeral storage resources. Kubernetes monitors disk usage through the kubelet, which continuously evaluates disk consumption for container images, log files, and ephemeral storage. When disk usage exceeds configured thresholds, Kubernetes sets the node condition:

DiskPressure=True

What This Means in Practice

  • The node has insufficient free disk space
  • Kubernetes may evict pods to reclaim storage
  • The scheduler may stop placing new pods on the node
  • Cluster stability may degrade if disk pressure continues

Resources Contributing to Disk Pressure

  • Container image layers
  • Container log files
  • Ephemeral volumes
  • EmptyDir volumes
  • Image cache
  • Temporary files from workloads

Disk pressure is particularly common in clusters running high-volume logging, CI pipelines, data processing workloads, or microservices architectures.

How Kubernetes Detects and Signals Node Disk Pressure

Kubernetes detects disk pressure through kubelet eviction signals and resource thresholds. The kubelet monitors several filesystem metrics, including:

  • Node filesystem usage
  • Image filesystem usage
  • Available ephemeral storage
  • Disk inode availability

When these values fall below defined thresholds, the kubelet marks the node with

DiskPressure=True

Disk Pressure Signals Monitored by kubelet

| Signal | Description |
| ------------------ | ----------------------------------------- |
| nodefs.available | Available disk space on node filesystem |
| nodefs.inodesFree | Available inodes on node filesystem |
| imagefs.available | Available disk space for container images |
| imagefs.inodesFree | Free inodes in image filesystem |
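These signals correspond to configurable kubelet eviction thresholds. A configuration sketch in flag form (the values are illustrative examples, not defaults):

```
kubelet \
  --eviction-hard=nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15% \
  --eviction-minimum-reclaim=nodefs.available=500Mi,imagefs.available=2Gi
```

The minimum-reclaim setting tells the kubelet how much space to free beyond the threshold once eviction starts, which reduces threshold flapping.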

Example: Viewing Node Conditions

kubectl describe node <node-name>

Example output:

Conditions:
Type             Status
DiskPressure     True
MemoryPressure   False
PIDPressure      False
Ready            True

When disk pressure occurs, Kubernetes activates eviction policies to reclaim space.

Symptoms and Impact of Node Disk Pressure on Cluster Workloads

Node disk pressure creates several operational issues in Kubernetes clusters.

Common Symptoms

  • Pods entering Evicted status
  • New pods failing to schedule
  • Containers restarting repeatedly
  • Increased application latency
  • Log ingestion failures
  • Node instability

Impact on Cluster Behavior

  1. Pod Evictions: Kubernetes removes pods consuming ephemeral storage.
  2. Scheduling Restrictions: The scheduler avoids nodes under disk pressure.
  3. Reduced Reliability: Critical workloads may fail to start.
  4. Observability Gap: Log pipelines may fail if log files consume disk space.

Without proactive monitoring, disk pressure can escalate quickly in large clusters.
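As a monitoring sketch, a Prometheus alerting rule can warn well before the kubelet ever sets DiskPressure. This assumes node_exporter metrics are being scraped; the rule name, mountpoint, and threshold are illustrative:

```yaml
# Hypothetical Prometheus rule: fire when a node's root filesystem
# drops below 15% free for 10 minutes (requires node_exporter metrics).
groups:
  - name: node-disk
    rules:
      - alert: NodeDiskAlmostFull
        expr: |
          node_filesystem_avail_bytes{mountpoint="/"}
            / node_filesystem_size_bytes{mountpoint="/"} < 0.15
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} root filesystem below 15% free"
```

Alerting above the kubelet's eviction threshold gives operators time to clean up before evictions begin.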

Common Causes of Node Disk Pressure in Kubernetes Nodes

Disk pressure typically results from uncontrolled growth of container artifacts, logs, or ephemeral storage.

| Cause | Description | Typical Example |
| --------------------------- | ------------------------------------------- | ------------------------------- |
| Excessive Container Logs | Log files accumulate without rotation | Applications writing large logs |
| Large Container Images | Image layers consume disk space | ML or analytics images |
| Unused Images | Old images remain cached | CI/CD pipelines |
| Ephemeral Storage Usage | Pods store temporary data | Spark jobs, ETL workloads |
| Persistent Volume Misuse | Incorrect volume configuration | Data stored on node filesystem |
| Crash Loops Generating Logs | Repeated restarts fill disk quickly | Faulty microservices |
| High Container Churn | Frequent deployments increase image storage | Dev/test environments |

Understanding these root causes helps teams design effective remediation strategies.

How Node Disk Pressure Triggers Pod Evictions and Scheduling Failures

When disk pressure occurs, Kubernetes activates the eviction manager. The eviction manager attempts to reclaim disk space by removing pods based on QoS class and resource usage.

Pod Eviction Priority

Pods are evicted in roughly the following order:

  1. BestEffort pods
  2. Burstable pods using more than their requests
  3. Guaranteed pods (last)

Strictly speaking, for disk pressure the kubelet ranks pods by whether their disk usage exceeds their requests, then by pod priority, then by usage relative to requests. In practice this means pods consuming the most ephemeral storage beyond what they requested are evicted first.

Scheduler Behavior

When DiskPressure=True:

  • The node controller applies the node.kubernetes.io/disk-pressure:NoSchedule taint, so the scheduler avoids the node
  • New pods are placed on other nodes
  • If no other nodes have capacity, pods remain Pending

Example event message:

Warning  Evicted
The node was low on resource: ephemeral-storage
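To see which pods have already been evicted, you can filter kubectl output. A sketch; list_evicted is a hypothetical helper, with the parsing kept in a function so it can be exercised without a cluster:

```shell
# Filter "kubectl get pods -A" output down to evicted pods.
list_evicted() {
  # stdin: "kubectl get pods -A" output
  # stdout: NAMESPACE/NAME of every pod whose STATUS column is "Evicted"
  awk 'NR > 1 && $4 == "Evicted" { print $1 "/" $2 }'
}

# Against a live cluster (requires kubectl access):
#   kubectl get pods -A | list_evicted
```

Checking the events of one of these pods (kubectl describe pod) then shows the ephemeral-storage eviction message above.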

How to Diagnose Node Disk Pressure in Kubernetes Clusters

Diagnosing node disk pressure means correlating two views: the node conditions Kubernetes reports, and what is actually consuming storage on the affected nodes. Disk pressure occurs when a node exhausts its available disk space or ephemeral storage, so the job is to identify which nodes are affected, how their storage is being used, and which workloads or leftover artifacts are consuming the space.

In practice, this follows a logical order: start by checking what Kubernetes reports at the node-condition level, then inspect the node's filesystem directly to see the real usage numbers, and finally narrow the search to the specific culprits, whether that is bloated container images, runaway log files, or volumes that have grown far beyond expectations.

Checking Node Disk Pressure Status with Kubectl

The most direct way to detect node disk pressure is by checking node conditions using kubectl. Kubernetes nodes report conditions like DiskPressure, MemoryPressure, and PIDPressure, which are updated by the kubelet. When disk usage exceeds configured thresholds, the kubelet sets DiskPressure=True, indicating that the node is running low on available disk space.

Use the following command:

kubectl get nodes

Output example:

NAME        STATUS
node-1      Ready
node-2      Ready,SchedulingDisabled

Note that kubectl get nodes does not display the DiskPressure condition directly; use kubectl describe node or a JSONPath query to see it.

Detailed inspection:

kubectl describe node <node-name>

Look for:

DiskPressure=True

You can also inspect node conditions via JSON:

kubectl get node <node-name> -o jsonpath='{.status.conditions}'
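To scan every node at once, the DiskPressure condition can be extracted into a column and filtered. A sketch; disk_pressure_nodes is a hypothetical helper, and the custom-columns JSONPath filter syntax may need adjusting for your kubectl version:

```shell
# Print the names of nodes whose DiskPressure condition is True.
disk_pressure_nodes() {
  # stdin: two-column output (NODE, DISK) from the query below
  # stdout: node names reporting DiskPressure=True
  awk 'NR > 1 && $2 == "True" { print $1 }'
}

# Against a live cluster:
#   kubectl get nodes \
#     -o custom-columns='NODE:.metadata.name,DISK:.status.conditions[?(@.type=="DiskPressure")].status' \
#     | disk_pressure_nodes
```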

Inspecting Disk Usage on Affected Nodes

After identifying a node under disk pressure, the next step is to inspect the node’s disk usage to determine where space is being consumed. Kubernetes workloads store container images, logs, and ephemeral data on the node filesystem, so reviewing filesystem utilization helps identify which directories or partitions are nearing capacity and contributing to the reduced free disk space.

SSH into the affected node and check filesystem usage.

df -h

Example output:

Filesystem      Size  Used Avail Use%
/dev/xvda1       80G   78G   2G   97%

Check inode usage:

df -i

High inode usage can also trigger disk pressure.
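A quick way to locate what is filling the disk is to rank subdirectories by size. A sketch using standard du and sort; top_dirs is a hypothetical helper:

```shell
# Rank the immediate subdirectories of a path by size (KiB, largest first).
# -x keeps du on one filesystem, so mounted volumes are not double-counted.
top_dirs() {
  # $1: directory to inspect; $2 (optional): number of entries to show
  du -xsk "$1"/*/ 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Typical node-level usage:
#   top_dirs /var
#   top_dirs /var/lib/containerd 5
```

Running this against /var usually points straight at container images, pod logs, or kubelet ephemeral data as the dominant consumer.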

Identifying Images, Logs, and Volumes Causing Node Disk Pressure

The next step is identifying the workloads or artifacts causing the disk pressure. Common contributors include accumulated container images, large or unrotated log files, and excessive ephemeral storage used by pods. Workloads using emptyDir volumes or generating large temporary files can also consume significant disk space, making it important to pinpoint the exact source of the storage usage.

Identify Large Container Images

crictl images

or

docker images

Check Container Logs

/var/log/containers

or

/var/log/pods

Check kubelet Directories

/var/lib/kubelet

Inspect Ephemeral Storage Usage

kubectl describe pod <pod-name>

Look for:

Ephemeral-storage usage

How to Fix Node Disk Pressure in Kubernetes Environments

Fixing node disk pressure in Kubernetes environments involves freeing up disk space on affected nodes and addressing the sources of high disk usage. This can include removing unused container images, cleaning up log files, deleting temporary data from ephemeral storage, or expanding the node’s disk capacity to restore normal cluster operations.

Immediate Remediation Steps

  1. Remove unused container images:
crictl rmi --prune
  2. Clean container logs:
sudo truncate -s 0 /var/log/containers/*.log
  3. Delete unused pods:
kubectl delete pod <pod-name>
  4. Restart the kubelet if required:
systemctl restart kubelet
  5. Expand the node disk volume if the infrastructure allows.
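The log-cleaning step can be wrapped in a small helper. A sketch assuming GNU coreutils on the node; truncate_logs is a hypothetical name, and note that the files under /var/log/containers are typically symlinks into /var/log/pods, so you may need to point it at the latter:

```shell
# Truncate *.log files in a directory to reclaim space without deleting them.
# Truncating keeps the file handles held by the container runtime valid,
# so logging continues without restarting anything.
truncate_logs() {
  # $1: directory containing log files (e.g. a /var/log/pods/<pod>/<container> dir)
  find "$1" -maxdepth 1 -name '*.log' -type f -exec truncate -s 0 {} +
}

# On a node (normally requires root):
#   truncate_logs /var/log/pods/<namespace>_<pod>_<uid>/<container>
```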

Cluster-Level Solutions

  • Increase node storage
  • Use centralized log aggregation
  • Reduce image size
  • Implement automatic image garbage collection

How to Prevent Node Disk Pressure with Kubelet Configuration and Resource Limits

Preventing disk pressure requires proper kubelet configuration and resource limits.

Example kubelet configuration:

evictionHard:
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"

This instructs kubelet to start evicting pods before disk space becomes critically low.
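A fuller sketch of a KubeletConfiguration file combining hard thresholds with soft thresholds and grace periods (all values illustrative):

```yaml
# Soft thresholds trip first and give pods a grace period to exit cleanly;
# hard thresholds evict immediately.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
evictionSoft:
  nodefs.available: "15%"
evictionSoftGracePeriod:
  nodefs.available: "2m"
evictionMaxPodGracePeriod: 60
```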

Configure Ephemeral Storage Limits

Example pod configuration:

resources:
  limits:
    ephemeral-storage: "2Gi"
  requests:
    ephemeral-storage: "1Gi"

This prevents individual pods from consuming excessive disk space.
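For context, a complete (hypothetical) pod manifest showing where these requests and limits sit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-bounded-app   # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      resources:
        requests:
          ephemeral-storage: "1Gi"
        limits:
          ephemeral-storage: "2Gi"
```

A pod whose containers exceed their ephemeral-storage limit is evicted by the kubelet even when the node as a whole is not under disk pressure.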

Configure Log Rotation

--container-log-max-size=10Mi
--container-log-max-files=5

These settings prevent uncontrolled log file growth.
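The same limits can be expressed in a KubeletConfiguration file, which is generally preferred over command-line flags:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 10Mi   # rotate each container log at 10 MiB
containerLogMaxFiles: 5     # keep at most 5 rotated files per container
```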

Best Practices for Avoiding Node Disk Pressure in Production Clusters

Maintaining stable storage availability on Kubernetes nodes requires proactive operational practices and well-defined resource management policies. Applying consistent storage governance across workloads helps ensure nodes remain healthy and prevents disruptions caused by disk resource exhaustion.

| Best Practice | Description | Benefit |
| ------------------------------- | ------------------------------ | -------------------------- |
| Enable Log Rotation | Prevent log file growth | Reduces disk consumption |
| Monitor Ephemeral Storage | Track pod storage usage | Early detection |
| Use Smaller Container Images | Optimize Docker builds | Reduces image cache |
| Enable Image Garbage Collection | Remove unused images | Frees disk space |
| Centralize Logs | Use log aggregation systems | Avoid local disk buildup |
| Monitor Disk Usage | Use observability platforms | Detect disk pressure early |
| Enforce Storage Limits | Apply ephemeral storage quotas | Prevent runaway pods |

Following these practices significantly reduces disk pressure incidents.

Real-Time Visibility into Node Disk Pressure with groundcover

Modern Kubernetes observability platforms help teams detect and troubleshoot node disk pressure before it impacts application performance or cluster stability. groundcover provides eBPF-based Kubernetes observability, combining kernel-level telemetry with Kubernetes metrics to give teams real-time visibility into node resources, container behavior, and storage usage across the cluster.

  • Real-Time Disk Usage Monitoring Across Nodes and Containers: Platform teams can monitor node resource utilization and storage trends to identify disk pressure risks early using groundcover’s Kubernetes observability platform.
  • Automatic Detection of Resource Anomalies Affecting Cluster Health: groundcover provides real-time insights and alerts that help teams detect abnormal storage usage patterns and infrastructure issues before they escalate.
  • Deep Container Insights without Requiring Intrusive Instrumentation: Using eBPF-based telemetry, groundcover collects infrastructure and application signals directly from the kernel without requiring code instrumentation or sidecars.
  • Faster Troubleshooting of Storage Bottlenecks: By correlating Kubernetes events, container metrics, and infrastructure telemetry, teams can quickly identify workloads responsible for abnormal disk consumption.

Because groundcover uses eBPF-based telemetry, it captures low-level system activity while maintaining minimal overhead on production clusters. This enables DevOps and platform teams to quickly determine:

  • which pods are consuming excessive disk space
  • which nodes are approaching disk pressure thresholds
  • which workloads generate large volumes of logs or ephemeral storage

With this level of real-time visibility, teams can detect and resolve disk pressure issues earlier, improving the reliability and stability of Kubernetes workloads.

Conclusion

Node disk pressure is one of the most common operational issues affecting Kubernetes clusters. When nodes run low on disk space, Kubernetes triggers eviction policies and restricts scheduling to protect cluster stability. Understanding the causes, detection signals, and remediation strategies is essential for maintaining healthy Kubernetes environments.

By implementing proper kubelet configurations, ephemeral storage limits, log rotation policies, and observability tools, teams can proactively manage disk resources and prevent node disk pressure from disrupting workloads. Modern observability platforms such as groundcover provide the real-time insights required to detect disk pressure early and maintain reliable Kubernetes operations.

FAQs

How should kubelet eviction thresholds be tuned?

Eviction thresholds should be tuned to create enough reclaim headroom for your image pull and log growth patterns, not merely set to generic defaults.

  • Measure worst-case image pull size, peak log burst rate, and average reclaim time per node pool, then set evictionHard and evictionSoft so kubelet reacts before storage becomes operationally unrecoverable.
  • Use different thresholds for heterogeneous pools, because GPU, CI, and data-processing nodes usually need more headroom than stateless web-serving nodes.
  • Monitor inode pressure separately from capacity pressure, since small-file explosions from logs or temp artifacts can trip evictions even when df -h looks acceptable.
  • Pair threshold tuning with image garbage collection policy and log rotation; otherwise kubelet will evict pods repeatedly without removing the underlying cause.

Learn more about Kubernetes alerting

Does adding larger disks solve node disk pressure?

Bigger disks reduce the time to failure, but they do not fix unbounded storage behaviors in workloads, runtimes, or logging pipelines.

  • Check whether the issue is structural: oversized images, chatty applications, emptyDir misuse, long image retention, or frequent rollout churn will eventually refill any larger volume.
  • Review whether storage is shared across nodefs and imagefs, because mixed-use layouts let image pulls and pod writes compete on the same underlying device.
  • Enforce ownership by namespace or workload class with ephemeral-storage limits, so the cluster has policy guardrails instead of relying on node size as the control plane.
  • Use trend analysis after remediation; if free space recovers briefly and then resumes the same slope, the real problem is workload behavior, not capacity.

Learn more about Kubernetes cost optimization

How does groundcover help when disk pressure is driven by log growth?

groundcover adds value when disk pressure comes from log growth by helping teams connect storage exhaustion to the specific services, deployments, and behavioral changes generating the log flood.

  • Correlate spikes in log volume with pod restarts, new releases, and node events so you can tell whether the root cause is bad application behavior or normal workload expansion.
  • Use centralized visibility to keep logs off the node as the long-term system of record, reducing dependence on local disk for retention during incidents.
  • Watch for services whose error loops produce both operational noise and infrastructure risk, because the same pattern often drives alert fatigue, disk pressure, and troubleshooting delays.
  • Treat log-heavy workloads as cost and reliability risks together; the winning fix is usually better log discipline, sampling, and retention strategy rather than just larger nodes.

Learn more about log aggregation
