
Node Disk Pressure in Kubernetes: Causes, Detection, and Fixes

groundcover Team
April 13, 2026

Key Takeaways

  • Node disk pressure happens when a Kubernetes node runs low on disk space, triggering pod evictions and stopping new workloads from being scheduled, which can quickly destabilize a cluster.
  • It’s mainly caused by buildup from container images, logs, and ephemeral storage, especially in high-churn or data-heavy environments like CI pipelines and microservices.
  • Kubernetes detects this through kubelet thresholds on disk space and inodes, marking nodes with DiskPressure=True and automatically reclaiming space by evicting lower-priority pods first.
  • Diagnosing the issue requires combining Kubernetes signals (node conditions) with actual node-level inspection to find what’s consuming storage (e.g., logs, images, or volumes).
  • Preventing disk pressure depends on proactive controls like log rotation, image cleanup, storage limits, and monitoring—so issues are caught early before they impact reliability.

As Kubernetes scales to meet the demands of AI and distributed microservices, the volume of logs, container images, and ephemeral data has reached an all-time high. According to the 2025 CNCF Annual Survey, 82% of organizations now run Kubernetes in production, which makes effective storage management a prerequisite for cluster stability rather than an optional practice. Node disk pressure is a critical signal that a node's storage is exhausted, a condition that, if ignored, triggers aggressive pod evictions and halts new scheduling.

When a Kubernetes node runs low on available disk space, the kubelet sets the node's DiskPressure condition. This condition signals that the node cannot safely accept additional workloads because disk usage has crossed configured thresholds. Left unresolved, disk pressure leads to pod evictions, degraded application performance, and eventually cluster instability.

Understanding node disk pressure, how Kubernetes detects it, and how to resolve it matters a lot when you're trying to keep production clusters from falling over. This guide walks through the causes, how detection actually works, and what good prevention looks like in practice.

What Is Node Disk Pressure in Kubernetes

Node disk pressure is a node condition in Kubernetes indicating that the node is running out of disk space or ephemeral storage resources. Kubernetes monitors disk usage through the kubelet, which continuously evaluates disk consumption for container images, log files, and ephemeral storage. When disk usage exceeds configured thresholds, Kubernetes sets the node condition:

DiskPressure=True

What This Means in Practice

  • The node has insufficient free disk space
  • Kubernetes may evict pods to reclaim storage
  • The scheduler may stop placing new pods on the node
  • Cluster stability may degrade if disk pressure continues

Resources Contributing to Disk Pressure

  • Container image layers
  • Container log files
  • Ephemeral volumes
  • EmptyDir volumes
  • Image cache
  • Temporary files from workloads

Disk pressure is particularly common in clusters running high-volume logging, CI pipelines, data processing workloads, or microservices architectures.

How Kubernetes Detects and Signals Node Disk Pressure

Kubernetes detects disk pressure through kubelet eviction signals and resource thresholds. The kubelet monitors several filesystem metrics, including:

  • Node filesystem usage
  • Image filesystem usage
  • Available ephemeral storage
  • Disk inode availability

When these values fall below defined thresholds, the kubelet marks the node with

DiskPressure=True

Disk Pressure Signals Monitored by kubelet

| Signal | Description |
| ------------------ | ----------------------------------------- |
| nodefs.available | Available disk space on node filesystem |
| nodefs.inodesFree | Available inodes on node filesystem |
| imagefs.available | Available disk space for container images |
| imagefs.inodesFree | Free inodes in image filesystem |
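These signals correspond to configurable kubelet eviction thresholds. A configuration sketch in flag form (the values are illustrative examples, not defaults):

```
kubelet \
  --eviction-hard=nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15% \
  --eviction-minimum-reclaim=nodefs.available=500Mi,imagefs.available=2Gi
```

The minimum-reclaim setting tells the kubelet how much space to free beyond the threshold once eviction starts, which reduces threshold flapping.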

Example: Viewing Node Conditions

kubectl describe node <node-name>

Example output:

Conditions:
Type             Status
DiskPressure     True
MemoryPressure   False
PIDPressure      False
Ready            True

When disk pressure occurs, Kubernetes activates eviction policies to reclaim space.

Symptoms and Impact of Node Disk Pressure on Cluster Workloads

Node disk pressure creates several operational issues in Kubernetes clusters.

Common Symptoms

  • Pods entering Evicted status
  • New pods failing to schedule
  • Containers restarting repeatedly
  • Increased application latency
  • Log ingestion failures
  • Node instability

Impact on Cluster Behavior

  1. Pod Evictions: Kubernetes removes pods consuming ephemeral storage.
  2. Scheduling Restrictions: The scheduler avoids nodes under disk pressure.
  3. Reduced Reliability: Critical workloads may fail to start.
  4. Observability Gap: Log pipelines may fail if log files consume disk space.

Without proactive monitoring, disk pressure can escalate quickly in large clusters.
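As a monitoring sketch, a Prometheus alerting rule can warn well before the kubelet ever sets DiskPressure. This assumes node_exporter metrics are being scraped; the rule name, mountpoint, and threshold are illustrative:

```yaml
# Hypothetical Prometheus rule: fire when a node's root filesystem
# drops below 15% free for 10 minutes (requires node_exporter metrics).
groups:
  - name: node-disk
    rules:
      - alert: NodeDiskAlmostFull
        expr: |
          node_filesystem_avail_bytes{mountpoint="/"}
            / node_filesystem_size_bytes{mountpoint="/"} < 0.15
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} root filesystem below 15% free"
```

Alerting above the kubelet's eviction threshold gives operators time to clean up before evictions begin.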

Common Causes of Node Disk Pressure in Kubernetes Nodes

Disk pressure typically results from uncontrolled growth of container artifacts, logs, or ephemeral storage.

| Cause | Description | Typical Example |
| --------------------------- | ------------------------------------------- | ------------------------------- |
| Excessive Container Logs | Log files accumulate without rotation | Applications writing large logs |
| Large Container Images | Image layers consume disk space | ML or analytics images |
| Unused Images | Old images remain cached | CI/CD pipelines |
| Ephemeral Storage Usage | Pods store temporary data | Spark jobs, ETL workloads |
| Persistent Volume Misuse | Incorrect volume configuration | Data stored on node filesystem |
| Crash Loops Generating Logs | Repeated restarts fill disk quickly | Faulty microservices |
| High Container Churn | Frequent deployments increase image storage | Dev/test environments |

Understanding these root causes helps teams design effective remediation strategies.

How Node Disk Pressure Triggers Pod Evictions and Scheduling Failures

When disk pressure occurs, Kubernetes activates the eviction manager. The eviction manager attempts to reclaim disk space by removing pods based on QoS class and resource usage.

Pod Eviction Priority

Pods are evicted in roughly the following order:

  1. BestEffort pods
  2. Burstable pods using more than their requests
  3. Guaranteed pods (last)

Strictly speaking, for disk pressure the kubelet ranks pods by whether their disk usage exceeds their requests, then by pod priority, then by usage relative to requests. In practice this means pods consuming the most ephemeral storage beyond what they requested are evicted first.

Scheduler Behavior

When DiskPressure=True:

  • The node controller applies the node.kubernetes.io/disk-pressure:NoSchedule taint, so the scheduler avoids the node
  • New pods are placed on other nodes
  • If no other nodes have capacity, pods remain Pending

Example event message:

Warning  Evicted
The node was low on resource: ephemeral-storage
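To see which pods have already been evicted, you can filter kubectl output. A sketch; list_evicted is a hypothetical helper, with the parsing kept in a function so it can be exercised without a cluster:

```shell
# Filter "kubectl get pods -A" output down to evicted pods.
list_evicted() {
  # stdin: "kubectl get pods -A" output
  # stdout: NAMESPACE/NAME of every pod whose STATUS column is "Evicted"
  awk 'NR > 1 && $4 == "Evicted" { print $1 "/" $2 }'
}

# Against a live cluster (requires kubectl access):
#   kubectl get pods -A | list_evicted
```

Checking the events of one of these pods (kubectl describe pod) then shows the ephemeral-storage eviction message above.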

How to Diagnose Node Disk Pressure in Kubernetes Clusters

Diagnosing node disk pressure means correlating two views: the node conditions Kubernetes reports, and what is actually consuming storage on the affected nodes. Disk pressure occurs when a node exhausts its available disk space or ephemeral storage, so the job is to identify which nodes are affected, how their storage is being used, and which workloads or leftover artifacts are consuming the space.

In practice, this follows a logical order: start by checking what Kubernetes reports at the node-condition level, then inspect the node's filesystem directly to see the real usage numbers, and finally narrow the search to the specific culprits, whether that is bloated container images, runaway log files, or volumes that have grown far beyond expectations.

Checking Node Disk Pressure Status with Kubectl

The most direct way to detect node disk pressure is by checking node conditions using kubectl. Kubernetes nodes report conditions like DiskPressure, MemoryPressure, and PIDPressure, which are updated by the kubelet. When disk usage exceeds configured thresholds, the kubelet sets DiskPressure=True, indicating that the node is running low on available disk space.

Use the following command:

kubectl get nodes

Output example:

NAME        STATUS
node-1      Ready
node-2      Ready,SchedulingDisabled

Note that kubectl get nodes does not display the DiskPressure condition directly; use kubectl describe node or a JSONPath query to see it.

Detailed inspection:

kubectl describe node <node-name>

Look for:

DiskPressure=True

You can also inspect node conditions via JSON:

kubectl get node <node-name> -o jsonpath='{.status.conditions}'
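To scan every node at once, the DiskPressure condition can be extracted into a column and filtered. A sketch; disk_pressure_nodes is a hypothetical helper, and the custom-columns JSONPath filter syntax may need adjusting for your kubectl version:

```shell
# Print the names of nodes whose DiskPressure condition is True.
disk_pressure_nodes() {
  # stdin: two-column output (NODE, DISK) from the query below
  # stdout: node names reporting DiskPressure=True
  awk 'NR > 1 && $2 == "True" { print $1 }'
}

# Against a live cluster:
#   kubectl get nodes \
#     -o custom-columns='NODE:.metadata.name,DISK:.status.conditions[?(@.type=="DiskPressure")].status' \
#     | disk_pressure_nodes
```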

Inspecting Disk Usage on Affected Nodes

After identifying a node under disk pressure, the next step is to inspect the node’s disk usage to determine where space is being consumed. Kubernetes workloads store container images, logs, and ephemeral data on the node filesystem, so reviewing filesystem utilization helps identify which directories or partitions are nearing capacity and contributing to the reduced free disk space.

SSH into the affected node and check filesystem usage.

df -h

Example output:

Filesystem      Size  Used Avail Use%
/dev/xvda1       80G   78G   2G   97%

Check inode usage:

df -i

High inode usage can also trigger disk pressure.
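A quick way to locate what is filling the disk is to rank subdirectories by size. A sketch using standard du and sort; top_dirs is a hypothetical helper:

```shell
# Rank the immediate subdirectories of a path by size (KiB, largest first).
# -x keeps du on one filesystem, so mounted volumes are not double-counted.
top_dirs() {
  # $1: directory to inspect; $2 (optional): number of entries to show
  du -xsk "$1"/*/ 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Typical node-level usage:
#   top_dirs /var
#   top_dirs /var/lib/containerd 5
```

Running this against /var usually points straight at container images, pod logs, or kubelet ephemeral data as the dominant consumer.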

Identifying Images, Logs, and Volumes Causing Node Disk Pressure

The next step is identifying the workloads or artifacts causing the disk pressure. Common contributors include accumulated container images, large or unrotated log files, and excessive ephemeral storage used by pods. Workloads using emptyDir volumes or generating large temporary files can also consume significant disk space, making it important to pinpoint the exact source of the storage usage.

Identify Large Container Images

crictl images

or

docker images

Check Container Logs

/var/log/containers

or

/var/log/pods

Check kubelet Directories

/var/lib/kubelet

Inspect Ephemeral Storage Usage

kubectl describe pod <pod-name>

Look for:

Ephemeral-storage usage

How to Fix Node Disk Pressure in Kubernetes Environments

Fixing node disk pressure in Kubernetes environments involves freeing up disk space on affected nodes and addressing the sources of high disk usage. This can include removing unused container images, cleaning up log files, deleting temporary data from ephemeral storage, or expanding the node’s disk capacity to restore normal cluster operations.

Immediate Remediation Steps

  1. Remove unused container images:
crictl rmi --prune
  2. Clean container logs:
sudo truncate -s 0 /var/log/containers/*.log
  3. Delete unused pods:
kubectl delete pod <pod-name>
  4. Restart the kubelet if required:
systemctl restart kubelet
  5. Expand the node disk volume if the infrastructure allows.
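The log-cleaning step can be wrapped in a small helper. A sketch assuming GNU coreutils on the node; truncate_logs is a hypothetical name, and note that the files under /var/log/containers are typically symlinks into /var/log/pods, so you may need to point it at the latter:

```shell
# Truncate *.log files in a directory to reclaim space without deleting them.
# Truncating keeps the file handles held by the container runtime valid,
# so logging continues without restarting anything.
truncate_logs() {
  # $1: directory containing log files (e.g. a /var/log/pods/<pod>/<container> dir)
  find "$1" -maxdepth 1 -name '*.log' -type f -exec truncate -s 0 {} +
}

# On a node (normally requires root):
#   truncate_logs /var/log/pods/<namespace>_<pod>_<uid>/<container>
```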

Cluster-Level Solutions

  • Increase node storage
  • Use centralized log aggregation
  • Reduce image size
  • Implement automatic image garbage collection

How to Prevent Node Disk Pressure with Kubelet Configuration and Resource Limits

Preventing disk pressure requires proper kubelet configuration and resource limits.

Example kubelet configuration:

evictionHard:
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"

This instructs kubelet to start evicting pods before disk space becomes critically low.
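A fuller sketch of a KubeletConfiguration file combining hard thresholds with soft thresholds and grace periods (all values illustrative):

```yaml
# Soft thresholds trip first and give pods a grace period to exit cleanly;
# hard thresholds evict immediately.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
evictionSoft:
  nodefs.available: "15%"
evictionSoftGracePeriod:
  nodefs.available: "2m"
evictionMaxPodGracePeriod: 60
```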

Configure Ephemeral Storage Limits

Example pod configuration:

resources:
  limits:
    ephemeral-storage: "2Gi"
  requests:
    ephemeral-storage: "1Gi"

This prevents individual pods from consuming excessive disk space.
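For context, a complete (hypothetical) pod manifest showing where these requests and limits sit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-bounded-app   # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      resources:
        requests:
          ephemeral-storage: "1Gi"
        limits:
          ephemeral-storage: "2Gi"
```

A pod whose containers exceed their ephemeral-storage limit is evicted by the kubelet even when the node as a whole is not under disk pressure.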

Configure Log Rotation

--container-log-max-size=10Mi
--container-log-max-files=5

These settings prevent uncontrolled log file growth.
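The same limits can be expressed in a KubeletConfiguration file, which is generally preferred over command-line flags:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 10Mi   # rotate each container log at 10 MiB
containerLogMaxFiles: 5     # keep at most 5 rotated files per container
```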

Best Practices for Avoiding Node Disk Pressure in Production Clusters

Maintaining stable storage availability on Kubernetes nodes requires proactive operational practices and well-defined resource management policies. Applying consistent storage governance across workloads helps ensure nodes remain healthy and prevents disruptions caused by disk resource exhaustion.

| Best Practice | Description | Benefit |
| ------------------------------- | ------------------------------ | -------------------------- |
| Enable Log Rotation | Prevent log file growth | Reduces disk consumption |
| Monitor Ephemeral Storage | Track pod storage usage | Early detection |
| Use Smaller Container Images | Optimize Docker builds | Reduces image cache |
| Enable Image Garbage Collection | Remove unused images | Frees disk space |
| Centralize Logs | Use log aggregation systems | Avoid local disk buildup |
| Monitor Disk Usage | Use observability platforms | Detect disk pressure early |
| Enforce Storage Limits | Apply ephemeral storage quotas | Prevent runaway pods |

Following these practices significantly reduces disk pressure incidents.

Real-Time Visibility into Node Disk Pressure with groundcover

Modern Kubernetes observability platforms help teams detect and troubleshoot node disk pressure before it impacts application performance or cluster stability. groundcover provides eBPF-based Kubernetes observability, combining kernel-level telemetry with Kubernetes metrics to give teams real-time visibility into node resources, container behavior, and storage usage across the cluster.

  • Real-Time Disk Usage Monitoring Across Nodes and Containers: Platform teams can monitor node resource utilization and storage trends to identify disk pressure risks early using groundcover’s Kubernetes observability platform.
  • Automatic Detection of Resource Anomalies Affecting Cluster Health: groundcover provides real-time insights and alerts that help teams detect abnormal storage usage patterns and infrastructure issues before they escalate.
  • Deep Container Insights without Requiring Intrusive Instrumentation: Using eBPF-based telemetry, groundcover collects infrastructure and application signals directly from the kernel without requiring code instrumentation or sidecars.
  • Faster Troubleshooting of Storage Bottlenecks: By correlating Kubernetes events, container metrics, and infrastructure telemetry, teams can quickly identify workloads responsible for abnormal disk consumption.

Because groundcover uses eBPF-based telemetry, it captures low-level system activity while maintaining minimal overhead on production clusters. This enables DevOps and platform teams to quickly determine:

  • which pods are consuming excessive disk space
  • which nodes are approaching disk pressure thresholds
  • which workloads generate large volumes of logs or ephemeral storage

With this level of real-time visibility, teams can detect and resolve disk pressure issues earlier, improving the reliability and stability of Kubernetes workloads.

Conclusion

Node disk pressure is one of the most common operational issues affecting Kubernetes clusters. When nodes run low on disk space, Kubernetes triggers eviction policies and restricts scheduling to protect cluster stability. Understanding the causes, detection signals, and remediation strategies is essential for maintaining healthy Kubernetes environments.

By implementing proper kubelet configurations, ephemeral storage limits, log rotation policies, and observability tools, teams can proactively manage disk resources and prevent node disk pressure from disrupting workloads. Modern observability platforms such as groundcover provide the real-time insights required to detect disk pressure early and maintain reliable Kubernetes operations.

FAQs

How should kubelet eviction thresholds be tuned?

Eviction thresholds should be tuned to create enough reclaim headroom for your image pull and log growth patterns, not merely set to generic defaults.

  • Measure worst-case image pull size, peak log burst rate, and average reclaim time per node pool, then set evictionHard and evictionSoft so kubelet reacts before storage becomes operationally unrecoverable.
  • Use different thresholds for heterogeneous pools, because GPU, CI, and data-processing nodes usually need more headroom than stateless web-serving nodes.
  • Monitor inode pressure separately from capacity pressure, since small-file explosions from logs or temp artifacts can trip evictions even when df -h looks acceptable.
  • Pair threshold tuning with image garbage collection policy and log rotation; otherwise kubelet will evict pods repeatedly without removing the underlying cause.

Learn more about Kubernetes alerting

Does adding larger disks solve node disk pressure?

Bigger disks reduce the time to failure, but they do not fix unbounded storage behaviors in workloads, runtimes, or logging pipelines.

  • Check whether the issue is structural: oversized images, chatty applications, emptyDir misuse, long image retention, or frequent rollout churn will eventually refill any larger volume.
  • Review whether storage is shared across nodefs and imagefs, because mixed-use layouts let image pulls and pod writes compete on the same underlying device.
  • Enforce ownership by namespace or workload class with ephemeral-storage limits, so the cluster has policy guardrails instead of relying on node size as the control plane.
  • Use trend analysis after remediation; if free space recovers briefly and then resumes the same slope, the real problem is workload behavior, not capacity.

Learn more about Kubernetes cost optimization

How does groundcover help when disk pressure is driven by log growth?

groundcover adds value when disk pressure comes from log growth by helping teams connect storage exhaustion to the specific services, deployments, and behavioral changes generating the log flood.

  • Correlate spikes in log volume with pod restarts, new releases, and node events so you can tell whether the root cause is bad application behavior or normal workload expansion.
  • Use centralized visibility to keep logs off the node as the long-term system of record, reducing dependence on local disk for retention during incidents.
  • Watch for services whose error loops produce both operational noise and infrastructure risk, because the same pattern often drives alert fatigue, disk pressure, and troubleshooting delays.
  • Treat log-heavy workloads as cost and reliability risks together; the winning fix is usually better log discipline, sampling, and retention strategy rather than just larger nodes.

Learn more about log aggregation
