Kubernetes Cluster Autoscaler: Challenges & Best Practices
Often, the root cause of performance problems in Kubernetes is simple: there aren’t enough nodes to support all the workloads. The solution is just as simple: add more nodes to the cluster. The problem is that determining when to add nodes, then provisioning new nodes and joining them to the cluster, is time-consuming and tedious when admins perform these tasks manually.
But thanks to the Cluster Autoscaler feature, you don’t have to scale manually. You can instead let Kubernetes automatically expand the size of your cluster, a faster and smoother way to address performance issues.
Read on for details as we explain how cluster autoscaling works in Kubernetes, how to enable it, and which best practices to follow to maximize the value of the Cluster Autoscaler.
What is a Kubernetes Cluster Autoscaler?
In Kubernetes, a Cluster Autoscaler is a tool that can automatically add or remove nodes from a cluster (nodes are the servers that host workloads within a Kubernetes cluster). It achieves this by monitoring the load placed on existing nodes, determining when it is necessary to scale the node count up or down, and then implementing the required changes.
Cluster Autoscaler vs. HPA and VPA
The Cluster Autoscaler shouldn’t be confused with the Horizontal Pod Autoscaler, or HPA. The latter is a feature that enables Kubernetes to automatically add or remove Pod replicas based on the resource utilization of individual Pods. The HPA attempts to ensure that there are enough replicas available to handle the requests a Pod is receiving, while avoiding extraneous replicas that waste resources.
In contrast, the Cluster Autoscaler adds or removes nodes from the cluster to ensure that the cluster as a whole (not individual Pods) has adequate resources available.
The Cluster Autoscaler is also distinct from the Vertical Pod Autoscaler, or VPA. This is an add-on component that you can install in Kubernetes to adjust requests and limits for containers automatically based on actual resource utilization. However, unlike the Cluster Autoscaler, the VPA only deals with scaling of individual workloads.
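To make the distinction concrete, here is a minimal sketch of an HPA manifest (the Deployment name and thresholds are illustrative). It scales the Pod replicas of a single workload based on CPU utilization; it never adds or removes nodes, which remains the Cluster Autoscaler’s job:

```yaml
# Minimal HorizontalPodAutoscaler sketch (autoscaling/v2).
# Scales the replicas of one Deployment based on average CPU utilization;
# it does not add or remove nodes.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa             # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web               # illustrative Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```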
How the Cluster Autoscaler works
The Cluster Autoscaler in Kubernetes works based on the following process:
- Monitoring of pending Pods: The Autoscaler watches for Pods that are in the pending state. This means that the Pods should be running, but they haven’t been scheduled (i.e., placed on a node) yet because no node is available to host them. Typically, this happens because there aren’t enough spare CPU or memory resources available on existing nodes to support the additional Pods.
- Adding nodes to the cluster: After simulating the scheduling of the pending Pods to confirm that adding nodes would allow them to be scheduled successfully, the Autoscaler adds one or more nodes to the cluster. The new nodes are typically part of a node group, meaning a pool of nodes that have an identical configuration. This approach allows the Autoscaler to add nodes that are similar (in terms of CPU and memory allocation) to existing nodes.
- Monitoring of node resource utilization: In addition to tracking pending Pods, the Cluster Autoscaler monitors the resource utilization of existing nodes. If it detects nodes whose CPU or memory utilization is below an admin-defined threshold (such as 50 percent), it will drain and remove those nodes, causing the Pods they hosted to be rescheduled onto other nodes. Running fewer nodes saves resources (and, by extension, money spent on hosting) by avoiding the operation of more nodes than necessary.
To enable the process of actually adding or removing nodes from a cluster, the Cluster Autoscaler integrates with an external infrastructure service, such as EC2 in AWS or Azure Virtual Machines. It requests new virtual server instances from these services to scale a cluster up, then uses the same services to shut down instances that are no longer necessary.
Autoscaling is easiest to implement when you host Kubernetes on top of a cloud platform that provides on-demand access to virtual servers. However, it’s possible to implement autoscaling in self-managed or on-prem environments as well, using tools like Karpenter (an open source, platform-agnostic tool that can request nodes from a variety of infrastructure providers) or by integrating with local virtual infrastructure managed via platforms like VMware.
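Regardless of the environment, a quick way to see the pending Pods that drive scale-up decisions (and why they can’t be scheduled) is with standard kubectl commands; the Pod and namespace names below are placeholders:

```bash
# List Pods that are stuck in the Pending state across all namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Inspect one pending Pod; look for FailedScheduling events such as
# "Insufficient cpu" or "Insufficient memory"
kubectl describe pod <pending-pod-name> -n <namespace>
```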
Benefits of using a Cluster Autoscaler
The Cluster Autoscaler provides two key benefits:
- It reduces the risk that your workloads will fail or experience performance degradations due to a lack of sufficient CPU or memory resources in your cluster.
- It helps to save money by shutting down under-utilized nodes.
You could, of course, achieve both of these goals manually by tracking node resource usage, then adding or removing nodes from your cluster as needed. But by automating the process, the Cluster Autoscaler makes it easy to obtain a healthy node count with minimal effort on the part of admins.
Cluster Autoscaler use cases and examples
The Cluster Autoscaler can be helpful in a variety of contexts, but it’s especially useful for use cases like the following:
- Workloads with highly fluctuating demand: Applications that experience significant changes in demand (such as a website that receives more traffic during certain days of the week than others) can benefit from autoscaling as a way to help ensure that there are adequate nodes to support upticks in requests.
- One-off workload deployments: When deploying a workload that will run temporarily (like a training process for an AI model) and consume significant resources during that time, autoscaling helps ensure that sufficient node resources are available. It also spins down the nodes when the workload finishes.
- Cost optimization: Autoscaling plays an important role in reducing waste and optimizing costs, especially for organizations that deploy Kubernetes clusters on Infrastructure-as-a-Service (IaaS) platforms that bill for the total time that servers are operational, regardless of how much load is actually placed on the servers.
Key configuration parameters for Cluster Autoscaler
To control Cluster Autoscaler behavior, admins can configure a variety of parameters:
- Scan-interval: Controls how frequently the Autoscaler checks the status of pending Pods and assesses whether to add or remove nodes.
- Min-nodes and max-nodes: Set the minimum and maximum number of nodes allowed in a node group.
- Max-graceful-termination-sec: The time in seconds that the Autoscaler waits to allow Pods to shut down gracefully before it forcefully terminates them as part of a downscaling process.
- Cores-total-max: The total number of CPU cores allowed in the cluster. The Autoscaler will avoid adding nodes if doing so would push the total core count past this limit. This can help avoid billing surprises caused by the Autoscaler adding more nodes than an admin expects.
- Memory-total-max: The total memory allowed in the cluster.
- Max-nodes-total: The total nodes allowed in the cluster.
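On a self-managed installation, these parameters map to command-line flags on the cluster-autoscaler container. The excerpt below is purely illustrative (image tag, node group name, and limit values are made up); flag names follow the open source Cluster Autoscaler, so check the documentation for your version:

```yaml
# Illustrative cluster-autoscaler container args on a self-managed cluster
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0  # example tag
    command:
      - ./cluster-autoscaler
      - --scan-interval=30s                  # how often to evaluate scaling
      - --max-graceful-termination-sec=600   # wait for Pods to exit before force-killing
      - --max-nodes-total=50                 # hard cap on total nodes in the cluster
      - --cores-total=0:200                  # min:max CPU cores across the cluster
      - --memory-total=0:800                 # min:max memory (in gigabytes) across the cluster
      - --nodes=1:10:my-node-group           # min:max:name for one node group
```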
The process for modifying Cluster Autoscaler configuration parameters varies depending on which Kubernetes distribution or service you use. Usually, you have to use a CLI command to define a configuration value.
For example, if you use Azure Kubernetes Service to run your cluster, you can use the following command to modify the scan-interval parameter:
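A sketch of that command, with placeholder resource group and cluster names:

```bash
# Update scan-interval in the AKS cluster autoscaler profile
# (myResourceGroup and myAKSCluster are placeholder names)
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --cluster-autoscaler-profile scan-interval=30s
```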
Common challenges in managing Cluster Autoscaler
While the Autoscaler in Kubernetes is a helpful feature, it can also present some challenges:
- Provisioning delays: It often takes several minutes to provision a new node and add it to a cluster via autoscaling. Thus, autoscaling doesn’t instantly resolve situations where workloads are under-performing or have failed due to a lack of sufficient nodes.
- Dependence on resource requests: To determine how many resources a new Pod will require, the Autoscaler looks at the Pod’s declared resource requests rather than its actual consumption. If the requests don’t reflect what the Pod really needs, the nodes that the Autoscaler adds may not be sufficient for the Pod to schedule and operate successfully.
- Variation in behavior across Kubernetes distributions: The Cluster Autoscaler depends on a backend infrastructure service to provide nodes, and backend services vary in terms of how they provision nodes and which server instances they offer. As a result, Cluster Autoscaler behavior can vary depending on which Kubernetes distribution and/or infrastructure platform you use.
- Adding nodes doesn’t always solve Pod pending problems: While lack of available nodes can be one reason why Pods are stuck in the pending state, there are other potential causes (such as issues with a Kubernetes Secret configuration or node affinity rules). Autoscaling won’t solve these problems – and hence, it won’t always magically fix Pod scheduling issues.
- Not a substitute for effective resource management: Adding nodes can mitigate performance issues that stem from a lack of sufficient resources. But in some cases, it’s a band-aid approach that doesn’t address underlying problems (like a container with a memory leak). In this sense, the Cluster Autoscaler isn’t always the ideal way to solve performance problems; a better approach is to identify and address the root cause of performance problems and to manage resource utilization effectively.
Best practices for Cluster Autoscaler optimization
To get the most value out of the Kubernetes Cluster Autoscaler while minimizing the risk of unexpected outcomes, consider the following best practices:
- Set accurate requests and limits: Since the Autoscaler uses requests and limits to make decisions about scaling, strive to set requests and limits that accurately reflect the actual resource needs of your Pods (see the example after this list).
- Configure multiple node groups: Making multiple node groups available gives the Autoscaler more options to choose from when deciding which types of nodes to add. Ideally, you’ll create different node groups with varying levels of resource allocations to support diverse workload needs.
- Monitor resource utilization: To prevent scenarios where autoscaling doesn’t resolve performance issues, or where the Autoscaler adds more nodes than necessary, monitor resource utilization continuously across all layers of your cluster – nodes, Pods, and containers.
- Set CPU core, memory, and node maximums: As noted above, you can configure maximum core counts, memory, and node counts as a way of preventing the Autoscaler from scaling beyond a certain limit. Doing so is a good idea for keeping total resource utilization in check. You can always increase the limits if you deem, based on monitoring, that you need more nodes.
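As an example of the first best practice above, a container spec with explicit requests and limits might look like the following; the values are illustrative and should reflect observed usage for your workload:

```yaml
# Illustrative container resources block; base the numbers on measured usage
resources:
  requests:
    cpu: "250m"       # what the scheduler (and Cluster Autoscaler) plan around
    memory: "256Mi"
  limits:
    cpu: "500m"       # hard ceiling before CPU throttling
    memory: "512Mi"   # hard ceiling before the container is OOM-killed
```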
Monitoring and troubleshooting Cluster Autoscaler performance
The Cluster Autoscaler exposes only limited data about its operations, but a quick way to check its status is to view the status ConfigMap it maintains (named cluster-autoscaler-status and stored in the kube-system namespace by default), using a command like the following:
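```bash
# Show the Cluster Autoscaler's status ConfigMap, including recent events
# (the namespace may differ depending on how the Autoscaler was installed)
kubectl describe configmap cluster-autoscaler-status -n kube-system
```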
You can also monitor Kubernetes events emitted by the Autoscaler’s components, which typically run in the kube-system namespace (some installations use a dedicated cluster-autoscaler namespace). These events provide insight into which scaling decisions the Autoscaler makes.
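A simple way to surface those events, assuming the Autoscaler runs in kube-system:

```bash
# Filter recent events for entries produced by the cluster-autoscaler
kubectl get events -n kube-system | grep -i cluster-autoscaler
```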
It’s important, too, to monitor the resource utilization of nodes, Pods, and containers in your cluster. This data provides essential context for determining whether the Cluster Autoscaler is appropriately managing total cluster node count and resource allocations.
How groundcover enhances Cluster Autoscaler visibility
As an observability solution that provides end-to-end visibility into all parts and layers of your cluster, groundcover makes it easy to monitor the Cluster Autoscaler Pods, as well as the nodes that the Autoscaler is managing.
This means that when you use groundcover, you’re clued in early to issues like nodes that are starved of available resources or Pods that are stuck in the pending state – which could be signs that autoscaling is not working correctly due to problems like a failure within the Cluster Autoscaler Pods, or configuration rules that prevent scaling operations.
Node scaling on autopilot
You can manage Kubernetes node count and configuration the hard, manual way. But life is easier for Kubernetes admins – especially those responsible for large-scale clusters where node resource consumption constantly changes – when they outsource this work to the Cluster Autoscaler. As long as you configure the Autoscaler properly, and monitor it to ensure it works as expected, this feature can save a lot of effort and money while reducing risk.
FAQ
How does the Kubernetes Cluster Autoscaler differ from the Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler, or HPA, adds or removes Pod replicas. This helps ensure that there are enough copies of a given Pod to allow it to handle all of its requests. In contrast, the Kubernetes Cluster Autoscaler manages the total number of nodes in a cluster, helping to avoid scenarios where there are not enough nodes to support all Pods. It also saves money by scaling node count down when extraneous nodes exist. The HPA can help improve the performance of individual Pods, but it doesn’t solve the problem of having too many or too few nodes in the overall cluster.
What metrics should I monitor to ensure effective Cluster Autoscaler performance?
The best way to monitor the effectiveness of the Cluster Autoscaler is to track the actual resource utilization of nodes, as well as to monitor the total number of pending Pods in your cluster. If the Autoscaler is doing its job, you should never see nodes whose total resource utilization approaches 100 percent for any length of time, and Pods should not be stuck in the pending state for more than a couple of minutes.
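For a quick spot check of node utilization, a command like the following works if the metrics-server is installed in your cluster:

```bash
# Show current CPU and memory usage per node (requires metrics-server)
kubectl top nodes
```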
How can groundcover help detect and resolve scaling inefficiencies in Kubernetes clusters?
groundcover helps with Kubernetes cluster scaling by continuously tracking the actual resource consumption of all parts of your cluster. With this insight, admins can determine whether a cluster needs more or fewer total nodes. Although cluster autoscaling should automatically adjust node count, it doesn’t always work as intended due to issues like delays in spinning up new nodes or configuration parameters that prevent the Autoscaler from reacting quickly to resource utilization changes. That’s why it’s important to maintain proper observability into your cluster, even when you use autoscaling.