Kubernetes nodes – the servers that provide the core infrastructure for hosting Kubernetes clusters – tend to get the short end of the stick when it comes to discussions of Kubernetes monitoring and performance optimization.
After all, it has often been said that one of the great benefits of Kubernetes (and cloud computing in general) is that it lets you treat your servers like cattle, not pets. The implication is that we should avoid becoming overly attached to servers or investing too much time or energy on them because in a cloud native environment, servers are a commodity that you can replace at will.
Now, we're all for avoiding node attachment syndrome. Your nodes are not your furbabies, and you should not treat them as such (and yes, we also think the cattle vs. pets analogy is a bit crude, but we'll run with it because it's a widespread idea in the realm of cloud computing). One of the benefits of Kubernetes is indeed the ability to add nodes to clusters and remove them at will, and to have individual servers fail without necessarily harming workload performance.
That said, problems with nodes can and often do have a deleterious effect on Kubernetes performance. Nodes provide the crucial memory and CPU resources that power your workloads, which means that underperforming nodes can quickly translate to underperforming clusters. On top of that, nodes play a central role in shaping the security of K8s-based workloads, too.
That's why managing nodes in an optimal way is an essential step toward optimizing overall Kubernetes performance and security. This article provides guidance by taking a deep dive into Kubernetes node management. We'll cover how nodes work, what runs on your Kubernetes nodes and best practices for node management.
What are Kubernetes nodes?
In Kubernetes, nodes are the servers that you use to build a Kubernetes cluster. There are two types of nodes:
• Control plane nodes, which host the Kubernetes control plane software.
• Worker nodes, which host applications that you deploy on Kubernetes.
Nodes can be either physical or virtual machines, and they can run any Linux-based operating system. (You can also use Windows machines as worker nodes – in fact, you need to if you’re going to host Windows containers in Kubernetes – although Windows doesn't support control plane nodes.)
This flexibility is part of what makes nodes in Kubernetes so powerful. You can spin up virtually any type of server to function as a node. The server's underlying configuration doesn't really matter; all that matters is that your node can join the cluster and is capable of hosting the core node components, like Kubelet, which we'll discuss later in this article.
Why are nodes important in a Kubernetes cluster?
The importance of nodes to Kubernetes is straightforward: Your cluster can't exist without nodes, because nodes are literally the ingredients out of which your cluster is constructed. The stronger and bigger your nodes are, the more powerful your cluster will be, in terms of total CPU, memory and other resources.
Going further, it's worth noting as well that part of the power of Kubernetes derives from the fact that Kubernetes can deploy workloads across multiple nodes. That makes it possible to run applications in a distributed, scale-out environment where the failure of a single server is typically not a big deal. Without nodes, the whole concept of distributed, cloud native computing in Kubernetes wouldn't work.
For clarity's sake, we should mention that it's possible to have a Kubernetes cluster that consists of just a single node (which would function as both a control plane node and a worker node in that case). But since this setup would deprive you of the important benefits of being able to deploy workloads in a distributed environment, it's basically unheard of in production to run a single-node cluster. Single-node clusters can come in handy for testing purposes, but usually not for deploying applications in the real world.
Kubernetes node components
Each worker node in Kubernetes is responsible for running several components. Let's look at them one by one.
Kubelet: The node agent
Kubelet is the agent that allows worker nodes to talk to control plane nodes. In other words, Kubelet is the software that runs locally on each node and serves as an envoy between the node and the rest of the cluster.
Kubelet is responsible for executing whichever workloads the control plane tells it to. It also tracks workload operations and reports on their status back to the control plane.
Container runtime
In order to run workloads – which are packaged as containers in most cases, unless you're doing something less orthodox like using KubeVirt to run VMs on top of Kubernetes – you need a container runtime. A container runtime is the software that actually executes containers.
Examples of popular container runtimes include containerd and Docker Engine (although Kubernetes removed its built-in support for the latter, known as dockershim, in release 1.24, so Docker Engine now needs the separate cri-dockerd adapter). The runtime you choose doesn't really matter in most cases as long as it's compatible with your containers, which it probably is, because all mainstream runtimes comply with the Container Runtime Interface (CRI), which Kubernetes has required since that release.
So, while there's a lot to say about the differences between the various runtimes, we're not going to go down that rabbit hole in this article. We'll just say that you should choose a CRI-compliant runtime and move on with your life.
Kube-proxy: Network management
Kube-proxy maintains network proxy rules for your nodes. These rules allow Kubernetes to redirect traffic flowing to and from Kubernetes services that operate in the cluster.
It's possible in certain environments to use Kubernetes without kube-proxy, which can help optimize performance in some cases. But unless you're worried about eliminating every single unnecessary CPU cycle, you should just stick with kube-proxy, which is the simplest and most time-tested way of managing network proxies.
Node management 101: Understanding node status and conditions
Now that you know what Kubernetes nodes do and why they matter, let's talk about how to manage them.
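The first thing to know about managing nodes is that you can check their status using the kubectl describe node command, for example kubectl describe node <node-name>, where <node-name> stands for one of the names listed by kubectl get nodes.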
The output includes basic information about the state of your node, including:
• Addresses: Includes data about the node's network status, including IP addresses and hostname.
• Conditions: Describes the node's current state with optional additional information (which we'll cover in a bit more detail later).
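• Capacity and Allocatable: Capacity lists the node's total resources (CPU, memory and the maximum number of Pods), while Allocatable lists the portion of those resources that is actually available to Pods once system reservations are subtracted.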
• Additional information: You'll also usually see the node name, node operating system information and data about the node's kubelet instance.
For full details on node status reporting, check out the NodeStatus part of the Kubernetes documentation.
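To make these fields concrete, here is a minimal Python sketch that pulls them out of a node object of the kind returned by kubectl get node <node-name> -o json. The sample document is hypothetical and heavily trimmed (the name, addresses, versions and quantities are made up for illustration); only the field names follow the Node API.

```python
import json

# Hypothetical, heavily trimmed node object; a real one has many more fields.
SAMPLE_NODE_JSON = """
{
  "metadata": {"name": "worker-1"},
  "status": {
    "addresses": [
      {"type": "InternalIP", "address": "10.0.0.12"},
      {"type": "Hostname", "address": "worker-1"}
    ],
    "capacity": {"cpu": "4", "memory": "16374216Ki", "pods": "110"},
    "allocatable": {"cpu": "3920m", "memory": "15325640Ki", "pods": "110"},
    "nodeInfo": {"kubeletVersion": "v1.29.0", "osImage": "Ubuntu 22.04.3 LTS"}
  }
}
"""


def summarize_node(node: dict) -> dict:
    """Collect the status fields discussed above into a flat summary."""
    status = node.get("status", {})
    return {
        "name": node.get("metadata", {}).get("name"),
        # Map each address type (InternalIP, Hostname, ...) to its value.
        "addresses": {a["type"]: a["address"] for a in status.get("addresses", [])},
        "capacity": status.get("capacity", {}),
        "allocatable": status.get("allocatable", {}),
        "kubelet_version": status.get("nodeInfo", {}).get("kubeletVersion"),
    }


summary = summarize_node(json.loads(SAMPLE_NODE_JSON))
print(summary["addresses"]["InternalIP"])
print(summary["capacity"]["cpu"], "->", summary["allocatable"]["cpu"])
```

In practice you would feed the function the parsed output of kubectl rather than a hard-coded string.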
Common node conditions
The information that you'll find in the node Conditions field describes the various states that your node might be in. Each condition is reported with a status of True, False or Unknown. The condition types include:
• Ready: The node is healthy, with no problems detected.
• DiskPressure: The node lacks sufficient storage resources.
• MemoryPressure: The node lacks sufficient memory resources.
• PIDPressure: There are too many processes on the node.
• NetworkUnavailable: The node is experiencing network issues.
Ideally, your nodes will always report Ready as True and every other condition as False. A different result doesn't necessarily mean the node has failed or is about to fail, but it does indicate some type of problem that will eventually lead to failure if you don't manage it.
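As a rough illustration of how you might check these conditions programmatically, the following Python sketch scans a list of condition entries (shaped like the status.conditions array of a node object) and reports anything that needs attention. The sample data is hypothetical.

```python
# Condition types that signal a problem when their status is "True".
PROBLEM_WHEN_TRUE = {
    "DiskPressure",
    "MemoryPressure",
    "PIDPressure",
    "NetworkUnavailable",
}


def find_problems(conditions: list) -> list:
    """Return a description of every condition that indicates trouble."""
    problems = []
    for cond in conditions:
        ctype, status = cond["type"], cond["status"]
        if ctype == "Ready" and status != "True":
            # Ready can be "False" or "Unknown"; both deserve a look.
            problems.append(f"Ready={status}")
        elif ctype in PROBLEM_WHEN_TRUE and status == "True":
            problems.append(ctype)
    return problems


# Hypothetical conditions for a node that is running out of disk space.
sample_conditions = [
    {"type": "MemoryPressure", "status": "False"},
    {"type": "DiskPressure", "status": "True"},
    {"type": "PIDPressure", "status": "False"},
    {"type": "Ready", "status": "False"},
]

print(find_problems(sample_conditions))
```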
Troubleshooting Kubernetes node issues
If you run into performance issues on a Kubernetes node, your first step should be to check the node's status. A Conditions field showing that the node is not Ready, or that one of the pressure conditions is True, is usually your best indication of what's wrong, since it will tell you if the node is short on, say, memory or disk resources.
If the Conditions field doesn't point you in a useful direction, look for other anomalies in the node status details. For example, check the addresses to make sure the node's network settings are properly configured.
If you're still unsure what's wrong, your best bet is to look at the Kubelet logs for the node. The Kubelet logs location varies depending on the node operating system and how you configured Kubernetes. On nodes where Kubelet runs as a systemd service, you can read them with journalctl -u kubelet; otherwise, you can more often than not find them in the /var/log directory of the node.
Working with Kubernetes nodes
Here's an overview of other common tasks you might want to perform with your Kubernetes nodes.
Adding and removing nodes from your cluster
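Depending on your Kubernetes distribution and configuration, there may be multiple ways to join a node to a cluster or remove it. The most common method for clusters that were bootstrapped with kubeadm is to use kubeadm to set up the node, then join it to the cluster by running kubeadm join <control-plane-host>:<port> --token <token> --discovery-token-ca-cert-hash sha256:<hash> on the new node. (Running kubeadm token create --print-join-command on a control plane node prints the full command with the values filled in.)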
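To remove a node, first "drain" it, which marks the node as unschedulable and evicts its Pods so that Kubernetes can reschedule them on other nodes, with kubectl drain <node-name>. If the node runs Pods managed by a DaemonSet, you will typically need to add the --ignore-daemonsets flag.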
Then, remove it from the cluster with the delete command, kubectl delete node <node-name>.
If you use a managed Kubernetes solution, such as EKS or GKE, your Kubernetes provider may also offer a graphical interface and/or custom tools (like eksctl) to add and remove nodes.
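If you script node removal, the order of operations matters: drain first, delete second. Here's a small Python sketch that builds the two kubectl invocations in that order. The function name and the node name worker-1 are hypothetical, and the --ignore-daemonsets flag is one reasonable choice rather than a requirement.

```python
import shlex


def node_removal_commands(node_name: str) -> list:
    """Return the kubectl invocations for removing a node, in the order to run them."""
    return [
        # 1. Evict workloads so they get rescheduled elsewhere.
        ["kubectl", "drain", node_name, "--ignore-daemonsets"],
        # 2. Only then remove the node object from the cluster.
        ["kubectl", "delete", "node", node_name],
    ]


for argv in node_removal_commands("worker-1"):
    print(shlex.join(argv))
```

The sketch only prints the commands. To actually run them, you could pass each list to subprocess.run(argv, check=True), which stops at the first failure so that a node is never deleted after a failed drain.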
Tainting and untainting nodes
Taints are properties that let a node repel Pods. In other words, when you "taint" a node, you tell Kubernetes not to schedule Pods on it (or, depending on the taint's effect, to evict Pods that are already running there) unless those Pods explicitly tolerate the taint.
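To taint a node, use kubectl taint nodes <node-name> <key>=<value>:<effect>.
The key and value identify the taint, and the effect (NoSchedule, PreferNoSchedule or NoExecute) tells Kubernetes how to treat Pods that don't tolerate it.
To remove a taint, run the same command again with a - character appended to the taint, as in kubectl taint nodes <node-name> <key>=<value>:<effect>- (note the trailing hyphen).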
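To show how taints interact with the tolerations that Pods declare, here is a simplified Python sketch of the matching rules. It is an approximation of what the scheduler does, not its actual code, and the dedicated=gpu taint is a made-up example.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # A toleration with no effect set matches taints of every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # Exists ignores the value; with an empty key it matches every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal (the default) requires both key and value to match.
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def blocking_taints(taints: list, tolerations: list) -> list:
    """Return the taints that would keep a Pod off the node.

    PreferNoSchedule is only a soft preference, so it never blocks.
    """
    return [
        taint
        for taint in taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, taint) for tol in tolerations)
    ]


# Hypothetical taint, as set by: kubectl taint nodes <node-name> dedicated=gpu:NoSchedule
node_taints = [{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}]
gpu_toleration = {
    "key": "dedicated",
    "operator": "Equal",
    "value": "gpu",
    "effect": "NoSchedule",
}

print(blocking_taints(node_taints, []))
print(blocking_taints(node_taints, [gpu_toleration]))
```

A Pod without the toleration is blocked by the taint, while a Pod that carries it can be scheduled on the node.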
Organizing nodes with labels and selectors
Like taints, labels are key-value pairs that are properties of a node. Unlike taints, labels don't affect scheduling on their own. Instead, they allow you to organize nodes, kind of like tagging resources in a public cloud. You can also use labels in conjunction with selectors to restrict which nodes Kubernetes may use when scheduling a workload.
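Labels can be added using the kubectl label command, which should specify the node's name and the label key-value you want to use, as in kubectl label nodes <node-name> <key>=<value>.
Selectors can be defined via the "nodeSelector" field in a workload's Pod spec. A Pod with a nodeSelector is scheduled only on nodes that carry every label listed there.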
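The matching rule behind nodeSelector is simple enough to sketch in a few lines of Python: a node qualifies only if it carries every key-value pair in the selector. The label keys and node names below are hypothetical.

```python
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """Return True if the node carries every label the selector asks for."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical node labels, e.g. set with: kubectl label nodes <node-name> disktype=ssd
nodes = {
    "worker-1": {"disktype": "ssd", "zone": "a"},
    "worker-2": {"disktype": "hdd", "zone": "a"},
}
selector = {"disktype": "ssd"}

eligible = [name for name, labels in nodes.items() if matches_node_selector(labels, selector)]
print(eligible)
```

Note that an empty selector matches every node, which mirrors how a Pod without a nodeSelector can land anywhere.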
Monitoring and scaling Kubernetes nodes
Kubernetes nodes should be monitored regularly if you want to prevent problems. The earlier you detect issues on your nodes, the better positioned you are to get ahead of them before they degrade application performance.
Kubernetes monitoring tools
There's a plethora of Kubernetes monitoring tools out in the world, and we won't try to describe them all here. We will say that many will find the following solutions helpful for meeting basic Kubernetes monitoring needs:
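• The Metrics API: A native Kubernetes API, typically served by the Metrics Server add-on, that provides current CPU and memory usage for nodes and Pods. It's what powers the kubectl top command, and you can use it for a quick look at the overall health of Kubernetes nodes, although it doesn't cover network or file system usage and doesn't keep historical data.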
• Prometheus: An open source monitoring system that can be used to monitor Kubernetes nodes. Prometheus provides a wide range of metrics that can be used to monitor node performance, including CPU usage, memory usage, network traffic and disk usage.
• Grafana: An open source analytics and monitoring platform that can be used to visualize Prometheus metrics. Grafana provides a wide range of pre-built dashboards that can be used to monitor Kubernetes nodes.
All of these solutions are free, and all provide a simple means of collecting the core monitoring data you need to track the health and status of Kubernetes nodes.
Key node metrics to monitor
Which metrics matter most? If we were to make a list, it would include:
• Node CPU usage.
• Available disk space.
• Available memory.
• Network usage statistics.
If something's wrong with your node, there's a pretty good chance that anomalies in at least one of these metrics will help you pinpoint the cause quickly.
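Raw numbers are easier to act on once you express them as a share of what the node can offer. The Python sketch below converts Kubernetes-style resource quantities into plain numbers and computes usage ratios. The parsers are deliberately simplified (they handle only the suffixes shown), and the 80% alert threshold and sample values are arbitrary, hypothetical choices.

```python
# Simplified quantity parsers; they cover only the suffixes listed here.
_CPU_DIVISORS = {"n": 1_000_000_000, "u": 1_000_000, "m": 1_000}
_MEMORY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

ALERT_THRESHOLD = 0.8  # hypothetical: flag anything above 80% usage


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "4" into cores."""
    divisor = _CPU_DIVISORS.get(quantity[-1])
    if divisor:
        return int(quantity[:-1]) / divisor
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "16Gi" or a plain byte count into bytes."""
    factor = _MEMORY_FACTORS.get(quantity[-2:])
    if factor:
        return int(quantity[:-2]) * factor
    return int(quantity)


def usage_ratio(used: float, allocatable: float) -> float:
    """Share of the allocatable resource that is currently in use."""
    return used / allocatable


# Hypothetical usage and allocatable values for one node.
cpu = usage_ratio(parse_cpu("3500m"), parse_cpu("4"))
memory = usage_ratio(parse_memory("4Gi"), parse_memory("16Gi"))

for name, ratio in (("cpu", cpu), ("memory", memory)):
    flag = "ALERT" if ratio > ALERT_THRESHOLD else "ok"
    print(f"{name}: {ratio:.0%} {flag}")
```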
Node auto-scaling: Horizontal and vertical scaling strategies
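One of the cool features of Kubernetes is the Horizontal Pod Autoscaler (HPA), which automatically scales the number of Pods in a Kubernetes deployment based on CPU or memory usage. HPA doesn't add or remove nodes itself, though. To scale the nodes, you pair it with a node-level autoscaler such as the Cluster Autoscaler, which adds nodes when Pods can't be scheduled for lack of resources and removes nodes that sit underutilized. Together, they help you avoid situations where nodes run out of sufficient resources for the Pods assigned to them.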
We should note that horizontal scaling differs from vertical scaling in that it replicates workloads across more instances, instead of giving the existing instances more resources.
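To give a feel for how horizontal scaling decisions are made, here is a Python sketch of the core replica calculation described in the Kubernetes documentation for HPA: the desired replica count is the current count multiplied by the ratio of the observed metric to the target, rounded up, with no change if the ratio is within a tolerance of 1.0 (0.1 by default). This is a simplification that ignores minimum and maximum replica limits, stabilization windows and missing metrics.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    tolerance: float = 0.1,
) -> int:
    """Approximate the replica count the HPA would aim for."""
    ratio = current_value / target_value
    # Skip scaling when usage is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 replicas averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(4, 90, 60))
# 4 replicas averaging 30% CPU against a 60% target: scale in.
print(desired_replicas(4, 30, 60))
```

More Pods need somewhere to run, which is where node-level scaling comes in: if the extra replicas don't fit on the existing nodes, the cluster needs more (or bigger) nodes.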
Securing Kubernetes nodes
Just as an underperforming node can become the weakest link in your Kubernetes performance strategy, an insecure node can quickly turn into an open door for attackers to compromise your cluster.
Best practices for node security include:
• Regularly update the nodes: Keep the nodes up-to-date with the latest security patches to prevent any vulnerabilities from being exploited.
• Limit the use of privileged containers: Avoid running containers in privileged mode, which increases the risk that they can bypass security measures and gain access to the host operating system.
• Limit node access: Restrict access to the nodes to authorized users and services, rather than letting anyone log in or access node resources.
• Use secure communication channels: Use secure communication channels such as SSH to access the nodes. Avoid using unencrypted methods like telnet.
• Use container images from trusted sources: Avoid using container images from untrusted sources, since they may contain malicious code that attackers could use to plant malware on a node and build a backdoor into it for themselves.
You should also follow all standard server security best practices. Avoid unnecessary user accounts on your nodes, uninstall unnecessary software (which, in addition to wasting resources, increases the attack surface of your nodes) and consider deploying kernel-hardening frameworks like SELinux or AppArmor to add an extra layer of protection against attacks.
Implementing network policies
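Network Policies are a Kubernetes feature that restricts network traffic to and from Pods. They only take effect if your cluster's network plugin enforces them, but where it does, Network Policies are another way to help prevent unauthorized access to the workloads running on your Kubernetes nodes.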
Role-Based Access Control (RBAC)
RBAC is a Kubernetes feature that restricts access to Kubernetes resources based on user roles.
RBAC isn't just a node security feature; RBAC can help protect various other components of your cluster. But because you can use RBAC to ensure that only authorized users have access to Kubernetes nodes, it's one useful tool for protecting your infrastructure.
No node left behind!
Nodes may feel like the most boring part of your Kubernetes clusters. They are the things that sit in the background and host your workloads, and until something goes wrong, you probably don't think much about them.
But when one of your nodes does break, suffers performance degradation or experiences a security breach, your entire cluster can quickly fall apart if you don't manage the issue effectively. That's why it's essential to understand how nodes work in Kubernetes, manage their organization, monitor them continuously for problems and secure them.
After all, your containerized applications will only work as well as the nodes that host them. Your goal should be to ensure that no node is left behind in your quest to optimize Kubernetes performance and security.