Kubernetes Monitoring: Embrace metrics for the full picture
Discover everything you need to know about Kubernetes monitoring, and how to keep it painless and effective.
Those just getting started on their Kubernetes journey may encounter an inconvenient truth: they need to monitor Kubernetes, but between the limits of the available tools and the difficulty of correlating Kubernetes events with application events, this doesn't come easy.
If you find yourself in that boat, then do read on and join us for an introduction to Kubernetes monitoring – including not just how Kubernetes monitoring works and why it's important, but also how to take the pain out of Kubernetes monitoring.
What is Kubernetes monitoring, anyway?
Simply put, Kubernetes monitoring is the practice of tracking the status of all components of a Kubernetes environment. Because there are many pieces inside Kubernetes, Kubernetes monitoring actually entails monitoring many distinct things, such as:
- The kube-system workloads
- Cluster information using the Kubernetes API
- Application interactions with Kubernetes, observed by monitoring apps from the bottom up
By collecting Kubernetes data, you'll get valuable information about your cluster health that can help you troubleshoot problems such as unexpected container termination, and that you can also use for proactive decisions such as adjusting rate limits.
With that said, monitoring the Kubernetes infrastructure in isolation is not enough. You'll also want to be able to correlate the Kubernetes data with your application metrics to get a clear picture of your cluster, and to pinpoint the root cause of issues that may involve multiple components (such as a Pod that has failed to start because there isn't an available node to host it).
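As a rough illustration of that kind of correlation, the Python sketch below matches unhealthy application samples against Kubernetes events for the same workload within a short time window. The record shapes, the field names (`workload`, `reason`, `error_rate`), the 60-second window and the 5% error threshold are all assumptions made for the example, not the output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KubeEvent:
    ts: float        # seconds since an arbitrary epoch
    workload: str    # hypothetical label tying the event to a workload
    reason: str      # e.g. "FailedScheduling", as reported by the scheduler


@dataclass(frozen=True)
class AppSample:
    ts: float
    workload: str
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def correlate(events, samples, window_s=60.0, error_threshold=0.05):
    """For each unhealthy app sample, collect the reasons of Kubernetes
    events for the same workload seen in the window_s seconds before it."""
    findings = []
    for sample in samples:
        if sample.error_rate < error_threshold:
            continue  # healthy sample, nothing to explain
        reasons = [
            event.reason
            for event in events
            if event.workload == sample.workload
            and 0 <= sample.ts - event.ts <= window_s
        ]
        findings.append((sample.workload, sample.ts, reasons))
    return findings
```

An unhealthy sample that comes back with an empty list of reasons is a hint that the cause probably lies in the application itself rather than in the cluster; one that comes back with `FailedScheduling` points at node capacity instead.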
So, why is it so important to monitor Kubernetes?
As you read about monitoring Kubernetes in the section above, you may have thought to yourself: “Does Kubernetes monitoring really make a difference when assessing application health?” or “Is it worth going beyond monitoring my workloads when tracking down an issue?”
I’d say the short answer to those questions is: Absolutely. Although Kubernetes can be summarized as a container orchestration platform, it's a layer that constantly interacts with the deployed applications throughout their entire lifetime and, as with anything, ‘the devil is in the details’. In our case, those details are often:
- How Kubernetes interacts with our applications (e.g. kube-proxy problems affecting Service connectivity)
- How Kubernetes behaves alongside the workloads (e.g. unresponsive node agents, affecting workload scheduling)
That's why it's critical to monitor Kubernetes using an approach tailored to Kubernetes. If you don't, and instead monitor K8s using the same tools and methods that you'd use to monitor applications and infrastructure outside of a K8s cluster, without taking Kubernetes into account, you almost certainly won't be able to properly assess your workloads' resilience and robustness.
How to monitor Kubernetes, effectively
So, what does it take to monitor Kubernetes effectively?
Perhaps the best way to answer that question is first to talk about how not to monitor Kubernetes. You should not do things like:
- Settle for Kubernetes monitoring based on limited metrics or only metrics of a certain type – such as basic CPU and memory utilization data.
- Monitor Kubernetes as a separate entity from your workloads, without correlating their metrics.
Instead, you should implement a Kubernetes monitoring strategy that allows you to collect any and all relevant data from across all parts of your K8s cluster in a centralized way. You can do that seamlessly with the help of eBPF, but I'll delve into that later, after we lay the groundwork…
4 Best practices for Kubernetes monitoring
In order to leverage Kubernetes data effectively as an integral part of your workload monitoring, here are some general best practices we encourage:
- Correlate data: Because Kubernetes has so many layers and components, the ability to correlate monitoring data between different types of resources is critical. Monitoring and analyzing data from individual resources in isolation is often not enough to determine the root cause of a failure or assess how many resources it impacts.
- Configure contextual alerts: Generic threshold-based alerting (generating alerts whenever resource utilization crosses a predefined level) doesn't typically work well in Kubernetes because Kubernetes workloads often scale up and down on a continuous basis. Instead, you should configure alerts that take context into account. For example, workload instability caused by a temporary cluster resize might be treated with lower severity.
- Analyze data in real time: Because Kubernetes clusters are constantly changing, analyzing data even just minutes after it was collected may not be enough to deliver actionable insights. You want to be able to ingest and analyze data in real time whenever possible.
- Keep monitoring predictable: Monitoring tools can consume a lot of resources and, because they are sometimes injected into workloads (as sidecars), can deprive your production workloads of the resources they need to run well and create ambiguity about resource consumption. Avoid this problem by choosing a monitoring architecture that can be tailored to your needs and that focuses on safety and consistency. eBPF-powered observability is a great way to achieve this.
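To make the contextual alerting practice above a little more concrete, here is a minimal Python sketch of a severity function that looks at cluster context instead of a bare threshold. The `ClusterContext` fields, the `resizing` flag, the severity names and the 90% CPU threshold are hypothetical choices for illustration; a real setup would derive them from whatever your monitoring stack exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterContext:
    resizing: bool          # hypothetical flag: nodes are being added or removed
    desired_replicas: int   # replicas the workload is supposed to have
    ready_replicas: int     # replicas currently ready


def alert_severity(cpu_utilization, ctx, cpu_threshold=0.9):
    """Return "none", "info", "warning" or "critical" for one workload."""
    missing = ctx.desired_replicas - ctx.ready_replicas
    over_threshold = cpu_utilization >= cpu_threshold
    if not over_threshold and missing <= 0:
        return "none"
    if ctx.resizing:
        # Instability during a temporary resize is expected, so downgrade it.
        return "info"
    if over_threshold and missing > 0:
        # High load with missing replicas and no resize to explain it.
        return "critical"
    return "warning"
```

The same 95% CPU reading produces `"info"` while the cluster is resizing and `"critical"` when replicas are missing for no known reason, which is the difference between a contextual alert and a generic threshold.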
Comparing Kubernetes monitoring solutions
Unfortunately, conventional Kubernetes monitoring tools don’t always lend themselves well to monitoring best practices.
Some monitor the Kubernetes infrastructure and your workloads as if they were separate entities that don't interact. Others port traditional monitoring practices to Kubernetes without taking cloud-native considerations (cluster elasticity, Kubernetes RBAC, workload replication) into account. Still others rely on Kubernetes practices such as sidecar injection, which should be used with caution and are not always suited to long-running observability.
Alternatively, Kubernetes monitoring tools might use the Kubernetes Metrics API to track basic statistics about node and Pod resource utilization. This centralizes monitoring, but it comes at the cost of being able to collect only basic data (because you're limited to what the Metrics API supports), without peering deep inside nodes and Pods when necessary.
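To give a feel for how narrow that data is, the Python sketch below summarizes a trimmed, hand-written payload shaped like a `PodMetrics` object from the `metrics.k8s.io/v1beta1` API. The Pod name, container names and values are made up, and the quantity parser is a simplification that only handles the suffixes listed in the code.

```python
import json

# Hand-written sample shaped like a PodMetrics object; names and values are made up.
SAMPLE = """
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {"name": "web-0", "namespace": "default"},
  "containers": [
    {"name": "app", "usage": {"cpu": "250m", "memory": "64Mi"}},
    {"name": "proxy", "usage": {"cpu": "1500000n", "memory": "2048Ki"}}
  ]
}
"""

# Simplified suffix tables; other quantity suffixes are not handled here.
CPU_SUFFIXES = {"n": 1e-9, "u": 1e-6, "m": 1e-3}
MEM_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_quantity(value, suffixes):
    """Convert a quantity string such as "250m" or "64Mi" to a float."""
    for suffix, factor in suffixes.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)  # no known suffix: plain cores or bytes


def summarize(pod_metrics):
    """Sum CPU (cores) and memory (bytes) usage across a Pod's containers."""
    containers = pod_metrics["containers"]
    cpu = sum(parse_quantity(c["usage"]["cpu"], CPU_SUFFIXES) for c in containers)
    mem = sum(parse_quantity(c["usage"]["memory"], MEM_SUFFIXES) for c in containers)
    return {
        "pod": pod_metrics["metadata"]["name"],
        "cpu_cores": cpu,
        "memory_bytes": mem,
    }


summary = summarize(json.loads(SAMPLE))
```

Per-container CPU and memory usage is all there is to summarize here. Nothing in the payload says why a container restarted or what its network traffic looks like.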
So, with non-native or conventional Kubernetes monitoring, you get a choice between limited visibility on the one hand and complexity on the other. Needless to say, neither option is ideal.
Kubernetes monitoring made easy: The eBPF approach
Fortunately, there's a better solution to Kubernetes monitoring: eBPF.
eBPF is a framework that allows you to run programs in the kernel space of Linux-based servers. What that means, in essence, is that eBPF makes it possible to deploy monitoring software (among other types of tools) that is very efficient and secure, but that also provides very granular visibility into workloads.
The grand idea behind eBPF-based Kubernetes monitoring is that if you can run monitoring agents in kernel space on each node, you can use them as a vantage point for collecting any data you want in the cluster, not just from your own workloads but from Kubernetes workloads as well. All of the data generated by these resources passes through the kernel of the operating system that hosts them, so there is virtually no limit to what you can monitor using eBPF.
eBPF is still pretty new – it debuted only in 2014, and it has taken some time to gain widespread adoption – so it wasn't always at the center of Kubernetes monitoring. But now that it has matured, eBPF has emerged and redefined the possibilities when it comes to monitoring Kubernetes clusters. It has opened up a radically simpler, more efficient, and more effective approach. And it enables a Kubernetes monitoring strategy that is truly tailored for the distributed nature of Kubernetes.
Gain with no pain: Monitoring Kubernetes without losing it
To sum up, Kubernetes monitoring has traditionally been tricky. There were so many types of resources to monitor, and so many different types of data to collect and correlate from each one, that there wasn't a great way of getting all of the data you needed to manage Kubernetes as part of a comprehensive fleet management strategy.
Luckily, with a little help from eBPF, these problems disappear. Kubernetes monitoring based on eBPF makes it possible to get all of the information you need, across all Kubernetes resource types, in an efficient, consistent, secure and contextual way.