How Using eBPF for Observability Can Reduce System Load
Get to know how eBPF solves the overhead issues of traditional monitoring tools, and how you can get started with eBPF in Kubernetes and elsewhere as the basis of your observability strategy.
Alanis Morissette, the musical artist whose hits include a song about irony, is not an IT engineer. But if she were, she might have sung about the irony of using traditional monitoring and observability instead of a solution like eBPF.
What makes traditional monitoring and observability tools ironic, you ask? It's simple: The increase in system load that comes with using these solutions. Because conventional monitoring tools use so much CPU and memory, they can have the ironic effect of straining your servers and decreasing application performance – which is exactly the opposite of what you want to do when monitoring.
But it doesn't have to be this way. Using eBPF, it's possible to monitor systems like Kubernetes effectively while paying virtually no system load tax. Here's how eBPF solves the overhead issues of traditional monitoring tools, and how you can get started with eBPF in Kubernetes and elsewhere as the basis of your observability strategy.
What are traditional monitoring and observability tools, and how do they work?
Let's begin by making clear what we mean when we talk about "traditional" monitoring and observability tools.
We're referring to solutions that work by deploying monitoring agent software on each of the servers they are monitoring – either as processes that run directly on the server, or using a method like sidecar containers to deploy monitoring agents through Kubernetes. Once deployed, the tools can monitor metrics like CPU and memory utilization, inspect log files and run traces because they have access to these resources through the host.
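To make the agent model concrete, here's a minimal sketch of what a user space agent does to measure CPU utilization on a Linux host: it reads the aggregate "cpu" line of /proc/stat twice and compares the counters. The function names are hypothetical, and real agents collect far more than this, but the point stands that every sample is ordinary user space work that itself costs CPU.

```python
import time

def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice]
    fields = [int(x) for x in line.split()[1:9]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
    total = sum(fields)
    return total - idle, total

def cpu_utilization(before, after):
    """Fraction of CPU time spent busy between two 'cpu' line samples."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    delta = total1 - total0
    return (busy1 - busy0) / delta if delta else 0.0

def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()  # the first line is the aggregate 'cpu' line

def sample_host(interval=1.0):
    """Poll the host once, the way a simple agent loop would (Linux only)."""
    before = read_cpu_line()
    time.sleep(interval)
    return cpu_utilization(before, read_cpu_line())
```

Calling sample_host() in a loop, once per second, is roughly the shape of a basic polling agent.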
Overhead issues with traditional monitoring tools
The traditional approach to monitoring is all well and good in the sense that it lets tools collect the data necessary to enable visibility and observability. But there's a big drawback: The additional CPU and memory resources that traditional monitoring tools consume. This is known as overhead because it's an unavoidable requirement for deploying conventional monitoring solutions.
Overhead exists as part of traditional monitoring and observability strategy because software agents run in what's known as user space or userland, just like standard applications. And also like standard applications, they require non-negligible volumes of CPU and memory to operate.
The exact amount of CPU and memory that traditional monitoring and observability tools consume can vary depending on factors like how much data they're collecting and how efficient their code is. Research by groundcover, however, shows that conventional solutions can increase CPU utilization by up to 249 percent (whereas eBPF monitoring, which we discuss below, increases CPU by a mere 9 percent, and has no detectable impact on memory).
What this effectively means is that if your server is at, say, 40 percent CPU saturation (meaning 40 percent of available CPU capacity is in use by the operating system and applications) before you deploy a traditional monitoring tool on it, the monitoring software could suck up all of the rest of the available CPU. This could potentially cause workloads to run short of CPU and trigger CPU throttling, leading to performance issues.
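The arithmetic behind that scenario can be sketched in a few lines. This assumes the figures quoted above are relative increases over the baseline (the 249 percent and 9 percent numbers come from the research cited above, and the 40 percent baseline is just the example value), so treat it as an illustration rather than a model of any real server.

```python
def projected_utilization(baseline_pct, relative_increase_pct):
    """CPU demand after adding monitoring, as a percentage of capacity (uncapped)."""
    return baseline_pct * (1 + relative_increase_pct / 100)

BASELINE = 40.0  # example baseline: 40 percent of CPU capacity in use

traditional = projected_utilization(BASELINE, 249)  # worst case for agents
ebpf = projected_utilization(BASELINE, 9)           # eBPF-based collection

for label, value in [("traditional agent", traditional), ("eBPF", ebpf)]:
    status = "saturated" if value >= 100 else f"{100 - value:.1f}% headroom left"
    print(f"{label}: {value:.1f}% -> {status}")
```

In the worst case, the traditional agent pushes demand past 100 percent of capacity, which is where throttling starts, while the eBPF case leaves more than half of the CPU free.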
Isn't it ironic? In order to find out how much CPU (or other resources) your workloads are consuming, you have to reduce the amount of CPU (and other resources) available to them. That's the inherent price you pay for traditional monitoring.
The cost of agentless monitoring
Before going further, we should note that there's a variant on traditional monitoring and observability tools. It's called agentless monitoring, and it works by collecting monitoring data over the network instead of deploying software agents on each host.
Agentless monitoring typically doesn't significantly increase CPU and memory utilization on hosts because it doesn't increase the number of applications running in user space. The caveat, though, is that agentless monitoring solutions usually can't collect as much data or deliver as much visibility as you get when you have software running directly on the systems you're observing. They can only collect whichever metrics or other types of data are exposed through network traffic.
In addition, agentless monitoring may add complexity to applications because it might require developers to implement logic that exposes metrics and logs over the network.
So, while agentless monitoring is more efficient from a resource utilization standpoint, it comes with its own cost in the form of less visibility and more complexity.
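Here's a rough sketch of that tradeoff, using only the Python standard library. The application has to carry its own code to expose a metric over HTTP (the /metrics path and the requests_served counter are hypothetical names chosen for illustration), and the agentless collector can only see what that endpoint chooses to publish.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUESTS_SERVED = 42  # hypothetical counter the application decides to expose

class MetricsHandler(BaseHTTPRequestHandler):
    """Extra logic the application must implement to be observable remotely."""
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = f"requests_served {REQUESTS_SERVED}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

def scrape(url):
    """Agentless collector: sees only what the endpoint publishes."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxy
    with opener.open(url, timeout=5) as resp:
        text = resp.read().decode()
    return {name: float(value) for name, value in
            (line.split() for line in text.splitlines())}

def demo():
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return scrape(f"http://127.0.0.1:{server.server_address[1]}/metrics")
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(demo())
```

Nothing about CPU, memory or system calls on the host is visible to scrape() unless a developer adds it to the handler, which is the visibility and complexity cost described above.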
How eBPF minimizes system load
You might be thinking: If traditional monitoring solutions increase system load so much, and agentless isn't really much better, why hasn't anyone come up with a better solution to monitoring and observability?
The answer is that they have! It's called the extended Berkeley Packet Filter, or eBPF, and it's a radically novel – and radically efficient – approach to monitoring and observability.
eBPF is a framework built into the Linux kernel that makes it possible to run custom programs in what's known as kernel space. Kernel space programs are much more efficient than user space applications because the former effectively run as part of the operating system – which means they have instant access to all of the resources the operating system does.
Plus, eBPF doesn't require custom changes to kernel source code or force you to recompile your Linux kernel. Instead, it allows you to insert and execute kernel space programs dynamically. And, because eBPF code runs in sandboxed environments, it's more secure and stable in most respects than kernel modules, which could cause issues if they are buggy.
So, using eBPF profiling, it's possible to deploy code to trace system calls, monitor network events and network packets and collect observability data directly from the kernel. This requires some amount of CPU and memory, but it's usually so low as to be negligible. Again, our own testing shows that eBPF-based data collection increases CPU load by under 10 percent relative to baseline levels, and the increase in memory is virtually nonexistent. This makes eBPF up to 25 times more efficient than conventional monitoring tools.
Plus, unlike agentless monitoring approaches, using eBPF doesn't come at the price of deep visibility. On the contrary, because eBPF integrates directly into the kernel, eBPF can see virtually everything that is happening – unlike traditional monitoring tools, which are limited to whichever types of logs, metrics, and traces are available in user space.
In short, eBPF enables better monitoring and observability with much lower levels of system load.
How to leverage eBPF for efficient observability
There are three basic ways to go about using eBPF: Writing and deploying your own code to perform eBPF tracing, using open source eBPF utilities or using a monitoring and observability tool that comes with eBPF built-in. Let's take a look at what each approach entails.
Writing an eBPF program
Writing your own eBPF programs is complicated, but doable if you have a fair amount of time and programming experience.
In most cases, eBPF code is written in C. For example, Terence Li offers a nice eBPF tutorial on GitHub that uses the following simple eBPF sample code:
After writing and compiling the eBPF code, you insert it into the kernel using an eBPF loader: a user space tool that hands the compiled program to the kernel, where the eBPF verifier checks it before it is attached to the event it should run on.
Then, you execute the program and view the results of whichever data it collects – which is determined by the logic implemented in the eBPF code. Again, virtually all of the data that the kernel can view is available to eBPF programs, so the sky is the limit when it comes to which types of metrics, logs and traces you can collect.
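To give a feel for the write, load and run cycle, here's a minimal sketch that uses the Python front end of the BPF Compiler Collection (BCC) as the loader. This is not the tutorial's sample code. It's an independent illustration that assumes the BCC Python bindings are installed and that you run it as root, and the trace_execve function name is our own. The eBPF program itself is the small piece of C inside the string.

```python
import sys

# eBPF program in restricted C; BCC compiles it when the BPF object is created.
BPF_PROGRAM = r"""
int trace_execve(void *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""

def run_tracer():
    # Imported lazily: needs the BCC Python bindings and root privileges.
    from bcc import BPF

    b = BPF(text=BPF_PROGRAM)  # compile and load; the kernel verifier checks it
    # Attach the program to the kernel function that implements execve().
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_execve")
    print("Tracing execve()... press Ctrl-C to stop")
    try:
        b.trace_print()  # stream lines from the kernel trace pipe
    except KeyboardInterrupt:
        pass

# Only trace when explicitly asked, e.g.: sudo python3 tracer.py --trace
if __name__ == "__main__" and "--trace" in sys.argv:
    run_tracer()
```

Each time a process calls execve() while the tracer is running, the kernel runs trace_execve and a line shows up in the trace output. Swapping the attach point or the C body is how you'd change what gets collected.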
Using eBPF utilities
If you don't want to write, compile and load custom eBPF programs by hand, an alternative approach is to use prewritten open source eBPF utilities, like those available through the BPF Compiler Collection on GitHub. These are tools designed to collect various types of data that you can readily build (or, in some cases, find precompiled through your Linux distribution's package management system) and deploy.
For instance, on Ubuntu you can install a variety of tools from the BPF Compiler Collection using:
Then, you can simply call the tools on the command line to run them. For example, to run execsnoop, which traces new processes, you'd call:
If you wait for new processes to start, you'll see output like the following (this is the result of opening a new Bash shell through a virtual terminal):
Using prewritten eBPF utilities is a simpler way to run eBPF. The downside is that you're restricted to the functionality and configuration options of the tools available. This is also not a practical way to use eBPF at scale because you have to deploy and orchestrate the utilities manually.
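To illustrate what manual orchestration looks like, here's a sketch of the kind of glue code you end up writing around these utilities. It assumes Ubuntu's bpfcc-tools packaging, where each tool is installed with a -bpfcc suffix (so execsnoop becomes execsnoop-bpfcc), and that the script runs as root. The function names are hypothetical.

```python
import shutil
import subprocess

def build_command(tool):
    """Return the argv for a BCC tool, assuming Ubuntu's '-bpfcc' suffix naming."""
    return [f"{tool}-bpfcc"]

def stream_tool(tool, max_lines=10):
    """Run a BCC tool and print its first few lines of output (run as root)."""
    cmd = build_command(tool)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found; is bpfcc-tools installed?")
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for count, line in enumerate(proc.stdout, start=1):
            print(line.rstrip())
            if count >= max_lines:
                break
    finally:
        proc.terminate()  # the tools run until stopped, so end them explicitly
        proc.wait()
```

Calling stream_tool("execsnoop") covers one tool on one host. Doing the same across every node in a cluster, and shipping the output somewhere useful, is the part that doesn't scale by hand.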
Using eBPF-enabled observability tools
A third approach – and one we're partial to here at groundcover – is to use monitoring and observability software that uses eBPF as its primary means of data collection.
Traditional monitoring and observability tools don't use eBPF. They use agent-based or agentless collection techniques – which, as we mentioned, entail a lot of overhead or result in limited visibility.
But newer solutions, including groundcover, come with eBPF built-in. That means you benefit from the low system load of eBPF-based data collection, without having to write or deploy eBPF programs or utilities yourself. In addition, the observability data collected through eBPF is automatically ingested into analytics tools so you can make sense of it.
How to use eBPF for observability with groundcover
If you choose groundcover as your observability solution, you don't really need to learn anything special about eBPF to benefit from efficient observability. You just have to deploy groundcover, which gives you all of the capabilities you'd expect from a modern observability tool.
Specifically, you get granular data like performance metrics for individual nodes, Pods and containers. You also get elegant, customizable Grafana-based visualizations to make sense of your data.
And you get all this with minimal resource overhead. Again, our tests show that when using our tools, there is a nearly negligible increase in CPU utilization, while the increase in memory is virtually undetectable.
A free ride for monitoring and observability tools
The old approach to monitoring and observability is like rain on your wedding day: The resources you have to spend just to collect the data you need can easily ruin the experience.
In contrast, eBPF observability is like getting a free ride – and not after you've already paid! – in the sense that it costs almost nothing, in terms of system load, for your tools to run.
So, take our good advice, and switch to eBPF as the foundation of your observability strategy.