Demystifying eBPF Tracing: A Beginner's Guide to Performance Optimization
If engineers were psychiatrists, conventional Linux performance optimization strategies would be lobotomies. The traditional strategies that engineers have relied upon for system tracing and performance tuning on Linux systems involve running software that actually decreases overall system performance in many cases – which is not unlike deliberately cutting up parts of your brain in a bid to make your brain healthier. (Lobotomies turn out not to be a good idea, by the way, despite what people thought a hundred years ago.)
Fortunately, it's no longer necessary to perform the equivalent of a lobotomy if you want to optimize the performance of Linux-based workloads. Instead, you can leverage eBPF tracing, a radically new approach to performance optimization for Linux systems and applications. (Well, actually, you could make the case that eBPF is not radically new at all and that it just happens to have become fashionable in recent years; we'll say more about the somewhat complicated history of eBPF below.) With eBPF, engineers no longer have to make the ludicrous choice of running resource-hungry monitoring software just to help figure out whether their applications are consuming resources efficiently.
Keep reading for everything you have ever wanted to know – and perhaps a bit more – about eBPF tracing and its role in modern performance optimization.
What is eBPF tracing?
eBPF tracing is the use of eBPF, a framework built into modern versions of the Linux kernel, to collect performance data from Linux-based applications and services. You can then use this data to troubleshoot performance issues, assist with Linux kernel debugging, perform system tracing and basically manage any other tasks that require granular, low-level visibility into what your applications are doing.
The history of eBPF is either relatively short or quite long, depending on how you want to think about it. eBPF stands for extended Berkeley Packet Filter. The Berkeley Packet Filter (BPF) is a technology for Unix-like operating systems that originated all the way back in 1992, when Linux was barely a year old. However, it wasn't until 2014 that Linux kernel developers extended BPF into its modern incarnation, which allows engineers to run sandboxed programs directly inside the Linux kernel. Developers have also created new tooling that makes it easier to interact with eBPF.
Early on, large organizations like Facebook and Netflix were big into using eBPF to achieve visibility into Linux-based workloads. But as the framework and the tooling surrounding it have matured, eBPF is not just for the FAANG (or MAMAA, or whatever they're calling themselves these days) companies anymore. Anyone can now use eBPF to streamline tracing.
By the way, if you're wondering whether eBPF supports Windows, the answer is no, although Microsoft says it's working on it. (When Microsoft says it's bringing a Linux feature to Windows, you know it must really be a big deal.) For now, eBPF is only usable for production purposes in conjunction with workloads hosted on Linux. Sorry, Windows fans. The good news, though, is that virtually every modern Linux system supports eBPF, regardless of which distribution you're dealing with, and you don't typically have to install any special software (beyond some basic command-line tools, in certain cases) or customize your kernel to use eBPF. If you have applications running on Linux, you can use eBPF.
The basics of eBPF tracing
Although the technology that makes eBPF work is complex, eBPF tracing works in a pretty simple way from an end-user perspective. You write code that tells eBPF which data you want to collect, and from what. Then, you deploy your eBPF program into what some call the eBPF virtual machine.
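To make that workflow concrete, here's a minimal sketch in Python that assembles a bpftrace one-liner. The probe names where to attach (the "from what"), the optional predicate filters events, and the action says which data to record. The example program is the classic file-open tracer. It assumes bpftrace is installed and that you run it with root privileges; `build_bpftrace_cmd` is our own hypothetical helper, not part of any library.

```python
import shutil
import subprocess
from typing import List, Optional


def build_bpftrace_cmd(probe: str, action: str, predicate: Optional[str] = None) -> List[str]:
    """Assemble a bpftrace one-liner: probe (where to attach), optional
    predicate (which events to keep) and action (what data to record)."""
    program = probe
    if predicate:
        program += f" /{predicate}/"
    program += f" {{ {action} }}"
    return ["bpftrace", "-e", program]


# Trace file opens: the tracepoint says *from what*, the printf says *which data*.
# Older bpftrace releases use args->filename; newer ones also accept args.filename.
OPENAT_CMD = build_bpftrace_cmd(
    probe="tracepoint:syscalls:sys_enter_openat",
    action='printf("%s %s\\n", comm, str(args->filename));',
)

if __name__ == "__main__":
    # Loading an eBPF program typically needs root (or equivalent capabilities).
    if shutil.which("bpftrace") is None:
        raise SystemExit("bpftrace is not installed")
    subprocess.run(OPENAT_CMD, check=False)  # press Ctrl-C to stop tracing
```

The command is passed as an argument list rather than a shell string, so the quotes and backslashes in the bpftrace program reach bpftrace unchanged.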
An eBPF virtual machine isn't a VM in the traditional sense. There's no hypervisor, and eBPF has nothing in particular to do with Linux-native virtualization technologies like KVM. Instead, eBPF virtual machines are VMs in the sense that Java virtual machines are VMs: They provide a sandboxed environment where eBPF code can run without interfering with other processes.
There are a number of eBPF program types that the Linux kernel supports. Without getting too far into the weeds, suffice it to say that most eBPF programs either allow you to monitor network packets as they pass into or out of a Linux server, or they let you collect performance and debugging information from Linux kernel routines.
What that means, in essence, is that virtually any data that touches your Linux system – whether it's data related to the network or to an internal process or service – is accessible via eBPF. And to get that data, the only thing you need to do is write and deploy an eBPF program that specifies which data you want.
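If you want to see how many kernel hooks are on offer, tracepoints are a good place to start. The sketch below reads the kernel's list of available tracepoint events from tracefs and groups them by subsystem, such as `net` for networking or `sched` for the scheduler. It assumes tracefs is mounted at one of the two usual locations and that you have permission to read it (typically root); running `bpftrace -l 'tracepoint:*'` gives a similar listing. The function names are our own.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# tracefs is usually mounted at /sys/kernel/tracing; older setups expose it
# under debugfs instead. Reading either normally requires root.
TRACEFS_CANDIDATES = [
    Path("/sys/kernel/tracing/available_events"),
    Path("/sys/kernel/debug/tracing/available_events"),
]


def group_tracepoints(text: str) -> Dict[str, List[str]]:
    """Group 'subsystem:event' lines into {subsystem: [event, ...]}."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        subsystem, event = line.split(":", 1)
        groups[subsystem].append(event)
    return dict(groups)


def read_available_events() -> str:
    """Return the raw event list, or an empty string if it can't be read."""
    for path in TRACEFS_CANDIDATES:
        try:
            return path.read_text()
        except OSError:
            continue  # missing mount or insufficient permissions
    return ""


if __name__ == "__main__":
    groups = group_tracepoints(read_available_events())
    for subsystem in ("net", "sched", "syscalls"):
        print(subsystem, len(groups.get(subsystem, [])), "tracepoints")
```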
The advantages of eBPF tracing
We haven't yet explained why using eBPF is an alternative to the insanity that traditionally characterized performance optimization strategies for Linux-based workloads. So let's do that by discussing which advantages eBPF offers compared to other approaches to performance monitoring and system tracing.
Improved performance and scalability
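Programs run via eBPF are super-efficient. The main reason why is that eBPF code runs directly in the Linux kernel, instead of running in so-called user space like traditional monitoring software. That means an eBPF program can filter and summarize data right where events occur, rather than copying every event out to a separate user-space process and paying for the context switches that involves.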
The efficiency of eBPF translates not just to faster collection of monitoring data, but also to better system performance and scalability. When you minimize the CPU and memory that your monitoring software consumes, you maximize the resources available to other workloads, and you optimize their performance.
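Part of that efficiency comes from summarizing data inside the kernel. Rather than shipping every event to user space, an eBPF program can bump a counter in a BPF map, and user space only reads the totals. The Python below is a user-space model of that idea, not kernel code: it folds a million synthetic latency values into power-of-two buckets, similar in spirit to bpftrace's `hist()` function. The function names and the data are our own.

```python
from collections import Counter
from typing import Iterable


def aggregate(latencies_us: Iterable[int]) -> Counter:
    """Fold each event into a power-of-two bucket counter, the way an
    in-kernel histogram map would, instead of keeping every value."""
    hist: Counter = Counter()
    for value in latencies_us:
        # Bucket b holds values in [2**(b-1), 2**b); bucket 0 holds only 0.
        hist[value.bit_length()] += 1
    return hist


def bucket_range(b: int) -> str:
    return "[0]" if b == 0 else f"[{2 ** (b - 1)}, {2 ** b})"


if __name__ == "__main__":
    events = [(i * 37) % 5000 for i in range(1_000_000)]  # synthetic latencies in µs
    hist = aggregate(events)
    print(f"raw events: {len(events)} values (~{len(events) * 8 // 1_000_000} MB at 8 bytes each)")
    print(f"histogram: {len(hist)} buckets")
    for b in sorted(hist):
        print(f"{bucket_range(b):>14} {hist[b]}")
```

The synthetic values stand in for real measurements; what matters is the shape of the result, a handful of bucket counts in place of a million individual values that would otherwise have to cross into user space.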
This is the major reason why eBPF frees you from the craziness of conventional performance monitoring: With eBPF, you no longer force your workloads to take a performance hit just so that you can monitor them and optimize their performance. Instead, you can optimize for performance without undercutting performance. Imagine that!
Enhanced system troubleshooting and debugging
You might assume that the efficiency and high performance of eBPF programs implies that eBPF doesn't give you as much data and visibility as other forms of monitoring. But oh, how wrong you'd be.
On the contrary, because eBPF, as we've said, can collect basically every piece of data that touches a Linux system, you get virtually unlimited levels of visibility. That beats the heck out of traditional performance monitoring, which restricts you primarily to collecting logs, metrics and whichever traces your applications are designed to support. With eBPF, your only constraints are the types of eBPF programs that Linux supports (and again, there are lots of them) and your ability to implement those programs (which is easy to do when you take advantage of user-friendly monitoring software that is powered by eBPF under the hood, as we'll explain in more detail in a bit).
Increased granularity and accuracy in Linux container monitoring
Nor does using eBPF require you to compromise on your ability to map monitoring data to specific workloads. Once again, the opposite is true. With eBPF, you can trace performance and debugging data on a process-by-process basis, which translates to a very high degree of granularity.
That means you can monitor individual containers, or even specific processes running within containers. You can also monitor specific VMs, applications or basically anything else that runs on top of Linux. Instead of getting a bunch of generic log or metrics data that you then have to go and try to map onto specific parts of your workloads, eBPF gives you as much granularity as you could reasonably desire.
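Here's a sketch of what that granularity can look like with bpftrace, which can filter events on a process ID (the `pid` builtin) or on a cgroup v2 ID (the `cgroup` builtin, compared against `cgroupid()`). Because container runtimes place each container in its own cgroup, the cgroup filter scopes tracing to a single container. The cgroup path below is hypothetical; the real one depends on your container runtime and cgroup driver. The helper names are our own.

```python
import shlex
from typing import List

SYSCALL_PROBE = "tracepoint:raw_syscalls:sys_enter"
COUNT_ACTION = "{ @syscalls[comm] = count(); }"


def process_syscall_counter(pid: int) -> List[str]:
    """bpftrace command counting syscalls by process name for one process ID."""
    return ["bpftrace", "-e", f"{SYSCALL_PROBE} /pid == {int(pid)}/ {COUNT_ACTION}"]


def container_syscall_counter(cgroup_path: str) -> List[str]:
    """bpftrace command counting syscalls by process name for every task in
    one cgroup v2 directory, which usually means one container."""
    if '"' in cgroup_path:
        raise ValueError("cgroup path must not contain double quotes")
    predicate = f'cgroup == cgroupid("{cgroup_path}")'
    return ["bpftrace", "-e", f"{SYSCALL_PROBE} /{predicate}/ {COUNT_ACTION}"]


if __name__ == "__main__":
    # Hypothetical path; look up the real one for your container runtime.
    example = "/sys/fs/cgroup/system.slice/docker-EXAMPLE.scope"
    print(shlex.join(container_syscall_counter(example)))
    print(shlex.join(process_syscall_counter(1234)))
```

When bpftrace exits, it prints the `@syscalls` map, giving per-process syscall counts for just the workload you filtered on.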
Avoid breaking your system with poorly written tracing software
Before allowing eBPF programs to run, the Linux kernel passes them through a verifier designed to ensure they won't disrupt any parts of the system. Among other things, the verifier checks that each program is guaranteed to finish and that it only accesses memory it is allowed to touch. Thanks to the verifier, the chances that your eBPF code will break your applications or crash your server are very low.
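To get a feel for what the verifier checks, here's a toy model of one of its early passes, which walks the program's control-flow graph. The real verifier rejects programs containing unreachable instructions or jumps that land outside the program, and on kernels before 5.3 (which added support for bounded loops) it rejected any backward jump outright. The instruction format below is a hypothetical simplification, and the real verifier does far more, including tracking register types and memory bounds along every path.

```python
from typing import List, Tuple

# Hypothetical toy instruction format, far simpler than real BPF bytecode:
#   ("op",)         any non-branching instruction; falls through to the next one
#   ("jmp", off)    unconditional jump to index i + off + 1 (BPF jump offsets are
#                   relative to the following instruction)
#   ("jcond", off)  conditional jump: may fall through or jump
#   ("exit",)       end of the program
Insn = Tuple


def successors(prog: List[Insn], i: int) -> List[int]:
    kind = prog[i][0]
    if kind == "exit":
        return []
    if kind == "jmp":
        return [i + prog[i][1] + 1]
    if kind == "jcond":
        return [i + 1, i + prog[i][1] + 1]
    return [i + 1]


def toy_check_cfg(prog: List[Insn]) -> List[str]:
    """Depth-first walk of the control-flow graph, flagging jumps that leave
    the program, back-edges (loops) and instructions that can never run."""
    if not prog:
        return ["empty program"]
    n = len(prog)
    state = [0] * n  # 0 = unvisited, 1 = on the DFS stack, 2 = finished
    errors: List[str] = []

    def visit(i: int) -> None:
        state[i] = 1
        for j in successors(prog, i):
            if not 0 <= j < n:
                errors.append(f"jump out of range from insn {i} to {j}")
            elif state[j] == 1:
                errors.append(f"back-edge from insn {i} to {j}")  # a loop
            elif state[j] == 0:
                visit(j)
        state[i] = 2

    visit(0)
    errors.extend(f"unreachable insn {i}" for i in range(n) if state[i] == 0)
    return errors


if __name__ == "__main__":
    print(toy_check_cfg([("op",), ("jcond", 1), ("op",), ("exit",)]))  # []
    print(toy_check_cfg([("op",), ("jmp", -2), ("exit",)]))
```

The error strings are modeled on the kernel's own verifier messages, but this is an illustration of the idea, not a substitute for the real thing.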
That makes eBPF programs safer than traditional Linux kernel modules, which are another way of running software inside the kernel. Linux doesn't check kernel modules the way it verifies eBPF programs, and a buggy kernel module can crash your entire kernel, which is exactly the opposite of what you want when you're trying to optimize the performance of your stuff.
Getting started with eBPF tracing
There are two ways to get started with eBPF tracing. The best one for you depends on your goals.
If you just want to play around with eBPF or set up a basic eBPF tracing environment for fun, you can use command-line tools like bpftrace or bcc to deploy eBPF code. To create the code itself, you can write it from scratch if you're so inclined, or grab ready-made sample eBPF programs from GitHub. You'll also need a reasonably recent Linux kernel: eBPF first appeared in version 3.18, but many of the tracing features these tools rely on arrived later in the 4.x series, so version 4.14 or later is a sensible baseline.
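Before installing any tools, it's worth checking what your kernel offers. This sketch parses the release string that `uname -r` reports and compares it numerically against a baseline (4.14 here, matching the figure above; adjust it to whatever your chosen tools document). It then checks whether the kernel exposes BTF type information at `/sys/kernel/btf/vmlinux`, which CO-RE-based tools built on libbpf rely on. The function names are our own.

```python
import platform
import re
from pathlib import Path
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(release: str, minimum: Tuple[int, int] = (4, 14)) -> bool:
    """Compare versions as integer tuples, never as strings."""
    return kernel_version(release) >= minimum


def has_btf() -> bool:
    """BTF type info is exposed here when the kernel was built with
    CONFIG_DEBUG_INFO_BTF."""
    return Path("/sys/kernel/btf/vmlinux").exists()


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    print(f"kernel {release}: baseline met = {meets_minimum(release)}, BTF available = {has_btf()}")
```

Comparing tuples of integers matters here: as plain strings, "4.9" sorts after "4.14", which would give exactly the wrong answer.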
Note that not all eBPF features are available on all supported versions of the Linux kernel or on all distributions, so if you care a lot about specific eBPF program types and functionality, you should check out the documentation for different Linux systems to ensure that your Linux environment will support what you want. But if you're just looking to get started with eBPF in a basic way, you probably don't need to care too much about these details.
The other way to get started with eBPF – and the approach we suggest if you plan to use eBPF at scale in a production environment – is to leverage a monitoring and observability tool that comes with eBPF built in, like groundcover. When you take this route, you don't have to worry about writing eBPF programs, deploying them or figuring out what to do with the data they generate. Instead, you simply deploy your monitoring tool using more or less the same process you'd follow for conventional monitoring software, leaving it to your monitoring suite to run eBPF under the hood and generate the insights you're looking for.
Best practices for eBPF tracing
No matter how you choose to deploy it, eBPF almost always results in more efficient, more granular and more actionable insights than you could glean from other approaches to performance monitoring.
However, adhering to a few key best practices can help you get even more out of eBPF tracing:
- Choose the right eBPF tracing tools: You can find a lot of eBPF tracing tools out there. Some are open source and some are built into observability platforms. The best tool or tools for you depend on what you're trying to do. Do you just want to test out eBPF and experiment with writing eBPF code? If so, go ahead and play with an open source eBPF CLI tool. If you want to take advantage of eBPF tracing for production workloads, you will likely be better served by a user-friendly observability suite that uses eBPF under the hood to collect monitoring data.
- Optimize your eBPF code: Although eBPF is inherently more efficient than monitoring tools that run in user space, poorly written eBPF programs will not run as fast or efficiently as those optimized for performance. For that reason, if you choose to write your own eBPF code, it's worth learning to optimize it if you want to get the very best performance possible.
- Keep eBPF programs up-to-date: Because eBPF continues to evolve, the best eBPF programs and tools available today might not be ideal in the future. Follow the eBPF development landscape to ensure that you're always taking advantage of the latest innovations in eBPF – or, find an observability vendor that updates its software whenever eBPF is updated.
- Secure eBPF data: Because the data collected through eBPF is so granular and detailed, it could potentially help those with malicious intent to find ways to compromise your system by, for example, alerting them to vulnerabilities that exist within your systems. Avoid this risk by keeping eBPF data secure and sharing it only with team members who have a reason to access it.
Real-world applications of eBPF tracing
A few years ago, eBPF remained mostly an experimental technology that was generating a lot of buzz, but had not yet been deployed for many real-world purposes. That's no longer the case. Today, eBPF is deployed widely for a variety of use cases at companies large and small.
For example, check out this talk to learn how engineers at Google take advantage of eBPF to monitor how efficiently resources are used and to help manage user space applications. Or, read the Android documentation for details on how Android uses eBPF to support debugging and data collection. (Did you know you probably have eBPF programs running on the Android phone in your pocket? Well, now you do!)
eBPF is also in use by telcos to optimize network performance, by eCommerce platforms to assist in intrusion detection and by cloud computing providers to monitor infrastructure performance.
The list could go on, but you get the point: Unlike, say, generative AI, which has created lots of buzz but which people are still trying to figure out how to use for real-world purposes, eBPF tracing is ready for primetime here and now. The fact that it continues to evolve doesn't mean you can't start taking advantage of it to improve your approach to monitoring.
Conclusion: An insanity-free approach to system tracing
You could keep monitoring your workloads the old, crazy way. You could deploy monitoring agents that run in user space and suck up tons of resources just to tell you, in a way that is not particularly granular or scalable, whether applications are sucking up too many resources. And then you could spend the rest of your day figuring out how to ask your boss for more infrastructure budget to support your monitoring software.
Or, you could leverage eBPF tracing, the latest and greatest way to achieve low-level visibility into software performance without all the headaches and gotchas that come with traditional performance monitoring. And thanks to observability suites that leverage eBPF to collect data, getting started with eBPF tracing is easier than you might imagine.