eBPF observability is like triple-ply toilet paper: It's one of those things you may have lived without for a long time, but once you try it, you can never go back.
After all, plenty of teams were doing observability long before eBPF became popular. But by making it possible to collect critical observability data through the Linux kernel – instead of via user space agents – in a hyper-efficient and super-secure way, eBPF enables a whole new approach to observability.
That doesn't mean, of course, that eBPF is the right solution for every observability need. But by and large, if you need to observe modern, complex applications, chances are that eBPF will help you do so in a more efficient and effective way than other observability strategies.
Keep reading for everything you need to know about eBPF observability, including how eBPF works, why observing applications from kernel space is so much better than other approaches and how to get started with eBPF as an observability solution.
What is eBPF Observability?
eBPF observability is the use of the extended Berkeley Packet Filter, or eBPF, to collect the data necessary to observe applications from kernel space.
eBPF is a technology built into the Linux kernel that lets you run small, sandboxed programs at kernel hook points, such as system calls, network events and tracepoints. Those programs can collect information about network usage, application processes and system resource usage. Thus, rather than having to deploy monitoring agents in user space to gather the data you need to understand what is happening deep inside your applications and the servers that host them, you can get that information via eBPF tracing instead.
The ability to collect observability data through the Linux kernel instead of through agents running in user space provides several key benefits:
- Data collection is much more efficient because eBPF programs consume minimal resources.
- eBPF programs run in sandboxed environments, which minimizes the risk of security breaches.
- The Linux kernel verifies eBPF programs for safety before it allows them to execute. This reduces the risk of buggy code in an eBPF program causing the kernel to crash.
- Because eBPF is built into the kernel source code of modern versions of Linux, you don't have to install any special frameworks to run eBPF programs. Nor do you have to load kernel modules or modify the Linux source code directly. All you need to install are some tools for interacting with the eBPF framework that is built into the kernel.
In short, eBPF enables simpler, more efficient and more secure observability.
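To make kernel-space data collection a little more concrete, here is a minimal sketch that counts system calls per process using bpftrace, one of the open source tools covered later in this article. It assumes bpftrace is installed and that you have root access; the variable and function names are hypothetical wrappers chosen for illustration.

```shell
# Sketch: count system calls per process name with a bpftrace one-liner.
# Assumptions: bpftrace is installed and sudo/root access is available.

# The eBPF program in bpftrace syntax: attach to the raw syscall-entry
# tracepoint and count hits per process name (comm).
SYSCALL_COUNT_PROG='tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# Hypothetical helper: runs until interrupted with Ctrl-C,
# at which point bpftrace prints the per-process counts.
count_syscalls() {
  sudo bpftrace -e "$SYSCALL_COUNT_PROG"
}
```

Calling count_syscalls loads the program into the kernel, where the verifier checks it before it runs. Nothing needs to be installed inside the applications being observed.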
How eBPF Observability Agents Optimize Application Performance
We already noted that eBPF programs consume minimal resources when they run. This is a really big deal, so let's talk about this eBPF advantage some more.
With traditional observability software, you typically had to deploy a monitoring agent in user space. You'd either install it on each node that hosted your workloads, or (on Kubernetes) you'd deploy it inside a Pod using the sidecar pattern. Either way, the observability agent ended up sucking up a fair amount of resources because it was essentially a middleman between the applications it observed and the kernel. If it wanted information, it had to ask the kernel for it (or get it through kernel logs), which led to higher resource consumption and left fewer resources available for actual workloads.
With eBPF observability agents, this problem goes away. Because eBPF programs run in kernel space, they don't have to ask the kernel to give the information they need, which means they consume fewer resources. Although the exact resource consumption levels of eBPF programs will vary depending on exactly what they do, you can generally expect your programs to require a small fraction – as little as 2 or 3 percent – of the resource allocation of a traditional observability agent.
eBPF Observability Use Cases
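Because eBPF can collect virtually any information that is available to the Linux kernel, it's a versatile tool that supports a wide range of observability use cases. Here's a look at five of the most common: network observability, Kubernetes observability, security monitoring, performance monitoring and tracing.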
eBPF can monitor L3, L4 and L7 network traffic flows without having to rely on iptables logs, the more traditional solution for network observability on Linux, to collect the data. This means that eBPF gives you deep visibility into network usage on a workload-by-workload basis, allowing you to identify network anomalies that could impact the performance or security of your applications.
Armed with this information, you can determine whether application problems are the result of networking issues or are caused instead by a problem with the application itself. In addition, monitoring network traffic with eBPF can alert you to security threats, such as DDoS attacks against your applications.
Kubernetes is a complex system consisting of many components that interact with each other in complicated ways. eBPF observability helps you understand these complex relationships so that you can pinpoint the source of performance issues.
With eBPF, you can monitor the resource consumption patterns of individual applications running on every node in Kubernetes. You can also gain granular visibility into the processes behind the Kubernetes control plane. And you can do all of this in a hyper-efficient way, without worrying that your observability software will suck up resources needed by your workloads.
By providing granular visibility into both network traffic and individual processes on every server, eBPF observability helps to identify and investigate a wide variety of security risks. With eBPF, you can detect network events that could be the sign of an attempted security breach, for example. You can also map processes to network connections to assess whether a process that is behaving in an unusual way is attempting to access network resources that it should not legitimately need to connect with.
It's important to note that eBPF on its own is not a full-fledged security solution. It doesn't provide capabilities like scanning applications for vulnerabilities or providing threat intelligence. But it is an efficient way of collecting the data you need to know when your environments might be under attack or have been compromised.
eBPF is an excellent all-around performance observability solution. It allows you to monitor the resource consumption of individual applications and processes in a highly granular way, which is valuable when you’re trying to figure out if, for example, a sudden spike in memory or CPU usage is due to a bug inside an application or a legitimate need by a process that runs as part of the application to respond to higher demand.
As we explained above, the fact that eBPF programs consume very few resources to run is a boon to observability, too. It frees teams from having to pay the "observability tax" that comes with traditional observability strategies that rely on resource-hungry monitoring agents hosted in user space.
When you're tracing an application, you are monitoring how it processes requests, and how different parts of the application respond to requests. Because the eBPF program can collect data about the resource consumption and network usage of individual processes, it provides critical context during tracing operations. It can help you identify the microservice that’s causing a bottleneck because it's consuming too many resources, for instance.
Who Should Use eBPF for Observability?
While eBPF is a great way to address many observability needs, it is not the ideal solution for every use case.
In general, you should use eBPF if the following statements are true...
Your workloads are hosted on Linux
Currently, eBPF requires the Linux kernel. Microsoft is developing a version of eBPF for Windows, but it's not currently ready for production.
Efficiency is a priority
The ability to run in a highly efficient manner is one of the major benefits of eBPF programs. That's an important advantage if you need to minimize the resource overhead associated with observability. But if you'd prefer to trade efficiency for simplicity, you might be better served by a monitoring agent that runs in user space – an approach that is typically simpler to manage, although the agent will almost certainly consume more resources than an eBPF program.
You’re able to master a new technology
Although eBPF and its predecessor (the Berkeley Packet Filter, or BPF) have been around for years, eBPF as a modern observability solution remains relatively new. For teams that don't have the capacity to learn a novel technology, more traditional observability strategies are probably a better fit than eBPF.
Taking advantage of eBPF is great in theory, but the reality is that many teams are already stretched thin putting out fires and keeping their current workloads running, so overhauling their observability strategy in favor of eBPF may just not be practical, at least in the short run.
You have full access to your environment
In general, running eBPF programs requires having root-level access to your server. (It's possible to configure the Linux kernel to support eBPF in unprivileged mode, but this is insecure and you're not likely to encounter configurations like this outside of dev/test environments.)
As a result, eBPF observability isn't viable if you're using a shared hosting environment where you lack root access. You also can't take full advantage of eBPF observability if you're running a VM that’s hosted on a physical server that you don't control. In that case, you can use eBPF to monitor and observe workloads hosted on the VM, but not to gain visibility into the underlying infrastructure.
You are running modern, cloud-native apps
You can use eBPF to observe any kind of application, including legacy apps or monoliths. But eBPF is most valuable when you're dealing with complex, distributed, cloud-native applications. For those apps, eBPF provides granular, process-by-process visibility into the state of each microservice. It also helps you understand the complex relationships between microservices, as well as how microservices map onto network usage patterns.
If you’re only supporting monoliths, though, eBPF might be more trouble than it's worth, because you don't require the same level of depth and context as you do when managing microservices-based, cloud-native apps.
How to Use eBPF to Collect Observability Data
There are two main ways to use eBPF today: Via standalone tools that allow you to execute eBPF programs, or via eBPF tooling that’s built into larger monitoring and observability solutions. The first approach makes sense if you want to experiment with eBPF or collect data on a one-off basis. For production use, though, you'll typically want to use a full-fledged observability solution that has eBPF baked in.
Standalone eBPF tools
A variety of command-line tools (which we'll discuss in more depth in the next section) allow you to deploy programs that leverage eBPF to collect observability data. These tools aren't built into the kernel source code by default, but they provide user space utilities for interacting with the eBPF code that is built into Linux.
The BCC toolkit is one popular collection of eBPF tools. You can install BCC on most Linux distributions through the package manager. On Ubuntu, for example, it is packaged as bpfcc-tools.
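As a sketch of the Ubuntu install, assuming an apt-based release, sudo rights and kernel headers for the running kernel being available from the archive (the function name is a hypothetical wrapper, not part of BCC):

```shell
# Sketch: install the packaged BCC tools on Ubuntu.
# Assumptions: apt-based Ubuntu release, sudo rights, and headers for the
# running kernel available as linux-headers-<release>.
install_bcc_tools() {  # hypothetical helper name
  sudo apt-get update &&
    sudo apt-get install -y bpfcc-tools "linux-headers-$(uname -r)"
}
```

Run install_bcc_tools once per host. Other distributions ship BCC under different package names, so check your distribution's package index first.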
Once installed, the package provides a variety of command-line utilities (which are documented on GitHub) that use eBPF to collect data. For instance, if you want to use eBPF to trace new system processes, you'd run the execsnoop tool.
Again, these tools are a handy way to collect some quick observability data with eBPF. But if you want to run custom eBPF programs, this is not the best approach.
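For tracing new processes, BCC's tool is execsnoop. The sketch below assumes Ubuntu's packaging, which installs the tool as execsnoop-bpfcc, and falls back to the plain execsnoop name used elsewhere; both function names are hypothetical helpers.

```shell
# Sketch: trace new processes with BCC's execsnoop.
# Assumption: Ubuntu's bpfcc-tools package installs the tool as
# execsnoop-bpfcc; other setups may expose it as plain execsnoop.

# Hypothetical helper: print the first matching tool found on PATH.
find_execsnoop() {
  command -v execsnoop-bpfcc || command -v execsnoop
}

# Hypothetical helper: run the tool as root, passing through any arguments.
trace_new_processes() {
  local tool
  tool=$(find_execsnoop) || { echo "execsnoop not found" >&2; return 1; }
  sudo "$tool" "$@"   # prints one line per exec() call until Ctrl-C
}
```

Calling trace_new_processes in one terminal and starting a few commands in another should show each new process as it is launched.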
Observability platforms that include eBPF
A growing number of observability platforms are now leveraging eBPF to collect data. For example, groundcover provides an eBPF observability agent called Flora for monitoring.
The advantage of using an observability platform that features eBPF is that you don't have to set up and deploy individual eBPF programs to collect observability data. Instead, you get the convenience of a prebuilt solution that takes advantage of eBPF to provide observability in a hyper-efficient and super-secure way.
eBPF Observability Tools and Frameworks
As we mentioned above, there are a number of eBPF tools and frameworks that let you deploy eBPF programs. Here's a list of the most popular open source solutions, along with a summary of their intended use cases and features:
- BCC: A general-purpose toolkit and library for running eBPF-based programs.
- bpftrace: A high-level tracing language that you can use to execute traces via eBPF.
- eCapture: An SSL capture tool, useful for network observability.
- Tracee: An eBPF-based toolkit tailored toward security monitoring and incident investigation.
- kubectl-trace: A kubectl plugin that helps schedule and manage eBPF-based programs running on a Kubernetes cluster.
This list is likely to grow as eBPF continues to gain popularity, and as the tooling surrounding it matures.
eBPF Best Practices for Observability
Although the efficiency and security of eBPF programs deliver key benefits no matter how you use eBPF, the following best practices can help you get the very most from the tool:
- Avoid eBPF unprivileged mode: Although requiring root access to use eBPF can be cumbersome, allowing any user or application to execute eBPF programs is a huge security risk because it gives them access to kernel space.
- Keep your kernel up to date: Since eBPF is built into the Linux kernel source code, updating your kernel ensures you have access to the latest version of eBPF.
- Keep kernel versions consistent: If you're running eBPF across multiple servers, installing the same kernel on each of them helps ensure consistent eBPF output, because every server then exposes the same eBPF features.
- Write separate programs for each task: eBPF delivers the greatest visibility when each program is designed to handle a specific observability task. It's better to develop separate eBPF programs for each use case you need to support than to try to shoehorn all of your observability needs into a single program.
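Two of the practices above lend themselves to simple pre-flight checks. The sketch below assumes the kernel exposes the kernel.unprivileged_bpf_disabled sysctl under /proc, and that you have already gathered one line per server in the form "host kernel-release" (for example via uname -r over SSH). Both function names are hypothetical.

```shell
# Sketch: pre-flight checks for unprivileged mode and kernel consistency.

# Report whether unprivileged eBPF is allowed on this host.
# $1: path to the sysctl file (defaults to the real location).
bpf_unprivileged_status() {
  local f=${1:-/proc/sys/kernel/unprivileged_bpf_disabled}
  if [ ! -r "$f" ]; then
    echo "unknown"
    return
  fi
  case "$(cat "$f")" in
    0) echo "enabled" ;;     # unprivileged users may load eBPF programs
    1 | 2) echo "disabled" ;; # privileged callers only
    *) echo "unknown" ;;
  esac
}

# Succeed only if every "host kernel-release" line on stdin
# reports the same kernel release.
kernels_consistent() {
  awk '{ print $2 }' | sort -u | awk 'END { exit (NR > 1) }'
}
```

For example, bpf_unprivileged_status should print disabled on a hardened host, and a file of host and release pairs can be piped into kernels_consistent as part of a deployment check.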
The Pros and Cons of eBPF for Observability
To sum up the points above, the key advantages of eBPF include the following:
- It's built into Linux, so there’s little to install or configure.
- eBPF code runs in a super-efficient way, leaving more resources for your actual workloads.
- eBPF safety verification and sandboxing minimize the risk of security or performance problems due to eBPF.
The main disadvantage of eBPF is that it can be complex to set up and run eBPF programs. However, modern tooling makes eBPF easier to use, especially if you take advantage of an observability solution that gives you turnkey access to eBPF, without having to install an eBPF toolkit or write programs for it by hand.
To say that eBPF is exciting is an understatement. We prefer adjectives like "mind-blowing" or "astounding," because that's what eBPF observability is. Thanks to incredible efficiency and security, eBPF makes it possible to redefine the way you observe workloads, especially those that involve complex, cloud-native architectures.
Historically, the difficulty of using eBPF was a major drawback. But that issue has disappeared thanks to modern observability solutions that bake in eBPF, allowing teams to take full advantage of the eBPF framework without working any harder than they would when using a traditional observability solution.