APM in groundcover: how OpenTelemetry became a first-class producer of APM measurements
Discover how groundcover transformed OpenTelemetry from a trace source into a full APM producer, enabling accurate service metrics, dashboards, alerts, and agent-driven observability without eBPF.
Our eBPF sensor is something we’re really proud of at groundcover. It attaches at the Linux kernel level and watches every network call, every HTTP request, every system event that passes through your cluster, with no SDK instrumentation, code changes, restarts, or sampling decisions. A new service shows up in your service map the moment it starts taking traffic, and the APM data is already there when you open the screen. You didn't instrument anything. That's the relationship with your own infrastructure we want every customer to have: see everything, out of the box, in your own cloud.
But "powered by eBPF" was never meant to be "eBPF only." Plenty of teams already emit OpenTelemetry, and we don't think choosing groundcover should mean abandoning the instrumentation you've invested in. So our goal is plain: OTel should be a first-class citizen next to the eBPF sensor.
That goal is harder than it sounds, and we'd rather show you the work than just claim the badge. We wrote about this recently in our two-part series on the OTel normalizer we built for GenAI — where "we support OpenTelemetry" turned out to mean something different to every SDK, framework, and provider, and where keeping the eBPF path and the SDK path producing identical output became the actual job. This post is the APM version of the same story. It's part of how we work in the open: not just telling you what we can do, but showing you what it took to get there, including the parts that are still in progress.
So here's the honest version of what the last stretch of APM work was really about.
APM: the observability metrics that show the big picture
APM is not the same thing as your infrastructure observability, and it's not the same thing as traces either. Infrastructure observability tells you about all your nodes, pods, containers, CPU, memory, the network between them. Traces tell you about individual requests. One span is one event: this request hit this endpoint, took 240ms, came back 500, called these three downstream services.
APM sits a level up from traces. It's the service-level story you build by aggregating thousands of those requests over time: this endpoint handled hundreds of thousands of requests over the last few minutes, at some error rate, with a p95 latency of so many milliseconds. Keeping those aggregates correct when your metrics come from multiple data sources is non-trivial. This post is about how groundcover delivers a complete and accurate APM story with both eBPF and OTel.
The screens existed before the data did
groundcover instruments your cluster at the kernel level with eBPF, so for a long time APM measurements had exactly one producer: the eBPF pipeline rolling up kernel-observed traffic into counters and latency distributions. If you ran groundcover the normal way, the API Catalog filled in, workloads showed up, the service map drew its edges.
But what would happen if you were an OTel-only deployment and were shipping spans into groundcover without the eBPF sensor? Unfortunately, the APM screens were empty. Empty catalog, empty workload list, no service-map edges.
The tempting read is "the UI is broken." It wasn't. The queries ran fine. There was just nothing to return. Those customers were sending plenty of traces. Their spans landed in the trace pipeline like they should. Nothing was turning those spans into APM measurements, because the only thing that knew how to do that was the eBPF path, and they weren't running it. So the fix wasn't in the UI. It was a new producer. groundcover added a sensor-side span processor that watches OTLP spans in the trace pipeline, aggregates them into per-bucket counters and DDSketch latency distributions, and flushes measurement records to ClickHouse every 60 seconds. Same shape of data the eBPF path produces, derived from OTel instead.
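To make the shape of that producer concrete, here is a minimal sketch of span-to-measurement aggregation. It is not groundcover's implementation: the span field names (`service`, `operation`, `end_time_s`, `duration_s`, `is_error`), the grouping key, and the `GAMMA` value are assumptions for illustration, and the latency histogram is a simplified DDSketch-style structure (log-spaced bins, no bin collapsing or merging).

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field

FLUSH_INTERVAL_S = 60   # flush cadence described in the post
GAMMA = 1.02            # assumed bin growth factor, roughly 1% relative error
MIN_DURATION_S = 1e-9   # assumed floor so log() is always defined


@dataclass
class Bucket:
    total_counter: int = 0
    error_counter: int = 0
    # Log-spaced latency histogram: bin index -> count.
    latency_bins: dict = field(default_factory=dict)

    def add(self, duration_s: float, is_error: bool) -> None:
        self.total_counter += 1
        self.error_counter += int(is_error)
        # Bin i covers durations in (GAMMA**(i-1), GAMMA**i].
        i = math.ceil(math.log(max(duration_s, MIN_DURATION_S), GAMMA))
        self.latency_bins[i] = self.latency_bins.get(i, 0) + 1

    def quantile(self, q: float) -> float:
        if self.total_counter == 0:
            raise ValueError("empty bucket")
        rank = q * (self.total_counter - 1)
        seen = 0
        for i in sorted(self.latency_bins):
            seen += self.latency_bins[i]
            if seen > rank:
                # Representative value that bounds relative error within the bin.
                return 2 * GAMMA**i / (GAMMA + 1)
        return 2 * GAMMA**max(self.latency_bins) / (GAMMA + 1)


def aggregate(spans):
    """Roll spans up into one Bucket per (window start, service, operation)."""
    buckets = defaultdict(Bucket)
    for s in spans:
        window = int(s["end_time_s"] // FLUSH_INTERVAL_S) * FLUSH_INTERVAL_S
        key = (window, s["service"], s["operation"])
        buckets[key].add(s["duration_s"], s["is_error"])
    return buckets
```

Each key in the result stands for one measurement record that would be flushed at the end of its 60-second window. Storing a sketch rather than raw durations is what lets quantiles such as p95 be computed later, and lets records be combined, without keeping every span.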
This is how OpenTelemetry became a first-class producer of APM measurements in groundcover, not just a source of traces. If you're an OTel shop, your APM screens now fill in from your own spans, with no eBPF required.
Two producers is a different problem than one
Here's the part that's easy to miss and impossible to undo once you see it. The moment OTel could produce APM measurements, groundcover had two producers — eBPF and OpenTelemetry — and both of them can describe the same operation.
That breaks aggregation in a quiet, nasty way. Ask "how many requests did this service handle" across both sources and you can count the same traffic twice. The graph renders a confident, wrong number without crashes or errors. That's the worst class of observability bug: the dashboard looks healthy while it lies to you.
So how did groundcover make APM trustworthy? First, groundcover made the source part of the data. Sensor-produced measurements get tagged source: ebpf, so the system can actually tell the two producers apart. If those tags are sloppy, every "just pick one source" rule downstream falls apart. Second, groundcover engineers recognized that source is part of what an APM measurement means, not a nice-to-have filter. The source picker is a single-select, mandatory query gate. You can see it in the query builder today, sitting right next to the Type dropdown.
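The double-counting failure is easy to reproduce on paper. The sketch below uses made-up measurement rows (the `checkout` service and the counts are hypothetical) in which both producers observed the same 1,000 requests, and compares an unpinned sum with a sum pinned to one source.

```python
from collections import defaultdict

# Hypothetical rows: both producers describe the same 1,000 requests.
rows = [
    {"service": "checkout", "source": "ebpf", "total_counter": 1000},
    {"service": "checkout", "source": "opentelemetry", "total_counter": 1000},
]


def total_requests(rows, source=None):
    """Sum total_counter per service, optionally pinned to a single source."""
    totals = defaultdict(int)
    for r in rows:
        if source is None or r["source"] == source:
            totals[r["service"]] += r["total_counter"]
    return dict(totals)


naive = total_requests(rows)                  # {'checkout': 2000}, counted twice
pinned = total_requests(rows, source="ebpf")  # {'checkout': 1000}
```

Nothing in the naive path raises an error, which is why the source has to be a required part of the query rather than an optional filter.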
The query contract
The same instinct shows up in how APM queries are shaped. An APM query now has to carry two things up front:
- a resource_type (what kind of service or dependency you mean)
- a direction: is_inbound:true or is_outbound:true
You can see both in the builder: the Type picker, and the Inbound / Outbound toggle.
The resource_type is a partitioning hint, so ClickHouse can prune whole protocol families before it does any real work instead of scanning everything and sorting it out later. The direction predicate forces you to decide whether you're asking about traffic into a service or out of it — mix the two and your aggregate blends two different behaviors into one meaningless line. Now, a bad query fails fast with a clear error instead of slowly returning a misleading chart. The guardrails make an APM query specific enough to be both fast and true.
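As a rough illustration of the fail-fast behavior, the contract can be sketched as a validation step that runs before any query is executed. The function name, the exception type, the error messages, and the `"http"` resource type are hypothetical; only the `resource_type`, `is_inbound`, and `is_outbound` names come from the contract described above.

```python
class APMQueryError(ValueError):
    """Raised when a query does not satisfy the APM query contract."""


def validate_apm_query(filters: dict) -> dict:
    """Return the filters unchanged if valid, otherwise raise immediately."""
    if not filters.get("resource_type"):
        raise APMQueryError("resource_type is required")
    inbound = filters.get("is_inbound") is True
    outbound = filters.get("is_outbound") is True
    # Exactly one direction: neither or both would blend two behaviors.
    if inbound == outbound:
        raise APMQueryError(
            "set exactly one of is_inbound:true or is_outbound:true"
        )
    return filters


# Passes: one resource type, one direction.
ok = validate_apm_query({"resource_type": "http", "is_inbound": True})
```

Rejecting the query up front is cheaper than running it: the caller gets an actionable message instead of a slow scan that ends in a misleading chart.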
APM as a real datasource
Pull all of this together and the advancement is bigger than "OTel works now." APM became a datasource you query the same way you query everything else in groundcover — the same gcQL-style path as logs, traces, events, entities, metrics, monitors. It got the same treatment they did: a place in the monitor wizard, source and type pickers wired into the API Catalog and the Workload API tab, and an MCP query_apm tool so an agent can ask about request rate, error rate, and latency quantiles directly. APM had been the only observability datasource without one.
The practical payoff: you can build alerts straight off real service telemetry. The counter fields (total_counter, success_counter, error_counter) and the latency fields (total_latency_seconds, latency_seconds_quantiles) are right there in the query surface, so the usual service alerts — traffic dropped, errors climbing, p95 regressing — evaluate against the actual measurements instead of some proxy metric you had to invent. The one rule to remember: when you aggregate, pin source:ebpf or source:opentelemetry, or the double-counting comes back.
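An error-rate alert over those counters might be evaluated along these lines. This is a sketch under stated assumptions, not the monitor implementation: the row layout, the sample counts, and the 5% threshold are hypothetical, while the `total_counter`, `error_counter`, and `source` names come from the query surface described above.

```python
def error_rate_alert(rows, source, threshold=0.05):
    """Return (error_rate, fires) for measurements pinned to one source."""
    pinned = [r for r in rows if r["source"] == source]
    total = sum(r["total_counter"] for r in pinned)
    errors = sum(r["error_counter"] for r in pinned)
    rate = errors / total if total else 0.0
    return rate, rate > threshold


# Hypothetical measurements for one service over one evaluation window.
rows = [
    {"source": "ebpf", "total_counter": 1000, "error_counter": 80},
    {"source": "opentelemetry", "total_counter": 1000, "error_counter": 20},
]

rate, fires = error_rate_alert(rows, source="ebpf")
```

The `source` argument is deliberately required. An alert that silently mixes producers would inherit the double-counting problem described earlier.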
Final Thoughts
"Support APM" was the start of the work. The challenge was getting the product, the backend, and the agent to agree on what an APM measurement means before any of them was allowed to chart it. Give groundcover a try in our playground, or launch groundcover on your cluster for free, whether you use our eBPF sensor or OTel.





