
In our previous post, we explored why eBPF and OpenTelemetry are stronger together: eBPF provides the network-level coverage baseline that no amount of manual instrumentation can replicate, while OpenTelemetry SDKs fill in the application-level depth that wire-level visibility alone can't provide. That combination solves a problem that has plagued observability teams for years.
But there's a new problem that makes the old one look straightforward.
AI is generating a tidal wave of code.
Every new service, library, and model integration introduces new log shapes, new field conventions, and new instrumentation assumptions. The engineers who would traditionally own that pipeline - the ones who understand the transformation rules, the parsing edge cases, and the tribal knowledge that keeps telemetry usable - simply can't keep up. That model was already fragile - and at AI scale, it's collapsing entirely.
In this post we'll look at what this means for observability, how the instrumentation assumptions that teams have relied on are shifting, and how eBPF and OpenTelemetry are each stepping up to meet the challenge in different but complementary ways.
The AI-Generated Code Problem
As coding agents write more production code, a quiet assumption starts to break down: that someone on the team understands the instrumentation. Every new agent-generated service arrives with its own log shapes and field conventions that nobody on the team defined or anticipated, and the sheer volume outpaces any manual effort to review how that code is instrumented.
But the instrumentation ownership problem is actually the smaller of the two issues. The more fundamental problem is that traditional tracing concepts don't map well to LLM systems at all, regardless of who or what generated the instrumentation. Traces were designed around request/response cycles with meaningful latency and error signals. LLM agents break all of those assumptions. A single transaction might be an hour-long reasoning loop. Latency is expected and not indicative of a problem. Errors don't surface the way they do in conventional services. The span structure that works beautifully for a microservices architecture produces an incomprehensible wall of nested calls when applied to an agent chain.
eBPF sidesteps the instrumentation ownership problem at the infrastructure level. It doesn't matter who wrote the code or whether they thought about observability. If it makes HTTP calls, you'll see them. The network behavior is observable regardless of what generated the application logic. But for the deeper problem of understanding whether an LLM agent is actually behaving correctly, a different approach is needed entirely.
The Instrumentation Pipeline is Due for a Rethink
The artisanal approach to observability pipelines - hand-written transformation rules, manually maintained parsing logic, tribal knowledge encoded in regexes - was always a bottleneck. It just wasn't a visible one until the volume of code started outpacing the team's ability to keep up.
As AI-generated code accelerates, the pipeline needs to become adaptive rather than static.
The emerging approach is to have agents reason over raw telemetry, infer its structure, and generate transformation logic that standardizes attributes across services, models, and teams, rather than waiting for engineers to write rules by hand. Instead of forcing code to follow logging conventions, the pipeline learns what logs and traces actually emit and adapts to it.
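As a minimal sketch of what "learning what logs actually emit" could look like, the snippet below applies agent-inferred field mappings to normalize heterogeneous log records onto shared attribute names. Everything here is illustrative: in practice the mappings would be generated by an agent inspecting raw telemetry, whereas here they are hard-coded, and the shape names and field names are hypothetical.

```python
# Hypothetical: one inferred mapping per observed log shape. In a real
# adaptive pipeline, an agent would produce these by sampling raw records.
INFERRED_MAPPINGS = {
    "shape_a": {"msg": "body", "lvl": "severity_text", "svc": "service.name"},
    "shape_b": {"message": "body", "severity": "severity_text", "app": "service.name"},
}

def infer_shape(record: dict) -> str:
    """Pick the mapping whose source keys best cover this record."""
    def coverage(name: str) -> int:
        return sum(1 for k in INFERRED_MAPPINGS[name] if k in record)
    return max(INFERRED_MAPPINGS, key=coverage)

def normalize(record: dict) -> dict:
    """Rename known fields to canonical names; pass unknown fields through."""
    mapping = INFERRED_MAPPINGS[infer_shape(record)]
    return {mapping.get(k, k): v for k, v in record.items()}

print(normalize({"msg": "cache miss", "lvl": "WARN", "svc": "checkout"}))
print(normalize({"message": "timeout", "severity": "ERROR", "app": "search"}))
```

The point of the sketch is the inversion of ownership: the canonical schema is fixed, but the per-service mappings are disposable artifacts the pipeline can regenerate whenever a new log shape appears.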
The key challenge, and an active area of work in the community, is ensuring that agent-generated transformation rules are properly constrained and validated before being deployed to production pipelines. This is not a solved problem, but it's where the observability community is heading.
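One plausible shape for that validation step is to gate every candidate rule behind checks on a sample corpus before it reaches production. The sketch below is an assumption, not a real API: `validate_rule` and `REQUIRED_FIELDS` are hypothetical names, and the checks (no crashes, required canonical fields populated, no values silently dropped) are just examples of the kinds of constraints a team might enforce.

```python
# Hypothetical validation gate for an agent-generated transformation rule.
REQUIRED_FIELDS = {"body", "severity_text"}

def validate_rule(rule, samples: list) -> bool:
    """Reject a candidate transform that crashes, leaves required
    canonical fields unpopulated, or loses values from the raw record."""
    for raw in samples:
        try:
            out = rule(raw)
        except Exception:
            return False
        if not REQUIRED_FIELDS <= out.keys():
            return False  # canonical fields must be populated
        if set(out.values()) != set(raw.values()):
            return False  # renaming only: every value must survive intact
    return True

good = lambda r: {"body": r["msg"], "severity_text": r["lvl"]}
bad = lambda r: {"body": r["msg"]}  # silently drops the severity field

samples = [{"msg": "disk full", "lvl": "ERROR"}]
print(validate_rule(good, samples))
print(validate_rule(bad, samples))
```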
Observability for AI Agents
That said, observing AI agents themselves is a different and evolving problem entirely. As previously noted, traditional APM metrics - latency and error rates - are almost useless for LLM agents. What actually matters is the internal logic: the decisions being made, the tool calls being executed, the chains being followed, and whether the agent is using the right tools in the right order.
This is where the division of responsibility between eBPF and OpenTelemetry SDKs becomes particularly clear. eBPF gives you visibility into the calls to model providers and token usage at the network level, while SDK instrumentation and the evolving OpenTelemetry semantic conventions for AI handle the application-level logic that determines whether an agent is actually behaving correctly.
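To make that division concrete, here is a minimal sketch of the two views of the same LLM call: SDK-side span attributes using names from the still-evolving OpenTelemetry GenAI semantic conventions, and the network-level facts an eBPF sensor can observe with no code changes. The values are made up, and the convention names may shift between releases.

```python
# SDK view (application-level): attribute names follow the draft
# OpenTelemetry GenAI semantic conventions; values are illustrative.
llm_span_attributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 1432,
    "gen_ai.usage.output_tokens": 210,
}

# eBPF view (network-level): the same call seen from outside - destination,
# status, and duration are visible regardless of who wrote the code.
network_view = {
    "dst": "api.openai.com:443",
    "status": 200,
    "duration_ms": 2350,
}

total_tokens = (llm_span_attributes["gen_ai.usage.input_tokens"]
                + llm_span_attributes["gen_ai.usage.output_tokens"])
print(total_tokens)
```

Note what each side cannot see: the network view has no notion of tokens or prompts, and the SDK view disappears entirely for a service that was never instrumented.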
Sampling strategy is where this distinction becomes most apparent. The real signal in an LLM trace lives in the semantic content - the prompts, the intermediate reasoning steps, the tool calls, the model outputs - not in latency and error codes. Conventional sampling systematically discards exactly that content. Effective observability for LLM systems requires a shift toward content-aware sampling, scoring traces by quality, correctness, and alignment with task intent. Auto-instrumentation applied to AI SDKs compounds this by generating large volumes of spans that aren't particularly useful, making deliberate, targeted instrumentation far more valuable than comprehensive but noisy coverage.
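The idea of content-aware sampling can be sketched as a tail-sampling decision that scores a completed trace on semantic signals rather than latency. All field names and weights below are hypothetical - the point is the shape of the decision: keep every high-signal trace, plus a small random baseline of uneventful ones for coverage.

```python
import random

KEEP_THRESHOLD = 0.5
BASELINE_RATE = 0.05  # small random sample of "boring" traces for coverage

def score(trace: dict) -> float:
    """Score a finished agent trace on illustrative semantic signals."""
    s = 0.0
    if any(call.get("error") for call in trace.get("tool_calls", [])):
        s += 0.6  # a failed tool call is high-signal
    if trace.get("output_flagged_by_evaluator"):
        s += 0.5  # an evaluator judged the final answer wrong
    if trace.get("loop_iterations", 0) > 10:
        s += 0.3  # unusually long reasoning loop
    return min(s, 1.0)

def should_keep(trace: dict, rng=random.random) -> bool:
    """Keep high-signal traces always; sample the rest at a baseline rate."""
    return score(trace) >= KEEP_THRESHOLD or rng() < BASELINE_RATE

print(should_keep({"tool_calls": [{"error": "timeout"}]}, rng=lambda: 1.0))
print(should_keep({"tool_calls": []}, rng=lambda: 1.0))
```

Because the score depends on the whole trace, this has to run as tail sampling after the trace completes, which is exactly where head-based percentage sampling falls short for agent workloads.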
Why Specs Aren't Enough
The frameworks and conventions in this area are developing fast. The OpenTelemetry semantic conventions for LLM observability are actively evolving as the community works out what good looks like for agentic systems, and the gap between traditional observability tooling and the realities of LLM-based architectures is closing. The goal is to evolve without discarding existing infrastructure, adapting OpenTelemetry pipelines and tracing stacks to serve LLM systems rather than rebuilding from scratch.
There is, however, a compounding problem that the specs alone won't solve. AI systems don't produce incremental telemetry growth; they produce step changes. LLMs in production generate payload-heavy inference logs. Agent workflows create nested spans. Vector database calls stack up. Retry storms multiply events. Streaming outputs expand trace volume quickly. When your observability pricing is tied to ingestion and retention volume, every AI feature launch becomes a cost event, and the natural response is to sample more aggressively, retain less, and instrument more cautiously than you actually want to.
That tension directly undermines the deliberate, content-aware instrumentation approach that LLM observability demands. You can't make intelligent decisions about what spans to keep and what context to retain if economics are making those decisions for you.
This is where groundcover's approach becomes particularly relevant for teams building at AI scale. By running the observability backend entirely within your own cloud through a BYOC architecture, the cost of telemetry decouples from volume. Engineers can instrument deeply, retain the full context needed for model evaluation and agent behavior analysis, and apply content-aware sampling based on what actually matters, without triggering overage surprises. The eBPF sensor handles the network-level baseline automatically, while OpenTelemetry SDKs and groundcover's LLM observability layer handle the application-level depth, all without the economics forcing visibility compromises at exactly the moment when visibility matters most.