
We didn't bolt AI onto our platform. Here's why that matters.
Most observability platforms bolt AI onto a SaaS backend. groundcover took a different path: a native, context-aware agent running inside your environment with eBPF-level visibility and complete data control.
Every observability platform has shipped an AI feature in the last six months. The architecture is almost always the same: a chat interface appears somewhere in the product, your logs and traces get sent to a third-party model, an answer comes back. It's fast to build. It's easy to demo. And I think it's fundamentally the wrong approach.
I want to explain how we thought about this differently, not because I'm trying to score points against competitors, but because I think the architectural choices we made are worth understanding. They have real consequences for what the agent can actually do.
The problem with bolting AI on
When you treat AI as a layer you add on top of an existing platform, you inherit every limitation of that platform. If your observability data lives in a SaaS backend that was ingested, stored, and queried on someone else's infrastructure, then your AI agent is working from the same position as a third party. It can only see what's been instrumented. It can only call the APIs you've exposed. It gets rate-limited like any other external consumer.
We've seen this play out with a few vendors who built entire standalone products for their AI experience: a separate app, a separate interface, completely decoupled from the investigation context a developer was already working in. You leave what you're looking at, ask a question, get an answer, and then navigate back. You lose the thread every time you cross that boundary.
That's not an AI-powered platform. That's a chatbot sitting next to a platform.
The best engineering happens when the product leads the engineering: when you know exactly what the user expects, you can make fundamental architectural choices accordingly.
We made a different bet from the beginning. The agent had to be native and context-aware of wherever you are in the product, capable of creating real groundcover assets like dashboards and monitors and gcQL queries, and running inside your own environment rather than ours. It took longer to build. It required us to rethink how we expose data internally and to build a unified query language on top of all our data sources. But I think that's exactly right.
Why eBPF changes what the agent can answer
There's another problem with bolting AI onto a traditional observability stack, and it goes deeper than architecture. Most observability platforms are limited by what developers have manually instrumented. If a service was never set up with OpenTelemetry, it doesn't exist to the platform and it doesn't exist to the agent.
groundcover deploys an eBPF sensor at the kernel level. It doesn't require developer instrumentation. It captures telemetry automatically from every workload running on the infrastructure, and it enriches every signal (logs, traces, metrics, events) with a shared identifier at ingest so they can be correlated without any manual wiring.
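The enrich-at-ingest idea can be sketched in a few lines. This is an illustrative Python example, not groundcover's actual pipeline; the `workload_id` field name and the signal shapes are assumptions made for the sketch.

```python
from collections import defaultdict

def enrich(signal: dict, workload_id: str) -> dict:
    """Attach a shared workload identifier to a signal at ingest time."""
    return {**signal, "workload_id": workload_id}

def correlate(signals: list[dict]) -> dict:
    """Group already-enriched signals by their shared identifier, so logs,
    traces, metrics, and events line up without manual wiring."""
    by_workload = defaultdict(list)
    for s in signals:
        by_workload[s["workload_id"]].append(s)
    return dict(by_workload)

# Signals from different sources, stamped with the same identifier at ingest
signals = [
    enrich({"type": "log", "msg": "connection refused"}, "checkout-svc"),
    enrich({"type": "trace", "span": "db.query", "ms": 1200}, "checkout-svc"),
    enrich({"type": "metric", "name": "p99_latency", "value": 1.2}, "payments-svc"),
]

grouped = correlate(signals)
print(len(grouped["checkout-svc"]))  # 2
```

Because the identifier is stamped once at ingest, any later consumer (a dashboard, a query, an agent) gets correlation for free rather than joining signals after the fact.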
The practical difference is significant. With an instrumentation-dependent agent, you can ask questions about the services you configured. With eBPF as the data foundation, you can ask questions about your entire infrastructure including the parts no one thought to instrument.
Try asking "How many databases am I running?" with manual instrumentation. You'll get a partial answer at best. With eBPF, that's a straightforward question because we actually see everything. The agent can answer it accurately because the data is complete. This isn't a feature difference. It's a structural one.
Data that leaves your environment is data you don't control
The third reason we built the way we did is compliance and data sovereignty, and this one matters more than people realize until it's too late.
Your observability data isn't just logs and traces. It's API keys showing up in payloads, service credentials in environment variables, traffic patterns that reveal your architecture, error messages that expose your vulnerabilities. When you give an AI agent your API key and ask it to fetch your data, you're creating two external actors handling your most sensitive production information: the observability vendor and the AI provider, with limited controls over what either does next.
Our agent deploys on Amazon Bedrock inside the customer's own AWS account. It's provisioned automatically during onboarding. Prompts never leave the environment. Production data never leaves the environment. Customers pay Bedrock token costs directly, at cost; we don't mark up tokens, the same way we don't charge for ingestion. They can set usage quotas per user or per team, the same model engineering teams already understand from tools like Cursor.
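The per-user quota model described above can be sketched simply. This is a hypothetical illustration of token budgeting, not groundcover's or Bedrock's actual API; the class and method names are invented for the example.

```python
class TokenQuota:
    """Per-user token budgets (hypothetical illustration)."""

    def __init__(self, limits: dict[str, int]):
        self.limits = dict(limits)      # user -> allowed tokens
        self.used: dict[str, int] = {}  # user -> tokens consumed so far

    def consume(self, user: str, tokens: int) -> bool:
        """Record usage; refuse the request once it would exceed the budget."""
        spent = self.used.get(user, 0)
        if spent + tokens > self.limits.get(user, 0):
            return False
        self.used[user] = spent + tokens
        return True

quota = TokenQuota({"alice": 10_000})
print(quota.consume("alice", 8_000))  # True: within budget
print(quota.consume("alice", 5_000))  # False: would exceed the 10k budget
```

The point of the design is that the budget check and the model call both happen inside the customer's account, so enforcement doesn't depend on an external gatekeeper.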
We didn't bring the customer's data to the AI. We brought the AI to the customer's data.
One of our early customers told us they were able to roll out the agent without triggering a security review internally, specifically because the data never leaves their account. That's not a coincidence of our design. That's the point of it.
What this unlocks
When the agent is built natively on top of the platform, a few things become possible that aren't possible any other way.
The agent is accessible from any page in groundcover, context-aware of what you're looking at. Its output creates first-class groundcover objects: not text you have to copy somewhere, but actual dashboards, monitors, and query pipelines that live in your environment. Multiple agent tabs let you run parallel investigations. When the root cause might be in the codebase, the agent can use Cursor or Claude Code as a specialist tool to keep the investigation continuous rather than fragmenting it across different products.
There's also something important about the query language. We rebuilt our internal APIs to support one unified query language, gcQL, across all our data sources. This wasn't just for the agent. It's for everyone. Everything the agent does, a developer can do themselves. Every tool call the agent makes is visible and clickable. Every result can be taken and used as a native groundcover asset. The agent teaches rather than obscures.
We're still early
I want to be honest about where we are. The investigation and data exploration capabilities of the agent are solid. Asset creation, like generating dashboards and monitors, is still being improved. Dashboard creation in particular involves a lot of ambiguous design decisions, and we're still refining how the agent handles them. This is active work.
The memory system is also something we're thinking hard about. How does the agent retain knowledge about your infrastructure across sessions? How do you build organizational memory, a running understanding of what every service does and what its normal behavior looks like, without it getting stale or polluted? These are hard problems. I don't think anyone has fully solved them yet.
But the foundation is right. Running natively inside your environment. Seeing everything your infrastructure produces, not just the instrumented slice. Creating real assets that live in your platform. And doing all of it without your production data leaving your account.
That's what we're building toward. And I think the teams that will win in AI-powered observability are the ones who got the architecture right from the start — not the ones who bolted a chat interface onto a SaaS backend and called it an agent.