Product Updates
Shahar Azulay • Jun 17, 2026

Your expertise, running without you

Most agents guess at how your systems work. groundcover Agent Mode investigates incidents with your team's own judgment, telemetry, and cloud.


Your knowledge is the most valuable thing your team has, and the most trapped. It lives in your head. So when a service you designed falls over at 2 a.m., the fix is in your head too, which means it's a Slack DM waking you up, or the on-call engineer guessing without you. Every new hire relearns what you already know. You spend your days as a lookup table for the things only you can answer, instead of building the things only you can build.

That's the real cost of being the person who knows how everything works. It isn't the hours. It's that none of it scales past you.

This release is about closing that gap. It takes the way you investigate and makes it ambient, so it runs for the whole org whether or not you're awake. Observability stops being a place people go and becomes part of how the team already works. Here's what that looks like over a single day.

Daytime: you encode it once

You write your triage playbook down as a skill: the lookback window you always use, the services you check first when checkout gets slow, the exact format an incident summary has to land in. A skill is just a set of instructions the agent follows whenever they apply, written once. You publish it across the org, so it's no longer your playbook. It's everyone's.
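To make that concrete, here's a rough sketch of what such a skill might hold. It is not groundcover's actual skill format, which this post doesn't spell out. The `Skill` class, its field names, the service names, and the 60-minute window are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical shape of a skill: when it applies and what it instructs."""

    name: str
    applies_to: frozenset[str]         # alert tags that make the skill relevant
    lookback_minutes: int              # the lookback window you always use
    check_first: tuple[str, ...]       # services to inspect first, in order
    summary_sections: tuple[str, ...]  # the format a summary has to land in


def applicable_skills(skills: list[Skill], alert_tags: list[str]) -> list[Skill]:
    """Return every skill whose trigger tags overlap the alert's tags."""
    tags = set(alert_tags)
    return [skill for skill in skills if skill.applies_to & tags]


def render_instructions(skill: Skill) -> str:
    """Flatten a skill into plain-text instructions an agent could follow."""
    return "\n".join([
        f"Skill: {skill.name}",
        f"Look back {skill.lookback_minutes} minutes.",
        "Check these services first: " + ", ".join(skill.check_first),
        "Write the summary with these sections: " + ", ".join(skill.summary_sections),
    ])


# Illustrative values only; the service names and window are made up.
CHECKOUT_TRIAGE = Skill(
    name="checkout-latency-triage",
    applies_to=frozenset({"checkout", "latency"}),
    lookback_minutes=60,
    check_first=("checkout-api", "payments", "postgres"),
    summary_sections=("Impact", "Likely cause", "Proposed fix"),
)

if __name__ == "__main__":
    for skill in applicable_skills([CHECKOUT_TRIAGE], ["checkout", "p99"]):
        print(render_instructions(skill))
```

The point of the sketch is the "written once, applied whenever it fits" shape: the skill is data, and matching it to an alert is a lookup rather than a judgment call someone has to be awake for.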

Then you wire up the rest the way you'd set up any system you're responsible for. You connect the tools your team already lives in: Slack, Linear, Jira, and the coding agents they run, whether that's Cursor, OpenAI Codex, GitHub Copilot, or Claude. You set up the monitors and notification routes so the right alert reaches whoever's on call, fast. And you set the boundaries: who can reach what, and how far an agent is allowed to go on its own.

Twenty minutes of deliberate work. Then you close the laptop and go to dinner.

2 a.m.: it runs without you

Checkout latency trips an alert. You're asleep, and you stay asleep.

The alert lands in the on-call channel on its own. That's the notification route you set up before dinner, doing its job. The engineer who catches it has never touched this service. A year ago that meant an hour of digging and a 2 a.m. message to you. Tonight they pull up Agent Mode and give it two words: triage this.

From there it's agents working with agents. Carrying your skill, the agent reads the Slack history where someone flagged an afternoon Postgres migration, pulls the linked Linear ticket, and works the cause across logs, traces, metrics, and Kubernetes events. The engineer spins up a second agent to run the Postgres service down in parallel, so two lines of inquiry move at once. When the first lands on the likely cause, the handoff kicks in: it passes the fix to your coding agent, whether that's Cursor, Codex, or whatever your team runs, which opens a pull request, branch and all, for a human to approve.

The engineer reads the summary in the format you defined, approves the PR, and goes back to bed. They drove it with two words, but they were never alone, and they never needed to know this service the way you do. They never built a dashboard, and they never paged you. They resolved an incident using your judgment, encoded hours before they needed it.

The next morning: it compounded

The part that matters most happened while everyone slept. After that investigation, the agent can turn what it just did into a new skill. So the next person to hit this class of problem doesn't start where the on-call engineer started tonight. They start where the agent finished. Your expertise didn't run once and reset. It ran, then it leveled up, and now it's waiting for the next person.

That's the leverage. You encoded your judgment a single time, and it paid off for an engineer you didn't talk to, on a night you weren't working, and left the whole team smarter than it was yesterday.

The features behind that day are worth naming, because they're what make each step real, and what most "AI SRE" tools on the market can't do.

The data it works on is already there. None of this assumes a long instrumentation project. groundcover collects with eBPF, so a sensor drops in and captures your workloads out of the box, with no code changes and no OTel setup. And because the platform is OpenTelemetry-native, teams already emitting OTel get ingested and enriched right alongside everything the sensor sees. eBPF gives you the baseline with zero effort, OTel gives you the depth, and you don't have to choose between them. It all sits in your own cloud under BYOC, which is exactly why the agent can work on it where it already lives.

The agent carries your judgment, not generic guesses. Most agent tools start with no institutional knowledge. They don't know how your systems behave, so they guess. Skills are the difference. They shape what the agent knows (your playbooks, your conventions) and how it behaves (its tone, when to escalate, the lines it won't cross). Encode the playbook once and it stops guessing. That's the gap between a flashy demo and someone you'd put on call.

It reaches the tools where the work already happens. Connectors let the agent act inside Slack, Linear, Jira, and your coding agents under each user's own credentials, and nothing touches your repo until a human approves the pull request. Slack matters more than it looks. It's where developers work, and where the offhand clues that explain a problem tend to live, far from any dashboard.

It respects the boundaries you set, automatically. This is your job made enforceable. The agent takes on the identity of whoever triggered it, with the same RBAC and the same data scope, so if a person can see only one cluster, the agent working on their behalf can see only that cluster too. Every action, every agent, and every API call lands in an audit trail. That same BYOC boundary keeps your data in your own cloud, and the Claude models behind Agent Mode run through the managed service your cloud already trusts, whether that's Bedrock on AWS, Vertex AI on Google, or Foundry on Azure. Your prompts and telemetry never train those models and aren't kept past the request.
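Here's a minimal sketch of that idea: an agent that borrows the caller's scope and logs every attempt. This is an illustration of the pattern, not groundcover's implementation. `Identity`, `ScopedAgent`, the user name, and the cluster names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    """Hypothetical caller identity: a user and the clusters they may see."""

    user: str
    clusters: frozenset


@dataclass
class ScopedAgent:
    """An agent that can reach only what the person who triggered it can."""

    acting_as: Identity
    audit_log: list = field(default_factory=list)

    def query(self, cluster: str, question: str) -> str:
        allowed = cluster in self.acting_as.clusters
        # Record the attempt before enforcing, so denied calls are audited too.
        self.audit_log.append({
            "user": self.acting_as.user,
            "cluster": cluster,
            "question": question,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(
                f"{self.acting_as.user} has no access to {cluster}"
            )
        # A real agent would run the investigation here; this just echoes.
        return f"[{cluster}] investigating: {question}"


if __name__ == "__main__":
    agent = ScopedAgent(Identity("dana", frozenset({"prod-eu"})))
    print(agent.query("prod-eu", "why is checkout slow?"))
    try:
        agent.query("prod-us", "why is checkout slow?")
    except PermissionError as err:
        print("denied:", err)
    print(len(agent.audit_log), "audit entries")
```

Logging before the permission check is a deliberate choice in the sketch: an audit trail that only records successes would miss the attempts you most want to see.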

And it doesn't wall you off. groundcover exposes its own MCP server, so the agents and tools your team has already built can query it directly. Connectors are how our agent reaches out. The MCP server is how your agents reach in. It's the piece already showing up in our numbers: nearly 70% of our customers are actively using MCP.
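For a sense of what "reaching in" looks like on the wire: MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The sketch below only builds that request. The tool name `query_logs` and its arguments are assumptions for illustration, since this post doesn't list the tools groundcover's MCP server exposes.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical tool and arguments; check the server's tool list for real ones.
    request = build_tool_call(
        request_id=1,
        tool_name="query_logs",
        arguments={"service": "checkout-api", "lookback_minutes": 60},
    )
    print(json.dumps(request, indent=2))
```

In practice an MCP client library handles the transport and the initial handshake, and an agent would first ask the server which tools exist before calling one.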

Why this is the new shape of the job

None of this is a prediction. Three out of every four of our customers are already using at least one of these AI capabilities, on live production data. One of them recently used Agent Mode to work out a fix for a frontend bug that had them stuck, and shipped the workaround it landed on. The shift is happening now, and it's changing what a platform engineer's day is for. Less time being the human escalation path, more time deciding how the whole system should think.

The platform a company trusts to watch production will come down to how AI-native it actually is. The deeper change is quieter than that, though, and it's the one you'll feel first. The best version of your expertise is no longer the incident you personally solved. It's the one that got solved without you, the way you would have, because you taught the system once.

Shahar Azulay, CEO
