TAO Testing Brings Observability to Every Environment with groundcover
Customer Story

Chris Churilo
June 17, 2026

  • 0 monitoring-induced outages (down from 4 sev-1 incidents the prior year)
  • 5 environments now fully monitored (dev, staging + 3 production)
  • 100% coverage across all dev, staging & production clusters (up from production-only)

"groundcover is the perfect combination of helping you open source your stack, with fair, predictable pricing on the sensor — it delivers the promise the Dynatrace OneAgent never did."

Rafael Jose Tovar Garrido, Cloud Architect, TAO Testing

After cycling through Grafana and Dynatrace, the global digital assessment platform consolidated on groundcover, eliminating monitoring-induced outages, expanding coverage from production-only to every dev, staging, and production cluster, and turning a tool nobody touched into one developers open every day.

"If the monitoring tool went down, our applications went down."

"In the past year on our previous platform, we had four severity-one incidents because the monitoring tool was breaking our applications. With groundcover, if the sensor goes down, we just lose that slice of data — our applications keep running. That alone changed how I think about observability."

— Rafael Jose Tovar Garrido, Cloud Architect, TAO Testing

About TAO Testing

TAO Testing is a leading digital assessment and testing platform used by educational institutions, certification bodies, and enterprises around the world to create, deliver, and manage online assessments. The platform helps organizations build secure, scalable testing experiences spanning everything from high-stakes certification exams to classroom assessments and workforce evaluations, with a strong focus on reliability, accessibility, and compliance with open standards.

TAO Testing runs its platform on Kubernetes in Google Cloud (GCP), supported by a team of roughly 120 developers. As its assessment ecosystem grew globally, deep visibility into infrastructure performance and reliability became essential — both to protect a seamless experience for test takers and to let engineering teams catch and resolve issues before they reach production.

The Challenge: A Two-Stop Tour Through Expensive, Fragile Observability

Before groundcover, TAO Testing went through two observability platforms, and neither held up as the business scaled.

Stop one: Grafana. The team started with the open-source Kubernetes standard: Grafana with Prometheus, Tempo, and Loki. It was stable, but as the platform grew, the team didn't want to spend its time running the infrastructure required to monitor its infrastructure. Everything was managed by hand. Moving to Grafana's enterprise tier didn't solve it: every new capability around logs and traces required significant changes on TAO Testing's side, and the support experience fell short. The team canceled the contract.

Stop two: Dynatrace. The Dynatrace proof of concept looked perfect: full observability on the enterprise tier, with nothing to manage. But the enterprise pricing was out of reach for production use, so TAO Testing landed on a mid-tier plan and immediately hit a wall of metered costs. Reading logs cost money. Ingesting data cost money. Enabling functionality cost money. As Rafael put it, the team "paid and paid for basically everything." Reading the data they needed cost roughly 10x more than it had on Grafana.

“groundcover is the perfect combination of helping you open source your stack, with fair, predictable pricing on the sensor, which delivers the promise the Dynatrace OneAgent never delivered.”

— Rafael Jose Tovar Garrido, Cloud Architect, TAO Testing

That cost model forced painful trade-offs:

  • Coverage was rationed. Every application and namespace had to be tightly scoped, because anything outside the monitored set was prohibitively expensive. The team had monitoring in production only, not in dev or staging.
  • The tool became fragile. A failed Dynatrace upgrade could take down the very services it was meant to watch. Over one year, TAO Testing logged four severity-one incidents caused by the monitoring tool breaking its applications.
  • Developers stopped using it. Because every query and action carried a cost, Rafael had to tell engineers not to touch the tool. Some developers' last login was three years old. Adoption effectively died.

With a two-year contract running down and AI-driven tooling maturing fast, TAO Testing started looking for an alternative.

Why groundcover?

Rafael worked through a list of platforms on the market and kept coming back to the least-hyped option. The deeper he looked, the more the architecture made sense for a team that wanted full visibility without surrendering control of cost or reliability.

1. A BYOC, eBPF architecture that can't take down production. groundcover deploys into TAO Testing's own cloud, and its eBPF sensor is decoupled from the applications it observes. If the sensor fails, the team loses that window of data and nothing else. This directly answered the failure mode that had caused four sev-1s on the previous stack. It also removed the dependency on an external observability cloud to decide what could be monitored, when, and how.

2. Full coverage, no rationing. Freed from per-action and per-read billing, TAO Testing stopped scoping monitoring down to a handful of services. The team now collects metrics, logs, traces, and events across all clusters in every dev, staging, and production environment, rather than in production alone.

3. Pricing that isn't tied to how much you look. Because groundcover's model isn't driven by data reads or query volume, engineers can explore freely. As Rafael noted, groundcover doesn't make money when more data comes in, which keeps incentives aligned with the customer's.

4. Operationally simple to run. Installation through the BYOC model was straightforward, and the platform is easy to maintain and upgrade, a sharp contrast to the upgrade fragility the team had lived with before.

5. A fast, two-sided POC. TAO Testing validated groundcover from two angles at once, on staging data. Rafael's operations lens focused on install, maintenance, upgrades, and resilience, confirming that the platform's health is independent of application health. In parallel, software engineers tested correlation between logs and traces, querying, and dashboard building. When the team flagged early rough edges such as custom trace colors, public dashboards, and RBAC permissions, the groundcover team responded directly, including shipping a new "smart color" feature so visualizations adapt automatically.

Impact / Results

Coverage expanded from production-only to everywhere. TAO Testing is now fully deployed across dev, staging, and all three production environments, with full data from every cluster, giving the team visibility it simply could not afford before.

Tooling consolidated onto one platform. A multi-tool, hand-managed Grafana/Prometheus/Tempo/Loki stack and a metered Dynatrace deployment gave way to a single, unified observability platform.

Monitoring-induced outages eliminated. The decoupled eBPF architecture removed the failure mode behind four severity-one incidents in the prior year.

Cost anxiety removed from daily work. Reading data on Dynatrace cost roughly 10x what it had on Grafana; on groundcover, querying and dashboard-building no longer carry a per-action penalty.

Developers came back — and stayed. A tool some engineers hadn't logged into for three years became one the team uses constantly. With open access, everyone can build personal dashboards and explore data freely. In Rafael's words, "they are super happy now."

Today TAO Testing actively uses groundcover for infrastructure monitoring, traces (a daily tool for developers and operations, e.g. filtering by 502 errors during incidents), synthetics (proactively detecting outages and auto-alerting the support team in Slack), alerting, dashboards, and the new agent capabilities (covered in detail below).
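As a rough illustration of the synthetics-to-Slack pattern mentioned above, the sketch below turns the result of a synthetic check into a Slack incoming-webhook message. It is not groundcover's implementation: the function name, the check name, and the latency threshold are hypothetical, and the only assumption about Slack is that an incoming webhook accepts a JSON body with a "text" field.

```python
from typing import Optional


def build_slack_alert(check_name: str, status_code: int, latency_ms: float,
                      max_latency_ms: float = 2000.0) -> Optional[dict]:
    """Return a Slack webhook payload if the check failed, otherwise None.

    Hypothetical helper: a check "fails" on any 5xx response (for example
    a 502) or when the response is slower than max_latency_ms.
    """
    if status_code >= 500:
        reason = f"HTTP {status_code}"
    elif latency_ms > max_latency_ms:
        reason = f"slow response ({latency_ms:.0f} ms)"
    else:
        return None  # healthy check, nothing to send
    return {"text": f"Synthetic check '{check_name}' failed: {reason}"}


# Example: a 502 from a hypothetical "login-page" check produces an alert.
alert = build_slack_alert("login-page", 502, 130.0)
print(alert)
```

The payload would then be POSTed to the team's own webhook URL; keeping the decision logic in a pure function makes it easy to test without touching the network.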

Spotlight: AI-Powered Observability with groundcover

Two of the capabilities TAO Testing is most excited about are groundcover's agent mode and the groundcover MCP — together, they let the team investigate issues by asking what's happening in plain language instead of manually piecing together signals.

Agent mode

Agent mode is groundcover's in-platform AI: an assistant that investigates the stack from inside it. Rafael can ask, in plain language, what's going on with the system, and the agent investigates across the team's observability data, pulling scoped results for a specific time window, application, or container and assembling the full picture in one place: Kubernetes events, pod logs, response and respawn times, and more. Because groundcover is built on an eBPF sensor, the agent sees that depth automatically, without anyone manually instrumenting services. "It's fantastic," Rafael said. "It helped me understand, because you have the full picture."

groundcover MCP

The groundcover MCP server delivers that same production-grade telemetry to the tools engineers already use. Rafael queries live observability data through Cursor, asking questions in natural language from inside his IDE, with no query syntax and no tool-switching, so groundcover context sits right where developers are already working.

Scaling it across the team

Because the experience is so powerful, TAO Testing is being deliberate about how it scales. Agent mode is enabled for Rafael today while the team sets the right guardrails before a broader rollout to its ~120 developers. Importantly, the model runs on Vertex AI in TAO Testing's own Google Cloud environment, so the team pays those token costs directly, with no groundcover surcharge on top. Usage is metered per query, so Rafael has set spend limits (groundcover supports quota budgets per user and team) to keep costs predictable before opening it up more widely. groundcover is also building workflows that route agent use through defined paths, so the whole team can get the benefit without unbounded cost.
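To make the idea of per-user and per-team spend limits concrete, here is a minimal sketch of a budget guard that admits a query only if both the user's and the team's budgets can absorb its cost. It is an illustration under stated assumptions, not groundcover's quota feature: the class names, the dollar amounts, and the per-query cost are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Budget:
    """A simple spend limit in USD (hypothetical unit)."""
    limit_usd: float
    spent_usd: float = 0.0

    def can_afford(self, cost_usd: float) -> bool:
        return self.spent_usd + cost_usd <= self.limit_usd


@dataclass
class BudgetGuard:
    """Charges a query to the user and the team, or rejects it entirely."""
    team: Budget
    users: dict = field(default_factory=dict)  # user name -> Budget

    def allow_query(self, user: str, cost_usd: float) -> bool:
        user_budget = self.users[user]
        # Check both budgets before charging either, so a rejected
        # query never consumes part of a budget.
        if not (user_budget.can_afford(cost_usd) and self.team.can_afford(cost_usd)):
            return False
        user_budget.spent_usd += cost_usd
        self.team.spent_usd += cost_usd
        return True


guard = BudgetGuard(team=Budget(2.0), users={"rafael": Budget(1.0)})
results = [guard.allow_query("rafael", 0.5) for _ in range(3)]
print(results)  # the third query would exceed the user's 1.0 USD limit
```

Checking the user and team limits together is the design choice that matters here: a broader rollout stays bounded by the team budget even if individual limits are generous.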

Looking ahead, two roadmap items stand out for the team:

  • Agent-built Jira tickets — the feature Rafael is most looking forward to. His synthetics already alert Slack; next, he wants incidents to land as Jira tickets the agent assembles itself, complete with groundcover links, context, and a suggested root cause and fix, so the right person can jump straight in. A native Jira connector is already in development (under data sources → connectors).
  • One-click cardinality control — using the agent to spot and drop high-cardinality metrics in a single click, keeping both the backend and the bill in check.
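For readers unfamiliar with cardinality, the second roadmap item can be pictured with a small sketch: count the distinct label combinations (series) each metric produces and flag the metrics above a threshold as candidates to drop. This is an illustrative assumption about how such a check might work, not a description of groundcover's feature; the metric names, labels, and threshold are made up.

```python
from collections import defaultdict


def series_per_metric(samples):
    """Count distinct label sets per metric name.

    `samples` is an iterable of (metric_name, labels_dict) pairs.
    """
    seen = defaultdict(set)
    for name, labels in samples:
        seen[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


def high_cardinality(samples, threshold):
    """Return metric names with more than `threshold` distinct series."""
    counts = series_per_metric(samples)
    return sorted(name for name, count in counts.items() if count > threshold)


# Hypothetical data: a per-request ID label explodes the series count.
samples = [("http_requests", {"path": "/login", "request_id": str(i)}) for i in range(50)]
samples += [("cpu_usage", {"pod": "api-0"}), ("cpu_usage", {"pod": "api-1"})]
print(high_cardinality(samples, threshold=10))
```

In this toy data the request-ID label creates one series per request, which is the typical pattern that inflates both backend load and cost.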

Future / Next Steps

Looking ahead, TAO Testing plans to:

  • Roll out agent mode to its full engineering team — opening AI-powered investigation to all ~120 developers once cost guardrails are in place.
  • Adopt OpenTelemetry alongside the eBPF sensor to capture deeper telemetry from its internal applications.
  • Connect incident workflows to Jira, so alerts route into tickets with the links, context, and suggested root cause engineers need to act fast.
  • Quantify the coverage it gained by capturing a month of dev and staging data volume — visibility it could never afford to collect on its previous platform.

Ready to see what full observability without the metered bill looks like?

Book a demo to see how groundcover delivers full-stack, full-coverage observability in your own cloud — without per-query pricing or the risk of monitoring taking down production.
