How Posh found issues with only 1 hour invested into the POC

In sharp contrast to its experience with Elastic, which the team spent 3-4 weeks trying to get running before giving up, Posh chose groundcover to replace Datadog in a seamless migration process.

Gilles Perez

May 14, 2025

Industry:

FinTech

Company Size:

150+ employees

Founded

2018

Headquarters

Boston, MA

“The whole proof-of-concept took about a day. Security approved the role in the afternoon, I deployed the sensor in under two hours, and suddenly we had trace-level insight - even breaking out individual GraphQL operations we’d never seen before. groundcover was the snappiest, easiest observability platform we’ve touched.”

Adam Ceresia

Software Engineering Manager

Posh

About Posh

Posh is a venture-backed conversational AI company that provides chatbots and voice assistants tailored for the banking industry. Dozens of financial institutions rely on Posh’s platform to automate customer service via natural language conversations - a mission-critical task where reliability and responsiveness are paramount. On the technology side, Posh’s infrastructure is built on a modern microservices architecture (leveraging GraphQL APIs) running in Kubernetes. With a lean engineering organization (Posh has ~150 employees, and a platform team of just 3), every tool needs to deliver value without heavy maintenance. This was especially true for observability across Posh’s development, staging, and production clusters, which handle thousands of chatbot interactions and requests every day.

The Challenge: High costs, blind spots, and DIY frustrations

As Posh’s platform grew, the limitations of its existing observability stack became apparent. The team had been using Datadog APM and logs, but the costs skyrocketed as data volumes increased. They continuously tried to rein in usage - trimming log retention, limiting host and container counts - to stay under budget. Eventually, only the production environment remained fully monitored in Datadog, leaving development and staging with minimal coverage.

“Datadog was very expensive for us. We spent two years saying ‘we gotta reduce the logs, reduce the number of hosts, reduce the number of containers.’ It got to the point where only production was in Datadog. We even built out Grafana open source, but that was a huge time sink, wasn’t very stable, and honestly wasn’t that great.”

- Adam Ceresia, Software Engineering Manager, Posh

In an effort to reduce reliance on Datadog, Posh’s team created a self-hosted Prometheus/Grafana stack for metrics and dashboards. However, maintaining an open-source monitoring stack proved costly in terms of engineering time. A three-person platform team simply couldn’t afford to spend weeks babysitting monitoring tools.

Even then, the open-source solution lacked some capabilities (like distributed tracing) and still didn’t solve the visibility gap in non-prod environments. Posh’s engineers felt they were flying partially blind - especially for complex requests going through their GraphQL services - and the overhead of managing multiple monitoring tools was dragging them down.

Why groundcover?

Posh knew it could do better and began evaluating virtually every observability vendor on the market. Elastic, Honeycomb, Sentry, Sumo Logic, Splunk, New Relic, Dynatrace, Grafana Cloud - you name it, Posh considered it. But each alternative came with trade-offs.

Many commercial APM suites were as cost-prohibitive as Datadog, or introduced integration headaches with Posh’s existing OpenTelemetry instrumentation. In one case, a proof-of-concept with Elastic consumed 3-4 weeks of an engineer’s time before the team gave up.

“For a three-person team, dedicating one person for about a month and it still didn’t work… that’s a lot of effort. The groundcover PoC was the opposite experience: security signed off in the afternoon, I deployed the eBPF sensor, pointed our OpenTelemetry pipeline, and within an hour the dashboard lit up with live traces - no code changes, no firefighting.”

- Adam Ceresia, Software Engineering Manager, Posh

When Posh trialed groundcover, by contrast, the experience was night and day. They deployed groundcover’s lightweight sensor across their Kubernetes clusters in minutes, not weeks. Almost immediately, groundcover began collecting metrics, logs, and traces from Posh’s services with zero code changes - thanks to its eBPF-based data collection. The team was surprised by how quickly they started seeing actionable insights. In fact, groundcover’s proof-of-concept was up and delivering value within a day, allowing Posh to confidently move forward much faster than with any other vendor.

A closer look at the evaluation and deployment of groundcover:

Effortless deployment and immediate visibility: Installing groundcover as a DaemonSet in Posh’s Kubernetes clusters was straightforward and required no changes to application code. The eBPF-powered agents auto-instrumented everything from HTTP calls to database queries. This meant full-stack observability from day one - Posh instantly gained visibility into all their microservices (and even network flows) without writing a single line of instrumentation. In particular, the team discovered that groundcover could trace GraphQL requests at the operation level, a capability they hadn’t seen elsewhere. With groundcover, they could finally see traces broken down by individual GraphQL query/mutation, which was impossible before. This out-of-the-box insight into GraphQL performance helped the team pinpoint slow resolvers and understand variability in request latencies that previously baffled them.

“We use GraphQL quite a bit, and none of the other services were distinguishing between the GraphQL methods. groundcover is the first tool that breaks out every individual GraphQL operation; those murky ‘POST /graphql’ calls that Datadog lumped together suddenly became clear, query-level traces we can monitor and alert on.”

- Adam Ceresia, Software Engineering Manager, Posh

High performance with minimal overhead: groundcover’s use of eBPF and its BYOC architecture (data stays in Posh’s own environment) meant that adding deep observability did not burden the platform. The agent’s CPU and memory footprint was negligible, and it integrated cleanly with Posh’s existing tooling (including OpenTelemetry). Unlike some other vendors that claimed to support OpenTelemetry but required custom agents or suffered ingestion issues, groundcover accepted Posh’s standard OTLP data with no hassle. The result was a smooth integration - Posh was able to feed their existing OpenTelemetry traces/metrics into groundcover and immediately see them in the unified dashboard. This saved the team from instrumenting everything yet again or dealing with vendor-specific SDKs (one vendor “wanted you to use their OpenTelemetry package, which kind of defeats the purpose,” Adam noted). In short, groundcover delivered the promised capabilities without weeks of tinkering or performance tuning.

Predictable, flat pricing model: A huge factor in Posh’s decision was groundcover’s node-based pricing. Unlike Datadog (and most competitors), which charge by data volume or host count and produce unpredictable bills, groundcover charges a fixed flat rate per node. This was a game-changer for Posh. With groundcover, Posh no longer has to constantly prune data or worry that enabling extra monitoring in a dev environment will break the budget. This pricing predictability gave the team confidence to roll out observability everywhere (and turn on more detailed logs or traces when needed) without fear of a surprise bill. Essentially, groundcover decoupled their observability costs from usage growth - a crucial benefit for a scaling startup.

Fast, painless proof-of-concept: groundcover proved its value to Posh in a matter of hours, not weeks. The ease of deployment and immediate insights meant the team spent very little time on setup and could focus on evaluating the features that mattered. In calls with groundcover’s team, Posh was able to see their own services’ data populating dashboards almost right away. This rapid time-to-value stood in stark contrast to other vendors’ POCs that dragged on. The low effort required also meant that even Posh’s developers could try groundcover in lower environments without extensive coordination. By the end of a short evaluation, Posh saw that groundcover checked all their boxes - comprehensive visibility, better performance, and sane pricing. The decision to choose groundcover became an easy one.
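
To make the GraphQL point above concrete, here is a minimal Python sketch of what "breaking out" a single POST /graphql endpoint by operation means. It is only an illustration of the idea, not groundcover's implementation (which, per the article, works through eBPF without application changes). The function names, the regular expression, and the sample requests are all hypothetical; the sketch assumes each request body is the usual JSON object with a "query" string and an optional "operationName".

```python
import json
import re
from collections import defaultdict
from statistics import mean

# Matches a named operation such as "query GetBalance" or "mutation SendMessage".
# Hypothetical helper pattern; anonymous operations like "{ me { id } }" do not match.
_NAMED_OP = re.compile(r"\s*(query|mutation|subscription)\s+([_A-Za-z][_0-9A-Za-z]*)")


def operation_name(body: str) -> str:
    """Return a best-effort operation name for one GraphQL request body."""
    payload = json.loads(body)
    # Prefer the explicit operationName field when the client sent one.
    if payload.get("operationName"):
        return payload["operationName"]
    # Otherwise fall back to the name written in the query document itself.
    match = _NAMED_OP.match(payload.get("query", ""))
    return match.group(2) if match else "anonymous"


def latency_by_operation(requests):
    """Group (body, latency_ms) pairs by operation and average each group."""
    buckets = defaultdict(list)
    for body, latency_ms in requests:
        buckets[operation_name(body)].append(latency_ms)
    return {op: mean(values) for op, values in buckets.items()}


# Made-up sample traffic: every entry would show up as "POST /graphql"
# in a tool that only looks at the HTTP method and path.
sample = [
    ('{"operationName": "GetBalance", "query": "query GetBalance { balance }"}', 40),
    ('{"query": "mutation SendMessage($t: String!) { send(text: $t) }"}', 900),
    ('{"operationName": "GetBalance", "query": "query GetBalance { balance }"}', 60),
    ('{"query": "{ me { id } }"}', 15),
]

print(latency_by_operation(sample))
```

Viewed only as POST /graphql, these four requests span 15 ms to 900 ms with no explanation; grouped by operation, the slow mutation stands apart from the fast queries, which is the kind of separation the team describes.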

The Impact

After a successful evaluation, Posh began standardizing on groundcover as its observability solution across all environments. They rolled out groundcover’s monitoring to every Kubernetes cluster - dev, QA, staging, and prod - eliminating the visibility gaps they had before. Now, whether an engineer is debugging an issue on a local dev cluster or tracking an incident in production, groundcover is the go-to tool for metrics, logs, and traces. This universal coverage has made it far easier to catch problems early and ensure consistency when promoting code to production. Features like trace views and service maps are used in development and test cycles, not just in firefights after deployment. By adopting groundcover, Posh achieved several key results:

Full observability in every environment: With groundcover in place, Posh can now easily monitor 100% of its services across all environments. Thanks to the platform’s flat pricing model, they’re able to ingest all the telemetry they need without worrying about cost constraints. Developers are now empowered to enable detailed logging or tracing in lower environments without incurring huge costs. This has improved the quality of testing and gives the team much more confidence before changes reach customers. No service is left as a blind spot - even experimental features in dev get the same level of visibility as production.

Improved visibility & faster troubleshooting: With groundcover, Posh finally has deep insight into their GraphQL-based architecture and more. They can see performance metrics for each GraphQL query or mutation, identify slow database calls, and correlate logs with traces seamlessly. In the past, when a request spanned multiple services (especially via GraphQL), their old tools would show a single opaque “POST /graphql” entry with a broad range of response times. Now, engineers can pinpoint exactly which GraphQL resolver or downstream service is causing a slowdown. When a GraphQL service sat in the middle, Datadog traces were worthless because they didn’t break down the queries. groundcover fixed that by distinguishing each GraphQL operation and showing its individual latency. This level of detail has significantly accelerated Posh’s troubleshooting. Issues that might have taken days of guesswork can now be resolved in hours with concrete data. Overall, mean time to resolve incidents is dropping, and the team spends less time in war-room debugging sessions.

Lower cost and eliminated wasteful effort: groundcover’s efficient data handling and pricing model immediately reduced Posh’s observability spend. They no longer have to pay by the gigabyte or push data into expensive third-party log stores. In fact, Posh was able to cancel a project aimed at aggressively reducing logs across the engineering org - a project that existed solely to cope with Datadog/GCP logging costs. With groundcover, such extreme cost-cutting measures are unnecessary, freeing the engineers to focus on feature development rather than playing “data janitor.” Moreover, Posh has avoided the significant engineering effort that would have been required to maintain and scale an open-source monitoring stack. The team can trust groundcover to handle the heavy lifting of observability, saving them person-months of work in the long run. For a small team, this is invaluable - those reclaimed cycles are now spent on product improvements and reliability enhancements that directly benefit Posh’s customers.

Smooth migration path from Datadog: Transitioning from an entrenched platform like Datadog can be daunting, but Posh’s move to groundcover was relatively straightforward. groundcover’s support for OpenTelemetry meant that the instrumentation already present in Posh’s services could be seamlessly redirected - they didn’t have to rewrite their metrics or tracing code. Additionally, groundcover provided a Terraform provider for managing dashboards and alerts as code. Posh took advantage of this to codify monitors equivalent to those they had in Datadog. Today, 80-90% of their alerts are managed through Terraform definitions, making the monitoring setup reproducible and version-controlled. This Infrastructure-as-Code approach simplified the migration of dozens of alerts and dashboards. All of Posh’s data has already been moved into groundcover. The migration has been low-friction, with groundcover’s team supporting Posh through the process.

“We were having to constantly clamp down on logging because of cost. Now with groundcover, I can encourage my developers to log more. If there’s something they’re not seeing and they want to add a log, it’s not a concern anymore… We took that whole ‘reduce the logs’ project and just dropped it.”

- Adam Ceresia, Software Engineering Manager, Posh
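
The reason "log more" stops being a budget question under per-node pricing can be shown with a little arithmetic. The sketch below is purely illustrative: the node count, data volumes, and both prices are placeholder numbers chosen for easy math, not figures from Posh, groundcover, Datadog, or any other vendor, and the volume-based side models only a per-gigabyte charge.

```python
def monthly_bills(nodes, gb_ingested, price_per_node, price_per_gb):
    """Compare two simplified pricing models for one month.

    per_node: flat rate multiplied by the number of nodes.
    per_gb:   rate multiplied by the volume of telemetry ingested.
    All inputs are hypothetical placeholders, not real vendor prices.
    """
    return {
        "per_node": nodes * price_per_node,
        "per_gb": gb_ingested * price_per_gb,
    }


# Same cluster size in both scenarios; only the logging volume changes.
baseline = monthly_bills(nodes=20, gb_ingested=500, price_per_node=30, price_per_gb=1)
verbose = monthly_bills(nodes=20, gb_ingested=1500, price_per_node=30, price_per_gb=1)

print("baseline:", baseline)
print("verbose: ", verbose)
```

With these placeholder inputs, tripling the ingested volume triples the per-gigabyte bill while the per-node bill does not move, because it depends only on cluster size. That is the decoupling the quote above describes.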

Posh’s journey with groundcover highlights how a small engineering team can achieve big-company observability. By replacing their expensive, patchwork monitoring stack with groundcover’s unified platform, Posh eliminated blind spots and freed themselves from punitive costs. Engineers at Posh now have the confidence to monitor everything - from a new feature in dev to a critical production API - with the same clarity and detail. This has not only improved their system’s reliability, but also empowered the team to iterate faster and deliver a better experience to Posh’s banking customers, all without breaking the bank.
