Omer Mayer
November 13, 2025

One of our guiding principles when thinking about the migration process off of legacy observability vendors was making it a minimal-effort process for our users. And of course, anyone who has switched (or used multiple) observability tools knows that one of the most fragile and time-consuming parts is mapping and validating metric to metric, data to data, across all resources: Dashboards, Monitors, Queries, and more.

  • Imagine disassembling your wardrobe when moving to a new house, only to remember that after the wardrobe was originally assembled, the instruction manual was, of course, thrown out.
  • Suddenly you realize that all the planks, screws, and holes, although they look similar, go in a very specific order. Assembling even one of these parts wrong means the wardrobe will be shaky, the doors won’t close properly, or it might collapse entirely.
  • After passing the point of exhaustion, you (hopefully) don’t give up and instead turn to the manufacturer’s support to rebuild it step by step with their help.

Enter groundcover movers to the rescue.

Having experienced movers who are familiar with the spec of every piece of furniture you own makes disassembling and reassembling so seamless and quick that you can’t even tell the difference. Pure magic.

We had three main challenges we were looking to tackle in this effort:

  1. Close the gap in our out-of-the-box metrics collected by our eBPF sensors. Many competitors have built an extensive list of metrics over the years they’ve been operating. In addition, Linux systems have evolved significantly over the last decades. We were strong in many areas, but had gaps we needed to close in others.
  2. Comprehensively map all the built-in metrics. Most legacy vendors use proprietary prefixes and styling. At groundcover we follow the Prometheus convention for naming metrics, along with a groundcover_ prefix for any metric our sensor collects, a standard endorsed by both the CNCF and the Linux Foundation.
  3. Enrich metrics with more metadata that can be used as labels when filtering, grouping, and exploring them.

Here’s an example of a challenge that might seem fairly easy, or even trivial, to solve at first, but with scale and performance at the top of our minds, requires creativity and wit:

  • We originally collected container metrics by parsing cgroup files ourselves with a caching layer that kept file handles open between collection cycles, minimizing expensive I/O operations and syscalls.
  • As our metric coverage grew, we had to choose between expanding this custom parser or adopting a more widely used and maintained solution. We decided to move to cAdvisor.
  • The problem: cAdvisor wasn’t built for scraping thousands of containers every few seconds. 
  • Its cgroups library calls open() on every cgroup file for every read (dozens of open/close cycles per container per scrape), which quickly became a performance bottleneck at scale.
  • While digging through the code, we found that the library exposes a function pointer for file operations, intended for testing and mocking. The problem: it was private. So we used the //go:linkname compiler directive to access and replace this private function pointer at runtime, redirecting all file operations through our caching layer (a rough sketch of the mechanism follows this list).
  • The outcome: 90% fewer file operations, performance on par with our custom collector, and all the benefits of cAdvisor’s coverage and maintenance without forking the library or maintaining thousands of lines of code.
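To make the trick concrete, here is a minimal sketch of the go:linkname approach. The import path, symbol name, and function signature are illustrative assumptions rather than the cgroups library’s actual internals, so it won’t link as-is against a real target, and the caching layer is reduced to its bare essentials:

    package cgroupcache

    import (
        "os"
        _ "unsafe" // required for the go:linkname directive
    )

    // Link a local variable to a private, package-level function variable that
    // the library uses for all of its file opens. The import path and symbol
    // name below are hypothetical placeholders, not the library's real identifiers.
    //
    //go:linkname libOpenFile example.com/cgroups/internal/fs.openFile
    var libOpenFile func(dir, file string, flags int) (*os.File, error)

    // handleCache keeps cgroup file handles open between collection cycles so
    // repeated scrapes don't pay for open()/close() on every read.
    type handleCache struct {
        handles map[string]*os.File
    }

    func newHandleCache() *handleCache {
        return &handleCache{handles: make(map[string]*os.File)}
    }

    // install swaps the library's private open function for a caching one.
    // A production version would also need to intercept Close (or hand out
    // wrappers) so cached handles survive the caller's cleanup; that detail
    // is elided here.
    func (c *handleCache) install() {
        original := libOpenFile
        libOpenFile = func(dir, file string, flags int) (*os.File, error) {
            key := dir + "/" + file
            if f, ok := c.handles[key]; ok {
                // Rewind the cached handle so the caller reads from the start,
                // exactly as a fresh open() would.
                if _, err := f.Seek(0, 0); err == nil {
                    return f, nil
                }
                delete(c.handles, key) // stale handle; fall through and reopen
            }
            f, err := original(dir, file, flags)
            if err != nil {
                return nil, err
            }
            c.handles[key] = f
            return f, nil
        }
    }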

To close off, let's review a simple example with a query from a legacy vendor: sum:kubernetes.cpu.usage.total{$cluster,$namespace} by {pod_name}

This is how the translation works (a code sketch of the mapping follows the step-by-step list):

  • Translate the metric: kubernetes.cpu.usage.total → groundcover_container_m_cpu_usage_seconds_total
  • Translate the label: pod_name → pod
  • Translate the conditions (in this case they are kept the same): $cluster, $namespace
    • (More on Variables in <translation blog>)
  • Translate the functions: sum: → sum by
    • (More on the query framework in <Amir’s blog>)
  • Finally, the query you get looks like:
    sum by (pod) (groundcover_container_m_cpu_usage_seconds_total{$cluster,$namespace})
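
As a rough illustration (not groundcover’s actual translation engine), the core of such a mapping can be sketched as a pair of lookup tables plus a query rewrite. The table contents and function name below are assumptions that simply mirror the example above:

    package translate

    import "fmt"

    // Illustrative mapping tables; a real migration covers the full metric and
    // label catalogs, these two entries just mirror the example above.
    var metricMap = map[string]string{
        "kubernetes.cpu.usage.total": "groundcover_container_m_cpu_usage_seconds_total",
    }

    var labelMap = map[string]string{
        "pod_name": "pod",
    }

    // TranslateSum rewrites a legacy "sum:<metric>{<filters>} by {<label>}" query
    // into the PromQL-style "sum by (<label>) (<metric>{<filters>})". It handles
    // only this single query shape; a real framework parses the full query
    // grammar instead of pattern-matching one form.
    func TranslateSum(metric, filters, groupBy string) (string, error) {
        m, ok := metricMap[metric]
        if !ok {
            return "", fmt.Errorf("no mapping for metric %q", metric)
        }
        l, ok := labelMap[groupBy]
        if !ok {
            l = groupBy // unknown labels pass through unchanged
        }
        // Template variables such as $cluster and $namespace are kept as-is.
        return fmt.Sprintf("sum by (%s) (%s{%s})", l, m, filters), nil
    }

Calling TranslateSum("kubernetes.cpu.usage.total", "$cluster,$namespace", "pod_name") would produce exactly the query above.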

This is done automatically.

For all your metrics, all your queries, all your dashboards, and all your monitors.

Without lifting a finger.
