Aviv Zohari
Founding Engineer
May 30th, 2023

If you've ever cooked a filet of fish, you know something about what it's like to monitor Kubernetes applications. That’s right, because frying a filet to perfection – just enough that you eliminate the risk of food poisoning, but not so much that you end up with a tough and dry piece of meat – is akin to the challenge of Kubernetes application performance monitoring. From the surface, it's hard to tell how cooked your fish is, just as collecting surface-level monitoring metrics from Kubernetes provides relatively little insight into how well the apps are actually performing.

Fish cooking tips aside, what we're actually here for is to help you conquer the challenges of monitoring Kubernetes application performance to optimize efficiency and reliability. Below, we walk through the metrics you should track for Kubernetes apps, as well as the challenges that make Kubernetes app performance monitoring difficult. We then discuss tools and practices that help to streamline Kubernetes application performance monitoring, regardless of which Kubernetes distribution you use or which types of apps you deploy.

What is Kubernetes APM?

Kubernetes Application Performance Monitoring (APM) is the practice of monitoring and managing the stability, responsiveness, and overall health of Kubernetes applications. To do this, teams collect a range of metrics and other data using various tools.

The purpose of Kubernetes APM is to help applications achieve the best overall performance given the resources available to them. This requires identifying and fixing issues such as a slowdown in the rate at which an application is responding to requests. It also involves practices like setting the right resource quotas and limits so that each app has the resources it needs to function properly, but does not consume so many that other applications experience performance issues.

Kubernetes monitoring vs Kubernetes APM

Kubernetes monitoring and Kubernetes APM are closely related concepts, but they entail different practices.

Kubernetes monitoring is a generic term that can refer to monitoring any aspect of Kubernetes – such as control plane components, like etcd or the API server. In contrast, Kubernetes APM focuses on monitoring the performance of Kubernetes applications.

Of course, to monitor application performance, you also typically need to monitor other parts of Kubernetes, because an issue with the control plane could impact applications. That said, the core focus of Kubernetes APM is on optimizing the performance of applications, not other parts of your Kubernetes cluster.

This may seem like a nitpicky semantic difference, but it has implications for how you approach Kubernetes monitoring compared to Kubernetes APM. For the latter, the ability to correlate application performance metrics with data related to the performance of other parts of the stack is crucial for determining whether application performance issues stem from a problem with an app itself (such as buggy code) or one rooted in the environment that hosts the app (like lack of sufficient node CPU or memory resources to support the app).

Correlating different types of metrics is often important for Kubernetes monitoring, too, but in that case you'd be more focused on optimizing the overall performance of your Kubernetes cluster, not using metrics to distinguish application issues from other issues. 

Why Kubernetes application performance monitoring matters

Let's start by discussing why you should care about Kubernetes app performance monitoring.

The answer might seem simple – if you don't monitor your apps, you risk running into performance issues like slow response rates or an inability to maintain service level agreements (SLAs) with your customers. But actually, the importance of Kubernetes performance monitoring involves more than just ensuring that your apps keep operating. It's also about ensuring that you can optimize your Kubernetes environment for efficiency.

After all, Kubernetes can mitigate some performance issues on its own. If an application starts sucking up a large amount of memory and risks exhausting the resources available on its host node, for example, Kubernetes can evict the app's Pod and reschedule it on a different node that has more memory available. In this way, Kubernetes can help prevent the app (and host node) from dropping requests or crashing due to high resource utilization.

But just because your Kubernetes app isn't failing doesn't mean you should not be monitoring it. In the example above, you probably want to know why your app uses a lot of memory. It could be a memory leak bug in the application code. It could be due to poor memory management. It could be a backlog of requests that the app is struggling to catch up on. It could be any number of other things, too, but you won't be able to identify and resolve the root cause of the problem unless you’re monitoring the app.

By extension, you can't optimize the overall performance and efficiency of Kubernetes unless you monitor your apps. If you have applications that are consuming resources inefficiently or not handling requests as quickly as they should, your Kubernetes cluster is probably wasting resources. That leads to higher infrastructure costs, as well as an increased risk that Kubernetes will eventually reach the point where it can no longer keep shifting workloads between nodes, and your entire Kubernetes cluster will come crashing down.

Kubernetes app monitoring helps you stay ahead of issues like these. It allows you to identify application performance issues early, so you can address them before they lead to wasted money or place the stability of your Kubernetes cluster at risk.

The challenges of monitoring Kubernetes applications

Unfortunately, detecting performance issues for Kubernetes apps is often complicated, for a variety of reasons:

  • Distributed applications: The apps you deploy on Kubernetes are probably microservices-based and include multiple containers (each hosting its own microservice). As a result, you need to monitor the performance of each container, while simultaneously tracking how performance issues of one container impact other containers.
  • Complex root causes: The complex dependencies between Kubernetes applications and host infrastructure mean that an application component where you detect an issue may be different from the component associated with the issue's root cause. An app could be experiencing errors due to a hardware problem on a host node, for example, but you probably wouldn't know that just by tracking the application's error rate.
  • Dynamic scaling: Kubernetes constantly scales applications up and down in response to fluctuations in demand. That's one of its main jobs, and it helps ensure optimal use of available resources. But it also makes it much harder to establish a baseline of "normal" activity and measure deviations against it. As a result, you can't detect app performance problems based on simplistic strategies like monitoring whether an application exceeds a preset level of resource consumption. You need a more dynamic monitoring strategy.
  • Varying application needs: Along similar lines, there’s no one-size-fits-all guide for which performance metrics are appropriate for a Kubernetes app. You need to look at the unique requirements of each application, not conclude that an app is under-performing just because its latency rate surpasses a certain number, for instance.

The bottom line is: Kubernetes is a complex and dynamic platform, which makes it quite difficult in many cases to figure out what's causing application performance problems.
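The "dynamic scaling" challenge above is why many teams replace fixed alert thresholds with a rolling baseline and flag deviations from it. Below is a minimal, illustrative sketch in pure Python; the window size, warm-up count, and threshold are arbitrary assumptions, not recommendations:

```python
from collections import deque
from statistics import mean, stdev

def make_anomaly_detector(window=60, threshold=3.0):
    """Flag a sample as anomalous when it deviates from a rolling
    baseline by more than `threshold` standard deviations."""
    history = deque(maxlen=window)

    def check(sample):
        anomalous = False
        if len(history) >= 10:  # wait until a minimal baseline exists
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(sample - mu) / sigma > threshold:
                anomalous = True
        if not anomalous:
            history.append(sample)  # only learn from "normal" samples
        return anomalous

    return check

check = make_anomaly_detector()
# Steady-state CPU usage hovering around 42-47 percent:
for v in [42, 45, 44, 47, 43, 46, 44, 45, 43, 44, 45, 46]:
    check(v)
print(check(95))  # a sudden spike stands out against the learned baseline
```

Real APM tools implement far more sophisticated versions of this idea, but the principle – compare each sample against recent history rather than a fixed number – is the same.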

Key application performance metrics in Kubernetes

Just because Kubernetes application monitoring is hard doesn't mean it's impossible. What you need is the right strategy, and that starts with knowing which performance metrics to track.

Request rate

Request rate measures how many requests an application is receiving. For example, if clients request data from an app 300 times in a minute, then the request rate is 5 requests per second.
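In practice, request counts usually come from a monotonically increasing counter that a collector samples at intervals, so the rate is computed from the difference between two readings. A small sketch, similar in spirit to Prometheus's rate() function (the counter values are made up):

```python
def rate(counter_then, counter_now, interval_seconds):
    """Per-second request rate from two readings of a
    monotonically increasing request counter."""
    if counter_now < counter_then:  # counter reset, e.g. after a Pod restart
        counter_then = 0
    return (counter_now - counter_then) / interval_seconds

# 300 new requests observed across a 60-second scrape interval:
print(rate(1200, 1500, 60))  # -> 5.0 requests per second
```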

Request rate is an important metric for Kubernetes APM because it provides insight into the overall level of load that an application is handling. Ideally, an application will be able to maintain an adequate level of performance as its request rate increases, but a common performance problem is applications failing to keep up with increased demand.

For example, if you notice that an application takes longer to respond to requests during periods when it sees a higher overall volume of requests, you know that the issue is probably linked to the app's failure to perform well under heavy load. In that case, you might take steps to help the app handle higher request volumes more efficiently by optimizing its code. Or, allocating more memory or CPU resources to it could help it handle higher numbers of requests.

Example snapshot from groundcover showing rate of requests and errors along with latency measured for a specific service on the Kubernetes cluster:


Response time

Response time (also sometimes called latency) is a metric that measures how long it takes an application to respond to a request. It's typically measured in milliseconds, and shorter is better.

Tracking response time is important for Kubernetes APM because response time helps you assess how well an application is meeting user expectations. Users want applications to be highly responsive, and they may abandon apps that take too long to respond.

There are many potential causes of slow response time. The issue sometimes happens because an application maxes out available resources under heavy load. Buggy or inefficient code could also trigger slow responses due to the app's inability to process requests quickly enough. Networking issues, too, may cause slow responses in the event that data can't move fast enough over the network.
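Because a handful of slow requests can hide behind a healthy average, response time is usually summarized as percentiles rather than a mean. A sketch using Python's statistics module (the latency samples are fabricated):

```python
from statistics import quantiles

def latency_summary(samples_ms):
    """p50/p95/p99 response times from raw latency samples."""
    cuts = quantiles(samples_ms, n=100)  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Mostly fast responses, with a slow tail around 250 ms:
samples = [12, 15, 14, 13, 250, 16, 14, 15, 13, 12] * 20
summary = latency_summary(samples)
print(summary)  # the p95 exposes the slow tail that the median hides
```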

Memory usage

Memory usage tracks how much volatile memory (also known as RAM) an application is using.

The amount of memory an app needs can vary widely, and consuming a large (or small) amount of memory does not necessarily indicate a performance issue – as long as memory consumption doesn't approach 100 percent of the total available.

Still, you should monitor memory usage continuously to ensure that there is sufficient total memory available on your Kubernetes nodes to keep your applications running smoothly. For example, if your Kubernetes cluster only has 128 gigabytes of memory available in total, and you’re running a dozen applications that each need 12 gigabytes of memory under heavy load, you risk running out of sufficient memory resources in the event that all apps experience high load at the same time.

Monitoring memory usage can also help detect memory leaks, which occur when poorly written application code causes an app to use memory inefficiently. If you notice that an app's memory usage steadily increases over time, and the increases don't correlate with an increase in request rate, a memory leak is the likely cause.

Example snapshot from a groundcover dashboard showing a node’s CPU, memory, and disk usage (along with other Kubernetes cluster resources):

source: https://app.groundcover.com

CPU usage

In Kubernetes APM, CPU usage is a measure of how much of a CPU's total processing power is being consumed by an app.

As with memory, high rates of CPU usage do not necessarily indicate a problem as long as usage doesn't approach 100 percent. That said, you do want to ensure that you have enough CPU capacity available to support all of your applications. You should also investigate any sudden changes in CPU usage that can't be explained by changes in application request rate, since issues like buggy code could trigger spikes in CPU usage.

Keep in mind that processing power can vary widely from one CPU to another – so a Pod that consumes a lot of CPU when hosted on one node might show lower CPU usage if you move it to a different node, not because the application's need for CPU resources has changed, but because the CPUs on the two nodes are different. The capacity of CPUs on virtual machines is almost always different from the capacity of bare-metal CPUs, too.

Persistent storage usage

Persistent storage usage measures how much disk space or other persistent storage resources an application is consuming. In most cases, persistent storage in Kubernetes takes the form of persistent volumes, which are storage resources that admins configure and make available to Kubernetes applications. Thus, in Kubernetes, persistent storage usage metrics don't usually track usage of all disk space available on nodes. They track how much space within persistent volumes is in use.

Persistent storage usage typically has little direct impact on application performance because an application's ability to handle requests quickly is not linked to how much disk space it is using. However, if you run out of disk space, applications may be unable to store important data, leading them to drop requests. For that reason, it's important to monitor persistent storage usage and ensure that your storage resources are not being maxed out. If they are, you should either add more storage volumes or delete data to free up space.
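A check like the one this section describes can be as simple as comparing used space on the filesystem backing a volume against a threshold. A sketch using the standard library (the mount path and the 85 percent threshold are assumptions):

```python
import shutil

def storage_headroom(path="/", alert_at=0.85):
    """Return the used fraction of the filesystem at `path` and
    whether it has crossed the alert threshold."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    return used_fraction, used_fraction >= alert_at

fraction, should_alert = storage_headroom("/")
print(f"{fraction:.0%} used, alert={should_alert}")
```

In a real cluster you would point this at the mount path of each persistent volume inside the container, or read the equivalent kubelet volume metrics instead.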


Uptime

Uptime measures how long an application has been running. On its own, this metric doesn't tell you a whole lot about overall app performance because simply being up doesn't mean an app is able to handle all requests it's receiving or that its response rate is meeting user expectations. Indeed, you could have a Pod that is stuck in an unknown or failed state but is still considered "up," depending on how exactly your APM tools define uptime.

Still, monitoring uptime is helpful for gaining a baseline understanding of how well your apps are performing and whether you are meeting any availability guarantees you've made to your users. In addition, tracking how uptime changes over time can help you measure overall application performance trends. For instance, if you see an increase in application uptime rates over the course of a year, it's a sign that your Kubernetes APM strategy is working.
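Uptime numbers become most meaningful when translated into availability percentages and downtime budgets. For example, a 99.9 percent availability target over a 30-day month implies a budget of roughly 43 minutes of downtime:

```python
def availability(uptime_seconds, window_seconds):
    """Availability as a percentage of the measurement window."""
    return 100.0 * uptime_seconds / window_seconds

def allowed_downtime(slo_percent, window_seconds):
    """Downtime budget (in seconds) implied by an availability target."""
    return window_seconds * (1 - slo_percent / 100.0)

month = 30 * 24 * 3600
print(allowed_downtime(99.9, month) / 60)  # ~43.2 minutes per month
```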

Kubernetes monitoring tools and techniques

Now that we know what you should monitor for in Kubernetes, let's look at the tools and techniques available for doing it.

Logging and log analysis

There are plenty of logging tools out there that provide centralized log collection, aggregation and analysis for Kubernetes. Elasticsearch, Fluentd and Kibana (which are known as the EFK stack when used collectively) are a popular option, as are solutions like Loki by Grafana and ClickHouse.

If you run Kubernetes in the cloud using a managed service like GKE or EKS, you may also be able to take advantage of cloud-based logging solutions, such as Google Cloud Logging and AWS CloudWatch, that integrate with your Kubernetes distribution.

No matter which logging solutions you use, your goal should be to ensure that you collect and aggregate all available log data from all application components, then analyze it centrally so that you have full context into how issues revealed by one log correlate with other logs.
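Centralized analysis often starts with something as simple as interleaving per-component logs into one timeline, so that, say, a database error can be read right next to the application error it triggered. A toy sketch (the timestamps and messages are invented):

```python
import heapq

def merge_timelines(*streams):
    """Interleave per-component log streams (each already sorted by
    time) into one chronological view. Each entry is a tuple of
    (iso_timestamp, component, message)."""
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))

api = [("2023-05-30T10:00:01", "api", "request received"),
       ("2023-05-30T10:00:04", "api", "502 returned to client")]
db = [("2023-05-30T10:00:02", "db", "connection pool exhausted")]

for ts, component, msg in merge_timelines(api, db):
    print(ts, component, msg)
```

Reading the merged view, the database's exhausted connection pool at 10:00:02 is the obvious suspect for the 502 the API returned two seconds later.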

Custom metrics and instrumentation with OpenTelemetry  

Custom metrics and instrumentation in Kubernetes can be achieved using tools like OpenTelemetry, an open-source observability framework. OpenTelemetry provides libraries and components for instrumenting applications, allowing organizations to capture fine-grained performance data and custom metrics without having to bake extensive custom logging and monitoring logic into each app. Instead, they can expose those metrics through OpenTelemetry libraries.

OpenTelemetry also integrates well with groundcover and other monitoring and data visualization tools, making it a versatile choice for capturing and analyzing custom metrics in a standardized manner.

Leveraging Kubernetes APIs for performance monitoring

You can also gain insight into Kubernetes application performance using the APIs provided by Kubernetes itself. The Kubernetes APIs are particularly useful for tracking resource utilization metrics. In addition, you can monitor the status of Pods and containers, which is helpful for understanding whether unusual performance is linked to an issue like a failed container or one that is taking longer than expected to start.

API-based monitoring solutions, such as Prometheus Operator or Kubernetes Metrics Server, enable organizations to collect and visualize Kubernetes-specific metrics directly from the API server.

Although the Kubernetes APIs aren't designed solely for application monitoring and can't provide visibility into internal application issues (such as buggy code), they do provide some useful metrics. More generally, they offer critical contextual information that helps you make informed decisions and identify the root cause of performance problems.
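As an illustration, resource usage from the Metrics Server arrives as a PodMetricsList, with CPU expressed in Kubernetes quantities like 250m (millicores) and memory in binary units like 64Mi. A sketch of parsing such a payload (the Pod and namespace names are invented, and the parsers handle only the most common suffixes):

```python
import json

def parse_cpu(quantity):
    """Kubernetes CPU quantity to millicores ('250m' -> 250, '1' -> 1000)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)

def parse_memory(quantity):
    """Kubernetes memory quantity to MiB (common Mi/Gi suffixes only)."""
    units = {"Mi": 1, "Gi": 1024}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    raise ValueError(f"unhandled unit in {quantity!r}")

# Sample payload in the shape served by the metrics.k8s.io API:
payload = json.loads("""
{"kind": "PodMetricsList",
 "items": [{"metadata": {"name": "checkout-7d4b9", "namespace": "shop"},
            "containers": [{"name": "app",
                            "usage": {"cpu": "250m", "memory": "64Mi"}}]}]}
""")

for pod in payload["items"]:
    for container in pod["containers"]:
        print(pod["metadata"]["name"], container["name"],
              parse_cpu(container["usage"]["cpu"]), "millicores,",
              parse_memory(container["usage"]["memory"]), "MiB")
```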

Kubernetes APM Methods: Pros & Cons

| Method | Advantages | Disadvantages |
| --- | --- | --- |
| Metrics logic | Relatively efficient. Gives maximum control over metrics data. | Requires substantial effort to implement. |
| Sidecar containers | Easy to implement. | High resource overhead. |
| Cluster-wide collector | Easy to implement. Reduced resource overhead compared to sidecar containers. | Not all agents support this approach. Resource overhead may be high compared to approaches that don't require agents. |
| Collecting metrics with eBPF | Very efficient, with minimal resource overhead. Strong control over which data you collect. | Requires familiarity with eBPF or an APM tool that uses eBPF for data collection. |

In addition to the different types of tools and data sources available for Kubernetes APM, the ways you go about collecting data can vary. Here's a look at the four most common Kubernetes APM methods, along with the pros and cons of each.

Building metrics logic into containers

One option is to include code inside your containers that exposes the application metrics you want to monitor. From there, you can collect the application metrics using any data collector that supports the format you used to expose the data. For example, if you use OpenTelemetry to instrument the metrics logic, you can use any OpenTelemetry-compatible data collector to collect the application metrics data.

The biggest advantage of this approach is that it gives you the most control over exactly which metrics you expose and how the data is structured. In addition, metrics logic usually consumes few resources (assuming you write the code efficiently). The downside is that this is a lot of work because you have to write the logic yourself. OpenTelemetry libraries can streamline the effort because they provide ready-made code for metrics generation, but you still have to integrate them into your app yourself.
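For instance, whether you hand-roll the logic or lean on OpenTelemetry libraries, what a scraper ultimately pulls from a /metrics endpoint is plain text in the Prometheus exposition format. A minimal sketch of rendering that format (the metric name and value are invented):

```python
def render_metrics(counters):
    """Render counters in the Prometheus text exposition format,
    the kind of output a scraper expects from a /metrics endpoint."""
    lines = []
    for name, (help_text, value) in counters.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

counters = {
    "http_requests_total": ("Requests handled since startup.", 1500),
}
print(render_metrics(counters))
```

Serving this string over HTTP from inside the container is all a Prometheus-compatible collector needs in order to scrape the app.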

Using sidecar containers

Another option is using a so-called sidecar container. In Kubernetes, a sidecar container is a container that runs in the same Pod as the application you want to monitor. The purpose of the sidecar container is to collect metrics about application request rate, response time, resource utilization, and so on, and then send them to a location of your choice.

This approach eliminates the need to write custom metrics logic for your containers, so it's easier to implement. The major disadvantage is that it leads to higher overall resource utilization because the sidecar container becomes another container that you need to host inside your Kubernetes cluster – so there is a "cost" (in terms of resource overhead) of using the sidecar method.

Using a cluster-wide collector

A variant on the sidecar approach to collecting APM data is to deploy a monitoring agent within your Kubernetes cluster and have it monitor multiple Pods or containers (assuming you are using an agent, such as the Grafana Agent, that supports this approach).

This reduces resource overhead because there are fewer data collectors running.

However, you’re still likely to have greater resource consumption than you would if you avoided running a metrics agent altogether. In addition, not all metrics tools support this approach; some can only run as sidecars.

Collecting metrics with eBPF

A fourth option is to collect application metrics using eBPF, a framework built into the Linux kernel that lets you execute custom programs in kernel space to track what is happening on your system. Among other types of data, eBPF can monitor metrics for any Pods or containers running on Kubernetes nodes, making it a viable APM solution.

Because eBPF code runs in kernel space, it's hyper-efficient. And because the code is customizable, you can use eBPF to collect virtually any type of data you want – all without having to integrate metrics logic directly into your app.

Currently, the major challenge of using eBPF is that you may have to write custom eBPF programs from scratch, which is no mean feat. But you can simplify the process by using an APM tool like groundcover that leverages eBPF under the hood to collect monitoring data for you.

Monitoring strategies for K8s apps

Collecting the right metrics with the right tools is only part of the battle when it comes to Kubernetes application performance monitoring. You'll also want to deploy effective monitoring strategies and techniques.

Define clear monitoring objectives and KPIs

To align monitoring efforts with business goals, it's important to set clear objectives and KPIs that your Kubernetes apps need to hit. For example, you should determine which level of latency is tolerable for a given application based on what you're using it for.

A useful set of principles for establishing performance monitoring objectives is to remember that they should be specific, measurable, achievable, relevant and time-bound (or SMART, as people who like acronyms like to put it). When your metrics have these qualities and are bound to business outcomes, monitoring drives actionable insights.

Leverage service meshes

Service meshes, which help to manage interactions between microservices within a Kubernetes cluster or other distributed environment, can help to optimize monitoring workflows by providing highly granular, service-by-service level visibility into application performance. You can also use service meshes to perform distributed tracing, which is a useful technique for pinpointing the root cause of performance issues in Kubernetes.

Build monitoring pipelines with Operators

Kubernetes Operators, which provide packaging, deployment and management functionality for Kubernetes apps, can simplify the deployment of monitoring tools (among other types of software) in Kubernetes. For example, you can use the Prometheus Operator to simplify the setup and management of Prometheus as an application and network monitoring tool.

By leveraging operators, you can deploy the multiple components needed to create a monitoring pipeline more quickly than you could if you set up each tool independently.

Monitor continuously

Rather than pulling performance metrics from Kubernetes applications periodically, strive to monitor continuously wherever possible. Continuous monitoring and performance optimization help to ensure the ongoing health, stability, and optimal performance of Kubernetes applications.

Continuous monitoring is especially important if you also automate response operations. By detecting issues as soon as they occur and then automating the action required to fix them, you can resolve many Kubernetes app performance problems within a fraction of the time it would take a human to recognize and remediate the issue.

For example, imagine that you have an app that experiences a sudden spike in CPU utilization and is maxing out the CPU limit assigned to it. If you wait even just five minutes to detect that issue due to periodic rather than continuous monitoring, your app may well have crashed by the time you can respond. But if you detect the CPU utilization spike in real time and automatically change the CPU limits assigned to the app, you’re more likely to prevent a user-impacting failure.
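The decision logic in an example like this fits in a few lines. The sketch below only proposes a new limit – applying it would go through the Kubernetes API or an autoscaler – and every threshold in it is an arbitrary assumption:

```python
def propose_cpu_limit(current_limit_m, usage_m,
                      headroom=0.8, step=1.5, max_limit_m=4000):
    """If usage (in millicores) crosses `headroom` of the current
    limit, propose a higher limit, capped at `max_limit_m`.
    Otherwise keep the current limit."""
    if usage_m >= headroom * current_limit_m:
        return min(int(current_limit_m * step), max_limit_m)
    return current_limit_m

print(propose_cpu_limit(1000, 950))  # spike detected -> propose 1500
print(propose_cpu_limit(1000, 400))  # healthy -> keep 1000
```

Running this check on every sample of a continuous stream, rather than every few minutes, is what closes the gap between detection and remediation.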

Keeping Kubernetes apps running swimmingly

In short, monitoring your Kubernetes apps continuously is the only way to ensure that your apps deliver the experience your end-users require while also using resources efficiently. But given the complexity of Kubernetes and microservices apps, the only way to monitor effectively is to collect a variety of data points, then correlate them so that you can get to the root cause of multi-layered performance problems.

Fortunately, a variety of tools are available to help you do this. Solutions like OpenTelemetry simplify the collection of custom metrics from applications, while eBPF-based monitoring tools such as groundcover provide deep visibility into Kubernetes applications, nodes and networks so that you can figure out where the sources of performance issues lie.
