EKS Monitoring: The Key to Unlocking Kubernetes Performance
Walk through everything you need to know about getting started with Amazon EKS and how to monitor the platform. Gain a deep understanding of the EKS architecture, what to monitor in EKS as well as best practices for getting the most out of EKS logging and monitoring.
Among the many platforms for deploying containerized applications, one of the most popular is Amazon Elastic Kubernetes Service, also known as EKS. As a managed Kubernetes solution, EKS makes it easy to set up and operate Kubernetes clusters with minimal effort on the part of users.
However, the fact that EKS is a managed Kubernetes platform doesn't mean users bear no responsibility for monitoring and managing it. On the contrary, EKS logging and monitoring are just as important as logging and monitoring for any type of Kubernetes environment.
This article walks through everything you need to know about getting started with Amazon EKS, as well as how to monitor the platform. It covers the EKS architecture, explains what to monitor in EKS and discusses tips and best practices for getting the most out of EKS logging and monitoring.
What is EKS?
EKS is a managed Kubernetes platform that is built into Amazon Web Services, or AWS. The purpose of EKS is to make it easy to run containerized applications in the AWS cloud.
EKS includes two major components:
• A Kubernetes control plane: This is the Kubernetes software that is responsible for managing containerized applications. Since it's a fully managed service, all you need to do is turn it on; there is nothing to install or set up. Essentially, the EKS control plane is a Kubernetes control plane instance that is delivered via a SaaS model.
• Nodes: EKS provides nodes, which are the servers that are clustered together to build Kubernetes environments. In EKS, nodes are typically based on EC2 virtual machine instances. When you use managed node groups, EKS provisions and manages nodes for you, so you don't have to worry about launching or configuring them manually. You can also use autoscaling to add or remove nodes from your cluster automatically in response to changes in demand.
EKS is not the only container service available from AWS. Amazon also offers Elastic Container Service, or ECS, which is a managed container platform that is based on proprietary orchestration technology rather than Kubernetes. However, because Kubernetes has become the go-to container orchestrator, EKS has gained a larger following than ECS.
Note, too, that EKS is not the only way to deploy Kubernetes on the AWS cloud. You can also set up your own Kubernetes cluster on virtual machines hosted on Amazon EC2. If you take this approach, you get what's called a self-managed cluster, as opposed to a Kubernetes environment that is managed by Amazon.
Benefits of EKS
Self-hosted clusters have some advantages over EKS. Above all, they give you more control over how your Kubernetes environment is configured and managed, and you get the flexibility to use a Kubernetes distribution of your choosing.
However, compared to self-hosted Kubernetes, EKS and other managed Kubernetes platforms on major public clouds (such as Azure AKS and Google GKE) offer several important advantages:
• High availability: Amazon EKS provides a highly available Kubernetes control plane, with automated replication and recovery of control plane components. It's an easy way to build a highly reliable Kubernetes environment without having to configure complex backup and replication routines on your own.
• Cost efficiency: Amazon EKS offers a cost-effective way to run Kubernetes clusters, as the major cost is for the underlying EC2 instances. There is an additional fee of $0.10 per hour per cluster, or about $73 per month, but that is relatively low – less than the licensing cost of many Kubernetes distributions that you'd have to use in a self-hosted Kubernetes environment.
• Security: EKS provides enhanced security for your Kubernetes clusters, with automated security patching, IAM authentication, and encryption of data in transit. You can also take advantage of role-based access control (RBAC) frameworks and network segmentation to add more layers of security to your clusters.
• Elasticity and scalability: Amazon EKS allows you to scale your Kubernetes clusters up or down easily to meet your changing resource needs.
• Ease of use: EKS makes it easy to deploy and manage Kubernetes clusters. You can launch clusters in minutes with just a few commands or clicks in the AWS Console. You also get automated updates, patching and built-in logging and monitoring, all of which make for an even more user-friendly Kubernetes experience.
If you don't mind sacrificing a bit of control and flexibility, EKS and similar Kubernetes solutions provide the easiest onramp to launching Kubernetes clusters that are secure, cost-effective and reliable.
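The control plane fee mentioned above can be checked with quick arithmetic. The following is a minimal sketch, assuming an average month of 730 hours (8,760 hours per year divided by 12). The $0.10 hourly rate is the figure cited in this article and should be verified against current AWS pricing; the function name is hypothetical.

```python
# Hourly EKS control plane fee cited in this article (USD).
# Verify against current AWS pricing before relying on it.
HOURLY_CLUSTER_FEE = 0.10

# Assumption: an "average" month is 8,760 hours per year / 12 months = 730 hours.
HOURS_PER_MONTH = 8760 / 12


def monthly_control_plane_cost(clusters: int = 1) -> float:
    """Return the approximate monthly control plane fee for a number of clusters."""
    return round(clusters * HOURLY_CLUSTER_FEE * HOURS_PER_MONTH, 2)


print(monthly_control_plane_cost())   # 73.0
print(monthly_control_plane_cost(3))  # 219.0
```

Note that this covers only the per-cluster fee. The EC2 instances that back your nodes are billed separately and usually make up the larger share of the bill.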
What do you need to monitor on EKS?
The fact that EKS does most of the heavy lifting required to set up and operate Kubernetes clusters doesn't mean that users don't have to worry about EKS monitoring. On the contrary, even with a fully managed Kubernetes platform like EKS, a variety of things can go wrong. If you don't monitor your EKS environment, you won't know about these problems in time to respond to them.
That's why it's important to monitor the following aspects of EKS:
• Cluster and node health: Although EKS should automatically scale clusters up and down as needed, sometimes that is not the case. For instance, a misconfiguration in your autoscaling policies or a problem with the operating system configuration on your nodes could lead to cluster or node stability issues. To identify problems like node failures or a lack of available resources for workloads, you should monitor the overall CPU and memory utilization of your cluster, as well as of individual nodes.
• Pod and container health: EKS manages the control plane and infrastructure required to run containers, but it doesn't guarantee the health of containers that you deploy on the platform. Buggy containers or misconfigured Pods could lead to a wide variety of performance issues with applications deployed on EKS, and EKS won't fix them for you. It's on you to make sure you have the Pod and container insights necessary to ensure that, for instance, your containers are starting properly, and their resource utilization remains stable.
• Network and ingress monitoring: Monitoring network traffic and ingress in EKS environments is important both for detecting networking issues, like high latency, that could undercut application performance, and for catching potential security problems, such as attempted DDoS attacks.
The takeaway here is that although EKS is designed to make it easy to run containerized apps, it's not a hands-off solution. You need to monitor your clusters, nodes, Pods, containers and networks, just as you would in any type of Kubernetes environment.
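To make the node and Pod checks described above more concrete, here is a minimal sketch of how collected samples might be evaluated. It assumes the utilization and restart figures have already been gathered by whatever monitoring tool you use. The data classes, thresholds and sample values are all hypothetical and are not part of any EKS or Kubernetes API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds for illustration; tune them for your own workloads.
NODE_CPU_LIMIT_PCT = 85.0
NODE_MEMORY_LIMIT_PCT = 90.0
MAX_CONTAINER_RESTARTS = 3


@dataclass
class NodeSample:
    name: str
    cpu_pct: float     # CPU utilization, 0-100
    memory_pct: float  # memory utilization, 0-100


@dataclass
class ContainerSample:
    pod: str
    container: str
    ready: bool    # whether the container reports as ready
    restarts: int  # how many times the container has restarted


def cluster_utilization(nodes: List[NodeSample]) -> Tuple[float, float]:
    """Average CPU and memory utilization, assuming equally sized nodes."""
    if not nodes:
        return (0.0, 0.0)
    cpu = sum(n.cpu_pct for n in nodes) / len(nodes)
    memory = sum(n.memory_pct for n in nodes) / len(nodes)
    return (round(cpu, 1), round(memory, 1))


def saturated_nodes(nodes: List[NodeSample]) -> List[str]:
    """Names of nodes whose CPU or memory utilization exceeds its limit."""
    return [
        n.name
        for n in nodes
        if n.cpu_pct > NODE_CPU_LIMIT_PCT or n.memory_pct > NODE_MEMORY_LIMIT_PCT
    ]


def unhealthy_containers(containers: List[ContainerSample]) -> List[Tuple[str, str]]:
    """(pod, container) pairs that are not ready or have restarted too often."""
    return [
        (c.pod, c.container)
        for c in containers
        if not c.ready or c.restarts > MAX_CONTAINER_RESTARTS
    ]


# Made-up sample data standing in for metrics from a real cluster.
nodes = [
    NodeSample("node-a", cpu_pct=42.0, memory_pct=61.0),
    NodeSample("node-b", cpu_pct=93.0, memory_pct=70.0),
    NodeSample("node-c", cpu_pct=30.0, memory_pct=94.0),
]
containers = [
    ContainerSample("web-1", "app", ready=True, restarts=0),
    ContainerSample("web-2", "app", ready=False, restarts=1),
    ContainerSample("worker-1", "queue", ready=True, restarts=7),
]

print(cluster_utilization(nodes))        # (55.0, 75.0)
print(saturated_nodes(nodes))            # ['node-b', 'node-c']
print(unhealthy_containers(containers))  # [('web-2', 'app'), ('worker-1', 'queue')]
```

In this made-up data, the cluster-wide averages look healthy even though two individual nodes are saturated, which is why it helps to track both levels.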
EKS monitoring and logging tools
Tools for monitoring EKS fall into two main categories.
The first consists of tools and services available from AWS itself. The most important is CloudWatch, which can collect a variety of metrics related to EKS cluster, node, Pod and container performance. In addition, AWS tools like X-Ray can be used to trace the source of performance problems for apps hosted on EKS. The container image scanner that is built into Amazon ECR (a managed container registry that can be used to host images that you deploy into EKS) is useful for detecting malware or vulnerabilities inside container images.
The other category of EKS monitoring tools includes third-party solutions, some of which (like Prometheus) are open source and others of which are available from commercial vendors. In general, third-party tools offer a deeper level of EKS performance and security monitoring because they support more configuration options, and they can be used to run more complex analyses than simpler tools like CloudWatch.
The tradeoff is that third-party tools are somewhat harder to set up because you need to configure them to work with your EKS clusters. In contrast, you can launch most of the Amazon monitoring tools by simply turning them on in the AWS Console, after which they integrate with your EKS clusters automatically.
Best practices for EKS monitoring
Whether you choose to monitor EKS using Amazon's tools, third-party tools or a combination thereof, there are a number of ways to get the most out of your monitoring strategy.
Configure custom alerts
Setting up EKS logging and monitoring tools is a first step toward ensuring you can react to problems. But on their own, logging and monitoring solutions don't necessarily alert you when something is wrong. And if they do, their alerting policies may be based on generic rules that aren't adapted for your specific workloads or cluster configurations.
For that reason, it's important to configure alerts based on your unique requirements. Consider factors like what level of performance and availability risk you can tolerate, then set up alerts accordingly.
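As an illustration of tailoring alerts to your own risk tolerance, the sketch below evaluates a threshold rule that fires only when a metric stays above its limit for several consecutive samples. The `AlertRule` class, the metric name and all threshold values are hypothetical. Real monitoring tools express such rules in their own configuration formats.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold: float        # fire when readings exceed this value
    sustained_samples: int  # consecutive breaching readings required


def should_alert(rule: AlertRule, samples: Sequence[float]) -> bool:
    """Return True if the most recent readings all exceed the rule's threshold."""
    if rule.sustained_samples <= 0:
        raise ValueError("sustained_samples must be positive")
    if len(samples) < rule.sustained_samples:
        return False  # not enough data yet to judge a sustained breach
    recent = samples[-rule.sustained_samples:]
    return all(value > rule.threshold for value in recent)


# Hypothetical response-time readings in milliseconds, oldest first.
latency_ms = [120.0, 480.0, 510.0, 530.0]

# A latency-sensitive service tolerates little risk, so its limit is tight.
strict_rule = AlertRule("response_time_ms", threshold=450.0, sustained_samples=3)
# A batch workload can tolerate slower responses before anyone is paged.
lenient_rule = AlertRule("response_time_ms", threshold=800.0, sustained_samples=3)

print(should_alert(strict_rule, latency_ms))   # True
print(should_alert(lenient_rule, latency_ms))  # False
```

The same readings trigger an alert for one workload and not the other, which is the point of adapting rules to each workload rather than relying on generic defaults. Requiring several consecutive breaches is one simple way to avoid alerting on brief spikes.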
Monitor cluster and application health
In the context of EKS, infrastructure monitoring and application monitoring should not be either-or propositions. You need to monitor both.
Monitor cluster health and performance by tracking the CPU, memory and network usage of your EKS worker nodes. At the same time, collect performance metrics from your applications, including resource utilization rates, as well as the availability and response time for services.
Monitoring the complete EKS stack is important because even though Amazon manages the infrastructure for you, various problems could arise within the stack, and you won't be able to pinpoint their source if you're not monitoring both the cluster infrastructure and the workloads running on it.
Configure effective log retention
Most EKS monitoring tools (including CloudWatch) allow you to retain EKS logs and metrics data, but the longer you store that information, the more you'll pay.
You should therefore configure log retention policies in your monitoring tools that ensure you'll have access to your EKS data for as long as you need it, but that you don't store it unnecessarily. You can also consider pushing older EKS monitoring data to lower-cost storage, like a data lake that you use to host archived information.
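A tiered retention policy of this kind can be sketched as a simple age-based decision. The tier names and the 30-day and 365-day cutoffs below are hypothetical placeholders. Choose values that match your own troubleshooting and compliance needs, then apply them through your monitoring tool's retention settings.

```python
from datetime import date


def retention_action(
    log_date: date,
    today: date,
    hot_days: int = 30,       # assumption: keep recent data in the monitoring tool
    archive_days: int = 365,  # assumption: keep older data in low-cost storage
) -> str:
    """Decide what to do with log data based on its age in days."""
    age_days = (today - log_date).days
    if age_days <= hot_days:
        return "keep"     # still useful for day-to-day troubleshooting
    if age_days <= archive_days:
        return "archive"  # move to lower-cost storage such as a data lake
    return "delete"       # no longer needed, so stop paying to store it


today = date(2024, 6, 1)
print(retention_action(date(2024, 5, 20), today))   # keep
print(retention_action(date(2024, 1, 15), today))   # archive
print(retention_action(date(2022, 12, 31), today))  # delete
```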
Monitoring Kubernetes with eBPF
In addition to standard Kubernetes monitoring and logging tools, there's a powerful, next-generation technology that can supercharge Kubernetes monitoring.
It's called eBPF, and it allows you to deploy programs that attach to the Linux kernel in order to monitor networking interfaces and system call entry points from a low level. With eBPF, you can collect information very efficiently and securely to track EKS networking performance, node health and Pod and container operations.
Although eBPF isn't the only way to monitor Kubernetes, it provides advantages that conventional approaches to Kubernetes monitoring – which usually involve deploying monitoring agents into user space on nodes and/or as "sidecar" containers that run alongside actual workloads – lack:
• Access: Because eBPF programs run in kernel space, they can collect virtually any data you might want to monitor. They're not limited to data exposed in user space.
• Efficiency: eBPF programs are lightweight and efficient, so they impose minimal overhead. The result is more resources for your actual workloads, and less money wasted on EKS infrastructure.
• Security: eBPF programs run in sandboxed environments, which minimizes security risks even though the programs operate in kernel space.
Because eBPF remains a relatively new technology, it hasn't traditionally been at the core of EKS monitoring and logging tooling. Indeed, most EKS monitoring solutions, including CloudWatch, offer no support for eBPF.
However, a new generation of monitoring tools, like groundcover, is emerging that leverages eBPF to revolutionize how EKS (along with any platform or application that runs on top of Linux-based servers) is monitored and observed. With the help of eBPF, these tools make it possible to perform deep, secure, highly efficient monitoring across the entire stack, from infrastructure to applications.