
Kubernetes CNI: Architecture, Plugins & Best Practices

groundcover Team
February 20, 2026

Kubernetes is powerful because it provides a dense layer of abstractions that, when composed together, form a reliable and scalable orchestration framework.

One of the most challenging aspects of container orchestration is networking. How two containers communicate with each other can quickly become complicated. Containers may be running on the same Kubernetes node, across multiple nodes, in different cloud providers, or even in a hybrid setup that includes bare-metal infrastructure.

The Container Network Interface (CNI) is the abstraction Kubernetes clusters use for networking. It is where network engineering meets Kubernetes.

In practice, when a Pod is created, several things must happen:

  • The Pod must be assigned a unique, cluster-wide IP address, and containers within the same Pod must be able to communicate over localhost.
  • Routing must be configured, so Pods can discover and communicate directly with each other.
  • The kubelet must be able to communicate with the container, even though it runs in its own network namespace.
  • When a Service is created, Kubernetes assigns it a long-lived virtual IP address, and the networking layer must make that address routable.
  • For Services, Kubernetes provides the list of backing Pods, but the networking layer must configure connectivity and keep it up to date as Pods are added or removed.

All of these requirements—and many more—define the Kubernetes networking model. The same principles apply when Pods or Services are deleted: IP addresses must be released and routing rules cleaned up.
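The first requirement, unique per-Pod IP assignment, is handled by an IPAM (IP Address Management) plugin. The following is a hypothetical, minimal in-memory sketch of what an IPAM plugin such as host-local does conceptually: hand out the next free address from a subnet on Pod creation and release it on Pod deletion. The type and function names are illustrative, not taken from any real plugin.

```go
package main

import (
	"fmt"
	"net"
)

// ipam is a toy in-memory allocator: one subnet, a set of used addresses.
type ipam struct {
	subnet *net.IPNet
	used   map[string]bool // allocated addresses, keyed by string form
}

func newIPAM(cidr string) (*ipam, error) {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	return &ipam{subnet: subnet, used: map[string]bool{}}, nil
}

// allocate walks the subnet and returns the first free host address,
// skipping the network address (.0) and a conventional gateway (.1).
func (a *ipam) allocate() (net.IP, error) {
	ip := a.subnet.IP.Mask(a.subnet.Mask)
	for i := 0; i < 2; i++ {
		ip = next(ip)
	}
	for ; a.subnet.Contains(ip); ip = next(ip) {
		if !a.used[ip.String()] {
			a.used[ip.String()] = true
			return ip, nil
		}
	}
	return nil, fmt.Errorf("subnet %s exhausted", a.subnet)
}

// release frees an address so a future Pod can reuse it.
func (a *ipam) release(ip net.IP) { delete(a.used, ip.String()) }

// next returns the successor of ip (big-endian increment).
func next(ip net.IP) net.IP {
	out := make(net.IP, len(ip))
	copy(out, ip)
	for i := len(out) - 1; i >= 0; i-- {
		out[i]++
		if out[i] != 0 {
			break
		}
	}
	return out
}

func main() {
	a, _ := newIPAM("10.244.1.0/24")
	podA, _ := a.allocate()
	podB, _ := a.allocate()
	fmt.Println(podA, podB) // 10.244.1.2 10.244.1.3
	a.release(podA)
	podC, _ := a.allocate()
	fmt.Println(podC) // 10.244.1.2 again, reused after release
}
```

Real IPAM plugins persist state (host-local uses files on the node's disk) and coordinate cluster-wide ranges, but the allocate/release lifecycle mirrors the Pod create/delete lifecycle described above.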

Because Kubernetes clusters can be highly heterogeneous, the need for a common networking interface quickly emerged and spread across the ecosystem. Calico, Cilium, and Flannel are examples of open-source, cloud-provider-agnostic CNI implementations. At the same time, cloud providers such as AWS and Google Cloud (GKE) offer their own implementations, as well as customized or forked versions of projects like Cilium.

Below is the core CNI interface, as defined in the reference Go library libcni (github.com/containernetworking/cni), which container runtimes use to drive plugins:

type CNI interface {
	AddNetworkList(ctx context.Context, net *NetworkConfigList, rt *RuntimeConf) (types.Result, error)
	CheckNetworkList(ctx context.Context, net *NetworkConfigList, rt *RuntimeConf) error
	DelNetworkList(ctx context.Context, net *NetworkConfigList, rt *RuntimeConf) error
	GetNetworkListCachedResult(net *NetworkConfigList, rt *RuntimeConf) (types.Result, error)
	GetNetworkListCachedConfig(net *NetworkConfigList, rt *RuntimeConf) ([]byte, *RuntimeConf, error)

	AddNetwork(ctx context.Context, net *PluginConfig, rt *RuntimeConf) (types.Result, error)
	CheckNetwork(ctx context.Context, net *PluginConfig, rt *RuntimeConf) error
	DelNetwork(ctx context.Context, net *PluginConfig, rt *RuntimeConf) error
	GetNetworkCachedResult(net *PluginConfig, rt *RuntimeConf) (types.Result, error)
	GetNetworkCachedConfig(net *PluginConfig, rt *RuntimeConf) ([]byte, *RuntimeConf, error)

	ValidateNetworkList(ctx context.Context, net *NetworkConfigList) ([]string, error)
	ValidateNetwork(ctx context.Context, net *PluginConfig) ([]string, error)

	GCNetworkList(ctx context.Context, net *NetworkConfigList, args *GCArgs) error
	GetStatusNetworkList(ctx context.Context, net *NetworkConfigList) error

	GetCachedAttachments(containerID string) ([]*NetworkAttachment, error)

	GetVersionInfo(ctx context.Context, pluginType string) (version.PluginInfo, error)
}

By exploring the CNI repository on GitHub, you can find the various structs and helper functions that support this interface. Every CNI plugin, in turn, must implement the operations this interface drives: adding, checking, and deleting a container's network attachment.
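The Go interface above is the runtime side; plugins themselves are executables configured through a JSON network configuration list read from the node (conventionally /etc/cni/net.d). As an illustrative sketch, here is a typical configuration chaining the standard bridge, host-local IPAM, and portmap plugins; the network name, bridge name, and subnet are arbitrary choices:

```json
{
  "cniVersion": "1.0.0",
  "name": "example-pod-network",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.244.0.0/16",
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
```

When the runtime calls AddNetworkList for a new Pod, each plugin in the chain runs in order, passing its result to the next; DelNetworkList runs the chain in reverse to clean up.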

Networking as the foundation for security and observability

Over time, the CNI interface proved to operate at exactly the right layer to address additional cross-cutting concerns such as security and observability. Because CNI implementations sit close to the Linux networking stack, they have direct access to kernel-level information such as packet flow, connection state, and transport-layer protocols. This makes the networking layer an ideal place to enforce policy and collect high-fidelity signals without requiring application-level instrumentation.

Security at the networking layer

Traditional network security models rely heavily on IP addresses and static firewall rules. In a Kubernetes environment, where Pods are ephemeral and IPs are frequently recycled, this model quickly breaks down. CNIs address this mismatch by integrating deeply with Kubernetes primitives.

Cilium, for example, has grown in popularity by implementing a powerful firewall using eBPF. Instead of defining policies in terms of IP addresses, Cilium expresses security rules using Kubernetes labels and identities. This allows network policies to be defined declaratively and remain stable even as Pods are rescheduled, scaled, or replaced.

By attaching eBPF programs directly to kernel hooks, Cilium can enforce policies at L3–L7, including:

  • Allowing or denying traffic based on Pod identity rather than IP
  • Inspecting protocols such as HTTP, gRPC, or DNS
  • Enforcing least-privilege communication between services

Because these policies are enforced in-kernel, they avoid extra network hops and reduce reliance on sidecars or user-space proxies, improving both performance and reliability.
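The same label-based model is visible even in the standard Kubernetes NetworkPolicy API, which CNIs such as Cilium and Calico enforce. A minimal sketch (the `app` labels and port are hypothetical) that allows only frontend Pods to reach the API Pods on TCP 8080:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Note that the built-in NetworkPolicy resource covers L3/L4 only; L7 rules such as HTTP paths or DNS names require CNI-specific extensions like Cilium's CiliumNetworkPolicy CRD.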

Observability driven by the CNI

The same kernel-level visibility that enables fine-grained security also enables deep observability. CNIs can observe traffic flows, connection lifecycles, and resource usage as they occur, providing insight into cluster behavior that would otherwise be difficult to reconstruct from application logs alone.

Calico, for instance, exposes detailed IP Address Management (IPAM) metrics as Prometheus metrics. These include information such as:

  • The size and utilization of IP pools
  • The number of allocated and free IP addresses
  • Distribution of IPs across nodes

When combined, these metrics provide a clear picture of how a cluster is scaling, how network resources are consumed, and where potential bottlenecks may arise. This is particularly valuable in large clusters, where IP exhaustion or uneven allocation can silently become a limiting factor.
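As a sketch of how such metrics can be consumed, the queries below assume Calico's kube-controllers metrics endpoint is scraped by Prometheus; the metric names and the default /26 block size are assumptions that should be verified against your Calico version:

```promql
# Allocated Pod IPs per node (metric name assumed from calico-kube-controllers)
sum by (node) (ipam_allocations_per_node)

# Rough per-node block utilization, assuming the default /26 blocks (64 IPs each)
sum by (node) (ipam_allocations_per_node)
  / (sum by (node) (ipam_blocks_per_node) * 64)
```

Alerting when utilization approaches 1 gives early warning of IP exhaustion before Pod scheduling starts to fail.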

More broadly, modern CNIs increasingly expose flow logs, dropped packet counters, and latency measurements, allowing operators to answer questions such as:

  • Which services are communicating with each other?
  • Where is traffic being dropped or denied?
  • How does network behavior change under load or during rollouts?

By anchoring observability and security at the networking layer, CNIs turn Kubernetes networking from a hidden implementation detail into a first-class source of truth about cluster behavior.

groundcover collects, stores, and manages the full lifecycle of networking data, helping operators navigate the enormous volume of telemetry generated at the network level. Networking acts as a force multiplier for telemetry, which often leads operations teams to reduce noise by sampling data or limiting collection altogether. As a result, their ability to troubleshoot effectively and identify meaningful patterns is compromised. With groundcover and its cost-optimization capabilities, this trade-off is eliminated.

High performance and scalability

Networking can quickly become either the bottleneck or the superpower of a Kubernetes cluster, depending on the technologies chosen and how they are implemented. At scale, even small inefficiencies in packet routing, encapsulation, or policy enforcement can compound into increased latency, reduced throughput, or higher CPU utilization across the entire cluster.

This is why the existence of a common interface such as CNI is so important. By standardizing how networking is configured and managed, Kubernetes allows operators to adopt new networking technologies or replace existing ones without rewriting the rest of the platform. In practice, this enables drop-in replacement of CNI implementations as requirements evolve—from simple overlays to high-performance, kernel-native solutions—while preserving the same Kubernetes API and operational model.

In distributed systems it is often said that “it’s always a DNS problem,” but in Kubernetes, a misconfigured or poorly understood CNI can lead to even more subtle and difficult-to-diagnose behavior. Packet loss, asymmetric routing, unexpected latency, or intermittent connectivity between Pods may not immediately surface as obvious failures, yet they can significantly impact application reliability. These issues are often amplified in large or multi-tenant clusters, where networking assumptions break down under load.

For this reason, it is critical to invest time in understanding how the chosen CNI plugin works:
at which layers it operates, how it handles routing and encapsulation, how policies are enforced, and what trade-offs it makes between performance, flexibility, and operational complexity. Without this understanding, tuning or troubleshooting the network becomes guesswork.

A well-established monitoring and observability stack is what ultimately enables safe iteration. Metrics, flow visibility, and latency measurements provide operators with the confidence to experiment, optimize, and evolve the cluster’s networking over time. When performance regressions or anomalies can be detected early and correlated with configuration changes, networking shifts from being a fragile dependency into a controllable and optimizable part of the system.

FAQs

How do you troubleshoot networking issues in Kubernetes?

Networking in Kubernetes spans many layers, so the first step is to identify the right one. Is it a Pod-to-Pod communication issue? Pod-to-Service, or Pod-to-external? Is the issue affecting node-to-node communication, a single node, or a single namespace?

Different CNIs also ship different debugging tools, depending on how they implement routing: eBPF, BGP, or overlay encapsulation.

The metrics exposed by the CNI itself are important and can help you identify errors, but you will still have to get your hands dirty with old-school networking tools such as ping, nslookup, and traceroute.

How do you choose the right CNI for your cluster?

The environment you run in matters: is there a pool of CNIs supported by your cloud provider? If so, it is wise to start with one of those. Other aspects to take into consideration span security, observability, operational complexity, and performance. Overlay networks such as VXLAN, for example, are easier to set up and work in most environments; Calico and Flannel both support them. When it comes to performance and flexibility, BGP-based routing provides all the power you need: Calico is the best-known CNI for this technology, while Cilium takes a kernel-native path with eBPF.

How does groundcover help troubleshoot Kubernetes networking?

groundcover takes a fundamentally different approach to Kubernetes troubleshooting by using eBPF-based, zero-instrumentation observability. Instead of relying on metrics and logs alone, it inspects live network traffic directly at the kernel level. A few examples:

  • Seeing that traffic from Pod A to Pod B shows 12% retransmissions, a 320 ms spike in RTT, and 4% dropped packets, symptoms that point to a routing issue or a misconfigured NetworkPolicy
  • Detecting packet loss as it happens, at the connection level

With eBPF-based monitoring you can quickly inspect all of this in a way that is almost impossible to replicate with other technologies.
