eBPF Is Eating Kubernetes' iptables Plumbing

Wed, May 20, 2026 · 8 min read

For most of Kubernetes’ life, the cluster data path has been a tower of iptables rules. Pod-to-service routing, NAT, network policy, even the way kube-proxy programs a Service IP — all of it expressed as netfilter chains evaluated linearly on every packet. It worked. It also aged badly.

In 2026, the answer the ecosystem has converged on is eBPF, and the project driving most of that convergence is Cilium. The shift is no longer aspirational: kube-proxy's nftables mode went GA in Kubernetes 1.33, the old IPVS backend is deprecated as of v1.35, and the major managed Kubernetes providers (EKS, GKE, AKS) all offer a Cilium-powered data plane as a first-class option. Azure CNI Powered by Cilium is GA on K8s 1.33.

If you operate clusters, this is one of those rare infrastructure shifts worth understanding before your hand is forced.

What eBPF actually buys you

eBPF, in one sentence: safe, sandboxed programs that run inside the Linux kernel in response to events such as packet arrivals, syscalls, tracepoints, and scheduler hooks. The verifier guarantees they terminate and don't crash the kernel; the JIT compiles them to native code.

For a CNI, the consequence is concrete: instead of expressing “forward traffic for Service X to one of these N pods” as a chain of netfilter rules that grows with the cluster, you express it as a hash-map lookup in kernel memory. The kernel reads the destination, hits the map, and forwards. No chain traversal. No rule reload. No userspace round-trip.

This is the difference between O(n) per-packet rule evaluation and O(1) map lookup, and it shows up everywhere in benchmarks.
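
To make the contrast concrete, here is a minimal Python sketch of the two lookup strategies. It is an illustration only: the Service table, the IPs, and the function names are hypothetical, and real eBPF maps live in kernel memory rather than in a Python dict.

```python
from typing import Optional

# Hypothetical Service table: (virtual IP, port) -> backend pod addresses.
SERVICES = {
    ("10.96.0.10", 53): ["10.244.1.5:53", "10.244.2.7:53"],
    ("10.96.12.34", 443): ["10.244.3.9:8443"],
}

# iptables-style view: an ordered list of rules, scanned until one matches.
RULE_CHAIN = list(SERVICES.items())


def lookup_linear(dst_ip: str, dst_port: int) -> Optional[list[str]]:
    """Walk the chain rule by rule; cost grows with the number of Services."""
    for (ip, port), backends in RULE_CHAIN:
        if ip == dst_ip and port == dst_port:
            return backends
    return None


def lookup_map(dst_ip: str, dst_port: int) -> Optional[list[str]]:
    """Single hash lookup; cost does not depend on the table size."""
    return SERVICES.get((dst_ip, dst_port))
```

Both functions return the same backends for the same destination. The difference is that the first one gets slower with every Service you add, and the second one does not.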

Why iptables hits a wall

iptables wasn’t designed for a world where a single host might terminate thousands of Services, each with dozens of endpoints, all re-shuffled by the control plane every few seconds. Specifically:

Linear evaluation. Every packet walks the rule chain until it matches. 1,000 Services means roughly 1,000 NAT rules to scan per flow.
Reload is global. A single Endpoint change forces kube-proxy to recompute and re-apply the entire ruleset. With churn, the proxy starts measurably lagging reality.
conntrack pressure. Both iptables and IPVS lean on the kernel conntrack table. Under high connection rates the table fills, entries get evicted, and you see dropped flows that don’t look like anything in the application logs.
No L7 awareness. Netfilter sees 5-tuples. Anything above that — HTTP method, gRPC service, Kafka topic — needs a sidecar.

kube-proxy’s new nftables mode (alpha in 1.29, beta in 1.31, GA in 1.33) is a real improvement: nftables expresses the same intent with set-based matches and incremental updates instead of full rebuilds. But it’s still netfilter, still conntrack-bound, still operating below the abstractions your platform team actually cares about. It buys time. It does not change the trajectory.
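
The difference between a full rebuild and an incremental update can be sketched in a few lines of Python. This is a toy model, not kube-proxy's implementation: the Service names, endpoint names, and the string "operations" are all made up for illustration.

```python
def full_rebuild(desired: dict[str, set[str]]) -> list[str]:
    """Regenerate every rule, even for Services that did not change."""
    return [
        f"add {svc} -> {ep}"
        for svc in sorted(desired)
        for ep in sorted(desired[svc])
    ]


def incremental_update(
    current: dict[str, set[str]], desired: dict[str, set[str]]
) -> list[str]:
    """Emit only the set differences between current and desired state."""
    ops: list[str] = []
    for svc in sorted(set(current) | set(desired)):
        old = current.get(svc, set())
        new = desired.get(svc, set())
        ops += [f"del {svc} -> {ep}" for ep in sorted(old - new)]
        ops += [f"add {svc} -> {ep}" for ep in sorted(new - old)]
    return ops
```

With one endpoint swapped in a single Service, the incremental path produces two operations while the full rebuild re-emits the whole table. That gap is what grows painful under churn.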

Cilium, in 2026 shape

Cilium has been the eBPF-native CNI since 2017, and in 2026 it is effectively the default answer for new clusters. v1.19, released earlier this year, is the current stable line; it focuses on encryption hardening, large-cluster scalability, and Gateway API polish.

What you actually get with Cilium running:

Kube-proxy replacement. Cilium implements the Service abstraction directly in eBPF maps. You can run with kube-proxy entirely disabled. Service lookup is constant-time regardless of how many Services or endpoints you have.
Native routing. With eBPF host routing, packets bypass the iptables and conntrack code path on the host stack entirely. On AKS, Microsoft’s published numbers put this at meaningful throughput gains for AI workloads on 100GbE NICs.
Identity-based network policy. Policies match on workload identity (pod labels) rather than ephemeral pod IPs, which is the only sane model once you have churn.
Gateway API and Ingress. Cilium’s Gateway API implementation is production-grade in 1.18+, with L7 routing handled in the data plane instead of a separate proxy pod.
ClusterMesh. Multi-cluster service discovery and policy without bolting on a service mesh — useful even if you never wanted one.
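
The identity-based policy item is the easiest of these to show in miniature. The sketch below is a deliberate simplification and not Cilium's policy engine: the policy shape and the function names are hypothetical, and it only demonstrates why matching on labels survives pod churn when matching on IPs does not.

```python
def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """A selector matches when every key/value it names is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def is_allowed(
    policy: dict[str, dict[str, str]],
    src_labels: dict[str, str],
    dst_labels: dict[str, str],
) -> bool:
    """Allow traffic when both source and destination labels match the policy."""
    return selector_matches(policy["from"], src_labels) and selector_matches(
        policy["to"], dst_labels
    )


# Hypothetical policy: frontend pods may talk to api pods.
POLICY = {"from": {"app": "frontend"}, "to": {"app": "api"}}
```

Note that no IP address appears anywhere. A frontend pod that is rescheduled and gets a new IP still carries the same labels, so the decision does not change.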

The price you pay is kernel version discipline: 5.10 LTS is the floor, 6.1+ is recommended for production, and a few features (host routing, BIG TCP, some socket-LB modes) only light up on more recent kernels. If your nodes are on RHEL 8 or ancient Ubuntu LTS, plan that upgrade first.
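
A quick way to audit a fleet against those thresholds is to classify each node's kernel release string. The sketch below uses the 5.10 floor and 6.1 recommendation from the paragraph above; the function names and the three category labels are my own, and it ignores distro backports, so treat the result as a first pass rather than a verdict.

```python
import re

MIN_KERNEL = (5, 10)          # floor quoted above
RECOMMENDED_KERNEL = (6, 1)   # production recommendation quoted above


def parse_kernel(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.1.0-18-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def classify(release: str) -> str:
    """Return 'unsupported', 'supported', or 'recommended' for a kernel release."""
    version = parse_kernel(release)
    if version < MIN_KERNEL:
        return "unsupported"
    if version < RECOMMENDED_KERNEL:
        return "supported"
    return "recommended"
```

On a node itself you could feed it `platform.release()` from the standard library; across a cluster you would feed it whatever kernel version your inventory reports per node.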

The performance numbers that matter

Independent 2026 benchmarks tell a fairly consistent story.

Pod-to-Service throughput: Cilium eBPF mode delivers roughly 28.5 Gbps vs iptables-based Calico at 22.1 Gbps on the same hardware, about 29% more for typical traffic.
Throughput under policy load: With 100+ NetworkPolicies active, Cilium holds ~8.9 Gbps while iptables-based Calico collapses to ~3.2 Gbps. That leaves Calico about 64% lower (Cilium is roughly 2.8x faster), and the gap widens with more policies.
Service scale: At 1,000+ Services, Cilium’s eBPF kube-proxy replacement shows ~30–60% lower P99 latency and ~50% lower CPU on the proxy nodes versus iptables mode.
CPU at line rate: A 2026 OpenMetal benchmark at 100 Gbps recorded Cilium under 10% CPU, Calico-with-nftables at 15–20%, and Flannel at 25–30% doing the same work.
Latency floor: Cilium eBPF mode lands around 0.15 ms at P50 for pod-to-service traffic. That is a small absolute win, but it stays flat as the cluster grows, which is the point.

The pattern is the same in every published comparison: eBPF starts slightly faster, and degrades much more gracefully as scale and policy complexity grow.
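
Percentages like these depend on which number you divide by, so it is worth recomputing them from the raw throughput figures. The helper names below are mine; the inputs are the Gbps values quoted in the list above.

```python
def pct_more(fast: float, slow: float) -> float:
    """How much more the faster option delivers, relative to the slower one."""
    return (fast - slow) / slow * 100


def pct_less(fast: float, slow: float) -> float:
    """How much less the slower option delivers, relative to the faster one."""
    return (fast - slow) / fast * 100


# Throughput figures (Gbps) quoted above.
quiet = pct_more(28.5, 22.1)       # Cilium vs Calico, typical traffic
loaded = pct_less(8.9, 3.2)        # Calico's shortfall with 100+ policies
ratio = 8.9 / 3.2                  # the same gap expressed as a multiple
```

On the quoted figures the quiet-cluster advantage works out to about 29%, and under policy load Calico lands about 64% below Cilium, which is the same thing as Cilium being roughly 2.8x faster.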

Security and observability come for free

Two things you get once eBPF is already in the kernel:

Hubble is Cilium’s built-in observability layer. It generates structured flow events for every packet the data plane sees, enriched with full Kubernetes context: pod names, namespaces, labels, service names, policy verdicts. No sampling. No sidecar. No packet copying. For a lot of teams this replaces a meaningful slice of what they were using a service mesh for.

Tetragon is the runtime security sibling. It’s a CNCF project under the Cilium umbrella, attaches eBPF programs to kernel hooks (syscalls, file access, process exec, privilege escalation), and — crucially — can enforce as well as observe. Where Falco reports on a suspicious exec after the fact, Tetragon can kill the process inline before the syscall returns. In 2026 it’s reasonable to treat the Cilium + Hubble + Tetragon stack as the default eBPF observability and runtime-security baseline for new clusters, with telemetry shipped out via the OpenTelemetry Collector.
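
Structured flow events are useful mostly because they are trivial to aggregate. The sketch below assumes a simplified, hypothetical record shape (`src`, `dst`, `verdict`) rather than Hubble's real export schema, and counts which workload pairs are being dropped by policy.

```python
from collections import Counter

# Hypothetical, simplified flow records; real Hubble events carry far more context.
FLOWS = [
    {"src": "shop/frontend", "dst": "shop/api", "verdict": "FORWARDED"},
    {"src": "shop/batch", "dst": "shop/api", "verdict": "DROPPED"},
    {"src": "shop/batch", "dst": "shop/api", "verdict": "DROPPED"},
    {"src": "shop/frontend", "dst": "shop/db", "verdict": "DROPPED"},
]


def dropped_pairs(flows: list[dict[str, str]]) -> Counter:
    """Count dropped flows per (source, destination) workload pair."""
    return Counter(
        (flow["src"], flow["dst"])
        for flow in flows
        if flow["verdict"] == "DROPPED"
    )
```

Calling `dropped_pairs(FLOWS).most_common()` gives you the noisiest denied pairs first, which is usually the first question anyone asks after tightening a policy.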

What changed in Kubernetes 1.32 and after

A few signals worth tracking from upstream:

nftables kube-proxy is GA as of 1.33, with iptables remaining the compatibility default. Treat nftables as the right choice for new clusters that are not yet ready to run a full eBPF CNI.
IPVS mode is deprecated as of v1.35 and slated for removal. If you’re still on IPVS, the migration target is either nftables or, more strategically, eBPF.
Cloud providers have moved. Azure CNI Powered by Cilium is GA on K8s 1.33, GKE Dataplane V2 is Cilium-based, and EKS supports Cilium as the primary CNI. Defaults aren’t fully flipped yet, but the recommended path on all three is now eBPF.
Gateway API momentum. With the Ingress resource frozen and Gateway API the modern replacement, Cilium’s L7 story moves from “service mesh problem” to “CNI feature.”

Migrating without breaking production

The honest answer is that migrating a live cluster’s CNI is the kind of change you take seriously, but it is no longer scary. Two realistic paths:

1. Live, node-by-node with cilium-cli. The official Cilium migration guide supports running the old CNI and Cilium in parallel overlays, then draining and re-joining nodes one at a time. Existing pods keep their networking; new pods on migrated nodes get Cilium. If you use Geneve encapsulation, open UDP 6081 between nodes first (Cilium's VXLAN default is UDP 8472). Plan a maintenance window for the conntrack flush. Budget a week or two for a production cluster.

2. CNI chaining as a stepping stone. If you’re on Flannel and mostly want NetworkPolicy enforcement and Hubble without a full data-plane swap, Cilium’s generic-veth chaining mode lets Cilium sit on top of Flannel. You lose the kube-proxy replacement and the native routing gains, but you get policy and observability today and buy time for the bigger move.

A practical rollout looks like:

Week 1 — Prep. Upgrade nodes to a 6.1+ kernel. Audit existing NetworkPolicies and iptables-based “hacks” (egress NAT, custom chains) so nothing surprises you on the other side.
Week 2 — Parallel install. Deploy Cilium alongside the existing CNI in migration mode. Verify Hubble flows look sane.
Weeks 3–4 — Drain and migrate. Node by node, with rollback ready. Disable kube-proxy only after every node is on Cilium.
Week 5 — Cleanup. Remove the old CNI. Reboot nodes to flush stale iptables rules. Turn on Tetragon if you want the security layer.
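
The ordering constraints in that rollout are simple enough to encode as guard functions. This is a planning sketch under stated assumptions, not anything cilium-cli provides: the `Node` type, the `cni` values, and both function names are hypothetical, and the 6.1 threshold is the prep target from Week 1.

```python
from dataclasses import dataclass
from typing import Optional

RECOMMENDED_KERNEL = (6, 1)  # Week 1 prep target


@dataclass
class Node:
    name: str
    kernel: tuple[int, int]  # (major, minor)
    cni: str                 # hypothetical values: "legacy" or "cilium"


def next_node_to_migrate(nodes: list[Node]) -> Optional[Node]:
    """Pick the first unmigrated node whose kernel already meets the target."""
    for node in nodes:
        if node.cni != "cilium" and node.kernel >= RECOMMENDED_KERNEL:
            return node
    return None


def can_disable_kube_proxy(nodes: list[Node]) -> bool:
    """kube-proxy goes away only after every node is on Cilium."""
    return bool(nodes) and all(node.cni == "cilium" for node in nodes)
```

A node with an old kernel is never selected, so it blocks `can_disable_kube_proxy` until someone upgrades it. That is the intended behaviour: the check fails loudly instead of letting the rollout skip a step.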

The two things that bite teams are kernel version and undocumented iptables rules elsewhere in the stack (CNI plugins, security agents, custom egress). Find those first.

TL;DR

iptables is structural debt in modern Kubernetes — linear rule evaluation, conntrack pressure, and global reloads don’t fit a cluster with thousands of Services and constant churn.
eBPF replaces it with O(1) kernel maps, and Cilium 1.19 is the mature, dominant implementation. Kube-proxy replacement is the headline feature; identity-based policy, Hubble observability, and Tetragon runtime security come along for the ride.
The numbers are real: ~29% more throughput on a quiet cluster, roughly 2.8x the throughput once policies pile up (Calico lands ~64% lower), ~50% CPU savings at scale, and a flat latency curve as the cluster grows.
Upstream is moving with you. nftables kube-proxy is GA as of 1.33, IPVS is deprecated in 1.35, and the big managed providers all default to (or recommend) eBPF data planes.
Migration is a project, not a rewrite. Live, node-by-node with cilium-cli works; CNI chaining is a valid stepping stone. Fix your kernel version before you fix anything else.

The iptables-based Kubernetes data path was an excellent 2015 decision. In 2026, the cost of staying on it is no longer mostly performance — it’s also observability, policy expressiveness, and the runtime-security story you don’t have. eBPF is eating that plumbing, and Cilium is the spoon.