Advanced eBPF Memory Observability: Container Tracing and Rust Aya
The first two articles covered eBPF fundamentals and OOM Killer event tracing. This article goes deeper: container-level OOM pinpointing, real-time memory allocation rate tracking, and implementing the same functionality with the Rust Aya framework.
Container-Level OOM Pinpointing
In Kubernetes, “a Pod OOM’d” is a vague statement. A Pod can contain multiple containers, and each container has its own cgroup. eBPF can cut through this layer and identify exactly which container and which process caused the OOM.
The correlation chain:
| |
In the eBPF program, the oom_control parameter from oom_kill_process carries a memcg pointer, giving access to cgroup-level information:
| |
User-space programs can parse the cgroup path to extract the Pod UID and container ID, which map to Pod and container names:
| |
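As a minimal sketch of this parsing step, the function below pulls a Pod UID and container ID out of a kubelet-managed cgroup path. It assumes the two common layouts: the systemd cgroup driver (kubepods-burstable-pod<uid>.slice/cri-containerd-<id>.scope, with underscores in the UID) and the cgroupfs driver (kubepods/burstable/pod<uid>/<id>). The names CgroupIdentity and parse_cgroup_path are hypothetical, and turning the UID into a Pod name would still need a lookup against the Kubernetes API or the container runtime, which is not shown.

```rust
/// Pod UID and container ID recovered from a cgroup path (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct CgroupIdentity {
    pub pod_uid: String,
    pub container_id: String,
}

/// Try to read a Pod UID out of one path segment.
/// systemd driver: "kubepods-burstable-pod<uid_with_underscores>.slice"
/// cgroupfs driver: "pod<uid-with-dashes>"
fn parse_pod_segment(seg: &str) -> Option<String> {
    let uid = match seg.strip_suffix(".slice") {
        Some(name) => name
            .rsplit('-')
            .next()
            .and_then(|part| part.strip_prefix("pod"))
            .map(|u| u.replace('_', "-")),
        None => seg.strip_prefix("pod").map(str::to_string),
    };
    uid.filter(|u| !u.is_empty())
}

/// Try to read a container ID out of one path segment.
/// systemd driver: "<runtime>-<64 hex chars>.scope"; cgroupfs driver: "<64 hex chars>"
fn parse_container_segment(seg: &str) -> Option<String> {
    let candidate = match seg.strip_suffix(".scope") {
        Some(name) => name.rsplit('-').next().unwrap_or(name),
        None => seg,
    };
    let is_id = candidate.len() == 64 && candidate.chars().all(|c| c.is_ascii_hexdigit());
    if is_id { Some(candidate.to_string()) } else { None }
}

/// Parse a cgroup path; returns None for anything outside the kubepods hierarchy.
pub fn parse_cgroup_path(path: &str) -> Option<CgroupIdentity> {
    if !path.contains("kubepods") {
        return None;
    }
    let mut pod_uid: Option<String> = None;
    for seg in path.split('/').filter(|s| !s.is_empty()) {
        if pod_uid.is_none() {
            // Still looking for the Pod-level segment.
            pod_uid = parse_pod_segment(seg);
            continue;
        }
        // Pod found: the first container-shaped segment below it wins.
        if let Some(id) = parse_container_segment(seg) {
            return pod_uid.take().map(|uid| CgroupIdentity {
                pod_uid: uid,
                container_id: id,
            });
        }
    }
    None
}
```

Paths that belong to the Pod-level cgroup only (no container segment) return None here, which is a deliberate simplification; a real tool might want to report the Pod UID on its own.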
Memory Allocation Rate Tracking
OOM is the final result. The real value lies in the trend leading up to it. By tracking kmalloc and kfree events, abnormal growth can be detected before OOM occurs.
| |
Key Design Points
- BPF_MAP_TYPE_PERCPU_HASH: Each key holds a separate value slot per CPU, so counter updates need no locking or atomic operations. On multi-core systems, this is usually the lowest-overhead choice for hot counters.
- Sampling: kmalloc can fire millions of times per second, so recording every event is impractical. Proportional sampling keeps overhead manageable.
- Tracepoint first: tracepoint/kmem/kmalloc is a stable ABI, safer and more discoverable than kprobe.
The user-space program periodically reads the alloc_stats map to calculate allocation rates:
| |
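The rate calculation can be sketched as follows. This assumes each alloc_stats value is a cumulative counter of sampled bytes, that a user-space lookup on the per-CPU map yields one u64 per CPU, and that the kernel side records 1 in sample_every events; the function names and the sample_every parameter are hypothetical.

```rust
use std::time::Duration;

/// Collapse the per-CPU slots of one map entry into a single total.
/// Wrapping arithmetic keeps the result well-defined if a counter overflows.
pub fn sum_per_cpu(values: &[u64]) -> u64 {
    values.iter().fold(0u64, |acc, v| acc.wrapping_add(*v))
}

/// Estimate bytes allocated per second between two snapshots of the same key.
/// `sample_every` scales the sampled total back up to an estimate of the real one.
pub fn alloc_rate_bytes_per_sec(
    prev: &[u64],
    curr: &[u64],
    interval: Duration,
    sample_every: u64,
) -> Option<f64> {
    let secs = interval.as_secs_f64();
    if secs <= 0.0 {
        return None; // no elapsed time, so no meaningful rate
    }
    // Counters are cumulative, so the difference is the growth in this interval.
    let delta = sum_per_cpu(curr).wrapping_sub(sum_per_cpu(prev));
    Some(delta as f64 * sample_every as f64 / secs)
}
```

Comparing the resulting rate against a rolling baseline per PID or cgroup is one way to flag abnormal growth; the threshold itself depends on the workload.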
Rust Aya Implementation
Aya is a pure Rust eBPF framework with no libbpf dependency. Here’s the same OOM monitor implemented in Rust:
eBPF Kernel Side (Rust)
| |
Note: The code above uses bpf_probe_read_kernel for safe kernel memory access instead of hardcoded offsets. CO-RE works the same way in Aya: BPF programs leverage BTF info to correctly resolve struct field positions without relying on specific kernel versions.
User Side (Rust)
| |
Build and Run with cargo-aya
| |
The compiled binary is self-contained — eBPF bytecode is embedded via the include_bytes_aligned! macro.
About cargo-aya: cargo aya new creates a dual-package structure (a -ebpf kernel package plus a user-space package). Kernel eBPF code lives in ebpf-oom-ebpf/, user-space code in ebpf-oom/, matching the paths in the code snippets above.
Aya vs cilium/ebpf
| Dimension | cilium/ebpf (Go) | Aya (Rust) |
|---|---|---|
| Kernel language | C (compiled by Clang) | Rust (custom target) |
| User language | Go | Rust |
| Toolchain | Requires Clang/LLVM | Pure Rust toolchain |
| Type safety | C has no type guarantees | Rust compile-time checks |
| Learning curve | Must know both C and Go | Unified Rust |
| Maturity | More mature, more production use | Rapidly developing |
| BTF/CO-RE | Full support | Full support |
| Dev experience | bpf2go code generation | cargo-aya all-in-one toolchain |
Choosing between Go and Rust depends on your team's background and project needs. Go + cilium/ebpf is more mature and has a richer ecosystem; Rust + Aya offers stronger type safety and a more unified developer experience.
Best Practices
Tracepoint over Kprobe
| Property | tracepoint | kprobe |
|---|---|---|
| ABI stability | Stable, maintained by kernel devs | No guarantee |
| Discoverability | bpftrace -l lists them | Need to read source |
| Performance | Slightly better | Slightly higher overhead |
| When to use | Officially supported scenarios | No tracepoint available for target function |
Sampling Strategy
High-frequency events (like kmalloc, page_fault) must be sampled:
- Proportional sampling: record 1 in N events via bpf_get_prandom_u32() % N == 0
- Adaptive sampling: dynamically adjust sample rate based on current event rate
- Key-based sampling: only track specific PIDs or cgroups
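A rough sketch of how the proportional and adaptive strategies fit together: user space picks a divisor N from the observed event rate, and the per-event check mirrors the random % N == 0 test. The function names, the target-rate parameter, and the idea of publishing N to the eBPF program through a map are all assumptions for illustration.

```rust
/// Per-event decision, mirroring `bpf_get_prandom_u32() % N == 0`.
/// `random` stands in for the value the kernel helper would return.
pub fn should_sample(random: u32, n: u32) -> bool {
    // Guard against a zero divisor; N = 1 means "record everything".
    random % n.max(1) == 0
}

/// Adaptive step: choose N so that roughly `target_records_per_sec` events
/// are recorded at the currently observed rate, clamped to [1, max_divisor].
pub fn next_sample_divisor(
    observed_events_per_sec: u64,
    target_records_per_sec: u64,
    max_divisor: u32,
) -> u32 {
    let max = u64::from(max_divisor.max(1));
    if target_records_per_sec == 0 {
        return max as u32; // sample as sparsely as allowed
    }
    // Ceiling division without overflow.
    let needed = observed_events_per_sec / target_records_per_sec
        + u64::from(observed_events_per_sec % target_records_per_sec != 0);
    needed.clamp(1, max) as u32
}
```

Whatever N is in effect must also be used to scale sampled counts back up, so the user-space side should read the divisor and the counters consistently.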
Ring Buffer vs Perf Buffer
Ring Buffer (BPF_MAP_TYPE_RINGBUF) is the recommended event transport mechanism:
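- Higher performance in most workloads: one buffer is shared across all CPUs, which preserves event ordering and uses memory more efficiently than per-CPU perf buffers
- Makes event loss visible: bpf_ringbuf_reserve returns NULL when the buffer is full, so the program can count dropped events (bpf_ringbuf_discard, by contrast, releases a reservation that should not be delivered)
- Supports reserve/commit two-phase write, avoiding extra copies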
Perf Buffer (BPF_MAP_TYPE_PERF_EVENT_ARRAY) is the legacy approach. New projects should prefer Ring Buffer.
Summary
This article covered:
- Container-level OOM pinpointing via cgroup paths mapped to Kubernetes Pods
- Memory allocation rate tracking using per-CPU maps and sampling
- Rust Aya implementation showing an alternative development paradigm
The next (and final) article in this series covers the BPF OOM kernel patches — a new feature being discussed in the community that allows eBPF programs to fully take over the kernel’s OOM policy.