eBPF Observability: Getting Started with OOM Killer Monitoring
eBPF (extended Berkeley Packet Filter) started as a network packet filtering tool, but over roughly a decade it has evolved into one of the most powerful observability frameworks in the Linux kernel. It lets you load and run custom programs inside the kernel safely, without modifying kernel source code or writing kernel modules.
This article kicks off the series, using OOM (Out-of-Memory) monitoring as a concrete entry point to learn the core eBPF concepts and toolchain.
Why eBPF for Observability
Traditional monitoring tools like top, free, and ps only show final states — how much memory a process is using, how much is left. But many problems happen “in flight”: why did a process OOM? Which process triggered it? What was the context at that moment?
eBPF hooks into the kernel’s critical paths, capturing full context the instant an event occurs:
- Safe: eBPF programs pass the verifier — they won’t crash the kernel
- Non-invasive: no system restarts or application modifications needed
- High performance: events are preprocessed in kernel space, avoiding frequent user/kernel context switches
- Dynamic: load and unload on demand, zero overhead when not needed
Comparing kernel module development vs eBPF:
| Aspect | Kernel Module | eBPF |
|---|---|---|
| Safety | One bug crashes the whole system | Verifier enforces safety sandbox |
| Development cost | Must know kernel internals, hard to debug | CO-RE + libbpf, compile once run anywhere |
| Deployment | Rebuild / reload module | Dynamic loading, no restart |
| Performance | Native execution, zero overhead | JIT-compiled to native, near-native speed |
| Programmability | Full kernel capabilities | Restricted (bounded loops, limited stack, limited instructions) |
Quick Demo: One-Liner OOM Monitoring
Before diving into theory, let’s run a real bpftrace command and see what an OOM Killer event looks like:
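A minimal sketch of such a one-liner, not necessarily the exact command the article ran. It assumes a recent kernel (such as the 5.4+ recommended later) where the first argument of `oom_kill_process()` is a `struct oom_control *` whose `chosen` field is the victim, and a BTF-enabled kernel so bpftrace can resolve the struct. The names `OOM_ONELINER` and `watch_oom` are made up for illustration.

```shell
# bpftrace program: on entry to oom_kill_process(), print the victim's PID and name.
# Assumption: arg0 is a struct oom_control * and ->chosen is the victim task.
OOM_ONELINER='kprobe:oom_kill_process { $oc = (struct oom_control *)arg0; printf("OOM kill: pid=%d comm=%s\n", $oc->chosen->pid, $oc->chosen->comm); }'

# Helper that attaches the probe; needs root and blocks until Ctrl-C.
watch_oom() {
    sudo bpftrace -e "$OOM_ONELINER"
}
```

Call `watch_oom` to start tracing.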
This attaches a probe to the oom_kill_process() kernel function’s entry. When an OOM occurs, it prints the killed process’s PID and name.
The catch: you need an actual OOM event to see output. That’s the beginner’s dilemma — “the command runs but nothing happens.” We’ll address this later with a safe OOM simulation.
The BCC toolkit also includes a ready-to-use oomkill tool:
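Where the tool lives depends on how BCC was installed, so here is a hedged sketch for locating it. The two locations checked are the ones I know of (the `oomkill-bpfcc` command from the Debian/Ubuntu `bpfcc-tools` package and the upstream `/usr/share/bcc/tools` directory); `find_oomkill` is a hypothetical helper name.

```shell
# Locate the BCC oomkill tool; its name and path depend on the installation method.
find_oomkill() {
    if command -v oomkill-bpfcc >/dev/null 2>&1; then
        command -v oomkill-bpfcc            # Debian/Ubuntu package bpfcc-tools
    elif [ -x /usr/share/bcc/tools/oomkill ]; then
        echo /usr/share/bcc/tools/oomkill   # default path of upstream installs
    else
        echo "oomkill not found; install the BCC tools first" >&2
        return 1
    fi
}

# Usage (needs root; blocks until Ctrl-C):
#   sudo "$(find_oomkill)"
```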
Sample output from a real OOM on a Java process:
| |
OOM Killer Overview
When a Linux system (or a memory cgroup) runs out of memory, the OOM Killer selects and kills a process to free memory. The selection is based on oom_badness(), a scoring function that on current kernels sums the task's resident memory, swap usage, and page-table size, then adjusts the result by the user-configurable oom_score_adj.
Key kernel functions live in mm/oom_kill.c:
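One way to see which of these functions your running kernel actually exposes is to look them up in `/proc/kallsyms`. The sketch below checks four functions from mm/oom_kill.c that I am confident exist on recent kernels; the selection is mine and illustrative, not a complete list, and `list_oom_symbols` is a hypothetical helper.

```shell
# Functions from mm/oom_kill.c on the kill path, in rough call order.
# Assumption: illustrative selection; names reflect recent kernels.
OOM_FUNCS='out_of_memory oom_badness oom_kill_process mark_oom_victim'

list_oom_symbols() {
    for fn in $OOM_FUNCS; do
        # kprobes attach by symbol name, so look for a text symbol (t/T).
        if grep -qE " [tT] ${fn}\$" /proc/kallsyms 2>/dev/null; then
            echo "$fn: present"
        else
            echo "$fn: not visible (inlined, renamed, or kallsyms unreadable)"
        fi
    done
}

list_oom_symbols
```

A function reported as not visible cannot be probed by that name on this kernel.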
The problem with traditional OOM handling: you can only see that an OOM occurred via dmesg or kubectl describe pod, but you can’t see the memory allocation trends leading up to it, which container triggered it, or the process memory allocation hotspots. eBPF fills these gaps.
Environment Setup
eBPF development requires a Linux kernel with BTF (BPF Type Format) support, which is standard on modern distributions:
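A quick way to check these prerequisites is sketched below. It relies on the kernel exposing BTF at `/sys/kernel/btf/vmlinux`; the helper name `check_ebpf_env` is made up, and the install line in the trailing comment assumes Debian/Ubuntu package names.

```shell
# Report the kernel release, BTF availability, and the installed bpftrace version.
check_ebpf_env() {
    echo "kernel: $(uname -r)"
    if [ -r /sys/kernel/btf/vmlinux ]; then
        echo "BTF: available"
    else
        echo "BTF: missing (kernel likely built without CONFIG_DEBUG_INFO_BTF)"
    fi
    if command -v bpftrace >/dev/null 2>&1; then
        echo "bpftrace: $(bpftrace --version)"
    else
        echo "bpftrace: not installed"
    fi
}

check_ebpf_env

# Installing the tools on Debian/Ubuntu (package names differ elsewhere):
#   sudo apt-get install -y bpftrace bpfcc-tools
```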
Recommended kernel version: Linux 5.4+, where CO-RE, BTF, and the 1M instruction limit for BPF programs are all available.
Verify Your Setup
Run a simple bpftrace command to confirm everything works:
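A sketch of that check, modeled on the `BEGIN` one-liner from the Troubleshooting section but printing the greeting expected here. The wrapper `verify_bpftrace` is a hypothetical convenience; it only runs bpftrace when it is installed.

```shell
# Smallest possible bpftrace program: print once from the BEGIN probe, then exit.
verify_bpftrace() {
    if command -v bpftrace >/dev/null 2>&1; then
        sudo bpftrace -e 'BEGIN { printf("Hello eBPF!\n"); exit(); }'
    else
        echo "bpftrace is not installed"
    fi
}

verify_bpftrace
```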
If you see Hello eBPF!, your environment is ready.
bpftrace: The Fastest Way to Start
bpftrace is a high-level tracing language built on eBPF. It lets you monitor kernel events with just a line or two.
Discover Available Probes
Before writing a monitoring script, see what OOM-related probes exist on your system:
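A sketch of the listing commands, using bpftrace's `-l` (list) and `-lv` (list with arguments) flags. The wildcard patterns are my choice and `list_oom_probes` is a hypothetical helper name.

```shell
list_oom_probes() {
    # Static tracepoints in the oom subsystem.
    sudo bpftrace -l 'tracepoint:oom:*'
    # Kernel functions with "oom" in the name that kprobes can attach to.
    sudo bpftrace -l 'kprobe:*oom*'
    # Show the fields a tracepoint provides before using them in a script.
    sudo bpftrace -lv 'tracepoint:oom:mark_victim'
}

# Usage (needs root):
#   list_oom_probes
```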
Sample output:
| |
Using tracepoints (recommended)
Tracepoints are static hooks baked into the kernel source, and their interface is kept largely stable across kernel versions. Prefer them when available:
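A minimal sketch of a tracepoint-based script, assuming the `oom:mark_victim` tracepoint and relying only on its `pid` field. The file name `oom_tracepoint.bt` is arbitrary. Recent bpftrace releases spell the field access `args.pid`; the arrow form below is the older spelling, which I assume your version still accepts.

```shell
cat > oom_tracepoint.bt <<'EOF'
tracepoint:oom:mark_victim
{
    // Print a wall-clock timestamp, then the PID the OOM killer selected.
    // Assumption: only the pid field is used, to stay compatible with older kernels.
    time("%H:%M:%S ");
    printf("OOM victim: pid=%d\n", args->pid);
}
EOF

# Run it (needs root; blocks until Ctrl-C):
#   sudo bpftrace oom_tracepoint.bt
```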
Using kprobes (more flexible)
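kprobes can attach to almost any non-inlined kernel function, but they depend on internal implementation details. On current kernels, including the 5.4+ recommended here, the first argument of oom_kill_process() is a struct oom_control, and its chosen field points to the victim's task_struct (some older kernels passed the victim directly as the second argument):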
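A hedged sketch of a kprobe-based script. It assumes a recent kernel where `arg0` is a `struct oom_control *` and the victim is reachable through its `chosen` field, plus BTF so bpftrace can resolve the struct. The `pid` and `comm` builtins refer to the task that was running when the probe fired, which is the one whose allocation invoked the OOM killer. The file name is arbitrary.

```shell
cat > oom_kprobe.bt <<'EOF'
kprobe:oom_kill_process
{
    // Assumption: recent kernel, arg0 is struct oom_control *, ->chosen is the victim.
    $oc = (struct oom_control *)arg0;
    time("%H:%M:%S ");
    // pid/comm: the triggering task; $oc->chosen: the task about to be killed.
    printf("triggered by pid=%d (%s), victim pid=%d (%s)\n",
           pid, comm, $oc->chosen->pid, $oc->chosen->comm);
}
EOF

# Run it (needs root; blocks until Ctrl-C):
#   sudo bpftrace oom_kprobe.bt
```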
To also capture the victim’s memory usage, read mm_struct from task_struct:
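One possible shape for this, as a sketch: follow the victim's `mm` pointer and print `total_vm`, which counts mapped pages. It assumes a recent kernel (victim reached via `struct oom_control`) and 4 KiB pages for the conversion to KiB. Resident-set counters are left out on purpose, because the layout of `rss_stat` differs between kernel versions.

```shell
cat > oom_victim_mem.bt <<'EOF'
kprobe:oom_kill_process
{
    // Assumption: recent kernel, arg0 is struct oom_control *.
    $oc = (struct oom_control *)arg0;
    $victim = $oc->chosen;
    // total_vm is in pages; multiplying by 4 assumes 4 KiB pages.
    printf("victim pid=%d (%s) total_vm=%lu KiB\n",
           $victim->pid, $victim->comm, $victim->mm->total_vm * 4);
}
EOF

# Run it (needs root; blocks until Ctrl-C):
#   sudo bpftrace oom_victim_mem.bt
```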
Hands-on: Trigger an OOM and Watch
So far we’ve been waiting for OOMs to happen. This section shows how to safely trigger one yourself and watch eBPF catch it in real time.
Method 1: stress-ng + Docker (safest)
Use Docker to cap available memory, then trigger OOM inside the container. This won’t affect your host system:
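A sketch of the Docker approach. The stock `alpine` image does not ship stress-ng, so this version installs it first with `apk`; setting `--memory-swap` equal to `--memory` is my addition to keep the container from spilling into swap instead of being killed. `trigger_oom_docker` is a hypothetical helper name.

```shell
# Run stress-ng in a container capped at 64 MiB so the OOM stays inside its cgroup.
trigger_oom_docker() {
    docker run --rm --memory 64m --memory-swap 64m alpine \
        sh -c 'apk add --no-cache stress-ng >/dev/null && stress-ng --vm 1 --vm-bytes 128m --timeout 10s'
}

# Usage (start your bpftrace probe in another terminal first):
#   trigger_oom_docker
```

stress-ng may restart a worker that gets killed, so more than one OOM event can show up within the timeout.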
If stress-ng isn’t available, use a simple C program:
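A minimal sketch of such a program, written out from the shell so it can be compiled in one go. It is my own illustration rather than the article's original code: it allocates memory in 10 MiB chunks and touches every page until it is killed. The file name `memhog.c` and the build and run lines in the comments are assumptions.

```shell
cat > memhog.c <<'EOF'
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const size_t chunk = 10 * 1024 * 1024;   /* allocate 10 MiB at a time */
    size_t total = 0;
    for (;;) {
        char *p = malloc(chunk);
        if (p == NULL) {
            break;
        }
        memset(p, 1, chunk);                 /* touch pages so they become resident */
        total += chunk;
        printf("allocated %zu MiB\n", total >> 20);
        fflush(stdout);
    }
    return 0;
}
EOF

# Build statically so the binary also runs in a minimal container image:
#   gcc -O0 -static -o memhog memhog.c
# Run it under the same 64 MiB cap, never unconfined on a machine you care about:
#   docker run --rm -m 64m --memory-swap 64m -v "$PWD/memhog:/memhog:ro" alpine /memhog
```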
Method 2: direct stress-ng with cgroup (for VMs)
If you’re on a VM or test machine, you can run stress-ng directly with a memory cgroup:
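One way to do this, sketched under the assumption that the machine uses systemd with cgroup v2: `systemd-run --scope` places the command in a transient scope with the given memory limits. `trigger_oom_cgroup` is a hypothetical helper name, and the limits mirror the Docker example.

```shell
# Run stress-ng in a transient cgroup limited to 64 MiB with swap disabled.
trigger_oom_cgroup() {
    sudo systemd-run --scope -p MemoryMax=64M -p MemorySwapMax=0 \
        stress-ng --vm 1 --vm-bytes 128M --timeout 10s
}

# Usage (only on a VM or test machine):
#   trigger_oom_cgroup
```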
Expected Output
When the OOM triggers, bpftrace will output something like:
| |
The BCC oomkill tool gives more detail:
| |
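Note: total-vm is the size of the process's virtual memory mappings (including space that was reserved but never touched). Resident physical memory is reported separately as anon-rss, file-rss, and shmem-rss, with anon-rss usually the largest for heap-heavy processes.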
Troubleshooting
Q: No output at all — is bpftrace broken?
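A: First verify with sudo bpftrace -e 'BEGIN { printf("OK\n"); exit(); }'. If that prints OK, bpftrace is working and the silence just means no OOM kill has happened yet; trigger one as described above. If it fails, reinstall bpftrace.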
Q: stress-ng fails inside Docker?
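A: Try docker run --rm -m 64m --memory-swap 64m alpine sh -c 'dd if=/dev/zero of=/dev/null bs=200M count=1'. dd has to allocate and fill a 200 MB buffer, which exceeds the 64 MB limit and also triggers an OOM.
Q: Can I run this on WSL2 / macOS?
A: eBPF needs a Linux kernel. WSL2 works because it runs a real Linux kernel (WSL1 does not), though its stock kernel may lack some tracing options. macOS doesn't work natively; use a Linux VM instead.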
Core Concepts Recap
You’ve already used the most important eBPF concepts in the hands-on section above. Here’s a quick reference.
kprobe / kretprobe
Dynamic kernel function probes. Attach eBPF programs at any kernel function entry (kprobe) or return (kretprobe). Flexible — works wherever there’s a kernel function — but depends on internal kernel implementation that may change between versions.
The kprobe:oom_kill_process example above is a classic kprobe usage.
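To illustrate the entry/return pairing, here is a sketch that times `out_of_memory()`, the function that runs victim selection and the kill. It pairs a kprobe with a kretprobe, keyed by thread ID. The choice of function and the file name are mine; `retval` is bpftrace's builtin for the probed function's return value.

```shell
cat > oom_latency.bt <<'EOF'
kprobe:out_of_memory
{
    // Remember when this thread entered the OOM path.
    @start[tid] = nsecs;
}

kretprobe:out_of_memory
/@start[tid]/
{
    // retval is the return value; out_of_memory() returns a bool.
    printf("out_of_memory() took %d us, returned %d\n",
           (nsecs - @start[tid]) / 1000, retval);
    delete(@start[tid]);
}
EOF

# Run it (needs root; blocks until Ctrl-C):
#   sudo bpftrace oom_latency.bt
```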
tracepoint
Static kernel tracing points: hook points defined in the kernel source whose interface is kept largely stable across versions. Prefer tracepoints when available.
tracepoint:oom:mark_victim is the stable OOM tracepoint.
BPF Maps
The data exchange channel between kernel and user space. Common types:
- BPF_MAP_TYPE_HASH: key-value storage
- BPF_MAP_TYPE_PERCPU_HASH: per-CPU, lock-free
- BPF_MAP_TYPE_RINGBUF: high-performance ring buffer for event streams
- BPF_MAP_TYPE_STACK_TRACE: kernel call stacks
We’ll cover these in detail in the next articles.
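In bpftrace, variables starting with `@` are backed by BPF maps, so you can try maps without writing C. The sketch below counts OOM kills per victim name and prints the map every 10 seconds. It assumes a recent kernel where `arg0` is a `struct oom_control *`; the map name `@kills` and the file name are arbitrary.

```shell
cat > oom_count.bt <<'EOF'
kprobe:oom_kill_process
{
    // Assumption: recent kernel, arg0 is struct oom_control *.
    $oc = (struct oom_control *)arg0;
    // @kills is a map keyed by victim name; count() increments it per event.
    @kills[$oc->chosen->comm] = count();
}

interval:s:10
{
    // Every 10 seconds, print the per-process kill counts.
    print(@kills);
}
EOF

# Run it (needs root; blocks until Ctrl-C):
#   sudo bpftrace oom_count.bt
```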
CO-RE and BTF
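CO-RE (Compile Once, Run Everywhere) is key to eBPF portability. BTF encodes kernel data structure layouts, so eBPF programs read fields through BPF_CORE_READ macros instead of hardcoded offsets, and the loader relocates those accesses against the running kernel. A program compiled on Ubuntu 22.04 can therefore run unchanged on a different kernel version, provided that kernel exposes BTF and still has the fields the program reads.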
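To see the layout information BTF carries, you can ask bpftool to render the kernel's types as C and pick out a single struct. This sketch assumes bpftool is installed and prints `struct oom_control` as an example; `show_oom_control` is a hypothetical helper name.

```shell
# Print the running kernel's definition of struct oom_control from BTF.
show_oom_control() {
    if [ ! -r /sys/kernel/btf/vmlinux ]; then
        echo "no BTF at /sys/kernel/btf/vmlinux"
        return 0
    fi
    if ! command -v bpftool >/dev/null 2>&1; then
        echo "bpftool not installed"
        return 0
    fi
    # Dump all kernel types as C, then keep only the one struct.
    bpftool btf dump file /sys/kernel/btf/vmlinux format c |
        sed -n '/^struct oom_control {/,/^};/p'
}

show_oom_control
```

The field offsets behind this definition are what CO-RE relocations resolve at load time.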
From bpftrace to Production Tools
bpftrace is great for quick probing and ad-hoc debugging. For long-running production tools, you’ll need libbpf (C), cilium/ebpf (Go), or Aya (Rust).
In this series, coming next:
- Part 2: Write eBPF kernel code in C + a Go user-space loader for a complete OOM tracer
- Part 3: Container and cgroup-level OOM pinpointing with Rust Aya
- Part 4: Deep dive into BPF OOM kernel patch evolution
Summary
This article covered:
- Why eBPF is ideal for observability (safe, non-invasive, high-performance)
- Monitoring OOM Killer events with a single bpftrace line
- OOM Killer fundamentals
- Development environment setup and verification
- Hands-on OOM simulation with Docker and stress-ng
- Core concepts: kprobe, tracepoint, BPF Maps, CO-RE
Try it now: Open two terminals. In Terminal A run sudo bpftrace -e 'kprobe:oom_kill_process { printf("OOM! %s\n", ((struct oom_control *)arg0)->chosen->comm); }'. In Terminal B run docker run --rm -m 64m --memory-swap 64m alpine sh -c 'apk add --no-cache stress-ng && stress-ng --vm 1 --vm-bytes 128m --timeout 5s'. Watch Terminal A capture the OOM in real time.