eBPF Observability: Getting Started with OOM Killer Monitoring

eBPF (extended Berkeley Packet Filter) started as a network packet filtering mechanism, but over the past decade it has grown into one of the most capable observability frameworks in the Linux kernel. It lets you safely load and run custom programs inside the kernel without modifying kernel source code or loading kernel modules.

This article kicks off the series, using OOM (Out-of-Memory) monitoring as a concrete entry point to learn the core eBPF concepts and toolchain.

Why eBPF for Observability

Traditional monitoring tools like top, free, and ps only show snapshots of the current state: how much memory a process is using, how much is left. But many problems happen “in flight”: why was a process OOM-killed? Which process triggered it? What was the context at that moment?

eBPF hooks into the kernel’s critical paths, capturing full context the instant an event occurs:

  • Safe: every program must pass the in-kernel verifier before it loads, which rejects programs that could crash or hang the kernel (out-of-bounds memory access, unbounded loops)
  • Non-invasive: no system restarts or application modifications needed
  • High performance: events are filtered and aggregated in kernel space, so only the results are copied to user space
  • Dynamic: load and unload on demand, with no overhead once a program is detached

Comparing kernel module development vs eBPF:

| Aspect | Kernel module | eBPF |
|---|---|---|
| Safety | One bug crashes the whole system | Verifier enforces a safety sandbox |
| Development cost | Must know kernel internals; hard to debug | CO-RE + libbpf: compile once, run anywhere |
| Deployment | Rebuild / reload the module | Dynamic loading, no restart |
| Performance | Native execution, zero overhead | JIT-compiled to native code, near-native speed |
| Programmability | Full kernel capabilities | Restricted (bounded loops, limited stack, limited instructions) |

Quick Demo: One-Liner OOM Monitoring

Before diving into theory, let’s run a real bpftrace command and see what an OOM Killer event looks like:

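```bash
# Terminal A: start OOM monitoring
# On recent kernels, oom_kill_process(struct oom_control *oc, const char *message)
# receives the victim in oc->chosen rather than as a separate argument.
sudo bpftrace -e '
  kprobe:oom_kill_process {
    $oc = (struct oom_control *)arg0;
    $task = $oc->chosen;
    printf("OOM killed: %s (PID: %d)\n", $task->comm, $task->pid);
  }'
```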

This attaches a kprobe to the entry of the oom_kill_process() kernel function. When the OOM killer acts, the probe reads the victim from oc->chosen and prints its name and PID.

The catch: you need an actual OOM event to see output. That’s the beginner’s dilemma — “the command runs but nothing happens.” We’ll address this later with a safe OOM simulation.

The BCC toolkit also includes a ready-to-use oomkill tool (Ubuntu/Debian's bpfcc-tools package installs it as oomkill-bpfcc):

```bash
sudo /usr/share/bcc/tools/oomkill
```

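The kernel also logs every OOM kill to dmesg. Abridged log lines from a real OOM on a Java process look like this:

```
06:13:42  oom-killer  gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
06:13:42  Killed process 1234 (java), total-vm:2.5GB, anon-rss:1.8GB
```

oomkill itself prints one compact line per kill: the triggering PID and command, the victim's PID and command, a page count, and the current load average.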

OOM Killer Overview

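When the system runs out of memory, the OOM Killer selects and kills a process to free memory. The choice is driven by oom_badness(), which in modern kernels scores each candidate mainly by its memory footprint (resident pages, swap entries, and page tables) and then shifts the score by the process's oom_score_adj (-1000 to 1000; -1000 exempts the process entirely). Kernels before 2.6.36 also weighed runtime and nice value, but that heuristic is gone. When a memory cgroup (for example, a container with a memory limit) hits its limit, the OOM killer only considers processes inside that cgroup.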

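Key kernel functions (signatures from recent kernels; they change between versions, so check the source for yours):

```c
/* mm/oom_kill.c */
bool out_of_memory(struct oom_control *oc);
long oom_badness(struct task_struct *p, unsigned long totalpages);
void oom_kill_process(struct oom_control *oc, const char *message); /* victim is oc->chosen */

/* mm/memcontrol.c: entry point for memory-cgroup (container) OOMs */
bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
                              int order);
```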

The problem with traditional OOM handling: you can only see that an OOM occurred via dmesg or kubectl describe pod, but you can’t see the memory allocation trends leading up to it, which container triggered it, or the process memory allocation hotspots. eBPF fills these gaps.

Environment Setup

eBPF development requires a Linux kernel with BTF (BPF Type Format) support, which is standard on modern distributions:

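```bash
# Verify BTF support: expect CONFIG_DEBUG_INFO_BTF=y
grep CONFIG_DEBUG_INFO_BTF /boot/config-$(uname -r)
# When BTF is enabled, the kernel exposes it here
ls -l /sys/kernel/btf/vmlinux

# Install toolchain (Ubuntu; on Debian, install the bpftool package instead of linux-tools-*)
sudo apt install -y llvm clang libbpf-dev linux-tools-common linux-tools-$(uname -r)
sudo apt install -y bpfcc-tools bpftrace linux-headers-$(uname -r)
```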

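Recommended kernel version: Linux 5.4 or later. By 5.4, BTF (exposed at /sys/kernel/btf/vmlinux), CO-RE, and the 1M-instruction limit for BPF programs (raised from 4096 in 5.2) are all available. The BPF ring buffer (BPF_MAP_TYPE_RINGBUF) needs 5.8 or later.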

Verify Your Setup

Run a simple bpftrace command to confirm everything works:

```bash
sudo bpftrace -e 'BEGIN { printf("Hello eBPF!\n"); exit(); }'
```

If you see Hello eBPF!, your environment is ready.

bpftrace: The Fastest Way to Start

bpftrace is a high-level tracing language built on eBPF. It lets you monitor kernel events with just a line or two.

Discover Available Probes

Before writing a monitoring script, see what OOM-related probes exist on your system:

```bash
# List all OOM-related kprobes
sudo bpftrace -l 'kprobe:*oom*'

# List all OOM-related tracepoints
sudo bpftrace -l 'tracepoint:oom:*'
```

Sample output (abridged; the exact list depends on your kernel):

```
kprobe:oom_kill_process
kprobe:__oom_kill_process
tracepoint:oom:mark_victim
tracepoint:oom:reap_task
```

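Tracepoints are static hooks defined in the kernel source. They are far more stable across kernel versions than internal function signatures, so prefer them when available. Running sudo bpftrace -lv 'tracepoint:oom:mark_victim' lists the fields your kernel provides:

```bash
sudo bpftrace -e '
  tracepoint:oom:mark_victim {
    printf("OOM victim marked: PID=%d\n", args->pid);
  }'
```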

Using kprobes (more flexible)

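kprobes can attach to almost any non-inlined kernel function, but they depend on internal implementation details that change between versions. On recent kernels the first argument of oom_kill_process() is a struct oom_control, and the victim task is stored in its chosen field. (Older kernels passed the victim task_struct as a separate argument, so check mm/oom_kill.c for your version.)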

```bash
sudo bpftrace -e '
  kprobe:oom_kill_process {
    $oc = (struct oom_control *)arg0;   // first argument
    $task = $oc->chosen;                // the selected victim
    printf("OOM! killed=%s pid=%d\n", $task->comm, $task->pid);
  }'
```

To also capture the victim’s memory usage, read mm_struct from task_struct:

```bash
sudo bpftrace -e '
  kprobe:oom_kill_process {
    $task = ((struct oom_control *)arg0)->chosen;
    $mm = $task->mm;
    // total_vm counts pages; 4096 assumes 4 KiB pages (x86-64)
    printf("OOM! %s (PID: %d) total_vm=%dMB\n",
      $task->comm, $task->pid,
      ($mm->total_vm * 4096) / 1024 / 1024);
    // mm_struct has no resident_vm field: RSS lives in mm->rss_stat,
    // whose layout differs between kernel versions
  }'
```

Hands-on: Trigger an OOM and Watch

So far we’ve been waiting for OOMs to happen. This section shows how to safely trigger one yourself and watch eBPF catch it in real time.

Method 1: stress-ng + Docker (safest)

Use Docker to cap the container's memory, then trigger the OOM inside it. The OOM stays confined to the container's memory cgroup, so the rest of the host is not affected:

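```bash
# Terminal A: start bpftrace monitoring
sudo bpftrace -e '
  kprobe:oom_kill_process {
    $task = ((struct oom_control *)arg0)->chosen;
    printf("OOM! killed=%s pid=%d\n", $task->comm, $task->pid);
  }'

# Terminal B: trigger an OOM inside a memory-limited container.
# The alpine image does not ship stress-ng, so install it first;
# setting --memory-swap equal to -m disables swap so the limit is actually hit.
docker run --rm -m 64m --memory-swap 64m alpine sh -c \
  'apk add --no-cache stress-ng >/dev/null && stress-ng --vm 1 --vm-bytes 128m --timeout 5s'
```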

If stress-ng isn’t available, use a simple C program:

```bash
# Terminal B: compile and run a memory-hungry program in Docker
cat > /tmp/oom-trigger.c << 'EOF'
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    for (;;) {
        /* Allocate 10 MB at a time */
        char *p = malloc(10 * 1024 * 1024);
        if (!p)
            break;
        /* Write non-zero bytes so the pages become resident
           (malloc+memset(0) can be optimized into calloc) */
        memset(p, 1, 10 * 1024 * 1024);
        sleep(1);
    }
    pause(); /* only reached if malloc fails; keep the process alive */
    return 0;
}
EOF
docker run --rm -m 64m --memory-swap 64m -v /tmp:/tmp alpine sh -c '
  apk add --no-cache gcc musl-dev >/dev/null 2>&1 && \
  gcc -o /tmp/trigger /tmp/oom-trigger.c && \
  /tmp/trigger'
```

Method 2: direct stress-ng with cgroup (for VMs)

If you’re on a VM or test machine, you can run stress-ng directly inside a memory cgroup. Most current distributions use cgroup v2; the v1 layout is shown for older systems:

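```bash
# cgroup v2 (default on most current distributions)
sudo mkdir -p /sys/fs/cgroup/oom-lab
echo 50M | sudo tee /sys/fs/cgroup/oom-lab/memory.max
echo 0 | sudo tee /sys/fs/cgroup/oom-lab/memory.swap.max   # no swap, so the limit forces an OOM

# cgroup v1 (older systems): use these instead of the v2 lines above
# sudo mkdir -p /sys/fs/cgroup/memory/oom-lab
# echo 50M | sudo tee /sys/fs/cgroup/memory/oom-lab/memory.limit_in_bytes

# Run stress-ng inside the cgroup (on v1, write to /sys/fs/cgroup/memory/oom-lab/cgroup.procs)
sudo bash -c '
  echo $$ > /sys/fs/cgroup/oom-lab/cgroup.procs
  exec stress-ng --vm 1 --vm-bytes 100M --timeout 5s
'
```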

Expected Output

When the OOM triggers, bpftrace will output something like:

```
Attaching 1 probe...
OOM! killed=stress-ng pid=7891
```

The kernel log (dmesg) records more detail about the kill (abridged):

```
06:13:42  oom-killer  gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
06:13:42  Killed process 7891 (stress-ng), total-vm:102.4MB, anon-rss:49.6MB
```

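Note: total-vm is the size of the process's virtual address space, including memory that was reserved but never touched. What was actually resident in physical memory is anon-rss (plus file-rss and shmem-rss in the full log line).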

Troubleshooting

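Q: No output at all. Is bpftrace broken? A: First run sudo bpftrace -e 'BEGIN { printf("OK\n"); exit(); }'. If that fails, check that you are running as root and that your kernel supports BPF, then reinstall bpftrace. If it works, the OOM probably never happened: look for a "Killed process" line in dmesg, and make sure swap is not absorbing the allocation.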

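Q: stress-ng fails inside Docker? A: Try docker run --rm -m 64m --memory-swap 64m alpine dd if=/dev/zero of=/dev/null bs=200M count=1. With bs=200M, dd allocates and fills a single 200 MB buffer, which exceeds the 64 MB limit and triggers the OOM killer. (A small block size such as bs=1M only reuses a 1 MB buffer and never runs out of memory.)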

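Q: Can I run this on WSL2 / macOS? A: eBPF needs a Linux kernel. WSL2 runs a real Linux kernel, but its default build may not enable every BPF and BTF option, so you may need a custom WSL2 kernel. macOS has no eBPF support; use a Linux VM instead.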

Core Concepts Recap

You’ve already used the most important eBPF concepts in the hands-on section above. Here’s a quick reference.

kprobe / kretprobe

Dynamic kernel function probes. They attach eBPF programs at the entry (kprobe) or return (kretprobe) of almost any non-inlined kernel function. That makes them flexible, but they depend on internal kernel implementation details that may change between versions.

The kprobe:oom_kill_process example above is a classic kprobe usage.

tracepoint

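Static kernel tracing points: hooks defined in the kernel source and kept far more stable across versions than internal functions. Prefer tracepoints when available.

tracepoint:oom:mark_victim, which fires when the OOM killer marks its victim, is the stable hook for OOM monitoring.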

BPF Maps

The data exchange channel between kernel and user space. Common types:

  • BPF_MAP_TYPE_HASH: key-value storage
  • BPF_MAP_TYPE_PERCPU_HASH: a separate value per CPU, so updates need no locking
  • BPF_MAP_TYPE_RINGBUF: high-performance ring buffer for event streams
  • BPF_MAP_TYPE_STACK_TRACE: stack traces (kernel or user), referenced by stack ID

We’ll cover these in detail in the next articles.

CO-RE and BTF

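CO-RE (Compile Once, Run Everywhere) is the key to eBPF portability. BTF describes the layout of kernel data structures, so instead of hardcoding field offsets, CO-RE programs access fields through helpers such as libbpf's BPF_CORE_READ macros, and libbpf adjusts the offsets at load time to match the running kernel. A program compiled on Ubuntu 22.04 can therefore run unchanged on a different kernel version, as long as that kernel provides BTF and the fields the program uses still exist.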

From bpftrace to Production Tools

bpftrace is great for quick probing and ad-hoc debugging. For long-running production tools, you’ll need libbpf (C), cilium/ebpf (Go), or Aya (Rust).

In this series, coming next:

  • Part 2: Write eBPF kernel code in C + a Go user-space loader for a complete OOM tracer
  • Part 3: Container and cgroup-level OOM pinpointing with Rust Aya
  • Part 4: Deep dive into BPF OOM kernel patch evolution

Summary

This article covered:

  • Why eBPF is ideal for observability (safe, non-invasive, high-performance)
  • Monitoring OOM Killer events with a single bpftrace line
  • OOM Killer fundamentals
  • Development environment setup and verification
  • Hands-on OOM simulation with Docker and stress-ng
  • Core concepts: kprobe, tracepoint, BPF Maps, CO-RE

Try it now: Open two terminals. In Terminal A run sudo bpftrace -e 'kprobe:oom_kill_process { printf("OOM! %s\n", ((struct oom_control *)arg0)->chosen->comm); }'. In Terminal B run docker run --rm -m 64m --memory-swap 64m alpine sh -c 'apk add --no-cache stress-ng >/dev/null && stress-ng --vm 1 --vm-bytes 128m --timeout 5s'. Watch Terminal A capture the OOM in real time.