Replacing VMs with an ESP32 for Network Probing: The esp32-blackbox Project in Action

Why

I have several LANs in different locations around the city, roughly 10 km apart. To make these networks talk to each other, I used tools like NetBird, ZeroTier, and Cloudflare Tunnel to set up a cross-region virtual LAN.

The network was up, but how could I make sure it stayed stable? These tunnels traverse the public internet, where link quality varies. The most direct approach is Prometheus’s blackbox_exporter: it periodically sends HTTP requests, pings, and DNS queries, and the results go into a time-series database with alert rules so that problems are detected quickly.

But here’s the problem: blackbox_exporter needs a machine to run on, and a probing service alone doesn’t justify the electricity and hardware cost of a dedicated VM. The Proxmox server at home already uses enough power; adding another machine is unnecessary.

I happened to have several ESP32 dev boards lying around. The ESP32 is designed for network connectivity, so making HTTP requests and sending ICMP packets is well within its capabilities. Power consumption is minimal: it runs off USB power for a few cents of electricity per month. That is how this project, esp32-blackbox, was born.
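
The electricity claim is easy to sanity-check. The sketch below assumes a constant 1 W draw (the upper bound quoted in the summary) and a hypothetical tariff of $0.15 per kWh; both numbers are assumptions to replace with your own.

```python
def monthly_cost_usd(watts: float, usd_per_kwh: float, days: int = 30) -> float:
    """Energy cost of a device drawing a constant `watts` for `days` days."""
    kwh = watts * 24 * days / 1000  # W * hours -> Wh, then / 1000 -> kWh
    return kwh * usd_per_kwh


# Assumed figures: 1 W continuous draw (upper bound), $0.15 per kWh.
print(f"${monthly_cost_usd(1.0, 0.15):.2f} per month")
```

At those assumed values the board uses 0.72 kWh per month, which works out to roughly 11 cents; a board drawing half a watt would cost about half that.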

Overall Architecture

Let’s first look at where this system fits in the overall network monitoring picture:

mermaid
graph TB
    subgraph SiteA["Site A (Home)"]
        A1@{shape: hex, label: "ESP32 Blackbox"}
        A2["Router/Switch"]
        A1 --- A2
    end
    subgraph SiteB["Site B (Office)"]
        B1@{shape: hex, label: "ESP32 Blackbox"}
        B2["Router/Switch"]
        B1 --- B2
    end
    subgraph SiteC["Site C (Other)"]
        C1@{shape: hex, label: "ESP32 Blackbox"}
        C2["Router/Switch"]
        C1 --- C2
    end
    
    subgraph Overlay["Virtual Network Layer"]
        N1["NetBird"]
        N2["ZeroTier"]
        N3["Cloudflare Tunnel"]
    end
    
    A2 <--> N1 <--> B2
    B2 <--> N2 <--> C2
    A2 <--> N3 <--> C2
    
    subgraph Monitor["Monitoring Center"]
        P@{shape: cyl, label: "Prometheus"}
        G@{shape: doc, label: "Grafana"}
    end
    
    A1 -->|":9090/metrics"| P
    B1 -->|":9090/metrics"| P
    C1 -->|":9090/metrics"| P
    P --> G

    classDef hardware fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef network fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
    classDef overlay fill:#fff3e0,stroke:#ff9800,stroke-width:2px
    classDef monitor fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
    class A1,B1,C1 hardware
    class A2,B2,C2,N1,N2,N3 network
    class SiteA,SiteB,SiteC overlay
    class P,G monitor

Each site has an ESP32 that performs probing through its respective egress network. Prometheus scrapes metrics from each node’s port 9090, and Grafana handles visualization and alerting.

Cross-Region Networking Solution

A brief introduction to the networking tools used:

  • NetBird: WireGuard-based mesh VPN, very low latency after P2P hole punching, convenient management interface
  • ZeroTier: Software-defined Layer 2 virtual network, decent stability, good as a backup link
  • Cloudflare Tunnel: Reverse proxy tunnel that exposes internal services without opening a public port, suitable where P2P connections can’t be established

The benefit of running three overlays side by side is mutual redundancy. If NetBird goes down, ZeroTier takes over; if neither works, Cloudflare Tunnel provides a safety net. But more redundancy means more links to monitor, and that’s exactly where ESP32 probing shines.

ESP32 Blackbox Project Introduction

Project URL: github.com/Mi-Bee-Studio/esp32-blackbox

Hardware Selection

Currently supports two chips:

| Chip | Recommended Board | Features |
|---|---|---|
| ESP32-C3 | SuperMini | Cheap, ~$1.5 on AliExpress |
| ESP32-C6 | XIAO ESP32C6 | Supports WiFi 6, better performance |

I’m using the ESP32-C3 SuperMini, which is more than sufficient for this project.

Supported Probe Types

| Protocol | Description |
|---|---|
| HTTP/HTTPS | GET/POST requests, status code validation |
| TCP | TCP connection test |
| TCP+TLS | TLS handshake timing |
| DNS | DNS resolution test |
| ICMP Ping | Native socket implementation, RTT measurement |
| WebSocket/WSS | WS connection test |

That covers most of what blackbox_exporter can do.
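
To make concrete what a probe like the TCP connection test measures, here is a minimal Python sketch. It is not the firmware code (the project builds on ESP-IDF); the function name `tcp_probe` and its return shape are illustrative assumptions.

```python
import socket
import time


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> tuple[bool, float]:
    """Try to open a TCP connection; return (success, elapsed seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            success = True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        success = False
    return success, time.monotonic() - start
```

The two returned values correspond to what a probe typically reports: a success flag and the time the attempt took. For example, `tcp_probe("httpbin.org", 80)` would attempt one connection with the default 5-second timeout.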

Zero Configuration on First Boot

I’m quite satisfied with this design. On first power-up, the ESP32 automatically enters AP mode. Connect your phone to the ESP32_Blackbox hotspot (password 12345678), open 192.168.4.1 in a browser, and configure the WiFi. After saving, the device reboots and connects automatically, with no serial console needed.

mermaid
flowchart TD
    A["Power on"] --> B@{shape: diam, label: "WiFi credentials in NVS?"}
    B -->|"No"| C["Enter AP mode"]
    C --> D["Phone connects to ESP32_Blackbox hotspot"]
    D --> E["Browser opens 192.168.4.1"]
    E --> F["Select WiFi and enter password"]
    F --> G["Save to NVS"]
    G --> H["Reboot"]
    B -->|"Yes"| I["STA mode connect WiFi"]
    H --> I
    I --> J["Start probe tasks"]
    I --> K["Start Web management UI :80"]
    I --> L["Start Metrics service :9090"]

    classDef decision fill:#fff3e0,stroke:#ff9800,stroke-width:2px
    classDef startup fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef config fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
    classDef running fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
    class B decision
    class A startup
    class C,D,E,F,G config
    class I,J,K,L running

Web Management Interface

In STA mode, open the ESP32’s IP address in a browser to access the management interface.

You can edit the JSON configuration directly in the interface and save it without recompiling the firmware. Hot reload is also supported: after changing the config, send a POST to /api/reload and the new settings take effect immediately.
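
The reload call can also be scripted. The sketch below uses only the Python standard library and makes two assumptions not stated in the article: that /api/reload is served by the web UI on port 80, and that an empty request body is acceptable. The helper names are hypothetical.

```python
import urllib.request


def build_reload_request(device_ip: str) -> urllib.request.Request:
    """Build the POST request that asks the device to reload its config."""
    url = f"http://{device_ip}/api/reload"
    # An empty body is assumed; the article does not say whether one is expected.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(device_ip: str, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code."""
    request = build_reload_request(device_ip)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Calling `reload_config("192.168.1.100")` after saving a new configuration would trigger the reload; the address here is just the example IP used in the Prometheus config.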

Configuration File Format

Probe targets are configured via JSON, stored in the SPIFFS filesystem:

json
{
  "scrape_interval": 30,
  "metrics_port": 9090,
  "modules": {
    "http_2xx": {
      "prober": "http",
      "timeout": 10,
      "http": {
        "method": "GET",
        "valid_status_codes": [200]
      }
    },
    "icmp_ping": {
      "prober": "icmp",
      "timeout": 5,
      "icmp": {
        "packets": 3,
        "payload_size": 56
      }
    }
  },
  "targets": [
    {
      "name": "httpbin_http",
      "target": "httpbin.org",
      "module": "http_2xx"
    },
    {
      "name": "dns_google",
      "target": "8.8.8.8",
      "module": "dns_resolve"
    }
  ]
}

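The modules section defines probe behavior (protocol, timeout, validation rules), while targets lists what to probe; each target references a module by name through its module field. Note that the excerpt above does not define the dns_resolve module that dns_google references, so it would need its own entry under modules. Want to add a new probe target? Just edit the JSON; there is no need to touch the code.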
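
Because targets refer to modules by name, a typo or a missing module definition is easy to introduce. A small check before uploading a config can catch that. The helper below is a hypothetical sketch that assumes only the `modules` and `targets` keys shown in the config; it is not part of the project.

```python
import json


def undefined_modules(config: dict) -> list[str]:
    """Return names of targets whose `module` is not defined under `modules`."""
    defined = set(config.get("modules", {}))
    return [
        t["name"]
        for t in config.get("targets", [])
        if t.get("module") not in defined
    ]


sample = json.loads("""
{
  "modules": {"http_2xx": {"prober": "http", "timeout": 10}},
  "targets": [
    {"name": "httpbin_http", "target": "httpbin.org", "module": "http_2xx"},
    {"name": "dns_google", "target": "8.8.8.8", "module": "dns_resolve"}
  ]
}
""")
print(undefined_modules(sample))  # ['dns_google']
```

An empty list means every target points at a defined module.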

Prometheus Integration

ESP32 Blackbox is fully compatible with Prometheus’s scraping model. The /metrics endpoint outputs standard Prometheus text format:

text
# HELP probe_success Whether the probe succeeded
# TYPE probe_success gauge
probe_success{target="httpbin_http", module="http_2xx"} 1

# HELP probe_duration_seconds Duration of the probe in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds{target="httpbin_http", module="http_2xx"} 0.234
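
For quick checks without a Prometheus server, the output above can be parsed with a few lines of Python. This is a deliberately simplified sketch: it handles the `name{labels} value` lines shown here and skips comments, but it does not implement the full exposition format (timestamps and escaped label values are not handled).

```python
import re

# One sample line: metric name, optional {labels}, then a numeric value.
SAMPLE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse simple Prometheus text-format samples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            label_map = dict(LABEL_RE.findall(labels or ""))
            samples.append((name, label_map, float(value)))
    return samples
```

Each returned tuple holds the metric name, its labels as a dict, and the value as a float, so filtering for `probe_success` samples with value 0 gives the failing targets.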

Configure a scrape job in Prometheus:

yaml
scrape_configs:
  # Directly scrape all probe results from ESP32
  - job_name: 'esp32-blackbox'
    static_configs:
      - targets: ['192.168.1.100:9090']

  # Or use /probe endpoint for ad-hoc probing (same as original blackbox_exporter)
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['httpbin.org:80']
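    relabel_configs:
      # Pass the original target to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed target as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the ESP32
      - target_label: __address__
        replacement: 192.168.1.100:9090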

The second approach uses the /probe endpoint, identical to how the original blackbox_exporter works. You can even take your existing Prometheus config pointing to blackbox_exporter, change the IP to point to the ESP32, and leave everything else unchanged.
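
It can help to see what the relabeling actually produces. With the job above, Prometheus ends up requesting /probe on the ESP32 with the module and target passed as query parameters. The sketch below reconstructs that URL; `probe_url` is a hypothetical helper, handy for testing the endpoint by hand with a browser or curl.

```python
from urllib.parse import urlencode


def probe_url(exporter: str, module: str, target: str) -> str:
    """Build the /probe URL that results from the relabeling rules."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


print(probe_url("192.168.1.100:9090", "http_2xx", "httpbin.org:80"))
# http://192.168.1.100:9090/probe?module=http_2xx&target=httpbin.org%3A80
```

The colon in the target is percent-encoded as `%3A`, which is normal for query parameters.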

Real Deployment

I placed several ESP32s at different sites, each configured with different probe targets:

  • ESP32 at Site A probes services at Sites B and C
  • ESP32 at Site B probes services at Sites A and C
  • Site C follows the same pattern

This way, we have data on link quality between any two sites. A Grafana dashboard shows latency, packet loss, and HTTP success rates at a glance.
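
With n sites probing each other, there are n × (n − 1) directed probes, so 6 for three sites. Writing the per-device target lists by hand gets tedious as sites are added; a sketch like the following could generate them. The site names and addresses are hypothetical, and the entries reuse the `name`/`target`/`module` fields from the config format shown earlier.

```python
def mesh_targets(sites: dict[str, str], module: str) -> dict[str, list[dict]]:
    """For each site, list probe targets for every *other* site."""
    return {
        src: [
            {"name": f"{src}_to_{dst}", "target": addr, "module": module}
            for dst, addr in sites.items()
            if dst != src
        ]
        for src in sites
    }


# Hypothetical overlay addresses, one per site.
sites = {"site_a": "10.0.0.1", "site_b": "10.0.0.2", "site_c": "10.0.0.3"}
plan = mesh_targets(sites, "icmp_ping")
print([t["name"] for t in plan["site_a"]])  # ['site_a_to_site_b', 'site_a_to_site_c']
```

Each value in `plan` can be dropped into the `targets` array of the corresponding device's configuration.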

mermaid
graph LR
    subgraph SiteA["Site A"]
        EA@{shape: hex, label: "ESP32 #1"}
    end
    subgraph SiteB["Site B"]
        EB@{shape: hex, label: "ESP32 #2"}
    end
    subgraph SiteC["Site C"]
        EC@{shape: hex, label: "ESP32 #3"}
    end
    subgraph Monitor["Monitoring"]
        P@{shape: cyl, label: "Prometheus"}
        G@{shape: doc, label: "Grafana"}
    end
    
    EA -->|"Probes B/C"| EB
    EA -->|"Probes B/C"| EC
    EB -->|"Probes A/C"| EA
    EB -->|"Probes A/C"| EC
    EC -->|"Probes A/B"| EA
    EC -->|"Probes A/B"| EB
    
    EA -->|":9090/metrics"| P
    EB -->|":9090/metrics"| P
    EC -->|":9090/metrics"| P
    P --> G

    classDef hardware fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef site fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
    classDef monitor fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
    class EA,EB,EC hardware
    class SiteA,SiteB,SiteC site
    class P,G monitor

Build & Flash

The project is based on ESP-IDF v6.0 and provides several build methods:

bash
# Recommended: Python script, single command
python build.py esp32c3 flash COM3

# Or use idf.py directly
idf.py set-target esp32c3
idf.py build
idf.py -p COM3 flash

If using ESP32-C6, just replace esp32c3 with esp32c6.

Summary

In short: I didn’t want to spin up another server just to run blackbox_exporter. An ESP32 costs a few dollars, consumes less than 1 W of power, runs off a USB charger, and can be placed anywhere without a second thought.

The project is open source on GitHub. Give it a try if you’re interested: Mi-Bee-Studio/esp32-blackbox