MiBeeNvr v0.6.0's Test Machines: Three Camera Projects Updated in Sync

The concurrently released MiBeeNvr v0.6.0 brought major features like timelapse, video transcoding, and ONVIF enhancements. Unit tests alone are far from enough — the full workflow must be tested against real camera hardware. To provide reliable test machines for this release, three camera projects were updated on the same day, June 5th — both to supply testing environments for the NVR and to solve some typical embedded development engineering problems along the way.

MiBeeHomeCam v0.2.0: RTSP Source and Timelapse Test Machine on ESP32-S3

MiBeeHomeCam is a surveillance camera firmware based on the Seeed Xiao ESP32-S3 Sense (ESP-IDF development), serving as an RTSP and timelapse test source for the NVR. The core problem this update solves: how a resource-constrained MCU system can simultaneously handle real-time video streaming, recording, motion detection, and network transmission.

PSRAM Double Buffering and Frame Resource Contention

The ESP32-S3 camera driver retrieves frame buffers via esp_camera_fb_get(). The fb (frame buffer) is a shared resource owned by the camera sensor driver — the sensor writes data to the fb at a fixed framerate, and upper-layer tasks read from it. The problem is that esp_camera_fb_get() returns a pointer to the sensor driver’s internal DMA buffer, not a copy. This means:

  1. When the recording task holds the fb for an extended period, the MJPEG stream task can’t get a frame, causing TCP connections to time out and disconnect due to no data being sent
  2. Simultaneous reads from both sides can lead to data races

v0.2.0’s solution: immediately copy the JPEG data to the PSRAM heap via memcpy() after getting the fb, then call esp_camera_fb_return() to release it. The copies in PSRAM are held independently by the stream and recording tasks, without interference:

c
// Old approach: holding fb directly, stream and recording conflict
fb = esp_camera_fb_get();
// Recording occupied fb for 200ms...
esp_camera_fb_return(fb);

// New approach: copy to PSRAM, release fb immediately
fb = esp_camera_fb_get();
memcpy(psram_buffer, fb->buf, fb->len); // ~5ms @ 800x600 JPEG
esp_camera_fb_return(fb);
// psram_buffer processed independently by each task, no more contention

The ESP32-S3’s Octal PSRAM has enough bandwidth to support simultaneous reads and writes — tested at 800×600 JPEG (~50KB/frame), memcpy + dual-task concurrency shows no significant throughput bottleneck. But chips without PSRAM (like regular ESP32-S3 modules) simply can’t use this approach — they only have 512KB of internal SRAM.

Motion Detection in Timelapse: JPEG Software Decoding + Frame Differencing

The ESP32-S3 has no hardware JPEG decoder, so motion detection must be implemented in software. Full decoding is not feasible — fully decoding an 800×600 JPEG to RGB requires ~1.4MB of memory. While PSRAM has enough capacity, CPU time is insufficient (software JPEG decoding takes 200-500ms per frame).

The practical approach is partial decoding: only decode the Y channel (grayscale), discarding the UV channels. The Y channel data volume is only 1/3 of full RGB, and motion detection only needs luminance change information — the chroma channels contribute nothing to detection.

The simplified frame differencing workflow:

Frame N JPEG → soft decode Y channel → 8x8 block luminance mean matrix (75x100 blocks)
Frame N+1 JPEG → soft decode Y channel → 8x8 block luminance mean matrix
Per-block difference calculation → count blocks exceeding threshold → area exceeded → motion event

The sensitivity levels 1-5 map to two parameters: difference threshold (10-50) and trigger area ratio (5%-1%). Level 1 is the least sensitive (threshold 50 + 5% area trigger), suitable for outdoor environments to reduce false alarms; level 5 is the most sensitive (threshold 10 + 1% area trigger), suitable for fine-grained indoor monitoring.
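The workflow and the sensitivity mapping can be sketched on the host as follows. This is a minimal sketch, not the firmware's code: it assumes the Y plane has already been decoded into a plain byte array, all function and type names are hypothetical, and the linear steps for levels 2 to 4 are an assumption (only the endpoints for levels 1 and 5 are given above).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 8

typedef struct {
    int diff_threshold; /* per-block mean luminance difference, 10..50 */
    int area_percent;   /* share of changed blocks that triggers, 1..5 */
} motion_params_t;

/* Level 1 -> (50, 5%), level 5 -> (10, 1%); linear in between (assumed). */
motion_params_t motion_params_for_level(int level) {
    if (level < 1) level = 1;
    if (level > 5) level = 5;
    motion_params_t p = { 60 - 10 * level, 6 - level };
    return p;
}

/* Reduce a w*h luminance plane to (w/8)*(h/8) block means. */
void block_means(const uint8_t *y, int w, int h, uint8_t *means) {
    int bw = w / BLOCK, bh = h / BLOCK;
    for (int by = 0; by < bh; by++) {
        for (int bx = 0; bx < bw; bx++) {
            unsigned sum = 0;
            for (int r = 0; r < BLOCK; r++)
                for (int c = 0; c < BLOCK; c++)
                    sum += y[(by * BLOCK + r) * w + bx * BLOCK + c];
            means[by * bw + bx] = (uint8_t)(sum / (BLOCK * BLOCK));
        }
    }
}

/* True when the share of blocks whose mean changed by more than the
 * threshold reaches the configured area ratio. */
bool motion_detected(const uint8_t *prev, const uint8_t *cur,
                     int nblocks, motion_params_t p) {
    int changed = 0;
    for (int i = 0; i < nblocks; i++)
        if (abs((int)cur[i] - (int)prev[i]) > p.diff_threshold)
            changed++;
    return changed * 100 >= nblocks * p.area_percent;
}
```

For an 800×600 frame the mean matrix holds 100 × 75 = 7500 bytes, so keeping the previous frame's matrix for comparison costs far less memory than keeping the frame itself.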

A subtle bug: motion detection processing takes ~500ms, but the timelapse frame drop threshold was originally also set to 500ms — meaning every motion detection would cause the current frame to be judged as “timed out and dropped”, resulting in all frames being dropped in timelapse mode. The fix was simple: dynamically adjust the frame drop threshold from 500ms to 2000ms (when detection mode is enabled).

DRAM Heap Corruption: A Problem Without a Root Cause Yet

This was the most painful bug. Once recording had run for more than 30 seconds, any call to fopen() would panic immediately. Serial logs showed that the DRAM heap’s last_remainder_byte or free_bytes fields had been overwritten with garbage data. The puzzle:

  • Heap metadata is in DRAM (internal SRAM), while all frame data is allocated in PSRAM
  • The code itself showed no obvious memory overruns
  • The same code worked fine under ESP-IDF v5.x; the problem only appeared after upgrading to v6.0
  • PSRAM-allocated memory was completely normal

The eventual workaround: instead of using fopen() + fread() (standard C library file I/O) for file downloads, use POSIX open() + read() to read the entire file into a PSRAM buffer (~130 KB/s), then send it to the client. PSRAM is unaffected by the corrupted DRAM heap, so this approach works reliably.

The root cause is still under investigation — suspected to be a driver (SDMMC or WiFi) in ESP-IDF v6.0 writing out of bounds to memory, but the specific module hasn’t been identified yet.

OTA Dual Slots and Partition Table

v0.2.0 restructured the partition table to support dual OTA slots:

Partition  | Offset   | Size  | Purpose
bootloader | 0x0      | 48KB  | Bootloader
ota_0      | 0x10000  | 2MB   | Currently running firmware
ota_1      | 0x210000 | 2MB   | Upgrade backup slot
nvs        | 0x410000 | 20KB  | Persistent configuration
spiffs     | 0x415000 | 1.5MB | Web UI files

Upgrade flow: Web upload firmware → write to ota_1 → set boot partition to ota_1 → reboot. If the new firmware fails to start (watchdog timeout), the bootloader automatically falls back to ota_0. After changing the partition table, idf.py erase-flash is required, otherwise the old partition table won’t match the new firmware’s locations.
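The offsets in the table follow from simple arithmetic: each partition starts where the previous one ends. The sketch below checks that for the four partitions after the bootloader. The struct and helper names are hypothetical, and the check is only an illustration of the layout, not part of the firmware or of ESP-IDF's partition tooling.

```c
#include <stddef.h>
#include <stdint.h>

#define KB 1024u
#define MB (1024u * 1024u)

typedef struct {
    const char *name;
    uint32_t    offset;
    uint32_t    size;
} part_t;

/* Offsets and sizes as listed in the table (bootloader omitted). */
const part_t k_parts[] = {
    { "ota_0",  0x010000, 2 * MB },
    { "ota_1",  0x210000, 2 * MB },
    { "nvs",    0x410000, 20 * KB },
    { "spiffs", 0x415000, 3 * MB / 2 },
};
#define PART_COUNT (sizeof k_parts / sizeof k_parts[0])

/* Returns the index of the first partition that does not start where the
 * previous one ends, or -1 if the layout has no gaps or overlaps. */
int first_misaligned(const part_t *p, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (p[i - 1].offset + p[i - 1].size != p[i].offset)
            return (int)i;
    return -1;
}
```

For example, 0x10000 + 2MB (0x200000) gives 0x210000, which is exactly where ota_1 begins, and the spiffs partition ends at 0x595000.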

ESP-IDF v6.0 Compatibility

Upgrading from the ESP-IDF v5.x used by v0.1.0 to v6.0 ran into a series of API changes:

  • esp_vfs_fat_sdmmc_unmount() → esp_vfs_fat_sdcard_unmount() (function rename)
  • httpd_resp_set_status(req, "503 Service Unavailable"): the HTTPD_503 macro was removed, so the status string is passed instead
  • config.timeout_sec split into config.timeout_att (auth timeout) and config.timeout_bcn (beacon timeout)

These changes are documented in the ESP-IDF migration guide, but in practice there are always gaps — compiling successfully doesn’t mean the behavior is correct.

Other Technical Points Worth Mentioning

  • Upload queue persistence: Uses NVS blob type to store serialized state of the pending upload file list. Each enqueue/dequeue writes to NVS, and the queue is rebuilt on startup after power loss. NVS has a write endurance limit (~100k writes), so writes are coalesced — consecutive changes within 10 seconds are merged into a single write.
  • Structured JSON logging: Unified log format {"ts":"...","level":"info","module":"storage","msg":"..."}, making it easy to integrate with Loki/ELK.
  • Smart frame drop strategy: When PSRAM remaining space drops below 20%, calculates the difference between write speed and capture speed, and drops frames proportionally. Prioritizes dropping redundant frames from motion detection results while retaining key frames.
  • Watchdog segmented feeding: A single 30-second vTaskDelay() call would trigger a TWDT (Task Watchdog Timer) timeout. It was changed to for (int i = 0; i < 6; i++) { vTaskDelay(5000 / portTICK_PERIOD_MS); esp_task_wdt_reset(); } so that the watchdog is reset every 5 seconds.
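The write coalescing described in the upload queue bullet can be sketched as a small piece of state. This is one possible reading, assuming the 10-second window starts at the first unflushed change. The type and function names are hypothetical, and the actual NVS write is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define COALESCE_WINDOW_MS 10000u /* 10 s window, as described above */

typedef struct {
    bool     dirty;          /* queue changed since the last NVS write */
    uint32_t first_dirty_ms; /* time of the first unflushed change */
} nvs_coalescer_t;

/* Record a queue change; only the first change in a window starts the timer. */
void coalescer_mark(nvs_coalescer_t *c, uint32_t now_ms) {
    if (!c->dirty) {
        c->dirty = true;
        c->first_dirty_ms = now_ms;
    }
}

/* Returns true once per window, when the pending changes are due to be
 * written. The caller then performs the single NVS blob write. */
bool coalescer_should_flush(nvs_coalescer_t *c, uint32_t now_ms) {
    if (c->dirty &&
        (uint32_t)(now_ms - c->first_dirty_ms) >= COALESCE_WINDOW_MS) {
        c->dirty = false;
        return true;
    }
    return false;
}
```

A periodic task would call coalescer_should_flush() with the current tick time and serialize the queue only when it returns true. The trade-off of this design is that changes made inside an open window are lost if power fails before the flush.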

MiBeeCam v0.2.1: WiFi Compatibility Fix for WPA2-PSK Routers

MiBeeCam is a compact smart camera based on the Luatos ESP32-S3 A10 module. The core fix in this version was WiFi STA mode failing to connect to WPA2-PSK routers. After examining the actual code diff, the real fix turned out to be much more complex than what the Release Notes described — it wasn’t just about disabling SAE and increasing stack size.

Actual Code Diff Changes

The diff between v0.2.0 and v0.2.1 shows that the wifi_manager.c changes cover 7 independent aspects:

1. Event Loop Decoupling:

This was the most critical change. The old code directly called the user-registered s_callback() within the WiFi event callback, which runs in the ESP-IDF wifi task context (very limited stack space). Once the callback triggered esp_wifi_connect() which re-enters WiFi event handling, it formed a recursion:

wifi event task (stack: 3072 bytes)
  → s_callback()
    → esp_wifi_connect()
      → (internal) triggers new event
        → s_callback()  // recursion, -800 bytes
          → esp_wifi_connect()  // more recursion, -800 bytes
            → stack overflow, crash

The fix introduces a custom event base WIFI_MANAGER_EVENTS. notify_state() no longer calls the callback directly, but instead posts the event to the event loop task via esp_event_post() (which has more stack space and won’t recurse):

c
ESP_EVENT_DEFINE_BASE(WIFI_MANAGER_EVENTS);

// Old approach: call callback directly in WiFi task context
static void notify_state(wifi_state_t new_state) {
    if (s_callback) {
        s_callback(new_state, s_user_data);  // recursion risk
    }
}

// New approach: post to event loop, handled by independent task
static void notify_state(wifi_state_t new_state) {
    esp_event_post(WIFI_MANAGER_EVENTS, (int32_t)new_state, NULL, 0, portMAX_DELAY);
}

An independent wifi_state_event_handler was registered to consume events from this event base, safely calling s_callback() in the event loop context.
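The effect of this decoupling can be reproduced without ESP-IDF. The following host-side sketch is a simulation under stated assumptions: the ring buffer stands in for the event loop's queue, and post_state(), on_state(), and drain() are hypothetical names, not ESP-IDF APIs. It shows that a callback which triggers a new event no longer nests, because posting only enqueues.

```c
#include <stddef.h>

#define QUEUE_CAP 8

typedef void (*state_cb_t)(int state);

typedef struct {
    int    items[QUEUE_CAP];
    size_t head, count;
} event_queue_t;

event_queue_t g_queue;    /* stand-in for the event loop's queue */
int g_depth, g_max_depth; /* callback nesting, tracked for illustration */
int g_retries;

/* Posting only enqueues; the callback never runs on the caller's stack. */
int post_state(int state) {
    if (g_queue.count == QUEUE_CAP) return -1;
    g_queue.items[(g_queue.head + g_queue.count++) % QUEUE_CAP] = state;
    return 0;
}

/* User callback: reacts to "disconnected" (0) by retrying, and each failed
 * retry reports another state change. */
void on_state(int state) {
    g_depth++;
    if (g_depth > g_max_depth) g_max_depth = g_depth;
    if (state == 0 && g_retries < 3) {
        g_retries++;
        post_state(0); /* queued, not called recursively */
    }
    g_depth--;
}

/* Runs in the consumer task: one callback at a time, each from the same
 * shallow stack depth. */
void drain(state_cb_t cb) {
    while (g_queue.count > 0) {
        int s = g_queue.items[g_queue.head];
        g_queue.head = (g_queue.head + 1) % QUEUE_CAP;
        g_queue.count--;
        cb(s);
    }
}
```

Calling post_state(0) followed by drain(on_state) runs the callback four times (the initial event plus three retries), yet the recorded nesting depth never exceeds 1. With the direct-call pattern, the same sequence would have stacked four frames on top of each other.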

2. Disable Power Saving: esp_wifi_set_ps(WIFI_PS_NONE)

ESP-IDF’s WiFi power saving mode (Modem Sleep) turns off the radio circuit between DTIM beacon intervals. In power saving mode, the STA may miss the router’s EAPOL frames (frames 2/4 of the 4-way handshake), causing authentication timeout. This is fatal for WPA2-PSK handshakes — EAPOL frames have no retransmission mechanism, so missing one frame means restarting the entire authentication process.

3. PMF Configuration: pmf_cfg.capable = true, pmf_cfg.required = false

PMF (Protected Management Frames, 802.11w) is an optional extension for WPA2 and mandatory for WPA3. For routers that support PMF but don’t implement it correctly, a STA declaring PMF capable may have its association rejected. Setting capable = true, required = false means the STA supports PMF but doesn’t require it — the router can choose not to use it.

4. SAE PWE Configuration: sae_pwe_h2e = WPA3_SAE_PWE_BOTH

SAE’s PWE (Password Element) derivation has two methods: Hunting-and-Pecking (traditional, computationally intensive) and Hash-to-Element (efficient, new standard). WPA3_SAE_PWE_BOTH means the STA supports both, letting the router choose. Combined with disabling CONFIG_ESP_WIFI_ENABLE_WPA3_SAE in sdkconfig, this ensures the STA doesn’t initiate SAE negotiation.

5. Country Code Setting: esp_wifi_set_country_code("CN", false)

Different countries have different 2.4GHz channel ranges (Japan 1-14, US 1-11, China 1-13). Without setting a country code, ESP-IDF defaults to globally common channels (1-11). If the router operates on channels 12/13, the STA won’t find it during scanning.

6. TX Power Boost: Same as MiBeeHomeCam, set WiFi transmit power to 15 dBm. The code uses the same esp_wifi_set_max_tx_power(60) (15 / 0.25 = 60).

7. Retry Interval Adjustment: Changed from 5 seconds to 10 seconds to avoid the router rejecting service due to frequent retries. The retry timer callback also removed the old pattern of calling esp_wifi_disconnect() before esp_wifi_connect() — the old code would disconnect before each retry, triggering a STA_DISCONNECTED event, and if the event handler called esp_wifi_connect() again, it formed a recursive loop. The new code directly calls esp_wifi_connect() without disconnecting, relying on ESP-IDF’s internal connection state machine to manage retries.

What the Release Notes Didn’t Mention

The code diff also revealed a debugging leftover. The wifi_start_sta() function contained hardcoded WiFi credentials:

c
strncpy((char *)wifi_config.sta.ssid, "TEST_SSID", ...);
(void)ssid;(void)pass;  // parameters are ignored!

This was left over from debugging: to bypass parameter-passing issues, the test router’s SSID and password were hardcoded directly in the function body (redacted here). The (void) casts merely suppress the unused-parameter warnings, so the function parameters ssid and pass are ignored completely.

This obviously can’t go into a release: a device flashed with this firmware will only ever connect to that hardcoded router. This needs to be fixed.

rpi-cam v0.2.0: ONVIF Reference Implementation and HLS Test Machine

rpi-cam is an ONVIF camera service for the Raspberry Pi, written in Go. It provides a test environment for NVR v0.6.0’s ONVIF enhancements and HLS/LL-HLS features.

HLS Live Streaming: FFmpeg Subprocess Management

rpi-cam doesn’t do encoding/decoding itself — it spawns an FFmpeg subprocess to convert RTSP streams into HLS segments:

rpi-cam RTSP Server (gortsplib) → pipe → FFmpeg (libx264 + mpegts) → .ts segments + .m3u8 playlist

Several key details of subprocess management:

  1. Process monitoring: Uses Go’s os.Process.Signal(syscall.Signal(0)) to probe whether FFmpeg is alive every 5 seconds. Automatically restarts on unexpected exit, with the restart interval backing off exponentially from 1s to 30s to avoid runaway forking during repeated crashes.
  2. SIGPIPE handling: When FFmpeg writes to a pipe whose read end has closed (e.g., HLS client disconnects), the default SIGPIPE will kill the process. Must set signal.Ignore(syscall.SIGPIPE) in Go.
  3. HLS segment cleanup: FFmpeg’s -hls_list_size only controls the number of entries in the playlist, it doesn’t delete old segments from disk. A separate goroutine periodically scans the HLS directory and removes .ts files no longer referenced in the m3u8.
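The restart backoff in item 1 comes down to a small piece of arithmetic. rpi-cam itself is written in Go; the sketch below shows the schedule in C purely as an illustration, and next_backoff_ms() is a hypothetical helper, not a function from the project. Starting from zero and resetting to zero after a healthy run are assumptions.

```c
/* Doubles the delay after each failed restart, starting at min_ms and
 * capped at max_ms. Pass 0 as current_ms for the first failure. */
unsigned next_backoff_ms(unsigned current_ms, unsigned min_ms, unsigned max_ms) {
    if (current_ms < min_ms)
        return min_ms;
    if (current_ms >= max_ms / 2) /* doubling would reach or pass the cap */
        return max_ms;
    return current_ms * 2;
}
```

With min_ms = 1000 and max_ms = 30000, repeated failures wait 1s, 2s, 4s, 8s, 16s, and then 30s for every further attempt. Resetting the stored delay to 0 once FFmpeg has stayed up for a while lets the next crash start again from 1s.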

H.264 SPS/PPS Caching and Snapshot Reliability

SPS (Sequence Parameter Set) and PPS (Picture Parameter Set) in H.264 streams are critical parameter sets for decoding. They typically appear before IDR frames, but some cameras only declare SPS/PPS in RTSP’s SDP and don’t resend them in the stream.

When converting H.264 to JPEG snapshots, without SPS/PPS, libavcodec cannot initialize the decoding context, and the conversion fails immediately. The fix in v0.2.0 parses the SDP in the RTSP DESCRIBE response, extracts the base64-encoded SPS/PPS data from the sprop-parameter-sets field, and caches them in a global struct:

go
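import (
    "encoding/base64"
    "fmt"
    "regexp"
)

// SPSPPS holds the H.264 parameter sets advertised in the SDP.
type SPSPPS struct {
    SPS []byte
    PPS []byte
    Raw string // raw base64 string
}

// Compiled once. Matches e.g. sprop-parameter-sets=Z0LAH6oHgUaA,aL4HiP4A
var spropRe = regexp.MustCompile(`sprop-parameter-sets=([^,]+),([^;\s]+)`)

// extractSPSPPS extracts and decodes sprop-parameter-sets from an SDP body.
func extractSPSPPS(sdp string) (*SPSPPS, error) {
    m := spropRe.FindStringSubmatch(sdp)
    if len(m) < 3 {
        return nil, fmt.Errorf("sprop-parameter-sets not found")
    }
    // Report malformed base64 instead of silently caching empty parameter sets.
    sps, err := base64.StdEncoding.DecodeString(m[1])
    if err != nil {
        return nil, fmt.Errorf("decode SPS: %w", err)
    }
    pps, err := base64.StdEncoding.DecodeString(m[2])
    if err != nil {
        return nil, fmt.Errorf("decode PPS: %w", err)
    }
    return &SPSPPS{SPS: sps, PPS: pps, Raw: m[1] + "," + m[2]}, nil
}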

The cached SPS/PPS are injected into avcodec’s extradata on each snapshot request, ensuring libavcodec can properly initialize the decoder. Snapshots are generated correctly even when the stream doesn’t carry parameter sets.
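One common way to hand parameter sets to an H.264 decoder is Annex B framing, where each NAL unit is prefixed with the start code 00 00 00 01. The sketch below builds such a buffer from the cached SPS and PPS. Whether rpi-cam uses this exact layout is an assumption, and build_annexb_extradata() is a hypothetical helper.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd buffer laid out as
 *   00 00 00 01 <SPS> 00 00 00 01 <PPS>
 * and stores its length in *out_len. Returns NULL on invalid input or
 * allocation failure. The caller frees the buffer. */
uint8_t *build_annexb_extradata(const uint8_t *sps, size_t sps_len,
                                const uint8_t *pps, size_t pps_len,
                                size_t *out_len) {
    static const uint8_t start_code[4] = { 0, 0, 0, 1 };
    if (!sps || !pps || sps_len == 0 || pps_len == 0 || !out_len)
        return NULL;

    size_t total = 2 * sizeof start_code + sps_len + pps_len;
    uint8_t *buf = malloc(total);
    if (!buf)
        return NULL;

    uint8_t *p = buf;
    memcpy(p, start_code, sizeof start_code); p += sizeof start_code;
    memcpy(p, sps, sps_len);                  p += sps_len;
    memcpy(p, start_code, sizeof start_code); p += sizeof start_code;
    memcpy(p, pps, pps_len);

    *out_len = total;
    return buf;
}
```

Code that passes such a buffer to libavcodec as extradata must additionally allocate it with the av_malloc() family and append AV_INPUT_BUFFER_PADDING_SIZE zero bytes. That part is left out here so the sketch builds without FFmpeg headers.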

Web Admin UI Token Authentication

v0.2.0’s Web UI is based on Go’s embed package + Gin framework. The token authentication design:

  1. Login endpoint: POST /api/auth/login validates username/password, returns a 256-bit random token (generated via crypto/rand)
  2. Token storage: An in-memory sync.Map with the token’s SHA256 hash as key (to prevent timing attacks) and expiration time as value
  3. Middleware: Gin’s AbortWithStatusJSON(401) directly rejects expired or invalid tokens
  4. Path-prefix routing: Web UI static file paths starting with /app/ bypass authentication; all API paths require authentication

Why not JWT? Because rpi-cam has no external dependencies, and JWT would require importing golang-jwt/jwt — not worth adding a dependency for a small feature. Simple random token + SHA256 hash is more than sufficient for a standalone scenario.

i18n Chinese/English Switching

Translation files use JSON format, structured by page:

json
{
  "login": {
    "title": {"en": "Login", "zh": "登录"},
    "username": {"en": "Username", "zh": "用户名"},
    "password": {"en": "Password", "zh": "密码"},
    "submit": {"en": "Sign In", "zh": "登录"}
  },
  "ptz": {
    "pan_left": {"en": "Pan Left", "zh": "左转"},
    "tilt_up": {"en": "Tilt Up", "zh": "上仰"}
  }
}

The frontend detects browser language preference via navigator.language, defaulting to English if not set. Switching is persisted via localStorage.setItem('locale', 'zh'), and the choice is preserved across refreshes. Translation files are embedded into the binary at build time via Go embed, with no external file dependencies at runtime.

Summary

Three projects released on the same day, June 5th, each solving a typical class of embedded/backend engineering problems:

  • MiBeeHomeCam demonstrates how an MCU-level camera firmware, under resource constraints (512KB SRAM / 8MB PSRAM / 240MHz dual-core), can simultaneously handle real-time streaming, recording, motion detection, and network transmission through techniques like PSRAM double buffering, partial JPEG decoding, and POSIX-based bypass of corrupted DRAM heap.
  • MiBeeCam’s WiFi fixes address two very typical problems in embedded development: protocol compatibility (interoperability between WPA3’s SAE and WPA2 routers) and task stack overflow (recursive callbacks exhausting stack space). Neither is hard to fix, but the debugging process required understanding ESP-IDF’s WiFi stack design.
  • rpi-cam’s HLS and Web UI demonstrate how a Go backend service can achieve production-grade functionality with minimal dependencies (zero CGO, no JWT library, no external frontend build tools).

If you’re building your own NVR system, these projects can serve as reference implementations or be used directly as components.


Related project links: