MiBeeNvr v0.4.0: Audio Recording Pipeline and Multi-Layer Health Monitoring Architecture

After v0.3.1 shipped, I put in another 196 commits. v0.4.0 is a feature-dense release: an audio recording pipeline, a multi-layer health monitoring engine, HLS/LL-HLS playback stability work, and a major UI redesign. For the full changelog, see GitHub Release Notes.

The previous post covered v0.3.x’s multi-protocol streaming and Xiaomi camera support (v0.3.0 Tech Post). If you haven’t read the first post, start with MiBeeNvr Introduction.

Audio Recording: From Silent to Sound

In the v0.3.x era, recorded MP4 files only had a video track. v0.4.0 introduces a complete audio capture and muxing pipeline, supporting AAC audio from RTSP cameras and G.711 audio from ONVIF/Xiaomi cameras.

Audio Pipeline Architecture

The core challenge of audio processing is that cameras on different protocols use different audio codecs, while the final MP4 container needs all of them muxed in a uniform way.

mermaid
flowchart LR
    subgraph "RTSP Camera"
        BH[RTSP Source]
        BM[AAC Audio]
    end
    
    subgraph "ONVIF / Xiaomi"
        SQ[ONVIF/Xiaomi]
        JR[G.711 mu-law]
        VN[G.711 A-law]
    end
    
    subgraph "MiBeeNvr Audio Pipeline"
        JQ[StreamHub Audio Broadcast]
        SH[MP4 Muxer]
    end
    
    subgraph "Output"
        YZ[MP4 Segment<br/>Video + Audio]
    end
    
    BH --> BM --> JQ
    SQ --> JR --> JQ
    SQ --> VN --> JQ
    JQ --> SH
    SH --> YZ

    classDef input fill:#E3F2FD,stroke:#1565C0,color:#1565C0
    classDef process fill:#FFF3E0,stroke:#E65100,color:#BF360C
    classDef output fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20
    
    class BH,BM,SQ,JR,VN input
    class JQ,SH process
    class YZ output

Per-Camera Audio Toggle

Audio recording is disabled by default and is enabled per camera via audio_enabled:

yaml
cameras:
  - id: "front-door"
    name: "Front Door"
    protocol: "rtsp"
    encoding: "h264"
    audio_enabled: true
  - id: "baby-room"
    name: "Baby Room"
    protocol: "xiaomi"
    encoding: "h264"
    audio_enabled: true
  - id: "driveway"
    name: "Driveway"
    protocol: "rtsp"
    encoding: "h265"
    audio_enabled: false   # Camera that doesn't need audio
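
The "disabled by default" behaviour can be sketched as follows. This is a minimal illustration, not MiBeeNvr's actual config loader: the CameraConfig class and parse_cameras helper are hypothetical names, and it assumes that leaving out audio_enabled is equivalent to setting it to false.

```python
from dataclasses import dataclass


@dataclass
class CameraConfig:
    """Hypothetical mirror of one entry under `cameras:`."""
    id: str
    name: str
    protocol: str
    encoding: str
    audio_enabled: bool = False  # assumed: audio stays off unless the camera opts in


def parse_cameras(raw: list[dict]) -> list[CameraConfig]:
    """Build CameraConfig objects from already-parsed YAML entries."""
    return [CameraConfig(**entry) for entry in raw]


cameras = parse_cameras([
    {"id": "front-door", "name": "Front Door", "protocol": "rtsp",
     "encoding": "h264", "audio_enabled": True},
    {"id": "driveway", "name": "Driveway", "protocol": "rtsp",
     "encoding": "h265"},  # key omitted: falls back to the default
])

# Only cameras that opted in get an audio track.
audio_ids = [c.id for c in cameras if c.audio_enabled]
```

With this shape, forgetting the key on a new camera can never turn audio capture on by accident.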

G.711 Audio Muxing

G.711 (mu-law and A-law) is a common audio codec on IP cameras, Xiaomi cameras included. The MP4 container doesn’t natively support G.711, so v0.4.0 uses custom box types (ulaw / alaw) to encapsulate G.711 audio frames:

  • AAC audio -> Standard mp4a box
  • G.711 mu-law -> ulaw box
  • G.711 A-law -> alaw box

The advantage of this approach is that there is no transcoding cost: raw G.711 frames are packed into the MP4 as-is. The downside is that some generic media players may not recognize the custom boxes, but MiBeeNvr’s Web player handles them correctly.
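
The codec-to-box mapping above boils down to a lookup table. The sketch below is illustrative only: the codec keys ("aac", "g711_mulaw", "g711_alaw") and both function names are assumptions, not MiBeeNvr identifiers. The byte-rate helper relies on the standard G.711 format of one 8-bit sample per channel at 8 kHz, i.e. 64 kbit/s for mono.

```python
# Hypothetical codec keys mapped to the box types named in this post.
SAMPLE_ENTRY_BY_CODEC = {
    "aac": b"mp4a",
    "g711_mulaw": b"ulaw",
    "g711_alaw": b"alaw",
}


def sample_entry_fourcc(codec: str) -> bytes:
    """Return the four-character box type used for the given audio codec."""
    try:
        return SAMPLE_ENTRY_BY_CODEC[codec]
    except KeyError:
        raise ValueError(f"unsupported audio codec: {codec}") from None


def g711_bytes_per_second(sample_rate_hz: int = 8000, channels: int = 1) -> int:
    """G.711 stores one byte per sample per channel, so no bit maths is needed."""
    return sample_rate_hz * channels
```

At 8000 bytes per second, an hour of mono G.711 adds roughly 29 MB to a recording, which is the storage price paid for skipping transcoding.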

StreamHub Audio Broadcast

The StreamHub introduced in v0.3.1 is extended with audio support in v0.4.0. The recorder now broadcasts not only video frames but also audio frames. All real-time consumers (WebRTC, HTTP-FLV, etc.) receive audio data, so “recordings have sound, and live preview has sound too.”
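
A fan-out hub of this kind can be sketched in a few lines. Everything here is hypothetical (the Hub and Frame names, the queue size, and the drop-oldest policy for slow consumers); the post does not describe how StreamHub handles back-pressure. The sketch is single-threaded and omits locking.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    kind: str       # "video" or "audio"
    payload: bytes


class Hub:
    """Copies every frame to each subscriber's bounded queue."""

    def __init__(self, queue_size: int = 64) -> None:
        self._queue_size = queue_size
        self._subscribers: dict[str, queue.Queue] = {}

    def subscribe(self, name: str) -> queue.Queue:
        q: queue.Queue = queue.Queue(maxsize=self._queue_size)
        self._subscribers[name] = q
        return q

    def broadcast(self, frame: Frame) -> None:
        for q in self._subscribers.values():
            try:
                q.put_nowait(frame)
            except queue.Full:
                # Assumed policy: a slow consumer loses its oldest frame
                # so that the recorder is never blocked.
                q.get_nowait()
                q.put_nowait(frame)
```

Because audio and video travel through the same broadcast call, every consumer sees both kinds of frame in the order the recorder produced them.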

Audio Retention During Merging

The segment merge pipeline has also been adapted to ensure audio tracks are fully retained after merging. Previously, merging only processed video tracks; now it checks and preserves audio track information as well.

Multi-Layer Health Monitoring: Beyond “Connected or Not”

Health monitoring is v0.4.0’s most complex subsystem. internal/health/ contains 17 source files covering the full chain from connection detection to auto-recovery.

Three-Layer Detection Model

mermaid
flowchart LR
    L1["Layer 1 Connection Layer<br/>RTSP / ONVIF / CS2 Heartbeat"] --> L2["Layer 2 Stream Layer<br/>FPS / Bitrate / Keyframe Interval"]
    L2 --> L3["Layer 3 Picture Layer<br/>Freeze Detection / Black Screen Detection"]
    L3 --> SCORE["Health Score Engine"]

    classDef layer fill:#FFF3E0,stroke:#E65100,color:#BF360C
    classDef hub fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20
    class L1,L2,L3 layer
    class SCORE hub

The three layers progress from shallow to deep: the connection layer checks the RTSP session, ONVIF responses, and the CS2 P2P heartbeat; the stream layer tracks FPS, bitrate, and keyframe intervals; the picture layer performs freeze and black screen detection. Each layer runs independently with its own thresholds, and the health score engine aggregates their results into a single overall score.
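
To make the picture layer concrete, here is one simple way freeze and black screen checks can be done on decoded luma data. It is a sketch under assumptions: frames are flat byte strings of 8-bit luma values, and both thresholds are made-up numbers. The post does not say which algorithm MiBeeNvr uses.

```python
def mean_luma(frame: bytes) -> float:
    """Average brightness of a frame given as flat 8-bit luma samples."""
    return sum(frame) / len(frame)


def is_black(frame: bytes, threshold: float = 16.0) -> bool:
    """Treat the frame as black when its average brightness is very low."""
    return mean_luma(frame) < threshold


def mean_abs_diff(a: bytes, b: bytes) -> float:
    """Average per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def is_frozen(frames: list[bytes], threshold: float = 0.5) -> bool:
    """True when every consecutive pair of frames is nearly identical."""
    if len(frames) < 2:
        return False
    return all(
        mean_abs_diff(prev, cur) < threshold
        for prev, cur in zip(frames, frames[1:])
    )
```

A real detector would sample frames sparsely and downscale them first, since running this on every full-resolution frame would be wasteful on small devices.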

Health Score Engine

The scoring engine maintains a real-time health score for each camera, combining results from all three layers:

  • Layer 1 has the highest weight – if the connection is down, nothing else matters
  • Layer 2 is second – stream abnormalities mean degraded video quality
  • Layer 3 is the safety net – if connection and stream are normal but picture is frozen, it might be a camera issue

Each layer has independent thresholds and cooldown periods to prevent brief jitter from triggering unnecessary alerts.
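
The weighting described above can be illustrated with a small scoring function. The weights (50/30/20), the 0 to 100 scale, and the rule that a dead connection forces the score to zero are all assumptions chosen to match the priorities in the list; the real engine's numbers are not given in this post.

```python
# Assumed weights: connection > stream > picture, summing to 100.
WEIGHTS = {"connection": 50, "stream": 30, "picture": 20}


def health_score(layer_scores: dict[str, float]) -> float:
    """Combine per-layer scores in [0.0, 1.0] into a 0-100 health score."""
    if layer_scores["connection"] == 0.0:
        # If the connection is down, nothing else matters.
        return 0.0
    return sum(WEIGHTS[name] * layer_scores[name] for name in WEIGHTS)
```

For example, a camera with a healthy connection and stream but a frozen picture scores 80, while one whose stream is running at half quality scores 85.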

Auto-Recovery Engine

When problems are detected, what happens next? auto_remediate.go implements a recovery engine with safety boundaries:

mermaid
flowchart LR
    YK[Health detection anomaly] --> B{Within safety boundary?}
    B -->|Yes| C[Exponential backoff retry]
    B -->|No| D[Stop recovery<br/>Mark as failed]
    C --> E{Recovery successful?}
    E -->|Yes| F[Restore normal<br/>Reset counter]
    E -->|No| G[Increase backoff time]
    G --> B

    classDef detect fill:#E3F2FD,stroke:#1565C0,color:#1565C0
    classDef decision fill:#FFF3E0,stroke:#E65100,color:#BF360C
    classDef success fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20
    classDef failure fill:#FFEBEE,stroke:#C62828,color:#B71C1C
    
    class YK detect
    class B,E decision
    class C,F success
    class D,G failure

Key design decisions:

  1. Safety boundary – Maximum N consecutive recovery attempts before stopping to prevent infinite loops
  2. Exponential backoff – Retry interval doubles on each attempt (1s -> 2s -> 4s -> 8s -> …), so retries don’t overload the device or network
  3. Alert cooldown – No duplicate alerts for the same camera within cooldown period to prevent alert storms
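
The first two decisions combine into a short retry loop. This is a sketch, not the contents of auto_remediate.go: the function names, the default of five attempts, and the 60-second cap on the delay are assumptions. The sleep function is injected so the loop can be exercised without actually waiting.

```python
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, max_attempts: int = 5) -> Iterator[float]:
    """Yield at most max_attempts delays: base, base*factor, ... up to cap."""
    delay = base
    for _ in range(max_attempts):
        yield min(delay, cap)
        delay *= factor


def remediate(recover: Callable[[], bool],
              sleep: Callable[[float], None],
              max_attempts: int = 5) -> bool:
    """Retry recover() with growing delays; stop at the safety boundary."""
    for delay in backoff_delays(max_attempts=max_attempts):
        sleep(delay)
        if recover():
            return True   # recovered: the caller resets its counters
    return False          # boundary reached: mark the camera as failed
```

In production the sleep argument would be time.sleep; returning False rather than looping forever is what keeps a permanently dead camera from being hammered with reconnects.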

Health History and Visualization

Health events are persisted to storage, and the Web interface provides timeline visualization:

  • Per-camera health event history
  • Anomaly occurrence and recovery timestamps
  • Real-time health indicator on camera cards

Health API endpoints:

  • GET /api/health – System-wide health status
  • GET /api/cameras/{id}/health – Individual camera health details

HLS/LL-HLS Stability Optimization

This release includes extensive HLS playback polish, especially for low-performance devices like Raspberry Pi.

IDR Frame Waiting

Previously, the recorder could start writing a segment at any frame, causing the player to receive a segment that doesn’t start with a keyframe – resulting in black frames. v0.4.0 waits for the next IDR frame before writing a new segment, sacrificing some segment boundary precision in exchange for every segment having a decodable first frame.
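
The trade-off is easy to see in a toy segmenter that only cuts on keyframes. The Segmenter and Frame classes and the 4-second target are hypothetical, and dropping frames that arrive before the very first IDR is an assumption of this sketch rather than documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    pts: float      # presentation time in seconds
    is_idr: bool    # True for a keyframe that can be decoded on its own


@dataclass
class Segmenter:
    target_duration: float = 4.0
    segments: list[list[Frame]] = field(default_factory=list)

    def push(self, frame: Frame) -> None:
        if not self.segments:
            if frame.is_idr:
                self.segments.append([frame])
            return  # assumed: frames before the first IDR are dropped
        current = self.segments[-1]
        elapsed = frame.pts - current[0].pts
        if frame.is_idr and elapsed >= self.target_duration:
            self.segments.append([frame])   # cut only on a keyframe
        else:
            current.append(frame)           # otherwise keep extending
```

With a keyframe every 3 seconds and a 4-second target, segments come out 6 seconds long: the cut waits for the first IDR after the target has elapsed, which is the lost boundary precision mentioned above.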

Credit-Based Frame Rate Throttling

v0.4.0 introduces a credit-based FPS throttling mechanism. The core idea is to give each consumer a “frame budget”: every frame delivered spends one credit, and the producer only sends new frames while credits are available. This smooths out frame delivery and prevents stuttering on low-performance devices when frames arrive in bursts.
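
One way to implement such a budget is shown below. The post does not say how credits are replenished, so this sketch assumes they refill over time at the target frame rate, up to a small cap (in effect a token bucket). The CreditThrottle name and its parameters are hypothetical.

```python
from typing import Optional


class CreditThrottle:
    """Allows a frame through only while the consumer has credit left."""

    def __init__(self, fps: float, max_credits: float = 2.0) -> None:
        self.fps = fps
        self.max_credits = max_credits
        self.credits = max_credits
        self._last: Optional[float] = None

    def _refill(self, now: float) -> None:
        # Assumed policy: credits accrue at `fps` per second, capped.
        if self._last is not None:
            earned = (now - self._last) * self.fps
            self.credits = min(self.max_credits, self.credits + earned)
        self._last = now

    def try_send(self, now: float) -> bool:
        """Spend one credit if available; `now` is a timestamp in seconds."""
        self._refill(now)
        if self.credits >= 1.0:
            self.credits -= 1.0
            return True
        return False
```

A burst of five frames arriving at the same instant gets only the first two through; the rest have to wait until elapsed time has earned another credit.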

LL-HLS Parameter Tuning

For Raspberry Pi playback stability, two key parameters were adjusted:

  • backBuffer: 0.5 -> 2.0 seconds – larger playback buffer
  • liveSync: 2 -> 3 seconds – looser live sync distance

These two parameter adjustments improved LL-HLS playback on Raspberry Pi from “frequently stuttering” to “basically smooth.”

Sub-Stream Fallback

When a sub-stream fails, playback now automatically falls back to the main stream. Previously, if the sub-stream broke, playback just stopped. Now it tries switching to the main stream – this typically costs more bandwidth and decoding work, because the main stream is usually the higher-resolution one, but the video doesn’t cut out.
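
The fallback itself is a try-in-order loop. In this sketch, open_with_fallback and the "sub"/"main" labels are hypothetical, and failures are assumed to surface as ConnectionError from whatever function opens a stream.

```python
from typing import Callable, Optional, Sequence, Tuple


def open_with_fallback(
    open_stream: Callable[[str], object],
    order: Sequence[str] = ("sub", "main"),
) -> Tuple[str, object]:
    """Try each stream in order and return the first one that opens."""
    last_error: Optional[Exception] = None
    for name in order:
        try:
            return name, open_stream(name)
        except ConnectionError as exc:
            last_error = exc  # remember why this one failed, try the next
    raise ConnectionError(f"no stream available, tried {list(order)}") from last_error
```

Returning the name of the stream that actually opened lets the player show which one is in use.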

UI Redesign

Camera Page Tab Navigation

The camera page was restructured with tab navigation:

  • Active – Cameras currently recording
  • Archived – Archived cameras with expandable recording list

Previously, active and archived cameras were mixed in one list – now they’re clearly separated.

Settings Page Streaming Section

The Advanced tab in settings now includes a streaming protocol configuration section with detailed settings for WebRTC, HTTP-FLV, RTMP, and SRT.

Health History Page

A new health history page with Chinese/English i18n support. Timeline visualization shows each camera’s health events.

Other Improvements

ARMv7 Support

Added ARMv7 binary builds, covering Raspberry Pi 2/3 and other older 32-bit ARM devices. Docker images now support three architectures:

Architecture    Applicable Devices
linux/amd64     PC, servers
linux/arm64     Raspberry Pi 4/5, Mac M series
linux/arm/v7    Raspberry Pi 2/3

ONVIF Encoding Auto-Detection

When adding ONVIF cameras, the encoding format (H.264/H.265) is auto-detected without manual selection.

Xiaomi Cloud Sync

New Xiaomi cloud sync endpoint that can synchronize camera metadata (name, model, etc.), saving the trouble of manual entry.

Security Hardening

Continuing the v0.3.0 security hardening tradition, v0.4.0 adds:

  • Rate limiting for health/readyz endpoints
  • Path traversal protection for download/FTP/WebDAV
  • Input validation for camera URLs and ONVIF IPs
  • SQL LIKE injection protection
  • Minimum 8-character initialization password
  • Auth bypass protection when no password is configured
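
Two of these protections follow well-known patterns, sketched here with the Python standard library. None of this is MiBeeNvr's code: the function names and the cameras table are hypothetical. The path check resolves the requested path and rejects anything outside the root; the LIKE helper escapes the wildcard characters so user input is matched literally.

```python
import sqlite3
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path under root, rejecting anything that escapes it."""
    base = root.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to: Python 3.9+
        raise PermissionError(f"path escapes {base}: {user_path!r}")
    return candidate


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards; the escape character itself goes first."""
    for ch in (escape, "%", "_"):
        term = term.replace(ch, escape + ch)
    return term


def search_names(conn: sqlite3.Connection, term: str) -> list[str]:
    """Substring search where % and _ in the term are treated literally."""
    pattern = f"%{escape_like(term)}%"
    rows = conn.execute(
        "SELECT name FROM cameras WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    )
    return [name for (name,) in rows]
```

Without the escaping step, a search for "100%" would also match "1000 lumens", because the trailing % would act as a wildcard.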

Quality Assurance

1651 tests passing, 60.7% coverage. This release adds a complete test suite for health monitoring (connection detection, scoring engine, auto-recovery), plus HLS playback end-to-end tests.

Summary

From v0.3.1 to v0.4.0: 196 commits, 73 features, 54 fixes, 21 refactors, 9 tests, 16 documentation updates.

Audio recording fills MiBeeNvr’s biggest feature gap. The health monitoring system makes it more reliable in unattended scenarios. HLS optimizations bring a qualitative improvement to playback experience on low-performance devices.

If v0.3.x was about solving “can it work” (multi-protocol, Xiaomi cameras), v0.4.0 starts addressing “is it good to use” – with sound, self-healing, and smooth playback.

Open source:

If you’re looking for a lightweight, audio-capable, self-healing open-source NVR, v0.4.0 is worth a try.