MiBeeNvr v0.4.0: Audio Recording Pipeline and Multi-Layer Health Monitoring Architecture
After v0.3.1 shipped, I put in another 196 commits. v0.4.0 is a feature-dense release: audio recording pipeline, multi-layer health monitoring engine, HLS/LL-HLS playback stability optimization, and a major UI redesign. For the full changelog, see GitHub Release Notes.
The previous post covered v0.3.x’s multi-protocol streaming and Xiaomi camera support (v0.3.0 Tech Post). If you haven’t read the first post, start with MiBeeNvr Introduction.
Audio Recording: From Silent to Sound
In the v0.3.x era, recorded MP4 files only had a video track. v0.4.0 introduces a complete audio capture and muxing pipeline, supporting AAC audio from RTSP cameras and G.711 audio from ONVIF/Xiaomi cameras.
Audio Pipeline Architecture
The core challenge of audio processing is that cameras speaking different protocols deliver different audio codecs, while the final MP4 container needs a single, unified muxing path.
```mermaid
flowchart LR
subgraph "RTSP Camera"
BH[RTSP Source]
BM[AAC Audio]
end
subgraph "ONVIF / Xiaomi"
SQ[ONVIF/Xiaomi]
JR[G.711 mu-law]
VN[G.711 A-law]
end
subgraph "MiBeeNvr Audio Pipeline"
JQ[StreamHub Audio Broadcast]
SH[MP4 Muxer]
end
subgraph "Output"
YZ[MP4 Segment<br/>Video + Audio]
end
BH --> BM --> JQ
SQ --> JR --> JQ
SQ --> VN --> JQ
JQ --> SH
SH --> YZ
classDef input fill:#E3F2FD,stroke:#1565C0,color:#1565C0
classDef process fill:#FFF3E0,stroke:#E65100,color:#BF360C
classDef output fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20
class BH,BM,SQ,JR,VN input
class JQ,SH process
class YZ output
```
Per-Camera Audio Toggle
Audio recording is disabled by default and is enabled per camera through the audio_enabled setting.
G.711 Audio Muxing
G.711 (mu-law and A-law) is a common audio codec on IP cameras, including Xiaomi models. Because the MP4 container has no native G.711 support, v0.4.0 uses custom box types (ulaw / alaw) to encapsulate G.711 audio frames:
- AAC audio -> standard mp4a box
- G.711 mu-law -> ulaw box
- G.711 A-law -> alaw box
The advantage of this approach is essentially no CPU overhead: raw G.711 frames are packed directly into the MP4 without transcoding. The downside is that some generic media players may not recognize the custom boxes, although MiBeeNvr’s Web player handles them correctly.
StreamHub Audio Broadcast
The StreamHub introduced in v0.3.1 gains audio support in v0.4.0: the recorder now broadcasts audio frames alongside video frames. All real-time consumers (WebRTC, HTTP-FLV, etc.) receive the audio data, so recordings have sound and so does the live preview.
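A minimal sketch of a fan-out hub that carries both frame kinds, assuming each consumer reads from its own buffered channel. Hub, Frame, and FrameKind are hypothetical names, since the post does not show StreamHub’s API. The non-blocking send is one common design choice: a slow consumer misses frames instead of stalling the recorder and every other consumer.

```go
package main

import (
	"fmt"
	"sync"
)

type FrameKind int

const (
	Video FrameKind = iota
	Audio
)

// Frame is a hypothetical media frame passed through the hub.
type Frame struct {
	Kind    FrameKind
	PTS     int64
	Payload []byte
}

// Hub fans out every frame, video or audio, to all subscribers.
type Hub struct {
	mu   sync.RWMutex
	subs map[int]chan Frame
	next int
}

func NewHub() *Hub { return &Hub{subs: make(map[int]chan Frame)} }

// Subscribe registers a consumer (e.g. a WebRTC or HTTP-FLV session).
func (h *Hub) Subscribe(buf int) (int, <-chan Frame) {
	h.mu.Lock()
	defer h.mu.Unlock()
	id := h.next
	h.next++
	ch := make(chan Frame, buf)
	h.subs[id] = ch
	return id, ch
}

// Unsubscribe closes the consumer's channel under the write lock, so it can
// never race with a send in Broadcast.
func (h *Hub) Unsubscribe(id int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if ch, ok := h.subs[id]; ok {
		close(ch)
		delete(h.subs, id)
	}
}

// Broadcast never blocks: a subscriber with a full buffer misses this frame.
// It returns how many subscribers dropped it.
func (h *Hub) Broadcast(f Frame) (dropped int) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	for _, ch := range h.subs {
		select {
		case ch <- f:
		default:
			dropped++
		}
	}
	return dropped
}

func main() {
	hub := NewHub()
	_, live := hub.Subscribe(8)
	hub.Broadcast(Frame{Kind: Video, PTS: 0})
	hub.Broadcast(Frame{Kind: Audio, PTS: 0})
	fmt.Println((<-live).Kind == Video, (<-live).Kind == Audio) // true true
}
```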
Audio Retention During Merging
The segment merge pipeline has also been adapted to ensure audio tracks are fully retained after merging. Previously, merging only processed video tracks; now it checks and preserves audio track information as well.
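The fix can be illustrated with a simplified merge that walks every track instead of only the video track. This is a sketch under assumptions: Sample, Track, Segment, and mergeSegments are hypothetical, and a real MP4 merge must also rebuild sample tables and carry over codec configuration, which is omitted here. Timestamps are shifted per track because audio and video usually use different timescales.

```go
package main

import "fmt"

// Sample is a simplified media sample; DTS and Dur are in the track's timescale.
type Sample struct {
	DTS  int64
	Dur  int64
	Data []byte
}

// Track is identified by its handler type, as in the MP4 hdlr box:
// "vide" for video, "soun" for audio.
type Track struct {
	Handler string
	Samples []Sample
}

type Segment struct {
	Tracks []Track
}

// mergeSegments concatenates segments track by track. Every handler type is
// carried through, so audio is not dropped, and each track's timestamps are
// shifted to continue where that track's previous samples ended.
func mergeSegments(segs []Segment) Segment {
	var out Segment
	index := map[string]int{}  // handler -> position in out.Tracks
	next := map[string]int64{} // handler -> next expected DTS
	for _, seg := range segs {
		for _, t := range seg.Tracks {
			i, ok := index[t.Handler]
			if !ok {
				i = len(out.Tracks)
				index[t.Handler] = i
				out.Tracks = append(out.Tracks, Track{Handler: t.Handler})
			}
			if len(t.Samples) == 0 {
				continue
			}
			shift := next[t.Handler] - t.Samples[0].DTS
			for _, s := range t.Samples {
				s.DTS += shift
				out.Tracks[i].Samples = append(out.Tracks[i].Samples, s)
				next[t.Handler] = s.DTS + s.Dur
			}
		}
	}
	return out
}

func main() {
	mk := func(base int64) Segment {
		return Segment{Tracks: []Track{
			{Handler: "vide", Samples: []Sample{{DTS: base, Dur: 10}, {DTS: base + 10, Dur: 10}}},
			{Handler: "soun", Samples: []Sample{{DTS: base, Dur: 20}}},
		}}
	}
	merged := mergeSegments([]Segment{mk(100), mk(500)})
	for _, t := range merged.Tracks {
		fmt.Println(t.Handler, len(t.Samples), "samples")
	}
}
```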
Multi-Layer Health Monitoring: Beyond “Connected or Not”
Health monitoring is v0.4.0’s most complex subsystem. internal/health/ contains 17 source files covering the full chain from connection detection to auto-recovery.
Three-Layer Detection Model
```mermaid
flowchart LR
L1["Layer 1 Connection Layer<br/>RTSP / ONVIF / CS2 Heartbeat"] --> L2["Layer 2 Stream Layer<br/>FPS / Bitrate / Keyframe Interval"]
L2 --> L3["Layer 3 Picture Layer<br/>Freeze Detection / Black Screen Detection"]
L3 --> SCORE["Health Score Engine"]
classDef layer fill:#FFF3E0,stroke:#E65100,color:#BF360C
classDef hub fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20
class L1,L2,L3 layer
class SCORE hub
```
Three layers progress from shallow to deep: the connection layer checks the RTSP session, ONVIF responses, and the CS2 P2P heartbeat; the stream layer measures FPS, bitrate, and keyframe interval; the picture layer detects frozen and black screens. Each layer runs independently with its own thresholds, and the health score engine aggregates their results into a single composite score.
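As an illustration of the picture layer, here is a minimal sketch that assumes the checker has access to a decoded 8-bit luma (Y) plane. The thresholds and names (isBlack, FreezeDetector) are hypothetical, since the post does not give MiBeeNvr’s values or method. Exact-hash comparison is the simplest freeze criterion; a frozen image that is re-encoded with noise would need a pixel-difference threshold instead.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Hypothetical thresholds, for illustration only.
const (
	blackLumaThreshold = 20.0 // limited-range black is Y=16; allow a small margin
	freezeFrameLimit   = 50   // identical frames in a row before reporting a freeze
)

// isBlack reports whether the mean luma of a frame is below the threshold.
func isBlack(luma []byte) bool {
	if len(luma) == 0 {
		return true
	}
	var sum uint64
	for _, y := range luma {
		sum += uint64(y)
	}
	return float64(sum)/float64(len(luma)) < blackLumaThreshold
}

// FreezeDetector reports a freeze after freezeFrameLimit consecutive frames
// whose luma planes hash to the same value.
type FreezeDetector struct {
	seen    bool
	last    uint64
	repeats int
}

func (d *FreezeDetector) Observe(luma []byte) bool {
	h := fnv.New64a()
	h.Write(luma) // hash.Hash writes never return an error
	sum := h.Sum64()
	if d.seen && sum == d.last {
		d.repeats++
	} else {
		d.seen, d.last, d.repeats = true, sum, 0
	}
	return d.repeats >= freezeFrameLimit
}

func main() {
	gray := make([]byte, 64*64)
	for i := range gray {
		gray[i] = 128
	}
	var det FreezeDetector
	frozen := false
	for i := 0; i <= freezeFrameLimit; i++ {
		frozen = det.Observe(gray)
	}
	fmt.Println("black:", isBlack(gray), "frozen:", frozen) // black: false frozen: true
}
```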
Health Score Engine
The scoring engine maintains a real-time health score for each camera, combining results from all three layers:
- Layer 1 has the highest weight – if the connection is down, nothing else matters
- Layer 2 is second – stream abnormalities mean degraded video quality
- Layer 3 is the safety net – if the connection and stream are normal but the picture is frozen, the camera itself may be at fault
Each layer has independent thresholds and cooldown periods to prevent brief jitter from triggering unnecessary alerts.
Auto-Recovery Engine
When problems are detected, what happens next? auto_remediate.go implements a recovery engine with safety boundaries:
```mermaid
flowchart LR
YK[Health detection anomaly] --> B{Within safety boundary?}
B -->|Yes| C[Exponential backoff retry]
B -->|No| D[Stop recovery<br/>Mark as failed]
C --> E{Recovery successful?}
E -->|Yes| F[Restore normal<br/>Reset counter]
E -->|No| G[Increase backoff time]
G --> B
classDef detect fill:#E3F2FD,stroke:#1565C0,color:#1565C0
classDef decision fill:#FFF3E0,stroke:#E65100,color:#BF360C
classDef success fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20
classDef failure fill:#FFEBEE,stroke:#C62828,color:#B71C1C
class YK detect
class B,E decision
class C,F success
class D,G failure
```
Key design decisions:
- Safety boundary – At most N consecutive recovery attempts, then stop, to prevent infinite retry loops
- Exponential backoff – The retry interval doubles after each failure (1s -> 2s -> 4s -> 8s -> …), so retries don’t overwhelm the device or the network
- Alert cooldown – No duplicate alerts for the same camera within cooldown period to prevent alert storms
Health History and Visualization
Health events are persisted to storage, and the Web interface provides timeline visualization:
- Per-camera health event history
- Anomaly occurrence and recovery timestamps
- Real-time health indicator on camera cards
Health API endpoints:
- GET /api/health – System-wide health status
- GET /api/cameras/{id}/health – Individual camera health details
HLS/LL-HLS Stability Optimization
This release includes extensive HLS playback polish, especially for low-performance devices like Raspberry Pi.
IDR Frame Waiting
Previously, the recorder could start writing a segment at any frame, causing the player to receive a segment that doesn’t start with a keyframe – resulting in black frames. v0.4.0 waits for the next IDR frame before writing a new segment, sacrificing some segment boundary precision in exchange for every segment having a decodable first frame.
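The rule can be sketched as a segmenter that drops frames until the first IDR and closes a segment only on an IDR frame that arrives after the target duration. VideoFrame and Segmenter are hypothetical types. The trade-off described above shows up directly: with a 2 s target and an IDR every 1.5 s, the finished segment runs 3 s, but it always starts on a decodable frame.

```go
package main

import "fmt"

// VideoFrame is a hypothetical frame; PTS is in milliseconds.
type VideoFrame struct {
	PTS int64
	IDR bool
}

// Segmenter starts every segment on an IDR frame. A new segment is cut only
// at an IDR that arrives at least Target ms after the current one began.
type Segmenter struct {
	Target   int64
	started  bool
	segStart int64
	current  []VideoFrame
	Done     [][]VideoFrame
}

func (s *Segmenter) Push(f VideoFrame) {
	if !s.started {
		if !f.IDR {
			return // nothing decodable yet: drop until the first IDR
		}
		s.started, s.segStart = true, f.PTS
	} else if f.IDR && f.PTS-s.segStart >= s.Target {
		s.Done = append(s.Done, s.current)
		s.current, s.segStart = nil, f.PTS
	}
	s.current = append(s.current, f)
}

func main() {
	seg := Segmenter{Target: 2000}
	// 10 fps, IDR every 1.5 s starting at 300 ms.
	for i := 0; i < 50; i++ {
		seg.Push(VideoFrame{PTS: int64(i) * 100, IDR: i%15 == 3})
	}
	for _, d := range seg.Done {
		fmt.Printf("segment %d..%d ms, starts on IDR: %v\n", d[0].PTS, d[len(d)-1].PTS, d[0].IDR)
	}
}
```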
Credit-Based Frame Rate Throttling
v0.4.0 introduces a credit-based FPS throttling mechanism. The core idea is to give each consumer a “frame budget”: every frame delivered spends one credit, and the producer sends a new frame only when a credit is available. This smooths out frame delivery and prevents stuttering on low-performance devices when frames arrive in bursts.
LL-HLS Parameter Tuning
For Raspberry Pi playback stability, two key parameters were adjusted:
- backBuffer: 0.5 -> 2.0 seconds – larger playback buffer
- liveSync: 2 -> 3 seconds – looser live sync distance
These two parameter adjustments improved LL-HLS playback on Raspberry Pi from “frequently stuttering” to “basically smooth.”
Sub-Stream Fallback
When a sub-stream fails, playback now automatically falls back to the main stream. Previously, if the sub-stream broke, playback simply stopped. Now it switches to the main stream, which typically costs more bandwidth, but the video doesn’t cut out.
UI Redesign
Camera Page Tab Navigation
The camera page was restructured with tab navigation:
- Active – Cameras currently recording
- Archived – Archived cameras with expandable recording list
Previously, active and archived cameras were mixed in one list – now they’re clearly separated.
Settings Page Streaming Section
The Advanced tab in settings now includes a streaming protocol configuration section with detailed settings for WebRTC, HTTP-FLV, RTMP, and SRT.
Health History Page
A new health history page, localized in Chinese and English, shows each camera’s health events on a timeline.
Other Improvements
ARMv7 Support
Added ARMv7 binary builds, covering Raspberry Pi 2/3 and older devices. Docker images now support three architectures:
| Architecture | Applicable Devices |
|---|---|
| linux/amd64 | PC, servers |
| linux/arm64 | Raspberry Pi 4/5, Mac M series |
| linux/arm/v7 | Raspberry Pi 2/3 |
ONVIF Encoding Auto-Detection
When adding ONVIF cameras, the encoding format (H.264/H.265) is auto-detected without manual selection.
Xiaomi Cloud Sync
A new Xiaomi cloud sync endpoint synchronizes camera metadata (name, model, etc.), saving the trouble of manual entry.
Security Hardening
Continuing the v0.3.0 security hardening tradition, v0.4.0 adds:
- Rate limiting for health/readyz endpoints
- Path traversal protection for download/FTP/WebDAV
- Input validation for camera URLs and ONVIF IPs
- SQL LIKE injection protection
- Minimum 8-character initialization password
- Auth bypass protection when no password is configured
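Two of these checks are easy to sketch with the standard library. The helpers safeJoin and escapeLike are hypothetical, not MiBeeNvr’s code: the first rejects any user-supplied path that resolves outside the storage root (it does not resolve symlinks), and the second escapes LIKE wildcards so a search term matches literally. escapeLike assumes the query declares the escape character, for example name LIKE ? ESCAPE '\', and it still relies on parameter binding against ordinary SQL injection.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin resolves userPath under root and rejects results outside root.
// filepath.Join cleans "..", so "../../etc/passwd" ends up above root.
func safeJoin(root, userPath string) (string, error) {
	full := filepath.Join(root, filepath.FromSlash(userPath))
	rel, err := filepath.Rel(root, full)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes %q", userPath, root)
	}
	return full, nil
}

// likeEscaper escapes the escape character itself plus the % and _ wildcards.
// NewReplacer applies all pairs in one pass, so output is never re-escaped.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

func escapeLike(s string) string { return likeEscaper.Replace(s) }

func main() {
	for _, p := range []string{"cam1/2024/seg.mp4", "../../etc/passwd"} {
		full, err := safeJoin("/data/recordings", p)
		fmt.Println(full, err)
	}
	fmt.Println(escapeLike(`50%_off`)) // 50\%\_off
}
```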
Quality Assurance
1651 tests passing, 60.7% coverage. This release adds a complete test suite for health monitoring (connection detection, scoring engine, auto-recovery), plus HLS playback end-to-end tests.
Summary
From v0.3.1 to v0.4.0: 196 commits, including 73 features, 54 fixes, 21 refactors, 9 test commits, and 16 documentation updates.
Audio recording fills MiBeeNvr’s biggest feature gap. The health monitoring system makes it more reliable in unattended scenarios. HLS optimizations bring a qualitative improvement to playback experience on low-performance devices.
If v0.3.x was about solving “can it work” (multi-protocol, Xiaomi cameras), v0.4.0 starts addressing “is it good to use” – with sound, self-healing, and smooth playback.
Open source:
If you’re looking for a lightweight, audio-capable, self-healing open-source NVR, v0.4.0 is worth a try.