YOLO Go Deployment Guide

14 min read
Chapter 8: Complete YOLO Tutorial with Golang

With its high performance, low memory footprint, and native concurrency features, Go has become one of the preferred languages for industrial YOLO deployment. This chapter provides a comprehensive implementation guide for YOLO in the Go ecosystem.

Introduction to YOLO-Related Libraries in Go Ecosystem

| Library | Stars | Maintenance Status | Use Case | Recommendation |
|---|---|---|---|---|
| onnxruntime-go | ⭐ 1.2k | Active | ONNX model inference, CPU/GPU acceleration | ⭐⭐⭐⭐⭐ |
| gocv | ⭐ 5.8k | Active | OpenCV bindings, image processing + DNN inference | ⭐⭐⭐⭐⭐ |
| yolo-go | ⭐ 800+ | Active | Pre-packaged YOLO detection library, out-of-the-box | ⭐⭐⭐⭐ |
| go-yolo | ⭐ 300+ | Maintained | Darknet CGO bindings | ⭐⭐⭐ |
| gorgonia | ⭐ 4.9k | Active | Pure Go computational graph, custom networks | ⭐⭐⭐ |

Core Feature Comparison:
YOLO Golang ONNX Runtime Model Deployment GoCV
Continue reading →

MiBeeHive: The "Hive" Toolbox I Built for My Studio

8 min read
I started in operations and later moved into development, and the number of projects I maintain keeps growing. Various middleware, databases, monitoring components… each version upgrade is manual labor: go to the official site to find the download link, compare version numbers, manually download to the internal network, then distribute to each machine. I used to write a bunch of Shell scripts to periodically pull the latest versions to the LAN. They worked but were not user-friendly: the scripts were scattered everywhere, adding new software meant writing parsing logic by hand, and there was nothing to check when things went wrong.
Go DevOps PXE WebDAV Docker ARM64 Self-Hosted Open Source
Continue reading →

MiBeeNvr v0.4.0: Audio Recording Finally Arrives, Auto-Recovery When Cameras Go Down

5 min read
Previously, MiBeeNvr's MP4 files only had a video track, so playback was silent. v0.4.0 fills this gap with audio recording. It also adds more practical camera health monitoring and auto-recovery.

Recordings Now Have Sound

Each camera can independently enable audio recording:

```yaml
cameras:
  - id: "front-door"
    name: "Front Door Camera"
    protocol: "rtsp"
    encoding: "h264"
    audio_enabled: true
```

Supported audio formats:
NVR Go Smart Home Audio Recording Health Monitoring Open Source Raspberry Pi
Continue reading →

MiBeeNvr v0.4.0: Audio Recording Pipeline and Multi-Layer Health Monitoring Architecture

7 min read
After v0.3.1 shipped, I put in another 196 commits. v0.4.0 is a feature-dense release: audio recording pipeline, multi-layer health monitoring engine, HLS/LL-HLS playback stability optimization, and a major UI redesign. For the full changelog, see GitHub Release Notes.

The previous post covered v0.3.x's multi-protocol streaming and Xiaomi camera support (v0.3.0 Tech Post). If you haven't read the first post, start with MiBeeNvr Introduction.

Audio Recording: From Silent to Sound

In the v0.3.x era, recorded MP4 files only had a video track. v0.4.0 introduces a complete audio capture and muxing pipeline, supporting AAC audio from RTSP cameras and G.711 audio from ONVIF/Xiaomi cameras.
NVR Go Audio Recording AAC G.711 Health Monitoring Auto Recovery Raspberry Pi HLS WebRTC
Continue reading →

YOLO FAQ: Common Problems and Solutions

13 min read
Environment Installation Issues

Q1: CUDA not available, only using CPU?

First confirm that your NVIDIA driver version supports the required CUDA version. A driver that is too old will make CUDA unavailable:

```bash
# Check driver version (Driver Version must be >= minimum for target CUDA)
nvidia-smi
# Check CUDA toolkit version
nvcc --version
# Reinstall PyTorch with matching CUDA version
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

Note that the CUDA version printed by nvidia-smi is the highest version the driver supports, not the version of any installed toolkit. If nvidia-smi shows a CUDA version but PyTorch still uses the CPU, you have most likely installed the CPU-only PyTorch build. Uninstall it and reinstall with the --index-url flag for the correct CUDA version. For CUDA 11.8, replace cu121 with cu118 in the URL. Always use a conda or venv virtual environment to isolate PyTorch versions and avoid system-level conflicts.
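To tell a CPU-only build apart from a driver problem, a small check from inside Python can help. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and it relies on `torch.cuda.is_available()` and on `torch.version.cuda` being `None` in CPU-only builds.

```python
def diagnose_torch_cuda() -> str:
    """Classify why PyTorch may be running on the CPU (labels are illustrative)."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "torch-not-installed"
    if torch.cuda.is_available():
        return "cuda-ok"
    if torch.version.cuda is None:
        # The wheel was built without CUDA: reinstall with the --index-url flag
        return "cpu-only-build"
    # CUDA-enabled wheel, but no usable GPU: check the driver with nvidia-smi
    return "cuda-build-but-no-usable-gpu"


print(diagnose_torch_cuda())
```

A result of `cpu-only-build` points at the reinstall step above, while `cuda-build-but-no-usable-gpu` suggests looking at the driver version first.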
YOLO FAQ Troubleshooting Best Practices
Continue reading →

MiBeeNvr v0.3.1: Multi-Protocol Streaming and Native Xiaomi Camera Support

9 min read
A lot of work went into the releases after v0.2.0. v0.3.x brings several major updates: native Xiaomi camera support, recording archiving, multi-protocol streaming architecture (WebRTC/HTTP-FLV/RTMP/SRT/LL-HLS), and a wave of security hardening. The architectural evolution from external dependencies to built-in implementation, from single protocol to full protocol support, was much more complex than I expected.

The previous post introduced v0.2.0's 15 new features (v0.2.0 Update). If you haven't read the first post, start with MiBeeNvr Introduction.

v0.3.0 focuses on deep Xiaomi camera integration, and v0.3.1 builds on that with a complete multi-protocol streaming architecture. For the full changelog, see GitHub Release Notes.
NVR Go Xiaomi Camera Smart Home RTSP CS2 WebRTC HTTP-FLV RTMP SRT
Continue reading →

From Compliance to Real-Time Defense: The Evolution of security-collector-exporter

6 min read
The Origin: Compliance Check Hassles

Anyone in operations knows there is one hurdle that servers in China cannot escape: Cybersecurity Level Protection (GB/T 22239-2019, commonly known as "Level Protection 2.0"). Whether you're Level 3 or Level 2, auditors come asking about these things:

- Is SSH root login disabled? Are password policies compliant?
- Is the firewall on? Is SELinux enforcing?
- Are there expired accounts? What's the password validity period?
- Which ports are open? Are there high-risk services running?
- Are audit logs enabled? How long are they retained?

There are plenty of compliance check tools on the market. Search GitHub and you'll find a bunch: Golin, EvaluationTools, Linux-Security-Compliance-Check, etc. But they all share one limitation:

Run once, get a report, done.

You check compliance today, and tomorrow someone changes sshd_config, turns off the firewall, or installs a backdoor service, and you'd never know.
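To make one of these checks concrete, here is a minimal sketch of the "Is SSH root login disabled?" question as a function that could be re-run on every collection cycle instead of once per audit. It is not the exporter's actual code: the function name is hypothetical, and it deliberately ignores `Include` files, `Match` blocks and the `Keyword=value` spelling.

```python
def root_login_disabled(sshd_config_text: str) -> bool:
    """Return True only if PermitRootLogin is explicitly set to "no"."""
    for raw in sshd_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].lower() == "permitrootlogin":
            # sshd keeps the first value it sees for a keyword
            return parts[1].strip().lower() == "no"
    # Keyword absent: this sketch treats the setting as non-compliant
    return False


sample = "Port 22\n#PermitRootLogin yes\nPermitRootLogin no\n"
print(root_login_disabled(sample))
```

Returning a boolean keeps the result easy to expose as a 0/1 gauge, so a change to the file shows up at the next collection rather than at the next audit.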
Prometheus eBPF Linux Security Monitoring Compliance Go Exporter
Continue reading →

YOLO Deployment: Model Export and Multi-Platform Deployment

12 min read
Model Export (17 Format Support)

Ultralytics Unified Export API

```python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# ========== Export Various Formats ==========

# 1. ONNX (Cross-platform Universal)
model.export(format="onnx", simplify=True, dynamic=True)

# 2. TensorRT (Best for NVIDIA GPU)
model.export(format="engine", half=True, workspace=4)

# 3. OpenVINO (Best for Intel CPU)
model.export(format="openvino", half=True)

# 4. CoreML (Apple Devices)
model.export(format="coreml", int8=True)

# 5. TFLite (Android/iOS Mobile)
model.export(format="tflite", int8=True)

# 6. NCNN (Mobile)
model.export(format="ncnn")

# 7. PaddlePaddle
model.export(format="paddle")
```

Version Export Compatibility

| Format | YOLOv8 | YOLO11 | YOLO26 |
|---|---|---|---|
| ONNX | ✅ | ✅ | ✅ Best |
| TensorRT | ✅ | ✅ | ✅ No NMS, Simpler |
| OpenVINO | ✅ | ✅ | ✅ |
| TFLite | ✅ | ✅ | ✅ |
| NCNN | ✅ | ✅ | ✅ |

Python Deployment Practice

ONNX Runtime Deployment

```python
import onnxruntime as ort
import cv2
import numpy as np

# Load ONNX model (falls back to CPU if the CUDA provider is unavailable)
session = ort.InferenceSession(
    "yolo26n.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

def preprocess(image, imgsz=640):
    """Image preprocessing: BGR HWC uint8 -> RGB NCHW float32 in [0, 1]"""
    # Plain resize ignores aspect ratio; letterboxing is closer to training
    img = cv2.resize(image, (imgsz, imgsz))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # cv2.imread returns BGR
    img = img.transpose(2, 0, 1) / 255.0
    return img[np.newaxis].astype(np.float32)

# Inference
image = cv2.imread("test.jpg")
input_data = preprocess(image)
outputs = session.run(None, {"images": input_data})

# YOLO26 Special Note: No NMS post-processing needed!
# Output is already the final detection results
```

TensorRT Python Deployment

```python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context on import
import numpy as np
import time

# ========== 1. Engine Loading & Context Creation ==========
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)

with open("yolo26n.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# ========== 2. CUDA Memory Allocation ==========
stream = cuda.Stream()
bindings = []
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = engine.get_tensor_shape(name)
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    size = trt.volume(shape)
    host_mem = cuda.pagelocked_empty(size, dtype)  # Host pinned memory
    device_mem = cuda.mem_alloc(host_mem.nbytes)   # Device VRAM
    bindings.append({"name": name, "host": host_mem, "device": device_mem,
                     "shape": shape, "size": size, "dtype": dtype})

# ========== 3. Async Inference Loop ==========
# Assumes tensor 0 is the single input and tensor 1 the single output
def async_infer(input_blob):
    # H2D copy
    np.copyto(bindings[0]["host"], input_blob.ravel())
    cuda.memcpy_htod_async(bindings[0]["device"], bindings[0]["host"], stream)

    # Set tensor addresses and execute
    context.set_tensor_address(bindings[0]["name"], int(bindings[0]["device"]))
    context.set_tensor_address(bindings[1]["name"], int(bindings[1]["device"]))
    context.execute_async_v3(stream.handle)

    # D2H copy
    cuda.memcpy_dtoh_async(bindings[1]["host"], bindings[1]["device"], stream)
    stream.synchronize()
    return bindings[1]["host"].copy()

# ========== 4. Performance Benchmark ==========
def benchmark(warmup=10, runs=100):
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
    for _ in range(warmup):
        async_infer(dummy)

    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        async_infer(dummy)
        latencies.append((time.perf_counter() - t0) * 1000)

    latencies.sort()
    print(f"TensorRT FP16 | Mean: {np.mean(latencies):.1f}ms | "
          f"P50: {latencies[runs//2]:.1f}ms | "
          f"P99: {latencies[int(runs*0.99)]:.1f}ms | "
          f"Throughput: {1000/np.mean(latencies):.0f} FPS")

benchmark()
```

OpenVINO Deployment with Benchmarking

```python
import openvino as ov
import cv2
import numpy as np
import time

# ========== 1. ONNX → OpenVINO Conversion ==========
# Ultralytics unified export:
# model.export(format="openvino", half=True)

core = ov.Core()
model = core.read_model("yolo26n_openvino/yolo26n.xml")

# ========== 2. CPU Inference ==========
compiled_cpu = core.compile_model(model, device_name="CPU")
infer_request = compiled_cpu.create_infer_request()

def preprocess(image, imgsz=640):
    """BGR HWC uint8 -> RGB NCHW float32 in [0, 1]"""
    img = cv2.resize(image, (imgsz, imgsz))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # cv2 images are BGR
    return img.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0

def openvino_infer(image):
    outputs = infer_request.infer({"images": preprocess(image)})
    return outputs[next(iter(outputs))]

# ========== 3. Async Pipeline (Throughput Optimized) ==========
def async_pipeline(images, num_requests=4):
    """Multi-request async inference pipeline"""
    # AsyncInferQueue manages a pool of requests on one compiled model and
    # blocks in start_async() until a request is free, so none is reused early
    queue = ov.AsyncInferQueue(compiled_cpu, num_requests)
    results = [None] * len(images)

    def completion_callback(request, userdata):
        # userdata is the image index passed to start_async()
        results[userdata] = request.get_output_tensor().data.copy()

    queue.set_callback(completion_callback)

    for i, img in enumerate(images):
        queue.start_async({"images": preprocess(img)}, userdata=i)

    queue.wait_all()
    return results

# ========== 4. CPU vs AUTO Benchmark Comparison ==========
def benchmark_openvino():
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)

    for device in ["CPU", "AUTO"]:
        compiled = core.compile_model(model, device)
        req = compiled.create_infer_request()

        # Warmup (avoid first-inference kernel compilation overhead)
        for _ in range(20):
            req.infer({"images": dummy})

        times = []
        for _ in range(200):
            t0 = time.perf_counter()
            req.infer({"images": dummy})
            times.append((time.perf_counter() - t0) * 1000)

        times.sort()
        print(f"OpenVINO {device}: "
              f"Mean {np.mean(times):.1f}ms | "
              f"P99 {times[int(len(times)*0.99)]:.1f}ms | "
              f"{1000/np.mean(times):.0f} FPS")

benchmark_openvino()
```

NCNN Mobile Deployment

NCNN is Tencent's open-source mobile inference framework supporting ARM NEON and Vulkan GPU acceleration.
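The TensorRT and OpenVINO benchmark loops above each compute mean, P50/P99 and FPS inline. When comparing several backends, it can help to route every measurement through one helper so that the statistics are defined identically. The sketch below uses only the standard library; the names `percentile` and `summarize` are hypothetical, and the nearest-rank percentile is one common definition that may differ by one sample from the index arithmetic in the loops.

```python
import math
import statistics


def percentile(sorted_ms, pct):
    """Nearest-rank percentile of an ascending list (pct is 0-100)."""
    if not sorted_ms:
        raise ValueError("no latency samples")
    rank = max(1, math.ceil(pct * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def summarize(latencies_ms):
    """Mean, P50, P99 and implied single-stream FPS for latencies in ms."""
    ordered = sorted(latencies_ms)
    mean = statistics.fmean(ordered)
    return {
        "mean_ms": mean,
        "p50_ms": percentile(ordered, 50),
        "p99_ms": percentile(ordered, 99),
        "fps": 1000.0 / mean,  # sequential throughput, one request at a time
    }


stats = summarize([12.0, 11.5, 13.2, 11.9, 30.4])
print(f"Mean {stats['mean_ms']:.1f}ms | P99 {stats['p99_ms']:.1f}ms | "
      f"{stats['fps']:.0f} FPS")
```

The FPS figure only reflects sequential latency; throughput for the async pipeline has to be measured over the whole batch instead.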
YOLO Model Deployment ONNX TensorRT Edge Computing
Continue reading →

MiBeeNvr v0.3.0: One-Click Xiaomi Camera Integration, Recordings Never Lost

3 min read
Got Xiaomi cameras at home? Want to keep your recordings on your own storage instead of relying on the cloud? As someone with several Xiaomi cameras at home, I always had one frustration: every time I wanted to check the footage from my doorbell camera, I had to log into Xiaomi Cloud, wait for ages while it loaded, and it would often just spin. Plus, cloud storage charges by the day — it adds up over the month. And if you swap cameras, all your old recordings are gone. Pretty frustrating.
NVR Xiaomi Camera Smart Home Recording Open Source
Continue reading →

security-collector-exporter v0.3.0: Real-Time Security Monitoring with eBPF

8 min read
From Static to Real-Time

The previous article introduced security-collector-exporter v0.1.0, which turns Linux security configuration states into Prometheus metrics. But v0.1.0 is essentially "snapshot-based": it periodically reads /etc and /proc, capturing the static configuration at a single point in time.

There's an area of security operations that snapshots can't cover: real-time security events. Someone running a reverse shell, a process escalating privileges, an abnormal network connection, someone loading a kernel module: these events happen and pass, and you'd never see them at your next scrape.
Prometheus eBPF Linux Security Monitoring Go Exporter
Continue reading →