AI & Tools

YOLO Go Deployment Guide

May 26, 2026

Chapter 8: Complete YOLO Tutorial with Golang

The Go language, with its high performance, low memory footprint, and native concurrency features, has become one of the preferred languages for industrial YOLO deployment. This chapter provides a comprehensive implementation guide for YOLO in the Go ecosystem.

Introduction to YOLO-Related Libraries in the Go Ecosystem

| Library | Stars | Maintenance Status | Use Case | Recommendation |
| --- | --- | --- | --- | --- |
| onnxruntime-go | ⭐ 1.2k | Active | ONNX model inference, CPU/GPU acceleration | ⭐⭐⭐⭐⭐ |
| gocv | ⭐ 5.8k | Active | OpenCV bindings, image processing + DNN inference | ⭐⭐⭐⭐⭐ |
| yolo-go | ⭐ 800+ | Active | Pre-packaged YOLO detection library, out-of-the-box | ⭐⭐⭐⭐ |
| go-yolo | ⭐ 300+ | Maintained | Darknet CGO bindings | ⭐⭐⭐ |
| gorgonia | ⭐ 4.9k | Active | Pure Go computational graph, custom networks | ⭐⭐⭐ |

Core Feature Comparison:

Continue reading →

YOLO FAQ: Common Problems and Solutions

May 23, 2026

Environment Installation Issues

Q1: CUDA not available, only using CPU?

First confirm your NVIDIA driver version supports the required CUDA version. A driver that is too old will make CUDA unavailable:

```bash
# Check driver version (Driver Version must be >= minimum for target CUDA)
nvidia-smi

# Check CUDA toolkit version
nvcc --version

# Reinstall PyTorch with matching CUDA version
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

If nvidia-smi shows a CUDA version but PyTorch still uses CPU, you have most likely installed the CPU-only PyTorch build. Uninstall it and reinstall with the --index-url flag for the correct CUDA version. For CUDA 11.8, replace cu121 with cu118 in the URL. Always use a conda or venv virtual environment to isolate PyTorch versions and avoid system-level conflicts.
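The cu121 / cu118 substitution described above follows a simple pattern: the CUDA major and minor version are concatenated after `cu`. The sketch below illustrates that pattern only. The function name `pytorch_index_url` is hypothetical, and PyTorch publishes wheels for a limited set of CUDA versions, so a URL built this way is not guaranteed to exist.

```python
def pytorch_index_url(cuda_version: str) -> str:
    """Build a PyTorch wheel index URL from a CUDA version such as "12.1".

    Hypothetical helper: it only reproduces the cu<major><minor> naming
    pattern and does not check that wheels exist for that version.
    """
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like '12.1', got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


if __name__ == "__main__":
    for version in ("12.1", "11.8"):
        print(f"pip3 install torch torchvision --index-url {pytorch_index_url(version)}")
```

After installing, `torch.cuda.is_available()` is the usual way to confirm that the CUDA build is actually in use.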

Continue reading →

YOLO Deployment: Model Export and Multi-Platform Deployment

May 20, 2026

Model Export (17 Format Support)

Ultralytics Unified Export API

```python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# ========== Export Various Formats ==========

# 1. ONNX (Cross-platform Universal)
model.export(format="onnx", simplify=True, dynamic=True)

# 2. TensorRT (Best for NVIDIA GPU)
model.export(format="engine", half=True, workspace=4)

# 3. OpenVINO (Best for Intel CPU)
model.export(format="openvino", half=True)

# 4. CoreML (Apple Devices)
model.export(format="coreml", int8=True)

# 5. TFLite (Android/iOS Mobile)
model.export(format="tflite", int8=True)

# 6. NCNN (Mobile)
model.export(format="ncnn")

# 7. PaddlePaddle
model.export(format="paddle")
```

Version Export Compatibility

| Format | YOLOv8 | YOLO11 | YOLO26 |
| --- | --- | --- | --- |
| ONNX | ✅ | ✅ | ✅ Best |
| TensorRT | ✅ | ✅ | ✅ No NMS, Simpler |
| OpenVINO | ✅ | ✅ | ✅ |
| TFLite | ✅ | ✅ | ✅ |
| NCNN | ✅ | ✅ | ✅ |

Python Deployment Practice

ONNX Runtime Deployment

```python
import onnxruntime as ort
import cv2
import numpy as np

# Load ONNX model (falls back to CPU if the CUDA provider is unavailable)
session = ort.InferenceSession(
    "yolo26n.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

def preprocess(image, imgsz=640):
    """Image preprocessing: BGR -> RGB, resize, HWC -> CHW, scale to [0, 1]"""
    img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # cv2.imread returns BGR
    img = cv2.resize(img, (imgsz, imgsz))
    img = img.transpose(2, 0, 1) / 255.0
    return img[np.newaxis].astype(np.float32)

# Inference
image = cv2.imread("test.jpg")
input_data = preprocess(image)
outputs = session.run(None, {"images": input_data})

# YOLO26 Special Note: No NMS post-processing needed!
# Output is already the final detection results
```

TensorRT Python Deployment

```python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import time

# ========== 1. Engine Loading & Context Creation ==========
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)

# Note: engines exported by Ultralytics begin with a length-prefixed JSON
# metadata header; skip it before deserializing, or build the engine with trtexec
with open("yolo26n.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# ========== 2. CUDA Memory Allocation ==========
stream = cuda.Stream()
bindings = []
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = engine.get_tensor_shape(name)
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    size = trt.volume(shape)
    host_mem = cuda.pagelocked_empty(size, dtype)   # Host pinned memory
    device_mem = cuda.mem_alloc(host_mem.nbytes)    # Device VRAM
    bindings.append({"name": name, "host": host_mem, "device": device_mem,
                     "shape": shape, "size": size, "dtype": dtype})

# ========== 3. Async Inference Loop ==========
# Assumes tensor 0 is the input and tensor 1 is the (single) output
def async_infer(input_blob):
    # H2D copy
    np.copyto(bindings[0]["host"], input_blob.ravel())
    cuda.memcpy_htod_async(bindings[0]["device"], bindings[0]["host"], stream)

    # Set tensor addresses and execute
    context.set_tensor_address(bindings[0]["name"], int(bindings[0]["device"]))
    context.set_tensor_address(bindings[1]["name"], int(bindings[1]["device"]))
    context.execute_async_v3(stream.handle)

    # D2H copy
    cuda.memcpy_dtoh_async(bindings[1]["host"], bindings[1]["device"], stream)
    stream.synchronize()
    return bindings[1]["host"].copy()

# ========== 4. Performance Benchmark ==========
def benchmark(warmup=10, runs=100):
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
    for _ in range(warmup):
        async_infer(dummy)

    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        async_infer(dummy)
        latencies.append((time.perf_counter() - t0) * 1000)

    latencies.sort()
    print(f"TensorRT FP16 | Mean: {np.mean(latencies):.1f}ms | "
          f"P50: {latencies[runs//2]:.1f}ms | "
          f"P99: {latencies[int(runs*0.99)]:.1f}ms | "
          f"Throughput: {1000/np.mean(latencies):.0f} FPS")

benchmark()
```

OpenVINO Deployment with Benchmarking

```python
import openvino as ov
import cv2
import numpy as np
import time

# ========== 1. ONNX → OpenVINO Conversion ==========
# Ultralytics unified export:
# model.export(format="openvino", half=True)

core = ov.Core()
model = core.read_model("yolo26n_openvino/yolo26n.xml")

# ========== 2. CPU Inference ==========
compiled_cpu = core.compile_model(model, device_name="CPU")
infer_request = compiled_cpu.create_infer_request()

def preprocess(image, imgsz=640):
    """BGR -> RGB, resize, HWC -> CHW, scale to [0, 1]"""
    img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # cv2.imread returns BGR
    img = cv2.resize(img, (imgsz, imgsz))
    return img.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0

def openvino_infer(image):
    outputs = infer_request.infer({"images": preprocess(image)})
    return outputs[next(iter(outputs))]

# ========== 3. Async Pipeline (Throughput Optimized) ==========
def async_pipeline(images, num_requests=4):
    """Multi-request async inference pipeline"""
    results = [None] * len(images)

    def completion_callback(request, userdata):
        idx = userdata
        results[idx] = request.get_output_tensor().data.copy()

    # AsyncInferQueue keeps a pool of infer requests on one compiled model;
    # start_async() blocks until a request in the pool is free
    queue = ov.AsyncInferQueue(compiled_cpu, num_requests)
    queue.set_callback(completion_callback)

    for i, img in enumerate(images):
        queue.start_async({"images": preprocess(img)}, userdata=i)

    queue.wait_all()
    return results

# ========== 4. CPU vs AUTO Device Benchmark Comparison ==========
def benchmark_openvino():
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)

    for device in ["CPU", "AUTO"]:
        compiled = core.compile_model(model, device)
        req = compiled.create_infer_request()

        # Warmup (avoid first-inference kernel compilation overhead)
        for _ in range(20):
            req.infer({"images": dummy})

        times = []
        for _ in range(200):
            t0 = time.perf_counter()
            req.infer({"images": dummy})
            times.append((time.perf_counter() - t0) * 1000)

        times.sort()
        print(f"OpenVINO {device}: "
              f"Mean {np.mean(times):.1f}ms | "
              f"P99 {times[int(199*0.99)]:.1f}ms | "
              f"{1000/np.mean(times):.0f} FPS")

benchmark_openvino()
```

NCNN Mobile Deployment

NCNN is Tencent’s open-source mobile inference framework supporting ARM NEON and Vulkan GPU acceleration.
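Both the TensorRT and OpenVINO benchmarks in this section report mean, percentile, and FPS figures from a list of per-run latencies. The sketch below pulls that summary step into one backend-independent helper. The names `percentile` and `summarize_latencies` are hypothetical, and the nearest-rank percentile definition is a choice made here; it is not the exact index arithmetic used in the benchmark code.

```python
import statistics


def percentile(sorted_ms, q):
    """Nearest-rank percentile of an ascending list; q is an integer in 1..100."""
    if not sorted_ms:
        raise ValueError("no samples")
    # Integer ceiling of q * n / 100 avoids floating-point rounding surprises
    rank = (q * len(sorted_ms) + 99) // 100
    return sorted_ms[max(rank, 1) - 1]


def summarize_latencies(latencies_ms):
    """Return mean, P50, P99 (milliseconds) and throughput (FPS) for one run set."""
    ordered = sorted(latencies_ms)
    mean_ms = statistics.fmean(ordered)
    return {
        "mean_ms": mean_ms,
        "p50_ms": percentile(ordered, 50),
        "p99_ms": percentile(ordered, 99),
        "fps": 1000.0 / mean_ms,
    }


if __name__ == "__main__":
    # Synthetic latencies standing in for measured values
    samples = [float(ms) for ms in range(1, 101)]
    stats = summarize_latencies(samples)
    print(f"Mean {stats['mean_ms']:.1f}ms | P50 {stats['p50_ms']:.1f}ms | "
          f"P99 {stats['p99_ms']:.1f}ms | {stats['fps']:.0f} FPS")
```

Using one summary function for every backend keeps the reported percentiles comparable, since each benchmark then applies the same definition.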

Continue reading →

YOLO Advanced Optimization: Lightweight, Quantization and Accuracy

May 17, 2026

Model Lightweighting Strategies

Model Size Selection

| Model | Parameters (M) | mAP | CPU Inference | Use Cases |
| --- | --- | --- | --- | --- |
| YOLO26n | 2.8 | 38.9 | Fastest | Edge devices, Embedded |
| YOLO26s | 9.4 | 48.2 | Very fast | Mobile, Web |
| YOLO26m | 21.8 | 53.1 | Medium | Server, High performance |
| YOLO11n | 2.6 | 39.6 | Fast | Lightweight deployment |
| YOLOv8n | 3.2 | 37.3 | Baseline | General purpose |

Knowledge Distillation

```python
from ultralytics import YOLO

# Large model as teacher, small model as student
teacher = YOLO("yolo26x.pt")
student = YOLO("yolo26n.yaml")

# Distillation training
# (check that your installed Ultralytics version accepts these distill arguments)
student.train(
    data="data.yaml",
    distill="yolo26x.pt",   # Teacher model
    distill_ratio=0.5,      # Distillation loss ratio
)
```

Model Pruning

Structured vs Unstructured Pruning

| Type | Method | Sparsity Pattern | Hardware Acceleration | Compression Ratio |
| --- | --- | --- | --- | --- |
| Unstructured | Weight pruning | Random sparse | Difficult (special HW needed) | High |
| Structured | Channel pruning | Regular sparse | Native acceleration | Medium |

Torch Prune Channel Pruning Example

```python
import torch
import torch.nn.utils.prune as prune
from ultralytics import YOLO

# L1 unstructured pruning on conv layers
model = YOLO("yolo26n.pt")
for name, module in model.model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # Make pruning permanent

# Channel pruning with torch-pruning library
# pip install torch-pruning
import torch_pruning as tp

model = YOLO("yolo26n.pt").model
DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=torch.randn(1, 3, 640, 640))

# Prune about 20% of the output channels by index
# (assumes the target layer has 64 output channels)
# Note: torch-pruning 1.x renamed this call to DG.get_pruning_group(...) / group.prune()
pruning_plan = DG.get_pruning_plan(
    model.model[4], tp.prune_conv,
    pruning_dim=0,  # Output channel dimension
    idxs=list(range(0, 64, 5))  # Indices to remove: every 5th channel (13 of 64)
)
pruning_plan.exec()
```

Pruning Ratio Guidelines

| Model | Safe Ratio | Aggressive Ratio | mAP Drop (safe / aggressive) |
| --- | --- | --- | --- |
| YOLO26n | ≤20% | 20-40% | <1% / 2-5% |
| YOLO26s | ≤30% | 30-50% | <1% / 3-6% |
| YOLO26m | ≤40% | 40-60% | <1% / 3-8% |
| YOLOv8n | ≤20% | 20-35% | <1% / 2-4% |

Model Pruning and Quantization

Export Time Quantization

```python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# INT8 quantization (requires calibration data)
model.export(
    format="engine",    # TensorRT
    int8=True,
    data="data.yaml",   # Calibration dataset
    batch=8,
)

# ONNX export with dynamic input shapes
# (dynamic=True controls input shapes; it does not quantize the model)
model.export(
    format="onnx",
    dynamic=True,
    simplify=True,
)
```

TensorRT INT8 Calibration Step-by-Step

Calibration Dataset Preparation

INT8 quantization requires representative calibration data to determine activation value ranges:
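To make "determining activation value ranges" concrete, here is a minimal sketch of symmetric max-abs calibration in plain Python. This is a deliberately simple stand-in: TensorRT's own calibrators choose ranges with more elaborate criteria, and the function names and sample values below are hypothetical.

```python
def calibrate_scale(activations, num_bits=8):
    """Symmetric max-abs calibration: map the largest magnitude seen to qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in activations)
    if max_abs == 0:
        return 1.0  # Degenerate case: all-zero activations
    return max_abs / qmax


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clamp to the symmetric INT range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in q_values]


if __name__ == "__main__":
    # Stand-in for activations collected while running calibration images
    calib = [-2.54, -1.0, 0.0, 0.5, 1.5]
    scale = calibrate_scale(calib)
    codes = quantize(calib, scale)
    restored = dequantize(codes, scale)
    for original, code, approx in zip(calib, codes, restored):
        print(f"{original:+.3f} -> {code:+4d} -> {approx:+.3f}")
```

The sketch also shows why the calibration set has to be representative: a single outlier activation inflates the scale and wastes integer codes on values that rarely occur.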

Continue reading →

YOLO Model Training: Complete Custom Dataset Workflow

May 14, 2026

Complete Custom Dataset Training Process

Ultralytics Unified Training Code

```python
from ultralytics import YOLO

# Load model
# model = YOLO("yolov8n.yaml")  # Train from scratch
# model = YOLO("yolo11n.pt")    # Based on pre-trained weights
model = YOLO("yolo26n.pt")      # 2026 recommended, edge deployment first choice

# Start training
results = model.train(
    # Basic configuration
    data="data.yaml",           # Dataset configuration
    epochs=100,                 # Training epochs
    imgsz=640,                  # Input size
    batch=16,                   # Batch size
    workers=8,                  # Data loading threads

    # Optimizer configuration
    optimizer="auto",           # YOLO26 automatically uses MuSGD
    lr0=0.01,                   # Initial learning rate
    lrf=0.01,                   # Final learning rate factor
    momentum=0.937,             # SGD momentum
    weight_decay=0.0005,        # Weight decay

    # Data augmentation
    mosaic=1.0,
    mixup=0.1,
    copy_paste=0.1,

    # Other configuration
    device=0,                   # GPU device, "cpu" for CPU
    project="runs/train",       # Save path
    name="yolo26_exp1",         # Experiment name
    exist_ok=False,             # Whether to overwrite
    pretrained=True,            # Use pre-trained
    verbose=True,               # Detailed logs
    seed=42,                    # Random seed
)

# Validate model
metrics = model.val()
print(f"mAP50: {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")
```

Training Parameter Differences Across Versions

| Parameter | YOLOv8 | YOLO11 | YOLO26 |
| --- | --- | --- | --- |
| Default Optimizer | SGD | SGD | MuSGD |
| DFL Loss | ✅ | ✅ | ❌ Removed |
| NMS Post-processing | ✅ | ✅ | ❌ Native no NMS |
| Small Object Optimization | Average | Better | Best (STAL) |
| CPU Inference Speed | Baseline | +25% | +43% |

Loss Function Breakdown

YOLO’s loss function consists of three components, each targeting a different learning objective:
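Returning to the training configuration, the relationship between `lr0` and `lrf` is easier to see numerically: `lrf` is a multiplier, so the final learning rate is `lr0 * lrf`. The sketch below assumes a linear decay between the two and ignores warmup. The function name `linear_lr` is hypothetical, and with `optimizer="auto"` the trainer may pick its own initial rate, so treat the numbers as illustrative.

```python
def linear_lr(epoch, epochs=100, lr0=0.01, lrf=0.01):
    """Learning rate at a given epoch, decaying linearly from lr0 to lr0 * lrf.

    Assumption: a plain linear schedule with no warmup phase.
    """
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


if __name__ == "__main__":
    for epoch in (0, 25, 50, 75, 100):
        print(f"epoch {epoch:3d}: lr = {linear_lr(epoch):.6f}")
```

With the values from the training call (`lr0=0.01`, `lrf=0.01`, `epochs=100`), the rate starts at 0.01 and ends at 0.0001.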

Continue reading →

YOLO Dataset Preparation: Annotation Tools and Format Conversion

May 11, 2026

Data Annotation Tools Usage

LabelImg Installation and Usage

```bash
# Installation
pip install labelImg

# Launch
labelImg
```

Annotation Process:

1. Open Dir → Select image folder
2. Change Save Dir → Select annotation save folder
3. Select YOLO format
4. Create RectBox → Draw bounding box → Enter class name
5. Save

LabelMe Installation and Usage

```bash
pip install labelme
labelme
```

CVAT Self-Hosted Annotation Platform

CVAT (Computer Vision Annotation Tool) is an open-source annotation platform by Intel, supporting Docker self-hosted deployment for team collaboration and large-scale annotation projects.
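When an annotation tool saves in YOLO format, each object becomes one text line: a class index followed by the box center and size, all normalized by the image dimensions. The sketch below shows that conversion from pixel corner coordinates. The function name `to_yolo_line` and the six-decimal formatting are choices made here for illustration.

```python
def to_yolo_line(class_id, box_xyxy, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box into a YOLO label line.

    Output layout: "<class> <cx> <cy> <w> <h>", each value normalized to [0, 1].
    """
    x1, y1, x2, y2 = box_xyxy
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        raise ValueError("box must lie inside the image with x1 < x2 and y1 < y2")
    cx = (x1 + x2) / 2 / img_w   # Normalized box center x
    cy = (y1 + y2) / 2 / img_h   # Normalized box center y
    w = (x2 - x1) / img_w        # Normalized box width
    h = (y2 - y1) / img_h        # Normalized box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A 320x240 box centered in a 640x480 image, class index 0
    print(to_yolo_line(0, (160, 120, 480, 360), 640, 480))
```

Each image gets its own `.txt` file with one such line per object, which is the layout YOLO-format training data uses.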

Continue reading →

YOLO Quick Start: Model Loading and Inference

May 8, 2026

Model Loading and Inference Across Versions

Ultralytics Unified API (Works with v8/11/26)

```python
from ultralytics import YOLO

# ========== YOLOv8 ==========
model_v8 = YOLO("yolov8n.pt")   # nano
model_v8 = YOLO("yolov8s.pt")   # small
model_v8 = YOLO("yolov8m.pt")   # medium
model_v8 = YOLO("yolov8l.pt")   # large
model_v8 = YOLO("yolov8x.pt")   # extra large

# ========== YOLO11 ==========
model_11 = YOLO("yolo11n.pt")   # nano
model_11 = YOLO("yolo11s.pt")   # small
model_11 = YOLO("yolo11m.pt")   # medium
model_11 = YOLO("yolo11l.pt")   # large
model_11 = YOLO("yolo11x.pt")   # extra large

# ========== YOLO26 (2026 latest) ==========
model_26 = YOLO("yolo26n.pt")   # nano, recommended for edge deployment
model_26 = YOLO("yolo26s.pt")   # small
model_26 = YOLO("yolo26m.pt")   # medium
model_26 = YOLO("yolo26l.pt")   # large
model_26 = YOLO("yolo26x.pt")   # extra large
```

Image Detection Hands-on Example

```python
from ultralytics import YOLO

# Load model (YOLO26 example)
model = YOLO("yolo26n.pt")

# Single image detection
results = model("test.jpg", conf=0.25, iou=0.45)

# Process results
for result in results:
    boxes = result.boxes    # Detection boxes
    masks = result.masks    # Segmentation masks
    probs = result.probs    # Classification probabilities

    # Print detection results
    for box in boxes:
        print(f"Class: {result.names[int(box.cls)]}, "
              f"Confidence: {box.conf.item():.3f}, "
              f"Coordinates: {box.xyxy.tolist()[0]}")

    # Save visualization results
    result.save("result.jpg")
```

Video Detection Hands-on Example

```python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Video file detection
results = model.predict(
    source="input.mp4",
    save=True,      # Save result video
    conf=0.3,
    show=False,     # Whether to display in real-time
    stream=True     # Stream processing to save memory
)

# Process frame by frame
for result in results:
    # Custom post-processing logic
    pass
```

Real-time Camera Detection

```python
from ultralytics import YOLO
import cv2

model = YOLO("yolo26n.pt")

# Open camera
cap = cv2.VideoCapture(0)  # 0 is default camera

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Inference
    results = model(frame, verbose=False)

    # Draw results
    annotated_frame = results[0].plot()

    # Display
    cv2.imshow("YOLO Real-time", annotated_frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Version-specific Code Differences

| Feature | YOLOv8 | YOLO11 | YOLO26 | YOLOv9 | YOLOv10 |
| --- | --- | --- | --- | --- | --- |
| Unified API | ✅ | ✅ | ✅ | ❌ Separate repo | ❌ Separate repo |
| No NMS | ❌ | ❌ | ✅ | ❌ | ✅ |
| DFL Module | ✅ | ✅ | ❌ Removed | ✅ | ✅ |
| MuSGD Optimizer | ❌ | ❌ | ✅ | ❌ | ❌ |
| Export Compatibility | Good | Good | Best | Fair | Fair |

Results Object API Deep Dive

The model() or model.predict() call returns a list of Results objects. Each Results object encapsulates all inference outputs for a single image. Understanding its internal structure is essential for downstream processing.
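The `conf` and `iou` arguments in the image detection example are thresholds: `conf` drops low-confidence boxes, and `iou` controls how much overlap is tolerated before a weaker duplicate is suppressed. The sketch below illustrates the idea with a greedy, class-agnostic non-maximum suppression in plain Python. It is not the Ultralytics implementation, and the function names and sample boxes are hypothetical. For NMS-free models such as YOLO26, only the confidence filter applies.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf=0.25, iou_thres=0.45):
    """Confidence filter followed by greedy NMS.

    dets: list of (box_xyxy, score) pairs. Returns kept pairs, best score first.
    """
    candidates = sorted((d for d in dets if d[1] >= conf),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a stronger one
        if all(iou(box, kept_box) <= iou_thres for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    detections = [
        ((10, 10, 110, 110), 0.90),
        ((12, 12, 112, 112), 0.80),    # Near-duplicate of the first box
        ((200, 200, 300, 300), 0.60),
        ((400, 400, 450, 450), 0.10),  # Below the confidence threshold
    ]
    for box, score in filter_detections(detections):
        print(box, score)
```

Raising `iou_thres` keeps more overlapping boxes, which can help in crowded scenes at the cost of more duplicates.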

Continue reading →

Evolution: Oh My OpenAgent Configuration Iteration Log

May 7, 2026

The previous article covered the initial configuration setup. This one documents the adjustments after two weeks of running: expanding from a single vendor to a four-tier model pool, adding fallback chains, and hitting the GLM-4.5-air trap of analyzing without writing code. This post covers: fallback strategy design, complete free model pool inventory and analysis, concurrency control configuration, and the decision process for GLM-4.5-air replacement. After the previous article’s initial configuration, I ran it for two weeks, and all the issues that needed fixing surfaced.

Continue reading →

YOLO Getting Started: History, Version Comparison and Environment Setup

May 5, 2026

Learning Path and Version Selection Guide

Version Selection Guide

| Version | Release Date | Development Team | Use Cases | Recommendation Index |
| --- | --- | --- | --- | --- |
| YOLO26 | 2026.01 | Ultralytics Official | Edge deployment, CPU inference, industrial applications | ⭐⭐⭐⭐⭐ |
| YOLOv8 | 2023.01 | Ultralytics Official | Beginner learning, complete ecosystem, general scenarios | ⭐⭐⭐⭐⭐ |
| YOLO11 | 2024.09 | Ultralytics Official | Efficiency optimization, lightweight deployment | ⭐⭐⭐⭐ |
| YOLOv10 | 2024.05 | Tsinghua University | Research exploration, NMS-free end-to-end | ⭐⭐⭐⭐ |
| YOLOv9 | 2024.01 | Academia Sinica (Taiwan) | High precision, small object detection | ⭐⭐⭐⭐ |
| YOLOv12 | 2025.02 | University at Buffalo + University of Chinese Academy of Sciences | Attention mechanism research | ⭐⭐⭐ |

Learning Path Recommendations

1. Beginner Stage (1-2 weeks): Start with YOLOv8, master basic concepts and API usage
2. Intermediate Stage (2-3 weeks): Learn custom dataset training, parameter tuning and optimization
3. Advanced Stage (2-3 weeks): Learn model deployment, engineering implementation
4. Research Stage (ongoing): Explore new features in YOLO11, YOLO26, YOLOv9/v10/v12

Complete YOLO Development History

Timeline

| Version | Release Date | Core Innovation | Milestone Significance |
| --- | --- | --- | --- |
| YOLOv1 | 2015.06 | Pioneer single-stage detection | Foundation for real-time detection |
| YOLOv2 | 2016.12 | Batch Normalization, Anchor | Dual improvement in accuracy and speed |
| YOLOv3 | 2018.04 | Multi-scale detection, residual networks | Industry standard |
| YOLOv4 | 2020.04 | CSPDarknet, Mosaic | Peak of engineering implementation |
| YOLOv5 | 2020.06 | PyTorch framework, user-friendly | Highest adoption rate |
| YOLOv7 | 2022.07 | E-ELAN, reparameterization | Balance between speed and accuracy |
| YOLOv8 | 2023.01 | C2f, Anchor-Free, unified framework | Ultralytics unified ecosystem |
| YOLOv9 | 2024.01 | GELAN, PGI (Programmable Gradient Information) | Training efficiency revolution |
| YOLOv10 | 2024.05 | NMS-free, efficiency-precision tradeoff | End-to-end detection |
| YOLO11 | 2024.09 | Architecture optimization, parameter reduction | Efficiency optimized version |
| YOLOv12 | 2025.02 | Area Attention mechanism | Attention architecture |
| YOLO26 | 2026.01 | DFL-free, NMS-free, 43% CPU optimization | Edge computing new standard |

Core Principles and Version Comparison

Ultralytics Official Main Line Versions

YOLOv8 Core Features:

Continue reading →

Zhipu Coding Plan × Oh My OpenCode: Multi-Model Orchestration Setup Guide

April 5, 2026

Why Bother

When it comes to writing code with AI, the gap between single-model and multi-model approaches keeps widening. No matter how strong a single model is, it can’t compete with a team of specialized models working in parallel. Oh My OpenCode (OmO for short) is a multi-model orchestration plugin in the OpenCode ecosystem, with 11 Agents each having distinct responsibilities and 48 Hooks spanning the entire lifecycle. Zhipu’s Coding Plan provides access to the full GLM model series. Combining the two allows you to assign different models by role: strong coders for coding, strong reasoners for reasoning, free models for busywork.

Continue reading →