Mi&Bee Blog - YOLO From Zero to Mastery

YOLO Getting Started: History, Version Comparison and Environment Setup

May 5, 2026 · 7 min read

Learning Path and Version Selection Guide

Version Selection Guide

Version	Release Date	Development Team	Use Cases	Recommendation Index
YOLO26	2026.01	Ultralytics Official	Edge deployment, CPU inference, industrial applications	⭐⭐⭐⭐⭐
YOLOv8	2023.01	Ultralytics Official	Beginner learning, complete ecosystem, general scenarios	⭐⭐⭐⭐⭐
YOLO11	2024.09	Ultralytics Official	Efficiency optimization, lightweight deployment	⭐⭐⭐⭐
YOLOv10	2024.05	Tsinghua University	Research exploration, NMS-free end-to-end	⭐⭐⭐⭐
YOLOv9	2024.02	Academia Sinica (Taiwan)	High precision, small object detection	⭐⭐⭐⭐
YOLOv12	2025.02	University at Buffalo + University of Chinese Academy of Sciences	Attention mechanism research	⭐⭐⭐

Learning Path Recommendations

Beginner Stage (1-2 weeks): Start with YOLOv8, master basic concepts and API usage
Intermediate Stage (2-3 weeks): Learn custom dataset training, parameter tuning and optimization
Advanced Stage (2-3 weeks): Learn model deployment and engineering implementation
Research Stage (ongoing): Explore new features in YOLO11, YOLO26, YOLOv9/v10/v12

Complete YOLO Development History Timeline

Version	Release Date	Core Innovation	Milestone Significance
YOLOv1	2015.06	Pioneer single-stage detection	Foundation for real-time detection
YOLOv2	2016.12	Batch Normalization, anchor boxes	Improved both accuracy and speed
YOLOv3	2018.04	Multi-scale detection, residual networks	Industry standard
YOLOv4	2020.04	CSPDarknet, Mosaic	Peak of engineering implementation
YOLOv5	2020.06	PyTorch framework, user-friendly	Highest adoption rate
YOLOv7	2022.07	E-ELAN, reparameterization	Balance between speed and accuracy
YOLOv8	2023.01	C2f, Anchor-Free, unified framework	Ultralytics unified ecosystem
YOLOv9	2024.02	GELAN, PGI (Programmable Gradient Information)	More reliable gradient information during training
YOLOv10	2024.05	NMS-free, efficiency-precision tradeoff	End-to-end detection
YOLO11	2024.09	Architecture optimization, parameter reduction	Efficiency optimized version
YOLOv12	2025.02	Area Attention mechanism	Attention architecture
YOLO26	2026.01	DFL-free, NMS-free, up to 43% faster CPU inference (Ultralytics figure)	New standard for edge computing

Core Principles and Version Comparison

Ultralytics Official Main Line Versions

YOLOv8 Core Features:

YOLO Quick Start: Model Loading and Inference

May 8, 2026 · 10 min read

Model Loading and Inference Across Versions

Ultralytics Unified API (Works with v8/11/26)

python
from ultralytics import YOLO

# Checkpoint naming: <family><size>.pt, with size in n/s/m/l/x.
# Weights download automatically on first use. Each line below overwrites
# the same variable, so in practice load only the size you need.

# ========== YOLOv8 ==========
model_v8 = YOLO("yolov8n.pt")      # nano
model_v8 = YOLO("yolov8s.pt")      # small
model_v8 = YOLO("yolov8m.pt")      # medium
model_v8 = YOLO("yolov8l.pt")      # large
model_v8 = YOLO("yolov8x.pt")      # extra large

# ========== YOLO11 (no "v" in the name) ==========
model_11 = YOLO("yolo11n.pt")      # nano
model_11 = YOLO("yolo11s.pt")      # small
model_11 = YOLO("yolo11m.pt")      # medium
model_11 = YOLO("yolo11l.pt")      # large
model_11 = YOLO("yolo11x.pt")      # extra large

# ========== YOLO26 (2026 latest) ==========
model_26 = YOLO("yolo26n.pt")      # nano, recommended for edge deployment
model_26 = YOLO("yolo26s.pt")      # small
model_26 = YOLO("yolo26m.pt")      # medium
model_26 = YOLO("yolo26l.pt")      # large
model_26 = YOLO("yolo26x.pt")      # extra large
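The checkpoint names above follow one pattern with one trap: YOLOv8 keeps the "v" while YOLO11 and YOLO26 drop it (`yolo11n.pt`, not `yolov11n.pt`). The helper below is a hypothetical convenience, not part of Ultralytics, that builds a name from family and size using only the names listed above:

```python
# Hypothetical helper (not an Ultralytics API): builds checkpoint names from
# the pattern shown above, e.g. "yolov8n.pt", "yolo11n.pt", "yolo26n.pt".
SIZES = ("n", "s", "m", "l", "x")

# YOLOv8 keeps the "v" prefix; YOLO11 and YOLO26 drop it.
PREFIXES = {"v8": "yolov8", "11": "yolo11", "26": "yolo26"}


def weight_name(family: str, size: str = "n") -> str:
    """Return e.g. 'yolo26s.pt' for family='26', size='s'."""
    if family not in PREFIXES:
        raise ValueError(f"unknown family {family!r}; expected one of {list(PREFIXES)}")
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"{PREFIXES[family]}{size}.pt"
```

Usage would be `model = YOLO(weight_name("26", "n"))`; raising on unknown input catches typos such as `"v11"` before a download is attempted.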

Image Detection Hands-on Example

python
from ultralytics import YOLO

# Load model (YOLO26 example)
model = YOLO("yolo26n.pt")

# Single image detection
results = model("test.jpg", conf=0.25, iou=0.45)

# Process results
for result in results:
    boxes = result.boxes          # Detection boxes
    masks = result.masks          # Segmentation masks (None for detection models)
    probs = result.probs          # Classification probabilities (None for detection models)
    
    # Print detection results
    for box in boxes:
        print(f"Class: {result.names[int(box.cls)]}, "
              f"Confidence: {box.conf.item():.3f}, "
              f"Coordinates: {box.xyxy.tolist()[0]}")
    
    # Save visualization results
    result.save("result.jpg")

Video Detection Hands-on Example

python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

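# Video file detection. With stream=True, predict() returns a generator:
# frames are processed lazily as you iterate, keeping memory use flat.
results = model.predict(
    source="input.mp4",
    save=True,           # Save annotated video (under runs/detect/ by default)
    conf=0.3,
    show=False,          # Set True to display frames in a window
    stream=True,         # Return a generator instead of a list
)

# Nothing is inferred until this loop runs; iterate to the end so the
# saved video is complete.
for result in results:
    # Custom post-processing logic
    pass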
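As a sketch of what the "custom post-processing" slot might hold, the hypothetical function below tallies detections per class while consuming frames lazily, so only one frame's detections are in memory at a time. To stay independent of Ultralytics, each frame is represented by its list of class IDs; with real results, `result.boxes.cls.tolist()` would supply that list.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_classes(frames: Iterable[List[float]]) -> Tuple[Counter, int]:
    """Count detections per class across a stream of frames.

    `frames` may be any iterable, including a generator, so frames are
    consumed one at a time and never collected into a list.
    """
    totals: Counter = Counter()
    n_frames = 0
    for class_ids in frames:
        totals.update(int(c) for c in class_ids)  # cls values arrive as floats
        n_frames += 1
    return totals, n_frames


def fake_stream():
    # Stand-in for model.predict(..., stream=True): yields per-frame class IDs.
    yield [0.0, 0.0, 2.0]
    yield []
    yield [2.0]
```

With real results this becomes `totals, n = tally_classes(r.boxes.cls.tolist() for r in results)`; the generator expression keeps the whole pipeline lazy.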

Real-time Camera Detection

python
from ultralytics import YOLO
import cv2

model = YOLO("yolo26n.pt")

# Open camera
cap = cv2.VideoCapture(0)  # 0 is default camera

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    
    # Inference
    results = model(frame, verbose=False)
    
    # Draw results
    annotated_frame = results[0].plot()
    
    # Display
    cv2.imshow("YOLO Real-time", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
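Real-time loops like this one usually report throughput. Below is a minimal sketch of a smoothed FPS meter, an illustrative helper provided by neither Ultralytics nor OpenCV, that applies an exponential moving average to per-frame intervals. The clock is injectable so the logic can be checked without a camera.

```python
import time
from typing import Callable, Optional


class FPSMeter:
    """Exponentially smoothed frames-per-second estimate."""

    def __init__(self, alpha: float = 0.1,
                 clock: Callable[[], float] = time.perf_counter):
        self.alpha = alpha          # weight given to the newest frame interval
        self.clock = clock
        self._last: Optional[float] = None
        self._avg_dt: Optional[float] = None

    def tick(self) -> float:
        """Call once per frame; returns the smoothed FPS (0.0 on the first call)."""
        now = self.clock()
        if self._last is not None:
            dt = now - self._last
            if self._avg_dt is None:
                self._avg_dt = dt
            else:
                self._avg_dt = self.alpha * dt + (1 - self.alpha) * self._avg_dt
        self._last = now
        return 1.0 / self._avg_dt if self._avg_dt else 0.0
```

Inside the loop, `fps = meter.tick()` followed by `cv2.putText(annotated_frame, f"{fps:.1f} FPS", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)` overlays the value. A small `alpha` gives a steadier readout, and a larger one reacts faster to slowdowns.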

Version-specific Code Differences

Feature	YOLOv8	YOLO11	YOLO26	YOLOv9	YOLOv10
Unified API	✅	✅	✅	⚠️ Original in separate repo (also supported by Ultralytics)	⚠️ Original in separate repo (also supported by Ultralytics)
No NMS	❌	❌	✅	❌	✅
DFL Module	✅	✅	❌ Removed	✅	✅
MuSGD Optimizer	❌	❌	✅	❌	❌
Export Compatibility	Good	Good	Best	Fair	Fair
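The "No NMS" row matters most at deployment time. For models marked ❌, overlapping candidate boxes must be filtered by non-maximum suppression after inference. The sketch below is a minimal, pure-Python version of classic greedy, class-aware NMS. It shows the idea only and is not Ultralytics' optimized implementation. Boxes are `(x1, y1, x2, y2)` pixel coordinates.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], classes: Sequence[int],
        iou_thres: float = 0.45) -> List[int]:
    """Greedy per-class NMS; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep i unless it overlaps an already-kept box of the same class too much.
        if all(classes[i] != classes[k] or iou(boxes[i], boxes[k]) <= iou_thres
               for k in keep):
            keep.append(i)
    return keep
```

This serial loop is what end-to-end models such as YOLOv10 and YOLO26 remove from the pipeline. It is also what the `iou` argument of `predict()` controls for YOLOv8 and YOLO11.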

Results Object API Deep Dive

The model() or model.predict() call returns a list of Results objects. Each Results object encapsulates all inference outputs for a single image. Understanding its internal structure is essential for downstream processing.
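A common downstream step is flattening each `Results` object into plain records for logging or JSON. The sketch below assumes the box data has already been pulled out as lists, which `result.boxes.xyxy.tolist()`, `result.boxes.conf.tolist()` and `result.boxes.cls.tolist()` provide, along with the `result.names` mapping. The function itself is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def to_records(xyxy: Sequence[Sequence[float]], conf: Sequence[float],
               cls: Sequence[float], names: Dict[int, str],
               min_conf: float = 0.0) -> List[dict]:
    """Turn parallel box/score/class lists into JSON-friendly dicts."""
    records = []
    for box, score, c in zip(xyxy, conf, cls):
        if score < min_conf:
            continue  # drop low-confidence detections
        x1, y1, x2, y2 = (round(v, 1) for v in box)
        records.append({
            "class_id": int(c),
            "class_name": names[int(c)],
            "confidence": round(float(score), 3),
            "box_xyxy": [x1, y1, x2, y2],
        })
    return records
```

With a real result: `b = result.boxes; records = to_records(b.xyxy.tolist(), b.conf.tolist(), b.cls.tolist(), result.names)`, after which `json.dumps(records)` serializes cleanly because every value is a plain Python type.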

YOLO Dataset Preparation: Annotation Tools and Format Conversion

May 11, 2026 · 15 min read

Data Annotation Tools Usage

LabelImg Installation and Usage

bash
# Installation
pip install labelImg

# Launch
labelImg

Annotation Process:

Open Dir → Select image folder
Change Save Dir → Select annotation save folder
Switch the save format from PascalVOC to YOLO
Create RectBox → Draw bounding box → Enter class name
Save
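In YOLO format, each image gets one `.txt` file with one line per box: `class_id x_center y_center width height`, with coordinates normalized by image width and height to the 0-1 range. The sketch below converts pixel corners into that format. The function names are illustrative.

```python
from typing import Tuple


def xyxy_to_yolo(box: Tuple[float, float, float, float], img_w: int, img_h: int
                 ) -> Tuple[float, float, float, float]:
    """Pixel (x1, y1, x2, y2) -> normalized (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    x1, x2 = sorted((x1, x2))  # tolerate boxes drawn right-to-left
    y1, y2 = sorted((y1, y2))
    # Clamp to the image so slightly out-of-bounds boxes stay within [0, 1]
    x1, x2 = max(0.0, x1), min(float(img_w), x2)
    y1, y2 = max(0.0, y1), min(float(img_h), y2)
    return ((x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h)


def yolo_line(class_id: int, box, img_w: int, img_h: int) -> str:
    """Format one label line as written in a YOLO .txt file."""
    xc, yc, w, h = xyxy_to_yolo(box, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For a 640×480 image, a box from (100, 50) to (300, 250) becomes `0 0.312500 0.312500 0.312500 0.416667`.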

LabelMe Installation and Usage

bash
pip install labelme
labelme
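LabelMe does not save YOLO labels directly. It writes one JSON file per image containing a `shapes` list, where each rectangle stores two corner points, plus the image's `imageWidth` and `imageHeight`. The sketch below converts those rectangles to YOLO lines. It assumes you define the `class_names` list yourself and skips polygons and other shape types.

```python
import json
from pathlib import Path
from typing import List


def labelme_to_yolo(json_path: str, class_names: List[str]) -> List[str]:
    """Convert rectangle shapes in one LabelMe JSON file to YOLO label lines."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    img_w, img_h = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data.get("shapes", []):
        if shape.get("shape_type") != "rectangle":
            continue  # polygons etc. would need a different conversion
        (xa, ya), (xb, yb) = shape["points"]
        x1, x2 = sorted((xa, xb))
        y1, y2 = sorted((ya, yb))
        class_id = class_names.index(shape["label"])  # ValueError on unknown label
        xc, yc = (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h
        w, h = (x2 - x1) / img_w, (y2 - y1) / img_h
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


def convert_dir(json_dir: str, out_dir: str, class_names: List[str]) -> int:
    """Write <stem>.txt for every LabelMe JSON in json_dir; returns files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for jp in sorted(Path(json_dir).glob("*.json")):
        lines = labelme_to_yolo(str(jp), class_names)
        text = "\n".join(lines) + ("\n" if lines else "")
        (out / f"{jp.stem}.txt").write_text(text, encoding="utf-8")
        count += 1
    return count
```

Failing loudly on an unknown label catches misspelled classes instead of silently assigning them the wrong ID.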

CVAT Self-Hosted Annotation Platform

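CVAT (Computer Vision Annotation Tool) is an open-source annotation platform originally developed by Intel and now maintained by CVAT.ai. It supports self-hosted deployment with Docker for team collaboration and large-scale annotation projects, and it can export annotations in YOLO format.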
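Whatever the annotation tool, training expects images and labels split into train and validation sets, plus a `data.yaml` that points at them. The sketch below assumes the common Ultralytics layout (`images/train` and `images/val`, with matching `labels/...` folders, since labels are located by swapping `images` for `labels` in the path) and writes the YAML as plain text. The function names, the 80/20 split and the fixed seed are illustrative choices.

```python
import random
import shutil
from pathlib import Path
from typing import List

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def split_dataset(src_images: str, src_labels: str, dst: str,
                  val_ratio: float = 0.2, seed: int = 42) -> dict:
    """Copy image/label pairs into dst/images/{train,val} and dst/labels/{train,val}."""
    images = sorted(p for p in Path(src_images).iterdir()
                    if p.suffix.lower() in IMAGE_EXTS)
    random.Random(seed).shuffle(images)  # reproducible split
    n_val = int(len(images) * val_ratio)
    splits = {"val": images[:n_val], "train": images[n_val:]}
    for split, files in splits.items():
        img_out = Path(dst, "images", split)
        lbl_out = Path(dst, "labels", split)
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy2(img, img_out / img.name)
            label = Path(src_labels, img.stem + ".txt")
            if label.exists():  # images without labels act as background samples
                shutil.copy2(label, lbl_out / label.name)
    return {k: len(v) for k, v in splits.items()}


def write_data_yaml(dst: str, class_names: List[str]) -> Path:
    """Write data.yaml with the path/train/val/names keys Ultralytics reads."""
    lines = [f"path: {Path(dst).resolve()}", "train: images/train",
             "val: images/val", "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    out = Path(dst, "data.yaml")
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out
```

The returned `data.yaml` path is what `model.train(data=...)` expects. Class order in `class_names` must match the IDs used in the label files.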

YOLO Model Training: Complete Custom Dataset Workflow

May 14, 2026 · 12 min read

Complete Custom Dataset Training Process

Ultralytics Unified Training Code

python
from ultralytics import YOLO

# Load model
# model = YOLO("yolov8n.yaml")  # Train from scratch
# model = YOLO("yolo11n.pt")    # Based on pre-trained weights
model = YOLO("yolo26n.pt")      # 2026 recommended, edge deployment first choice

# Start training
results = model.train(
    # Basic configuration
    data="data.yaml",        # Dataset configuration
    epochs=100,              # Training epochs
    imgsz=640,               # Input size
    batch=16,                # Batch size
    workers=8,               # Dataloader worker processes
    
    # Optimizer configuration
    optimizer="auto",        # Chosen automatically (Ultralytics states YOLO26 uses MuSGD);
                             # with "auto", lr0 and momentum below are ignored
    lr0=0.01,                # Initial learning rate
    lrf=0.01,                # Final learning rate = lr0 * lrf
    momentum=0.937,          # SGD momentum
    weight_decay=0.0005,     # Weight decay
    
    # Data augmentation
    mosaic=1.0,
    mixup=0.1,
    copy_paste=0.1,          # Takes effect only with segmentation (polygon) labels
    
    # Other configuration
    device=0,                # GPU device, "cpu" for CPU
    project="runs/train",    # Save path
    name="yolo26_exp1",      # Experiment name
    exist_ok=False,          # Whether to overwrite
    pretrained=True,         # Use pre-trained
    verbose=True,            # Detailed logs
    seed=42,                 # Random seed
)

# Validate model
metrics = model.val()
print(f"mAP50: {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")
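Together, `lr0` and `lrf` define the learning-rate schedule: the rate decays from `lr0` toward `lr0 * lrf` over the run. The sketch below shows the two schedules Ultralytics' trainer uses, linear by default and cosine when `cos_lr=True`. It is written from my reading of its scheduler lambdas, so treat it as an approximation. Warmup (`warmup_epochs`) is ignored, and exact behavior may vary between versions.

```python
import math


def lr_at(epoch: int, epochs: int = 100, lr0: float = 0.01, lrf: float = 0.01,
          cos_lr: bool = False) -> float:
    """Learning rate at the start of `epoch` (0-based), excluding warmup."""
    if cos_lr:
        # Cosine curve from factor 1.0 down to lrf
        factor = ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1.0) + 1.0
    else:
        # Linear decay from factor 1.0 down to lrf
        factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor
```

With the settings above, training starts at 0.01 and ends near 0.0001. The cosine variant holds a higher rate longer in early epochs and drops faster near the end.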

Training Parameter Differences Across Versions

Parameter	YOLOv8	YOLO11	YOLO26
Default Optimizer	SGD	SGD	MuSGD
DFL Loss	✅	✅	❌ Removed
NMS Post-processing	✅	✅	❌ Native no NMS
Small Object Optimization	Average	Better	Best (STAL)
CPU Inference Speed	Baseline	+25%	+43%

Loss Function Breakdown

YOLO’s loss function consists of three components, each targeting a different learning objective:

YOLO Advanced Optimization: Lightweight, Quantization and Accuracy

May 17, 2026 · 11 min read

Model Lightweighting Strategies

Model Size Selection

Model	Parameters (M)	mAP50-95 (COCO)	CPU Inference	Use Cases
YOLO26n	2.8	38.9	Fastest	Edge devices, Embedded
YOLO26s	9.4	48.2	Very fast	Mobile, Web
YOLO26m	21.8	53.1	Medium	Server, High performance
YOLO11n	2.6	39.6	Fast	Lightweight deployment
YOLOv8n	3.2	37.3	Baseline	General purpose

Knowledge Distillation

python
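from ultralytics import YOLO

# Large model as teacher, small model as student
teacher = YOLO("yolo26x.pt")
student = YOLO("yolo26n.yaml")

# Distillation training. NOTE: `distill` and `distill_ratio` do not appear in
# the Ultralytics training-argument reference; check your installed version
# (`yolo cfg` prints the accepted arguments) before relying on them. Without
# built-in support, distillation needs a custom trainer that adds a
# teacher-student loss term.
student.train(
    data="data.yaml",
    distill="yolo26x.pt",  # Teacher model (unverified argument)
    distill_ratio=0.5,     # Distillation loss weight (unverified argument)
)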
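Whatever training wrapper is used, classic (Hinton-style) knowledge distillation centers on a loss term that pulls the student's softened class distribution toward the teacher's. The pure-Python sketch below computes that term for one prediction's class logits, using temperature `T` and the usual `T**2` scaling. It covers only the classification part. Detection-specific distillation (box or feature-map terms) is beyond this sketch, and `alpha` is an illustrative weighting, not a documented Ultralytics parameter.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            T: float = 4.0) -> float:
    """KL(teacher_T || student_T) * T^2, the soft-target distillation term."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * T * T


def total_loss(task_loss: float, student_logits, teacher_logits,
               alpha: float = 0.5, T: float = 4.0) -> float:
    """Blend the normal training loss with the distillation term."""
    return (1 - alpha) * task_loss + alpha * kd_loss(student_logits, teacher_logits, T)
```

The loss is zero when the student already matches the teacher. A higher `T` exposes more of the teacher's "dark knowledge" in the relative scores of wrong classes.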

Model Pruning

Structured vs Unstructured Pruning

Type	Method	Sparsity Pattern	Hardware Acceleration	Compression Ratio
Unstructured	Weight pruning	Random sparse	Difficult (special HW needed)	High
Structured	Channel pruning	Regular sparse	Native acceleration	Medium
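The hardware column follows from what each method does to a weight matrix. The toy sketch below uses plain nested lists to stand in for a layer's output-channel × input weights; the helper names are illustrative. Unstructured pruning zeroes the smallest individual weights but keeps the shape, so a dense kernel still does the same amount of work. Structured pruning deletes whole rows (output channels) by L1 norm, which yields a genuinely smaller dense matrix.

```python
from typing import List

Matrix = List[List[float]]


def unstructured_prune(w: Matrix, amount: float) -> Matrix:
    """Zero the `amount` fraction of weights with the smallest magnitude; shape unchanged."""
    pruned = [row[:] for row in w]
    coords = sorted((abs(v), i, j) for i, row in enumerate(w) for j, v in enumerate(row))
    for _, i, j in coords[: int(len(coords) * amount)]:
        pruned[i][j] = 0.0
    return pruned


def structured_prune(w: Matrix, amount: float) -> Matrix:
    """Drop the `amount` fraction of rows (output channels) with the smallest L1 norm."""
    n_drop = int(len(w) * amount)
    by_norm = sorted(range(len(w)), key=lambda i: sum(abs(v) for v in w[i]))
    drop = set(by_norm[:n_drop])
    return [row[:] for i, row in enumerate(w) if i not in drop]
```

On a 4×3 matrix at 25%, the unstructured version still returns 4×3 with three zeros. The structured version returns a 3×3 matrix that any dense kernel processes faster.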

PyTorch Pruning Examples: Unstructured (torch.nn.utils.prune) and Channel Pruning (Torch-Pruning)

python
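import torch
import torch.nn.utils.prune as prune
from ultralytics import YOLO

# --- L1 unstructured pruning: zero the 30% smallest weights in each Conv2d ---
# Tensor shapes stay the same, so this alone neither shrinks the model nor
# speeds up dense inference.
model = YOLO("yolo26n.pt")
for _, module in model.model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # Make pruning permanent

# --- Structured channel pruning with the torch-pruning library ---
# pip install torch-pruning
# Tracing Ultralytics models may need adjustments depending on library versions.
import torch_pruning as tp

net = YOLO("yolo26n.pt").model
for p in net.parameters():
    p.requires_grad_(True)  # Dependency tracing follows the autograd graph

DG = tp.DependencyGraph()
DG.build_dependency(net, example_inputs=torch.randn(1, 3, 640, 640))

# Remove the 20% of output channels with the smallest L1 norm from the first
# conv layer; the dependency graph propagates the change to coupled layers.
target = net.model[0].conv          # Ultralytics Conv block wraps an nn.Conv2d
l1 = target.weight.detach().abs().sum(dim=(1, 2, 3))
n_prune = int(0.2 * target.out_channels)
idxs = torch.argsort(l1)[:n_prune].tolist()   # Indices of channels to REMOVE

group = DG.get_pruning_group(target, tp.prune_conv_out_channels, idxs=idxs)
if DG.check_pruning_group(group):   # Rejects plans that would leave 0 channels
    group.prune()
print(target)  # out_channels reduced; fine-tune afterwards to recover accuracy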

Pruning Ratio Guidelines

Model	Safe Ratio	Aggressive Ratio	mAP Drop (safe / aggressive)
YOLO26n	≤20%	20-40%	<1% / 2-5%
YOLO26s	≤30%	30-50%	<1% / 3-6%
YOLO26m	≤40%	40-60%	<1% / 3-8%
YOLOv8n	≤20%	20-35%	<1% / 2-4%

Model Pruning and Quantization

Export Time Quantization

python
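from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# TensorRT INT8 quantization (requires an NVIDIA GPU with TensorRT installed;
# calibration images come from the dataset in data.yaml)
model.export(
    format="engine",      # TensorRT
    int8=True,
    data="data.yaml",     # Calibration dataset
    batch=8,
)

# ONNX export with dynamic input shapes. Note: dynamic=True makes the
# batch/height/width axes variable; it does NOT quantize the weights
# (the exported model stays FP32).
model.export(
    format="onnx",
    dynamic=True,
    simplify=True,
)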
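Calibration gives each tensor a value range, and the INT8 scale is derived from that range. TensorRT's INT8 mode is symmetric, with the zero point fixed at 0. The pure-Python sketch below shows the underlying arithmetic with the simplest (min-max) range selection. TensorRT's calibrators choose ranges more carefully, for example with entropy-based methods, so treat this as an illustration rather than TensorRT's algorithm.

```python
from typing import Iterable, List


def calibrate_amax(batches: Iterable[Iterable[float]]) -> float:
    """Min-max calibration: largest absolute activation seen in the calibration data."""
    amax = 0.0
    for batch in batches:
        for v in batch:
            amax = max(amax, abs(v))
    return amax


def quantize(values: Iterable[float], amax: float) -> List[int]:
    """Symmetric INT8: scale = amax / 127, zero point 0, clipped to [-127, 127]."""
    if amax <= 0:
        raise ValueError("amax must be positive")
    scale = amax / 127.0
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q: Iterable[int], amax: float) -> List[float]:
    """Map INT8 codes back to approximate real values."""
    scale = amax / 127.0
    return [v * scale for v in q]
```

Values beyond the calibrated range clip to ±127. This is why the calibration set should resemble real deployment images. If it does not, activations seen in production get squashed and accuracy drops.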

TensorRT INT8 Calibration Step-by-Step

Calibration Dataset Preparation

INT8 quantization requires representative calibration data to determine activation value ranges:

YOLO Deployment: Model Export and Multi-Platform Deployment

May 20, 2026 · 12 min read

Model Export (17 Format Support)

Ultralytics Unified Export API

python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# ========== Export Various Formats ==========
# 1. ONNX (Cross-platform Universal)
model.export(format="onnx", simplify=True, dynamic=True)

# 2. TensorRT (Best for NVIDIA GPU)
model.export(format="engine", half=True, workspace=4)

# 3. OpenVINO (Best for Intel CPU)
model.export(format="openvino", half=True)

# 4. CoreML (Apple Devices)
model.export(format="coreml", int8=True)

# 5. TFLite (Android/iOS Mobile)
model.export(format="tflite", int8=True)

# 6. NCNN (Mobile)
model.export(format="ncnn")

# 7. PaddlePaddle
model.export(format="paddle")

Version Export Compatibility

Format	YOLOv8	YOLO11	YOLO26
ONNX	✅	✅	✅ Best
TensorRT	✅	✅	✅ No NMS, Simpler
OpenVINO	✅	✅	✅
TFLite	✅	✅	✅
NCNN	✅	✅	✅

Python Deployment Practice

ONNX Runtime Deployment

python
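import onnxruntime as ort
import cv2
import numpy as np

IMGSZ = 640

# Load ONNX model (falls back to CPU if the CUDA provider is unavailable)
session = ort.InferenceSession(
    "yolo26n.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name   # "images" for Ultralytics exports

def preprocess(image, imgsz=IMGSZ):
    """OpenCV BGR -> RGB, resize, HWC -> NCHW float32 in [0, 1].

    Plain resizing stretches non-square images; Ultralytics itself
    letterboxes, so results may differ slightly.
    """
    img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (imgsz, imgsz))
    img = img.transpose(2, 0, 1) / 255.0
    return img[np.newaxis].astype(np.float32)

# Inference
image = cv2.imread("test.jpg")
h0, w0 = image.shape[:2]
input_data = preprocess(image)
outputs = session.run(None, {input_name: input_data})

# YOLO26 special note: no NMS post-processing is needed.
# For the end-to-end export the output is expected to be (1, N, 6) rows of
# [x1, y1, x2, y2, score, class_id] in 640x640 input coordinates; verify with
# session.get_outputs() for your export. Thresholding and rescaling remain:
dets = outputs[0][0]
dets = dets[dets[:, 4] > 0.25]              # Confidence threshold
dets[:, [0, 2]] *= w0 / IMGSZ               # Back to original width
dets[:, [1, 3]] *= h0 / IMGSZ               # Back to original height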
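Plain `cv2.resize` stretches non-square images, whereas Ultralytics' own preprocessing letterboxes them: it scales the image to fit, then pads the rest. The geometry is easy to sketch in pure Python: compute the scale and padding, and later map model-space boxes back to the original image. The function names are illustrative, and the rounding may differ from Ultralytics by a pixel. The pixel work itself would still use `cv2.resize` and `cv2.copyMakeBorder`, padding with gray value 114 as Ultralytics does.

```python
from typing import Tuple


def letterbox_params(img_w: int, img_h: int, size: int = 640
                     ) -> Tuple[float, int, int, int, int]:
    """Return (scale, new_w, new_h, pad_x, pad_y) for centered letterboxing to size x size."""
    scale = min(size / img_w, size / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x, pad_y = (size - new_w) // 2, (size - new_h) // 2  # left / top padding
    return scale, new_w, new_h, pad_x, pad_y


def box_to_original(box, scale: float, pad_x: int, pad_y: int, img_w: int, img_h: int):
    """Undo letterboxing for one (x1, y1, x2, y2) box and clip it to the original image."""
    def clip(v: float, hi: int) -> float:
        return max(0.0, min(float(hi), v))

    x1, y1, x2, y2 = box
    x1, x2 = (x1 - pad_x) / scale, (x2 - pad_x) / scale
    y1, y2 = (y1 - pad_y) / scale, (y2 - pad_y) / scale
    return clip(x1, img_w), clip(y1, img_h), clip(x2, img_w), clip(y2, img_h)
```

For a 1280×720 frame, the scale is 0.5 and the image occupies 640×360 with 140 px of padding above and below. A box that spans the whole content area in model space maps back to the full frame.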

TensorRT Python Deployment

python
import json
import time

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

# Uses the TensorRT named-tensor API (num_io_tensors, execute_async_v3) and
# assumes static input shapes (engine exported without dynamic=True).

# ========== 1. Engine Loading & Context Creation ==========
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)

with open("yolo26n.engine", "rb") as f:
    # Ultralytics prepends a 4-byte length and JSON metadata (class names,
    # stride, etc.) to the engine; skip it if present, otherwise rewind
    # (e.g. for engines built directly with trtexec).
    meta_len = int.from_bytes(f.read(4), byteorder="little")
    try:
        metadata = json.loads(f.read(meta_len).decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        metadata = {}
        f.seek(0)
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()

# ========== 2. CUDA Memory Allocation ==========
stream = cuda.Stream()
inputs, outputs = [], []

for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = engine.get_tensor_shape(name)
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    size = trt.volume(shape)

    host_mem = cuda.pagelocked_empty(size, dtype)   # Host pinned memory
    device_mem = cuda.mem_alloc(host_mem.nbytes)    # Device VRAM
    context.set_tensor_address(name, int(device_mem))  # Bind once, reuse per call
    entry = {"name": name, "host": host_mem, "device": device_mem,
             "shape": tuple(shape), "dtype": dtype}
    # Sort tensors by role instead of assuming index 0 is the input
    if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
        inputs.append(entry)
    else:
        outputs.append(entry)

# ========== 3. Async Inference ==========
def async_infer(input_blob):
    inp, out = inputs[0], outputs[0]
    # H2D copy (cast to the engine's input dtype)
    np.copyto(inp["host"], input_blob.ravel().astype(inp["dtype"]))
    cuda.memcpy_htod_async(inp["device"], inp["host"], stream)

    context.execute_async_v3(stream.handle)

    # D2H copy, then wait for all queued work on the stream
    cuda.memcpy_dtoh_async(out["host"], out["device"], stream)
    stream.synchronize()

    return out["host"].copy().reshape(out["shape"])

# ========== 4. Performance Benchmark ==========
# Latencies include the H2D/D2H copies, not just kernel time.
def benchmark(warmup=10, runs=100):
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
    for _ in range(warmup):
        async_infer(dummy)

    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        async_infer(dummy)
        latencies.append((time.perf_counter() - t0) * 1000)

    latencies.sort()
    print(f"TensorRT | Mean: {np.mean(latencies):.1f}ms | "
          f"P50: {latencies[runs // 2]:.1f}ms | "
          f"P99: {latencies[min(runs - 1, int(runs * 0.99))]:.1f}ms | "
          f"Throughput: {1000 / np.mean(latencies):.0f} FPS")

benchmark()

OpenVINO Deployment with Benchmarking

python
import time

import cv2
import numpy as np
import openvino as ov

# ========== 1. Load the Exported Model ==========
# Ultralytics unified export (writes the yolo26n_openvino_model/ directory):
#   model.export(format="openvino", half=True)

core = ov.Core()
model = core.read_model("yolo26n_openvino_model/yolo26n.xml")

def preprocess(image, imgsz=640):
    """OpenCV BGR -> RGB, resize, NCHW float32 in [0, 1]."""
    img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (imgsz, imgsz))
    return img.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0

# ========== 2. CPU Inference ==========
compiled_cpu = core.compile_model(model, device_name="CPU")
infer_request = compiled_cpu.create_infer_request()

def openvino_infer(image):
    outputs = infer_request.infer({0: preprocess(image)})  # Input addressed by index
    return outputs[compiled_cpu.output(0)]

# ========== 3. Async Pipeline (Throughput Optimized) ==========
def async_pipeline(images, num_requests=4):
    """Multi-request async inference with OpenVINO's AsyncInferQueue."""
    # Compile once; the queue owns num_requests infer requests
    compiled = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
    queue = ov.AsyncInferQueue(compiled, num_requests)
    results = [None] * len(images)

    def completion_callback(request, userdata):
        results[userdata] = request.get_output_tensor(0).data.copy()

    queue.set_callback(completion_callback)
    for i, img in enumerate(images):
        # Blocks until a request is idle, so busy requests are never reused
        queue.start_async({0: preprocess(img)}, userdata=i)
    queue.wait_all()

    return results

# ========== 4. CPU vs AUTO Device Benchmark ==========
# "AUTO" lets OpenVINO choose among the available devices.
def benchmark_openvino(devices=("CPU", "AUTO"), warmup=20, runs=200):
    dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)

    for device in devices:
        compiled = core.compile_model(model, device)
        req = compiled.create_infer_request()

        # Warmup (avoid first-inference kernel compilation overhead)
        for _ in range(warmup):
            req.infer({0: dummy})

        times = []
        for _ in range(runs):
            t0 = time.perf_counter()
            req.infer({0: dummy})
            times.append((time.perf_counter() - t0) * 1000)

        times.sort()
        p99 = times[min(len(times) - 1, int(len(times) * 0.99))]
        print(f"OpenVINO {device}: "
              f"Mean {np.mean(times):.1f}ms | "
              f"P99 {p99:.1f}ms | "
              f"{1000 / np.mean(times):.0f} FPS")

benchmark_openvino()

NCNN Mobile Deployment

NCNN is Tencent’s open-source mobile inference framework supporting ARM NEON and Vulkan GPU acceleration.

YOLO FAQ: Common Problems and Solutions

May 23, 2026 · 13 min read

Environment Installation Issues

Q1: CUDA not available, only using CPU?

First confirm your NVIDIA driver version supports the required CUDA version. A driver that is too old will make CUDA unavailable:

bash
# Check driver version (Driver Version must be >= minimum for target CUDA)
nvidia-smi
# Check CUDA toolkit version (optional: PyTorch wheels bundle their own CUDA runtime, so nvcc is not required)
nvcc --version
# Reinstall PyTorch with matching CUDA version
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121

If nvidia-smi works but PyTorch still uses the CPU, you have most likely installed the CPU-only PyTorch build. (The "CUDA Version" that nvidia-smi reports is the highest version the driver supports, not an installed toolkit.) Uninstall torch and torchvision, then reinstall them with the --index-url for the correct CUDA version; for CUDA 11.8, replace cu121 with cu118 in the URL. Always use a conda or venv virtual environment to isolate PyTorch versions and avoid system-level conflicts.
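A short diagnostic shows from Python which build is actually installed. The sketch below uses only standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`). The `wheel_index_url` helper is an illustrative function, not an official tool. It simply encodes the `cuXYZ` naming used in the URL above.

```python
import importlib.util


def wheel_index_url(cuda_version: str) -> str:
    """'12.1' -> 'https://download.pytorch.org/whl/cu121' (naming pattern from the URL above)."""
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


def cuda_report() -> dict:
    """Summarize the PyTorch install: version, CUDA build, and GPU availability."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,       # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, the CPU-only wheel is installed and reinstalling from the matching index URL is the fix. If it is set but `cuda_available` is `False`, look at the driver instead.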

YOLO Go Deployment Guide

May 26, 2026 · 14 min read

Chapter 8: Complete YOLO Tutorial with Golang

Go's high performance, low memory footprint, and native concurrency (goroutines and channels) make it a popular choice for industrial YOLO deployment. This chapter provides a comprehensive implementation guide for YOLO in the Go ecosystem.

Library	Stars	Maintenance Status	Use Case	Recommendation
onnxruntime-go	⭐ 1.2k	Active	ONNX model inference, CPU/GPU acceleration	⭐⭐⭐⭐⭐
gocv	⭐ 5.8k	Active	OpenCV bindings, image processing + DNN inference	⭐⭐⭐⭐⭐
yolo-go	⭐ 800+	Active	Pre-packaged YOLO detection library, out-of-the-box	⭐⭐⭐⭐
go-yolo	⭐ 300+	Maintained	Darknet CGO bindings	⭐⭐⭐
gorgonia	⭐ 4.9k	Active	Pure Go computational graph, custom networks	⭐⭐⭐

Core Feature Comparison:

YOLO Rust Deployment Guide

May 29, 2026 · 6 min read

Chapter 9: Complete YOLO Tutorial with Rust

Memory safety, zero-cost abstractions, and performance on par with C/C++ make Rust a strong choice for production-grade YOLO deployment, particularly in edge computing and high-concurrency services where low latency and a small memory footprint matter.

Library Name	Crates.io	Maintenance Status	Use Cases	Recommendation Index
ort	v2.0.0	Super Active	Community-maintained ONNX Runtime binding, full platform support	⭐⭐⭐⭐⭐
ultralytics-inference	v0.0.11	Official Maintenance	Official Ultralytics Rust library	⭐⭐⭐⭐⭐
tract	v0.21.0	Active	Pure Rust inference engine, no external dependencies	⭐⭐⭐⭐
opencv-rust	v0.94.0	Active	OpenCV binding, DNN + image processing	⭐⭐⭐⭐
tch-rs	v0.15.0	Active	LibTorch binding, PyTorch models	⭐⭐⭐
candle	v0.6.0	Super Active	HuggingFace pure Rust ML framework	⭐⭐⭐⭐

Core Features Comparison: