YOLO From Zero to Mastery

9 posts
1. YOLO Getting Started: History, Version Comparison and Environment Setup

Learning Path and Version Selection Guide

Version Selection Guide

| Version | Release Date | Development Team | Use Cases | Recommendation Index |
| --- | --- | --- | --- | --- |
| YOLO26 | 2026.01 | Ultralytics Official | Edge deployment, CPU inference, industrial applications | ⭐⭐⭐⭐⭐ |
| YOLOv8 | 2023.01 | Ultralytics Official | Beginner learning, complete ecosystem, general scenarios | ⭐⭐⭐⭐⭐ |
| YOLO11 | 2024.09 | Ultralytics Official | Efficiency optimization, lightweight deployment | ⭐⭐⭐⭐ |
| YOLOv10 | 2024.05 | Tsinghua University | Research exploration, NMS-free end-to-end | ⭐⭐⭐⭐ |
| YOLOv9 | 2024.02 | Academia Sinica (Taiwan) | High precision, small object detection | ⭐⭐⭐⭐ |
| YOLOv12 | 2025.02 | University at Buffalo + University of Chinese Academy of Sciences | Attention mechanism research | ⭐⭐⭐ |

Learning Path Recommendations

  1. Beginner Stage (1-2 weeks): Start with YOLOv8, master basic concepts and API usage
  2. Intermediate Stage (2-3 weeks): Learn custom dataset training, parameter tuning and optimization
  3. Advanced Stage (2-3 weeks): Learn model deployment, engineering implementation
  4. Research Stage (ongoing): Explore new features in YOLO11, YOLO26, YOLOv9/v10/v12

Complete YOLO Development History Timeline

| Version | Release Date | Core Innovation | Milestone Significance |
| --- | --- | --- | --- |
| YOLOv1 | 2015.06 | Pioneered single-stage detection | Foundation for real-time detection |
| YOLOv2 | 2016.12 | Batch normalization, anchor boxes | Improved both accuracy and speed |
| YOLOv3 | 2018.04 | Multi-scale detection, residual networks | Industry standard |
| YOLOv4 | 2020.04 | CSPDarknet, Mosaic augmentation | Peak of engineering implementation |
| YOLOv5 | 2020.06 | PyTorch framework, user-friendly | Highest adoption rate |
| YOLOv7 | 2022.07 | E-ELAN, reparameterization | Balance between speed and accuracy |
| YOLOv8 | 2023.01 | C2f, anchor-free, unified framework | Ultralytics unified ecosystem |
| YOLOv9 | 2024.02 | GELAN, PGI (programmable gradient information) | Training efficiency revolution |
| YOLOv10 | 2024.05 | NMS-free, efficiency-precision tradeoff | End-to-end detection |
| YOLO11 | 2024.09 | Architecture optimization, parameter reduction | Efficiency-optimized version |
| YOLOv12 | 2025.02 | Area Attention mechanism | Attention architecture |
| YOLO26 | 2026.01 | DFL-free, NMS-free, up to 43% faster CPU inference | New standard for edge computing |

Core Principles and Version Comparison

Ultralytics Official Main Line Versions

YOLOv8 Core Features:

2. YOLO Quick Start: Model Loading and Inference

Model Loading and Inference Across Versions

Ultralytics Unified API (Works with v8/11/26)

python
from ultralytics import YOLO

# Weight files are named <family><size>.pt, where size is one of:
#   n (nano), s (small), m (medium), l (large), x (extra large)
# Each YOLO(...) call downloads the file on first use, so load only the size you need.

# ========== YOLOv8 ==========
model_v8 = YOLO("yolov8n.pt")      # also: yolov8s.pt, yolov8m.pt, yolov8l.pt, yolov8x.pt

# ========== YOLO11 ==========
model_11 = YOLO("yolo11n.pt")      # also: yolo11s.pt, yolo11m.pt, yolo11l.pt, yolo11x.pt

# ========== YOLO26 (2026 latest) ==========
model_26 = YOLO("yolo26n.pt")      # nano, recommended for edge deployment
                                   # also: yolo26s.pt, yolo26m.pt, yolo26l.pt, yolo26x.pt

Image Detection Hands-on Example

python
from ultralytics import YOLO

# Load model (YOLO26 example)
model = YOLO("yolo26n.pt")

# Single image detection
results = model("test.jpg", conf=0.25, iou=0.45)

# Process results
for result in results:
    boxes = result.boxes          # Detection boxes
    masks = result.masks          # Segmentation masks (None for detection models)
    probs = result.probs          # Classification probabilities (None for detection models)
    
    # Print detection results
    for box in boxes:
        print(f"Class: {result.names[int(box.cls)]}, "
              f"Confidence: {box.conf.item():.3f}, "
              f"Coordinates: {box.xyxy.tolist()[0]}")
    
    # Save visualization results
    result.save("result.jpg")

Video Detection Hands-on Example

python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Video file detection
results = model.predict(
    source="input.mp4",
    save=True,           # Save result video
    conf=0.3,
    show=False,          # Whether to display in real-time
    stream=True          # Stream processing to save memory
)

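# Process frame by frame
# With stream=True, predict() returns a generator: inference (and saving) only
# happens as this loop consumes it.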
for result in results:
    # Custom post-processing logic
    pass

Real-time Camera Detection

python
from ultralytics import YOLO
import cv2

model = YOLO("yolo26n.pt")

# Open camera
cap = cv2.VideoCapture(0)  # 0 is default camera

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    
    # Inference
    results = model(frame, verbose=False)
    
    # Draw results
    annotated_frame = results[0].plot()
    
    # Display
    cv2.imshow("YOLO Real-time", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Version-specific Code Differences

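| Feature | YOLOv8 | YOLO11 | YOLO26 | YOLOv9 | YOLOv10 |
| --- | --- | --- | --- | --- | --- |
| Unified API | ✅ | ✅ | ✅ | ❌ Separate repo | ❌ Separate repo |
| No NMS | ❌ | ❌ | ✅ | ❌ | ✅ |
| DFL Module | ✅ | ✅ | ❌ Removed | ✅ | ✅ |
| MuSGD Optimizer | ❌ | ❌ | ✅ | ❌ | ❌ |
| Export Compatibility | Good | Good | Best | Fair | Fair |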

Results Object API Deep Dive

The model() or model.predict() call returns a list of Results objects. Each Results object encapsulates all inference outputs for a single image. Understanding its internal structure is essential for downstream processing.
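
As a rough illustration of downstream processing, the sketch below flattens one image's detections into plain dictionaries. It assumes the inputs were obtained from a Results object as `result.names`, `result.boxes.xyxy.tolist()`, `result.boxes.conf.tolist()` and `result.boxes.cls.tolist()`; the helper name `boxes_to_records` and the sample values are hypothetical, so it runs without a model.

```python
def boxes_to_records(names, xyxy, conf, cls, min_conf=0.25):
    """Flatten parallel per-box lists into one dict per detection."""
    records = []
    for (x1, y1, x2, y2), score, class_id in zip(xyxy, conf, cls):
        if score < min_conf:
            continue  # Drop low-confidence detections
        records.append({
            "label": names[int(class_id)],   # Class ids arrive as floats
            "confidence": round(score, 3),
            "xyxy": (x1, y1, x2, y2),
            "width": x2 - x1,
            "height": y2 - y1,
        })
    # Highest-confidence detections first
    return sorted(records, key=lambda r: r["confidence"], reverse=True)


# Hypothetical values standing in for one image's output
names = {0: "person", 1: "bicycle"}
xyxy = [[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 90.0, 100.0], [0.0, 0.0, 5.0, 5.0]]
conf = [0.91, 0.42, 0.10]
cls = [0.0, 1.0, 0.0]

records = boxes_to_records(names, xyxy, conf, cls)
for r in records:
    print(r["label"], r["confidence"], r["width"], r["height"])
```

Converting to plain Python values early keeps later steps (JSON export, database writes, counting) independent of tensors and devices.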

3. YOLO Dataset Preparation: Annotation Tools and Format Conversion

Data Annotation Tools Usage

LabelImg Installation and Usage

bash
# Installation
pip install labelImg

# Launch
labelImg

Annotation Process:

  1. Open Dir → Select image folder
  2. Change Save Dir → Select annotation save folder
  3. Select YOLO format
  4. Create RectBox → Draw bounding box → Enter class name
  5. Save
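
The YOLO format selected in step 3 stores one text line per box: a class index followed by the box center and size, all normalized to the image dimensions. The sketch below shows the conversion from pixel corner coordinates and back; the function names and the sample box are illustrative, not part of LabelImg.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to a YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Convert a YOLO label line back to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Hypothetical box on a 640x480 image
line = to_yolo_line(0, 100, 200, 300, 400, img_w=640, img_h=480)
print(line)  # 0 0.312500 0.625000 0.312500 0.416667
print(from_yolo_line(line, 640, 480))
```

Each image gets a `.txt` file with the same base name, containing one such line per annotated object.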

LabelMe Installation and Usage

bash
pip install labelme
labelme

CVAT Self-Hosted Annotation Platform

CVAT (Computer Vision Annotation Tool) is an open-source annotation platform originally developed by Intel. It supports self-hosted deployment with Docker and is suited to team collaboration and large-scale annotation projects.

4. YOLO Model Training: Complete Custom Dataset Workflow

Complete Custom Dataset Training Process

Ultralytics Unified Training Code

python
from ultralytics import YOLO

# Load model
# model = YOLO("yolov8n.yaml")  # Train from scratch
# model = YOLO("yolo11n.pt")    # Based on pre-trained weights
model = YOLO("yolo26n.pt")      # 2026 recommended, edge deployment first choice

# Start training
results = model.train(
    # Basic configuration
    data="data.yaml",        # Dataset configuration
    epochs=100,              # Training epochs
    imgsz=640,               # Input size
    batch=16,                # Batch size
    workers=8,               # Data loading worker processes
    
    # Optimizer configuration
    optimizer="auto",        # YOLO26 automatically uses MuSGD
    lr0=0.01,                # Initial learning rate
    lrf=0.01,                # Final learning rate factor
    momentum=0.937,          # SGD momentum
    weight_decay=0.0005,     # Weight decay
    
    # Data augmentation
    mosaic=1.0,
    mixup=0.1,
    copy_paste=0.1,
    
    # Other configuration
    device=0,                # GPU device, "cpu" for CPU
    project="runs/train",    # Save path
    name="yolo26_exp1",      # Experiment name
    exist_ok=False,          # Whether to overwrite
    pretrained=True,         # Use pre-trained
    verbose=True,            # Detailed logs
    seed=42,                 # Random seed
)

# Validate model
metrics = model.val()
print(f"mAP50: {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")
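
The `data="data.yaml"` argument points to a dataset configuration file. As a minimal sketch, the helper below generates one in the usual Ultralytics layout (`path`, `train`, `val`, `names`); the dataset root, folder names and class names are hypothetical placeholders.

```python
def build_data_yaml(root, class_names, train="images/train", val="images/val"):
    """Return the text of a dataset YAML with index-to-name class mapping."""
    lines = [
        f"path: {root}",      # Dataset root directory
        f"train: {train}",    # Training images, relative to path
        f"val: {val}",        # Validation images, relative to path
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


yaml_text = build_data_yaml("datasets/helmets", ["helmet", "no_helmet"])
print(yaml_text)
# To write it to disk: pathlib.Path("data.yaml").write_text(yaml_text)
```

The class indices here must match the class ids used in the label files, otherwise training proceeds with mislabeled objects.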

Training Parameter Differences Across Versions

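| Parameter | YOLOv8 | YOLO11 | YOLO26 |
| --- | --- | --- | --- |
| Default Optimizer | SGD | SGD | MuSGD |
| DFL Loss | ✅ | ✅ | ❌ Removed |
| NMS Post-processing | ✅ Required | ✅ Required | ❌ Not needed (natively NMS-free) |
| Small Object Optimization | Average | Better | Best (STAL) |
| CPU Inference Speed | Baseline | +25% | +43% |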

Loss Function Breakdown

YOLO’s loss function consists of three components, each targeting a different learning objective:
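
As a rough sketch of how the parts combine, assume the three components are the box regression, classification and DFL losses of the YOLOv8-style detection head. The total is a weighted sum, shown here with the Ultralytics default gains (`box=7.5`, `cls=0.5`, `dfl=1.5`); the individual loss values are made up for illustration.

```python
def total_loss(box_loss, cls_loss, dfl_loss, box_gain=7.5, cls_gain=0.5, dfl_gain=1.5):
    """Weighted sum of the three detection loss components."""
    return box_gain * box_loss + cls_gain * cls_loss + dfl_gain * dfl_loss


# Hypothetical per-batch loss values
loss = total_loss(box_loss=1.2, cls_loss=0.8, dfl_loss=1.0)
print(f"{loss:.2f}")

# For a DFL-free head, the third term simply drops out
loss_no_dfl = total_loss(box_loss=1.2, cls_loss=0.8, dfl_loss=0.0)
print(f"{loss_no_dfl:.2f}")
```

The gains are exposed as the `box`, `cls` and `dfl` training arguments, so the balance between localization and classification can be tuned per dataset.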

5. YOLO Advanced Optimization: Lightweight, Quantization and Accuracy

Model Lightweighting Strategies

Model Size Selection

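| Model | Parameters (M) | mAP | CPU Inference | Use Cases |
| --- | --- | --- | --- | --- |
| YOLO26n | 2.8 | 38.9 | Fastest | Edge devices, embedded |
| YOLO26s | 9.4 | 48.2 | Very fast | Mobile, web |
| YOLO26m | 21.8 | 53.1 | Medium | Server, high performance |
| YOLO11n | 2.6 | 39.6 | Fast | Lightweight deployment |
| YOLOv8n | 3.2 | 37.3 | Baseline | General purpose |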

Knowledge Distillation

python
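from ultralytics import YOLO

# Large model as teacher, small model as student
teacher = YOLO("yolo26x.pt")
student = YOLO("yolo26n.yaml")

# Distillation training
# Caution: `distill` and `distill_ratio` do not appear among the documented
# Ultralytics training arguments, and unrecognized arguments are rejected.
# Treat this call as pseudocode for a custom trainer and verify it against
# your installed version before relying on it.
student.train(
    data="data.yaml",
    distill="yolo26x.pt",  # Teacher model (hypothetical argument)
    distill_ratio=0.5,     # Distillation loss ratio (hypothetical argument)
)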
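
To make the idea of a distillation loss concrete, here is a framework-free sketch of the classic soft-target formulation: the student is pushed toward the teacher's temperature-softened class distribution, and that term is blended with the ordinary task loss. This is a generic illustration applied to classification logits, not the Ultralytics implementation; the function names and the meaning of `ratio` are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def combined_loss(task_loss, kd_loss, ratio=0.5):
    """Blend the supervised task loss with the distillation term."""
    return (1 - ratio) * task_loss + ratio * kd_loss


same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
print(f"identical logits: {same:.4f}, opposite ranking: {different:.4f}")
print(combined_loss(task_loss=2.0, kd_loss=different, ratio=0.5))
```

A higher temperature flattens the teacher distribution, which exposes more of its relative class preferences to the student.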

Model Pruning

Structured vs Unstructured Pruning

| Type | Method | Sparsity Pattern | Hardware Acceleration | Compression Ratio |
| --- | --- | --- | --- | --- |
| Unstructured | Weight pruning | Random sparse | Difficult (special HW needed) | High |
| Structured | Channel pruning | Regular sparse | Native acceleration | Medium |
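
Structured pruning needs a criterion for deciding which channels to remove. A common choice is the L1 norm of each output channel's weights: channels with the smallest norm are assumed to contribute least. The sketch below applies that ranking to a toy weight tensor written as nested lists; the helper names and values are illustrative only.

```python
def channel_l1_norms(weight):
    """weight: nested list shaped [out_channels][in_channels][kh][kw]."""
    return [
        sum(abs(v) for in_ch in out_ch for row in in_ch for v in row)
        for out_ch in weight
    ]


def channels_to_prune(weight, ratio=0.2):
    """Return the indices of the lowest-norm output channels."""
    norms = channel_l1_norms(weight)
    n_prune = int(len(norms) * ratio)
    ranked = sorted(range(len(norms)), key=lambda i: norms[i])
    return sorted(ranked[:n_prune])


# Toy layer: 5 output channels, 1 input channel, 1x2 kernels
weight = [
    [[[0.9, -0.8]]],
    [[[0.01, 0.02]]],
    [[[0.5, 0.5]]],
    [[[-0.3, 0.1]]],
    [[[1.2, -1.1]]],
]
selected = channels_to_prune(weight, ratio=0.4)
print(selected)  # [1, 3]
```

Removing whole channels shrinks the tensor itself, which is why structured pruning speeds up ordinary dense hardware while unstructured pruning usually does not.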

Torch Prune Channel Pruning Example

python
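import torch
import torch.nn.utils.prune as prune
from ultralytics import YOLO

# --- Unstructured pruning (built into PyTorch) ---
# Zero the 30% smallest-magnitude weights in every conv layer.
# Tensor shapes are unchanged, so this alone does not speed up dense inference.
model = YOLO("yolo26n.pt")
for name, module in model.model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # Make pruning permanent

# --- Structured (channel) pruning with the torch-pruning library ---
# pip install torch-pruning   (the calls below follow the torch-pruning 1.x API)
import torch_pruning as tp

model = YOLO("yolo26n.pt").model
for p in model.parameters():
    p.requires_grad_(True)  # Dependency tracing relies on autograd

example_inputs = torch.randn(1, 3, 640, 640)
DG = tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)

# Assumption: model.model[3] is an Ultralytics Conv block that wraps an
# nn.Conv2d in `.conv`. Whether tracing succeeds depends on the model version.
conv = model.model[3].conv

# Remove every 5th output channel (about 20%). Indices are picked by position
# here, not by L1 norm; rank channels by norm first for importance-based pruning.
idxs = list(range(0, conv.out_channels, 5))
group = DG.get_pruning_group(conv, tp.prune_conv_out_channels, idxs=idxs)
if DG.check_pruning_group(group):  # Skip groups that would remove every channel
    group.prune()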

Pruning Ratio Guidelines

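| Model | Safe Ratio | Aggressive Ratio | mAP Drop (safe / aggressive) |
| --- | --- | --- | --- |
| YOLO26n | ≤20% | 20-40% | <1% / 2-5% |
| YOLO26s | ≤30% | 30-50% | <1% / 3-6% |
| YOLO26m | ≤40% | 40-60% | <1% / 3-8% |
| YOLOv8n | ≤20% | 20-35% | <1% / 2-4% |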

Model Pruning and Quantization

Export Time Quantization

python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# INT8 quantization (requires calibration data)
model.export(
    format="engine",      # TensorRT
    int8=True,
    data="data.yaml",     # Calibration dataset
    batch=8,
)

# ONNX export with dynamic input shapes
# Note: dynamic=True makes the input dimensions dynamic; it does not quantize the model
model.export(
    format="onnx",
    dynamic=True,
    simplify=True,
)

TensorRT INT8 Calibration Step-by-Step

Calibration Dataset Preparation

INT8 quantization requires representative calibration data to determine activation value ranges:
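
To show why the activation range matters, the sketch below implements the simplest form of calibration: symmetric max calibration, where the largest absolute activation seen in the calibration data sets the scale that maps floats to the signed 8-bit range. It is a conceptual illustration with made-up activation values; TensorRT's own calibrators use more robust statistics than a plain maximum.

```python
def calibrate_scale(activations, num_bits=8):
    """Derive a symmetric quantization scale from observed activations."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    max_abs = max(abs(a) for a in activations)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(x, scale, num_bits=8):
    """Map a float to an integer level, clipping values outside the calibrated range."""
    qmax = 2 ** (num_bits - 1) - 1
    q = round(x / scale)
    return max(-qmax, min(qmax, q))


def dequantize(q, scale):
    """Map an integer level back to an approximate float."""
    return q * scale


# Hypothetical activations collected from calibration images
calib = [-1.5, 0.2, 0.9, 2.54, -0.7]
scale = calibrate_scale(calib)

in_range = quantize(1.0, scale)
clipped = quantize(10.0, scale)     # Far outside the calibrated range
print(scale, in_range, clipped, dequantize(in_range, scale))
```

Values beyond the calibrated range saturate at the maximum level, which is why calibration images should resemble the data seen in production.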

6. YOLO Deployment: Model Export and Multi-Platform Deployment

Model Export (17 Format Support)

Ultralytics Unified Export API

python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# ========== Export Various Formats ==========
# 1. ONNX (Cross-platform Universal)
model.export(format="onnx", simplify=True, dynamic=True)

# 2. TensorRT (Best for NVIDIA GPU)
model.export(format="engine", half=True, workspace=4)

# 3. OpenVINO (Best for Intel CPU)
model.export(format="openvino", half=True)

# 4. CoreML (Apple Devices)
model.export(format="coreml", int8=True)

# 5. TFLite (Android/iOS Mobile)
model.export(format="tflite", int8=True)

# 6. NCNN (Mobile)
model.export(format="ncnn")

# 7. PaddlePaddle
model.export(format="paddle")

Version Export Compatibility

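| Format | YOLOv8 | YOLO11 | YOLO26 |
| --- | --- | --- | --- |
| ONNX | ✅ | ✅ | ✅ Best |
| TensorRT | ✅ | ✅ | ✅ No NMS, simpler |
| OpenVINO | ✅ | ✅ | ✅ |
| TFLite | ✅ | ✅ | ✅ |
| NCNN | ✅ | ✅ | ✅ |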

Python Deployment Practice

ONNX Runtime Deployment

python
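import onnxruntime as ort
import cv2
import numpy as np

# Load ONNX model (the CPU provider is used if CUDA is unavailable)
session = ort.InferenceSession(
    "yolo26n.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name  # "images" for Ultralytics exports

def preprocess(image, imgsz=640):
    """Resize, convert BGR to RGB, scale to [0, 1], and reshape to NCHW."""
    img = cv2.resize(image, (imgsz, imgsz))       # Plain resize; aspect ratio is not preserved
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)    # OpenCV loads BGR; the model expects RGB
    img = img.transpose(2, 0, 1) / 255.0          # HWC -> CHW, scale to [0, 1]
    return img[np.newaxis].astype(np.float32)     # Add batch dimension

# Inference
image = cv2.imread("test.jpg")
if image is None:
    raise FileNotFoundError("test.jpg could not be read")
input_data = preprocess(image)
outputs = session.run(None, {input_name: input_data})

# YOLO26 Special Note: No NMS post-processing needed!
# Output is already the final detection results.
# Boxes are in the 640x640 input space, so they still need a confidence
# threshold and rescaling to the original image size.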
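
The plain resize used in `preprocess` stretches the image to a square. A common alternative is letterbox resizing: scale the image to fit while keeping its aspect ratio, then pad the remainder. The sketch below covers only the coordinate arithmetic, namely how to compute the scale and padding and how to map a predicted box back to the original image. The function names and the sample box are hypothetical.

```python
def letterbox_params(img_w, img_h, imgsz=640):
    """Return (scale, pad_x, pad_y) for fitting an image into an imgsz square."""
    scale = min(imgsz / img_w, imgsz / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x = (imgsz - new_w) / 2   # Padding added on the left (and right)
    pad_y = (imgsz - new_h) / 2   # Padding added on the top (and bottom)
    return scale, pad_x, pad_y


def to_original(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from model input space to original pixels."""
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


# A 1280x720 frame fits as 640x360 with 140 px of padding above and below
scale, pad_x, pad_y = letterbox_params(1280, 720)
print(scale, pad_x, pad_y)

# Hypothetical box predicted in the 640x640 input space
original_box = to_original((320, 320, 480, 400), scale, pad_x, pad_y)
print(original_box)  # (640.0, 360.0, 960.0, 520.0)
```

Using the same scale and padding in both directions keeps boxes aligned with the source frame regardless of its aspect ratio.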

TensorRT Python Deployment

python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import time

# ========== 1. Engine Loading & Context Creation ==========
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)

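with open("yolo26n.engine", "rb") as f:
    # Engines exported by Ultralytics begin with a 4-byte little-endian length
    # followed by that many bytes of JSON metadata. Skip the header if present.
    meta_len = int.from_bytes(f.read(4), byteorder="little")
    header = f.read(meta_len)
    if not header.startswith(b"{"):
        f.seek(0)  # Plain TensorRT engine without a metadata header
    engine = runtime.deserialize_cuda_engine(f.read())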

context = engine.create_execution_context()

# ========== 2. CUDA Memory Allocation ==========
stream = cuda.Stream()
bindings = []

for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = engine.get_tensor_shape(name)
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    size = trt.volume(shape)
    
    host_mem = cuda.pagelocked_empty(size, dtype)   # Host pinned memory
    device_mem = cuda.mem_alloc(host_mem.nbytes)    # Device VRAM
    bindings.append({"name": name, "host": host_mem, "device": device_mem,
                     "shape": shape, "size": size, "dtype": dtype})

# ========== 3. Async Inference Loop ==========
def async_infer(input_blob):
    # H2D copy
    np.copyto(bindings[0]["host"], input_blob.ravel())
    cuda.memcpy_htod_async(bindings[0]["device"], bindings[0]["host"], stream)
    
    # Set tensor addresses and execute
    context.set_tensor_address(bindings[0]["name"], int(bindings[0]["device"]))
    context.set_tensor_address(bindings[1]["name"], int(bindings[1]["device"]))
    context.execute_async_v3(stream.handle)
    
    # D2H copy
    cuda.memcpy_dtoh_async(bindings[1]["host"], bindings[1]["device"], stream)
    stream.synchronize()
    
    return bindings[1]["host"].copy()

# ========== 4. Performance Benchmark ==========
def benchmark(warmup=10, runs=100):
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
    for _ in range(warmup):
        async_infer(dummy)
    
    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        async_infer(dummy)
        latencies.append((time.perf_counter() - t0) * 1000)
    
    latencies.sort()
    print(f"TensorRT FP16 | Mean: {np.mean(latencies):.1f}ms | "
          f"P50: {latencies[runs//2]:.1f}ms | "
          f"P99: {latencies[int(runs*0.99)]:.1f}ms | "
          f"Throughput: {1000/np.mean(latencies):.0f} FPS")

benchmark()
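
The benchmark reports mean, P50 and P99 latency. For readers who want to compute these from their own measurements, here is a dependency-free sketch using the nearest-rank percentile definition, which can differ by one sample from the index arithmetic used in the benchmark. The helper names and the synthetic latencies are illustrative.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list, with pct in (0, 100]."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Summarize a list of per-inference latencies in milliseconds."""
    values = sorted(latencies_ms)
    mean = sum(values) / len(values)
    return {
        "mean": mean,
        "p50": percentile(values, 50),
        "p99": percentile(values, 99),
        "fps": 1000 / mean,
    }


# Synthetic latencies of 1 ms to 100 ms, one sample each
stats = summarize([float(i) for i in range(1, 101)])
print(stats)
```

Tail percentiles such as P99 matter for real-time pipelines because occasional slow frames cause dropped frames even when the mean looks healthy.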

OpenVINO Deployment with Benchmarking

python
import openvino as ov
import cv2
import numpy as np
import time

# ========== 1. ONNX → OpenVINO Conversion ==========
# Ultralytics unified export:
#   model.export(format="openvino", half=True)

core = ov.Core()
model = core.read_model("yolo26n_openvino/yolo26n.xml")

# ========== 2. CPU Inference ==========
compiled_cpu = core.compile_model(model, device_name="CPU")
infer_request = compiled_cpu.create_infer_request()

def preprocess(image, imgsz=640):
    """Resize, convert BGR to RGB, scale to [0, 1], and reshape to NCHW."""
    img = cv2.resize(image, (imgsz, imgsz))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0

def openvino_infer(image):
    # "images" is the input name used by Ultralytics exports
    outputs = infer_request.infer({"images": preprocess(image)})
    return outputs[compiled_cpu.output(0)]

# ========== 3. Async Pipeline (Throughput Optimized) ==========
def async_pipeline(images, num_requests=4):
    """Multi-request async inference pipeline"""
    # Compile once and let AsyncInferQueue manage a pool of infer requests;
    # it blocks in start_async() until a request is free.
    compiled = core.compile_model(model, "CPU")
    queue = ov.AsyncInferQueue(compiled, num_requests)
    results = [None] * len(images)

    def completion_callback(request, userdata):
        # userdata carries the image index passed to start_async()
        results[userdata] = request.get_output_tensor(0).data.copy()

    queue.set_callback(completion_callback)

    for i, img in enumerate(images):
        queue.start_async({"images": preprocess(img)}, userdata=i)

    queue.wait_all()
    return results

# ========== 4. CPU vs AUTO Device Benchmark Comparison ==========
# AUTO lets OpenVINO pick among the available devices (GPU or NPU if present)
def benchmark_openvino():
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
    
    for device in ["CPU", "AUTO"]:
        compiled = core.compile_model(model, device)
        req = compiled.create_infer_request()
        
        # Warmup (avoid first-inference kernel compilation overhead)
        for _ in range(20):
            req.infer({"images": dummy})
        
        times = []
        for _ in range(200):
            t0 = time.perf_counter()
            req.infer({"images": dummy})
            times.append((time.perf_counter() - t0) * 1000)
        
        times.sort()
        print(f"OpenVINO {device}: "
              f"Mean {np.mean(times):.1f}ms | "
              f"P99 {times[int(199*0.99)]:.1f}ms | "
              f"{1000/np.mean(times):.0f} FPS")

benchmark_openvino()

NCNN Mobile Deployment

NCNN is Tencent’s open-source mobile inference framework supporting ARM NEON and Vulkan GPU acceleration.

7. YOLO FAQ: Common Problems and Solutions

Environment Installation Issues

Q1: CUDA not available, only using CPU?

First confirm your NVIDIA driver version supports the required CUDA version. A driver that is too old will make CUDA unavailable:

bash
# Check driver version (Driver Version must be >= minimum for target CUDA)
nvidia-smi
# Check CUDA toolkit version
nvcc --version
# Reinstall PyTorch with matching CUDA version
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121

If nvidia-smi shows a CUDA version but PyTorch still uses CPU, you have installed the CPU-only PyTorch build. Uninstall and reinstall with the --index-url flag for the correct CUDA version. For CUDA 11.8, replace cu121 with cu118 in the URL. Always use a conda or venv virtual environment to isolate PyTorch versions and avoid system-level conflicts.
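
A quick way to tell a CPU-only PyTorch build from a driver problem is to inspect PyTorch itself. The sketch below gathers the relevant facts; `cuda_report` is a hypothetical helper name, and it returns early if PyTorch is not installed.

```python
import importlib.util


def cuda_report():
    """Collect the facts needed to tell a CPU-only build from a driver problem."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,        # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),  # False if the driver or GPU is unusable
        "device_count": torch.cuda.device_count(),
    }


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, the installed wheel is CPU-only and must be reinstalled. If it shows a version but `cuda_available` is `False`, check the driver reported by `nvidia-smi`.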

8. YOLO Go Deployment Guide

Chapter 8: Complete YOLO Tutorial with Golang

Go combines high performance, a low memory footprint, and native concurrency, which makes it a popular choice for industrial YOLO deployment. This chapter is a comprehensive guide to implementing YOLO in the Go ecosystem.

| Library | Stars | Maintenance Status | Use Case | Recommendation |
| --- | --- | --- | --- | --- |
| onnxruntime-go | ⭐ 1.2k | Active | ONNX model inference, CPU/GPU acceleration | ⭐⭐⭐⭐⭐ |
| gocv | ⭐ 5.8k | Active | OpenCV bindings, image processing + DNN inference | ⭐⭐⭐⭐⭐ |
| yolo-go | ⭐ 800+ | Active | Pre-packaged YOLO detection library, out-of-the-box | ⭐⭐⭐⭐ |
| go-yolo | ⭐ 300+ | Maintained | Darknet CGO bindings | ⭐⭐⭐ |
| gorgonia | ⭐ 4.9k | Active | Pure Go computational graph, custom networks | ⭐⭐⭐ |

Core Feature Comparison:

9. YOLO Rust Deployment Guide

Chapter 9: Complete YOLO Tutorial with Rust

Rust offers memory safety, zero-cost abstractions, and high performance, which makes it a strong choice for production-grade YOLO deployment. Its performance advantages are most visible in edge computing and high-concurrency scenarios.

| Library Name | Crates.io Version | Maintenance Status | Use Cases | Recommendation Index |
| --- | --- | --- | --- | --- |
| ort (onnxruntime-rs) | v2.0.0 | Very active | ONNX Runtime bindings (community-maintained), full platform support | ⭐⭐⭐⭐⭐ |
| ultralytics-inference | v0.0.11 | Officially maintained | Official Ultralytics Rust library | ⭐⭐⭐⭐⭐ |
| tract | v0.21.0 | Active | Pure Rust inference engine, no external dependencies | ⭐⭐⭐⭐ |
| opencv-rust | v0.94.0 | Active | OpenCV bindings, DNN + image processing | ⭐⭐⭐⭐ |
| tch-rs | v0.15.0 | Active | LibTorch bindings, PyTorch models | ⭐⭐⭐ |
| candle | v0.6.0 | Very active | Hugging Face pure Rust ML framework | ⭐⭐⭐⭐ |

Core Features Comparison: