YOLO Getting Started: History, Version Comparison and Environment Setup

Learning Path and Version Selection Guide

Version Selection Guide

Version | Release Date | Development Team | Use Cases | Recommendation Index
YOLO26 | 2026.01 | Ultralytics Official | Edge deployment, CPU inference, industrial applications | ⭐⭐⭐⭐⭐
YOLOv8 | 2023.01 | Ultralytics Official | Beginner learning, complete ecosystem, general scenarios | ⭐⭐⭐⭐⭐
YOLO11 | 2024.09 | Ultralytics Official | Efficiency optimization, lightweight deployment | ⭐⭐⭐⭐
YOLOv10 | 2024.05 | Tsinghua University | Research exploration, NMS-free end-to-end | ⭐⭐⭐⭐
YOLOv9 | 2024.02 | Academia Sinica (Taiwan) | High precision, small object detection | ⭐⭐⭐⭐
YOLOv12 | 2025.02 | University at Buffalo + University of Chinese Academy of Sciences | Attention mechanism research | ⭐⭐⭐

Learning Path Recommendations

  1. Beginner Stage (1-2 weeks): Start with YOLOv8, master basic concepts and API usage
  2. Intermediate Stage (2-3 weeks): Learn custom dataset training, parameter tuning and optimization
  3. Advanced Stage (2-3 weeks): Learn model deployment, engineering implementation
  4. Research Stage (ongoing): Explore new features in YOLO11, YOLO26, YOLOv9/v10/v12

Complete YOLO Development History Timeline

Version | Release Date | Core Innovation | Milestone Significance
YOLOv1 | 2015.06 | Pioneered single-stage detection | Foundation for real-time detection
YOLOv2 | 2016.12 | Batch Normalization, anchor boxes | Dual improvement in accuracy and speed
YOLOv3 | 2018.04 | Multi-scale detection, residual networks | Industry standard
YOLOv4 | 2020.04 | CSPDarknet, Mosaic augmentation | Peak of engineering implementation
YOLOv5 | 2020.06 | PyTorch framework, user-friendly | Highest adoption rate
YOLOv7 | 2022.07 | E-ELAN, reparameterization | Balance between speed and accuracy
YOLOv8 | 2023.01 | C2f, Anchor-Free, unified framework | Ultralytics unified ecosystem
YOLOv9 | 2024.02 | GELAN, PGI (Programmable Gradient Information) | Training efficiency revolution
YOLOv10 | 2024.05 | NMS-free, efficiency-precision tradeoff | End-to-end detection
YOLO11 | 2024.09 | Architecture optimization, parameter reduction | Efficiency optimized version
YOLOv12 | 2025.02 | Area Attention mechanism | Attention architecture
YOLO26 | 2026.01 | DFL-free, NMS-free, up to 43% faster CPU inference | Edge computing new standard

Core Principles and Version Comparison

Ultralytics Official Main Line Versions

YOLOv8 Core Features:

  • C2f module replaces C3, enhancing gradient flow
  • Anchor-Free detection head, simplifying post-processing
  • Unified framework supporting detection, segmentation, classification, pose estimation
  • Most complete ecosystem, comprehensive documentation

YOLO11 Core Improvements:

  • Backbone/Neck structure lightweight optimization
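  • About 22% fewer parameters (YOLO11m compared with YOLOv8m) and a reported 25% speed improvement
  • API-compatible with YOLOv8: existing code works after changing only the weights filename (for example, yolov8n.pt to yolo11n.pt)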
  • Improved small object detection accuracy
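
Because the official versions load through the same `YOLO` class, switching versions in practice usually means changing only the weights filename. The following is a minimal sketch of that idea, not part of the Ultralytics API: the helper `weights_name` is hypothetical, and the `yolo26` prefix is an assumption that it follows the same naming pattern as `yolov8n.pt` and `yolo11n.pt`.

```python
# Hypothetical helper: builds a pretrained-weights filename from a model
# family and a size letter. The prefixes are assumptions based on the
# naming pattern of published Ultralytics weights.
FAMILY_PREFIX = {
    "YOLOv8": "yolov8",
    "YOLO11": "yolo11",
    "YOLO26": "yolo26",  # assumed to follow the same pattern
}
SIZES = ("n", "s", "m", "l", "x")


def weights_name(family: str, size: str = "n") -> str:
    """Return a weights filename such as 'yolov8n.pt'."""
    if family not in FAMILY_PREFIX:
        raise ValueError(f"unknown family: {family!r}")
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    return f"{FAMILY_PREFIX[family]}{size}.pt"


if __name__ == "__main__":
    for family in FAMILY_PREFIX:
        print(family, "->", weights_name(family))
    # With Ultralytics installed, the remaining code is unchanged:
    #   from ultralytics import YOLO
    #   model = YOLO(weights_name("YOLO11"))
```

Keeping the filename in one place makes it easy to compare versions on the same data without touching the rest of a script.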

YOLO26 Revolutionary Breakthrough (2026 Latest):

  • DFL module removed: simpler bounding box prediction and better hardware compatibility
  • Native NMS-free design: end-to-end inference with no separate NMS post-processing step, which simplifies deployment
  • Up to 43% faster CPU inference: optimized for edge devices, enabling real-time use without a GPU
  • ProgLoss + STAL: improved small object detection accuracy
  • MuSGD optimizer: faster training convergence, stronger robustness
  • Supports five vision tasks: detection, instance segmentation, classification, pose (keypoint) estimation, and oriented (rotated) bounding boxes

Third-party Research Versions

YOLOv9 (Academia Sinica, Taiwan):

  • GELAN (Generalized Efficient Layer Aggregation Network)
  • PGI (Programmable Gradient Information)
  • Highest-accuracy variant: YOLOv9-E reaches 55.6% mAP@50:95 on COCO

YOLOv10 (Tsinghua University):

  • Consistent dual assignment strategy
  • Overall efficiency-precision optimization
  • NMS-free end-to-end inference

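YOLOv12 (University at Buffalo + University of Chinese Academy of Sciences):

  • Area Attention: attention computed within regions (areas) of the feature map
  • Lower attention cost than global self-attention, because each token attends only within its area
  • YOLOv12-N: 40.6% mAP at 1.64 ms latency on a T4 GPU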

Complete Environment Setup Guide

Basic Environment Preparation

System Requirements:

  • OS: Windows 10/11, Ubuntu 20.04+, macOS 12+
  • Python: 3.8 ~ 3.11 (recommended 3.10)
  • PyTorch: >= 2.0 (recommended 2.3+)
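
Before creating the environment, it can help to confirm that the interpreter falls inside the supported Python range. This is a small sketch using only the standard library; the function name `python_supported` is illustrative, and the bounds simply mirror the requirements listed above.

```python
import sys

# Supported range, mirroring the requirements listed above.
MIN_PY = (3, 8)
MAX_PY = (3, 11)


def python_supported(version=None) -> bool:
    """Return True if (major, minor) lies within MIN_PY..MAX_PY inclusive."""
    major_minor = tuple(version or sys.version_info[:2])[:2]
    return MIN_PY <= major_minor <= MAX_PY


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "OK" if python_supported() else "outside the 3.8-3.11 range"
    print(f"Python {current}: {status}")
```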

Anaconda Environment Creation

bash
# Create virtual environment
conda create -n yolo python=3.10 -y
conda activate yolo

PyTorch Installation (GPU/CPU Versions)

GPU Version (Recommended, CUDA 12.1):

bash
# CUDA 12.1 version (2026 recommended)
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia

# Verify installation
python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('PyTorch version:', torch.__version__)"

CPU Version:

bash
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cpu

Ultralytics Installation (Supports all official versions)

bash
# Install latest version (supports YOLOv8 / YOLO11 / YOLO26)
pip install ultralytics -U

# Verify installation
yolo version
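
The same check can be done from Python, which is handy inside notebooks or CI scripts. This sketch uses the standard-library `importlib.metadata` module; `installed_version` is an illustrative helper name, not an Ultralytics function.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the packages this guide installs.
    for name in ("ultralytics", "torch", "torchvision"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```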

Installing Third-Party Versions Separately

YOLOv9:

bash
git clone https://github.com/WongKinYiu/yolov9.git
cd yolov9 && pip install -r requirements.txt

YOLOv10:

bash
git clone https://github.com/THU-MIG/yolov10.git
# Install dependencies, then the repository's own package in editable mode
cd yolov10 && pip install -r requirements.txt && pip install -e .

IoU (Intersection over Union) Explained

IoU (Intersection over Union) is one of the most fundamental evaluation metrics in object detection. It measures the overlap between a predicted bounding box and the ground truth.

IoU Formula

Mathematically:

IoU = Area(Prediction ∩ Ground Truth) / Area(Prediction ∪ Ground Truth)

Visually: the intersection area of two boxes divided by their union area. IoU = 1 means perfect overlap, IoU = 0 means no overlap.
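
The formula translates directly into code. Below is a minimal sketch for axis-aligned boxes, assuming each box is given as `(x1, y1, x2, y2)` corner coordinates; the function name `iou` is illustrative.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # half-overlapping boxes
    print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes
```

For the half-overlapping pair, the intersection is 50 and the union is 100 + 100 - 50 = 150, so the IoU is 1/3.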

IoU Threshold Selection Guide

Threshold | Strictness | Use Case
0.5 | Lenient | General detection, quick evaluation
0.75 | Medium | Precise localization required
0.9 | Strict | High-precision detection, industrial QC

mAP@50 vs mAP@50:95

  • mAP@50: mAP calculated at a fixed IoU threshold of 0.5, measuring the ability to “detect objects”
  • mAP@50:95: Average mAP across 10 IoU thresholds from 0.5 to 0.95 (step 0.05), measuring “localization precision”

In practice, prioritize mAP@50 when finding objects matters more than exact box placement, and mAP@50:95 when precise localization matters. Reporting both gives a fuller picture of model performance.
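
To see why mAP@50:95 rewards tighter boxes, it helps to list the ten thresholds and check at how many of them a single prediction would count as a match. This is a deliberately simplified sketch: real mAP also involves precision-recall curves over a whole dataset, and `passing_thresholds` is an illustrative name.

```python
# The 10 IoU thresholds behind mAP@50:95: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def passing_thresholds(iou_value):
    """Thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in THRESHOLDS if iou_value >= t]


if __name__ == "__main__":
    for value in (0.55, 0.80, 0.96):
        hits = passing_thresholds(value)
        print(f"IoU {value:.2f}: matches at {len(hits)}/10 thresholds")
```

A loosely placed box (IoU 0.55) still counts at the 0.5 threshold used by mAP@50, but it matches at only two of the ten thresholds averaged in mAP@50:95.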

Quick Verification Script

After installing Ultralytics, run this minimal script to verify your environment:

python
from ultralytics import YOLO
import numpy as np

# Load pretrained model (auto-download)
model = YOLO("yolov8n.pt")

# Generate random test image (640x640 RGB); the upper bound of randint is exclusive
img = np.random.randint(0, 256, (640, 640, 3), dtype=np.uint8)

# Run inference
results = model(img)

# Print results
print(f"Model: {model.model_name}")
print(f"Detected {len(results[0].boxes)} objects")
for box in results[0].boxes:
    print(f"  {model.names[int(box.cls[0])]}: {float(box.conf[0]):.3f}")

Expected output (timings vary by hardware; Ultralytics logs its speed summary during inference, so it appears before the script's own prints):

Speed: 1.8ms preprocess, 12.3ms inference, 0.5ms postprocess per image
Model: yolov8n.pt
Detected 0 objects

Zero detections on random noise is expected: the run confirms that the model loaded, preprocessing works, and inference completes. Ultralytics automatically uses the GPU when CUDA is available.
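
If you prefer to choose the device explicitly rather than rely on automatic selection, a small helper can make the decision visible. This is a sketch under the assumption that only CUDA and CPU matter for your setup; `pick_device` is an illustrative name, and it falls back to CPU when PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing, so there is nothing to accelerate
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("Selected device:", device)
    # With Ultralytics installed, pass the choice to the prediction call:
    #   results = model(img, device=device)
```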

Docker Environment Setup

If you would rather not modify your local Python environment, or you want a reproducible GPU setup, Docker is a good option.

Dockerfile Example

dockerfile
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

RUN apt-get update && apt-get install -y \
    git wget libgl1-mesa-glx libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir ultralytics

WORKDIR /workspace

Build and Run

bash
# Build image
docker build -t yolo-dev .

# Run with GPU acceleration (requires NVIDIA Container Toolkit)
docker run --gpus all -it --rm -v "$(pwd)":/workspace yolo-dev

docker-compose Configuration

yaml
version: '3.8'
services:
  yolo:
    # Build from the Dockerfile above so the OpenCV system libraries
    # and ultralytics are already installed in the image
    build: .
    image: yolo-dev
    container_name: yolo-dev
    runtime: nvidia
    working_dir: /workspace
    volumes:
      - .:/workspace
    # Keep the container running so you can open a shell in it
    command: tail -f /dev/null
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Start commands:

bash
docker-compose up -d           # Start container
docker-compose exec yolo bash  # Enter container shell

NVIDIA Container Toolkit Installation Guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

Your First Detection: Complete Hands-On Tutorial

Now let’s run a complete object detection example: download a test image, run inference with a pretrained model, and save the annotated result.

Step 1: Download a Test Image

bash
mkdir yolo-first-detection && cd yolo-first-detection

# Download a street scene test image
wget https://ultralytics.com/images/bus.jpg -O test.jpg
# Or using Python:
# python -c "import urllib.request; urllib.request.urlretrieve('https://ultralytics.com/images/bus.jpg', 'test.jpg')"

Step 2: Create the Inference Script

Create detect.py:

python
from ultralytics import YOLO

# Load pretrained model (auto-downloads YOLOv8n)
model = YOLO("yolov8n.pt")

# Run inference
results = model("test.jpg")

# Display detection results
for r in results:
    print(f"Detected {len(r.boxes)} objects")
    for box in r.boxes:
        cls_id = int(box.cls[0])
        conf = float(box.conf[0])
        xyxy = box.xyxy[0].tolist()
        print(f"  {model.names[cls_id]}: confidence {conf:.2f}, bbox {xyxy}")

# Save annotated image
results[0].save("output.jpg")
print("Annotated result saved to output.jpg")

Step 3: Run and View Results

bash
python detect.py

Expected output (illustrative; exact confidences, coordinates, and timings depend on the Ultralytics version and hardware):

Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt to yolov8n.pt...
100%|████████████████████| 6.23M/6.23M [00:01<00:00, 4.12MB/s]

image 1/1 test.jpg: 640x480 4 persons, 1 bus, 1 stop sign
Speed: 3.2ms preprocess, 14.7ms inference, 1.1ms postprocess per image
Detected 6 objects
  person: confidence 0.89, bbox [112.5, 237.4, 215.6, 478.2]
  person: confidence 0.87, bbox [264.3, 241.7, 328.9, 476.8]
  person: confidence 0.83, bbox [72.1, 253.8, 124.9, 478.1]
  person: confidence 0.76, bbox [378.2, 217.4, 435.1, 477.6]
  bus: confidence 0.92, bbox [5.4, 132.7, 609.8, 468.3]
  stop sign: confidence 0.81, bbox [321.5, 85.3, 361.2, 142.8]

The first run downloads the pretrained weights (~6 MB) automatically; subsequent runs reuse the local file. YOLOv8n (Nano) is the smallest and fastest variant; switch to yolov8s.pt or yolov8m.pt for higher accuracy at the cost of speed.
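
The bbox values printed by detect.py come from `box.xyxy`, that is, `[x1, y1, x2, y2]` corner coordinates in pixels. Downstream code often wants a center point plus width and height instead. The conversion is simple arithmetic, sketched below with an illustrative helper name `xyxy_to_xywh`; Ultralytics also exposes `box.xywh` if you prefer to read it directly.

```python
def xyxy_to_xywh(xyxy):
    """Convert [x1, y1, x2, y2] corners to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = xyxy
    width = x2 - x1
    height = y2 - y1
    return (x1 + width / 2, y1 + height / 2, width, height)


if __name__ == "__main__":
    # A box spanning x from 0 to 10 and y from 0 to 20
    cx, cy, w, h = xyxy_to_xywh([0, 0, 10, 20])
    print(f"center=({cx}, {cy}), size={w}x{h}")
```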