YOLO Getting Started: History, Version Comparison and Environment Setup

May 5, 2026 | AI Tools | YOLO, Computer Vision, Deep Learning, Object Detection | AI Engineering Series

Learning Path and Version Selection Guide

Version Selection Guide

Version	Release Date	Development Team	Use Cases	Recommendation Index
YOLO26	2026.01	Ultralytics Official	Edge deployment, CPU inference, industrial applications	⭐⭐⭐⭐⭐
YOLOv8	2023.01	Ultralytics Official	Beginner learning, complete ecosystem, general scenarios	⭐⭐⭐⭐⭐
YOLO11	2024.09	Ultralytics Official	Efficiency optimization, lightweight deployment	⭐⭐⭐⭐
YOLOv10	2024.05	Tsinghua University	Research exploration, NMS-free end-to-end	⭐⭐⭐⭐
YOLOv9	2024.02	Academia Sinica (Taiwan)	High precision, small object detection	⭐⭐⭐⭐
YOLOv12	2025.02	University at Buffalo + University of Chinese Academy of Sciences	Attention mechanism research	⭐⭐⭐

Learning Path Recommendations

Beginner Stage (1-2 weeks): Start with YOLOv8, master basic concepts and API usage
Intermediate Stage (2-3 weeks): Learn custom dataset training, parameter tuning and optimization
Advanced Stage (2-3 weeks): Learn model deployment, engineering implementation
Research Stage (ongoing): Explore new features in YOLO11, YOLO26, YOLOv9/v10/v12

Complete YOLO Development History Timeline

Version	Release Date	Core Innovation	Milestone Significance
YOLOv1	2015.06	Pioneered single-stage detection	Foundation for real-time detection
YOLOv2	2016.12	Batch normalization, anchor boxes	Improved both accuracy and speed
YOLOv3	2018.04	Multi-scale detection, residual networks	Industry standard
YOLOv4	2020.04	CSPDarknet, Mosaic	Peak of engineering implementation
YOLOv5	2020.06	PyTorch framework, user-friendly	Highest adoption rate
YOLOv7	2022.07	E-ELAN, reparameterization	Balance between speed and accuracy
YOLOv8	2023.01	C2f, Anchor-Free, unified framework	Ultralytics unified ecosystem
YOLOv9	2024.02	GELAN, PGI (Programmable Gradient Information)	Training efficiency revolution
YOLOv10	2024.05	NMS-free, efficiency-precision tradeoff	End-to-end detection
YOLO11	2024.09	Architecture optimization, parameter reduction	Efficiency optimized version
YOLOv12	2025.02	Area Attention mechanism	Attention architecture
YOLO26	2026.01	DFL-free, NMS-free, up to 43% faster CPU inference	New standard for edge computing

Core Principles and Version Comparison

Ultralytics Official Main Line Versions

YOLOv8 Core Features:

C2f module replaces C3, enhancing gradient flow
Anchor-Free detection head, simplifying post-processing
Unified framework supporting detection, segmentation, classification, pose estimation
Most complete ecosystem, comprehensive documentation

YOLO11 Core Improvements:

Backbone/Neck structure lightweight optimization
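About 22% fewer parameters than the comparable YOLOv8 model (Ultralytics cites YOLO11m vs. YOLOv8m), with a reported speed improvement of about 25%
API-compatible with YOLOv8: existing code only needs the new weights filename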
Improved small object detection accuracy
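Because the official versions share one API, switching between them comes down to the weights filename. The sketch below assumes the nano-size detection weights names published by Ultralytics (`yolov8n.pt`, `yolo11n.pt`, `yolo26n.pt`); the `WEIGHTS` table and `load_model` helper are illustrative names, not part of the library.

```python
# Nano-size detection weights for each official Ultralytics line.
# Filenames are assumed from Ultralytics' published naming; check the docs
# for other sizes (s, m, l, x) and other tasks.
WEIGHTS = {
    "YOLOv8": "yolov8n.pt",
    "YOLO11": "yolo11n.pt",
    "YOLO26": "yolo26n.pt",
}


def load_model(version="YOLO11"):
    """Load a pretrained model; only the weights filename differs per version."""
    # Imported lazily so the mapping above can be inspected without the package.
    from ultralytics import YOLO

    return YOLO(WEIGHTS[version])


# Usage (requires `pip install ultralytics` and network access on first run):
#   model = load_model("YOLO11")
#   results = model("https://ultralytics.com/images/bus.jpg")
#   print(f"Detected {len(results[0].boxes)} objects")
```

The rest of a YOLOv8 script (inference call, result parsing) should work unchanged after swapping the filename.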

What's New in YOLO26 (latest, 2026):

✅ DFL module removed: Simpler bounding box regression and easier export to a wider range of hardware
✅ Native NMS-free: End-to-end inference with no separate NMS post-processing step, which simplifies deployment
✅ Up to 43% faster CPU inference: Optimized for edge devices, real-time without a GPU
✅ ProgLoss + STAL: Significant improvement in small object detection accuracy
✅ MuSGD optimizer: Faster training convergence, stronger robustness
✅ Supports five vision tasks: Detection, instance segmentation, classification, pose estimation (keypoints), oriented bounding boxes (OBB)

Third-party Research Versions

YOLOv9 (Academia Sinica, Taiwan):

GELAN (Generalized Efficient Layer Aggregation Network)
PGI (Programmable Gradient Information)
Highest-accuracy variant: YOLOv9-E reaches 55.6% mAP@50:95 on COCO

YOLOv10 (Tsinghua University):

Consistent dual assignment strategy
Overall efficiency-precision optimization
NMS-free end-to-end inference
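For context on what "NMS-free" removes: classic detectors emit many overlapping candidate boxes and rely on non-maximum suppression (NMS) to keep one per object. The following is a minimal pure-Python sketch of greedy NMS, not the implementation used by any particular YOLO version; it relies on IoU, which is explained later in this article.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Retain only candidates that do not overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate boxes plus one separate box (made-up values).
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.90, 0.80, 0.75]
print(nms(boxes, scores))  # [0, 2]
```

An NMS-free model is trained so that each object yields a single confident prediction, which makes this extra loop unnecessary at inference time.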

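YOLOv12 (University at Buffalo + University of Chinese Academy of Sciences):

Area Attention: attention is computed within areas of the feature map instead of globally
Lower attention cost than global self-attention (the feature map is split into 4 areas by default), though the cost is still quadratic in the number of tokens rather than linear
YOLOv12-N: 40.6% mAP at 1.64 ms per image on a T4 GPU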

Complete Environment Setup Guide

Basic Environment Preparation

System Requirements:

Windows 10/11, Ubuntu 20.04+, macOS 12+
Python: 3.8 ~ 3.11 (recommended 3.10)
PyTorch: >= 2.0 (recommended 2.3+)
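Before creating the environment, it can help to confirm that the interpreter falls inside the range listed above. This is a small sketch using only the standard library; the 3.8 to 3.11 bounds come from the requirements in this article, and `python_supported` is an illustrative helper name.

```python
import sys

# Supported range taken from the system requirements above (3.8 to 3.11).
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def python_supported(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    if version is None:
        version = sys.version_info
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "OK" if python_supported() else "outside the 3.8-3.11 range"
    print(f"Python {current}: {status}")
```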

Anaconda Environment Creation

bash
# Create virtual environment
conda create -n yolo python=3.10 -y
conda activate yolo

PyTorch Installation (GPU/CPU Versions)

GPU Version (Recommended, CUDA 12.1):

bash
# CUDA 12.1 version (2026 recommended)
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia

# Verify installation
python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('PyTorch version:', torch.__version__)"

CPU Version:

bash
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cpu

Ultralytics Installation (Supports all official versions)

bash
# Install latest version (supports YOLOv8 / YOLO11 / YOLO26)
pip install ultralytics -U

# Verify installation
yolo version
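If the `yolo` command is not on your PATH, the same check can be done from Python. The sketch below uses only the standard library's `importlib.metadata` to report which of the packages from this guide are installed; `installed_versions` is an illustrative helper name.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "ultralytics")):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

A `None` entry means the package is missing from the active environment, which usually indicates that the wrong conda environment is activated.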

Separate Installation for Third-party Versions

YOLOv9:

bash
git clone https://github.com/WongKinYiu/yolov9.git
cd yolov9 && pip install -r requirements.txt

YOLOv10:

bash
git clone https://github.com/THU-MIG/yolov10.git
cd yolov10 && pip install -r requirements.txt

IoU (Intersection over Union) Explained

IoU (Intersection over Union) is one of the most fundamental evaluation metrics in object detection. It measures the overlap between a predicted bounding box and the ground truth.

IoU Formula

Mathematically:

IoU = Area(Prediction ∩ Ground Truth) / Area(Prediction ∪ Ground Truth)

Visually: the intersection area of two boxes divided by their union area. IoU = 1 means perfect overlap, IoU = 0 means no overlap.
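The formula translates directly into code. Here is a minimal sketch for axis-aligned boxes, assuming the (x1, y1, x2, y2) corner format; `box_iou` is an illustrative name and the example boxes are made up.

```python
def box_iou(pred, truth):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle; width and height clamp to 0 when boxes do not overlap.
    inter_w = max(0.0, min(pred[2], truth[2]) - max(pred[0], truth[0]))
    inter_h = max(0.0, min(pred[3], truth[3]) - max(pred[1], truth[1]))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the part counted twice.
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - intersection
    return intersection / union if union > 0 else 0.0


print(box_iou((0, 0, 10, 10), (0, 0, 10, 10)))    # 1.0 (perfect overlap)
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))    # 0.333... (half of each box overlaps)
print(box_iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0 (no overlap)
```

Note that two boxes sharing half their area score only about 0.33, which is why a 0.5 threshold is less lenient than it may sound.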

IoU Threshold Selection Guide

Threshold	Strictness	Use Case
0.5	Lenient	General detection, quick evaluation
0.75	Medium	Precise localization required
0.9	Strict	High-precision detection, industrial QC

mAP@50 vs mAP@50:95

mAP@50: mAP calculated at a fixed IoU threshold of 0.5, measuring the ability to “detect objects”
mAP@50:95: Average mAP across 10 IoU thresholds from 0.5 to 0.95 (step 0.05), measuring “localization precision”

In practice: focus on mAP@50 for high recall, focus on mAP@50:95 for precise localization. Using both together gives a comprehensive view of model performance.
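To make the difference concrete, the sketch below builds the ten thresholds and shows at which of them a single prediction counts as a match. It is a simplified illustration of the averaging idea only, not a full mAP implementation (which also involves per-class precision-recall curves); the helper names are illustrative.

```python
# The ten IoU thresholds used by mAP@50:95: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value):
    """Return the thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in THRESHOLDS if iou_value >= t]


def mean_over_thresholds(ap_per_threshold):
    """Average per-threshold AP values (one value per entry in THRESHOLDS)."""
    if len(ap_per_threshold) != len(THRESHOLDS):
        raise ValueError("expected one AP value per threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)


print(THRESHOLDS)
print(thresholds_passed(0.72))  # [0.5, 0.55, 0.6, 0.65, 0.7]
```

A box with IoU 0.72 is a hit for mAP@50 but a miss at the five strictest thresholds, so loosely localized predictions pull mAP@50:95 down while leaving mAP@50 untouched.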

Quick Verification Script

After installing Ultralytics, run this minimal script to verify your environment:

python
from ultralytics import YOLO
import numpy as np

# Load pretrained model (auto-download)
model = YOLO("yolov8n.pt")

# Generate random test image (640x640 RGB); the upper bound is exclusive, so 256 covers 0-255
img = np.random.randint(0, 256, (640, 640, 3), dtype=np.uint8)

# Run inference
results = model(img)

# Print results
print(f"Model: {model.model_name}")
print(f"Detected {len(results[0].boxes)} objects")
for box in results[0].boxes:
    print(f"  {model.names[int(box.cls[0])]}: {float(box.conf[0]):.3f}")

Expected output (timings vary by hardware):

Speed: 1.8ms preprocess, 12.3ms inference, 0.5ms postprocess per image
Model: yolov8n.pt
Detected 0 objects

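Zero detections on random noise is expected: it confirms that the model loaded, preprocessing works, and inference runs correctly. The Speed line is logged by Ultralytics during inference, so it appears before the script's own output. Ultralytics automatically uses the GPU when CUDA is available.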

Docker Environment Setup

If you would rather not modify your local Python environment, or you want a reproducible GPU setup, Docker is a good option.

Dockerfile Example

dockerfile
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

RUN apt-get update && apt-get install -y \
    git wget libgl1-mesa-glx libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir ultralytics

WORKDIR /workspace

Build and Run

bash
# Build image
docker build -t yolo-dev .

# Run with GPU acceleration (requires NVIDIA Container Toolkit)
docker run --gpus all -it --rm -v $(pwd):/workspace yolo-dev

docker-compose Configuration

yaml
version: '3.8'
services:
  yolo:
    image: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
    container_name: yolo-dev
    runtime: nvidia
    working_dir: /workspace
    volumes:
      - .:/workspace
    command: >
      sh -c "pip install ultralytics && tail -f /dev/null"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Start commands:

bash
docker-compose up -d           # Start container
docker-compose exec yolo bash  # Enter container shell
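Once inside the container, a quick way to confirm that the GPU was passed through is to call `nvidia-smi`. The sketch below wraps that check with the standard library only. It assumes the NVIDIA Container Toolkit exposes `nvidia-smi` in the container, which is typical but depends on your setup; `nvidia_smi_works` is an illustrative helper name.

```python
import shutil
import subprocess


def nvidia_smi_works(timeout=10):
    """Return True if nvidia-smi is on PATH and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the GPUs visible to this process
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("GPU visible:", nvidia_smi_works())
```

If this prints False inside the container but `nvidia-smi` works on the host, the toolkit installation or the `--gpus` / `runtime: nvidia` settings are the first things to check.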

NVIDIA Container Toolkit Installation Guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

Your First Detection: Complete Hands-On Tutorial

Now let’s run a complete object detection example: download a test image, run inference with a pretrained model, and save the annotated result.

Step 1: Download a Test Image

bash
mkdir yolo-first-detection && cd yolo-first-detection

# Download a street scene test image
wget https://ultralytics.com/images/bus.jpg -O test.jpg
# Or using Python:
# python -c "import urllib.request; urllib.request.urlretrieve('https://ultralytics.com/images/bus.jpg', 'test.jpg')"

Step 2: Create the Inference Script

Create detect.py:

python
from ultralytics import YOLO

# Load pretrained model (auto-downloads YOLOv8n)
model = YOLO("yolov8n.pt")

# Run inference
results = model("test.jpg")

# Display detection results
for r in results:
    print(f"Detected {len(r.boxes)} objects")
    for box in r.boxes:
        cls_id = int(box.cls[0])
        conf = float(box.conf[0])
        xyxy = box.xyxy[0].tolist()
        print(f"  {model.names[cls_id]}: confidence {conf:.2f}, bbox {xyxy}")

# Save annotated image
results[0].save("output.jpg")
print("Annotated result saved to output.jpg")

Step 3: Run and View Results

bash
python detect.py

Expected output (illustrative; exact confidences, box coordinates, and timings vary with hardware and library version):

Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt to yolov8n.pt...
100%|████████████████████| 6.23M/6.23M [00:01<00:00, 4.12MB/s]

image 1/1 test.jpg: 640x480 4 persons, 1 bus, 1 stop sign
Speed: 3.2ms preprocess, 14.7ms inference, 1.1ms postprocess per image
Detected 6 objects
  person: confidence 0.89, bbox [112.5, 237.4, 215.6, 478.2]
  person: confidence 0.87, bbox [264.3, 241.7, 328.9, 476.8]
  person: confidence 0.83, bbox [72.1, 253.8, 124.9, 478.1]
  person: confidence 0.76, bbox [378.2, 217.4, 435.1, 477.6]
  bus: confidence 0.92, bbox [5.4, 132.7, 609.8, 468.3]
  stop sign: confidence 0.81, bbox [321.5, 85.3, 361.2, 142.8]

The first run auto-downloads the pretrained weights (~6 MB); subsequent runs reuse the local file. YOLOv8n (nano) is the smallest and fastest variant; switch to yolov8s.pt or yolov8m.pt for higher accuracy at the cost of speed.
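The `bbox` values printed by detect.py are in xyxy (corner) format. Downstream code often wants center-based boxes or only the confident detections, so here is a small sketch of both steps in plain Python. The detections in it are hypothetical values for illustration, not real model output, and the helper names are not part of Ultralytics.

```python
def xyxy_to_xywh(box):
    """Convert (x1, y1, x2, y2) corners to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def filter_by_confidence(detections, min_conf=0.5):
    """Keep (label, confidence, box) tuples whose confidence is at least min_conf."""
    return [d for d in detections if d[1] >= min_conf]


# Hypothetical detections for illustration only, not real model output.
detections = [
    ("bus", 0.90, (0.0, 100.0, 400.0, 300.0)),
    ("person", 0.30, (50.0, 150.0, 100.0, 290.0)),
]
for label, conf, box in filter_by_confidence(detections, min_conf=0.5):
    print(label, conf, xyxy_to_xywh(box))  # bus 0.9 (200.0, 200.0, 400.0, 200.0)
```

In practice Ultralytics can do both for you: each box also exposes `box.xywh`, and passing `conf=0.5` to the inference call filters low-confidence detections before they are returned.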
