YOLO FAQ: Common Problems and Solutions

Environment Installation Issues

Q1: CUDA not available, only using CPU?

First confirm your NVIDIA driver version supports the required CUDA version. A driver that is too old will make CUDA unavailable:

bash
# Check driver version (Driver Version must be >= minimum for target CUDA)
nvidia-smi
# Check CUDA toolkit version
nvcc --version
# Reinstall PyTorch with matching CUDA version
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121

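If nvidia-smi reports a CUDA version but PyTorch still runs on the CPU, the most likely cause is a CPU-only PyTorch build. Uninstall it (pip3 uninstall torch torchvision) and reinstall with the --index-url that matches your CUDA version; for CUDA 11.8, replace cu121 with cu118 in the URL. Note that the CUDA version printed by nvidia-smi is the highest version the driver supports, not the version of any installed toolkit. Always use a conda or venv virtual environment to isolate PyTorch versions and avoid system-level conflicts.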
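One way to tell a build problem from a driver problem is to ask PyTorch directly. The following is a minimal sketch, assuming PyTorch may or may not be installed in the active environment; the helper name describe_torch_build is made up for this example.

```python
def describe_torch_build():
    """Return a one-line summary of the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    # torch.version.cuda is None when the wheel was built without CUDA support.
    built_with = torch.version.cuda
    if built_with is None:
        return f"torch {torch.__version__}: built without CUDA support, reinstall with a CUDA index URL"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: built for CUDA {built_with}, but no usable GPU or driver was found"
    return f"torch {torch.__version__}: CUDA {built_with}, GPU 0 is {torch.cuda.get_device_name(0)}"


print(describe_torch_build())
```

The first message points to a reinstall, the second to the driver or to GPU visibility (for example inside a container).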

Q2: ultralytics installation failed?

Common causes: outdated pip, dependency conflicts (numpy/opencv version incompatibility), or network timeouts:

bash
# Upgrade pip
pip install --upgrade pip
# Use a regional PyPI mirror for faster downloads (Tsinghua mirror shown, useful in mainland China)
pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple
# If dependency conflicts persist, manually install core dependencies first
pip install numpy==1.26.0 opencv-python==4.9.0.80
pip install ultralytics

ultralytics supports Python 3.8–3.12, with Python 3.10 recommended. Use conda for isolation: conda create -n yolo python=3.10 -y && conda activate yolo. On ARM macOS, if wheel conflicts arise, try pip install ultralytics --no-deps and install dependencies individually.

Q3: Out of Memory (OOM) error during training?

OOM is the most common YOLO training issue. Combine these strategies:

python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # any detection checkpoint
model.train(
    data="data.yaml",   # dataset config
    batch=-1,           # AutoBatch: picks a batch size targeting about 60% of GPU memory
    amp=True,           # Mixed precision (on by default), reduces memory ~40%
    imgsz=640,          # Lower input resolution, 640→480 saves ~30% memory
    nbs=64,             # Nominal batch size: gradients accumulate until ~nbs images are seen
)

If OOM persists, select a specific GPU with device=0, or cap the process with torch.cuda.set_per_process_memory_fraction(0.8) when the GPU is shared; the cap does not reduce usage, it only makes this process fail before it starves others. Watch memory with nvidia-smi -l 1 to see whether usage climbs steadily over time or spikes during the first training iterations. For GPUs with less than 8GB VRAM, use lightweight models like YOLOv8n or YOLO11n with imgsz=416.
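Gradient accumulation lets a small batch behave like a larger one. The sketch below assumes the Ultralytics trainer derives the accumulation count from the nominal batch size nbs (default 64) as max(round(nbs / batch), 1); the function name is illustrative and the formula is an assumption about the trainer, not a documented API.

```python
def accumulation_steps(batch_size, nbs=64):
    """Iterations to accumulate before one optimizer step (assumed formula)."""
    return max(round(nbs / batch_size), 1)


for batch in (4, 8, 16, 64):
    steps = accumulation_steps(batch)
    print(f"batch={batch:>2} -> accumulate {steps} step(s), effective batch {batch * steps}")
```

Under that assumption, halving the batch size to escape OOM leaves the effective batch unchanged, so the learning rate does not need to be retuned.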

Q4: Loss doesn’t converge, mAP is very low?

Systematically diagnose from data and training dimensions:

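  1. Dataset validation: Ultralytics scans the labels when training starts and logs how many images were found, how many are background (no labels), and how many are corrupt; read that summary first. Images without objects are allowed and act as background samples, but every image that contains objects needs a matching, correctly normalized label file. Inspect the train_batch*.jpg mosaics saved in the run directory (or the same images in TensorBoard or W&B) to confirm that boxes line up and augmentation strength is appropriate
  2. Learning rate tuning: Default lr0=0.01 may be too high for some datasets. Try lr0=0.001 or enable cosine annealing with cos_lr=True. With the default optimizer="auto", Ultralytics chooses the optimizer and learning rate itself and ignores lr0, so set the optimizer explicitly (for example optimizer="SGD") when tuning lr0. Loss should decrease steadily after the warmup phase (default 3 epochs)
  3. Training epochs: Small datasets (<1000 images) need 300–500 epochs, large datasets 100–300 epochs. Use patience=50 for early stopping
  4. Class balance: When sample counts differ by more than 10x across classes, oversample the rare classes or collect more data for them. cls_pw is a YOLOv5 hyperparameter and is not a train() argument in the ultralytics package; class weighting or Focal Loss there requires customizing the loss. Monitor recall for rare classes
  5. Augmentation intensity: Over-augmentation can prevent convergence. Lower values such as hsv_h (default 0.015), degrees (default 0.0), mosaic, or mixup, especially for small datasets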
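The dataset validation step can also be done offline before any training run. The following is a minimal sketch for detection labels only (segmentation labels have more than five fields); the function name and the tolerance value are choices made for this example.

```python
def check_label_line(line, num_classes, eps=1e-6):
    """Return a list of problems found in one YOLO detection label line."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        x, y, w, h = (float(v) for v in parts[1:])
    except ValueError:
        return ["class id must be an integer and box values must be numbers"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
        problems.append("coordinates not normalized to the 0..1 range")
    if w <= 0 or h <= 0:
        problems.append("zero or negative box size")
    elif (x - w / 2 < -eps or x + w / 2 > 1 + eps
          or y - h / 2 < -eps or y + h / 2 > 1 + eps):
        problems.append("box extends beyond the image")
    return problems


print(check_label_line("0 0.5 0.5 0.2 0.2", num_classes=3))
print(check_label_line("0 320 240 50 50", num_classes=3))
```

Running this over every line of every .txt file and printing only non-empty results gives a quick list of files to fix.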

Q5: What’s different between YOLO26 and v8 training?

YOLO26 introduces significant simplifications to the training workflow:

  • Optimizer: Uses MuSGD by default, a hybrid of SGD and the Muon optimizer, so the optimizer parameter normally does not need to be set. Converges ~15% faster than AdamW
  • Loss function: No DFL (Distribution Focal Loss), only classification and regression branches, making training more stable with fewer hyperparameters to tune
  • Small object detection: STAL (Small-Target-Aware Label Assignment) is applied during training together with ProgLoss (progressive loss balancing), improving small object AP by 3–5%
  • Default hyperparameters: lr0=0.005 (lower than v8), momentum=0.937, weight_decay=0.0005, converging in 300 epochs on COCO
  • Training speed: ~20% faster than v8 at the same batch size with lower memory usage

Migration requires only upgrading the ultralytics package — no code changes needed.

Training Failure Diagnosis

Q14: NaN loss during training?

Common causes and fixes for NaN (Not a Number) loss:

  • Learning rate too high: Loss jumps to NaN at a specific step. Reduce lr0 to 0.0005 or lower and restart training
  • Invalid annotations: Check for negative coordinates, boxes extending beyond image boundaries, or zero-width/height boxes
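  • Gradient explosion: The Ultralytics trainer already clips the gradient norm internally, and gradient_clip_val is not a model.train() argument. If gradients still blow up, lower lr0, lengthen warmup with warmup_epochs, or increase the batch size to reduce gradient variance
  • Numerical overflow: FP16 overflow under AMP can produce NaN. Retrain briefly with amp=False; if the NaN disappears, mixed precision is the cause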
  • Corrupted data: Confirm no empty, broken, or single-pixel images. Validate with PIL.Image.open() on each file

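After fixing data problems, resume from the latest checkpoint by loading last.pt and calling model.train(resume=True). Resuming restores the training arguments stored in the checkpoint, so a change such as a lower lr0 requires starting a new run instead.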
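To find the step where the loss first went bad, the training log can be scanned for non-finite values. This sketch assumes the run directory contains a results.csv with an epoch column and loss columns whose names end in _loss (for example train/box_loss); treat those names as assumptions and adjust them to your file.

```python
import csv
import io
import math


def first_nonfinite_loss(csv_text):
    """Return (epoch, column) of the first NaN or inf loss value, or None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Some versions pad headers and values with spaces, so strip both.
        clean = {key.strip(): value.strip() for key, value in row.items()}
        for name, value in clean.items():
            if name.endswith("_loss") and not math.isfinite(float(value)):
                return int(float(clean["epoch"])), name
    return None


sample = "epoch,train/box_loss,train/cls_loss\n1,1.52,2.10\n2,1.49,nan\n3,nan,nan\n"
print(first_nonfinite_loss(sample))
```

A NaN that appears in the first epoch usually points at data or AMP; one that appears after many stable epochs more often points at the learning rate.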

Q15: Dataset issues causing training to stall?

  • Insufficient samples: Each class needs at least 100 annotated images. Below 50, the model struggles to generalize. Enable mosaic=1.0 and mixup=0.5 for augmentation
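  • Extreme class imbalance: When one class accounts for 90%+ of instances, the model becomes biased toward it. Oversample the rare classes or subsample the dominant one; class weighting or Focal Loss requires a custom loss, because model.train() has no class_weights argument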
  • Hard example contamination: Blurry or heavily occluded annotations slow convergence. Manually filter or add more similar hard examples
  • Validation distribution shift: Ensure train/val sets come from the same distribution. Use stratified sampling (train_test_split(stratify=y)) for class-proportional splits
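The sample-count and imbalance checks above are easy to automate. The following is a minimal sketch that counts boxes per class from label-file contents; the function names are illustrative, and reading the files from disk is left to the caller.

```python
from collections import Counter


def class_instance_counts(label_texts):
    """Count boxes per class id across an iterable of label-file contents."""
    counts = Counter()
    for text in label_texts:
        for line in text.splitlines():
            parts = line.split()
            if parts:
                counts[int(parts[0])] += 1
    return counts


def imbalance_ratio(counts):
    """Ratio between the most and least frequent class (1.0 means balanced)."""
    if not counts:
        return 1.0
    return max(counts.values()) / min(counts.values())


labels = [
    "0 0.5 0.5 0.2 0.2\n0 0.3 0.3 0.1 0.1\n",
    "0 0.4 0.4 0.2 0.2\n1 0.6 0.6 0.1 0.1\n",
    "",  # background image with no objects
]
counts = class_instance_counts(labels)
print(dict(counts), imbalance_ratio(counts))
```

Run it separately on the train and val label folders: very different class proportions between the two are a sign of the distribution shift described in the last bullet.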

Inference and Deployment Issues

Q6: Inference results incorrect after ONNX export?

Key configuration considerations for ONNX export:

python
from ultralytics import YOLO

model = YOLO("best.pt")  # placeholder path to your trained weights
model.export(
    format="onnx",
    simplify=True,       # Simplify computation graph, remove redundant ops
    opset=17,            # Recommended 17 or 18, compatible with ONNX Runtime 1.14+
    dynamic=True,        # Enable dynamic batch and input size
    imgsz=640,           # Should match the training size
)

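Common causes: 1) opset too low (<15), so required operators are missing; 2) simplify=False leaves redundant ops that some runtimes handle poorly; 3) an input size or preprocessing mismatch (resize, letterbox padding, channel order, 0–1 scaling) between training and inference. Validate with ONNX Runtime: python -c "import onnxruntime as ort; sess=ort.InferenceSession('model.onnx'); print(sess.get_inputs()[0].shape)". Note that the ONNX output layout depends on the model. End-to-end (NMS-free) models such as YOLOv10 and YOLO26, and exports made with nms=True, return (batch, num_dets, 6) with [x1,y1,x2,y2,conf,cls]. Other detection models such as YOLOv8 and YOLO11 return the raw head output of shape (batch, 4 + num_classes, num_anchors) with center-format boxes, which still needs confidence filtering and NMS.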
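Many "incorrect result" reports come from reading a raw head output as if it were final detections. The sketch below decodes one candidate under the assumption that it is laid out as [cx, cy, w, h, score_0, ..., score_n] in input-pixel units; the function name and threshold are illustrative, and NMS still has to be applied to the surviving boxes.

```python
def decode_candidate(values, conf_threshold=0.25):
    """Convert one raw candidate [cx, cy, w, h, *class_scores] to
    (x1, y1, x2, y2, conf, cls), or None if it is below the threshold."""
    cx, cy, w, h = values[:4]
    scores = values[4:]
    cls = max(range(len(scores)), key=scores.__getitem__)
    conf = scores[cls]
    if conf < conf_threshold:
        return None
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, conf, cls)


print(decode_candidate([320.0, 240.0, 100.0, 50.0, 0.1, 0.8, 0.05]))
```

With a (batch, 4 + num_classes, num_anchors) output, each candidate is a column, so transpose the array before iterating over rows.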

Q7: TensorRT export failed?

TensorRT export failures are typically version mismatch:

| TensorRT Version | Minimum CUDA | Recommended PyTorch |
|---|---|---|
| 8.6 | CUDA 11.6+ | 2.0 / 2.1 |
| 10.0 | CUDA 12.0+ | 2.3+ |
| 10.7 | CUDA 12.4+ | 2.5+ |

YOLO26 has the simplest export due to no NMS: model.export(format="engine", half=True). INT8 quantization requires a calibration dataset: model.export(format="engine", int8=True, data="data.yaml"). TensorRT engines are hardware-locked — different GPU architectures (A100 vs RTX 4090) need separate exports.

Q8: Why is YOLO26 faster for inference?

Optimization at three levels:

  • Architecture: Removed DFL module reduces head computation by ~50%; native no-NMS eliminates post-processing time (typically 5–15% of inference latency)
  • Operators: Extensive use of RepVGG-style reparameterized convolutions, equivalent to plain 3×3 convolutions at inference, more efficient than v8’s multi-branch structure
  • Instruction sets: Core operators deeply optimized with AVX2/AVX512 and NEON, achieving 43% CPU speedup. On RTX 4090: YOLO26n (FP16) reaches 2.5ms, YOLO26s ~4.0ms

Recommendations: edge devices → YOLO26n; high-throughput API → YOLO26s (INT8); maximum accuracy → YOLO26m/l.

ONNX Export Troubleshooting

Q16: ONNX export failed: Unsupported op?

Incorrect opset version selection is the most common cause:

  • opset=16: Supports most CNN operators, works with DFL-based YOLO versions
  • opset=17: Recommended — covers all YOLO26 operators, compatible with ONNX Runtime 1.14+
  • opset=18+: Latest operator set, but some inference frameworks (older OpenVINO, Triton) may not support it
  • Safe default: model.export(format="onnx", opset=17)
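  • If export still fails, upgrade PyTorch and the onnx package, since operator coverage depends on the exporter version. For variable input shapes use model.export(format="onnx", opset=17, dynamic=True); dynamic_axes is a torch.onnx.export argument and is not accepted by model.export()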

Q17: Dynamic axes configuration issues?

python
# Dynamic batch + dynamic dimensions (output contains -1 axes)
model.export(format="onnx", dynamic=True, imgsz=[640, 480])
# Fixed size (better inference framework compatibility)
model.export(format="onnx", dynamic=False, imgsz=640)

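dynamic=True produces variable (-1) batch, height, and width axes that some frameworks (older OpenVINO) cannot handle; use a fixed-size export for those. A list such as imgsz=[640, 480] sets the (height, width) used to trace the model. It is dynamic=True, not the list, that allows other sizes at inference time.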

Q18: Exported ONNX model validation failed?

Standard validation workflow:

bash
# 1. Model structure check
python -c "import onnx; m=onnx.load('model.onnx'); onnx.checker.check_model(m); print('ONNX check OK')"
# 2. ONNX Runtime inference test
python -c "
import onnxruntime as ort, numpy as np
sess = ort.InferenceSession('model.onnx')
inp = np.random.randn(1,3,640,640).astype(np.float32)
out = sess.run(None, {sess.get_inputs()[0].name: inp})
print(f'Output shape: {out[0].shape}, mean: {out[0].mean():.4f}')
"
# 3. Compare with PyTorch output (numerical error should be < 1e-3)

If shape inference fails (output dimensions are 0 or -1), retry with model.export(format="onnx", simplify=False).
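Step 3 of the workflow, comparing ONNX and PyTorch outputs, comes down to a maximum absolute difference. The following is a dependency-free sketch, assuming both outputs have been converted to nested Python lists (for example with .tolist()); the function names are illustrative.

```python
def flatten(values):
    """Yield scalars from arbitrarily nested lists or tuples."""
    for item in values:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two nested sequences."""
    ref = list(flatten(reference))
    cand = list(flatten(candidate))
    if len(ref) != len(cand):
        raise ValueError(f"size mismatch: {len(ref)} vs {len(cand)}")
    return max((abs(a - b) for a, b in zip(ref, cand)), default=0.0)


pytorch_out = [[1.0, 2.0], [3.0, 4.0]]
onnx_out = [[1.0, 2.0005], [3.0, 3.9999]]
print(max_abs_diff(pytorch_out, onnx_out) < 1e-3)
```

Feed both models the same preprocessed input. A size mismatch usually means the two outputs use different layouts rather than different values.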

Deployment Environment Specific Issues

Q19: Rust ort crate CUDA compilation error?

toml
# Cargo.toml
[dependencies]
ort = { version = "2.0", features = ["cuda"] }

Ensure CUDA_PATH points to the correct CUDA installation and cudart shared libraries are in PATH (Windows) or LD_LIBRARY_PATH (Linux). The 'ort-sys' build failed error typically means only the NVIDIA driver is installed without the CUDA Toolkit. Explicitly configure the CUDA execution provider in Rust:

rust
let session = ort::SessionBuilder::new()?
    .with_execution_providers([CUDAExecutionProvider::default().build()])?
    .commit_from_file("model.onnx")?;

Q20: Go CGO cross-compilation for ARM?

bash
CGO_ENABLED=1 \
GOOS=linux GOARCH=arm64 \
CC=aarch64-linux-gnu-gcc \
CGO_LDFLAGS="-L/path/to/arm64/onnxruntime/lib -lonnxruntime" \
go build -o app main.go

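The CGO_LDFLAGS allowed error occurs because Go’s security policy blocks custom linker flags that are not on its allow-list. Set CGO_LDFLAGS_ALLOW=".*" to bypass the check, or use a narrower regular expression that matches only the flags you need. For ARM platforms, download the prebuilt Linux aarch64 archive from the ONNX Runtime release page (v1.17+) to avoid compiling from source.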

Q21: ONNX Runtime version and CUDA compatibility?

| ONNX Runtime | Minimum CUDA | cuDNN | Notes |
|---|---|---|---|
| 1.16.x | 11.8 | 8.6 | Stable, general purpose |
| 1.17.x | 11.8 / 12.x | 8.7 | CUDA 12 support |
| 1.18.x | 12.2 | 8.9 | CUDA 11 dropped |
| 1.19.x | 12.4 | 9.0+ | Latest features |

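Mismatched versions cause session creation to fail with session creation failed: Error in Bind. Check which execution providers the installed build exposes with python -c "import onnxruntime as ort; print(ort.get_available_providers())"; CUDAExecutionProvider must appear in the list.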
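The provider check can be wrapped in a small helper for startup logging. This is a minimal sketch, assuming onnxruntime may or may not be installed; the helper name is made up for this example.

```python
def cuda_provider_status():
    """Summarize whether the installed ONNX Runtime build exposes the CUDA provider."""
    try:
        import onnxruntime as ort
    except ImportError:
        return "onnxruntime is not installed"
    providers = ort.get_available_providers()
    if "CUDAExecutionProvider" in providers:
        return f"onnxruntime {ort.__version__}: CUDA provider available"
    return f"onnxruntime {ort.__version__}: CUDA provider missing, found {providers}"


print(cuda_provider_status())
```

A provider being listed only means the build includes it. The CUDA and cuDNN libraries are loaded when a session is created, so a version mismatch can still surface at that point.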

Q22: GPU not available in Docker container?

dockerfile
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
# No GPU packages are needed inside the image: nvidia-container-toolkit is installed on the host

The host needs NVIDIA drivers + nvidia-container-toolkit. Run with --gpus all flag. If nvidia-smi is invisible inside the container, check /etc/docker/daemon.json for nvidia runtime configuration. Docker Compose setup:

yaml
services:
  yolo:
    image: yolo-service
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Q23: Windows vs Linux path issues?

python
# Windows ❌
data_path = "data\\images\\train"
# Windows ✅ — forward slashes work cross-platform
data_path = "data/images/train"
# Cross-platform solution
import pathlib
data_path = pathlib.Path("data") / "images" / "train"

Always use POSIX forward slashes in YOLO configuration files for cross-platform compatibility. The same applies to model weight file paths. On Windows, paths longer than 260 characters require enabling long path support in the registry or group policy.
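Paths that were already written with backslashes can be normalized before they go into a configuration file. The following is a minimal sketch using the standard library; the helper name to_posix is illustrative.

```python
from pathlib import PureWindowsPath


def to_posix(path_text):
    """Convert a Windows-style or mixed-separator path to forward slashes."""
    # PureWindowsPath accepts both separators and never touches the filesystem,
    # so this behaves the same on Windows, Linux, and macOS.
    return PureWindowsPath(path_text).as_posix()


print(to_posix(r"data\images\train"))
print(to_posix("data/images/train"))
```

Both calls print data/images/train. Drive letters are preserved, so an absolute Windows path still only resolves on Windows.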

Q9: Annotation file format errors?

YOLO annotation format is strict. Common mistakes:

  • Coordinate normalization: Each line is <class_id> <x_center> <y_center> <width> <height>, and the four box values must be in the 0–1 range (pixel values divided by image width or height). These are normalized coordinates, not pixel coordinates
  • File structure: One .txt file per image, one object per line. An image with no objects is treated as background; either omit its label file or keep an empty .txt
  • Format validation: Check the label scan summary printed at the start of training, which reports background and corrupt files
  • Coordinate type: YOLO uses normalized xywh (center + dimensions). COCO stores [x_min, y_min, width, height] in pixels, and Pascal VOC stores xyxy (top-left + bottom-right) in pixels. Confirm the conversion when migrating formats

Use LabelImg or Label Studio for YOLO format export to avoid manual editing errors.
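When a conversion does have to be scripted, the arithmetic is short. This sketch assumes COCO-style boxes stored as [x_min, y_min, width, height] in pixels; the function name is illustrative.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO pixel box [x_min, y_min, width, height] to
    normalized YOLO (x_center, y_center, width, height)."""
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )


x, y, w, h = coco_to_yolo([160, 120, 320, 240], image_width=640, image_height=480)
print(f"0 {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
```

The example box covers the central half of a 640×480 image in each dimension, so all four normalized values come out as 0.5.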

Q10: Class IDs not consecutive?

YOLO requires class IDs to start from 0 and increment consecutively without gaps:

  • Remapping script: Original IDs [0, 1, 4, 7] → map to [0, 1, 2, 3] using a dictionary {0:0, 1:1, 4:2, 7:3} for batch replacement
  • Background class: YOLO has no explicit background class (unlike Faster R-CNN’s class 0). All annotation IDs are foreground. If your dataset has background ID 0, offset all IDs: new_id = old_id - 1
  • Verification: python -c "import yaml; d=yaml.safe_load(open('data.yaml')); print(len(d['names']))" should equal max ID + 1
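The remapping described in the first bullet can be sketched as follows. Function names are illustrative, and reading and writing the .txt files is left to the caller.

```python
def build_id_map(original_ids):
    """Map sorted unique class ids to 0, 1, 2, ... without gaps."""
    return {old: new for new, old in enumerate(sorted(set(original_ids)))}


def remap_label_text(text, id_map):
    """Rewrite the class id at the start of each label line using id_map."""
    remapped = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        parts[0] = str(id_map[int(parts[0])])
        remapped.append(" ".join(parts))
    return "\n".join(remapped)


id_map = build_id_map([0, 1, 4, 7])
print(id_map)
print(remap_label_text("4 0.5 0.5 0.2 0.2\n7 0.1 0.1 0.05 0.05", id_map))
```

Apply the same mapping to the names list in data.yaml, and run the remap only once per file, because applying it twice would corrupt IDs that were already converted.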

Version Selection Issues

Q11: Which version should beginners start with?

  • Recommend YOLOv8: Most tutorials (docs, videos, community), most complete ecosystem (detection, segmentation, pose, classification), best for systematic learning
  • After learning v8, switching to YOLO11/26 is zero-cost — API is 100% compatible, only architecture differs internally
  • Not recommended to start with YOLOv5: v5’s API differs from the Ultralytics framework, requiring a migration later
  • Learning path: YOLOv8 official notebook → custom dataset training → ONNX/TensorRT deployment → switch to YOLO26 for performance

Q12: Which version for industrial deployment?

| Scenario | Recommended | Reason |
|---|---|---|
| Edge devices (Jetson/phone) | YOLO26n | 43% faster CPU, no NMS, simplest deployment |
| Server GPU | YOLO26s/m | Best accuracy-speed balance |
| CPU-only | YOLO26n INT8 | INT8 accuracy loss < 1% mAP |
| Segmentation/pose needed | YOLOv8-seg/pose | v8 covers all task types |
| Legacy maintenance | YOLOv8 | Most mature ecosystem, fewest issues |

Decision tree: Edge → YOLO26n; GPU server → YOLO26s/m; CPU-only → YOLO26n INT8; Multi-task → YOLOv8

Q13: Which version for research exploration?

Different versions represent different technical directions:

  • YOLOv9: PGI (Programmable Gradient Information) + GELAN architecture, first choice for high-precision tasks
  • YOLOv10: No-NMS end-to-end detection, consistent dual assignment strategy, clean architecture for academic analysis
  • YOLOv12: CNN + attention hybrid (Area Attention), exploring CNN-Transformer fusion
  • YOLO26: Latest SOTA, ideal as comparison baseline
  • Run multiple versions simultaneously for research; Ultralytics’ unified API dramatically reduces experiment code complexity

Version Migration Guide

Q24: Migrating from YOLOv5 to Ultralytics (v8/v11/26)?

YOLOv5 uses a separate repository with a different API:

python
# YOLOv5 legacy code
import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
results = model(img)  # img: file path, URL, PIL image, or numpy array
results.pandas().xyxy[0]  # Different result parsing

# Ultralytics new code (v8/v11/26)
from ultralytics import YOLO
model = YOLO("yolo11n.pt")
results = model(img)
results[0].boxes.data  # Unified result interface: rows of [x1, y1, x2, y2, conf, cls]

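Key changes: 1) Result parsing changes from .pandas().xyxy[0] to .boxes.data; 2) Training moves from train.py command-line flags to model.train() keyword arguments (the yolo CLI accepts the same arguments as key=value pairs); 3) The data.yaml dataset configuration format is compatible and reusable. Weights trained in the ultralytics/yolov5 repository do not load in the ultralytics package; retrain from a YOLOv5u checkpoint such as yolov5su.pt or from a newer model.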

Q25: Migrating from Detectron2 to YOLO?

python
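# Detectron2: extensive configuration
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

config_name = "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(config_name))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_name)
predictor = DefaultPredictor(cfg)
outputs = predictor(img)  # img: BGR numpy array
boxes = outputs["instances"].pred_boxes.tensor.cpu().numpy()

# YOLO: three lines
from ultralytics import YOLO
model = YOLO("yolo11n.pt")
results = model(img)
boxes = results[0].boxes.xyxy.cpu().numpy()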

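Key differences: 1) Detectron2’s pred_boxes are absolute XYXY coordinates and map directly to YOLO’s boxes.xyxy; 2) Both frameworks use contiguous class IDs starting at 0 with no background label in the dataset (Detectron2 reserves the index after the last class for background internally), but COCO JSON category IDs usually start at 1 and may have gaps, so remap them when converting; 3) Evaluation metrics are the same (COCO mAP); 4) Dataset conversion from COCO JSON to YOLO TXT uses ultralytics.data.converter.convert_coco.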
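If the conversion is done by hand, the category ID remapping is the step most often forgotten. The sketch below assumes the standard COCO JSON layout, in which categories is a list of objects with id and name fields; the function name is illustrative.

```python
def contiguous_category_map(categories):
    """Map COCO category ids to 0-based contiguous ids.

    Returns (id_map, names), where names[i] is the class name for new id i.
    """
    ordered = sorted(categories, key=lambda category: category["id"])
    id_map = {category["id"]: index for index, category in enumerate(ordered)}
    names = [category["name"] for category in ordered]
    return id_map, names


categories = [
    {"id": 1, "name": "person"},
    {"id": 3, "name": "car"},
    {"id": 2, "name": "bicycle"},
]
id_map, names = contiguous_category_map(categories)
print(id_map, names)
```

The names list goes into data.yaml in this order, and id_map is applied to each annotation's category_id before the label line is written.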

Q26: Migrating from MMDetection to YOLO?

  • Dataset: MMDetection uses COCO JSON → convert to YOLO TXT with the convert_coco function in ultralytics.data.converter
  • Evaluation: MMDetection outputs COCO mAP; YOLO’s val command reports the same COCO-style metrics (mAP50 and mAP50-95), though values can differ slightly from pycocotools
  • Inference output: MMDetection returns DetDataSample objects; YOLO returns concise Results objects
  • Migration strategy: First get training working on MMDetection with COCO format, then convert the dataset with convert_coco, and finally train on YOLO with an equivalent config for accuracy comparison

Q27: How is API backward compatibility?

Ultralytics keeps the core Python API stable across model generations: YOLO(), model.train(), model.predict(), and model.export() work the same way for YOLOv8, YOLO11, and YOLO26 weights, all of which ship in the same ultralytics package. New releases mostly add parameters rather than change existing signatures, but arguments are occasionally renamed or removed, so pin the package version in production and read the release notes before upgrading. For third-party library compatibility, refer to the ONNX Runtime table in Q21 and the TensorRT table in Q7.

Error Code Reference Table

| Error Message | Cause | Solution |
|---|---|---|
| CUDA out of memory. Tried to allocate ... | Insufficient VRAM | Reduce batch size, enable AMP, lower imgsz, use gradient accumulation |
| No module named 'ultralytics' | ultralytics not installed | pip install ultralytics, verify Python 3.8–3.12 |
| ONNX export failed: Couldn't export operator ... | Operator unsupported in current opset | Try opset=17/18, or upgrade PyTorch |
| CGO_LDFLAGS allowed | Go security policy blocks custom linker flags | Set CGO_LDFLAGS_ALLOW=".*" |
| ImportError: libcudart.so... cannot open shared object file | CUDA runtime library missing or version mismatch | Install matching CUDA Toolkit, check LD_LIBRARY_PATH |
| InvalidArgument: ... ORT ... | ONNX Runtime version incompatible with export | Check Q21 compatibility table |
| 'NoneType' object has no attribute 'shape' | Model weights not loaded correctly | Verify .pt path or re-download |
| No such file or directory: 'data.yaml' | Dataset config file path incorrect | Use absolute path or check working directory |
| UserWarning: Class ... has only ... images | Severe class sample shortage | Collect more data or use augmentation |
| Training cannot resume after OOM | Memory leak | Restart process, call torch.cuda.empty_cache(), check DataLoader num_workers |

📖 References and Official Documentation

  1. Ultralytics Official Documentation: https://docs.ultralytics.com/
  2. YOLO26 Release Blog: https://www.ultralytics.com/blog/yolo26
  3. YOLOv9 Paper: https://arxiv.org/abs/2402.13616
  4. YOLOv10 Paper: https://arxiv.org/abs/2405.14458
  5. YOLOv12 Paper: https://arxiv.org/abs/2502.12524
  6. GitHub Repository: https://github.com/ultralytics/ultralytics

🎯 Summary and Next Steps

Key Conclusions:

  1. YOLO26 is the latest version for 2026, optimized for edge computing, first choice for industrial deployment
  2. YOLOv8 recommended for beginners, complete ecosystem, 100% API compatibility with new versions
  3. Ultralytics unified framework is the biggest advantage, learn one version to master all
  4. Version differences mainly in architecture, user-level APIs remain consistent, migration cost is minimal

Next Learning Steps:

  1. Set up development environment according to the steps in this article
  2. First run image/video detection examples with YOLOv8
  3. Try training a simple custom dataset
  4. Learn ONNX/TensorRT model deployment
  5. Gradually explore YOLO26’s edge deployment advantages

Wish you success in your YOLO learning journey! 🚀