YOLO FAQ: Common Problems and Solutions
Environment Installation Issues
Q1: CUDA not available, only using CPU?
First confirm that your NVIDIA driver supports the required CUDA version; a driver that is too old makes CUDA unavailable. Running nvidia-smi prints the driver version and the highest CUDA version that driver supports.
If nvidia-smi shows a CUDA version but PyTorch still uses the CPU, you have most likely installed the CPU-only PyTorch build. Uninstall it and reinstall with the --index-url flag pointing at the wheel index for your CUDA version: https://download.pytorch.org/whl/cu121 for CUDA 12.1, or the same URL with cu118 in place of cu121 for CUDA 11.8. Always use a conda or venv virtual environment to isolate PyTorch versions and avoid system-level conflicts.
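To check which PyTorch build is actually installed, a small script along these lines can help. It is a minimal sketch: the helper names pytorch_index_url and cuda_report are made up for illustration, and the script only reports what it finds rather than fixing anything.

```python
import importlib.util


def pytorch_index_url(cuda_tag: str) -> str:
    """Return the PyTorch wheel index URL for a CUDA tag such as 'cu118' or 'cu121'."""
    return f"https://download.pytorch.org/whl/{cuda_tag}"


def cuda_report() -> dict:
    """Summarize whether the installed PyTorch build can see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.cuda is None on CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(cuda_report())
    print("pip install torch torchvision --index-url", pytorch_index_url("cu118"))
```

If built_with_cuda is None, the CPU-only wheel is installed and a reinstall from the matching index is the likely fix.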
Q2: ultralytics installation failed?
Common causes are an outdated pip, dependency conflicts (numpy/opencv version incompatibility), and network timeouts. Upgrade pip first with python -m pip install --upgrade pip, then retry the installation.
ultralytics supports Python 3.8–3.12, with Python 3.10 recommended. Use conda for isolation: conda create -n yolo python=3.10 -y && conda activate yolo. On ARM macOS, if wheel conflicts arise, try pip install ultralytics --no-deps and install dependencies individually.
Training Related Issues
Q3: Out of Memory (OOM) error during training?
OOM is the most common YOLO training issue. Combine these strategies: reduce the batch size, enable AMP (amp=True), lower imgsz, and use gradient accumulation.
If still OOM, specify device=0 for a specific GPU or limit memory fraction via torch.cuda.set_per_process_memory_fraction(0.8). Monitor peak memory with nvidia-smi -l 1 to see if the bottleneck is data loading or backpropagation. For GPUs with less than 8GB VRAM, use lightweight models like YOLOv8n or YOLO11n with imgsz=416.
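The memory-saving settings can be collected in one place before training. The sketch below is only illustrative: low_memory_overrides is a hypothetical helper, and the VRAM thresholds, batch sizes, and worker count are guesses rather than tuned values. The keys themselves (amp, workers, device, imgsz, batch) are regular Ultralytics training arguments.

```python
def low_memory_overrides(vram_gb: float) -> dict:
    """Pick conservative training arguments for a given amount of VRAM.

    The thresholds and batch sizes are illustrative guesses, not tuned values.
    """
    overrides = {
        "amp": True,    # mixed precision reduces activation memory
        "workers": 4,   # fewer DataLoader workers lowers host RAM pressure
        "device": 0,    # train on the first GPU
    }
    if vram_gb < 8:
        # Small GPUs: lightweight model and reduced resolution
        overrides.update(model="yolov8n.pt", imgsz=416, batch=8)
    elif vram_gb < 16:
        overrides.update(model="yolov8s.pt", imgsz=640, batch=16)
    else:
        overrides.update(model="yolov8m.pt", imgsz=640, batch=32)
    return overrides


# Possible usage (requires ultralytics and a GPU):
#   from ultralytics import YOLO
#   cfg = low_memory_overrides(6)
#   YOLO(cfg.pop("model")).train(data="data.yaml", **cfg)
```

Keeping the choice in a function makes it easy to halve the batch size and retry when a run still hits OOM.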
Q4: Loss doesn’t converge, mAP is very low?
Systematically diagnose from data and training dimensions:
- Dataset validation: Use ultralytics inspector data.yaml to verify annotations. Ensure each image has at least one bounding box and no empty annotation files exist. Visualize augmented batch samples with TensorBoard or W&B to confirm augmentation strength is appropriate
- Learning rate tuning: Default lr0=0.01 may be too high for some datasets. Try lr0=0.001 or enable cos_lr=True cosine annealing. Loss should steadily decrease after the warmup phase (default 3 epochs)
- Training epochs: Small datasets (<1000 images) need 300–500 epochs, large datasets 100–300 epochs. Use patience=50 for early stopping
- Class balance: When sample count varies by >10x across classes, enable cls_pw=1.5 or switch to Focal Loss. Monitor recall for rare classes
- Augmentation intensity: Over-augmentation can prevent convergence. Reduce hsv_h=0.015, degrees=0.0 etc., especially for small datasets
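To see what warmup followed by cosine annealing does to the learning rate, it can help to compute the schedule by hand. This is a generic textbook formulation and not necessarily the exact schedule Ultralytics implements; the function name cosine_lr is hypothetical, and lrf (final learning rate as a fraction of lr0) is borrowed from the Ultralytics argument of the same name.

```python
import math


def cosine_lr(epoch: int, epochs: int, lr0: float = 0.001, lrf: float = 0.01,
              warmup_epochs: int = 3) -> float:
    """Learning rate at `epoch` for linear warmup followed by cosine annealing."""
    if epoch < warmup_epochs:
        # Linear ramp that reaches lr0 at the last warmup epoch
        return lr0 * (epoch + 1) / warmup_epochs
    # Progress through the cosine phase, from 0.0 to 1.0
    t = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    final_lr = lr0 * lrf
    return final_lr + 0.5 * (lr0 - final_lr) * (1 + math.cos(math.pi * t))


if __name__ == "__main__":
    for e in (0, 2, 3, 50, 100):
        print(e, f"{cosine_lr(e, 100):.6f}")
```

If the loss is still rising after the warmup epochs under a schedule like this, the peak lr0 is the first thing to lower.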
Q5: What’s different between YOLO26 and v8 training?
YOLO26 introduces significant simplifications to the training workflow:
- Optimizer: Automatically uses MuSGD (a hybrid of SGD and Muon), no need to set the optimizer parameter. Converges ~15% faster than AdamW
- Loss function: No DFL (Distribution Focal Loss); only classification and regression branches remain, making training more stable with fewer hyperparameters to tune
- Small object detection: STAL (Small-Target-Aware Label Assignment) is applied automatically during training, improving small object AP by 3–5%
- Default hyperparameters: lr0=0.005 (lower than v8), momentum=0.937, weight_decay=0.0005, converging in 300 epochs on COCO
- Training speed: ~20% faster than v8 at the same batch size with lower memory usage
Migration requires only upgrading the ultralytics package — no code changes needed.
Training Failure Diagnosis
Q14: NaN loss during training?
Common causes and fixes for NaN (Not a Number) loss:
- Learning rate too high: Loss jumps to NaN at a specific step. Reduce lr0 to 0.0005 or lower and restart training
- Invalid annotations: Check for negative coordinates, boxes extending beyond image boundaries, or zero-width/height boxes
- Gradient explosion: Enable gradient clipping with model.train(gradient_clip_val=1.0) or reduce batch size to lower gradient variance
- Numerical overflow: Ensure AMP is working correctly; check if logits or asc (anchor selection cost) values are overflowing
- Corrupted data: Confirm no empty, broken, or single-pixel images. Validate with PIL.Image.open() on each file
After fixing, resume from the latest checkpoint with model.train(resume=True).
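The annotation checks listed for Q14 (negative coordinates, boxes outside the image, zero-size boxes) are easy to automate. The following is a minimal sketch for detection labels in normalized xywh format; find_invalid_boxes is a hypothetical helper and the 1e-6 tolerance is an arbitrary choice.

```python
import math


def find_invalid_boxes(label_lines, tol=1e-6):
    """Return (line_number, reason) pairs for YOLO label lines that look invalid."""
    problems = []
    for n, line in enumerate(label_lines, start=1):
        parts = line.split()
        if not parts:
            continue  # blank lines are ignored
        if len(parts) != 5:
            problems.append((n, "expected 5 fields"))
            continue
        try:
            xc, yc, w, h = (float(v) for v in parts[1:])
        except ValueError:
            problems.append((n, "non-numeric value"))
            continue
        if not all(math.isfinite(v) for v in (xc, yc, w, h)):
            problems.append((n, "non-finite value"))
        elif w <= 0 or h <= 0:
            problems.append((n, "zero or negative size"))
        elif min(xc - w / 2, yc - h / 2) < -tol or max(xc + w / 2, yc + h / 2) > 1 + tol:
            problems.append((n, "box extends beyond image"))
    return problems
```

Running it over every label file, for example with Path("labels").rglob("*.txt") and read_text().splitlines(), gives a list of files to repair before restarting training.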
Q15: Dataset issues causing training to stall?
- Insufficient samples: Each class needs at least 100 annotated images. Below 50, the model struggles to generalize. Enable mosaic=1.0 and mixup=0.5 for augmentation
- Extreme class imbalance: When one class accounts for 90%+, the model becomes biased. Pass class_weights or switch to Focal Loss
- Hard example contamination: Blurry or heavily occluded annotations slow convergence. Manually filter or add more similar hard examples
- Validation distribution shift: Ensure train/val sets come from the same distribution. Use stratified sampling (train_test_split(stratify=y)) for class-proportional splits
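A class-proportional split can also be done without scikit-learn. The sketch below assumes each image has been assigned a single label (for detection data, for instance its most frequent class, which is a simplification); stratified_split is a hypothetical helper, not a library function.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, val_fraction=0.2, seed=0):
    """Split items into train/val lists so each class keeps roughly the same share."""
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)
        # Keep at least one validation sample for every class with 2+ items
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

With 90 images of one class and 10 of another, a 20% split yields 18 and 2 validation images respectively, so the rare class is never missing from validation.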
Inference and Deployment Issues
Q6: Inference results incorrect after ONNX export?
The export settings that matter most are opset, simplify, and the input size (imgsz).
Common causes: 1) opset too low (<15), so required operators are missing; 2) simplify=False, which leaves the graph unsimplified for the runtime; 3) an input size that differs from training, which causes interpolation errors. Validate with ONNX Runtime: python -c "import onnxruntime as ort; sess=ort.InferenceSession('model.onnx'); print(sess.get_inputs()[0].shape)". Note that for end-to-end (NMS-free) exports the ONNX output has shape (batch, num_dets, 6) with rows of [x1,y1,x2,y2,conf,cls], which differs from PyTorch’s raw output.
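When the output really is a (num_dets, 6) array per image, turning it into usable detections takes only a few lines. This is a sketch under that assumption; decode_detections and the 0.25 threshold are illustrative choices, and rows may be a NumPy array or a plain list of lists.

```python
def decode_detections(rows, conf_threshold=0.25):
    """Convert rows of [x1, y1, x2, y2, conf, cls] into dictionaries above a threshold."""
    detections = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf < conf_threshold:
            continue
        detections.append({
            "box": (float(x1), float(y1), float(x2), float(y2)),
            "confidence": float(conf),
            "class_id": int(cls),
        })
    # Highest-confidence detections first
    detections.sort(key=lambda d: d["confidence"], reverse=True)
    return detections
```

For a batch output named out with shape (batch, num_dets, 6), decode_detections(out[0]) handles the first image. The boxes are in the coordinate space of the resized network input, so they still need to be scaled back to the original image.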
Q7: TensorRT export failed?
TensorRT export failures are typically version mismatch:
| TensorRT Version | Minimum CUDA | Recommended PyTorch |
|---|---|---|
| 8.6 | CUDA 11.6+ | 2.0 / 2.1 |
| 10.0 | CUDA 12.0+ | 2.3+ |
| 10.7 | CUDA 12.4+ | 2.5+ |
YOLO26 has the simplest export due to no NMS: model.export(format="engine", half=True). INT8 quantization requires a calibration dataset: model.export(format="engine", int8=True, data="data.yaml"). TensorRT engines are hardware-locked — different GPU architectures (A100 vs RTX 4090) need separate exports.
Q8: Why is YOLO26 faster for inference?
Optimization at three levels:
- Architecture: Removed DFL module reduces head computation by ~50%; native no-NMS eliminates post-processing time (typically 5–15% of inference latency)
- Operators: Extensive use of RepVGG-style reparameterized convolutions, equivalent to plain 3×3 convolutions at inference, more efficient than v8’s multi-branch structure
- Instruction sets: Core operators deeply optimized with AVX2/AVX512 and NEON, achieving 43% CPU speedup. On RTX 4090: YOLO26n (FP16) reaches 2.5ms, YOLO26s ~4.0ms
Recommendations: edge devices → YOLO26n; high-throughput API → YOLO26s (INT8); maximum accuracy → YOLO26m/l.
ONNX Export Troubleshooting
Q16: ONNX export failed: Unsupported op?
Incorrect opset version selection is the most common cause:
- opset=16: Supports most CNN operators, works with DFL-based YOLO versions
- opset=17: Recommended; covers all YOLO26 operators, compatible with ONNX Runtime 1.14+
- opset=18+: Latest operator set, but some inference frameworks (older OpenVINO, Triton) may not support it
- Safe default: model.export(format="onnx", opset=17)
- If the failure involves variable input shapes, export with dynamic=True. The dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}} parameter is the equivalent setting when calling torch.onnx.export directly
Q17: Dynamic axes configuration issues?
dynamic=True produces inputs with -1 (symbolic) dimensions that some frameworks (older OpenVINO) cannot handle; use a fixed-size export for those. To export a fixed non-square input, pass imgsz as a list in (height, width) order, for example [640, 480].
Q18: Exported ONNX model validation failed?
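Standard validation workflow: run onnx.checker.check_model on the exported file, load it with ONNX Runtime, and confirm the input and output shapes.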
If shape inference fails (output dimensions are 0 or -1), retry with model.export(format="onnx", simplify=False).
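One possible validation script is sketched below. It assumes the onnx, onnxruntime, and numpy packages are installed and that the model takes a single float32 input with a fixed shape; validate_onnx and has_dynamic_dims are hypothetical helper names.

```python
def has_dynamic_dims(shape):
    """True if any dimension is symbolic (a string), unknown (None), or non-positive."""
    return any(not isinstance(d, int) or d <= 0 for d in shape)


def validate_onnx(path):
    """Check an exported model and run one dummy inference with ONNX Runtime."""
    import numpy as np
    import onnx
    import onnxruntime as ort

    onnx.checker.check_model(onnx.load(path))  # raises if the graph is malformed

    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    inp = session.get_inputs()[0]
    if has_dynamic_dims(inp.shape):
        raise ValueError(f"input shape {inp.shape} is dynamic; pass a concrete shape")

    # Assumption: the model expects float32 input (not true for half=True exports)
    dummy = np.zeros(inp.shape, dtype=np.float32)
    outputs = session.run(None, {inp.name: dummy})
    return [o.shape for o in outputs]
```

Calling validate_onnx("model.onnx") returns the output shapes, which can then be compared with what the deployment code expects.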
Deployment Environment Specific Issues
Q19: Rust ort crate CUDA compilation error?
Ensure CUDA_PATH points to the correct CUDA installation and that the cudart shared libraries are on PATH (Windows) or LD_LIBRARY_PATH (Linux). The 'ort-sys' build failed error typically means that only the NVIDIA driver is installed, without the CUDA Toolkit. Also register the CUDA execution provider explicitly when building the session in Rust.
Q20: Go CGO cross-compilation for ARM?
The CGO_LDFLAGS allowed error occurs because Go’s security policy blocks custom linker flags. Set CGO_LDFLAGS_ALLOW=".*" to bypass. For ARM platforms, use the onnxruntime-arm64 package (v1.17+) to avoid compiling from source.
Q21: ONNX Runtime version and CUDA compatibility?
| ONNX Runtime | Minimum CUDA | cuDNN | Notes |
|---|---|---|---|
| 1.16.x | 11.8 | 8.6 | Stable, general purpose |
| 1.17.x | 11.8 / 12.x | 8.7 | CUDA 12 support |
| 1.18.x | 12.2 | 8.9 | CUDA 11 dropped |
| 1.19.x | 12.4 | 9.0+ | Latest features |
Mismatched versions cause errors such as session creation failed: Error in Bind. Check which execution providers are actually usable with python -c "import onnxruntime as ort; print(ort.get_available_providers())".
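The compatibility table can be turned into a small lookup so that a deployment script fails early with a readable message. The values below are copied from the table in this article and are not queried from ONNX Runtime; ort_requirements and installed_ort_report are hypothetical helpers.

```python
import importlib.util

# Rows copied from the table above: ORT minor version -> (minimum CUDA, cuDNN)
ORT_COMPAT = {
    "1.16": ("11.8", "8.6"),
    "1.17": ("11.8 / 12.x", "8.7"),
    "1.18": ("12.2", "8.9"),
    "1.19": ("12.4", "9.0+"),
}


def ort_requirements(ort_version: str):
    """Return (minimum CUDA, cuDNN) for an ONNX Runtime version string, or None."""
    minor = ".".join(ort_version.split(".")[:2])
    return ORT_COMPAT.get(minor)


def installed_ort_report():
    """Describe the installed ONNX Runtime, or return None if it is not installed."""
    if importlib.util.find_spec("onnxruntime") is None:
        return None
    import onnxruntime as ort

    return {
        "version": ort.__version__,
        "providers": ort.get_available_providers(),
        "requirements": ort_requirements(ort.__version__),
    }
```

If "CUDAExecutionProvider" is missing from the reported providers, the session silently falls back to the CPU, which is often the first visible symptom of a version mismatch.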
Q22: GPU not available in Docker container?
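The host needs NVIDIA drivers plus nvidia-container-toolkit. Run the container with the --gpus all flag. If the GPU is not visible to nvidia-smi inside the container, check /etc/docker/daemon.json for the nvidia runtime configuration. In Docker Compose, request the GPU through a device reservation (driver: nvidia, capabilities: [gpu]) under the service's deploy.resources.reservations.devices key.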
Q23: Windows vs Linux path issues?
Always use POSIX forward slashes in YOLO configuration files for cross-platform compatibility. The same applies to model weight file paths. On Windows, paths longer than 260 characters require enabling long path support in the registry or group policy.
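Converting existing Windows paths to forward slashes can be done with the standard library. This is a small sketch; to_posix is a hypothetical helper name, while PureWindowsPath.as_posix() is the standard pathlib call doing the work.

```python
from pathlib import PureWindowsPath


def to_posix(path_str: str) -> str:
    """Rewrite a Windows-style path with forward slashes for use in data.yaml."""
    return PureWindowsPath(path_str).as_posix()


if __name__ == "__main__":
    print(to_posix(r"C:\datasets\coco\images"))
```

Because PureWindowsPath does no filesystem access, the same helper works when the configuration is prepared on Linux or macOS.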
Dataset Related Issues
Q9: Annotation file format errors?
YOLO annotation format is strict. Common mistakes:
- Coordinate normalization: Each line is <class_id> <x_center> <y_center> <width> <height>, and all four coordinate values must be in the 0–1 range (divided by image width/height). These are normalized coordinates, not pixel coordinates
- File structure: One .txt file per image (keep an empty file for no-object images to avoid warnings). One object per line
- Format validation: Use ultralytics inspector data.yaml to verify integrity
- Coordinate type: YOLO uses normalized xywh (center + dimensions), different from COCO’s pixel [x_min, y_min, width, height] (top-left + dimensions) and from xyxy (top-left + bottom-right) formats such as Pascal VOC. Confirm conversion when migrating formats
- Empty annotations: Keep an empty .txt for truly empty images, set forbid_empty=False in data.yaml
Use LabelImg or Label Studio for YOLO format export to avoid manual editing errors.
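When a conversion has to be scripted, the arithmetic is short. The sketch below converts a pixel box given as [x_min, y_min, width, height] (the layout COCO JSON uses) into a normalized YOLO label line; both function names are hypothetical.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a pixel box [x_min, y_min, width, height] to normalized YOLO xywh."""
    x_min, y_min, w, h = bbox
    x_center = (x_min + w / 2) / img_w
    y_center = (y_min + h / 2) / img_h
    return (x_center, y_center, w / img_w, h / img_h)


def yolo_label_line(class_id, bbox, img_w, img_h):
    """Format one annotation as '<class_id> <x_center> <y_center> <width> <height>'."""
    values = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


if __name__ == "__main__":
    # A 200x240 box whose top-left corner is at (100, 120) in a 640x480 image
    print(yolo_label_line(0, [100, 120, 200, 240], 640, 480))
```

Every value in the result should fall between 0 and 1; anything outside that range points to a wrong image size or a box that was already normalized.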
Q10: Class IDs not consecutive?
YOLO requires class IDs to start from 0 and increment consecutively without gaps:
- Remapping script: Original IDs [0, 1, 4, 7] → map to [0, 1, 2, 3] using a dictionary {0:0, 1:1, 4:2, 7:3} for batch replacement
- Background class: YOLO has no explicit background class (unlike Faster R-CNN’s class 0). All annotation IDs are foreground. If your dataset has background ID 0, offset all IDs: new_id = old_id - 1
- Verification: python -c "import yaml; d=yaml.safe_load(open('data.yaml')); print(len(d['names']))" should equal max ID + 1
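The remapping step can be sketched as follows. Both helpers are hypothetical names; the mapping is built from whatever IDs actually occur, so it reproduces the {0:0, 1:1, 4:2, 7:3} dictionary for the example IDs.

```python
def build_id_map(class_ids):
    """Map arbitrary class IDs to consecutive IDs starting at 0, preserving order."""
    return {old: new for new, old in enumerate(sorted(set(class_ids)))}


def remap_label_lines(lines, id_map):
    """Rewrite the leading class ID of each YOLO label line using id_map."""
    remapped = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        parts[0] = str(id_map[int(parts[0])])
        remapped.append(" ".join(parts))
    return remapped
```

Remember to reorder the names list in data.yaml to match the new IDs, otherwise predictions will carry the wrong class names.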
Version Selection Issues
Q11: Which version should beginners start with?
- Recommend YOLOv8: Most tutorials (docs, videos, community), most complete ecosystem (detection, segmentation, pose, classification), best for systematic learning
- After learning v8, switching to YOLO11/26 is zero-cost — API is 100% compatible, only architecture differs internally
- Not recommended to start with YOLOv5: v5’s API differs from the Ultralytics framework, requiring a migration later
- Learning path: YOLOv8 official notebook → custom dataset training → ONNX/TensorRT deployment → switch to YOLO26 for performance
Q12: Which version for industrial deployment?
| Scenario | Recommended | Reason |
|---|---|---|
| Edge devices (Jetson/phone) | YOLO26n | 43% faster CPU, no NMS, simplest deployment |
| Server GPU | YOLO26s/m | Best accuracy-speed balance |
| CPU-only | YOLO26n INT8 | INT8 accuracy loss < 1% mAP |
| Segmentation/pose needed | YOLOv8-seg/pose | v8 covers all task types |
| Legacy maintenance | YOLOv8 | Most mature ecosystem, fewest issues |
Decision tree: Edge → YOLO26n; GPU server → YOLO26s/m; CPU-only → YOLO26n INT8; Multi-task → YOLOv8
Q13: Which version for research exploration?
Different versions represent different technical directions:
- YOLOv9: PGI (Programmable Gradient Information) + GELAN architecture, first choice for high-precision tasks
- YOLOv10: No-NMS end-to-end detection, consistent dual assignment strategy, clean architecture for academic analysis
- YOLOv12: CNN + attention hybrid (Area Attention), exploring CNN-Transformer fusion
- YOLO26: Latest SOTA, ideal as comparison baseline
- Run multiple versions simultaneously for research; Ultralytics’ unified API dramatically reduces experiment code complexity
Version Migration Guide
Q24: Migrating from YOLOv5 to Ultralytics (v8/v11/26)?
YOLOv5 uses a separate repository with a different API.
Key changes: 1) Result parsing changes from .pandas().xyxy[0] to .boxes.data; 2) Training parameters move from CLI args to Python dictionaries; 3) data.yaml data configuration format is compatible and reusable. Use ultralytics migrate command for automatic legacy weight conversion.
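Code that used to consume the YOLOv5 pandas output can be kept working with a thin adapter. The sketch below assumes rows come from results[0].boxes.data.tolist() (columns x1, y1, x2, y2, conf, cls) and names from results[0].names; boxes_to_records is a hypothetical helper that mimics the column names of the old .pandas().xyxy[0] table.

```python
def boxes_to_records(rows, names):
    """Turn rows of [x1, y1, x2, y2, conf, cls] into YOLOv5-style record dicts."""
    records = []
    for x1, y1, x2, y2, conf, cls in rows:
        class_id = int(cls)
        records.append({
            "xmin": x1, "ymin": y1, "xmax": x2, "ymax": y2,
            "confidence": conf, "class": class_id, "name": names[class_id],
        })
    return records


# Possible usage (requires ultralytics):
#   from ultralytics import YOLO
#   results = YOLO("yolov8n.pt")("image.jpg")
#   records = boxes_to_records(results[0].boxes.data.tolist(), results[0].names)
```

Passing the records to pandas.DataFrame(records) recreates a table with the same columns the legacy code expected.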
Q25: Migrating from Detectron2 to YOLO?
Key differences: 1) Detectron2’s XYXY coordinates map directly to YOLO’s boxes.xyxy; 2) YOLO class IDs start at 0 with no background class, whereas the COCO-style category IDs commonly used with Detectron2 start at 1, so remap them during conversion; 3) Evaluation metrics are the same (COCO mAP); 4) Dataset conversion from COCO JSON to YOLO TXT uses convert_coco from ultralytics.data.converter.
Q26: Migrating from MMDetection to YOLO?
- Dataset: MMDetection uses COCO JSON → convert to YOLO TXT with convert_coco from ultralytics.data.converter
- Evaluation: MMDetection outputs COCO mAP; YOLO’s val command reports the same COCO-style metrics
- Inference output: MMDetection returns DetDataSample objects; YOLO returns concise Results objects
- Migration strategy: First get training working on MMDetection with COCO format, then convert the annotations with convert_coco, and finally train on YOLO with an equivalent config for accuracy comparison
Q27: How is API backward compatibility?
Ultralytics follows semantic versioning: v8.x → v8.y (same major version) is 100% API compatible; v8 → v11 → v26 core APIs are stable (YOLO(), model.train(), model.predict(), model.export() interfaces unchanged). New versions only add parameters rather than modifying existing signatures. For third-party library compatibility, refer to the ONNX Runtime table in Q21 and the TensorRT table in Q7.
Error Code Reference Table
| Error Message | Cause | Solution |
|---|---|---|
| CUDA out of memory. Tried to allocate ... | Insufficient VRAM | Reduce batch size, enable AMP, lower imgsz, use gradient accumulation |
| No module named 'ultralytics' | ultralytics not installed | pip install ultralytics, verify Python 3.8–3.12 |
| ONNX export failed: Couldn't export operator ... | Operator unsupported in current opset | Try opset=17/18, or upgrade PyTorch |
| CGO_LDFLAGS allowed | Go security policy blocks custom linker flags | Set CGO_LDFLAGS_ALLOW=".*" |
| ImportError: libcudart.so... cannot open shared object file | CUDA runtime library missing or version mismatch | Install matching CUDA Toolkit, check LD_LIBRARY_PATH |
| InvalidArgument: ... ORT ... | ONNX Runtime version incompatible with export | Check Q21 compatibility table |
| 'NoneType' object has no attribute 'shape' | Model weights not loaded correctly | Verify .pt path or re-download |
| No such file or directory: 'data.yaml' | Dataset config file path incorrect | Use absolute path or check working directory |
| UserWarning: Class ... has only ... images | Severe class sample shortage | Collect more data or use augmentation |
| Training cannot resume after OOM | Memory leak | Restart process, call torch.cuda.empty_cache(), check DataLoader num_workers |
📖 References and Official Documentation
- Ultralytics Official Documentation: https://docs.ultralytics.com/
- YOLO26 Release Blog: https://www.ultralytics.com/blog/yolo26
- YOLOv9 Paper: https://arxiv.org/abs/2402.13616
- YOLOv10 Paper: https://arxiv.org/abs/2405.14458
- YOLOv12 Paper: https://arxiv.org/abs/2502.12524
- GitHub Repository: https://github.com/ultralytics/ultralytics
🎯 Summary and Next Steps
Key Conclusions:
- YOLO26 is the latest version for 2026, optimized for edge computing, first choice for industrial deployment
- YOLOv8 recommended for beginners, complete ecosystem, 100% API compatibility with new versions
- Ultralytics unified framework is the biggest advantage, learn one version to master all
- Version differences mainly in architecture, user-level APIs remain consistent, migration cost is minimal
Next Learning Steps:
- Set up development environment according to the steps in this article
- First run image/video detection examples with YOLOv8
- Try training a simple custom dataset
- Learn ONNX/TensorRT model deployment
- Gradually explore YOLO26’s edge deployment advantages
Wish you success in your YOLO learning journey! 🚀