YOLO Quick Start: Model Loading and Inference

10 min read
Model Loading and Inference Across Versions Ultralytics Unified API (Works with v8/11/26) python 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 from ultralytics import YOLO # ========== YOLOv8 ========== model_v8 = YOLO("yolov8n.pt") # nano model_v8 = YOLO("yolov8s.pt") # small model_v8 = YOLO("yolov8m.pt") # medium model_v8 = YOLO("yolov8l.pt") # large model_v8 = YOLO("yolov8x.pt") # extra large # ========== YOLO11 ========== model_11 = YOLO("yolo11n.pt") # nano model_11 = YOLO("yolo11s.pt") # small model_11 = YOLO("yolo11m.pt") # medium model_11 = YOLO("yolo11l.pt") # large model_11 = YOLO("yolo11x.pt") # extra large # ========== YOLO26 (2026 latest) ========== model_26 = YOLO("yolo26n.pt") # nano recommended for edge deployment model_26 = YOLO("yolo26s.pt") # small model_26 = YOLO("yolo26m.pt") # medium model_26 = YOLO("yolo26l.pt") # large model_26 = YOLO("yolo26x.pt") # extra large Image Detection Hands-on Example python 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 from ultralytics import YOLO # Load model (YOLO26 example) model = YOLO("yolo26n.pt") # Single image detection results = model("test.jpg", conf=0.25, iou=0.45) # Process results for result in results: boxes = result.boxes # Detection boxes masks = result.masks # Segmentation masks probs = result.probs # Classification probabilities # Print detection results for box in boxes: print(f"Class: {result.names[int(box.cls)]}, " f"Confidence: {box.conf.item():.3f}, " f"Coordinates: {box.xyxy.tolist()[0]}") # Save visualization results result.save("result.jpg") Video Detection Hands-on Example python 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 from ultralytics import YOLO model = YOLO("yolo26n.pt") # Video file detection results = model.predict( source="input.mp4", save=True, # Save result video conf=0.3, show=False, # Whether to display in real-time stream=True # Stream processing to save memory ) # Process frame by frame for result in results: # Custom post-processing logic pass Real-time Camera Detection python 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 from ultralytics import YOLO import cv2 model = YOLO("yolo26n.pt") # Open camera cap = cv2.VideoCapture(0) # 0 is default camera while cap.isOpened(): ret, frame = cap.read() if not ret: break # Inference results = model(frame, verbose=False) # Draw results annotated_frame = results[0].plot() # Display cv2.imshow("YOLO Real-time", annotated_frame) if cv2.waitKey(1) & 0xFF == ord('q'): break cap.release() cv2.destroyAllWindows() Version-specific Code Differences Feature YOLOv8 YOLO11 YOLO26 YOLOv9 YOLOv10 Unified API ✅ ✅ ✅ ❌ Separate repo ❌ Separate repo No NMS ❌ ❌ ✅ ❌ ✅ DFL Module ✅ ✅ ❌ Removed ✅ ✅ MuSGD Optimizer ❌ ❌ ✅ ❌ ❌ Export Compatibility Good Good Best Fair Fair Results Object API Deep Dive The model() or model.predict() call returns a list of Results objects. Each Results object encapsulates all inference outputs for a single image. Understanding its internal structure is essential for downstream processing.
YOLO Computer Vision Python Inference
Continue reading →

ESP32-CAM Monitor: DIY Auto Flash for Dark Scenes

9 min read
Why I built a surveillance camera with ESP32-S3 before, and it worked well. Later, while rummaging through a drawer, I found an AI-Thinker ESP32-CAM development board — that classic board costing about ten bucks with a built-in OV2640 camera and TF card slot. No reason to let it go to waste, so I built another one: ai-thinker-esp32-cam. This time I wrote the firmware from scratch using ESP-IDF again, with similar capabilities to the previous project but with lots of adaptations for the AI-Thinker board. Here’s what it ended up doing:
ESP32-CAM ESP-IDF Camera Embedded Surveillance
Continue reading →

Evolution: Oh My OpenAgent Configuration Iteration Log

8 min read
The previous article covered the initial configuration setup. This one documents the adjustments after two weeks of running: expanding from single vendor to a four-tier model pool, adding fallback chains, hitting the GLM-4.5-air trap of analyzing without writing code. This post covers: fallback strategy design, complete free model pool inventory and analysis, concurrency control configuration, and the decision process for GLM-4.5-air replacement. After the previous article’s initial configuration, I ran it for two weeks — all the issues that needed fixing surfaced.
AI Programming Multi-Model Orchestration
Continue reading →

MiBeeNvr: A Lightweight Home NVR System I Built

10 min read
I have several cameras at home — a few Xiaomi cameras, some DIY ESP32 cameras, and multiple Raspberry Pi CSI cameras. I’d been using cloud storage solutions, but I was never comfortable with them: vendor lock-in, network dependency, and the costs add up. So I decided to build my own NVR system, called MiBeeNvr. Why Build MiBeeNvr To be honest, I was never satisfied with existing cloud storage solutions. Take Xiaomi cameras, for example. By default, you can only view them through the Mi Home app. Recordings are either stored on an SD card (limited capacity, frequent plugging/unplugging) or in the cloud. Cloud storage costs tens of dollars per month, and there’s the privacy concern — you never know when the manufacturer might use your video data for AI training or sell it to third parties. Not to mention vendor lock-in — switching platforms is nearly impossible.
NVR Go RTSP ESP32 Camera Smart Home
Continue reading →

YOLO Getting Started: History, Version Comparison and Environment Setup

7 min read
Learning Path and Version Selection Guide Version Selection Guide Version Release Date Development Team Use Cases Recommendation Index YOLO26 2026.01 Ultralytics Official Edge deployment, CPU inference, industrial applications ⭐⭐⭐⭐⭐ YOLOv8 2023.01 Ultralytics Official Beginner learning, complete ecosystem, general scenarios ⭐⭐⭐⭐⭐ YOLO11 2024.09 Ultralytics Official Efficiency optimization, lightweight deployment ⭐⭐⭐⭐ YOLOv10 2024.05 Tsinghua University Research exploration, NMS-free end-to-end ⭐⭐⭐⭐ YOLOv9 2024.01 National Taiwan University High precision, small object detection ⭐⭐⭐⭐ YOLOv12 2025.02 Buffalo University + Chinese Academy of Sciences Attention mechanism research ⭐⭐⭐ Learning Path Recommendations Beginner Stage (1-2 weeks): Start with YOLOv8, master basic concepts and API usage Intermediate Stage (2-3 weeks): Learn custom dataset training, parameter tuning and optimization Advanced Stage (2-3 weeks): Learn model deployment, engineering implementation Research Stage (ongoing): Explore new features in YOLO11, YOLO26, YOLOv9/v10/v12 Complete YOLO Development History Timeline Version Release Date Core Innovation Milestone Significance YOLOv1 2015.06 Pioneer single-stage detection Foundation for real-time detection YOLOv2 2016.12 Batch Normalization, Anchor Dual improvement in accuracy and speed YOLOv3 2018.04 Multi-scale detection, residual networks Industry standard YOLOv4 2020.04 CSPDarknet, Mosaic Peak of engineering implementation YOLOv5 2020.06 PyTorch framework, user-friendly Highest adoption rate YOLOv7 2022.07 E-ELAN, reparameterization Balance between speed and accuracy YOLOv8 2023.01 C2f, Anchor-Free, unified framework Ultralytics unified ecosystem YOLOv9 2024.01 GELAN, PGI programmable gradient Training efficiency revolution YOLOv10 2024.05 NMS-free, efficiency-precision tradeoff End-to-end detection YOLO11 2024.09 Architecture optimization, parameter reduction Efficiency optimized version YOLOv12 2025.02 Area Attention mechanism Attention architecture YOLO26 2026.01 DFL-free, NMS-free, 43% CPU optimization Edge computing new standard Core Principles and Version Comparison Ultralytics Official Main Line Versions YOLOv8 Core Features:
YOLO Computer Vision Deep Learning Object Detection
Continue reading →

Building a Surveillance Camera with ESP32-S3 — WiFi, TF Card, Video Output Pitfalls

14 min read
Why We have a few parrots at home, and during the workday nobody’s around. I wanted to check in on them anytime. The requirement sounds simple: real-time video streaming, recording to storage, and ideally automatic backup to NAS. Off-the-shelf cameras are either expensive or require installing apps, registering accounts, and binding phone numbers — privacy concerns. I just want to watch my birds, not stream video to someone else’s server.
ESP32-S3 ESP-IDF Camera Embedded
Continue reading →

Replacing VMs with ESP32 for Network Probing — esp32-blackbox Project in Action

7 min read
Why I have several LANs in different locations around the city, roughly 10 km apart. To make these networks talk to each other, I used tools like NetBird, ZeroTier, and Cloudflare Tunnel to set up a cross-region virtual LAN. The network was set up, but how to ensure stability? After all, these tunnels traverse the public internet with varying link quality. The most direct approach is to use Prometheus’s blackbox_exporter for probing — periodic HTTP requests, Pings, DNS queries — feeding results into a time-series database with alert rules, so problems are detected immediately.
ESP32 Network Probing Prometheus NetBird
Continue reading →

Switched My Blog Theme: From Hugo NexT to Self-Written Zhi

8 min read
Why Switch This blog previously used Hugo NexT, forked for custom modifications. NexT itself is a feature-rich theme, but when it comes to “customizing things yourself,” the experience wasn’t great. The issues boiled down to a few things: SCSS nesting hell. 101 SCSS files, three levels of directory nesting. _common/components/post, _common/components/third-party, _common/outline/sidebar… To change a style, you first had to figure out which file it was in, where variables were defined, and which scheme was overriding it. Not that it couldn’t be done, but each change meant half an hour of hunting.
Hugo Theme Development Zhi AI Programming
Continue reading →

Zhipu Coding Plan × Oh My OpenCode: Multi-Model Orchestration Setup Guide

6 min read
Why Bother When it comes to writing code with AI, the gap between single-model and multi-model approaches keeps widening. No matter how strong a single model is, it can’t compete with a team of specialized models working in parallel. Oh My OpenCode (OmO for short) is a multi-model orchestration plugin in the OpenCode ecosystem, with 11 Agents each having distinct responsibilities and 48 Hooks spanning the entire lifecycle. Zhipu’s Coding Plan provides access to the full GLM model series. Combining the two allows you to assign different models by role — strong coders for coding, strong reasoners for reasoning, free models for busywork.
AI Programming Multi-Model Orchestration
Continue reading →

Domestic LLM Resource and Cost Comparison: GLM-5 / Kimi K2.5 / MiniMax M2.7

3 min read
Overview This article compares the resource requirements and usage costs of three major domestic LLMs, helping developers choose the right solution for their scenarios. Model Vendor Architecture Minimum Deployable VRAM API Available GLM-5 Zhipu AI Dense (multiple versions) 24GB (8B) ✅ Kimi K2.5 Moonshot AI MoE (undisclosed) 24GB (lightweight) ✅ MiniMax M2.7 MiniMax MoE 230B Not yet open-sourced ✅ GLM-5 (Zhipu AI) Versions & Hardware Requirements GLM-5 offers 4 parameter versions, making it the widest-coverage domestic LLM currently available.
GLM Kimi MiniMax LLMs
Continue reading →