AI-related

YOLO Deployment in Practice with Golang

May 26, 2026

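Chapter 8: Complete Guide to Using YOLO with Golang

With its high performance, low memory footprint, and native concurrency, Go has become one of the preferred languages for industrial-grade YOLO deployment. This chapter details a complete YOLO implementation approach within the Go ecosystem.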

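Environment setup issues

Q1: CUDA is unavailable and only the CPU is used?

First confirm that the NVIDIA driver version supports the CUDA version you need; a driver that is too old makes CUDA unavailable:

```bash
# Check the driver version (Driver Version must be >= the minimum that supports your CUDA version)
nvidia-smi

# Check the CUDA toolkit version
nvcc --version

# Reinstall PyTorch built for the matching CUDA version
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

If nvidia-smi reports a CUDA version but PyTorch still runs on the CPU, the installed PyTorch is the CPU-only build. Uninstall it and reinstall the CUDA build with --index-url. CUDA 11.8 users should replace cu121 in the URL with cu118. Using a conda or venv virtual environment is recommended to isolate PyTorch builds for different CUDA versions and avoid system-level conflicts.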
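The driver check above can be sketched in a few lines of Python. This is a minimal, conservative sketch: the helper names and the sample nvidia-smi header are hypothetical, and it assumes the "CUDA Version" printed by nvidia-smi is the highest CUDA runtime the driver supports. CUDA minor-version compatibility may allow slightly newer wheels than this check admits.

```python
import re

def parse_smi_cuda_version(smi_text):
    """Return the (major, minor) CUDA version from nvidia-smi's header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def wheel_tag_to_version(tag):
    """Convert a PyTorch wheel tag such as 'cu121' or 'cu118' to (major, minor)."""
    digits = tag[2:]                      # drop the leading "cu"
    return int(digits[:-1]), int(digits[-1])

def driver_supports_wheel(smi_text, tag):
    """True if the driver's maximum CUDA version is at least the wheel's version."""
    driver_cuda = parse_smi_cuda_version(smi_text)
    if driver_cuda is None:
        return False
    return driver_cuda >= wheel_tag_to_version(tag)

# Hypothetical header line, shaped like typical nvidia-smi output
SAMPLE_HEADER = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2 |"

print(driver_supports_wheel(SAMPLE_HEADER, "cu121"))  # driver 12.2 vs wheel 12.1
print(driver_supports_wheel(SAMPLE_HEADER, "cu118"))  # driver 12.2 vs wheel 11.8
```

In practice the header text would come from running nvidia-smi (for example through subprocess) rather than from a hard-coded string.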

Continue reading →

YOLO Deployment: Model Export and Multi-Platform Deployment

May 20, 2026

Model export (17 formats supported)

Ultralytics unified export API

```python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# ========== Export to various formats ==========

# 1. ONNX (cross-platform, general purpose)
model.export(format="onnx", simplify=True, dynamic=True)

# 2. TensorRT (best on NVIDIA GPUs)
model.export(format="engine", half=True, workspace=4)

# 3. OpenVINO (best on Intel CPUs)
model.export(format="openvino", half=True)

# 4. CoreML (Apple devices)
model.export(format="coreml", int8=True)

# 5. TFLite (Android/iOS mobile)
model.export(format="tflite", int8=True)

# 6. NCNN (mobile)
model.export(format="ncnn")

# 7. PaddlePaddle
model.export(format="paddle")
```

Export compatibility by version

| Format | YOLOv8 | YOLO11 | YOLO26 |
|---|---|---|---|
| ONNX | ✅ | ✅ | ✅ Best |
| TensorRT | ✅ | ✅ | ✅ Simpler without NMS |
| OpenVINO | ✅ | ✅ | ✅ |
| TFLite | ✅ | ✅ | ✅ |
| NCNN | ✅ | ✅ | ✅ |

Python deployment in practice

ONNX Runtime deployment

```python
import onnxruntime as ort
import cv2
import numpy as np

# Load the ONNX model (falls back to CPU if the CUDA provider is unavailable)
session = ort.InferenceSession(
    "yolo26n.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

def preprocess(image, imgsz=640):
    """Image preprocessing: resize, BGR -> RGB, HWC -> NCHW, scale to [0, 1]."""
    img = cv2.resize(image, (imgsz, imgsz))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # cv2.imread returns BGR
    img = img.transpose(2, 0, 1) / 255.0
    return img[np.newaxis].astype(np.float32)

# Inference
image = cv2.imread("test.jpg")
input_data = preprocess(image)
outputs = session.run(None, {"images": input_data})

# Note for YOLO26: no NMS post-processing is needed!
# The output is already the final set of detections
```

TensorRT Python deployment

```python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context on import
import numpy as np
import time

# ========== 1. Load the engine and create an execution context ==========
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)

# Engines exported by Ultralytics may carry a metadata header before the
# serialized engine; strip it first if deserialization fails.
with open("yolo26n.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# ========== 2. CUDA memory allocation ==========
stream = cuda.Stream()
bindings = []
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = engine.get_tensor_shape(name)
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    size = trt.volume(shape)
    host_mem = cuda.pagelocked_empty(size, dtype)   # page-locked host memory
    device_mem = cuda.mem_alloc(host_mem.nbytes)    # device memory
    bindings.append({"name": name, "host": host_mem, "device": device_mem,
                     "shape": shape, "size": size, "dtype": dtype})

# ========== 3. Asynchronous inference ==========
# Assumes tensor 0 is the input and tensor 1 is the single output.
def async_infer(input_blob):
    # Copy input: host -> device
    np.copyto(bindings[0]["host"], input_blob.ravel())
    cuda.memcpy_htod_async(bindings[0]["device"], bindings[0]["host"], stream)

    # Set tensor addresses and execute
    context.set_tensor_address(bindings[0]["name"], int(bindings[0]["device"]))
    context.set_tensor_address(bindings[1]["name"], int(bindings[1]["device"]))
    context.execute_async_v3(stream.handle)

    # Copy output: device -> host
    cuda.memcpy_dtoh_async(bindings[1]["host"], bindings[1]["device"], stream)
    stream.synchronize()
    return bindings[1]["host"].copy()

# ========== 4. Performance benchmark ==========
def benchmark(warmup=10, runs=100):
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
    for _ in range(warmup):
        async_infer(dummy)

    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        async_infer(dummy)
        latencies.append((time.perf_counter() - t0) * 1000)

    latencies.sort()
    print(f"TensorRT FP16 | mean: {np.mean(latencies):.1f}ms | "
          f"P50: {latencies[runs//2]:.1f}ms | "
          f"P99: {latencies[int(runs*0.99)]:.1f}ms | "
          f"throughput: {1000/np.mean(latencies):.0f} FPS")

benchmark()
```

OpenVINO deployment and performance benchmarks

```python
import openvino as ov
import cv2
import numpy as np
import time

# ========== 1. ONNX -> OpenVINO conversion ==========
# Ultralytics unified export:
#   model.export(format="openvino", half=True)
# Adjust the path below to the directory your export actually produced.

core = ov.Core()
model = core.read_model("yolo26n_openvino/yolo26n.xml")

# ========== 2. CPU inference ==========
compiled_cpu = core.compile_model(model, device_name="CPU")
infer_request = compiled_cpu.create_infer_request()

def preprocess(image, imgsz=640):
    """Resize, BGR -> RGB, HWC -> NCHW, scale to [0, 1]."""
    img = cv2.resize(image, (imgsz, imgsz))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0

def openvino_infer(image):
    outputs = infer_request.infer({"images": preprocess(image)})
    return outputs[next(iter(outputs))]

# ========== 3. Asynchronous inference pipeline (higher throughput) ==========
def async_pipeline(images, num_requests=4):
    """Multi-request asynchronous inference pipeline."""
    # Compile once and let AsyncInferQueue manage a pool of infer requests
    queue = ov.AsyncInferQueue(compiled_cpu, num_requests)
    results = [None] * len(images)

    def completion_callback(request, userdata):
        # userdata carries the index of the image this request processed
        results[userdata] = request.get_output_tensor().data.copy()

    queue.set_callback(completion_callback)

    for i, img in enumerate(images):
        # Blocks until a request in the pool is free
        queue.start_async({"images": preprocess(img)}, userdata=i)

    queue.wait_all()
    return results

# ========== 4. CPU vs AUTO device benchmark ==========
def benchmark_openvino():
    dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)

    for device in ["CPU", "AUTO"]:
        compiled = core.compile_model(model, device)
        req = compiled.create_infer_request()

        # Warm up (avoids counting first-run kernel compilation)
        for _ in range(20):
            req.infer({"images": dummy})

        times = []
        for _ in range(200):
            t0 = time.perf_counter()
            req.infer({"images": dummy})
            times.append((time.perf_counter() - t0) * 1000)

        times.sort()
        print(f"OpenVINO {device}: "
              f"mean {np.mean(times):.1f}ms | "
              f"P99 {times[int(199*0.99)]:.1f}ms | "
              f"{1000/np.mean(times):.0f} FPS")

benchmark_openvino()
```

NCNN mobile deployment

NCNN is Tencent's open-source inference framework for mobile devices, with ARM NEON and Vulkan GPU acceleration.
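The ONNX Runtime example above stops at the raw `outputs`. As a minimal sketch of what consuming an NMS-free output might look like, the function below filters rows by confidence and maps boxes back to original-image pixels. It assumes each output row is `[x1, y1, x2, y2, score, class_id]` in the resized input frame and that the image was resized without letterboxing (as in the `preprocess` function above); the row layout is an assumption to verify against your exported model, and `decode_detections` is a hypothetical helper name.

```python
def decode_detections(rows, orig_w, orig_h, imgsz=640, conf_thres=0.25):
    """Filter NMS-free output rows and rescale boxes to the original image.

    Assumes rows of [x1, y1, x2, y2, score, class_id] in the imgsz x imgsz
    input frame and a plain (non-letterboxed) resize during preprocessing.
    """
    sx = orig_w / imgsz   # horizontal scale back to the original width
    sy = orig_h / imgsz   # vertical scale back to the original height
    detections = []
    for x1, y1, x2, y2, score, cls_id in rows:
        if score < conf_thres:
            continue      # drop low-confidence rows; no NMS step is applied
        detections.append({
            "box": (x1 * sx, y1 * sy, x2 * sx, y2 * sy),
            "score": score,
            "class_id": int(cls_id),
        })
    return detections

# Two made-up rows: one confident detection and one below the threshold
rows = [
    [64.0, 64.0, 320.0, 320.0, 0.90, 0.0],
    [0.0, 0.0, 10.0, 10.0, 0.10, 2.0],
]
for det in decode_detections(rows, orig_w=1280, orig_h=720):
    print(det)
```

With a NumPy output of shape `(1, N, 6)`, something like `outputs[0][0].tolist()` would produce the `rows` argument.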

Continue reading →

Advanced YOLO Optimization: Lightweighting, Quantization, and Accuracy Improvement

May 17, 2026

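Model lightweighting strategies

Model size selection

| Model | Params (M) | mAP | CPU inference | Use cases |
|---|---|---|---|---|
| YOLO26n | 2.8 | 38.9 | Fastest | Edge devices, embedded |
| YOLO26s | 9.4 | 48.2 | Very fast | Mobile, Web |
| YOLO26m | 21.8 | 53.1 | Medium | Servers, high performance |
| YOLO11n | 2.6 | 39.6 | Fast | Lightweight deployment |
| YOLOv8n | 3.2 | 37.3 | Baseline | General purpose |

Knowledge distillation

```python
from ultralytics import YOLO

# The large model acts as the teacher, the small model as the student
teacher = YOLO("yolo26x.pt")
student = YOLO("yolo26n.yaml")

# Distillation training (described in this post as built into Ultralytics).
# Check that your installed version accepts `distill` and `distill_ratio`;
# Ultralytics rejects training arguments it does not recognize.
student.train(
    data="data.yaml",
    distill="yolo26x.pt",      # teacher model
    distill_ratio=0.5,         # weight of the distillation loss
)
```

Model pruning

Structured vs. unstructured pruning

| Type | Method | Sparsity pattern | Hardware acceleration | Compression ratio |
|---|---|---|---|---|
| Unstructured | Weight pruning | Random sparsity | Difficult (needs specialized hardware) | High |
| Structured | Channel pruning | Regular sparsity | Native acceleration | Medium |

Torch Prune channel pruning example

```python
import torch
import torch.nn.utils.prune as prune
from ultralytics import YOLO

# L1 unstructured pruning of every convolution layer
model = YOLO("yolo26n.pt")
for name, module in model.model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Channel pruning uses the torch-pruning library (1.x API shown here)
# pip install torch-pruning
import torch_pruning as tp

model = YOLO("yolo26n.pt").model
for p in model.parameters():
    p.requires_grad_(True)  # torch-pruning traces the autograd graph

DG = tp.DependencyGraph()
DG.build_dependency(model, example_inputs=torch.randn(1, 3, 640, 640))

# Remove every fifth output channel (about 20%) of one convolution.
# The indices are fixed here for illustration, not ranked by L1 norm;
# the first Conv2d is used because the target must be a conv layer.
conv = next(m for m in model.modules() if isinstance(m, torch.nn.Conv2d))
group = DG.get_pruning_group(
    conv,
    tp.prune_conv_out_channels,                 # prune the output-channel dimension
    idxs=list(range(0, conv.out_channels, 5)),  # channel indices to remove
)
if DG.check_pruning_group(group):  # skip if a dependent layer would lose all channels
    group.prune()
```

Pruning ratio guide

| Model | Safe pruning ratio | Aggressive pruning ratio | mAP loss (safe / aggressive) |
|---|---|---|---|
| YOLO26n | ≤20% | 20-40% | <1% / 2-5% |
| YOLO26s | ≤30% | 30-50% | <1% / 3-6% |
| YOLO26m | ≤40% | 40-60% | <1% / 3-8% |
| YOLOv8n | ≤20% | 20-35% | <1% / 2-4% |

Model pruning and quantization

Quantization at export time

```python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# INT8 quantization (requires calibration data)
model.export(
    format="engine",    # TensorRT
    int8=True,
    data="data.yaml",   # calibration dataset
    batch=8,
)

# ONNX export with dynamic input shapes.
# Note: dynamic=True enables dynamic axes; it does not quantize the model.
model.export(
    format="onnx",
    dynamic=True,
    simplify=True,
)
```

TensorRT INT8 calibration workflow in detail

Preparing the calibration dataset

INT8 quantization requires representative calibration data to determine the dynamic range of activations: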
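To make "determining the dynamic range" concrete, here is a minimal sketch of simple max calibration in plain Python: the largest absolute activation seen in the calibration data is mapped to the INT8 limit of 127. This is an illustration under stated assumptions, not TensorRT's actual calibrator (which by default uses an entropy-based method); the function names and sample values are hypothetical.

```python
def calibrate_scale(calibration_batches, num_bits=8):
    """Max calibration: map the largest observed |activation| to the INT8 limit."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    max_abs = max(abs(v) for batch in calibration_batches for v in batch)
    return max_abs / qmax

def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clip to the symmetric INT8 range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in q_values]

# Made-up calibration activations; the largest magnitude is 63.5
batches = [[-0.5, 0.25, 1.0], [63.5, -12.0, 0.0]]
scale = calibrate_scale(batches)            # 63.5 / 127 = 0.5

q = quantize([10.0, -63.5, 100.0], scale)   # 100.0 lies outside the calibrated range
print(scale, q, dequantize(q, scale))
```

The last value shows why the calibration set must be representative: an activation larger than anything seen during calibration saturates at 127 and is reconstructed as 63.5 instead of 100.0.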

Continue reading →

YOLO Model Training: The Complete Custom Dataset Workflow

May 14, 2026

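Complete training workflow for a custom dataset

Ultralytics unified training code

```python
from ultralytics import YOLO

# Load a model
# model = YOLO("yolov8n.yaml")   # train from scratch
# model = YOLO("yolo11n.pt")     # start from pretrained weights
model = YOLO("yolo26n.pt")       # recommended for 2026, first choice for edge deployment

# Start training
results = model.train(
    # Basic configuration
    data="data.yaml",        # dataset configuration
    epochs=100,              # number of training epochs
    imgsz=640,               # input size
    batch=16,                # batch size
    workers=8,               # number of data-loading workers

    # Optimizer configuration
    optimizer="auto",        # YOLO26 automatically uses MuSGD
    lr0=0.01,                # initial learning rate (may be overridden when optimizer="auto")
    lrf=0.01,                # final learning rate factor (final lr = lr0 * lrf)
    momentum=0.937,          # SGD momentum (may be overridden when optimizer="auto")
    weight_decay=0.0005,     # weight decay

    # Data augmentation
    mosaic=1.0,
    mixup=0.1,
    copy_paste=0.1,

    # Other configuration
    device=0,                # GPU device; use "cpu" for CPU
    project="runs/train",    # output directory
    name="yolo26_exp1",      # experiment name
    exist_ok=False,          # whether to overwrite an existing run
    pretrained=True,         # use pretrained weights
    verbose=True,            # verbose logging
    seed=42,                 # random seed
)

# Validate the model
metrics = model.val()
print(f"mAP50: {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")
```

Training parameter differences by version

| Parameter | YOLOv8 | YOLO11 | YOLO26 |
|---|---|---|---|
| Default optimizer | SGD | SGD | MuSGD |
| DFL loss | ✅ | ✅ | ❌ Removed |
| NMS post-processing | ✅ | ✅ | ❌ Natively NMS-free |
| Small-object optimization | Average | Good | Best (STAL) |
| CPU inference speed | Baseline | +25% | +43% |

Loss functions in detail

The YOLO loss function consists of three parts, each targeting a different learning objective: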
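For the `lr0` and `lrf` arguments in the training call, a short worked example may help show what "final learning rate factor" means. The sketch below assumes the linear decay that Ultralytics applies when cosine scheduling is off, ignores the warmup epochs, and assumes `lr0` is actually used rather than overridden by the automatic optimizer selection; `linear_lr` is a hypothetical helper name.

```python
def linear_lr(epoch, epochs=100, lr0=0.01, lrf=0.01):
    """Learning rate at a given epoch under linear decay from lr0 to lr0 * lrf.

    Assumption: mirrors the linear schedule used when cosine decay is disabled,
    with warmup left out for clarity.
    """
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor

# Learning rate at the start, midpoint, and end of a 100-epoch run
for epoch in (0, 50, 100):
    print(epoch, f"{linear_lr(epoch):.6f}")
```

With `lr0=0.01` and `lrf=0.01` the rate falls from 0.01 to 0.0001, so `lrf` is a multiplier on `lr0`, not an absolute learning rate.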

Continue reading →

Building YOLO Datasets: Annotation Tools and Format Conversion

May 11, 2026

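Using data annotation tools

Installing and using LabelImg

```bash
# Install
pip install labelImg

# Launch
labelImg
```

Annotation workflow:

- Open Dir → select the image folder
- Change Save Dir → select the folder where annotations are saved
- Select the YOLO format
- Create RectBox → draw a box around the target → enter the class name
- Save

Installing and using LabelMe

```bash
pip install labelme
labelme
```

CVAT self-hosted annotation platform

CVAT (Computer Vision Annotation Tool) is a powerful annotation platform open-sourced by Intel. It supports self-hosted Docker deployment and suits team collaboration and large-scale annotation projects.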
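Since the LabelImg workflow above saves annotations in YOLO format, it may help to see what one label line contains. The sketch below converts a pixel-coordinate corner box (as used by Pascal VOC style annotations) into a YOLO line of `class x_center y_center width height`, all normalized to the image size. The function name and sample numbers are illustrative.

```python
def voc_to_yolo(box, img_w, img_h, class_id):
    """Convert (xmin, ymin, xmax, ymax) in pixels to a normalized YOLO label line."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 200x200 px box in a 640x480 image, class 0
print(voc_to_yolo((100, 200, 300, 400), img_w=640, img_h=480, class_id=0))
# → 0 0.312500 0.625000 0.312500 0.416667
```

Each image gets one `.txt` file with one such line per object.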

Continue reading →

YOLO Quick Start in Practice: Model Loading and Inference

May 8, 2026

Model loading and inference by version

Ultralytics unified API (shared by v8/11/26)

```python
from ultralytics import YOLO

# ========== YOLOv8 ==========
model_v8 = YOLO("yolov8n.pt")   # nano
model_v8 = YOLO("yolov8s.pt")   # small
model_v8 = YOLO("yolov8m.pt")   # medium
model_v8 = YOLO("yolov8l.pt")   # large
model_v8 = YOLO("yolov8x.pt")   # extra large

# ========== YOLO11 ==========
model_11 = YOLO("yolo11n.pt")   # nano
model_11 = YOLO("yolo11s.pt")   # small
model_11 = YOLO("yolo11m.pt")   # medium
model_11 = YOLO("yolo11l.pt")   # large
model_11 = YOLO("yolo11x.pt")   # extra large

# ========== YOLO26 (latest in 2026) ==========
model_26 = YOLO("yolo26n.pt")   # nano, recommended for edge deployment
model_26 = YOLO("yolo26s.pt")   # small
model_26 = YOLO("yolo26m.pt")   # medium
model_26 = YOLO("yolo26l.pt")   # large
model_26 = YOLO("yolo26x.pt")   # extra large
```

Image detection in practice

```python
from ultralytics import YOLO

# Load a model (YOLO26 as the example)
model = YOLO("yolo26n.pt")

# Detect on a single image
results = model("test.jpg", conf=0.25, iou=0.45)

# Process the results
for result in results:
    boxes = result.boxes   # detection boxes
    masks = result.masks   # segmentation masks (None for detection models)
    probs = result.probs   # classification probabilities (None for detection models)

    # Print the detections
    for box in boxes:
        print(f"class: {result.names[int(box.cls)]}, "
              f"confidence: {box.conf.item():.3f}, "
              f"box: {box.xyxy.tolist()[0]}")

    # Save the visualized result
    result.save(filename="result.jpg")
```

Video detection in practice

```python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Detect on a video file
results = model.predict(
    source="input.mp4",
    save=True,       # save the annotated video
    conf=0.3,
    show=False,      # whether to display in real time
    stream=True      # stream results as a generator to save memory
)

# Process frame by frame
for result in results:
    # Custom post-processing logic goes here
    pass
```

Real-time webcam detection

```python
from ultralytics import YOLO
import cv2

model = YOLO("yolo26n.pt")

# Open the camera
cap = cv2.VideoCapture(0)  # 0 is the default camera

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Inference
    results = model(frame, verbose=False)

    # Draw the results
    annotated_frame = results[0].plot()

    # Display
    cv2.imshow("YOLO Real-time", annotated_frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Code differences by version

| Feature | YOLOv8 | YOLO11 | YOLO26 | YOLOv9 | YOLOv10 |
|---|---|---|---|---|---|
| Unified API | ✅ | ✅ | ✅ | ❌ Separate repo | ❌ Separate repo |
| NMS-free | ❌ | ❌ | ✅ | ❌ | ✅ |
| DFL module | ✅ | ✅ | ❌ Removed | ✅ | ✅ |
| MuSGD optimizer | ❌ | ❌ | ✅ | ❌ | ❌ |
| Export compatibility | Good | Good | Best | Fair | Fair |

Results object API in detail

model() and model.predict() return a list of Results objects. Each Results object wraps all inference outputs for a single image. Understanding its internal structure is the basis for any further processing.
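As one example of the kind of post-processing a Results object feeds into, the sketch below counts confident detections per class. To stay runnable without a model, it takes plain `(class_id, confidence)` pairs; `count_by_class` and the sample data are hypothetical, and the pairs stand in for values read from `box.cls` and `box.conf` as in the image detection example.

```python
from collections import Counter

def count_by_class(detections, names, min_conf=0.5):
    """Count detections per class name, keeping only those at or above min_conf."""
    counts = Counter()
    for cls_id, conf in detections:
        if conf >= min_conf:
            counts[names[cls_id]] += 1
    return dict(counts)

# Made-up detections: (class_id, confidence)
names = {0: "person", 2: "car"}
detections = [(0, 0.90), (0, 0.40), (2, 0.70), (0, 0.60)]
print(count_by_class(detections, names))
```

With a real result, the pairs could be built as `[(int(b.cls), b.conf.item()) for b in result.boxes]` and `names` taken from `result.names`.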

Continue reading →

Evolution: Oh My OpenAgent Configuration Iteration Log

May 7, 2026

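The previous post covered building the initial configuration. This one records the adjustments made after two weeks of running it: expanding from a single provider to a four-tier model pool, adding a fallback chain, and hitting the pitfall where GLM-4.5-air only analyzes and never writes code. The post covers fallback strategy design, the complete list and analysis of the free model pool, concurrency control configuration, and the decision process behind replacing GLM-4.5-air.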

Continue reading →

YOLO Fundamentals: History, Version Comparison, and Environment Setup

May 5, 2026

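📚 Learning path and version selection advice

Version selection guide

| Version | Released | Team | Use cases | Rating |
|---|---|---|---|---|
| YOLO26 | 2026.01 | Ultralytics (official) | Edge deployment, CPU inference, industrial rollout | ⭐⭐⭐⭐⭐ |
| YOLOv8 | 2023.01 | Ultralytics (official) | Getting started, mature ecosystem, general purpose | ⭐⭐⭐⭐⭐ |
| YOLO11 | 2024.09 | Ultralytics (official) | Efficiency optimization, lightweight deployment | ⭐⭐⭐⭐ |
| YOLOv10 | 2024.05 | Tsinghua University | Research, NMS-free end-to-end detection | ⭐⭐⭐⭐ |
| YOLOv9 | 2024.02 | Academia Sinica (Taiwan) | High accuracy, small-object detection | ⭐⭐⭐⭐ |
| YOLOv12 | 2025.02 | University at Buffalo + Chinese Academy of Sciences | Attention mechanism research | ⭐⭐⭐ |

Suggested learning path

- Beginner stage (1-2 weeks): start with YOLOv8 and learn the basic concepts and API usage
- Intermediate stage (2-3 weeks): learn custom dataset training and hyperparameter tuning
- Practical stage (2-3 weeks): learn model deployment and production engineering
- Research stage (ongoing): explore the new features of YOLO11, YOLO26, and YOLOv9/v10/v12

YOLO development history

Complete timeline

| Version | Released | Core innovations | Milestone |
|---|---|---|---|
| YOLOv1 | 2015.06 | Pioneering single-stage detector | Foundation of real-time detection |
| YOLOv2 | 2016.12 | Batch Normalization, anchors | Gains in both accuracy and speed |
| YOLOv3 | 2018.04 | Multi-scale detection, residual network | Industry standard |
| YOLOv4 | 2020.04 | CSPDarknet, Mosaic | Peak of engineering refinement |
| YOLOv5 | 2020.06 | PyTorch framework, ease of use | Most widely adopted |
| YOLOv7 | 2022.07 | E-ELAN, re-parameterization | Speed-accuracy balance |
| YOLOv8 | 2023.01 | C2f, anchor-free, unified framework | Unified Ultralytics ecosystem |
| YOLOv9 | 2024.02 | GELAN, PGI (Programmable Gradient Information) | Training efficiency revolution |
| YOLOv10 | 2024.05 | NMS-free, efficiency-accuracy trade-off | End-to-end detection |
| YOLO11 | 2024.09 | Architecture optimization, fewer parameters | Efficiency-optimized release |
| YOLOv12 | 2025.02 | Area Attention mechanism | Attention-based architecture |
| YOLO26 | 2026.01 | No DFL, no NMS, 43% CPU speedup | New standard for edge computing |

Core principles and differences by version

Official Ultralytics mainline versions

YOLOv8

Core features: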

Continue reading →

Zhipu Coding Plan × Oh My OpenCode: Multi-Model Orchestration Configuration in Practice

April 5, 2026

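Why bother with this

When it comes to writing code with AI, the gap between single-model and multi-model setups keeps widening. However strong one model is, it cannot beat a group of models working in parallel, each with its own job. Oh My OpenCode (OmO below) is a multi-model orchestration plugin in the OpenCode ecosystem: 11 Agents with distinct responsibilities and 48 Hooks spanning the entire lifecycle. Zhipu's Coding Plan, in turn, provides access to the full GLM model family. Combined, they let you assign different models by role: models strong at coding do the coding, models strong at reasoning do the reasoning, and the free ones handle the odd jobs.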

Continue reading →