LabelImg Installation and Usage
| # Installation
pip install labelImg
# Launch
labelImg
|
Annotation Process:
- Open Dir → Select image folder
- Change Save Dir → Select annotation save folder
- Select YOLO format
- Create RectBox → Draw bounding box → Enter class name
- Save
LabelMe Installation and Usage
| pip install labelme
labelme
|
CVAT (Computer Vision Annotation Tool) is an open-source annotation platform originally developed by Intel. It supports self-hosted Docker deployment and suits team collaboration and large-scale annotation projects.
| # Docker deployment
git clone https://github.com/opencv/cvat
cd cvat
docker compose up -d
|
Annotation Workflow:
- Create a project → Define label list (person, car, dog…)
- Upload images or video frame sequences
- Create tasks → Assign annotators
- Annotate using rectangles, polygons, keypoints, etc.
- Review → Export in YOLO format
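Export to YOLO: CVAT supports YOLO 1.1 format export, which produces Darknet-style metadata files (obj.data, obj.names, train.txt) rather than data.yaml. Recent CVAT versions also offer Ultralytics YOLO export formats that include a data.yaml; otherwise, write data.yaml by hand after export.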
Roboflow is a cloud-based full-pipeline dataset management platform that requires no local deployment, providing a complete toolchain from annotation to model deployment.
Core Features:
- Online annotation (bounding boxes, polygons, segmentation masks, keypoints)
- Dataset version management (new version on each modification)
- Built-in preprocessing (auto-resize, normalization, etc.)
- Built-in data augmentation (rotation, flip, noise, mosaic, etc.)
- One-click export to YOLO, COCO, Pascal VOC formats
Export to YOLO Steps:
- Create a project → Upload images
- Annotate online or import existing annotations (supports COCO/VOC/YOLO format import)
- Click Generate → Select preprocessing and augmentation
- Export → Select YOLO v5/v8 PyTorch format
- Download ZIP, ready for training
| # Download dataset directly via Roboflow API
# pip install roboflow
from roboflow import Roboflow
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("workspace-name").project("project-name")
dataset = project.version(1).download("yolov8")
|
Label Studio Multi-Task Annotation
Label Studio is an open-source multi-task annotation platform supporting images, text, audio, time series, and more.
| # Installation
pip install label-studio
# Launch
label-studio
# Docker deployment (recommended for production)
docker run -it -p 8080:8080 -v $(pwd)/data:/label-studio/data heartexlabs/label-studio:latest
|
YOLO Annotation Configuration:
- Create a project → Select Object Detection with Bounding Boxes template
- Configure labels (Labeling Setup → Add label names)
- Import images → Start annotating
- Export after completion → Select YOLO format
Advanced Features:
- Multi-user collaboration and annotation consistency checks
- ML-assisted labeling for auto pre-annotation
- Customizable annotation interface and templates
| my_dataset/
├── images/
│   ├── train/    # Training set images
│   ├── val/      # Validation set images
│   └── test/     # Test set images (optional)
├── labels/
│   ├── train/    # Training set annotations
│   ├── val/      # Validation set annotations
│   └── test/     # Test set annotations
└── data.yaml     # Dataset configuration file
|
Each .txt annotation file format:
| <class_id> <x_center> <y_center> <width> <height>
|
- All coordinates are normalized values (0~1)
- x_center, y_center: Bounding box center relative to image width/height
- width, height: Bounding box width/height relative to image width/height
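As a worked example of the format above, the sketch below converts a box given in pixel corner coordinates into one YOLO label line. The function name `pixel_box_to_yolo` and the sample numbers are illustrative assumptions, not part of any library.

```python
def pixel_box_to_yolo(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Return one YOLO label line for a box given in pixel corner coordinates."""
    x_center = (xmin + xmax) / 2.0 / img_w   # Center x as a fraction of image width
    y_center = (ymin + ymax) / 2.0 / img_h   # Center y as a fraction of image height
    width = (xmax - xmin) / img_w            # Box width as a fraction of image width
    height = (ymax - ymin) / img_h           # Box height as a fraction of image height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 200x100 px box with top-left corner (100, 50) in a 640x480 image
line = pixel_box_to_yolo(0, 100, 50, 300, 150, 640, 480)
print(line)  # 0 0.312500 0.208333 0.312500 0.208333
```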
data.yaml Configuration File
| # Dataset root path (absolute or relative)
path: ../datasets/my_dataset
# Train/validation/test set paths (relative to path)
train: images/train
val: images/val
test: images/test  # Optional
# Number of classes
nc: 3
# Class names
names:
  0: person
  1: car
  2: dog
|
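Because nc and names must agree, it can help to generate the file from a single class list. The sketch below is one way to do that with plain string formatting; `build_data_yaml` is a hypothetical helper, and it assumes class names contain no characters that need YAML quoting.

```python
def build_data_yaml(root, class_names, with_test=False):
    """Return data.yaml text for a dataset laid out as images/{train,val,test}."""
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
    ]
    if with_test:
        lines.append("test: images/test")
    # nc is derived from the list, so it cannot drift from names
    lines.append(f"nc: {len(class_names)}")
    lines.append("names:")
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"

yaml_text = build_data_yaml("../datasets/my_dataset", ["person", "car", "dog"])
print(yaml_text)
```

Writing `yaml_text` to `data.yaml` in the dataset root would then give a file equivalent to the one shown above, minus the optional test entry.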
VOC → YOLO Conversion
| import xml.etree.ElementTree as ET

def voc_to_yolo(xml_path, img_w, img_h, class_names):
    """Convert one Pascal VOC XML file to a list of YOLO label lines.

    class_names: ordered list of class names; the list index becomes the YOLO class_id.
    """
    tree = ET.parse(xml_path)
    root = tree.getroot()
    yolo_lines = []
    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in class_names:
            continue  # Skip classes that are not in the label list
        cls_id = class_names.index(cls)
        xmlbox = obj.find('bndbox')
        xmin = float(xmlbox.find('xmin').text)
        ymin = float(xmlbox.find('ymin').text)
        xmax = float(xmlbox.find('xmax').text)
        ymax = float(xmlbox.find('ymax').text)
        # Convert to normalized coordinates
        x_center = (xmin + xmax) / 2.0 / img_w
        y_center = (ymin + ymax) / 2.0 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        yolo_lines.append(f"{cls_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")
    return yolo_lines
|
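voc_to_yolo expects the image width and height as arguments. VOC annotation files normally record them in a `<size>` element, so they can be read from the XML instead of opening the image. The sketch below assumes that element is present; `read_voc_size` and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

def read_voc_size(root):
    """Return (width, height) from the <size> element of a parsed VOC annotation."""
    size = root.find('size')
    if size is None:
        raise ValueError("VOC annotation has no <size> element")
    return int(size.find('width').text), int(size.find('height').text)

# Made-up annotation used only for illustration
sample_xml = """<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
</annotation>"""
img_w, img_h = read_voc_size(ET.fromstring(sample_xml))
print(img_w, img_h)  # 640 480
```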
COCO → YOLO Conversion
COCO format stores all annotations in a single JSON file, where each image can have multiple object instances and categories. Converting to YOLO format requires splitting the JSON into individual .txt files per image.
| import json
import os

def coco_to_yolo(coco_json_path, output_dir, images_dir):
    """
    Convert COCO JSON annotations to YOLO txt format
    Args:
        coco_json_path: Path to COCO annotation JSON
        output_dir: Output directory for YOLO labels
        images_dir: Image directory (used to warn about missing image files)
    """
    with open(coco_json_path, 'r') as f:
        coco = json.load(f)
    os.makedirs(output_dir, exist_ok=True)
    # Build category ID to sequential index mapping
    # COCO category IDs are not necessarily contiguous, so remap them to 0..N-1
    categories = {cat['id']: idx for idx, cat in enumerate(coco['categories'])}
    print(f"Category mapping: {categories}")
    # Build image info dictionary
    images_info = {}
    for img in coco['images']:
        images_info[img['id']] = {
            'file_name': img['file_name'],
            'width': img['width'],
            'height': img['height']
        }
    # Group annotations by image_id
    annotations_by_image = {}
    for ann in coco['annotations']:
        img_id = ann['image_id']
        if img_id not in annotations_by_image:
            annotations_by_image[img_id] = []
        annotations_by_image[img_id].append(ann)
    # Convert per-image to YOLO format
    # (images without any annotation get no label file)
    for img_id, anns in annotations_by_image.items():
        img_info = images_info[img_id]
        if not os.path.exists(os.path.join(images_dir, img_info['file_name'])):
            print(f"Warning: image not found: {img_info['file_name']}")
        img_w, img_h = img_info['width'], img_info['height']
        base_name = os.path.splitext(os.path.basename(img_info['file_name']))[0]
        txt_path = os.path.join(output_dir, f"{base_name}.txt")
        with open(txt_path, 'w') as f:
            for ann in anns:
                cls_id = categories.get(ann['category_id'], -1)
                if cls_id == -1:
                    continue  # Skip unmapped categories
                # COCO format: [x, y, width, height] (top-left + dimensions)
                bbox = ann['bbox']
                x, y, w, h = bbox
                # Convert to YOLO normalized center coordinates
                x_center = (x + w / 2) / img_w
                y_center = (y + h / 2) / img_h
                width = w / img_w
                height = h / img_h
                f.write(f"{cls_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")
    print(f"Conversion complete! Processed {len(annotations_by_image)} images")

# Usage example
coco_to_yolo(
    coco_json_path="annotations/instances_train2017.json",
    output_dir="labels/train",
    images_dir="images/train"
)
|
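A quick way to sanity-check the bounding box arithmetic is a round trip: convert a COCO box to YOLO form and back, then compare with the original. The sketch below does this for one box; both helper names and the sample values are assumptions made for illustration.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """COCO [x, y, w, h] in pixels -> YOLO (xc, yc, w, h), normalized."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)

def yolo_to_coco_bbox(yolo_box, img_w, img_h):
    """YOLO (xc, yc, w, h), normalized -> COCO [x, y, w, h] in pixels."""
    xc, yc, w, h = yolo_box
    bw, bh = w * img_w, h * img_h
    return [xc * img_w - bw / 2, yc * img_h - bh / 2, bw, bh]

original = [100.0, 50.0, 200.0, 100.0]
restored = yolo_to_coco_bbox(coco_bbox_to_yolo(original, 640, 480), 640, 480)
# Allow for floating-point rounding when comparing
print(all(abs(a - b) < 1e-6 for a, b in zip(original, restored)))
```

Note that label files store six decimal places, so a round trip through a written .txt file is only accurate to a fraction of a pixel.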
Split images and annotation files into training, validation, and test sets by ratio.
| import os
import shutil
from sklearn.model_selection import train_test_split

def split_dataset(image_dir, label_dir, output_dir,
                  train_ratio=0.7, val_ratio=0.15,
                  test_ratio=0.15, random_seed=42):
    """
    Split dataset by ratio
    Args:
        image_dir: Source image directory
        label_dir: Source label directory
        output_dir: Output root directory
        train_ratio: Training set ratio
        val_ratio: Validation set ratio
        test_ratio: Test set ratio
        random_seed: Random seed (for reproducibility)
    """
    # Get all image files (sorted, because os.listdir order is not guaranteed)
    images = sorted(f for f in os.listdir(image_dir)
                    if f.lower().endswith(('.jpg', '.jpeg', '.png')))
    if test_ratio > 0:
        # Split test set first, then split validation from remainder
        train_val, test = train_test_split(
            images, test_size=test_ratio, random_state=random_seed)
        val_ratio_adj = val_ratio / (train_ratio + val_ratio)
        train, val = train_test_split(
            train_val, test_size=val_ratio_adj, random_state=random_seed)
    else:
        train, val = train_test_split(
            images, test_size=val_ratio / (train_ratio + val_ratio),
            random_state=random_seed)
        test = []
    splits = {'train': train, 'val': val, 'test': test}
    # Copy files to corresponding directories
    for split_name, split_images in splits.items():
        os.makedirs(f"{output_dir}/images/{split_name}", exist_ok=True)
        os.makedirs(f"{output_dir}/labels/{split_name}", exist_ok=True)
        for img_file in split_images:
            # Copy image
            shutil.copy2(
                f"{image_dir}/{img_file}",
                f"{output_dir}/images/{split_name}/{img_file}")
            # Copy corresponding label file
            base = os.path.splitext(img_file)[0]
            label_file = f"{base}.txt"
            if os.path.exists(f"{label_dir}/{label_file}"):
                shutil.copy2(
                    f"{label_dir}/{label_file}",
                    f"{output_dir}/labels/{split_name}/{label_file}")
    print("Dataset split complete!")
    print(f" Training set: {len(train)} images")
    print(f" Validation set: {len(val)} images")
    print(f" Test set: {len(test)} images")

# Usage example
split_dataset(
    image_dir="raw_images",
    label_dir="raw_labels",
    output_dir="datasets/my_dataset",
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    random_seed=42
)
|
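If scikit-learn is not available, the same ratio-based split can be sketched with the standard library alone. This is an alternative to train_test_split, not what split_dataset uses, and `split_names` is a hypothetical helper. The disjointness check at the end guards against the same file landing in two subsets.

```python
import random

def split_names(names, train_ratio=0.7, val_ratio=0.15, seed=42):
    """Shuffle names reproducibly and cut them into train/val/test lists."""
    names = sorted(names)                # Fixed starting order, independent of listing order
    random.Random(seed).shuffle(names)   # Private RNG so global random state is untouched
    n_train = round(len(names) * train_ratio)
    n_val = round(len(names) * val_ratio)
    # Whatever remains after train and val becomes the test set
    return names[:n_train], names[n_train:n_train + n_val], names[n_train + n_val:]

files = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = split_names(files)
# The three subsets must be pairwise disjoint and together cover every file
assert not (set(train) & set(val) or set(train) & set(test) or set(val) & set(test))
assert set(train) | set(val) | set(test) == set(files)
print(len(train), len(val), len(test))
```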
Stratified Split:
When class distribution is severely imbalanced, simple random splitting may cause a subset to lack certain classes. Use stratified splitting to maintain class proportions:
| import os
from sklearn.model_selection import StratifiedShuffleSplit

def stratified_split(image_dir, label_dir, output_dir,
                     test_size=0.2, random_seed=42):
    """Split dataset by class proportion"""
    images, labels = [], []
    for f in sorted(os.listdir(image_dir)):
        if not f.lower().endswith(('.jpg', '.jpeg', '.png')):
            continue
        base = os.path.splitext(f)[0]
        txt_path = f"{label_dir}/{base}.txt"
        if not os.path.exists(txt_path):
            continue
        # Read classes present in this image
        with open(txt_path) as fh:
            classes = [int(line.split()[0]) for line in fh if line.strip()]
        if classes:
            images.append(f)
            labels.append(classes[0])  # Use first class for stratification
    # Each stratification label needs at least 2 images, otherwise split() raises ValueError
    sss = StratifiedShuffleSplit(
        n_splits=1, test_size=test_size, random_state=random_seed)
    train_idx, val_idx = next(sss.split(images, labels))
    train = [images[i] for i in train_idx]
    val = [images[i] for i in val_idx]
    # File copying logic (same as split_dataset)
    print(f"Stratified split complete: {len(train)} training, {len(val)} validation")
    return train, val
|
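stratified_split represents each image by the first class in its label file, so an image that contains a rare class may be filed under a common one. One alternative, offered here as an assumption rather than the method used above, is to key each image by the rarest class it contains. `rarest_class_keys` is a hypothetical helper.

```python
from collections import Counter

def rarest_class_keys(image_classes):
    """Map each image to the globally rarest class it contains."""
    # Count in how many images each class appears (not how many boxes)
    image_freq = Counter(c for classes in image_classes.values() for c in set(classes))
    # Ties are broken by the smaller class id so the result is deterministic
    return {img: min(set(classes), key=lambda c: (image_freq[c], c))
            for img, classes in image_classes.items() if classes}

image_classes = {"a.jpg": [0, 0, 2], "b.jpg": [0], "c.jpg": [0, 1], "d.jpg": [0, 1]}
keys = rarest_class_keys(image_classes)
print(keys)  # {'a.jpg': 2, 'b.jpg': 0, 'c.jpg': 1, 'd.jpg': 1}
```

The resulting values could be passed as the labels to StratifiedShuffleSplit. Keys that occur only once (class 2 and class 0 in this toy example) would still need to be merged or handled separately, since stratification requires at least two images per key.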
Class Balance Analysis
Analyze the distribution of each class in the dataset, detect class imbalance issues, and take corrective measures.
| import os
import matplotlib.pyplot as plt
from collections import Counter

def analyze_class_balance(label_dir, class_names=None):
    """
    Analyze dataset class distribution
    Args:
        label_dir: Label file directory
        class_names: List of class names (optional)
    Returns:
        class_counts: Counter object with instance counts per class
    """
    class_counts = Counter()
    label_files = [f for f in os.listdir(label_dir) if f.endswith('.txt')]
    for f in label_files:
        with open(f"{label_dir}/{f}", 'r') as fh:
            for line in fh:
                if line.strip():
                    cls_id = int(line.split()[0])
                    class_counts[cls_id] += 1
    # Visualize distribution
    if class_names is None:
        # Cover every id up to the largest one seen, even if some ids never occur
        class_names = [f"class_{i}" for i in range(max(class_counts, default=-1) + 1)]
    names = [class_names[cid] for cid, _ in class_counts.most_common()]
    counts = [cnt for _, cnt in class_counts.most_common()]
    plt.figure(figsize=(10, 5))
    bars = plt.bar(range(len(names)), counts, color='steelblue')
    plt.xticks(range(len(names)), names, rotation=45)
    plt.ylabel('Instance Count')
    plt.title('Dataset Class Distribution')
    # Annotate bar chart with values
    for bar, count in zip(bars, counts):
        plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + max(counts) * 0.01,
                 str(count), ha='center', va='bottom')
    plt.tight_layout()
    plt.savefig('class_distribution.png', dpi=150)
    plt.show()
    return class_counts

# Usage example
class_names = ['person', 'car', 'dog']
counts = analyze_class_balance('labels/train', class_names)
print(f"Class distribution: {dict(counts)}")
total_instances = sum(counts.values())
print(f"Total instances: {total_instances}")
|
Handling Class Imbalance:
| Method | Description | Use Case |
|---|---|---|
| Oversampling | Duplicate minority class samples or apply mild augmentation | Majority class has enough samples |
| Undersampling | Randomly discard majority class samples | Large dataset with redundant majority class |
| Class Weight | Assign higher weight to minority classes in the loss function | Requires framework support; YOLOv5's cls_pw is a single positive-class weight, not a per-class list |
| Data Augmentation | Apply more augmentations to minority classes | General approach, recommended first |
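| # Ultralytics model.train() does not take per-class loss weights.
# `cls` scales the classification loss as a whole (default 0.5).
# YOLOv5's hyperparameter file has `cls_pw`, a single BCE positive weight, not a per-class list.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.train(
    data="data.yaml",
    cls=1.0,  # Classification loss gain, raised from the default
)
|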
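For the oversampling row of the table, one simple scheme is to repeat each image according to the rarest class it contains. The sketch below computes such repeat counts; the function name, the inverse-frequency rule, and the `max_repeat` cap are all assumptions chosen for illustration, not a built-in feature of any YOLO framework.

```python
from collections import Counter

def repeat_factors(image_classes, max_repeat=5):
    """Return how many times each image should appear after oversampling."""
    # Instance count per class over the whole dataset
    counts = Counter(c for classes in image_classes.values() for c in classes)
    if not counts:
        return {img: 1 for img in image_classes}
    largest = max(counts.values())
    factors = {}
    for img, classes in image_classes.items():
        if not classes:
            factors[img] = 1  # Background image: keep a single copy
            continue
        rarest = min(counts[c] for c in classes)
        # Repeat in proportion to how much rarer the class is, capped at max_repeat
        factors[img] = min(max_repeat, round(largest / rarest))
    return factors

image_classes = {"a.jpg": [0, 0, 0, 0], "b.jpg": [0, 0, 1], "c.jpg": [2]}
print(repeat_factors(image_classes))  # {'a.jpg': 1, 'b.jpg': 5, 'c.jpg': 5}
```

The cap matters: repeating a handful of images too often encourages memorizing them, which is why the table pairs duplication with mild augmentation.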
Data Augmentation Strategies
Built-in Augmentation (Ultralytics)
| model.train(
    data="data.yaml",
    # Basic augmentation
    hsv_h=0.015,       # Hue augmentation
    hsv_s=0.7,         # Saturation augmentation
    hsv_v=0.4,         # Brightness augmentation
    degrees=0.0,       # Rotation angle
    translate=0.1,     # Translation
    scale=0.5,         # Scale
    shear=0.0,         # Shear
    perspective=0.0,   # Perspective transformation
    flipud=0.0,        # Vertical flip probability
    fliplr=0.5,        # Horizontal flip probability
    # Advanced augmentation
    mosaic=1.0,        # Mosaic augmentation
    mixup=0.0,         # Mixup augmentation
    copy_paste=0.0,    # Copy-Paste augmentation
)
|
Custom Augmentation (Albumentations)
| import albumentations as A

transform = A.Compose([
    A.RandomBrightnessContrast(p=0.5),
    A.GaussianBlur(p=0.3),
    A.GaussNoise(p=0.3),
    A.HorizontalFlip(p=0.5),
], bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']))
|
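The bbox_params argument tells Albumentations to transform the boxes together with the pixels. To illustrate what that involves for one of the transforms above, the sketch below mirrors YOLO boxes for a horizontal flip by hand. It is a standalone illustration with a hypothetical function name, not Albumentations code.

```python
def hflip_yolo_boxes(boxes):
    """Mirror YOLO boxes (class_id, xc, yc, w, h) for a horizontally flipped image."""
    # Only the x center changes; y center, width and height are unaffected
    return [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in boxes]

boxes = [(0, 0.25, 0.5, 0.2, 0.4)]
flipped = hflip_yolo_boxes(boxes)
print(flipped)  # [(0, 0.75, 0.5, 0.2, 0.4)]
```

Pixel-only transforms such as brightness, blur, and noise leave the boxes unchanged, which is why only the flip needs this kind of bookkeeping in the pipeline above.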
Mosaic Augmentation Explained
Mosaic augmentation, introduced in YOLOv4, is a key technique that stitches 4 images into one large composite, significantly improving small object detection.
| import cv2
import numpy as np

def mosaic_augmentation(image1, image2, image3, image4, img_size=640):
    """Simulate the core stitching logic of Mosaic augmentation"""
    h, w = img_size, img_size
    mid_x = w // 2
    mid_y = h // 2
    images = [image1, image2, image3, image4]
    # Create canvas
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    # Resize each image to its quadrant and place it (cv2.resize takes (width, height))
    canvas[:mid_y, :mid_x] = cv2.resize(images[0], (mid_x, mid_y))          # top-left
    canvas[:mid_y, mid_x:] = cv2.resize(images[1], (w - mid_x, mid_y))      # top-right
    canvas[mid_y:, :mid_x] = cv2.resize(images[2], (mid_x, h - mid_y))      # bottom-left
    canvas[mid_y:, mid_x:] = cv2.resize(images[3], (w - mid_x, h - mid_y))  # bottom-right
    # Simplification: the stitch point is fixed at the canvas center here;
    # in practice it is chosen randomly for each mosaic
    return canvas
|
Internal Mechanics:
- Randomly select 4 images per training iteration
- Randomly choose a stitch center point (not fixed image center)
- Stitch the 4 images around the center point into one composite
- Recalculate all bounding box coordinates relative to the composite
- Boxes that fall outside the composite are cropped or discarded
Effect: Each training image contains content from 4 images, enriching BN layer statistics and improving small object context diversity.
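The box recalculation step can be sketched for the simplest case: a fixed center, an even canvas size, and each source image resized to fill exactly one quadrant. Under those assumptions a YOLO box is halved and shifted. Real implementations use a random center and crop the images, so the arithmetic there is more involved. `remap_box_to_quadrant` is a hypothetical helper.

```python
def remap_box_to_quadrant(box, quadrant):
    """Map a YOLO box (xc, yc, w, h) from a source image into a mosaic quadrant.

    quadrant: (col, row) with col and row in {0, 1}; (0, 0) is top-left.
    """
    xc, yc, w, h = box
    col, row = quadrant
    # Each source image shrinks to half the canvas in both directions,
    # then shifts by half a canvas for the right and bottom quadrants
    return ((xc + col) / 2, (yc + row) / 2, w / 2, h / 2)

# A centered box from the image placed in the top-right quadrant
print(remap_box_to_quadrant((0.5, 0.5, 0.4, 0.2), (1, 0)))  # (0.75, 0.25, 0.2, 0.1)
```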
Mixup Augmentation Explained
Mixup originates from image classification and was later adopted in object detection. The core idea is to blend two images proportionally.
| import cv2
import numpy as np

def mixup_augmentation(image1, image2, alpha=0.5):
    """Simulate the core blending logic of Mixup augmentation"""
    # Both images must have the same shape; resize image2 to match image1 if needed
    if image1.shape != image2.shape:
        image2 = cv2.resize(image2, (image1.shape[1], image1.shape[0]))
    # Mixup coefficient sampled from Beta distribution
    lam = np.random.beta(alpha, alpha)
    # Blend two images proportionally (computed in float, then cast back to uint8)
    mixed = lam * image1 + (1 - lam) * image2
    # Bounding boxes from both images are retained
    # but contribute to the loss with weights lam and (1-lam) respectively
    return mixed.astype(np.uint8)
|
Internal Mechanics:
- Randomly select two images and their annotations from the current batch
- Sample blend coefficient lam from Beta(alpha, alpha) (typically alpha=0.5~1.0)
- Pixel-level blending: mixed_img = lam * img1 + (1-lam) * img2
- Bounding boxes from both images are retained, contributing to the loss with weights lam and 1-lam
- Label smoothing effect: the model sees blended images, providing implicit regularization
Note: Mixup slightly reduces training speed (double the boxes to process) but effectively reduces overfitting.
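The label side of the steps above can be sketched with the standard library: sample lam from Beta(alpha, alpha) and carry a weight alongside each retained box. Whether those weights are actually used in the loss depends on the training framework (some implementations simply concatenate the labels), so treat the weighting here as an assumption. `mixup_labels` is a hypothetical helper.

```python
import random

def mixup_labels(labels1, labels2, alpha=0.5, rng=random):
    """Sample a blend coefficient and merge the label lists of two images."""
    lam = rng.betavariate(alpha, alpha)  # Beta(alpha, alpha), always within [0, 1]
    # Keep every box from both images, each tagged with its image's blend weight
    merged = [(box, lam) for box in labels1] + [(box, 1.0 - lam) for box in labels2]
    return lam, merged

rng = random.Random(0)  # Seeded generator so the example is repeatable
lam, merged = mixup_labels([(0, 0.5, 0.5, 0.2, 0.2)], [(1, 0.3, 0.3, 0.1, 0.1)], rng=rng)
print(f"lam={lam:.3f}, boxes={len(merged)}")
```

With alpha below 1 the Beta distribution is U-shaped, so most samples keep one image dominant; alpha = 1 makes lam uniform on [0, 1].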
Copy-Paste Augmentation Explained
Copy-Paste augmentation copies objects from one image and pastes them onto another, significantly increasing instance count and background diversity.
| import cv2
import numpy as np

def copy_paste_augmentation(source_img, target_img, source_bbox):
    """Simulate the core pasting logic of Copy-Paste augmentation"""
    x1, y1, x2, y2 = map(int, source_bbox)
    obj = source_img[y1:y2, x1:x2].copy()
    h, w = target_img.shape[:2]
    obj_h, obj_w = obj.shape[:2]
    if obj_h > h or obj_w > w:
        raise ValueError("Object is larger than the target image")
    # Randomly select paste location (randint's upper bound is exclusive)
    paste_x = np.random.randint(0, w - obj_w + 1)
    paste_y = np.random.randint(0, h - obj_h + 1)
    # Overlay onto a copy so the caller's target image is not modified
    result = target_img.copy()
    result[paste_y:paste_y + obj_h, paste_x:paste_x + obj_w] = obj
    # New bounding box in pixel corner format [x1, y1, x2, y2]
    new_bbox = [paste_x, paste_y, paste_x + obj_w, paste_y + obj_h]
    return result, new_bbox
|
Internal Mechanics:
- Randomly select two images from the dataset: source (providing the object) and target
- Randomly choose one object instance from the source image
- Use segmentation mask for precise contour extraction if available; otherwise crop via bounding box
- Optionally apply slight random scaling, rotation, or flipping to the object
- Paste the transformed object onto the target image
- If the paste location overlaps with other boxes, skip or blend with transparency
Use Cases: Small object detection, rare class augmentation. Avoid overuse to prevent losing background information.
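The overlap rule in the steps above (skip a paste that collides with existing boxes) can be sketched with an IoU test. The 0.3 threshold and both function names are assumptions for illustration; boxes are pixel corners [x1, y1, x2, y2].

```python
def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2] in pixels."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # Zero when the boxes do not overlap
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def paste_allowed(new_box, existing_boxes, max_iou=0.3):
    """Accept a paste location only if it overlaps every existing box by at most max_iou."""
    return all(box_iou(new_box, b) <= max_iou for b in existing_boxes)

existing = [[0, 0, 100, 100]]
print(paste_allowed([50, 50, 150, 150], existing))  # True: IoU = 2500 / 17500
print(paste_allowed([10, 10, 110, 110], existing))  # False: IoU = 8100 / 11900
```

A caller would typically resample the paste location a few times when this returns False and give up on that object if no free spot is found.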
Augmentation Parameter Reference:
| model.train(
    # Mosaic parameters
    mosaic=1.0,        # Probability of applying Mosaic (1.0 = every training image)
    close_mosaic=10,   # Disable Mosaic for the final 10 epochs
    # Mixup parameters
    mixup=0.1,         # Mixup application probability
    # Copy-Paste parameters
    copy_paste=0.1,    # Copy-Paste application probability
    # Note: mosaic_center, mixup_alpha and paste_in are not Ultralytics train() arguments
    # (paste_in is a YOLOv7 hyperparameter), so they are omitted here
)
|
Data Quality Checklist
Carefully inspect dataset quality before training to avoid wasting training time.
1. Annotation Consistency
- Are bounding box sizes and aspect ratios consistent for the same class?
- Are all objects annotated (no missing labels)?
- Are class names spelled consistently (no typos or capitalization issues)?
2. Annotation Boundary Check
- Are coordinates within the 0~1 range (YOLO normalized format)?
- Are there any invalid boxes with width or height of 0?
- Do boxes extend beyond image boundaries? Minor issues are acceptable, excessive ones indicate annotation errors
3. Image Quality
- Are there any corrupted or unopenable image files?
- Are image channel counts consistent (all RGB)?
- Are image resolution differences too large? Recommend keeping the longest side under 1920
4. Class Check
- Does nc (number of classes) in data.yaml match the actual annotation files?
- Are all class_ids in .txt files within the 0 ~ nc-1 range?
- Are there empty annotation files (background images)? Is that expected?
5. Dataset Balance
- Are instance counts severely imbalanced across classes?
- Is the class distribution consistent across training/validation/test sets?
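The second question can be answered numerically by comparing each class's share of instances between two splits. The sketch below takes per-class counts (for example, the Counter returned by analyze_class_balance for each split) and reports the largest gap. The helper names and sample counts are made up for illustration.

```python
def class_shares(counts):
    """Convert per-class instance counts into fractions of the total."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}

def max_share_gap(counts_a, counts_b):
    """Largest absolute difference in class share between two splits."""
    a, b = class_shares(counts_a), class_shares(counts_b)
    # A class missing from one split counts as a share of 0 there
    return max((abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in set(a) | set(b)), default=0.0)

train_counts = {0: 700, 1: 200, 2: 100}
val_counts = {0: 80, 1: 15, 2: 5}
gap = max_share_gap(train_counts, val_counts)
print(f"Largest class-share gap: {gap:.2f}")
```

What gap is acceptable depends on the dataset; a large value is a hint to re-split with stratification rather than a hard failure.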
6. Automated Validation Script
| import os

def validate_dataset(image_dir, label_dir, num_classes):
    """Automated dataset quality check"""
    issues = []
    for f in sorted(os.listdir(label_dir)):
        if not f.endswith('.txt'):
            continue
        filepath = os.path.join(label_dir, f)
        with open(filepath, 'r') as fh:
            for line_num, line in enumerate(fh, 1):
                parts = line.strip().split()
                if not parts:
                    continue  # Ignore blank lines
                if len(parts) != 5:
                    issues.append(f"{f}:{line_num} format error: expected 5 fields, got {len(parts)}")
                    continue
                try:
                    cls_id = int(parts[0])
                    xc, yc, w, h = map(float, parts[1:])
                except ValueError:
                    issues.append(f"{f}:{line_num} non-numeric value in: {line.strip()}")
                    continue
                if cls_id < 0 or cls_id >= num_classes:
                    issues.append(f"{f}:{line_num} class_id {cls_id} out of range 0~{num_classes-1}")
                if w <= 0 or h <= 0:
                    issues.append(f"{f}:{line_num} invalid box size: w={w}, h={h}")
                if xc < 0 or xc > 1 or yc < 0 or yc > 1:
                    issues.append(f"{f}:{line_num} center coordinates out of bounds: xc={xc}, yc={yc}")
        # Check if corresponding image exists
        base = os.path.splitext(f)[0]
        img_exists = any(os.path.exists(f"{image_dir}/{base}{ext}")
                         for ext in ['.jpg', '.jpeg', '.png'])
        if not img_exists:
            issues.append(f"{f}: corresponding image not found")
    if issues:
        print("Issues found:")
        for issue in issues:
            print(f" - {issue}")
    else:
        print("Dataset validation passed!")
    return len(issues)

# Usage example
validate_dataset(
    image_dir='datasets/my_dataset/images/train',
    label_dir='datasets/my_dataset/labels/train',
    num_classes=3
)
|
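validate_dataset walks the label files and checks that each has an image. The reverse check, images that have no label file, is sketched below. Such images are typically treated as background by YOLO training pipelines, so the list is worth reviewing to confirm they are intentional. `find_unlabeled_images` is a hypothetical helper, and the extension list is an assumption.

```python
import os

def find_unlabeled_images(image_dir, label_dir, exts=('.jpg', '.jpeg', '.png')):
    """Return image file names that have no matching .txt label file."""
    missing = []
    for name in sorted(os.listdir(image_dir)):
        base, ext = os.path.splitext(name)
        if ext.lower() not in exts:
            continue  # Not an image file
        if not os.path.exists(os.path.join(label_dir, base + '.txt')):
            missing.append(name)
    return missing
```

For example, `find_unlabeled_images('datasets/my_dataset/images/train', 'datasets/my_dataset/labels/train')` would list the training images that currently have no annotations.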