Image Basics & Getting Started with OpenCV

What Is a Digital Image

Computer vision operates on digital images. At its core, a digital image is just a 2D array of pixels.

Pixels and Grayscale Images

A grayscale image is the simplest form—each pixel stores a single brightness value from 0 (pure black) to 255 (pure white). In Python, a grayscale image of width W and height H is a 2D array with shape (H, W):

python
1
2
3
4
import numpy as np

# Create a 100x100 gray square (brightness 128)
gray_img = np.full((100, 100), 128, dtype=np.uint8)

uint8 ranges from 0 to 255 because 8 bits can represent 2^8 = 256 unique values. If a pixel value exceeds its range, wrap around occurs: 255 + 1 becomes 0, not 256 (like an odometer rolling over). Use cv2.add() (saturating arithmetic) or np.clip() to prevent wrap around.

Color Images and RGB

Each pixel in a color image has three channels—R (red), G (green), B (blue)—each in the 0–255 range. OpenCV uses BGR ordering by default, not the more common RGB. The shape is (H, W, 3):

python
1
2
3
# Create a red square (BGR ordering)
red_patch = np.zeros((100, 100, 3), dtype=np.uint8)
red_patch[:, :] = [0, 0, 255]  # B=0, G=0, R=255

Why BGR? When OpenCV was first developed, camera manufacturers commonly used BGR byte ordering, so OpenCV adopted it. If you display an OpenCV image directly with Matplotlib’s imshow, the colors will look wrong (R↔B swapped) because Matplotlib expects RGB. Fix: convert the color space first: img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).

Figure 1 - Data structure of a digital image (color image example):

mermaid
flowchart TD
    IMG["Color Image<br/>shape: (H, W, 3)"] --> R["R Channel<br/>0-255 Red"]
    IMG --> G["G Channel<br/>0-255 Green"]
    IMG --> B["B Channel<br/>0-255 Blue"]
    R --> PIXEL["Single Pixel<br/>[B, G, R]<br/>e.g. [0, 0, 255] = pure red"]
    G --> PIXEL
    B --> PIXEL

    classDef img fill:#9C27B0,color:#fff
    classDef ch fill:#2196F3,color:#fff
    classDef px fill:#f44336,color:#fff
    class IMG img
    class R,G,B ch
    class PIXEL px

Try the RGB/BGR difference yourself — drag the sliders to see how the [B, G, R] array maps to color:

R 128
G 128
B 128
[128, 128, 128] BGR

Resolution and Bit Depth

  • Resolution: pixel width × height, determines the level of detail
  • Bit depth: how many bits per channel. uint8 is 8-bit (0–255). 16-bit and 32-bit float images also exist

What Is OpenCV

OpenCV (Open Source Computer Vision Library) is the most widely used computer vision library, with bindings for C++, Python, Java, and more. Its core modules include:

  • core: basic data structures (Mat, Point, Rect)
  • imgproc: image processing (filtering, geometric transforms, color spaces)
  • highgui: GUI interaction (display windows, sliders)
  • videoio: video reading and writing

After installing the Python package, just import cv2. The Python API wraps the C++ interface, so function names and parameters are essentially the same.

Basic Operations

Read, Write, and Display Images

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
import cv2

# Read image as color (BGR)
img = cv2.imread('photo.jpg')

# Read as grayscale
gray = cv2.imread('photo.jpg', cv2.IMREAD_GRAYSCALE)

# Save image
cv2.imwrite('output.jpg', img)

# Display image (press any key to close window)
cv2.imshow('Window Title', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

imread returns None if the path is wrong or the format isn’t supported—always check the return value.

Image Properties

python
1
2
3
4
print(img.shape)    # (H, W, 3)
print(img.dtype)    # uint8
print(img.size)     # total pixels = H * W * C
print(len(img))     # height in rows

ROI Cropping

NumPy slicing extracts sub-regions directly—the most common operation:

python
1
2
3
4
5
# Extract region (y1:y2, x1:x2)
roi = img[50:200, 100:300]

# Copy region to another location
img[50:200, 100:300] = img[0:150, 400:600]

Remember the slice order: [row_range, col_range], equivalent to [y1:y2, x1:x2].

Most common beginner mistake: img[y1:y2, x1:x2] — the first dimension is the row range (Y direction), the second is the column range (X direction). Don’t accidentally write it as img[x1:x2, y1:y2]!

Figure 3 - OpenCV image coordinate system convention:

mermaid
flowchart TD
    NOTE["OpenCV image coordinate system:<br/>Origin at top-left<br/>rows = Y (downward)<br/>cols = X (rightward)<br/>img[y, x] NOT img[x, y]"]

    classDef note fill:#f44336,color:#fff
    class NOTE note

Channel Splitting and Merging

python
1
2
3
4
5
6
# Split BGR channels
b, g, r = cv2.split(img)

# Zero out the red channel
img_copy = img.copy()
img_copy[:, :, 2] = 0  # 3rd channel is R

split creates new arrays with overhead. Use NumPy slicing for zero-copy access:

python
1
2
# Direct access to red channel (no copy)
r = img[:, :, 2]

Basic Image Processing

Color Space Conversion

python
1
2
3
4
5
6
7
8
# BGR → Grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# BGR → HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# HSV → BGR
back = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

HSV is common for color range filtering—H is hue, S is saturation, V is value (brightness).

Geometric Transforms

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# Scale down to half
small = cv2.resize(img, None, fx=0.5, fy=0.5)

# Resize to explicit dimensions
resized = cv2.resize(img, (640, 480))

# Rotate 45 degrees (requires rotation matrix)
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D((w//2, h//2), 45, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))

Image Filtering

Filters reduce noise or smooth the image. The key parameter is kernel size—larger kernels produce stronger smoothing.

Intuitively, filtering means recomputing each pixel from its neighborhood:

  • Mean filter (blur): simple arithmetic average of all pixels in the kernel—fastest, but blurs edges
  • Gaussian filter (GaussianBlur): weighted average, highest weight at the center, decreasing with distance following the Gaussian distribution (bell curve)—preserves edges better than mean filtering
  • Median filter (medianBlur): takes the median of all pixels in the kernel—particularly effective for salt-and-pepper noise (random black/white dots) because the median ignores extreme values
python
1
2
3
4
5
6
7
8
# Mean filter (box blur)
blur = cv2.blur(img, (5, 5))

# Gaussian filter (retains more edges)
gauss = cv2.GaussianBlur(img, (5, 5), 0)

# Median filter (best for salt-and-pepper noise)
median = cv2.medianBlur(img, 5)

The convolution kernel sliding across the image is clearest as an animation — at each position, take the 3×3 neighborhood and compute the weighted average:

Input (5×5)
1/91/91/9 1/91/91/9 1/91/91/9
Kernel (3×3 mean blur)
Output

The third parameter of GaussianBlur is the standard deviation—pass 0 to let the function compute it automatically.

Thresholding

Thresholding converts a grayscale image to binary (black and white):

python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# Fixed threshold
_, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)

# Otsu's method (auto-calculates optimal threshold)
_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive threshold (handles uneven lighting)
adaptive = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY, 11, 2
)

Use adaptive threshold for images with uneven illumination—it computes a local threshold per pixel neighborhood.

Figure 2 - Common OpenCV image processing pipeline (from reading to contour extraction):

mermaid
flowchart TD
    A["imread<br/>Read image"] --> B["cvtColor<br/>BGR→GRAY"]
    B --> C["GaussianBlur<br/>Denoise"]
    C --> D["threshold<br/>Binarize"]
    D --> E["Canny<br/>Edge detection"]
    E --> F["findContours<br/>Contour extraction"]

    classDef io fill:#2196F3,color:#fff
    classDef proc fill:#9C27B0,color:#fff
    class A,F io
    class B,C,D,E proc

Edge Detection

Canny edge detection is the most widely used edge detector:

python
1
2
3
4
5
# Apply Gaussian blur first
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Canny edge detection (dual threshold)
edges = cv2.Canny(blurred, 50, 150)

Canny edge detection works in two steps: first computes image gradients using the Sobel operator (finding magnitude and direction of intensity changes), then applies hysteresis thresholding to decide which are true edges— Two threshold parameters: pixels below threshold1 are discarded (non-edge), pixels above threshold2 are confirmed edges, and in-between pixels are kept only if connected to a strong edge. A good rule of thumb is threshold2 = 2-3× threshold1.