Image Basics & Getting Started with OpenCV
What Is a Digital Image
Computer vision operates on digital images. At its core, a digital image is a 2D grid of pixels, each holding one or more numeric values.
Pixels and Grayscale Images
A grayscale image is the simplest form: each pixel stores a single brightness value, which for the usual 8-bit images ranges from 0 (pure black) to 255 (pure white). In Python, a grayscale image of width W and height H is a 2D NumPy array with shape (H, W), height first.
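Here is a minimal sketch of a grayscale image as a NumPy array. The 4×6 size and the pixel positions are arbitrary and chosen only for illustration:

```python
import numpy as np

H, W = 4, 6
gray = np.zeros((H, W), dtype=np.uint8)  # every pixel starts at 0 (black)
gray[1, 2] = 255                          # row 1, column 2 becomes white
gray[3, :] = 128                          # the whole bottom row becomes mid-gray

print(gray.shape)              # (4, 6): (height, width)
print(gray.dtype)              # uint8
print(gray[1, 2], gray[0, 0])  # 255 0
```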
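The uint8 type ranges from 0 to 255 because 8 bits can represent 2^8 = 256 unique values. If NumPy arithmetic pushes a uint8 value past that range, it wraps around: 255 + 1 becomes 0, not 256 (like an odometer rolling over). To prevent wraparound, use cv2.add() (saturating arithmetic, which caps results at 255), or compute in a wider type such as int16 and np.clip() the result to 0–255 before converting back to uint8. Clipping after a uint8 addition is too late, because the value has already wrapped.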
Color Images and RGB
Each pixel in a color image has three channels, R (red), G (green), and B (blue), each in the 0–255 range for 8-bit images. OpenCV stores them in BGR order by default, not the more common RGB. A color image is a 3D array with shape (H, W, 3).
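This minimal sketch shows a small color image built in memory. The 2×3 size and the colors are arbitrary. Indexing a single pixel returns its three channel values in [B, G, R] order:

```python
import numpy as np

img = np.zeros((2, 3, 3), dtype=np.uint8)  # H=2, W=3, 3 channels
img[0, 0] = [0, 0, 255]    # BGR order: pure red
img[0, 1] = [255, 0, 0]    # pure blue
img[1, 2] = [0, 255, 255]  # green + red = yellow

b, g, r = img[0, 0]        # one pixel holds three values
print(img.shape)           # (2, 3, 3)
print(b, g, r)             # 0 0 255
```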
Why BGR? When OpenCV was first developed, camera manufacturers commonly used BGR byte ordering, so OpenCV adopted it. If you display an OpenCV image directly with Matplotlib’s imshow, the colors will look wrong (R↔B swapped) because Matplotlib expects RGB. Fix: convert the color space first with img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).
Figure 1 - Data structure of a digital image (color image example):
flowchart TD
IMG["Color Image<br/>shape: (H, W, 3)"] --> R["R Channel<br/>0-255 Red"]
IMG --> G["G Channel<br/>0-255 Green"]
IMG --> B["B Channel<br/>0-255 Blue"]
R --> PIXEL["Single Pixel<br/>[B, G, R]<br/>e.g. [0, 0, 255] = pure red"]
G --> PIXEL
B --> PIXEL
classDef img fill:#9C27B0,color:#fff
classDef ch fill:#2196F3,color:#fff
classDef px fill:#f44336,color:#fff
class IMG img
class R,G,B ch
class PIXEL px
Resolution and Bit Depth
- Resolution: width × height in pixels; higher resolution can capture finer detail
- Bit depth: the number of bits per channel. uint8 is 8-bit (0–255); 16-bit (uint16, 0–65535) and 32-bit float (float32, usually scaled to 0.0–1.0) images also exist
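A minimal sketch of moving between bit depths with NumPy follows. It assumes the common conventions of a 0.0–1.0 range for float images and multiplying by 257 for 8-bit to 16-bit, so that 255 maps exactly to 65535:

```python
import numpy as np

img8 = np.array([[0, 128, 255]], dtype=np.uint8)

print(np.iinfo(np.uint8).max, np.iinfo(np.uint16).max)  # 255 65535

# 8-bit -> 32-bit float in [0.0, 1.0]
img_f = img8.astype(np.float32) / 255.0

# 8-bit -> 16-bit: multiply by 257 so that 255 maps to 65535
img16 = img8.astype(np.uint16) * 257

# float -> 8-bit: scale back, round, then convert
back = np.round(img_f * 255).astype(np.uint8)

print(img16.tolist())  # [[0, 32896, 65535]]
```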
What Is OpenCV
OpenCV (Open Source Computer Vision Library) is one of the most widely used computer vision libraries. It is written in C++ and has bindings for Python, Java, and more. Its core modules include:
- core: basic data structures (Mat, Point, Rect)
- imgproc: image processing (filtering, geometric transforms, color spaces)
- highgui: GUI interaction (display windows, sliders)
- videoio: video reading and writing
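After installing the Python package (for example, pip install opencv-python), just import cv2. The Python API wraps the C++ interface, so function names and parameters are essentially the same. The main difference is that images are NumPy arrays instead of cv::Mat objects.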
Basic Operations
Read, Write, and Display Images
cv2.imread does not raise an exception on failure. It returns None if the path is wrong or the format isn’t supported, so always check the return value.
Image Properties
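A minimal sketch of the properties usually inspected, assuming a 640×480 color image. An all-black array stands in for a loaded image here:

```python
import numpy as np

img = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a loaded color image

h, w = img.shape[:2]                            # works for gray and color images
channels = img.shape[2] if img.ndim == 3 else 1

print(img.shape)        # (480, 640, 3)
print(h, w, channels)   # 480 640 3
print(img.dtype)        # uint8
print(img.size)         # 921600 elements = 480 * 640 * 3
print(img.nbytes)       # 921600 bytes, since each uint8 value takes 1 byte
```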
ROI Cropping
NumPy slicing extracts a sub-region (region of interest, ROI) directly, which makes it one of the most common operations.
Remember the slice order: [row_range, col_range], equivalent to [y1:y2, x1:x2].
Most common beginner mistake: in img[y1:y2, x1:x2], the first dimension is the row range (Y direction) and the second is the column range (X direction). Don’t accidentally write it as img[x1:x2, y1:y2]!
Figure 3 - OpenCV image coordinate system convention:
flowchart TD
NOTE["OpenCV image coordinate system:<br/>Origin at top-left<br/>rows = Y (downward)<br/>cols = X (rightward)<br/>img[y, x] NOT img[x, y]"]
classDef note fill:#f44336,color:#fff
class NOTE note
Channel Splitting and Merging
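cv2.split copies each channel into a new array, which costs time and memory. When you only need to read or modify one channel, NumPy slicing such as img[:, :, 0] gives zero-copy access through a view.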
Basic Image Processing
Color Space Conversion
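HSV is common for color range filtering: H is hue, S is saturation, and V is value (brightness). For 8-bit images, OpenCV stores H in 0–179 (the 0–360° hue angle halved so it fits in a byte), while S and V use 0–255.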
Geometric Transforms
Image Filtering
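Filters reduce noise or smooth the image. The key parameter is kernel size: larger kernels smooth more strongly, but they also blur away more detail. GaussianBlur and medianBlur expect odd kernel sizes such as 3, 5, or 7.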
Intuitively, filtering means recomputing each pixel from its neighborhood:
- Mean filter (blur): simple arithmetic average of all pixels in the kernel; fastest, but blurs edges
- Gaussian filter (GaussianBlur): weighted average with the highest weight at the center, decreasing with distance following the Gaussian distribution (bell curve); preserves edges better than mean filtering
- Median filter (medianBlur): takes the median of all pixels in the kernel; particularly effective for salt-and-pepper noise (random black and white dots) because the median ignores extreme values
Conceptually, the kernel slides across the image. At each position, the filter takes the neighborhood under the kernel (for example 3×3) and computes its weighted average, which becomes the new value of the center pixel.
The third parameter of GaussianBlur is sigmaX, the standard deviation of the Gaussian in the X direction. Pass 0 to let OpenCV compute it from the kernel size.
Thresholding
Thresholding converts a grayscale image to a binary (black-and-white) image. Pixels above the threshold become one value (typically 255), and the rest become 0.
Use an adaptive threshold for images with uneven illumination. Instead of applying one global value, it computes a separate threshold for each pixel from its local neighborhood.
Figure 2 - Common OpenCV image processing pipeline (from reading to contour extraction):
flowchart TD
A["imread<br/>Read image"] --> B["cvtColor<br/>BGR→GRAY"]
B --> C["GaussianBlur<br/>Denoise"]
C --> D["threshold<br/>Binarize"]
D --> E["Canny<br/>Edge detection"]
E --> F["findContours<br/>Contour extraction"]
classDef io fill:#2196F3,color:#fff
classDef proc fill:#9C27B0,color:#fff
class A,F io
class B,C,D,E proc
Edge Detection
Canny is one of the most widely used edge detectors.
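Canny edge detection works in three stages. First, it computes image gradients with the Sobel operator (the magnitude and direction of intensity changes). Next, it thins edges with non-maximum suppression, keeping only pixels that are local maxima along the gradient direction. Finally, it applies hysteresis thresholding to decide which candidates are true edges. The classic algorithm also starts with Gaussian smoothing, which cv2.Canny leaves to the caller, so a GaussianBlur is usually applied first.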
The two threshold parameters work like this: pixels whose gradient magnitude is below threshold1 are discarded (non-edge), pixels above threshold2 are confirmed edges, and pixels in between are kept only if they are connected to a strong edge. A good rule of thumb is threshold2 = 2–3× threshold1.